GitHub Java Corpus
Date Available
2017-01-10
Type
dataset
Data Creator
Allamanis, Miltiadis
Sutton, Charles
Publisher
University of Edinburgh: School of Informatics
Relation (Is Referenced By)
https://dl.acm.org/citation.cfm?id=2487127
Citation
Allamanis, Miltiadis; Sutton, Charles. (2017). GitHub Java Corpus, 2012 [dataset]. University of Edinburgh: School of Informatics. https://doi.org/10.7488/ds/1690.
Description
The GitHub Java Corpus is a snapshot, taken in October 2012, of all open-source Java code on GitHub belonging to projects that had at least one fork at that time. It contains code from 14,785 projects, amounting to about 352 million lines of code. The dataset has been used to study Java coding practice at a large scale.