Annotated Reference Corpus of Scottish Gaelic (ARCOSG)
Date Available
2016-05-25
Type
dataset
Data Creator
Lamb, William
Arbuthnot, Sharon
Naismith, Susanna
Danso, Samuel
Publisher
University of Edinburgh. School of Literatures, Languages and Cultures. Celtic and Scottish Studies
Relation (Is Referenced By)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.672.813&rep=rep1&type=pdf#page=11
Citation
Lamb, William; Arbuthnot, Sharon; Naismith, Susanna; Danso, Samuel. (2016). Annotated Reference Corpus of Scottish Gaelic (ARCOSG), 1997-2016 [dataset]. University of Edinburgh. School of Literatures, Languages and Cultures. Celtic and Scottish Studies. https://doi.org/10.7488/ds/1411.
Description
A representative, tagged corpus of Scottish Gaelic, divided into 8 registers (4 spoken, 4 written) of approximately 10k words each. The corpus is presented as individual txt files. It was hand-tagged by Lamb, Arbuthnot and Naismith and separately verified by them. It uses the Brown-format tag separator ('/', e.g. 'agus/Cc') and an annotation scheme derived from the Irish PAROLE tagset (see Uí Dhonnchadha, E. and van Genabith, J. 2006. A Part-of-Speech tagger for Irish using finite state morphology and constraint grammar disambiguation. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), 2241-2244). The annotation scheme is described in a PDF included with the data: Lamb, W. and Naismith, S. (2014) Scottish Gaelic Part-of-Speech Annotation Guidelines. This work was funded by Bòrd na Gàidhlig and the Carnegie Trust for the Universities of Scotland.
The following licence files are associated with this item: