Institute for Language, Cognition and Computation (ILCC)
The Institute for Language, Cognition and Computation (ILCC) is dedicated to the pursuit of basic and applied research on computational approaches to language, communication, and cognition. Primary research areas include: Natural language processing and computational linguistics; Spoken language processing; Information extraction, retrieval and presentation; Dialogue and multimodal interaction; Computational theories of human cognition; Educational and assistive technology.
Collections in this community
- Automatic Language Generation and Summarisation
  Datasets for Automatic Language Generation and Summarisation tasks.
- Bias Detection
  Datasets for Bias Detection.
- Hiberlink
  Hiberlink: Time Travel for the Scholarly Web
- ManySStuBs4J
  Java code from a wide variety of projects
- The Visual and Linguistic Treebank
  The Visual and Linguistic Treebank is a data set of images annotated with human-written descriptions, object boundaries, and Visual Dependency Representations.
Recent Submissions
- SUPERSEDED - The Edinburgh International Accents of English Corpus
  ## This item has been replaced by the one which can be found at https://datashare.ed.ac.uk/handle/10283/4836 - https://doi.org/10.7488/ds/3832 ## English is the most widely spoken language in the world, used daily by ...
- XWikis Corpus
  The XWikis Corpus (Perez-Beltrachini and Lapata, 2021) provides datasets with different language pairs and directions for cross-lingual abstractive document summarisation. This current version includes four languages: ...
- Fill In The World interaction data
  This dataset contains server logs with user interaction data from user studies with the "Fill In The World" language learning game. "Fill In The World" is available at http://fitw.azurewebsites.net/
- multimodal TRIPOD
  The data contain multimodal features extracted for the TRIPOD dataset and used in the AAAI 2021 paper "Movie Summarization via Sparse Graph Construction". The data contain 122 pickle files, each one corresponding to a movie ...
- Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020
  The dataset includes metadata descriptions extracted from the Centre for Research Collections' online archival catalog using OAI-PMH EAD harvesting. Metadata descriptions were extracted from four metadata fields: an ...
- ManySStuBs4J Dataset
  The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are two variations of the dataset: one mined from the 100 Java Maven Projects and one mined from the top ...
- WikiCatSum
  WikiCatSum is a domain-specific Multi-Document Summarisation (MDS) dataset. It assumes the summarisation task of generating Wikipedia lead sections for Wikipedia entities of a certain domain (e.g. Companies) from the set ...
- SUPERSEDED - ManySStuBs4J Dataset
  ## This item has been replaced by the one which can be found at https://doi.org/10.7488/ds/2628 ## The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are ...
- Hiberlink project data
  Summary files (in XML format) listing URIs referenced in papers from arXiv, Elsevier, and PMC respectively (approximately 1 million URIs from 3 million papers in total). The focus of the Hiberlink project was to assess the ...
- Visual and Linguistic Treebank
  (2014-09-04) The Visual and Linguistic Treebank is a data set of images annotated with human-written descriptions, object boundaries, and Visual Dependency Representations. The images are freely available from the Action Recognition ...