The Institute for Language, Cognition and Computation (ILCC) is dedicated to the pursuit of basic and applied research on computational approaches to language, communication, and cognition. Primary research areas include: Natural language processing and computational linguistics; Spoken language processing; Information extraction, retrieval and presentation; Dialogue and multimodal interaction; Computational theories of human cognition; Educational and assistive technology.

Recent Submissions

  • XWikis Corpus 

    Perez-Beltrachini, Laura; Lapata, Mirella
    The XWikis Corpus (Perez-Beltrachini and Lapata, 2021) provides datasets with different language pairs and directions for cross-lingual abstractive document summarisation. The current version includes four languages: ...
  • Fill In The World interaction data 

    Mikucionis, Vidminas; Robertson, Judy
    This dataset contains server logs with user interaction data from user studies with the "Fill In The World" language learning game. "Fill In The World" is available at
  • multimodal TRIPOD 

    Papalampidi, P; Keller, F; Lapata, M
    The data contain multimodal features extracted for the TRIPOD dataset and used in the AAAI 2021 paper "Movie Summarization via Sparse Graph Construction". The data contain 122 pickle files, each one corresponding to a movie ...
  • Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020 

    Havens, L; Alex, B; Bach, B; Terras, M; Renton, S; Hosker, R; Centre for Research Collections, The
    The dataset includes metadata descriptions extracted from the Centre for Research Collections' online archival catalog using OAI-PMH EAD harvesting. Metadata descriptions were extracted from four metadata fields: an ...
  • ManySStuBs4J Dataset 

    Karampatsis, Rafael-Michael
    The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are two variations of the dataset: one mined from the 100 Java Maven Projects and one mined from the top ...
  • WikiCatSum 

    Perez-Beltrachini, Laura; Liu, Yang; Lapata, Mirella
    WikiCatSum is a domain-specific Multi-Document Summarisation (MDS) dataset. It assumes the summarisation task of generating Wikipedia lead sections for Wikipedia entities of a certain domain (e.g. Companies) from the set ...
  • SUPERSEDED - ManySStuBs4J Dataset 

    Karampatsis, Rafael-Michael
    ## This item has been replaced by the one which can be found at ## The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are ...
  • Hiberlink project data 

    Tobin, Richard; Grover, Claire; Zhou, Ke
    Summary files (in XML format) listing URIs referenced in papers from arXiv, Elsevier, and PMC respectively (approximately 1 million URIs from 3 million papers in total). The focus of the Hiberlink project was to assess the ...
  • Visual and Linguistic Treebank 

    Elliott, Desmond; Keller, Frank (2014-09-04)
    The Visual and Linguistic Treebank is a data set of images annotated with human-written descriptions, object boundaries, and Visual Dependency Representations. The images are freely available from the Action Recognition ...