The Institute for Language, Cognition and Computation (ILCC) is dedicated to the pursuit of basic and applied research on computational approaches to language, communication, and cognition. Primary research areas include natural language processing and computational linguistics; spoken language processing; information extraction, retrieval and presentation; dialogue and multimodal interaction; computational theories of human cognition; and educational and assistive technology.

Recent Submissions

  • multimodal TRIPOD 

    Papalampidi, P; Keller, F; Lapata, M
    The data contain multimodal features extracted for the TRIPOD dataset and used in the AAAI 2021 paper "Movie Summarization via Sparse Graph Construction". The data contain 122 pickle files, each one corresponding to a movie ...
  • Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020 

    Havens, L; Alex, B; Bach, B; Terras, M; Renton, S; Hosker, R; Centre for Research Collections, The
    The dataset includes metadata descriptions extracted from the Centre for Research Collections' online archival catalog using OAI-PMH EAD harvesting. Metadata descriptions were extracted from four metadata fields: an ...
  • ManySStuBs4J Dataset 

    Karampatsis, Rafael-Michael
    The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are two variations of the dataset. One mined from the 100 Java Maven Projects and one mined from the top ...
  • WikiCatSum 

    Perez-Beltrachini, Laura; Liu, Yang; Lapata, Mirella
    WikiCatSum is a domain-specific Multi-Document Summarisation (MDS) dataset. It assumes the summarisation task of generating Wikipedia lead sections for Wikipedia entities of a certain domain (e.g. Companies) from the set ...
  • SUPERSEDED - ManySStuBs4J Dataset 

    Karampatsis, Rafael-Michael
    ## This item has been replaced by the one which can be found at https://doi.org/10.7488/ds/2628 ## The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are ...
  • Hiberlink project data 

    Tobin, Richard; Grover, Claire; Zhou, Ke
    Summary files (in XML format) listing URIs referenced in papers from arXiv, Elsevier, and PMC respectively (approximately 1 million URIs from 3 million papers in total). The focus of the Hiberlink project was to assess the ...
  • Visual and Linguistic Treebank 

    Elliott, Desmond; Keller, Frank (2014-09-04)
    The Visual and Linguistic Treebank is a data set of images annotated with human-written descriptions, object boundaries, and Visual Dependency Representations. The images are freely available from the Action Recognition ...