    • Visual and Linguistic Treebank

      Elliott, Desmond; Keller, Frank (2014-09-04)
      The Visual and Linguistic Treebank is a data set of images annotated with human-written descriptions, object boundaries, and Visual Dependency Representations. The images are freely available from the Action Recognition ...
    • Hiberlink project data 

      Tobin, Richard; Grover, Claire; Zhou, Ke
      Summary files (in XML format), one per source (arXiv, Elsevier, and PMC), listing URIs referenced in papers from that source (approximately 1 million URIs from 3 million papers in total). The focus of the Hiberlink project was to assess the ...
    • SUPERSEDED - ManySStuBs4J Dataset 

      Karampatsis, Rafael-Michael
      This item has been replaced by the version at https://doi.org/10.7488/ds/2628. The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are ...
    • WikiCatSum 

      Perez-Beltrachini, Laura; Liu, Yang; Lapata, Mirella
      WikiCatSum is a domain-specific Multi-Document Summarisation (MDS) dataset. It targets the summarisation task of generating Wikipedia lead sections for Wikipedia entities of a certain domain (e.g. Companies) from the set ...
    • ManySStuBs4J Dataset 

      Karampatsis, Rafael-Michael
      The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are two variations of the dataset: one mined from the 100 Java Maven Projects and one mined from the top ...
    • Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020 

      Havens, L; Alex, B; Bach, B; Terras, M; Renton, S; Hosker, R; Centre for Research Collections, The
      The dataset includes metadata descriptions extracted from the Centre for Research Collections' online archival catalog using OAI-PMH EAD harvesting. Metadata descriptions were extracted from four metadata fields: an ...
    • multimodal TRIPOD 

      Papalampidi, P; Keller, F; Lapata, M
      The data contain multimodal features extracted for the TRIPOD dataset and used in the AAAI 2021 paper "Movie Summarization via Sparse Graph Construction". The data contain 122 pickle files, each one corresponding to a movie ...
    • Fill In The World interaction data 

      Mikucionis, Vidminas; Robertson, Judy
      This dataset contains server logs with user interaction data from user studies with the "Fill In The World" language learning game. "Fill In The World" is available at http://fitw.azurewebsites.net/
    • XWikis Corpus 

      Perez-Beltrachini, Laura; Lapata, Mirella
      The XWikis Corpus (Perez-Beltrachini and Lapata, 2021) provides datasets with different language pairs and directions for cross-lingual abstractive document summarisation. The current version includes four languages: ...