
Tim Holzheim, Wolfgang Fahl


id  2
title  Extracting contextual information from the PDF full text of the papers
objective  Task 2 was designed to test the ability to extract data from the full text of the papers. It follows last year's Task 2, which focused on extracting information about citations. The rationale was that the network of citations of a paper (including papers citing it or cited by that paper) is an important dimension to assess its relevance and to contextualise it within a research area. This year we included further contextual information. Scientific papers are not isolated units. Factors that directly or indirectly contribute to the origin and development of a paper include citations, the institutions the authors are affiliated with, funding agencies, and the venue where a paper was presented. Participants had to make such information explicit and exploit it to answer queries that provide a deeper understanding of the context in which papers were written. The dataset's format is another difference from 2014. Instead of XML sources, we used PDF this year, taken from CEUR-WS.org. PDF is still the predominant format for publishing scientific papers, despite being designed for printing. The internal structure of a PDF paper does not correspond to the logical structure of its content, but rather to a sequence of layout and formatting commands. The challenge for participants was to recover the logical structure, to extract contextual information, and to represent it as semantic assertions.

since  2015-08-25




| Query | Title | Description | Try it! | WDQS Url | Relevance |
|---|---|---|---|---|---|
| Q2.1 | Affiliations of Authors (Paper X) | Identify the affiliations of the authors of paper X | | | 3 |
| Q2.2 | Papers from country at Workshop | Identify the papers presented at workshop X and written by researchers affiliated to an organisation located in country Y | | | 3 |
| Q2.3 | Works cited by paper | Identify all works cited by paper X | G7PGHO | 6UBC | 2 |
| Q2.4 | Works cited by paper after year | Identify all works cited by paper X and published after year Y | Q32kCB | 6UBF | 3 |
| Q2.5 | Journal papers cited by a paper | Identify all journal papers cited by paper X | | | 4 |
| Q2.6 | Grants for research in paper | Identify the grant(s) that supported the research presented in paper X (or part of it) | | | 5 |
| Q2.7 | Funding agency for research of paper | Identify the funding agencies that funded the research presented in paper X (or part of it) | | | 5 |
| Q2.8 | EU projects supporting research in paper | Identify the EU project(s) that supported the research presented in paper X (or part of it) | | | 5 |
| Q2.9 | Ontologies mentioned in abstract of paper | Identify ontologies mentioned in the abstract of paper X | | | 5 |
| Q2.10 | Ontologies introduced in paper abstract | Identify ontologies introduced in paper X (according to the abstract) | | | 5 |
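
As an illustration of how queries Q2.3 and Q2.4 might be phrased against Wikidata, the following is a minimal sketch and not the query behind the "Try it!" or WDQS links. It assumes that paper X is identified by a Wikidata item id, that citations are recorded with property P2860 (cites work), and that publication dates use P577 (publication date). The function name `cited_works_query` and the item id used in the usage note are hypothetical placeholders.

```python
from typing import Optional


def cited_works_query(paper_qid: str, after_year: Optional[int] = None) -> str:
    """Build a SPARQL query string for Q2.3, or for Q2.4 when after_year is set.

    Assumptions (not taken from the task description): the paper is a
    Wikidata item, citations use wdt:P2860 and publication dates wdt:P577.
    """
    # Accept only ids of the form Q<digits> so nothing else is spliced into the query.
    if not (paper_qid.startswith("Q") and paper_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {paper_qid!r}")

    lines = [
        "PREFIX wd: <http://www.wikidata.org/entity/>",
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        "SELECT ?work ?date WHERE {",
        f"  wd:{paper_qid} wdt:P2860 ?work .",  # P2860 = cites work
    ]
    if after_year is None:
        # Q2.3: list every cited work; the date is optional extra information.
        lines.append("  OPTIONAL { ?work wdt:P577 ?date . }")
    else:
        # Q2.4: the date is required so that the year filter can apply.
        lines.append("  ?work wdt:P577 ?date .")  # P577 = publication date
        lines.append(f"  FILTER(YEAR(?date) > {int(after_year)})")
    lines.append("}")
    return "\n".join(lines)
```

Calling `cited_works_query("Q12345")` yields the Q2.3 form, and `cited_works_query("Q12345", after_year=2010)` yields the Q2.4 form. The resulting string can be pasted into a SPARQL endpoint. Q12345 stands in for the item of paper X.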

