Task2
Task
Task | |
---|---|
id | 2 |
title | Extracting contextual information from the PDF full text of the papers |
objective | Task 2 was designed to test the ability to extract data from the full text of the papers. It follows last year’s Task 2, which focused on extracting information about citations. The rationale was that the citation network of a paper (the papers citing it and the papers it cites) is an important dimension for assessing its relevance and for contextualising it within a research area. This year we included further contextual information. Scientific papers are not isolated units. Factors that directly or indirectly contribute to the origin and development of a paper include citations, the institutions the authors are affiliated with, funding agencies, and the venue where the paper was presented. Participants had to make such information explicit and exploit it to answer queries that provide a deeper understanding of the context in which papers were written. The dataset’s format is another difference from 2014. Instead of XML sources, this year we used PDF files taken from CEUR-WS.org. PDF is still the predominant format for publishing scientific papers, despite being designed for printing. The internal structure of a PDF paper does not correspond to the logical structure of its content, but rather to a sequence of layout and formatting commands. The challenge for participants was to recover the logical structure, to extract contextual information, and to represent it as semantic assertions. |
since | 2015-08-25 |
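
To illustrate what "representing contextual information as semantic assertions" could look like, here is a minimal sketch in Python. It assumes the extraction step has already happened and simply serialises the result as N-Triples lines. The `http://example.org/` namespace, the predicate names (`affiliation`, `fundedBy`, `cites`) and the `PaperContext` record are hypothetical and chosen for illustration only; they are not the vocabulary or data model prescribed by the task.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical namespace, for illustration only (not the task's vocabulary).
EX = "http://example.org/"


@dataclass
class PaperContext:
    """Contextual information assumed to be already extracted from one PDF."""
    paper_id: str
    affiliations: List[str] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)
    cited: List[str] = field(default_factory=list)


def literal(text: str) -> str:
    """Return text as a quoted N-Triples string literal with basic escaping."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(ctx: PaperContext) -> List[str]:
    """Emit one N-Triples line per extracted fact about the paper."""
    subject = f"<{EX}paper/{ctx.paper_id}>"
    # For simplicity every value is kept as a plain string literal;
    # a fuller model would mint URIs for institutions, agencies and cited works.
    groups = (
        ("affiliation", ctx.affiliations),
        ("fundedBy", ctx.funders),
        ("cites", ctx.cited),
    )
    lines = []
    for predicate, values in groups:
        for value in values:
            lines.append(f"{subject} <{EX}{predicate}> {literal(value)} .")
    return lines


if __name__ == "__main__":
    # Made-up example record, not taken from the dataset.
    ctx = PaperContext(
        paper_id="p1",
        affiliations=["Example University"],
        funders=["Example Agency"],
    )
    print("\n".join(to_ntriples(ctx)))
```

Once loaded into a triple store, assertions of this kind can be queried, which is how the extracted context would be exploited to answer questions about a paper.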