Task2

From BITPlan cr Wiki

Revision as of 06:47, 21 March 2023

Task

id  2
title  Extracting contextual information from the PDF full text of the papers
objective  Task 2 was designed to test the ability to extract data from the full text of the papers. It follows last year’s Task 2, which focused on extracting information about citations. The rationale was that the network of citations of a paper (the papers citing it and the papers it cites) is an important dimension for assessing its relevance and contextualising it within a research area. This year we included further contextual information. Scientific papers are not isolated units. Factors that directly or indirectly contribute to the origin and development of a paper include citations, the institutions the authors are affiliated with, funding agencies, and the venue where the paper was presented. Participants had to make such information explicit and exploit it to answer queries that provide a deeper understanding of the context in which papers were written. The dataset’s format is another difference from 2014: instead of XML sources, this year we used PDF files taken from CEUR-WS.org. PDF is still the predominant format for publishing scientific papers, even though it was designed for printing. The internal structure of a PDF paper does not correspond to the logical structure of its content but rather to a sequence of layout and formatting commands. The challenge for participants was to recover the logical structure, extract the contextual information, and represent it as semantic assertions.

since  2015-08-25

Free text
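The objective ends with representing the extracted contextual information "as semantic assertions". The following is a minimal sketch of that last step only. It assumes the hard part (recovering the logical structure of the PDF and extracting affiliations, funders, venue and citations) has already been done. It shows one possible way to encode the result as RDF-style triples serialised in N-Triples syntax, and how such assertions can then be queried. The namespace `http://example.org/`, the predicate names (`authorAffiliation`, `fundedBy`, `presentedAt`, `cites`), the `PaperContext` class and the sample data are all hypothetical. They are not the vocabulary or data format prescribed by the task.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical namespace and predicate names, for illustration only;
# they are NOT the vocabulary prescribed by the task.
EX = "http://example.org/"

Triple = Tuple[str, str, str]


@dataclass
class PaperContext:
    """Contextual information assumed to be already extracted from one PDF."""
    paper_id: str
    institutions: List[str] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)
    venue: Optional[str] = None
    cites: List[str] = field(default_factory=list)


def iri(local: str) -> str:
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{EX}{local}>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (escape backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_triples(ctx: PaperContext) -> List[Triple]:
    """Turn one paper's context into (subject, predicate, object) assertions."""
    s = iri(ctx.paper_id)
    triples: List[Triple] = []
    for inst in ctx.institutions:
        triples.append((s, iri("authorAffiliation"), literal(inst)))
    for funder in ctx.funders:
        triples.append((s, iri("fundedBy"), literal(funder)))
    if ctx.venue is not None:
        triples.append((s, iri("presentedAt"), literal(ctx.venue)))
    for cited in ctx.cites:
        # Citations link two papers, so the object is an IRI, not a literal.
        triples.append((s, iri("cites"), iri(cited)))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise triples one per line in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def subjects_with(triples: List[Triple], predicate: str, obj: str) -> List[str]:
    """A toy query: every subject that has the given predicate/object pair."""
    return sorted({s for s, p, o in triples if p == predicate and o == obj})


if __name__ == "__main__":
    papers = [
        PaperContext("paper1", institutions=["University A"],
                     funders=["Agency X"], venue="Workshop W",
                     cites=["paper2"]),
        PaperContext("paper2", institutions=["University B"],
                     funders=["Agency X"]),
    ]
    all_triples = [t for p in papers for t in to_triples(p)]
    print(to_ntriples(all_triples))
    # Which papers acknowledge "Agency X" as a funder?
    print(subjects_with(all_triples, iri("fundedBy"), literal("Agency X")))
```

Citations are emitted as links between paper IRIs rather than as strings, so the citation network can be traversed in the same way as the other context. In practice a real RDF library and an established bibliographic vocabulary would replace the hand-rolled serialisation and query shown here.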