Difference between revisions of "Task2"

From BITPlan cr Wiki
(Created page with "=Task= {{Task |id=2 |title=Extracting contextual information from the PDF full text of the papers |storemode=property }} =Freitext=")
 
Line 4: Line 4:
 
|id=2
 
|id=2
 
|title=Extracting contextual information from the PDF full text of the papers
 
|title=Extracting contextual information from the PDF full text of the papers
 +
|objective=Task 2 was designed to test the ability to extract data from the full text of the
 +
papers. It follows last year’s Task 2, which focused on extracting information
 +
about citations. The rationale was that the network of citations of a paper –
 +
including papers citing it or cited by that paper – is an important dimension to
 +
assess its relevance and to contextualise it within a research area.
 +
This year we included further contextual information. Scientific papers are
 +
not isolated units. Factors that directly or indirectly contribute to the origin
 +
and development of a paper include citations, the institutions the authors are
 +
affiliated with, funding agencies, and the venue where a paper was presented.
 +
Participants had to make such information explicit and exploit it to answer queries
 +
providing a deeper understanding of the context in which papers were written.
 +
The dataset’s format is another difference from 2014. Instead of XML sources,
 +
we used PDF this year, taken from CEUR-WS.org. PDF is still the predominant
 +
format for publishing scientific papers, despite being designed for printing. The
 +
internal structure of a PDF paper does not correspond to the logical structure
 +
of its content, but rather to a sequence of layout and formatting commands.
 +
The challenge for participants was to recover the logical structure, to extract
 +
contextual information, and to represent it as semantic assertions.
 +
|since=2015-08-25
 
|storemode=property
 
|storemode=property
 
}}
 
}}
 
=Freitext=
 
=Freitext=

Revision as of 06:47, 21 March 2023

Task

id  2
title  Extracting contextual information from the PDF full text of the papers
objective  Task 2 was designed to test the ability to extract data from the full text of the papers. It follows last year’s Task 2, which focused on extracting information about citations. The rationale was that the network of citations of a paper – including papers citing it or cited by that paper – is an important dimension to assess its relevance and to contextualise it within a research area. This year we included further contextual information. Scientific papers are not isolated units. Factors that directly or indirectly contribute to the origin and development of a paper include citations, the institutions the authors are affiliated with, funding agencies, and the venue where a paper was presented. Participants had to make such information explicit and exploit it to answer queries providing a deeper understanding of the context in which papers were written. The dataset’s format is another difference from 2014. Instead of XML sources, we used PDF this year, taken from CEUR-WS.org. PDF is still the predominant format for publishing scientific papers, despite being designed for printing. The internal structure of a PDF paper does not correspond to the logical structure of its content, but rather to a sequence of layout and formatting commands. The challenge for participants was to recover the logical structure, to extract contextual information, and to represent it as semantic assertions.

since  2015-08-25

Freitext
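
The objective above names three steps: recover the logical structure, extract contextual information, and represent it as semantic assertions. The following is only a minimal sketch of that idea, not the method used by any participant. It assumes the text lines have already been pulled out of the PDF by some external tool, and it uses simple keyword heuristics. The namespace `http://example.org/`, the property names `affiliation` and `fundedBy`, and all function names are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical namespace; the vocabulary actually used in the task is not given here.
EX = "http://example.org/"

# Assumed keyword heuristics; real papers need far more robust rules.
AFFILIATION_HINT = re.compile(r"\b(University|Institute|Laboratory|Department)\b")
FUNDING_HINT = re.compile(
    r"(?:funded|supported) by (?:the )?(.+?)(?:[.,;]|$)", re.IGNORECASE
)


def dehyphenate(lines):
    """Rejoin words split across line ends, e.g. 'Par-' + 'ticipants'."""
    out = []
    for line in lines:
        line = line.strip()
        if out and out[-1].endswith("-") and line[:1].islower():
            out[-1] = out[-1][:-1] + line
        else:
            out.append(line)
    return out


def literal(text):
    """Quote a string as a plain literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def extract_assertions(paper_id, lines):
    """Return (subject, predicate, object) triples for one paper."""
    subject = f"<{EX}paper/{paper_id}>"
    triples = []
    for line in dehyphenate(lines):
        if AFFILIATION_HINT.search(line):
            triples.append((subject, f"<{EX}affiliation>", literal(line)))
        match = FUNDING_HINT.search(line)
        if match:
            funder = match.group(1).strip()
            triples.append((subject, f"<{EX}fundedBy>", literal(funder)))
    return triples


def to_ntriples(triples):
    """Serialise triples one per line in an N-Triples-like form."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Made-up sample lines standing in for text extracted from a PDF.
    sample = [
        "A Sample Paper",
        "Jane Doe",
        "University of Example, Germany",
        "This work was sup-",
        "ported by the Example Science Foundation.",
    ]
    print(to_ntriples(extract_assertions("demo", sample)))
```

The sketch treats each line independently, which is exactly where PDF input is hard: line order and line breaks follow the page layout rather than the logical structure, so the dehyphenation step and the keyword rules are only rough stand-ins for real structure recovery.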