Difference between revisions of "Task2"
Latest revision as of 08:57, 21 March 2023
Task
Task | |
---|---|
id | 2 |
title | Extracting contextual information from the PDF full text of the papers |
objective | Task 2 was designed to test the ability to extract data from the full text of the papers. It follows last year’s Task 2, which focused on extracting information about citations. The rationale was that the network of citations of a paper – including papers citing it or cited by that paper – is an important dimension for assessing its relevance and contextualising it within a research area. This year we included further contextual information. Scientific papers are not isolated units. Factors that directly or indirectly contribute to the origin and development of a paper include citations, the institutions the authors are affiliated with, funding agencies, and the venue where a paper was presented. Participants had to make such information explicit and exploit it to answer queries providing a deeper understanding of the context in which papers were written. The dataset’s format is another difference from 2014. Instead of XML sources, we used PDF this year, taken from CEUR-WS.org. PDF is still the predominant format for publishing scientific papers, despite being designed for printing. The internal structure of a PDF paper does not correspond to the logical structure of its content, but rather to a sequence of layout and formatting commands. The challenge for participants was to recover the logical structure, to extract contextual information, and to represent it as semantic assertions. |
since | 2015-08-25 |
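The last step named in the objective, representing extracted context as semantic assertions, can be illustrated with a minimal sketch. It assumes the contextual information has already been recovered from the PDF; the `PaperContext` record, the `ex:` predicate names, and the identifiers are hypothetical and are not the vocabulary prescribed by the task.

```python
from dataclasses import dataclass, field


@dataclass
class PaperContext:
    """Contextual information recovered from one paper (hypothetical schema)."""
    paper_id: str
    affiliations: list[str] = field(default_factory=list)
    funding_agencies: list[str] = field(default_factory=list)
    cited_works: list[str] = field(default_factory=list)


def to_assertions(ctx: PaperContext) -> list[tuple[str, str, str]]:
    """Flatten the record into (subject, predicate, object) assertions."""
    triples = []
    for org in ctx.affiliations:
        triples.append((ctx.paper_id, "ex:authorAffiliation", org))
    for agency in ctx.funding_agencies:
        triples.append((ctx.paper_id, "ex:fundedBy", agency))
    for work in ctx.cited_works:
        triples.append((ctx.paper_id, "ex:cites", work))
    return triples


# Made-up example record; all identifiers are placeholders.
example = PaperContext(
    paper_id="ex:paper-1",
    affiliations=["ex:university-a"],
    funding_agencies=["ex:agency-b"],
    cited_works=["ex:work-c", "ex:work-d"],
)
assertions = to_assertions(example)
```

Keeping each fact as a separate triple means affiliation, funding, and citation queries can all be answered over the same flat structure.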
Queries
Query | Title | Description | Try it! | WDQS Url | Relevance |
---|---|---|---|---|---|
Q2.1 | Affiliations of Authors (Paper X) | Identify the affiliations of the authors of paper X | | | 3 |
Q2.2 | Papers from country at Workshop | Identify the papers presented at workshop X and written by researchers affiliated to an organisation located in country Y | | | 3 |
Q2.3 | Works cited by paper | Identify all works cited by paper X | G7PGHO | 6UBC | 2 |
Q2.4 | Works cited by paper after year | Identify all works cited by paper X and published after year Y | Q32kCB | 6UBF | 3 |
Q2.5 | Journal papers cited by a paper | Identify all journal papers cited by paper X | | | 4 |
Q2.6 | Grants for research in paper | Identify the grant(s) that supported the research presented in paper X (or part of it) | | | 5 |
Q2.7 | Funding agency for research of paper | Identify the funding agencies that funded the research presented in paper X (or part of it) | | | 5 |
Q2.8 | EU projects supporting research in paper | Identify the EU project(s) that supported the research presented in paper X (or part of it) | | | 5 |
Q2.9 | Ontologies mentioned in abstract of paper | Identify ontologies mentioned in the abstract of paper X | | | 5 |
Q2.10 | Ontologies introduced in paper abstract | Identify ontologies introduced in paper X (according to the abstract) | | | 5 |
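As an illustration of how a query such as Q2.4 might be answered once citations and publication years are available as assertions, here is a small sketch over an in-memory list of triples. The `ex:cites` and `ex:publicationYear` predicates and all identifiers are made up for this example; the task's actual queries were expected to run against the participants' own RDF output, not against this structure.

```python
Triple = tuple[str, str, str]

# Made-up assertions; real data would come from the extraction step.
TRIPLES: list[Triple] = [
    ("ex:paper-x", "ex:cites", "ex:work-a"),
    ("ex:paper-x", "ex:cites", "ex:work-b"),
    ("ex:paper-y", "ex:cites", "ex:work-c"),
    ("ex:work-a", "ex:publicationYear", "2009"),
    ("ex:work-b", "ex:publicationYear", "2013"),
    ("ex:work-c", "ex:publicationYear", "2014"),
]


def works_cited(triples: list[Triple], paper: str) -> list[str]:
    """Q2.3-style lookup: all works cited by the given paper."""
    return sorted(o for s, p, o in triples if s == paper and p == "ex:cites")


def works_cited_after(triples: list[Triple], paper: str, year: int) -> list[str]:
    """Q2.4-style lookup: cited works published strictly after `year`."""
    years = {s: int(o) for s, p, o in triples if p == "ex:publicationYear"}
    # Works without a known publication year are left out of the result.
    return [w for w in works_cited(triples, paper)
            if w in years and years[w] > year]
```

Calling `works_cited_after(TRIPLES, "ex:paper-x", 2010)` keeps only the cited work whose recorded year is later than 2010, which mirrors the "cited by paper X and published after year Y" wording of Q2.4.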