For the definition of acronym, see Acronym.

Problem Statement

This is an example of Natural Language Processing for/using Knowledge Graphs (e.g. entity linking and resolution against target knowledge such as Wikidata and DBpedia, foundation models).

Use case: look up an event by its acronym, e.g.:

ESWC -> https://www.openresearch.org/wiki/ESWC
ESWC -> https://scholia.toolforge.org/event-series/Q17012957

In the course of digitalizing scientific publishing, PIDs have been introduced for quite a few entity types such as papers (DOI), authors (ORCID) and organizations (ROR), but unfortunately not for scientific events and event series. For these, the most common disambiguating identifier is still the acronym/short name, such as ESWC 2023 / Semantics '23. Only very few instances have PIDs (DOI: 200) or pseudo-PIDs (Wikidata ID: 9000/1000).

We estimate that some 5,000 (dblp) to 25,000 and some 50,000 (dblp) to 250,000 events/event series that leave public digital traces during their lifecycle (such as homepages, entries in public CFPs and library indices for their proceedings) would still need PIDs.

For creating PIDs and entering the metadata into public KGs such as Wikidata, acronyms look like a promising tool for disambiguation (as has been shown by the work of Simon Cobb using OpenRefine ...).

Given a piece of natural language text (and its context) and a semi-structured corpus of digital traces of scientific communication assembled from different sources, we'd like to perform a two-step process:

Determine whether the text (character string) is an acronym for one or more events or event series (likelihood)
Map the acronym to the knowledge graph of proceedings/events/event series
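The two steps can be illustrated with a minimal Python sketch. It assumes a toy in-memory corpus (`CORPUS`, holding only the two ESWC links from the use case) in place of the real knowledge graph; the regular expression, the capital-letter scoring heuristic and the function names `acronym_likelihood` and `map_acronym` are hypothetical and not the project's actual method.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy corpus: acronym -> candidate (label, URL) entries.
# It stands in for the real KG and only holds the ESWC use-case links.
CORPUS: Dict[str, List[Tuple[str, str]]] = {
    "ESWC": [
        ("ESWC (OPENRESEARCH)", "https://www.openresearch.org/wiki/ESWC"),
        ("ESWC (Scholia)", "https://scholia.toolforge.org/event-series/Q17012957"),
    ],
}

# Hypothetical pattern: a capitalized token of 2-10 characters,
# optionally followed by a 4-digit year or a 2-digit year like '23.
ACRONYM_PATTERN = re.compile(r"^([A-Z][A-Za-z0-9&-]{1,9})(?:\s*'?(\d{2}|\d{4}))?$")


def acronym_likelihood(text: str) -> float:
    """Step 1: crude heuristic score in [0, 1] that text is an event acronym."""
    match = ACRONYM_PATTERN.match(text.strip())
    if not match:
        return 0.0
    core = match.group(1)
    # Known acronyms get full confidence; otherwise use the share of capitals.
    if core.upper() in CORPUS:
        return 1.0
    return sum(c.isupper() for c in core) / len(core)


def map_acronym(text: str) -> List[Tuple[str, str]]:
    """Step 2: map the acronym to all candidate entries of the toy KG."""
    match = ACRONYM_PATTERN.match(text.strip())
    if not match:
        return []
    return CORPUS.get(match.group(1).upper(), [])
```

For example, `map_acronym("ESWC 2023")` returns both ESWC entries, while lowercase text such as "the" scores 0.0 in step 1. A production version would likely replace the heuristic with corpus statistics or a trained model.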

Common Sense Assumptions / Situation

(assuming there is such a common sense ...)

An acronym such as ISWC identifies one or more scientific event series (Semantic Web / Wearable Computing).
Typically, an acronym/year combination is used to identify the installments of such an event series, e.g. ISWC 2022 / ESWC 2022 / Semantics 2023.
These acronyms are used to reference such events throughout their whole lifecycle:

announcements are made via e.g. http://iswc.2022.org
CFPs use e.g. ISWC 2023 / ISWC in the title/metadata of the CFP
indexing uses the acronyms, e.g. in dblp / TIBKat / Wikidata
citations use the acronyms, e.g. "Proceedings ISWC 2022, pp. 153-159 ..."
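Since all these lifecycle artifacts reuse the same acronym/year combination, a simple extractor is a natural first tool. The following Python sketch assumes that references take the shape `<Acronym> <year>` or `<Acronym> '<yy>`; the pattern and the function name `extract_acronym_year` are hypothetical, and real CFP titles and citations will contain many variants it misses.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for references such as "ISWC 2022", "ESWC2023"
# or "Semantics '23": a capitalized word followed by a 4- or 2-digit year.
ACRONYM_YEAR = re.compile(r"\b([A-Z][A-Za-z]{1,14})\s*'?(\d{4}|\d{2})\b")


def extract_acronym_year(text: str) -> Optional[Tuple[str, int]]:
    """Return the first (acronym, year) pair found in text, or None."""
    match = ACRONYM_YEAR.search(text)
    if match is None:
        return None
    acronym, year = match.group(1), match.group(2)
    # Assumption: two-digit years refer to the 2000s.
    full_year = int(year) if len(year) == 4 else 2000 + int(year)
    return acronym, full_year
```

Applied to the citation example, `extract_acronym_year("Proceedings ISWC 2022, pp. 153-159")` yields `("ISWC", 2022)`, because "Proceedings" is not directly followed by a year.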

PIDs are not yet common.

Ideal Idea of digitization of this realm

In an ideal world there would be a KG that represents the entities Proceedings, Event and EventSeries and that mostly allows interlinking them by acronyms, with some exceptions where acronyms are ambiguous and disambiguation via other metadata is necessary. Ideally, a PID is available for each entity type so that no disambiguation is needed.
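A minimal Python sketch of such a KG, assuming plain in-memory dataclasses rather than a real triple store: the class names follow the entity types above, while `AcronymIndex`, its methods and the optional `pid` fields are hypothetical. The index deliberately returns all candidates, so an ambiguous acronym surfaces as more than one hit.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventSeries:
    acronym: str
    title: str
    pid: Optional[str] = None  # hypothetical, e.g. a Wikidata Q-id if one exists


@dataclass
class Event:
    series: EventSeries
    year: int
    pid: Optional[str] = None


@dataclass
class Proceedings:
    event: Event
    title: str
    pid: Optional[str] = None  # hypothetical, e.g. a DOI


@dataclass
class AcronymIndex:
    """Toy KG index: upper-cased acronym -> all event series using it."""
    by_acronym: Dict[str, List[EventSeries]] = field(
        default_factory=lambda: defaultdict(list))

    def add(self, series: EventSeries) -> None:
        self.by_acronym[series.acronym.upper()].append(series)

    def lookup(self, acronym: str) -> List[EventSeries]:
        # .get() does not create empty entries in the defaultdict.
        return list(self.by_acronym.get(acronym.upper(), []))

    def is_ambiguous(self, acronym: str) -> bool:
        # More than one candidate: other metadata is needed to disambiguate.
        return len(self.lookup(acronym)) > 1
```

Registering both ISWC series (Semantic Web and Wearable Computing) makes `is_ambiguous("ISWC")` return `True`, signalling that other metadata such as titles or years is needed.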

Approach

Resource-bounded data cleaning, disambiguation and knowledge graph extraction.

The goal is the best "overall" effort/result ratio. Note that the effort spent on optimizing this ratio is itself part of the effort.

Assumption: separating the input data into standard cases, corner cases and exotic cases according to a Zipfian / long-tail / Pareto distribution simplifies the necessary formalization, avoids accidental complexity and leads to a better effort/result ratio.
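One way such a separation could be driven is by frequency. The Python sketch below abstracts acronyms to character-class shapes and labels each shape by the cumulative share of occurrences covered before it; the shape encoding, the function names `shape` and `partition_by_frequency` and the 80 %/95 % cut-offs are illustrative assumptions, not values derived from the data.

```python
from collections import Counter
from typing import Dict, Iterable


def shape(acronym: str) -> str:
    """Abstract an acronym to a shape: upper -> 'A', lower -> 'a', digit -> '9'."""
    return "".join(
        "A" if c.isupper() else "a" if c.islower() else "9" if c.isdigit() else c
        for c in acronym)


def partition_by_frequency(patterns: Iterable[str],
                           standard_share: float = 0.8,
                           corner_share: float = 0.95) -> Dict[str, str]:
    """Label each distinct pattern as 'standard', 'corner' or 'exotic'.

    Patterns are visited from most to least frequent. A pattern is labeled
    by the share of all occurrences covered *before* it, so the most common
    pattern is always 'standard'. The cut-offs are assumptions.
    """
    counts = Counter(patterns)
    total = sum(counts.values())
    labels: Dict[str, str] = {}
    cumulative = 0
    for pattern, count in counts.most_common():
        share_before = cumulative / total
        if share_before < standard_share:
            labels[pattern] = "standard"
        elif share_before < corner_share:
            labels[pattern] = "corner"
        else:
            labels[pattern] = "exotic"
        cumulative += count
    return labels
```

For instance, `shape("K-CAP")` gives `"A-AAA"`; feeding the shapes of a large acronym list into `partition_by_frequency` would put all-caps shapes like `"AAAA"` into the standard bucket if they dominate, as the Zipfian assumption suggests.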

Research questions

What do acronyms for scientific events and event series look like, and how formally can they be described?
How well do acronyms disambiguate scientific events and event series?
How well is the acronym information curated in metadata sources for events and event series?
How well are acronyms used in citations of scientific events and event series?
Acronym checker: does the acronym fit the long form ...?
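For the acronym checker, one simple heuristic is to test whether the acronym's letters appear, in order, among the word initials of the long form, and otherwise among all of its letters. The Python sketch below assumes exactly this heuristic; the function names and the returned labels are hypothetical.

```python
import re


def _is_subsequence(needle: str, haystack: str) -> bool:
    """True if all characters of needle occur in haystack in the same order."""
    it = iter(haystack)
    # `ch in it` consumes the iterator up to the match, enforcing order.
    return all(ch in it for ch in needle)


def acronym_fit(acronym: str, long_name: str) -> str:
    """Classify how well an acronym fits a long name (heuristic sketch)."""
    letters = re.sub(r"[^a-z]", "", acronym.lower())  # drop '-', digits, spaces
    words = re.findall(r"[A-Za-z]+", long_name)
    initials = "".join(w[0].lower() for w in words)
    if not letters:
        return "no fit"
    if letters == initials:
        return "exact initials"
    if _is_subsequence(letters, initials):
        return "initials (some words skipped)"
    # Fallback: letters anywhere, but the first letter must start the name.
    if initials.startswith(letters[0]) and _is_subsequence(
            letters, "".join(words).lower()):
        return "letters"
    return "no fit"
```

With this heuristic, ISWC fits "International Symposium on Wearable Computers" by skipping the initial of "on", while K-CAP only fits "Knowledge Capture Conference" at the letter level.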

Method

See Approach ...

Acronym paper


Problem Statement

Common Sense Assumptions / Situation

Ideal Idea of digitization of this realm

Approach

Research questions

Method

What do acronyms for scientific events and event series look like, and how formally can they be described?

Results

What do acronyms look like

Length distribution

WikiCFP

Standard case

Corner cases

Exotic cases / Outliers