Difference between revisions of "Acronym paper"

From BITPlan cr Wiki

Revision as of 18:29, 1 March 2023

Research questions

E.g. for Natural Language Processing for/using Knowledge Graphs (e.g. entity linking and resolution using target knowledge such as Wikidata and DBpedia, foundation models)

In the process of digitalizing scientific publishing, persistent identifiers (PIDs) have been introduced for quite a few entities such as papers (DOI), authors (ORCID) and organizations (ROR), but unfortunately not for scientific events and event series, where the most common disambiguating identifiers are still acronyms/short names such as ESWC 2023 or Semantics '23. Only very few instances have PIDs (DOI: 200) or pseudo-PIDs (Wikidata ID: 9000/1000).

We estimate that some 5000 (dblp) to 25000 and some 50000 (dblp) to 250000 events/event series that have public digital traces would still need PIDs.


  1. What do acronyms for scientific events and event series look like and how formally can they be described?
  2. How well do acronyms disambiguate scientific events and event series?
  3. How well is the acronym information curated in metadata sources for events and event series?
  4. How well are acronyms used in citations of scientific events and event series?
  5. Acronym checker - does the acronym fit the long version ...

Method

What do acronyms for scientific events and event series look like and how formally can they be described?

Results

What do acronyms look like

Length distribution

AcronymHistograms
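
A length distribution of this kind can be computed with a few lines of Python. This is only a minimal sketch: the function name `length_histogram` and the sample list are assumptions for illustration, and the real input would be the acronyms extracted from a source such as WikiCFP.

```python
from collections import Counter


def length_histogram(acronyms):
    """Count how many acronyms have each character length."""
    return Counter(len(acronym) for acronym in acronyms)


# Hypothetical sample; strings are taken from examples on this page.
sample = [
    "ISWC 2012",
    "ESWC 2023",
    "Semantics '23",
    "Knowledge Engineering Special Issue 2010",
]

hist = length_histogram(sample)

# Print a simple text histogram, one row per observed length.
for length in sorted(hist):
    print(f"{length:3d} {'#' * hist[length]}")
```

Unusually long entries show up as isolated bars at the right end of such a histogram, which is one way to spot the corner cases discussed further down.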

WikiCFP

Standard case

About 60% of all acronyms extracted from WikiCFP match the regular expression

[A-Z]+\s*[12][0-9]{3}

e.g. ISWC 2012

43990/73731 ( 59.7%)  matches for [A-Z]+\s*[12][0-9]{3}
654/43989 (  1.5%)  year different
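
Counts of this kind could be reproduced along the following lines. The sketch rests on several assumptions that the text does not confirm: that the pattern has to match the whole acronym (`fullmatch`), that "year different" means the year inside the acronym differs from the year recorded for the event, and that records come as `(acronym, event_year)` pairs. The sample records, including the mismatching one, are made up for illustration.

```python
import re

# Pattern from the text, with capture groups added so the year can be read out.
STANDARD = re.compile(r"([A-Z]+)\s*([12][0-9]{3})")


def classify(records):
    """Return (total, matches, year_different) for (acronym, event_year) pairs.

    Assumption: the whole acronym must match the pattern, and "year different"
    means the year in the acronym is not the year recorded for the event.
    """
    total = matches = year_different = 0
    for acronym, event_year in records:
        total += 1
        m = STANDARD.fullmatch(acronym)
        if m is None:
            continue
        matches += 1
        if event_year is not None and int(m.group(2)) != event_year:
            year_different += 1
    return total, matches, year_different


# Hypothetical records; the last one has a deliberately mismatching year.
sample = [
    ("ISWC 2012", 2012),
    ("ESWC2023", 2023),
    ("AOSD - Student Research Competition 2011", 2011),
    ("ICSE 2010", 2011),
]

total, matches, year_different = classify(sample)
print(f"{matches}/{total} ({matches / total:6.1%}) matches")
print(f"{year_different}/{matches} ({year_different / matches:6.1%}) year different")
```

If the original analysis searched for the pattern anywhere in the string instead of matching the whole acronym, `fullmatch` would have to be replaced by `search`, and the counts would differ.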

Corner cases

Long acronyms tend to indicate that the extraction has not worked or that there is some other issue with the acronym, such as a joint / co-located event.

SELECT acronym
FROM "event_wikicfp"
WHERE length(acronym) = 40

The acronym entries with a length of 40 are mostly not acronyms ...

...
Political Theology Agenda Symposium 2010
Knowledge Engineering Special Issue 2010
CFP MapReduce Special Issue of CCPE 2010
AOSD - Student Research Competition 2011
special session for Wireless VITAE 2011
Political Theology Agenda Symposium 2011
12th EANN / 7th AIAI Joint Congress 2011 
...
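
The length query above can be tried out from Python. The following is a minimal sketch that assumes the table lives in an SQLite database; the database engine actually used is not stated here. It builds a tiny in-memory stand-in for `event_wikicfp` with three rows taken from this page. The helper name `acronyms_of_length` is hypothetical.

```python
import sqlite3

# In-memory stand-in for the real database (assumption: SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_wikicfp (acronym TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO event_wikicfp VALUES (?, ?)",
    [
        ("ISWC 2012", None),
        ("Knowledge Engineering Special Issue 2010", None),
        ("12th EANN / 7th AIAI Joint Congress 2011", None),
    ],
)


def acronyms_of_length(conn, n):
    """Return all acronyms whose length is exactly n characters, sorted."""
    cur = conn.execute(
        'SELECT acronym FROM "event_wikicfp" '
        "WHERE length(acronym) = ? ORDER BY acronym",
        (n,),
    )
    return [row[0] for row in cur]


long_entries = acronyms_of_length(conn, 40)
for entry in long_entries:
    print(entry)
```

Passing the length as a bound parameter makes it easy to scan other lengths in a loop without rewriting the SQL.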

Exotic cases / Outliers

There is only one entry in WikiCFP whose extracted acronym is longer than 50 characters.

SELECT acronym, url
FROM "event_wikicfp"
WHERE length(acronym) > 50

call for chapters - images of female aggression 2016 http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=52302

This is not a call for papers for a scientific event at all.