Editor Extraction and Reconciliation

From BITPlan cr Wiki

Latest revision as of 16:40, 11 March 2023

Editor Extraction

  • covered volumes 1-3354
    • optimized for volumes 600+
  • 11764 editor records
  • for 228 volumes no editors could be extracted

Figure: volume editor distribution

Reconciliation

dblp reconciliation

Volume Editors of CEUR-WS in dblp

PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT DISTINCT ?vol_number 
   (GROUP_CONCAT(DISTINCT ?name; separator="|") as ?names) 
   (GROUP_CONCAT(DISTINCT ?dblp_id; separator="|") as ?concat_dblp_id)
WHERE {
  ?volume dblp:publishedIn "CEUR Workshop Proceedings" ;
    dblp:publishedInSeries "CEUR Workshop Proceedings" ;
    dblp:publishedInSeriesVolume ?vol_number ;
    dblp:hasSignature ?editors .
  ?editors dblp:signatureDblpName ?name ;
    dblp:signatureCreator ?dblp_id ;
    dblp:signatureOrdinal ?editor_ordinal ;
    dblp:signaturePublication ?dblp_publication_id ;
    a dblp:EditorSignature .
}
GROUP BY  ?vol_number
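Both GROUP_CONCAT columns come back as "|"-separated strings. The following is a minimal Python sketch for indexing the results by volume. It assumes the rows have already been fetched as dictionaries of plain strings keyed by the query's variable names, and that every ?vol_number is numeric; the function names are hypothetical. Because ?names and ?concat_dblp_id are concatenated independently, their orders are not guaranteed to correspond, so the sketch keeps them as separate lists instead of zipping them.

```python
from __future__ import annotations


def split_concat(value: str | None, sep: str = "|") -> list[str]:
    """Split a GROUP_CONCAT result; an empty or missing value yields []."""
    return [part for part in (value or "").split(sep) if part]


def editors_by_volume(rows: list[dict[str, str]]) -> dict[int, dict[str, list[str]]]:
    """Index rows of the volume editor query by CEUR-WS volume number.

    Names and dblp ids come from two independent GROUP_CONCATs, so their
    order is not guaranteed to correspond; they are kept as separate lists.
    """
    index: dict[int, dict[str, list[str]]] = {}
    for row in rows:
        volume = int(row["vol_number"])  # assumes numeric volume numbers
        index[volume] = {
            "names": split_concat(row.get("names")),
            "dblp_ids": split_concat(row.get("concat_dblp_id")),
        }
    return index


# Hypothetical example row (not real dblp data).
rows = [{"vol_number": "42",
         "names": "Jane Doe|John Roe",
         "concat_dblp_id": "https://dblp.org/pid/00/1|https://dblp.org/pid/00/2"}]
print(editors_by_volume(rows)[42]["names"])  # ['Jane Doe', 'John Roe']
```

To pair each name with its id reliably, the query would have to concatenate name and id together per signature rather than in two separate aggregates.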

Volume Editors of CEUR-WS in dblp with identifiers

PREFIX datacite: <http://purl.org/spar/datacite/>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX litre: <http://purl.org/spar/literal/>
SELECT DISTINCT 
	(group_concat(DISTINCT ?nameVar;separator='|') as ?name) 
	(group_concat(DISTINCT ?homepageVar;separator='|') as ?homepage)
	(group_concat(DISTINCT ?affiliationVar;separator='|') as ?affiliation)
	(group_concat(DISTINCT ?dblpVar;separator='|') as ?dblp)
	(group_concat(DISTINCT ?wikidataVar;separator='|') as ?wikidata)
	(group_concat(DISTINCT ?orcidVar;separator='|') as ?orcid)
	(group_concat(DISTINCT ?googleScholarVar;separator='|') as ?googleScholar)
	(group_concat(DISTINCT ?acmVar;separator='|') as ?acm)
	(group_concat(DISTINCT ?twitterVar;separator='|') as ?twitter)
	(group_concat(DISTINCT ?githubVar;separator='|') as ?github)
	(group_concat(DISTINCT ?viafVar;separator='|') as ?viaf)
	(group_concat(DISTINCT ?scigraphVar;separator='|') as ?scigraph)
	(group_concat(DISTINCT ?zbmathVar;separator='|') as ?zbmath)
	(group_concat(DISTINCT ?researchGateVar;separator='|') as ?researchGate)
	(group_concat(DISTINCT ?mathGenealogyVar;separator='|') as ?mathGenealogy)
	(group_concat(DISTINCT ?locVar;separator='|') as ?loc)
	(group_concat(DISTINCT ?linkedinVar;separator='|') as ?linkedin)
	(group_concat(DISTINCT ?lattesVar;separator='|') as ?lattes)
	(group_concat(DISTINCT ?isniVar;separator='|') as ?isni)
	(group_concat(DISTINCT ?ieeeVar;separator='|') as ?ieee)
	(group_concat(DISTINCT ?geprisVar;separator='|') as ?gepris)
	(group_concat(DISTINCT ?gndVar;separator='|') as ?gnd)
WHERE{
	?proceeding dblp:publishedIn "CEUR Workshop Proceedings";
		dblp:publishedInSeriesVolume ?volume;
		dblp:editedBy ?editor.
	?editor dblp:primaryCreatorName ?nameVar.
	OPTIONAL{?editor dblp:primaryHomepage ?homepageVar.}
	OPTIONAL{?editor dblp:primaryAffiliation ?affiliationVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?dblp_blank.
		?dblp_blank datacite:usesIdentifierScheme datacite:dblp;
		litre:hasLiteralValue ?dblpVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?wikidata_blank.
		?wikidata_blank datacite:usesIdentifierScheme datacite:wikidata;
		litre:hasLiteralValue ?wikidataVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?orcid_blank.
		?orcid_blank datacite:usesIdentifierScheme datacite:orcid;
		litre:hasLiteralValue ?orcidVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?googleScholar_blank.
		?googleScholar_blank datacite:usesIdentifierScheme datacite:google-scholar;
		litre:hasLiteralValue ?googleScholarVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?acm_blank.
		?acm_blank datacite:usesIdentifierScheme datacite:acm;
		litre:hasLiteralValue ?acmVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?twitter_blank.
		?twitter_blank datacite:usesIdentifierScheme datacite:twitter;
		litre:hasLiteralValue ?twitterVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?github_blank.
		?github_blank datacite:usesIdentifierScheme datacite:github;
		litre:hasLiteralValue ?githubVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?viaf_blank.
		?viaf_blank datacite:usesIdentifierScheme datacite:viaf;
		litre:hasLiteralValue ?viafVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?scigraph_blank.
		?scigraph_blank datacite:usesIdentifierScheme datacite:scigraph;
		litre:hasLiteralValue ?scigraphVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?zbmath_blank.
		?zbmath_blank datacite:usesIdentifierScheme datacite:zbmath;
		litre:hasLiteralValue ?zbmathVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?researchGate_blank.
		?researchGate_blank datacite:usesIdentifierScheme datacite:research-gate;
		litre:hasLiteralValue ?researchGateVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?mathGenealogy_blank.
		?mathGenealogy_blank datacite:usesIdentifierScheme datacite:math-genealogy;
		litre:hasLiteralValue ?mathGenealogyVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?loc_blank.
		?loc_blank datacite:usesIdentifierScheme datacite:loc;
		litre:hasLiteralValue ?locVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?linkedin_blank.
		?linkedin_blank datacite:usesIdentifierScheme datacite:linkedin;
		litre:hasLiteralValue ?linkedinVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?lattes_blank.
		?lattes_blank datacite:usesIdentifierScheme datacite:lattes;
		litre:hasLiteralValue ?lattesVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?isni_blank.
		?isni_blank datacite:usesIdentifierScheme datacite:isni;
		litre:hasLiteralValue ?isniVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?ieee_blank.
		?ieee_blank datacite:usesIdentifierScheme datacite:ieee;
		litre:hasLiteralValue ?ieeeVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?gepris_blank.
		?gepris_blank datacite:usesIdentifierScheme datacite:gepris;
		litre:hasLiteralValue ?geprisVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?gnd_blank.
		?gnd_blank datacite:usesIdentifierScheme datacite:gnd;
		litre:hasLiteralValue ?gndVar.}
}
GROUP BY ?editor
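Each result row of this query has one column per identifier scheme, and a scheme without values comes back empty (or unbound, depending on the endpoint). The following minimal Python sketch turns such a row into a dictionary of identifier lists. It assumes rows are dictionaries keyed by the query's column names; only a subset of the columns is listed, and identifiers_from_row is a hypothetical helper.

```python
from __future__ import annotations

# Subset of the identifier columns returned by the query above;
# extend the list with the remaining columns as needed.
ID_COLUMNS = ["homepage", "dblp", "wikidata", "orcid", "googleScholar",
              "viaf", "isni", "gnd"]


def identifiers_from_row(row: dict[str, str | None], sep: str = "|") -> dict[str, list[str]]:
    """Map each non-empty identifier column to its list of values."""
    ids: dict[str, list[str]] = {}
    for column in ID_COLUMNS:
        # Missing, unbound (None) and empty columns all mean "no identifier".
        values = [v for v in (row.get(column) or "").split(sep) if v]
        if values:
            ids[column] = values
    return ids


# Hypothetical row (not real dblp data).
row = {"name": "Jane Doe", "dblp": "d/JaneDoe", "orcid": "", "gnd": None,
       "homepage": "https://example.org/~jane"}
print(identifiers_from_row(row))
# {'homepage': ['https://example.org/~jane'], 'dblp': ['d/JaneDoe']}
```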

Comparing Extracted and dblp Editors

  • editor comparison by volume
    • for 2233 volumes the extracted editors match the dblp editors
    • 807 volumes are missing in dblp (editors were extracted)
    • for 27 volumes more editors were extracted than dblp lists
    • for 387 volumes dblp lists more editors than we could extract
  • 9321 out of 11764 editor records (79.23%) can be reconciled
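The per-volume categories above can be reproduced with a comparison along the following lines. This is a minimal Python sketch, not the project's actual matching code. It assumes editors are compared by a normalized name key (accents, case and extra whitespace ignored, and a dblp four-digit homonym suffix such as "0002" stripped); the function names are hypothetical. Volumes where the counts agree but the names differ fall into a fifth bucket that the list above does not report.

```python
from __future__ import annotations

import re
import unicodedata


def name_key(name: str) -> str:
    """Crude matching key for an editor name (an assumption, not dblp's rule)."""
    name = re.sub(r"\s+\d{4}$", "", name.strip())  # drop a dblp homonym suffix like " 0002"
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(ascii_name.casefold().split())


def compare_volume(extracted: list[str], dblp: list[str] | None) -> str:
    """Classify one volume by comparing extracted editors with dblp editors."""
    if dblp is None:
        return "missing in dblp"
    ext = {name_key(n) for n in extracted}
    ref = {name_key(n) for n in dblp}
    if ext == ref:
        return "match"
    if len(ext) > len(ref):
        return "more extracted"
    if len(ref) > len(ext):
        return "more in dblp"
    return "same count, different names"


def reconciliation_rate(reconciled: int, total: int) -> str:
    """Format the share of reconciled editor records as a percentage."""
    return f"{reconciled / total:.2%}"


print(compare_volume(["José García", "Jane Doe"], ["Jose Garcia 0002", "jane  doe"]))  # match
print(reconciliation_rate(9321, 11764))  # 79.23%
```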

Wikidata Reconciliation

The identifiers queried from dblp are used to find the corresponding Wikidata entry for each editor.

Current strategy:

Input
List of the different identifiers known for an editor
Output
SPARQL query

Example:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?person ?personLabel
WHERE
{
  { ?person wdt:P856 <http://www.stefandecker.org>. } # homepage
  UNION
  { ?person wdt:P227 "173443443". } # gnd
  UNION
  { ?person wdt:P2456 "d/StefanDecker". } # dblp
  ?person rdfs:label ?personLabel. FILTER(lang(?personLabel)="en")
}

Depending on the available identifiers, the query is adjusted by adding one UNION branch per identifier. The branches are not wrapped in OPTIONAL: an OPTIONAL branch without a match leaves ?person unbound, and the label pattern would then match every item with an English label.
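The query construction described above can be sketched in Python as follows. This is a minimal sketch under several assumptions. The input is a dictionary from identifier scheme to values. Only a few schemes are mapped: P856, P227 and P2456 come from the example query, while P496 (ORCID iD), P214 (VIAF ID) and P213 (ISNI) are standard Wikidata properties added for illustration. Homepage values are used verbatim as IRIs, so they must match the stored URL exactly, including any trailing slash. Each identifier becomes a plain UNION branch without an OPTIONAL wrapper, because an unmatched OPTIONAL would leave ?person unbound. The function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical, partial mapping from identifier scheme to Wikidata property.
PROPERTY_BY_SCHEME = {
    "homepage": "P856",  # official website (value is an IRI)
    "gnd": "P227",       # GND ID
    "dblp": "P2456",     # DBLP author ID
    "orcid": "P496",     # ORCID iD
    "viaf": "P214",      # VIAF ID
    "isni": "P213",      # ISNI
}

PREFIXES = (
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def sparql_literal(value: str) -> str:
    """Quote a value as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_reconciliation_query(identifiers: dict[str, list[str]]) -> str | None:
    """Build one UNION branch per known identifier; None if nothing is mappable."""
    branches = []
    for scheme, values in identifiers.items():
        prop = PROPERTY_BY_SCHEME.get(scheme)
        if prop is None:
            continue  # no Wikidata property known for this scheme
        for value in values:
            obj = f"<{value}>" if scheme == "homepage" else sparql_literal(value)
            branches.append(f"  {{ ?person wdt:{prop} {obj} . }} # {scheme}")
    if not branches:
        return None
    return (
        PREFIXES
        + "SELECT DISTINCT ?person ?personLabel\nWHERE\n{\n"
        + "\n  UNION\n".join(branches)
        + '\n  ?person rdfs:label ?personLabel. FILTER(lang(?personLabel)="en")\n}\n'
    )


print(build_reconciliation_query({
    "homepage": ["http://www.stefandecker.org"],
    "gnd": ["173443443"],
    "dblp": ["d/StefanDecker"],
}))
```

Called with the three identifiers of the example, the function reproduces the structure of the example query, one branch per identifier.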

Running these queries for all 4942 editors known to dblp, we get:

  • 1467 editors were found in wikidata
  • 62 editor records in dblp have a conflict with wikidata
  • 3413 dblp editor records were not found in wikidata
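How a conflict is detected is not stated here. Assuming that a conflict means the identifiers of one dblp editor resolve to more than one Qid, the three categories can be tallied as in this minimal Python sketch. The input format (dblp editor id mapped to the set of Qids returned by its query) and the function names are hypothetical. The three counts partition the editors, which is consistent with the totals above (1467 + 62 + 3413 = 4942).

```python
from __future__ import annotations

from collections import Counter


def classify(qids: set[str]) -> str:
    """Assumed rule: no Qid -> unknown, one Qid -> identified, several -> conflict."""
    if not qids:
        return "unknown"
    return "identified" if len(qids) == 1 else "conflict"


def tally(results: dict[str, set[str]]) -> Counter:
    """Count the categories over {dblp editor id: Qids returned by its query}."""
    return Counter(classify(qids) for qids in results.values())


# Hypothetical query results (not real data).
results = {
    "editorA": {"Q1"},
    "editorB": set(),
    "editorC": {"Q2", "Q3"},
    "editorD": {"Q4"},
}
counts = tally(results)
print(counts["identified"], counts["conflict"], counts["unknown"])  # 2 1 1
assert sum(counts.values()) == len(results)  # categories partition the editors
```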

The figure below shows the distribution of the available identifiers for each of the three categories: identified, conflict, and unknown.


Figure: editors Wikidata reconciliation

dblp in-sync with wikidata?

Looking up Wikidata with all identifiers we get from dblp yields 1467 Qids (+62 with conflicts). We compare this with the Wikidata ids that dblp already records, retrieved with the query below:

PREFIX datacite: <http://purl.org/spar/datacite/>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX litre: <http://purl.org/spar/literal/>
SELECT DISTINCT 
	(group_concat(DISTINCT ?nameVar;separator='|') as ?name) 
	(group_concat(DISTINCT ?wikidataVar;separator='|') as ?wikidata)
WHERE{
	?proceeding dblp:publishedIn "CEUR Workshop Proceedings";
		dblp:publishedInSeriesVolume ?volume;
		dblp:editedBy ?editor.
	?editor dblp:primaryCreatorName ?nameVar.
	?editor datacite:hasIdentifier ?wikidata_blank.
	?wikidata_blank datacite:usesIdentifierScheme datacite:wikidata;
					litre:hasLiteralValue ?wikidataVar.
}
GROUP BY ?editor

The query yields 1390 editors with Wikidata ids (1446 with conflicts). Thus, the Wikidata lookups contributed only 77 additional Qids (1467 - 1390), which suggests that dblp already synchronizes its editor records with the identifiers available in Wikidata.
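Subtracting the totals (1467 - 1390 = 77) only compares counts. A set-based comparison also shows which editors dblp links that the lookup missed, and where the two Qids disagree. The Python sketch below assumes both sources are available as a mapping from dblp editor id to a single Qid, with conflicting records excluded. The function name and the sample data are hypothetical.

```python
from __future__ import annotations


def compare_qids(found: dict[str, str], in_dblp: dict[str, str]) -> dict[str, set[str]]:
    """Compare Qids found via identifier lookup with the Qids dblp already stores.

    Both arguments map a dblp editor id to one Qid.
    """
    found_ids, dblp_ids = set(found), set(in_dblp)
    return {
        "only_lookup": found_ids - dblp_ids,  # editors whose Qid dblp does not record yet
        "only_dblp": dblp_ids - found_ids,    # editors dblp links but the lookup did not find
        "disagree": {e for e in found_ids & dblp_ids if found[e] != in_dblp[e]},
    }


# Hypothetical sample (not real data).
found = {"editorA": "Q1", "editorB": "Q2", "editorC": "Q3"}
in_dblp = {"editorA": "Q1", "editorB": "Q9", "editorD": "Q4"}
diff = compare_qids(found, in_dblp)
print(sorted(diff["only_lookup"]), sorted(diff["only_dblp"]), sorted(diff["disagree"]))
# ['editorC'] ['editorD'] ['editorB']
```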