Editor Extraction and Reconciliation

From BITPlan cr Wiki
Revision as of 10:13, 10 March 2023 by Wf

Editor Extraction

  • covered volumes 1-3354
    • optimized for volumes 600+
  • 11764 Editor records
  • for 228 volumes no editors could be extracted

Volume editor distribution.png

Reconciliation

dblp reconciliation

Volume Editors of CEUR-WS in dblp

PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT DISTINCT ?vol_number 
   (GROUP_CONCAT(DISTINCT ?name; separator="|") as ?names) 
   (GROUP_CONCAT(DISTINCT ?dblp_id; separator="|") as ?concat_dblp_id)
WHERE {
  ?volume dblp:publishedIn "CEUR Workshop Proceedings" ;
    dblp:publishedInSeries "CEUR Workshop Proceedings" ;
    dblp:publishedInSeriesVolume ?vol_number;
    dblp:hasSignature ?editors.
    ?editors dblp:signatureDblpName ?name ;
        dblp:signatureCreator ?dblp_id ;
        dblp:signatureOrdinal ?editor_ordinal ;
        dblp:signaturePublication ?dblp_publication_id ;
        a dblp:EditorSignature.
}
GROUP BY  ?vol_number
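
The rows returned by this query carry the editor names of a volume as one "|"-separated string. A minimal sketch of turning such rows into a volume-to-editors mapping is shown below. It assumes the result has already been fetched as a list of plain dictionaries keyed by the variable names of the query; the function name editors_by_volume is hypothetical. Since ?names and ?concat_dblp_id are concatenated independently, the sketch does not try to pair names with ids by position.

```python
from typing import Dict, List


def editors_by_volume(rows: List[Dict[str, str]]) -> Dict[int, List[str]]:
    """Map each volume number to its list of dblp editor names.

    Assumption: every row is a dict with the keys "vol_number" and "names",
    as selected by the query above.
    """
    result: Dict[int, List[str]] = {}
    for row in rows:
        vol = row.get("vol_number", "")
        if not vol.isdigit():
            # skip rows whose volume number is not a plain integer
            continue
        # GROUP_CONCAT joined the names with "|"; drop empty fragments
        names = [name for name in row.get("names", "").split("|") if name]
        result[int(vol)] = names
    return result
```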

Volume Editors of CEUR-WS in dblp with identifiers

PREFIX datacite: <http://purl.org/spar/datacite/>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX litre: <http://purl.org/spar/literal/>
SELECT DISTINCT 
	(group_concat(DISTINCT ?nameVar;separator='|') as ?name) 
	(group_concat(DISTINCT ?homepageVar;separator='|') as ?homepage)
	(group_concat(DISTINCT ?affiliationVar;separator='|') as ?affiliation)
	(group_concat(DISTINCT ?dblpVar;separator='|') as ?dblp)
	(group_concat(DISTINCT ?wikidataVar;separator='|') as ?wikidata)
	(group_concat(DISTINCT ?orcidVar;separator='|') as ?orcid)
	(group_concat(DISTINCT ?googleScholarVar;separator='|') as ?googleScholar)
	(group_concat(DISTINCT ?acmVar;separator='|') as ?acm)
	(group_concat(DISTINCT ?twitterVar;separator='|') as ?twitter)
	(group_concat(DISTINCT ?githubVar;separator='|') as ?github)
	(group_concat(DISTINCT ?viafVar;separator='|') as ?viaf)
	(group_concat(DISTINCT ?scigraphVar;separator='|') as ?scigraph)
	(group_concat(DISTINCT ?zbmathVar;separator='|') as ?zbmath)
	(group_concat(DISTINCT ?researchGateVar;separator='|') as ?researchGate)
	(group_concat(DISTINCT ?mathGenealogyVar;separator='|') as ?mathGenealogy)
	(group_concat(DISTINCT ?locVar;separator='|') as ?loc)
	(group_concat(DISTINCT ?linkedinVar;separator='|') as ?linkedin)
	(group_concat(DISTINCT ?lattesVar;separator='|') as ?lattes)
	(group_concat(DISTINCT ?isniVar;separator='|') as ?isni)
	(group_concat(DISTINCT ?ieeeVar;separator='|') as ?ieee)
	(group_concat(DISTINCT ?geprisVar;separator='|') as ?gepris)
	(group_concat(DISTINCT ?gndVar;separator='|') as ?gnd)
WHERE{
	?proceeding dblp:publishedIn "CEUR Workshop Proceedings";
		dblp:publishedInSeriesVolume ?volume;
		dblp:editedBy ?editor.
	?editor dblp:primaryCreatorName ?nameVar.
	OPTIONAL{?editor dblp:primaryHomepage ?homepageVar.}
	OPTIONAL{?editor dblp:primaryAffiliation ?affiliationVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?dblp_blank.
		?dblp_blank datacite:usesIdentifierScheme datacite:dblp;
		litre:hasLiteralValue ?dblpVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?wikidata_blank.
		?wikidata_blank datacite:usesIdentifierScheme datacite:wikidata;
		litre:hasLiteralValue ?wikidataVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?orcid_blank.
		?orcid_blank datacite:usesIdentifierScheme datacite:orcid;
		litre:hasLiteralValue ?orcidVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?googleScholar_blank.
		?googleScholar_blank datacite:usesIdentifierScheme datacite:google-scholar;
		litre:hasLiteralValue ?googleScholarVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?acm_blank.
		?acm_blank datacite:usesIdentifierScheme datacite:acm;
		litre:hasLiteralValue ?acmVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?twitter_blank.
		?twitter_blank datacite:usesIdentifierScheme datacite:twitter;
		litre:hasLiteralValue ?twitterVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?github_blank.
		?github_blank datacite:usesIdentifierScheme datacite:github;
		litre:hasLiteralValue ?githubVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?viaf_blank.
		?viaf_blank datacite:usesIdentifierScheme datacite:viaf;
		litre:hasLiteralValue ?viafVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?scigraph_blank.
		?scigraph_blank datacite:usesIdentifierScheme datacite:scigraph;
		litre:hasLiteralValue ?scigraphVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?zbmath_blank.
		?zbmath_blank datacite:usesIdentifierScheme datacite:zbmath;
		litre:hasLiteralValue ?zbmathVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?researchGate_blank.
		?researchGate_blank datacite:usesIdentifierScheme datacite:research-gate;
		litre:hasLiteralValue ?researchGateVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?mathGenealogy_blank.
		?mathGenealogy_blank datacite:usesIdentifierScheme datacite:math-genealogy;
		litre:hasLiteralValue ?mathGenealogyVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?loc_blank.
		?loc_blank datacite:usesIdentifierScheme datacite:loc;
		litre:hasLiteralValue ?locVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?linkedin_blank.
		?linkedin_blank datacite:usesIdentifierScheme datacite:linkedin;
		litre:hasLiteralValue ?linkedinVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?lattes_blank.
		?lattes_blank datacite:usesIdentifierScheme datacite:lattes;
		litre:hasLiteralValue ?lattesVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?isni_blank.
		?isni_blank datacite:usesIdentifierScheme datacite:isni;
		litre:hasLiteralValue ?isniVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?ieee_blank.
		?ieee_blank datacite:usesIdentifierScheme datacite:ieee;
		litre:hasLiteralValue ?ieeeVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?gepris_blank.
		?gepris_blank datacite:usesIdentifierScheme datacite:gepris;
		litre:hasLiteralValue ?geprisVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?gnd_blank.
		?gnd_blank datacite:usesIdentifierScheme datacite:gnd;
		litre:hasLiteralValue ?gndVar.}
}
GROUP BY ?editor

Comparing Extracted and dblp Editors

  • editor by volume comparison
    • for 2233 volumes the extracted editors match the dblp editors
    • 807 volumes are missing in dblp (editors were extracted)
    • for 27 volumes more editors were extracted than are listed in dblp
    • for 387 volumes dblp lists more editors than could be extracted
  • 9321 out of 11764 editor records can be reconciled
    • 79.23%
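
The per-volume comparison can be sketched as follows. This is an illustration under assumptions, not the implementation used for the numbers above: both sources are assumed to be dictionaries mapping a volume number to a list of editor names, volumes are compared by editor count only, and the function names and category labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def compare_volume(extracted: List[str], dblp: Optional[List[str]]) -> str:
    """Classify one volume by comparing editor counts (labels are hypothetical)."""
    if dblp is None:
        return "missing_in_dblp"
    if len(extracted) > len(dblp):
        return "more_extracted"
    if len(extracted) < len(dblp):
        return "more_in_dblp"
    return "match"


def compare_all(extracted: Dict[int, List[str]],
                dblp: Dict[int, List[str]]) -> Counter:
    """Count how many extracted volumes fall into each category."""
    return Counter(
        compare_volume(editors, dblp.get(vol))
        for vol, editors in extracted.items()
    )


def reconciled_share(reconciled: int, total: int) -> float:
    """Share of reconciled editor records in percent, rounded to two decimals."""
    return round(100 * reconciled / total, 2)
```

With the figures from the list above, reconciled_share(9321, 11764) gives 79.23.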

Wikidata Reconciliation

The identifiers queried from dblp are used to find the corresponding Wikidata entry.

Current strategy:

Input
List of the different identifiers that are known for an editor
Output
SPARQL query

Example:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?person ?personLabel
WHERE
{
  { ?person wdt:P856 <http://www.stefandecker.org>. } # homepage
  UNION
  { ?person wdt:P227 "173443443". } # gnd
  UNION
  { ?person wdt:P2456 "d/StefanDecker". } # dblp
  ?person rdfs:label ?personLabel. FILTER(lang(?personLabel)="en")
}

Depending on the available identifiers, the query is adjusted by adding one UNION branch per identifier. The branches are not wrapped in OPTIONAL: an OPTIONAL that matches nothing still yields an empty solution, which would then be joined with every English label in Wikidata.
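
A minimal sketch of such a query generator is given below. The function name build_wikidata_query and the dictionary keys are hypothetical. The properties P856, P227 and P2456 are taken from the example; P496 (ORCID iD) is added as an assumed further mapping. The sketch emits one plain UNION branch per known identifier, without an OPTIONAL wrapper, so that an identifier without a match contributes no rows.

```python
from typing import Dict

# Identifier kind -> Wikidata property.
# The first three come from the example query; "orcid" is an added assumption.
WIKIDATA_PROPERTIES: Dict[str, str] = {
    "homepage": "P856",
    "gnd": "P227",
    "dblp": "P2456",
    "orcid": "P496",
}


def build_wikidata_query(identifiers: Dict[str, str]) -> str:
    """Build a lookup query with one UNION branch per supported identifier."""
    branches = []
    for kind, value in identifiers.items():
        prop = WIKIDATA_PROPERTIES.get(kind)
        if prop is None or not value:
            continue  # unsupported or empty identifiers are ignored
        if kind == "homepage":
            obj = f"<{value}>"  # homepages are IRIs
        else:
            # escape backslashes and quotes for a SPARQL string literal
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        branches.append(f"  {{ ?person wdt:{prop} {obj}. }} # {kind}")
    if not branches:
        raise ValueError("no supported identifier given")
    union = "\n  UNION\n".join(branches)
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?person ?personLabel\n"
        "WHERE\n{\n"
        f"{union}\n"
        '  ?person rdfs:label ?personLabel. FILTER(lang(?personLabel)="en")\n'
        "}"
    )
```

Calling build_wikidata_query({"homepage": "http://www.stefandecker.org", "gnd": "173443443", "dblp": "d/StefanDecker"}) produces a query with the same three branches as the example.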

Running these queries for all 4942 editors known to dblp yields:

  • 1467 editors were found in Wikidata
  • 62 editor records in dblp have a conflict with Wikidata
  • 3413 dblp editor records were not found in Wikidata
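
The three outcomes can be illustrated with a small classification sketch. The source does not define what counts as a conflict; the sketch assumes, purely for illustration, that a conflict means the identifiers of one dblp editor record resolve to more than one distinct Wikidata item. The function name and the labels are hypothetical.

```python
from typing import Iterable


def classify_editor(wikidata_items: Iterable[str]) -> str:
    """Classify one dblp editor record by the Wikidata items its query returned.

    Assumption (not from the source): more than one distinct item is a conflict.
    """
    distinct = set(wikidata_items)
    if not distinct:
        return "not_found"
    if len(distinct) == 1:
        return "found"
    return "conflict"
```

The three counts reported above add up to the 4942 editors queried (1467 + 62 + 3413).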

The figure below shows the distribution of the available identifiers across the three categories: found, conflict, and not found.


Editors wikidata reconciliation.png