Snapquery EKAW 2024 paper

Key Points

Query rot versus link rot
Transparency vs. complexity of SPARQL queries
Use cases for named queries
Ambiguity of names
Persistent identifiers
1. Query hashes and short_urls
How to deal with aspects that do not (usually) influence the execution of a SPARQL query, like whitespace, comments, capitalization and variable names?
Parameterized queries
1. https://web.archive.org/web/20150512231123/http://answers.semanticweb.com:80/questions/12147/whats-the-best-way-to-parameterize-sparql-queries
2. https://jena.apache.org/documentation/query/parameterized-sparql-strings.html
3. Scholia Jinja templates
Technical debt and accidental complexity
Wikidata example queries
Scholia and Wikidata graph split
Other knowledge graphs, e.g., DBLP, OpenStreetMap
Perhaps also some NFDI examples or some custom knowledge graphs like FAIRJupyter
An Ngoc Lam et al.'s ESWC 2023 paper as a pointer to the style of comparison
Quality criteria https://github.com/WolfgangFahl/snapquery/issues/26
List of standard refactoring activities and the support by this approach
SPARQL standard changes
Getting your own copy of Wikidata; the infrastructure effort needs to be mentioned
W3C test suite: why did we not use it as an example?
Usability evaluation https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
Natural Language input
1. Automatic syntax repairs
2. Automatic conversion from SQL input to SPARQL output
A closed issue should have at least one example that runs
Hypothesis by Stefan Decker: Query rot is more prominent in KG environments than with relational databases.
https://stackoverflow.com/questions/tagged/sparql
https://www.semantic-web-journal.net/system/files/swj3076.pdf
https://arxiv.org/pdf/cs/0605124
https://arxiv.org/pdf/1402.0576 optimizing queries
https://www.w3.org/TR/REC-rdf-syntax/
https://biblio.ugent.be/publication/8632551/file/8653456 Towards supporting multiple semantics of named graphs using N3 rules
ESWC 2019 proceedings (978-3-030-21348-0.pdf)
Linked Data Fragments https://linkeddatafragments.org/, e.g. https://ldfclient.wmflabs.org/ (currently returns a 404 error)
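
The key point on query hashes asks how to treat whitespace and comments, which usually do not influence execution. A minimal sketch of one possible answer is to normalize the query text before hashing it. The function names `normalize_sparql` and `query_hash` and the choice of SHA-256 are assumptions for illustration, not the snapquery implementation. The scanner is heuristic: it leaves string literals and IRIs untouched, but it does not parse SPARQL, and it deliberately leaves capitalization and variable names alone, since rewriting those safely needs a real parser.

```python
import hashlib


def normalize_sparql(query: str) -> str:
    """Strip '#' comments and collapse whitespace outside literals and IRIs.

    Heuristic sketch: no real SPARQL parsing, keyword case and variable
    names are left as they are.
    """
    out: list[str] = []
    quote = None    # delimiter of the string literal we are inside, if any
    in_iri = False  # True between '<' and '>' of an IRI
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if quote is not None:
            # Inside a string literal: copy verbatim, honour backslash escapes.
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                i += 1
                out.append(query[i])
            elif ch == quote:
                quote = None
        elif in_iri and not ch.isspace():
            # Inside an IRI: '#' is a fragment separator, not a comment.
            out.append(ch)
            if ch == ">":
                in_iri = False
        elif ch == "#":
            # Comment: skip to the end of the line.
            while i < n and query[i] != "\n":
                i += 1
            continue
        elif ch.isspace():
            # IRIs cannot contain whitespace, so '<' was a comparison operator.
            in_iri = False
            if out and out[-1] != " ":
                out.append(" ")
        else:
            if ch in "\"'":
                quote = ch
            elif ch == "<":
                in_iri = True
            out.append(ch)
        i += 1
    return "".join(out).strip()


def query_hash(query: str) -> str:
    """SHA-256 hex digest of the normalized query text."""
    return hashlib.sha256(normalize_sparql(query).encode("utf-8")).hexdigest()
```

With this sketch, two queries that differ only in layout or comments receive the same hash, while any change to a literal, an IRI, or a variable name produces a different one.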

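The parameterized-queries point can be illustrated with a small sketch in the spirit of the Jinja-style templates mentioned above. It uses only the standard library instead of Jinja itself. The placeholder name `q`, the helper `render_query`, and the restriction to Wikidata item identifiers are assumptions made for this example. Validating the value before substitution is one simple way to keep a parameter from injecting arbitrary SPARQL.

```python
import re

# Jinja-like placeholder such as {{ q }} (hypothetical convention for this sketch)
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
# Wikidata item identifiers: Q followed by a positive integer
QID = re.compile(r"Q[1-9][0-9]*")


def render_query(template: str, params: dict[str, str]) -> str:
    """Fill {{ name }} placeholders with validated Wikidata item ids."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing parameter: {name}")
        value = params[name]
        if not QID.fullmatch(value):
            raise ValueError(f"not a Wikidata item id: {value!r}")
        return value

    return PLACEHOLDER.sub(substitute, template)


# Example template: works whose author (P50) is the given item.
TEMPLATE = """SELECT ?work WHERE {
  ?work wdt:P50 wd:{{ q }} .
}"""

print(render_query(TEMPLATE, {"q": "Q80"}))
```

The named query stays fixed and only the parameter changes, so the template can be stored and identified once while each call supplies its own item.
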
Related Work

Test suites

W3C SPARQL 1.1 Test Suite

Official test suite developed by the W3C SPARQL Working Group
Designed to test conformance to the SPARQL 1.1 specification
Covers a wide range of SPARQL features and edge cases
Primarily focused on correctness rather than performance

See https://wikitech.wikimedia.org/wiki/User:AndreaWest/WDQS_Testing/Running_TFT

Benchmarks

An Ngoc Lam et al.'s ESWC 2023 paper "Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata"

Evaluates performance of 5 RDF triplestores and 1 experimental SPARQL engine
Uses complete version of Wikidata knowledge graph
Compares importing time, loading time, exporting time, and query performance
Evaluates 328 queries defined by Wikidata users
Also uses SP2Bench synthetic benchmark for comparison
Provides detailed analysis of query execution plans and profiling information
Offers insights on triplestore performance with large-scale real-world data

Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL by Aidan Hogan et al., 2020

DOI: 10.5281/zenodo.4035223

Focuses on evaluating performance of graph pattern matching in SPARQL engines
Uses a subset of Wikidata as the dataset
Provides a large set of SPARQL basic graph patterns
Designed to test the benefits of worst-case optimal join algorithms
Exhibits a variety of increasingly complex join patterns
Allows for systematic testing of query optimization techniques
Offers insights into the performance characteristics of different SPARQL engines on complex graph patterns

References

Wolfgang Fahl; Tim Holzheim; Christoph Lange; Stefan Decker. (2023) "Semantification of CEUR-WS with Wikidata as a Target Knowledge Graph". url: https://ceur-ws.org/Vol-3447/Text2KG_Paper_13.pdf
Christoph Lange; Angelo Di Iorio. (2014) "Semantic Publishing Challenge – Assessing the Quality of Scientific Output" - 61-76 pages. doi: 10.1007/978-3-319-12024-9_8
Paul Warren; Paul Mulholland. (2020) "A Comparison of the Cognitive Difficulties Posed by SPARQL Query Constructs" - 3-19 pages. doi: 10.1007/978-3-030-61244-3_1 at: EKAW 2020
Paul Warren; Paul Mulholland. (2018) "Using SPARQL – The Practitioners’ Viewpoint" - 485-500 pages. doi: 10.1007/978-3-030-03667-6_31
Muhammad Saleem; Muhammad Intizar Ali; Aidan Hogan; Qaiser Mehmood; Axel-Cyrille Ngonga Ngomo. (2015) "LSQ: The Linked SPARQL Queries Dataset" - 261-269 pages. doi: 10.1007/978-3-319-25010-6_15
Johannes Lorey; Felix Naumann. (2013) "Detecting SPARQL Query Templates for Data Prefetching" - 124-139 pages. doi: 10.1007/978-3-642-38288-8_9
Angela Bonifati; Wim Martens; Thomas Timm. (2020) "An analytical study of large SPARQL query logs" - 655-679 pages. doi: 10.1007/s00778-019-00558-9
