Snapquery EKAW 2024 paper
Revision as of 13:21, 9 July 2024 by Wf
Key Points
- Query rot versus link rot
- Transparency vs. complexity of SPARQL queries
- Use cases for named queries
- Ambiguity of names
- Persistent identifiers
- Query hashes and short_urls
- How to deal with aspects that do not (usually) influence the execution of a SPARQL query, like whitespace, comments, capitalization and variable names?
- Parameterized queries
- Technical debt and accidental complexity
- Wikidata example queries
- Scholia and Wikidata graph split
- Other knowledge graphs, e.g., DBLP, OpenStreetMap
- Perhaps also some NFDI examples or some custom knowledge graphs like FAIRJupyter
- An Ngoc Lam et al.'s ESWC 2023 paper as a pointer to the style of comparison
- Quality criteria https://github.com/WolfgangFahl/snapquery/issues/26
- List of standard refactoring activities and the support by this approach
- SPARQL standard changes
- Getting your own copy of Wikidata; the infrastructure effort needs to be mentioned
- W3C test set: why did we not use that as an example?
- Usability evaluation https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
- https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
- Natural Language input
- Automatic syntax repairs
- Automatic conversion of SQL input to SPARQL output
- A closed issue should have at least one example that runs
- Hypothesis by Stefan Decker: Query rot is more prominent in KG environments than with relational databases.
- https://stackoverflow.com/questions/tagged/sparql
- https://www.semantic-web-journal.net/system/files/swj3076.pdf
- https://arxiv.org/pdf/cs/0605124
- https://arxiv.org/pdf/1402.0576 optimizing queries
- https://www.w3.org/TR/REC-rdf-syntax/
- https://biblio.ugent.be/publication/8632551/file/8653456 Towards supporting multiple semantics of named graphs using N3 rules
- Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL by Aidan Hogan et al., 2020 (DOI: 10.5281/zenodo.4035223)
- ESWC 2019 proceedings (978-3-030-21348-0.pdf)
- Linked Data Fragments https://linkeddatafragments.org/, e.g. https://ldfclient.wmflabs.org/ (returned a 404 error when checked)
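The key point on query hashes asks how to treat aspects that usually do not influence execution, such as whitespace and comments. The following is a minimal sketch of one possible answer, not the snapquery implementation: the function names `normalize_sparql` and `query_hash` are hypothetical, and the choice of SHA-256 is an assumption. It deliberately removes only full-line comments, because a `#` can also occur inside IRIs and string literals. Capitalization and variable names are left untouched.

```python
import hashlib
import re


def normalize_sparql(query: str) -> str:
    """Naive normalization (hypothetical helper): drop full-line comments
    and collapse every run of whitespace into a single space.

    Limitations: trailing comments are kept, because '#' may be part of an
    IRI or a string literal; whitespace inside string literals is collapsed
    as well, which a real implementation would need to avoid.
    """
    kept = []
    for line in query.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # a line consisting only of a comment
        kept.append(stripped)
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def query_hash(query: str) -> str:
    """SHA-256 hex digest of the normalized query text (assumed scheme)."""
    normalized = normalize_sparql(query)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two spellings of the same query that differ only in layout and comments.
q1 = "SELECT ?item WHERE {\n  # all cats\n  ?item wdt:P31 wd:Q146 .\n}"
q2 = "SELECT ?item   WHERE { ?item wdt:P31 wd:Q146 . }"
same = query_hash(q1) == query_hash(q2)
```

With this scheme, `q1` and `q2` receive the same hash, while renaming `?item` to `?cat` would still produce a different one. Covering variable names or keyword capitalization would require parsing the query rather than editing its text.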
Related Work
Benchmarks
An Ngoc Lam et al.'s ESWC 2023 paper "Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata"
- Evaluates performance of 5 RDF triplestores and 1 experimental SPARQL engine
- Uses complete version of Wikidata knowledge graph
- Compares importing time, loading time, exporting time, and query performance
- Evaluates 328 queries defined by Wikidata users
- Also uses SP2Bench synthetic benchmark for comparison
- Provides detailed analysis of query execution plans and profiling information
- Offers insights on triplestore performance with large-scale real-world data
Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL by Aidan Hogan et al., 2020
DOI: 10.5281/zenodo.4035223
- Focuses on evaluating performance of graph pattern matching in SPARQL engines
- Uses a subset of Wikidata as the dataset
- Provides a large set of SPARQL basic graph patterns
- Designed to test the benefits of worst-case optimal join algorithms
- Exhibits a variety of increasingly complex join patterns
- Allows for systematic testing of query optimization techniques
- Offers insights into the performance characteristics of different SPARQL engines on complex graph patterns
References
- Paul Warren; Paul Mulholland (2018). "Using SPARQL – The Practitioners’ Viewpoint", pp. 485-500. doi: 10.1007/978-3-030-03667-6_31
- Johannes Lorey; Felix Naumann (2013). "Detecting SPARQL Query Templates for Data Prefetching", pp. 124-139. doi: 10.1007/978-3-642-38288-8_9
- Angela Bonifati; Wim Martens; Thomas Timm (2020). "An analytical study of large SPARQL query logs", pp. 655-679. doi: 10.1007/s00778-019-00558-9