Snapquery EKAW 2024 paper submission: Difference between revisions
Revision as of 13:29, 9 July 2024
Introduction
- ★★★★★ Query rot versus link rot
- ★★★★☆ Transparency vs. complexity of SPARQL queries
- ★★★★☆ Use cases for named queries
- ★★★★☆ Persistent identifiers
- ★★★☆☆ Query hashes and short_urls
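The query-hash idea in the list above can be illustrated with a minimal sketch. It assumes a SHA-256 digest over the UTF-8 query text and a hypothetical short identifier made of the first 8 hex characters; the function names and the length are illustrative assumptions, not snapquery's actual scheme.

```python
import hashlib


def query_hash(query: str) -> str:
    """Return the hex SHA-256 digest of the query text (UTF-8 encoded)."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def short_id(query: str, length: int = 8) -> str:
    """Return the first `length` hex characters of the digest.

    Hypothetical short identifier: shorter ids are easier to share
    but collide more often, so the length is a trade-off.
    """
    return query_hash(query)[:length]
```

Because the digest is computed over the raw text, two queries that differ only in whitespace or comments get different hashes, which is one reason the normalization question raised later in this outline matters.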
Mitigating Query Rot using snapquery
- ★★★★★ Parameterized queries
- ★★★☆☆ https://web.archive.org/web/20150512231123/http://answers.semanticweb.com:80/questions/12147/whats-the-best-way-to-parameterize-sparql-queries
- ★★★☆☆ https://jena.apache.org/documentation/query/parameterized-sparql-strings.html
- ★★★★☆ Scholia Jinja templates
- ★★★★☆ Technical debt and accidental complexity
- ★★★☆☆ How to deal with aspects that do not (usually) influence the execution of a SPARQL query, like whitespace, comments, capitalization and variable names?
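Parameterized queries, as listed above, can be sketched in a few lines. The example assumes Jinja-style `{{ name }}` placeholders but does not use Jinja itself; `bind`, `TEMPLATE` and the restriction of values to Wikidata item ids are hypothetical choices made for illustration, not the Scholia or snapquery implementation.

```python
import re

# Placeholder syntax modeled on Jinja's {{ name }} (assumption).
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
# Only plain Wikidata item ids such as Q80 are accepted as values (assumption).
_QID = re.compile(r"Q[1-9][0-9]*")

TEMPLATE = """SELECT ?work WHERE {
  ?work wdt:P50 wd:{{ author }} .
}"""


def bind(template: str, params: dict) -> str:
    """Replace each {{ name }} placeholder with the matching parameter value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing parameter: {name}")
        value = str(params[name])
        # Validating the value keeps arbitrary text out of the query.
        if not _QID.fullmatch(value):
            raise ValueError(f"not a Wikidata item id: {value!r}")
        return value

    return _PLACEHOLDER.sub(substitute, template)


# Q80 is used here only as an example item id.
query = bind(TEMPLATE, {"author": "Q80"})
```

Validating values before substitution is a deliberate choice in this sketch: plain string substitution would let a parameter inject arbitrary SPARQL into the query.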
SnapQuery Implementation
- ★★★★☆ SPARQL standard changes
- ★★★☆☆ Natural Language input
- ★★★☆☆ Automatic syntax repairs
- ★★★☆☆ Automatic conversion of SQL input to SPARQL output
Evaluation
- ★★★★☆ Wikidata example queries
- ★★★★★ Scholia and Wikidata graph split
- ★★★☆☆ Other knowledge graphs, e.g., DBLP, OpenStreetMap
- ★★☆☆☆ Perhaps also some NFDI examples or some custom knowledge graphs like FAIRJupyter
- ★★★★★ Quality criteria https://github.com/WolfgangFahl/snapquery/issues/26
- ★★★★☆ List of standard refactoring activities and the support by this approach
- ★★★★☆ Getting your own copy of Wikidata; the infrastructure effort needs to be mentioned
- ★★★☆☆ Usability evaluation https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
- ★★★★☆ https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
- ★★★★☆ A closed issue should have at least one example that runs
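The requirement that example queries actually run could be checked with a small harness along these lines. This is a sketch under stated assumptions: `check_queries` and `QueryRun` are hypothetical names, and `execute` stands for whatever function sends a query to a SPARQL endpoint and raises an exception on failure.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QueryRun:
    """Outcome of running one named query."""
    name: str
    ok: bool
    seconds: float
    error: Optional[str] = None


def check_queries(
    queries: Dict[str, str],
    execute: Callable[[str], object],
) -> List[QueryRun]:
    """Run every named query once and record success, duration and error text."""
    runs: List[QueryRun] = []
    for name, sparql in queries.items():
        start = time.perf_counter()
        try:
            execute(sparql)
            runs.append(QueryRun(name, True, time.perf_counter() - start))
        except Exception as exc:
            # A failing query is a query-rot candidate; keep the message for triage.
            elapsed = time.perf_counter() - start
            runs.append(QueryRun(name, False, elapsed, str(exc)))
    return runs
```

Passing the executor in as a parameter keeps the harness independent of any particular endpoint, so the same set of named queries can be run against several knowledge graphs or engines and the results compared.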
Conclusion and Future Work
- ★★★★★ Hypothesis by Stefan Decker: Query rot is more prominent in KG environments than with relational databases
- ★★★☆☆ Ambiguity of names
Additional Resources
- ★★☆☆☆ https://stackoverflow.com/questions/tagged/sparql
- ★★★☆☆ https://www.semantic-web-journal.net/system/files/swj3076.pdf
- ★★☆☆☆ https://arxiv.org/pdf/cs/0605124
- ★★★☆☆ https://arxiv.org/pdf/1402.0576 optimizing queries
- ★★☆☆☆ https://www.w3.org/TR/REC-rdf-syntax/
- ★★☆☆☆ https://biblio.ugent.be/publication/8632551/file/8653456 Towards supporting multiple semantics of named graphs using N3 rules
- ★★★☆☆ ESWC 2019 proceedings (978-3-030-21348-0.pdf)
- ★★☆☆☆ Linked Data Fragments https://linkeddatafragments.org/, e.g. https://ldfclient.wmflabs.org/ (returns a 404 error)
Related Work
- ★★★☆☆ Link rot
- ★★★★☆ Information Hiding and Dependency Inversion Principles
- ★★★☆☆ Federated Queries
- ★★★☆☆ grlc
- ★★☆☆☆ querypulator
Testsuites
★★★☆☆ W3C SPARQL 1.1 Test Suite
W3C test set: why did we not use it as an example?
- Official test suite developed by the W3C SPARQL Working Group
- Designed to test conformance to the SPARQL 1.1 specification
- Covers a wide range of SPARQL features and edge cases
- Primarily focused on correctness rather than performance
See https://wikitech.wikimedia.org/wiki/User:AndreaWest/WDQS_Testing/Running_TFT
Benchmarks
★★★★☆ An Ngoc Lam et al.'s ESWC 2023 paper "Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata"
- Evaluates performance of 5 RDF triplestores and 1 experimental SPARQL engine
- Uses complete version of Wikidata knowledge graph
- Compares importing time, loading time, exporting time, and query performance
- Evaluates 328 queries defined by Wikidata users
- Also uses SP2Bench synthetic benchmark for comparison
- Provides detailed analysis of query execution plans and profiling information
- Offers insights on triplestore performance with large-scale real-world data
Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL by Aidan Hogan et al., 2020
DOI: 10.5281/zenodo.4035223
- Focuses on evaluating performance of graph pattern matching in SPARQL engines
- Uses a subset of Wikidata as the dataset
- Provides a large set of SPARQL basic graph patterns
- Designed to test the benefits of worst-case optimal join algorithms
- Exhibits a variety of increasingly complex join patterns
- Allows for systematic testing of query optimization techniques
- Offers insights into the performance characteristics of different SPARQL engines on complex graph patterns