Difference between revisions of "Snapquery EKAW 2024 paper"

Revision as of 14:29, 9 July 2024

Introduction

  1. ★★★★★ Query rot versus link rot
  2. ★★★★☆ Transparency vs. complexity of SPARQL queries
  3. ★★★★☆ Use cases for named queries
  4. ★★★★☆ Persistent identifiers
  5. ★★★☆☆ Query hashes and short_urls
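
The item on query hashes and short URLs can be illustrated with a minimal sketch. It assumes a short identifier is simply a prefix of a SHA-256 digest of the query text; the function name `query_hash` and the default length of 8 characters are hypothetical and are not taken from snapquery itself.

```python
import hashlib


def query_hash(query: str, length: int = 8) -> str:
    """Return a short hexadecimal identifier derived from the query text.

    Hypothetical helper: the digest algorithm and the prefix length are
    assumptions made for illustration only.
    """
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return digest[:length]


query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10"
short_id = query_hash(query)
print(short_id)
```

A hash of the raw text changes with every added space or comment, so a scheme like this only yields stable identifiers if the query text is stored or normalized consistently. Short prefixes also raise the chance of collisions, so the length is a trade-off between readability and uniqueness.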

Mitigating Query Rot using snapquery

  1. ★★★★★ Parameterized queries
  2. ★★★☆☆ https://web.archive.org/web/20150512231123/http://answers.semanticweb.com:80/questions/12147/whats-the-best-way-to-parameterize-sparql-queries
  3. ★★★☆☆ https://jena.apache.org/documentation/query/parameterized-sparql-strings.html
  4. ★★★★☆ Scholia Jinja templates
  5. ★★★★☆ Technical debt and accidental complexity
  6. ★★★☆☆ How to deal with aspects that do not (usually) influence the execution of a SPARQL query, like whitespace, comments, capitalization and variable names?
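
Two points from the list above, parameterized queries and aspects that do not usually influence execution, can be sketched as follows. This is only an illustration under stated assumptions: Python's standard `string.Template` stands in for a real template engine such as Jinja, and the helper names `render_query` and `normalize_sparql` are hypothetical. The normalization handles only full-line comments and whitespace; capitalization and variable renaming would need a real SPARQL parser and are deliberately left out.

```python
import re
from string import Template


def render_query(template: str, **params: str) -> str:
    """Fill ${name} placeholders in a SPARQL template (hypothetical helper).

    Values are inserted verbatim, without escaping or validation, so this
    sketch is not safe for untrusted input. A missing parameter raises
    KeyError.
    """
    return Template(template).substitute(**params)


def normalize_sparql(query: str) -> str:
    """Drop full-line comments and collapse runs of whitespace.

    Only lines that start with '#' are removed. A '#' later in a line may
    be part of an IRI or a string literal, so it is left untouched.
    """
    kept = []
    for line in query.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        kept.append(stripped)
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


template = """# instances of a given class
SELECT ?item
WHERE {
  ?item wdt:P31 wd:${qid}
}"""

rendered = render_query(template, qid="Q5")
one_line = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }"

# Both spellings reduce to the same normalized text.
print(normalize_sparql(rendered) == normalize_sparql(one_line))
```

Normalizing before comparing or hashing means that formatting changes alone do not produce a "new" query, while any change to the parameter value still does.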

SnapQuery Implementation

  1. ★★★★☆ SPARQL standard changes
  2. ★★★☆☆ Natural Language input
  3. ★★★☆☆ Automatic syntax repairs
  4. ★★★☆☆ Automatic conversion of SQL input, SPARQL output

Evaluation

  1. ★★★★☆ Wikidata example queries
  2. ★★★★★ Scholia and Wikidata graph split
  3. ★★★☆☆ Other knowledge graphs, e.g., DBLP, OpenStreetMap
  4. ★★☆☆☆ Perhaps also some NFDI examples or some custom knowledge graphs like FAIRJupyter
  5. ★★★★★ Quality criteria https://github.com/WolfgangFahl/snapquery/issues/26
  6. ★★★★☆ List of standard refactoring activities and the support by this approach
  7. ★★★★☆ Getting your own copy of Wikidata; the infrastructure effort needs to be mentioned
  8. ★★★☆☆ Usability evaluation https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
  9. ★★★★☆ https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
  10. ★★★★☆ A closed issue should have at least one example that runs

Conclusion and Future Work

  1. ★★★★★ Hypothesis by Stefan Decker: Query rot is more prominent in KG environments than with relational databases
  2. ★★★☆☆ Ambiguity of names

Additional Resources

  1. ★★☆☆☆ https://stackoverflow.com/questions/tagged/sparql
  2. ★★★☆☆ https://www.semantic-web-journal.net/system/files/swj3076.pdf
  3. ★★☆☆☆ https://arxiv.org/pdf/cs/0605124
  4. ★★★☆☆ https://arxiv.org/pdf/1402.0576 optimizing queries
  5. ★★☆☆☆ https://www.w3.org/TR/REC-rdf-syntax/
  6. ★★☆☆☆ https://biblio.ugent.be/publication/8632551/file/8653456 Towards supporting multiple semantics of named graphs using N3 rules
  7. ★★★☆☆ ESWC 2019 proceedings (978-3-030-21348-0.pdf)
  8. ★★☆☆☆ Linked Data Fragments https://linkeddatafragments.org/ e.g. https://ldfclient.wmflabs.org/ 404 error

Related Work

  • ★★★☆☆ Link rot
  • ★★★★☆ Information Hiding and Dependency Inversion Principles
  • ★★★☆☆ Federated Queries
  • ★★★☆☆ grlc
  • ★★☆☆☆ querypulator

Test suites

★★★☆☆ W3C SPARQL 1.1 Test Suite

W3C test set: why did we not use it as an example?
  • Official test suite developed by the W3C SPARQL Working Group
  • Designed to test conformance to the SPARQL 1.1 specification
  • Covers a wide range of SPARQL features and edge cases
  • Primarily focused on correctness rather than performance

See https://wikitech.wikimedia.org/wiki/User:AndreaWest/WDQS_Testing/Running_TFT

Benchmarks

★★★★☆ An Ngoc Lam et al.'s ESWC 2023 paper "Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata"

  • Evaluates performance of 5 RDF triplestores and 1 experimental SPARQL engine
  • Uses complete version of Wikidata knowledge graph
  • Compares importing time, loading time, exporting time, and query performance
  • Evaluates 328 queries defined by Wikidata users
  • Also uses SP2Bench synthetic benchmark for comparison
  • Provides detailed analysis of query execution plans and profiling information
  • Offers insights on triplestore performance with large-scale real-world data

Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL by Aidan Hogan et al., 2020

DOI: 10.5281/zenodo.4035223

  • Focuses on evaluating performance of graph pattern matching in SPARQL engines
  • Uses a subset of Wikidata as the dataset
  • Provides a large set of SPARQL basic graph patterns
  • Designed to test the benefits of worst-case optimal join algorithms
  • Exhibits a variety of increasingly complex join patterns
  • Allows for systematic testing of query optimization techniques
  • Offers insights into the performance characteristics of different SPARQL engines on complex graph patterns

References

  1. Wolfgang Fahl; Tim Holzheim; Christoph Lange; Stefan Decker. (2023) "Semantification of CEUR-WS with Wikidata as a Target Knowledge Graph". url: https://ceur-ws.org/Vol-3447/Text2KG_Paper_13.pdf
  2. Christoph Lange; Angelo Di Iorio. (2014) "Semantic Publishing Challenge – Assessing the Quality of Scientific Output", pp. 61-76. doi: 10.1007/978-3-319-12024-9_8
  3. Paul Warren; Paul Mulholland. (2020) "A Comparison of the Cognitive Difficulties Posed by SPARQL Query Constructs", pp. 3-19. doi: 10.1007/978-3-030-61244-3_1 at: EKAW 2022
  4. Paul Warren; Paul Mulholland. (2018) "Using SPARQL – The Practitioners’ Viewpoint", pp. 485-500. doi: 10.1007/978-3-030-03667-6_31
  5. Muhammad Saleem; Muhammad Intizar Ali; Aidan Hogan; Qaiser Mehmood; Axel-Cyrille Ngonga Ngomo. (2015) "LSQ: The Linked SPARQL Queries Dataset", pp. 261-269. doi: 10.1007/978-3-319-25010-6_15
  6. Johannes Lorey; Felix Naumann. (2013) "Detecting SPARQL Query Templates for Data Prefetching", pp. 124-139. doi: 10.1007/978-3-642-38288-8_9
  7. Angela Bonifati; Wim Martens; Thomas Timm. (2020) "An analytical study of large SPARQL query logs", pp. 655-679. doi: 10.1007/s00778-019-00558-9