Difference between revisions of "Snapquery EKAW 2024 paper"

Revision as of 15:29, 9 July 2024

Introduction

★★★★★ Query rot versus link rot
★★★★☆ Transparency vs. complexity of SPARQL queries
★★★★☆ Use cases for named queries
★★★★☆ Persistent identifiers
★★★☆☆ Query hashes and short_urls
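
A query hash is only useful as a short identifier if two textually different but equivalent copies of a query map to the same value. The following is a minimal sketch of that idea, not snapquery's actual implementation: the names `normalize_query` and `query_hash` and the 10-character prefix are assumptions made for illustration. The sketch collapses whitespace and `#` comments outside of string literals and IRIs, and it deliberately ignores keyword capitalization, variable renaming and triple-quoted literals.

```python
import hashlib
import re

# String literals and IRIs are copied verbatim; runs of whitespace and
# "#" comments outside of them are collapsed into one space.
_TOKEN = re.compile(
    r'''
      (?P<keep> "(?:[^"\\\n]|\\.)*" | '(?:[^'\\\n]|\\.)*' | <[^<>\s]*> )
    | (?P<skip> (?:\s|\#[^\n]*)+ )
    ''',
    re.VERBOSE,
)


def normalize_query(query: str) -> str:
    """Return the query with insignificant whitespace and comments collapsed."""
    def replace(match: re.Match) -> str:
        kept = match.group("keep")
        return kept if kept is not None else " "

    return _TOKEN.sub(replace, query).strip()


def query_hash(query: str, length: int = 10) -> str:
    """Return a short hexadecimal identifier for the normalized query text."""
    digest = hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    commented = "SELECT ?x  # pick x\nWHERE { ?x ?p 'a  b' }"
    reflowed = "SELECT ?x WHERE {\n  ?x ?p 'a  b'\n}"
    print(query_hash(commented) == query_hash(reflowed))
```

A truncated hash trades collision resistance for brevity, so a short URL built this way would still need a lookup that detects collisions.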

Mitigating Query Rot using snapquery

★★★★★ Parameterized queries
★★★☆☆ https://web.archive.org/web/20150512231123/http://answers.semanticweb.com:80/questions/12147/whats-the-best-way-to-parameterize-sparql-queries
★★★☆☆ https://jena.apache.org/documentation/query/parameterized-sparql-strings.html
★★★★☆ Scholia Jinja templates
★★★★☆ Technical debt and accidental complexity
★★★☆☆ How to deal with aspects that do not (usually) influence the execution of a SPARQL query, like whitespace, comments, capitalization and variable names?
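
Parameterized queries keep one query text and fill in values at execution time. The sketch below imitates the `{{ name }}` placeholder syntax of Jinja templates with a plain regular expression, so it is a stand-in for Jinja and not a use of the Jinja library or of Jena's `ParameterizedSparqlString`. The function name `render`, the example template and the rule that every value must look like a Wikidata item identifier are assumptions chosen to keep the example small and safe against injected query text.

```python
import re

# Jinja-like placeholder: {{ name }}
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
# Assumed value format: a Wikidata item identifier such as Q80.
_QID = re.compile(r"Q[1-9]\d*")

# Hypothetical example template; P50 is Wikidata's "author" property.
TEMPLATE = """SELECT ?work WHERE {
  ?work wdt:P50 wd:{{ author }} .
}"""


def render(template: str, params: dict[str, str]) -> str:
    """Replace each {{ name }} placeholder with a validated parameter value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing parameter: {name}")
        value = params[name]
        if not _QID.fullmatch(value):
            raise ValueError(f"parameter {name!r} is not an item identifier: {value!r}")
        return value

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(render(TEMPLATE, {"author": "Q80"}))
```

Validating values before substitution matters more than the placeholder syntax: a value such as `Q80 } UNION { ?s ?p ?o` would otherwise change the structure of the query.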

SnapQuery Implementation

★★★★☆ SPARQL standard changes
★★★☆☆ Natural Language input
★★★☆☆ Automatic syntax repairs
★★★☆☆ Automatic conversion of SQL input, SPARQL output

Evaluation

★★★★☆ Wikidata example queries
★★★★★ Scholia and Wikidata graph split
★★★☆☆ Other knowledge graphs, e.g., DBLP, OpenStreetMap
★★☆☆☆ Perhaps also some NFDI examples or some custom knowledge graphs like FAIRJupyter
★★★★★ Quality criteria https://github.com/WolfgangFahl/snapquery/issues/26
★★★★☆ List of standard refactoring activities and the support by this approach
★★★★☆ Getting your own copy of Wikidata; the infrastructure effort needs to be mentioned
★★★☆☆ Usability evaluation https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
★★★★☆ https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
★★★★☆ A closed issue should have at least one example that runs

Conclusion and Future Work

★★★★★ Hypothesis by Stefan Decker: Query rot is more prominent in KG environments than with relational databases
★★★☆☆ Ambiguity of names

Additional Resources

★★☆☆☆ https://stackoverflow.com/questions/tagged/sparql
★★★☆☆ https://www.semantic-web-journal.net/system/files/swj3076.pdf
★★☆☆☆ https://arxiv.org/pdf/cs/0605124
★★★☆☆ https://arxiv.org/pdf/1402.0576 optimizing queries
★★☆☆☆ https://www.w3.org/TR/REC-rdf-syntax/
★★☆☆☆ https://biblio.ugent.be/publication/8632551/file/8653456 Towards supporting multiple semantics of named graphs using N3 rules
★★★☆☆ ESWC 2019 proceedings (978-3-030-21348-0.pdf)
★★☆☆☆ Linked Data Fragments https://linkeddatafragments.org/ e.g. https://ldfclient.wmflabs.org/ (returns a 404 error)

Related Work

★★★☆☆ Link rot
★★★★☆ Information Hiding and Dependency Inversion Principles
★★★☆☆ Federated Queries
★★★☆☆ grlc
★★☆☆☆ querypulator

Testsuites

★★★☆☆ W3C SPARQL 1.1 Test Suite

W3C test set: why did we not use it as an example?

Official test suite developed by the W3C SPARQL Working Group
Designed to test conformance to the SPARQL 1.1 specification
Covers a wide range of SPARQL features and edge cases
Primarily focused on correctness rather than performance

see https://wikitech.wikimedia.org/wiki/User:AndreaWest/WDQS_Testing/Running_TFT

Benchmarks

★★★★☆ An Ngoc Lam et al.'s ESWC 2023 paper "Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata"

Evaluates performance of 5 RDF triplestores and 1 experimental SPARQL engine
Uses complete version of Wikidata knowledge graph
Compares importing time, loading time, exporting time, and query performance
Evaluates 328 queries defined by Wikidata users
Also uses SP2Bench synthetic benchmark for comparison
Provides detailed analysis of query execution plans and profiling information
Offers insights on triplestore performance with large-scale real-world data
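
A comparison of query performance across engines needs at least a repeatable way to time a fixed set of named queries. The sketch below is not the methodology of the paper above; it only illustrates the basic measurement loop. The name `time_queries` and the `run_query` callable, which stands in for whatever client sends a query to an endpoint, are assumptions.

```python
import statistics
import time
from typing import Callable, Dict


def time_queries(
    run_query: Callable[[str], object],
    queries: Dict[str, str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the median wall-clock time in seconds for each named query."""
    timings: Dict[str, float] = {}
    for name, sparql in queries.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(sparql)  # result is discarded, only the duration is kept
            samples.append(time.perf_counter() - start)
        timings[name] = statistics.median(samples)
    return timings


if __name__ == "__main__":
    # A stand-in runner that does no network access.
    print(time_queries(lambda sparql: len(sparql), {"ask": "ASK { ?s ?p ?o }"}))
```

Taking the median over a few repeats dampens the effect of a cold cache on the first run; a real benchmark would additionally need timeouts and a check that the results are correct.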

Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL by Aidan Hogan et al., 2020

DOI: 10.5281/zenodo.4035223

Focuses on evaluating performance of graph pattern matching in SPARQL engines
Uses a subset of Wikidata as the dataset
Provides a large set of SPARQL basic graph patterns
Designed to test the benefits of worst-case optimal join algorithms
Exhibits a variety of increasingly complex join patterns
Allows for systematic testing of query optimization techniques
Offers insights into the performance characteristics of different SPARQL engines on complex graph patterns

References

Wolfgang Fahl; Tim Holzheim; Christoph Lange; Stefan Decker. (2023) "Semantification of CEUR-WS with Wikidata as a Target Knowledge Graph". url: https://ceur-ws.org/Vol-3447/Text2KG_Paper_13.pdf
Christoph Lange; Angelo Di Iorio. (2014) "Semantic Publishing Challenge – Assessing the Quality of Scientific Output". pages 61-76. doi: 10.1007/978-3-319-12024-9_8
Paul Warren; Paul Mulholland. (2020) "A Comparison of the Cognitive Difficulties Posed by SPARQL Query Constructs". pages 3-19. doi: 10.1007/978-3-030-61244-3_1 at: EKAW 2022
Paul Warren; Paul Mulholland. (2018) "Using SPARQL – The Practitioners’ Viewpoint". pages 485-500. doi: 10.1007/978-3-030-03667-6_31
Muhammad Saleem; Muhammad Intizar Ali; Aidan Hogan; Qaiser Mehmood; Axel-Cyrille Ngonga Ngomo. (2015) "LSQ: The Linked SPARQL Queries Dataset". pages 261-269. doi: 10.1007/978-3-319-25010-6_15
Johannes Lorey; Felix Naumann. (2013) "Detecting SPARQL Query Templates for Data Prefetching". pages 124-139. doi: 10.1007/978-3-642-38288-8_9
Angela Bonifati; Wim Martens; Thomas Timm. (2020) "An analytical study of large SPARQL query logs". pages 655-679. doi: 10.1007/s00778-019-00558-9

@@ Line 1: / Line 1: @@
-== Key Points ==
+== Introduction ==
-# Query rot versus link rot
+# ★★★★★ Query rot versus link rot
-# Transparency vs. complexity of SPARQL queries
+# ★★★★☆ Transparency vs. complexity of SPARQL queries
-# Use cases for named queries
+# ★★★★☆ Use cases for named queries
-# Ambiguity of names
+# ★★★★☆ Persistent identifiers
-# Persistent identifiers
+# ★★★☆☆ Query hashes and short_urls
-## Query hashes and short_urls
+== Mitigation Query Rot using snapquery ==
-# How to deal with aspects that do not (usually) influence the execution of a SPARQL query, like whitespace, comments, capitalization and variable names?
+# ★★★★★ Parameterized queries
-# Parameterized queries
+# ★★★☆☆ https://web.archive.org/web/20150512231123/http://answers.semanticweb.com:80/questions/12147/whats-the-best-way-to-parameterize-sparql-queries
-## https://web.archive.org/web/20150512231123/http://answers.semanticweb.com:80/questions/12147/whats-the-best-way-to-parameterize-sparql-queries
+# ★★★☆☆ https://jena.apache.org/documentation/query/parameterized-sparql-strings.html
-## https://jena.apache.org/documentation/query/parameterized-sparql-strings.html
+# ★★★★☆ Scholia Jinja templates
-## Scholia Jinja templates
+# ★★★★☆ Technical debt and accidental complexity
-# technical debt and accidential complexity
+# ★★★☆☆ How to deal with aspects that do not (usually) influence the execution of a SPARQL query, like whitespace, comments, capitalization and variable names?
-# Wikidata example queries
+== SnapQuery Implementation ==
-# Scholia and Wikidata graph split
+# ★★★★☆ SPARQL standard changes
-# Other knowledge graphs, e.g., DBLP, OpenStreetMap
+# ★★★☆☆ Natural Language input
-# Perhaps also some NFDI examples or some custom knowledge graphs like FAIRJupyter
+# ★★★☆☆ Automatic syntax repairs
-# An Ngo Lam's ESWC 2023 paper as a pointer to the style of comparison
+# ★★★☆☆ Automatic conversion of SQL input, SPARQL output
-# Quality criteria https://github.com/WolfgangFahl/snapquery/issues/26
+== Evaluation ==
-# List of standard refactoring activities and the support by this approach
+#★★★★☆ Wikidata example queries
-# SPARQL standard changes
+#★★★★★ Scholia and Wikidata graph split
-# Getting your own copy of Wikidata; the infrastructure effort needs to be mentioned
+#★★★☆☆ Other knowledge graphs, e.g., DBLP, OpenStreetMap
-# W3C test set - why did we not use that as an example
+#★★☆☆☆ Perhaps also some NFDI examples or some custom knowledge graphs like FAIRJupyter
-# Useability evaluation https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
+#★★★★★ Quality criteria https://github.com/WolfgangFahl/snapquery/issues/26
-# https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
+#★★★★☆ List of standard refactoring activities and the support by this approach
-# Natural Language input
+#★★★★☆ Getting your own copy of Wikidata; the infrastructure effort needs to be mentioned
-## Automatic syntax repairs
+#★★★☆☆ Usability evaluation https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
-## automatic conversion of SQL input, SPARQL output.
+#★★★★☆ https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
-# A closed issue should have at least one example that runs
+#★★★★☆ A closed issue should have at least one example that runs
-# Hypothesis by Stefan Decker: Query rot is more prominent in KG environments than with relational databases.
+== Conclusion and Future Work ==
-# https://stackoverflow.com/questions/tagged/sparql
+#★★★★★ Hypothesis by Stefan Decker: Query rot is more prominent in KG environments than with relational databases
-# https://www.semantic-web-journal.net/system/files/swj3076.pdf
+#★★★☆☆ Ambiguity of names
-# https://arxiv.org/pdf/cs/0605124
+== Additional Resources ==
-# https://arxiv.org/pdf/1402.0576 optimizing queries
+#★★☆☆☆ https://stackoverflow.com/questions/tagged/sparql
-# https://www.w3.org/TR/REC-rdf-syntax/
+#★★★☆☆ https://www.semantic-web-journal.net/system/files/swj3076.pdf
-# https://biblio.ugent.be/publication/8632551/file/8653456 Towards supporting multiple semantics of named graphs using N3 rules
+#★★☆☆☆ https://arxiv.org/pdf/cs/0605124
-# ESWC 2019 proceedings (978-3-030-21348-0.pdf)
+#★★★☆☆ https://arxiv.org/pdf/1402.0576 optimizing queries
-# Linked Data Fragments https://linkeddatafragments.org/  e.g. https://ldfclient.wmflabs.org/ 404 error
+#★★☆☆☆ https://www.w3.org/TR/REC-rdf-syntax/
+#★★☆☆☆ https://biblio.ugent.be/publication/8632551/file/8653456 Towards supporting multiple semantics of named graphs using N3 rules
+#★★★☆☆ ESWC 2019 proceedings (978-3-030-21348-0.pdf)
+#★★☆☆☆ Linked Data Fragments https://linkeddatafragments.org/  e.g. https://ldfclient.wmflabs.org/ 404 error
 == Related Work ==
+★★★☆☆ Link rot
+★★★★☆ Information Hiding and Dependency Inversion Principles
+★★★☆☆ Federated Queries
+★★★☆☆ grlc
+★★☆☆☆ querypulator
 === Testsuites ===
-==== W3C SPARQL 1.1 Test Suite ====
+==== ★★★☆☆ W3C SPARQL 1.1 Test Suite ====
+ W3C test set - why did we not use that as an example
 * Official test suite developed by the W3C SPARQL Working Group
@@ Line 49: / Line 59: @@
 === Benchmarks ===
-==== An Ngoc Lam et al.'s ESWC 2023 paper "Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata" ====
+==== ★★★★☆ An Ngoc Lam et al.'s ESWC 2023 paper "Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata" ====
 * Evaluates performance of 5 RDF triplestores and 1 experimental SPARQL engine
 * Uses complete version of Wikidata knowledge graph
