Snapquery EKAW 2024 paper

From BITPlan cr Wiki
Revision as of 14:25, 9 July 2024 by Wf

Key Points

  1. Query rot versus link rot
  2. Transparency vs. complexity of SPARQL queries
  3. Use cases for named queries
  4. Ambiguity of names
  5. Persistent identifiers
    1. Query hashes and short_urls
  6. How to deal with aspects that do not (usually) influence the execution of a SPARQL query, like whitespace, comments, capitalization and variable names?
  7. Parameterized queries
    1. https://web.archive.org/web/20150512231123/http://answers.semanticweb.com:80/questions/12147/whats-the-best-way-to-parameterize-sparql-queries
    2. https://jena.apache.org/documentation/query/parameterized-sparql-strings.html
    3. Scholia Jinja templates
  8. Technical debt and accidental complexity
  9. Wikidata example queries
  10. Scholia and Wikidata graph split
  11. Other knowledge graphs, e.g., DBLP, OpenStreetMap
  12. Perhaps also some NFDI examples or some custom knowledge graphs like FAIRJupyter
  13. An Ngoc Lam's ESWC 2023 paper as a pointer to the style of comparison
  14. Quality criteria https://github.com/WolfgangFahl/snapquery/issues/26
  15. List of standard refactoring activities and the support by this approach
  16. SPARQL standard changes
  17. Getting your own copy of Wikidata; the infrastructure effort needs to be mentioned
  18. W3C test set: why did we not use that as an example?
  19. Usability evaluation https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
  20. https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
  21. Natural Language input
    1. Automatic syntax repairs
    2. Automatic conversion of SQL input to SPARQL output
  22. A closed issue should have at least one example that runs
  23. Hypothesis by Stefan Decker: Query rot is more prominent in KG environments than with relational databases.
  24. https://stackoverflow.com/questions/tagged/sparql
  25. https://www.semantic-web-journal.net/system/files/swj3076.pdf
  26. https://arxiv.org/pdf/cs/0605124
  27. https://arxiv.org/pdf/1402.0576 optimizing queries
  28. https://www.w3.org/TR/REC-rdf-syntax/
  29. https://biblio.ugent.be/publication/8632551/file/8653456 Towards supporting multiple semantics of named graphs using N3 rules
  30. ESWC 2019 proceedings (978-3-030-21348-0.pdf)
  31. Linked Data Fragments https://linkeddatafragments.org/ e.g. https://ldfclient.wmflabs.org/ (404 error)
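
Key points 5.1 and 6 can be illustrated together. The following is a minimal Python sketch, assuming that a query hash should ignore comments and whitespace differences. The names normalize_query and query_hash and the choice of SHA-256 are illustrative assumptions and are not taken from snapquery. Capitalization and variable names are deliberately left untouched, and triple-quoted literals are not handled.

```python
import hashlib
import re

# One pass over the query text (naive tokenizer, not a full SPARQL parser):
#  - "keep": IRI references and simple quoted literals, which may contain '#'
#  - "gap":  any run of whitespace and '#' comments
_TOKEN = re.compile(
    r'(?P<keep><[^<>\s]*>|"(?:[^"\\\n]|\\.)*"|' r"'(?:[^'\\\n]|\\.)*')"
    r"|(?P<gap>(?:\s|#[^\n]*)+)"
)


def normalize_query(sparql: str) -> str:
    """Drop comments and collapse whitespace outside IRIs and literals."""

    def _replace(match: re.Match) -> str:
        kept = match.group("keep")
        # IRIs and literals are copied verbatim; every gap becomes one space.
        return kept if kept is not None else " "

    return _TOKEN.sub(_replace, sparql).strip()


def query_hash(sparql: str) -> str:
    """Hex SHA-256 digest of the normalized query text (assumed hash choice)."""
    return hashlib.sha256(normalize_query(sparql).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    compact = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . }"
    commented = (
        "# cats\n"
        "SELECT ?item\n"
        "WHERE {\n"
        "  ?item wdt:P31 wd:Q146 .   # instance of cat\n"
        "}"
    )
    # Both spellings normalize to the same text and therefore share a hash.
    print(normalize_query(commented))
    print(query_hash(compact) == query_hash(commented))
```

Under these assumptions, two spellings of the same query that differ only in layout or comments receive the same identifier, while a renamed variable or a differently cased keyword still yields a different hash.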

Related Work

Test suites

W3C SPARQL 1.1 Test Suite

  • Official test suite developed by the W3C SPARQL Working Group
  • Designed to test conformance to the SPARQL 1.1 specification
  • Covers a wide range of SPARQL features and edge cases
  • Primarily focused on correctness rather than performance

See https://wikitech.wikimedia.org/wiki/User:AndreaWest/WDQS_Testing/Running_TFT

Benchmarks

An Ngoc Lam et al.'s ESWC 2023 paper "Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata"

  • Evaluates performance of 5 RDF triplestores and 1 experimental SPARQL engine
  • Uses complete version of Wikidata knowledge graph
  • Compares importing time, loading time, exporting time, and query performance
  • Evaluates 328 queries defined by Wikidata users
  • Also uses SP2Bench synthetic benchmark for comparison
  • Provides detailed analysis of query execution plans and profiling information
  • Offers insights on triplestore performance with large-scale real-world data

Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL by Aidan Hogan et al., 2020

DOI: 10.5281/zenodo.4035223

  • Focuses on evaluating performance of graph pattern matching in SPARQL engines
  • Uses a subset of Wikidata as the dataset
  • Provides a large set of SPARQL basic graph patterns
  • Designed to test the benefits of worst-case optimal join algorithms
  • Exhibits a variety of increasingly complex join patterns
  • Allows for systematic testing of query optimization techniques
  • Offers insights into the performance characteristics of different SPARQL engines on complex graph patterns

References

  1. Wolfgang Fahl; Tim Holzheim; Christoph Lange; Stefan Decker. (2023) "Semantification of CEUR-WS with Wikidata as a Target Knowledge Graph". url: https://ceur-ws.org/Vol-3447/Text2KG_Paper_13.pdf
  2. Christoph Lange; Angelo Di Iorio. (2014) "Semantic Publishing Challenge – Assessing the Quality of Scientific Output", pp. 61-76. doi: 10.1007/978-3-319-12024-9_8
  3. Paul Warren; Paul Mulholland. (2020) "A Comparison of the Cognitive Difficulties Posed by SPARQL Query Constructs", pp. 3-19. doi: 10.1007/978-3-030-61244-3_1 at: EKAW 2022
  4. Paul Warren; Paul Mulholland. (2018) "Using SPARQL – The Practitioners’ Viewpoint", pp. 485-500. doi: 10.1007/978-3-030-03667-6_31
  5. Muhammad Saleem; Muhammad Intizar Ali; Aidan Hogan; Qaiser Mehmood; Axel-Cyrille Ngonga Ngomo. (2015) "LSQ: The Linked SPARQL Queries Dataset", pp. 261-269. doi: 10.1007/978-3-319-25010-6_15
  6. Johannes Lorey; Felix Naumann. (2013) "Detecting SPARQL Query Templates for Data Prefetching", pp. 124-139. doi: 10.1007/978-3-642-38288-8_9
  7. Angela Bonifati; Wim Martens; Thomas Timm. (2020) "An analytical study of large SPARQL query logs", pp. 655-679. doi: 10.1007/s00778-019-00558-9