⚠️ LLM-generated content notice: Parts of this page may have been created or edited with the assistance of a large language model (LLM). The prompts used may be on the page itself or on the discussion page; in straightforward cases the prompt was just "Write a mediawiki page on X", with X being the page name. While the content has been reviewed, it might still not be accurate or error-free.
During development of RDF dump generation capabilities, we encountered a fundamental issue with translating complex SPARQL WHERE patterns into valid CONSTRUCT patterns. The original approach of using select_pattern for both SELECT and CONSTRUCT queries failed because:
Search constraints such as ?pc gp:value "W2306" make no sense as CONSTRUCT templates: a dump should output data about the entities that were found, not the constraints used to find them.
"Can you explain better - is this a SPARQL flaw/limitation or something we can improve?"
"No, we have to do this systematically. I want a dump capability that follows a simple graph navigation idea. So I can imagine a sequence of navigation steps that possibly translate to basic graph patterns. In the end the sequence should return a subgraph that fits the navigation. E.g. in Wikidata we navigate to Triplestore and then follow the instanceof path and then we want all properties. In GOV we navigate to all nodes with a given W-num and then we want certain properties (I would also be happy with "all" properties for the time being). In Gremlin this style is much simpler to state than in SPARQL, so I imagine steps similar to Gremlin steps which translate to basic graph patterns. To make this queryable we could create a sequence of queries, which should not be too troublesome since the intention is to work with subgraphs that mostly have fewer than 100000 nodes."
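The "sequence of queries" idea from the request above can be sketched as follows. This is a minimal sketch resting on assumptions that are not in the original. The start nodes are assumed to have been fetched beforehand as full IRIs, for example by a SELECT query. Each follow-up CONSTRUCT query then binds a fixed-size batch of them to the step variable through a SPARQL VALUES block. The function name batched_construct_queries and the default batch size of 1000 are hypothetical choices made for illustration.

```python
from typing import Iterator, List


def batched_construct_queries(
    nodes: List[str], variable: str, template: str, batch_size: int = 1000
) -> Iterator[str]:
    """Hypothetical helper: yield one CONSTRUCT query per batch of start nodes.

    nodes are full IRIs found by an earlier step (e.g. a SELECT query);
    a VALUES block binds ?variable to one batch of them per query.
    IRIs are inserted as-is, without validation or escaping.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for offset in range(0, len(nodes), batch_size):
        batch = nodes[offset:offset + batch_size]
        values = " ".join(f"<{iri}>" for iri in batch)
        # The template holds only the data to output; node selection
        # happens through the VALUES block in the WHERE clause.
        yield (
            f"CONSTRUCT {{ {template} }}\n"
            f"WHERE {{ VALUES ?{variable} {{ {values} }} {template} }}"
        )


if __name__ == "__main__":
    places = ["http://example.org/place/1", "http://example.org/place/2"]
    for query in batched_construct_queries(places, "place", "?place ?p ?o", 1):
        print(query)
```

For the subgraphs of up to about 100000 nodes mentioned above, a batch size of 1000 would mean on the order of 100 queries. A suitable size in practice depends on the endpoint's query-length and timeout limits.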
Replace complex SPARQL patterns with a systematic graph navigation approach inspired by Gremlin traversals. This separates navigation logic from SPARQL complexity and makes dump generation composable and maintainable.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NavigationStep:
    """A single graph navigation step."""
    step_type: str  # "start", "follow" or "properties"
    pattern: str  # basic graph pattern for this step
    variable: str  # output variable name


@dataclass
class GraphNavigation:
    """Sequence of navigation steps."""
    steps: List[NavigationStep]

    def to_sparql_queries(self) -> List[str]:
        """Convert the navigation steps to a sequence of SPARQL queries."""
        # Step 1: find the starting nodes
        # Step 2: follow relationships
        # Step 3: get properties
        raise NotImplementedError


@dataclass
class RdfDataset:
    name: str
    endpoint_url: str
    navigation: GraphNavigation  # replaces select_pattern
    expected_triples: Optional[int] = None
    description: Optional[str] = None
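The to_sparql_queries stub above leaves the compilation open. One possible compilation is sketched below, based on an assumption that is not part of the design as stated. Under that assumption, "start" and "follow" patterns only restrict which nodes are visited, so they go into the WHERE clause. Each "properties" step then yields one CONSTRUCT query whose template is just that step's pattern. This keeps search constraints such as ?pc gp:value "W2306" out of the template, which is the problem described at the top of the page. The free function here is a hypothetical stand-in for the method. As in the configuration, PREFIX declarations are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NavigationStep:
    step_type: str  # "start", "follow" or "properties"
    pattern: str  # basic graph pattern for this step
    variable: str  # output variable name


def to_sparql_queries(steps: List[NavigationStep]) -> List[str]:
    """Hypothetical compiler: one CONSTRUCT query per "properties" step.

    "start" and "follow" patterns only select nodes, so they are accumulated
    into the WHERE clause; a "properties" pattern is both the CONSTRUCT
    template (the data to output) and the last part of the WHERE clause.
    """
    where: List[str] = []
    queries: List[str] = []
    for step in steps:
        # Drop a trailing "." so patterns can be joined with " ." safely.
        pattern = step.pattern.strip().rstrip(".").strip()
        if step.step_type in ("start", "follow"):
            where.append(pattern)
        elif step.step_type == "properties":
            body = " .\n  ".join(where + [pattern])
            queries.append(
                f"CONSTRUCT {{\n  {pattern} .\n}}\nWHERE {{\n  {body} .\n}}"
            )
        else:
            raise ValueError(f"unknown step_type: {step.step_type!r}")
    return queries


if __name__ == "__main__":
    gov_steps = [
        NavigationStep("start", '?place gp:hasPostalCode ?pc . ?pc gp:value "W2306"', "place"),
        NavigationStep("properties", "?place ?p ?o", "place"),
    ]
    print(to_sparql_queries(gov_steps)[0])
```

For the GOV steps, this prints a query whose template is only ?place ?p ?o. The postal-code constraint appears only in the WHERE clause.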
wikidata_triplestores:
  name: "Wikidata Triplestores"
  endpoint_url: "https://query.wikidata.org/sparql"
  expected_triples: 1190
  description: "All triplestore instances and their properties"
  navigation:
    steps:
      - step_type: "start"
        pattern: "?instance wdt:P31 wd:Q3539533"
        variable: "instance"
      - step_type: "properties"
        pattern: "?instance ?p ?o"
        variable: "instance"
gov_w2306:
  name: "GOV W2306 Coordinates"
  endpoint_url: "https://gov-sparql.genealogy.net/dataset/sparql"
  expected_triples: 100
  description: "Places with postal code W2306 and their properties"
  navigation:
    steps:
      - step_type: "start"
        pattern: "?place gp:hasPostalCode ?pc . ?pc gp:value \"W2306\""
        variable: "place"
      - step_type: "properties"
        pattern: "?place ?p ?o"
        variable: "place"
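Once such a configuration has been parsed into nested dictionaries, it has to be turned into the dataclasses defined earlier. A minimal loader is sketched below under stated assumptions. The function name datasets_from_config is hypothetical. Each top-level key is assumed to name one dataset, and each step mapping must contain exactly the NavigationStep fields. The dataclasses are repeated without methods so that the block is self-contained.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class NavigationStep:
    step_type: str
    pattern: str
    variable: str


@dataclass
class GraphNavigation:
    steps: List[NavigationStep]


@dataclass
class RdfDataset:
    name: str
    endpoint_url: str
    navigation: GraphNavigation
    expected_triples: Optional[int] = None
    description: Optional[str] = None


def datasets_from_config(config: Dict[str, Any]) -> Dict[str, RdfDataset]:
    """Hypothetical loader: map each top-level config key to an RdfDataset."""
    datasets: Dict[str, RdfDataset] = {}
    for key, entry in config.items():
        # Unknown or missing step keys raise TypeError from the dataclass.
        steps = [NavigationStep(**step) for step in entry["navigation"]["steps"]]
        datasets[key] = RdfDataset(
            name=entry["name"],
            endpoint_url=entry["endpoint_url"],
            navigation=GraphNavigation(steps=steps),
            expected_triples=entry.get("expected_triples"),
            description=entry.get("description"),
        )
    return datasets
```

With PyYAML installed, datasets_from_config(yaml.safe_load(text)) would turn the configuration text above into RdfDataset objects keyed by wikidata_triplestores and gov_w2306.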
This systematic approach transforms the RDF dump generation from a SPARQL pattern matching problem into an intuitive graph navigation problem. By modeling traversals as sequences of basic steps, we achieve both clarity and flexibility while maintaining the ability to generate efficient SPARQL queries for actual execution.