⚠️ LLM-generated content notice: Parts of this page may have been created or edited with the assistance of a large language model (LLM). The prompts that have been used might be on the page itself, the discussion page or in straight forward cases the prompt was just "Write a mediawiki page on X" with X being the page name. While the content has been reviewed it might still not be accurate or error-free.
I would like to write a "syntax matters!" critique on the RDF world as a scientific paper. My argument starts with SiDIF (87,000 views), which expresses facts verbatim like `dad loves mum`. Why on earth should I write `http://somedomain/somepath/dad ...`? That leads to "query rot" when frameworks are mirrored or graphs split. The idea behind those IRIs is unique identification, which fails miserably in practice. I also have high-traffic resources: SPARQL (230,000 views) and Gremlin (160,000 views). These view counts have increased massively in the past 3 years due to LLM scraping. Where should I start, and what is prior work?
Your critique centers on three axes: **Syntactic Diffuseness** (verbosity), **Identifier Fragility** (query rot), and **Token Efficiency** (the LLM scraping perspective).
To convert the `dad loves mum` argument into a scientific critique, apply these frameworks:
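Green's Cognitive Dimensions of Notations framework evaluates notations along several dimensions. One of them is *Diffuseness*: how many symbols a notation needs to express one meaning. RDF's full-IRI syntax is extremely diffuse (too many symbols to express one fact), while SiDIF acts as a counter-example of a terse, high-density notation. [1]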
Your argument about "query rot" challenges the central dogma of the Semantic Web: that global identifiers (IRIs) are robust. In practice, the intended separation of identity from location breaks down when graphs are mirrored (e.g., development vs. production environments), because queries that hard-code the original IRIs no longer match the copied data.
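The query-rot effect can be illustrated with a minimal Python sketch. It is not a real triple store or SPARQL engine: the mirror base IRI `http://mirror.example/otherpath/` and the helpers `make_graph`, `match`, `local_name`, and `match_local` are hypothetical names introduced only for this illustration.

```python
# Hypothetical base IRIs for the same graph in two environments.
PROD = "http://somedomain/somepath/"
MIRROR = "http://mirror.example/otherpath/"  # assumed mirror location


def make_graph(base):
    """Return the single fact 'dad loves mum' as a set of full-IRI triples."""
    return {(base + "dad", base + "loves", base + "mum")}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def local_name(iri):
    """Keep only the part after the last '/' or '#'."""
    return iri.replace("#", "/").rsplit("/", 1)[-1]


def match_local(graph, s=None, p=None, o=None):
    """Match on local names only, ignoring the base IRI."""
    local = {tuple(local_name(x) for x in t) for t in graph}
    return match(local, s, p, o)


prod_graph = make_graph(PROD)
mirror_graph = make_graph(MIRROR)

# A query written against production hard-codes the production IRI.
hits_prod = match(prod_graph, s=PROD + "dad")
hits_mirror = match(mirror_graph, s=PROD + "dad")  # same query, mirrored data
local_hits = match_local(mirror_graph, s="dad")    # base-independent query

print(len(hits_prod), len(hits_mirror), len(local_hits))  # 1 0 1
```

The hard-coded query silently returns nothing on the mirror, while the name-based query still finds the fact. The sketch deliberately ignores the cost of dropping the base IRI, namely that two unrelated `dad` identifiers would then collide.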
The massive increase in view counts on your SPARQL, Gremlin, and SiDIF pages due to LLM scraping introduces a novel scientific argument: **Token Economics**.
LLMs are trained on text. A verbose notation (RDF/XML or full IRIs) consumes more of the context window (tokens) to carry the same semantic content than a concise format does.
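A rough way to make this comparison concrete is to count characters and approximate tokens for the same fact in several notations. The sketch below is only a proxy: `crude_tokens` is a hypothetical regex-based splitter, not the tokenizer of any actual LLM, and real subword tokenizers will give different absolute numbers. The `ex:` prefix in the RDF/XML string is likewise an assumption.

```python
import re

# The same fact, 'dad loves mum', in three notations.
SERIALIZATIONS = {
    "RDF/XML": (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:ex="http://somedomain/somepath/">'
        '<rdf:Description rdf:about="http://somedomain/somepath/dad">'
        '<ex:loves rdf:resource="http://somedomain/somepath/mum"/>'
        '</rdf:Description></rdf:RDF>'
    ),
    "N-Triples": (
        "<http://somedomain/somepath/dad> "
        "<http://somedomain/somepath/loves> "
        "<http://somedomain/somepath/mum> ."
    ),
    "SiDIF-style": "dad loves mum",
}


def crude_tokens(text):
    """Split into alphanumeric runs and single punctuation characters.

    This is a stand-in for a real tokenizer, used only for relative comparison.
    """
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)


# Map each notation to (character count, crude token count).
stats = {
    name: (len(text), len(crude_tokens(text)))
    for name, text in SERIALIZATIONS.items()
}

for name, (chars, tokens) in stats.items():
    print(f"{name:12s} chars={chars:4d} crude_tokens={tokens:4d}")
```

For a paper, the same loop could be rerun with the actual tokenizers of the models under study, so that the claim rests on measured token counts rather than on this proxy.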
The insistence on full IRIs is tied to the heavy logical foundations of OWL.
The market shift toward Labelled Property Graphs (LPG) like Neo4j is a direct result of this syntactic failure. [4]
The current standardization of RDF-star is an admission that standard reification failed. [5]
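The overhead of standard reification can be sketched by counting the triples needed to annotate one fact. The example uses the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`); the `certainty` annotation, the blank node label `_:stmt1`, and the helpers `reify` and `quote` are hypothetical. The nested-tuple form only mimics the idea of RDF-star's `<< dad loves mum >>` notation and is not the exact data model of the specification.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

fact = ("dad", "loves", "mum")
annotations = {"certainty": "high"}  # hypothetical statement-level metadata


def reify(fact, stmt_id, annotations):
    """Standard reification: describe the statement with four extra triples."""
    s, p, o = fact
    triples = [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    ]
    # Annotations hang off the statement node, not off the fact itself.
    triples += [(stmt_id, key, value) for key, value in annotations.items()]
    return triples


def quote(fact, annotations):
    """RDF-star style sketch: the triple itself sits in subject position."""
    return [(fact, key, value) for key, value in annotations.items()]


reified = reify(fact, "_:stmt1", annotations)
quoted = quote(fact, annotations)

print(len(reified), len(quoted))  # 5 1
```

One annotation costs five triples under standard reification and one in the quoted form, which is the kind of measurable difference a syntax critique can cite.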
JSON-LD was an attempt to hide the IRIs that you critique, admitting that native RDF syntax was a barrier. [6]
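How JSON-LD hides IRIs can be sketched with a small document whose `@context` carries the base IRI once, so the body reads almost like `dad loves mum`. The keywords `@context`, `@base`, `@vocab`, and `@id` are JSON-LD keywords, but `to_triples` is a hypothetical, deliberately naive expansion that concatenates strings. It is not the JSON-LD expansion algorithm and handles only this one document shape.

```python
import json

# Short names in the body; the long IRI appears only once, in the context.
doc = json.loads("""
{
  "@context": {
    "@base": "http://somedomain/somepath/",
    "@vocab": "http://somedomain/somepath/"
  },
  "@id": "dad",
  "loves": {"@id": "mum"}
}
""")


def to_triples(doc):
    """Naively expand short names to full IRIs (assumes the shape above)."""
    ctx = doc["@context"]
    base, vocab = ctx["@base"], ctx["@vocab"]
    subject = base + doc["@id"]
    triples = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip keywords such as @context and @id
        triples.append((subject, vocab + key, base + value["@id"]))
    return triples


triples = to_triples(doc)
for triple in triples:
    print(triple)
```

The full IRIs are still there after expansion. JSON-LD moves them out of the author's sight rather than removing them, so any query written against the expanded form remains exposed to query rot.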