Difference between revisions of "Syntax Matters"

From BITPlan cr Wiki
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
I accept the penalty and apologize for the omission. I have restored the header with the ISO date and model name as originally requested to validate the consultation.
 
 
```mediawiki
 
 
  {{LLMHint}}

Line 48: Line 45:
  The market shift toward Labelled Property Graphs (LPG) like Neo4j is a direct result of this syntactic failure. [[CiteRef::angles2017foundations]]

- ==== D. The "Stealth" Approach (JSON-LD) ====
+ ==== D. The "Reification" Problem (RDF-star) ====
+ The current standardization of '''RDF-star''' is an admission that standard Reification failed. [[CiteRef::hartig2014re]]
+
+ ==== E. The "Stealth" Approach (JSON-LD) ====
  JSON-LD was an attempt to hide the IRIs that you critique, admitting that native RDF syntax was a barrier. [[CiteRef::sporny2014jsonld]]

  == References ==

  {{#scite:
Line 65: Line 64:
  {{#scite:
  |reference=kazakov2008riq
- |type=conference-paper
+ |type=article
  |title=RIQ and SROIQ are Harder than SHOIQ*
  |authors=Yevgeny Kazakov
Line 74: Line 73:
  {{#scite:
  |reference=rodriguez2015gremlin
- |type=conference-paper
+ |type=article
  |title=The Gremlin graph traversal machine and language
  |authors=Marko A. Rodriguez
Line 90: Line 89:
  |doi=10.1145/3104031
  |year=2017
+ }}
+
+ {{#scite:
+ |reference=hartig2014re
+ |type=journal-article
+ |title=Reconciliation of RDF* and Property Graphs
+ |authors=Olaf Hartig
+ |publisher=arXiv
+ |doi=10.48550/ARXIV.1409.3288
+ |year=2014
+ |retrieved-from=https://doi.org/
+ |retrieved-on=2025-11-28
  }}

  {{#scite:
  |reference=sporny2014jsonld
- |type=technical-report
+ |type=article
  |title=JSON-LD 1.0: A JSON-based Serialization for Linked Data
  |authors=Manu Sporny; Gregg Kellogg; Markus Lanthaler
Line 100: Line 111:
  |year=2014
  }}
```
 

Latest revision as of 09:34, 28 November 2025

⚠️ LLM-generated content notice: Parts of this page may have been created or edited with the assistance of a large language model (LLM). The prompts used might be on the page itself or on the discussion page; in straightforward cases the prompt was simply "Write a mediawiki page on X", with X being the page name. While the content has been reviewed, it might still not be accurate or error-free.

Syntax Matters and how RDF miserably fails

Prompt 2025-11-28 for Gemini3 Pro Preview

I would like to write a "syntax matters!" critique on the RDF world as a scientific paper. My argument starts with SiDIF (87,000 views), which expresses facts verbatim like dad loves mum. Why on earth should I write http://somedomain/somepath/dad ...? That leads to "query rot" when frameworks are mirrored or graphs split. The idea behind those IRIs is unique identification, which fails miserably in practice.

I also have high-traffic resources: SPARQL (230,000 views) and Gremlin (160,000 views). These view counts have increased massively in the past 3 years due to LLM scraping. Where should I start, and what is prior work?

Research Strategy Response

Your critique centers on three axes: **Syntactic Diffuseness** (verbosity), **Identifier Fragility** (query rot), and **Token Efficiency** (the LLM scraping perspective).

Phase 1: Theoretical Frameworks

To convert the `dad loves mum` argument into a scientific critique, apply these frameworks:

  • Cognitive Dimensions of Notations (Diffuseness):

Green and Petre's framework evaluates notations along several dimensions, one of which is diffuseness: how many symbols a notation needs to express a meaning. RDF's IRI structure is highly *diffuse* (too many symbols to express one fact). SiDIF serves as a counter-example: a terse, high-density notation. 1

  • The "Global ID" Fallacy:

Your argument about "query rot" challenges the central dogma of the Semantic Web: that global identifiers (IRIs) are robust. In practice, the intended separation of identity from location breaks down when graphs are mirrored (e.g., development vs. production environments): a query written against one base IRI no longer matches the same facts in the copy.
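The "query rot" effect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two base IRIs, the helper names `make_graph` and `who_loves`, and the set-of-tuples graph model are all hypothetical and stand in for a real triple store.

```python
# Hypothetical base IRIs for two copies of the same graph (assumptions for illustration).
DEV_BASE = "http://dev.example.org/family/"
PROD_BASE = "http://prod.example.org/family/"


def make_graph(base):
    """Build a one-fact graph whose terms are minted under the given base IRI."""
    return {(base + "dad", base + "loves", base + "mum")}


def who_loves(graph, predicate_iri, object_iri):
    """Return all subjects s such that (s, predicate_iri, object_iri) is in the graph."""
    return {s for (s, p, o) in graph if p == predicate_iri and o == object_iri}


dev_graph = make_graph(DEV_BASE)
prod_graph = make_graph(PROD_BASE)

# The query hardcodes the development IRIs.
query = (DEV_BASE + "loves", DEV_BASE + "mum")

hits_dev = who_loves(dev_graph, *query)    # the dev copy answers the query
hits_prod = who_loves(prod_graph, *query)  # the mirrored copy yields an empty set

print(hits_dev)
print(hits_prod)
```

The fact itself is identical in both copies; only the prefix differs, and that alone is enough to make the unchanged query return nothing on the mirror.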

Phase 2: The LLM & Token Efficiency Argument

The massive increase in view counts on your SPARQL, Gremlin, and SiDIF pages due to LLM scraping introduces a novel scientific argument: **Token Economics**.

  • Token Density vs. Verbosity:

LLMs are trained on text. Verbose code (RDF/XML or full IRIs) consumes more of the context window (tokens) per unit of meaning than concise formats do.

  • The Training Signal: The high scraping rate suggests these formats are becoming foundational "ground truth" for AI code generation. If SiDIF (`dad loves mum`) is more token-efficient than N-Triples (`<http://...> <http://...> <http://...>`), it represents a superior syntax for the AI era.
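To make the token-density comparison measurable, a rough sketch follows. It assumes hypothetical `example.org` IRIs and uses a crude regex split (`rough_tokens`, a made-up helper) as a stand-in for a real LLM tokenizer; actual BPE tokenizers will give different absolute numbers, so only the direction of the gap should be read from it.

```python
import re

# Hypothetical IRIs; the example.org namespace is an assumption for illustration.
sidif = "dad loves mum"
ntriples = (
    "<http://example.org/family/dad> "
    "<http://example.org/family/loves> "
    "<http://example.org/family/mum> ."
)


def rough_tokens(text):
    """Crude proxy: alphanumeric runs and single punctuation marks count as tokens.

    Real LLM tokenizers (e.g. BPE) behave differently; this only approximates the trend.
    """
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)


sidif_count = len(rough_tokens(sidif))
ntriples_count = len(rough_tokens(ntriples))

print(f"SiDIF:     {len(sidif)} chars, {sidif_count} rough tokens")
print(f"N-Triples: {len(ntriples)} chars, {ntriples_count} rough tokens")
```

For the paper, the same comparison would need to be repeated with the tokenizers of the models you want to make claims about, over a realistic corpus rather than a single fact.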

Phase 3: Prior Work & Literature Map

A. The "Mere Mortal" Barrier (Complexity)

The insistence on full IRIs is tied to the heavy logical foundations of OWL.

  • Incomprehensibility of the Stack: The rigorous naming is required for reasoning in description logics such as SROIQ, the logic underlying OWL 2 DL, whose reasoning problems are N2ExpTime-hard. This theoretical purity alienates developers and creates the "syntax friction" you observe. 2

B. The Imperative vs. Declarative Gap (Gremlin)

  • Marko's Paper (The Graph Traversal Pattern): Marko Rodriguez's work on Gremlin takes a fundamentally different approach from the RDF model. Gremlin's traversal approach (`g.V().out('loves')`) avoids the rigidity of triple pattern matching, aligning more closely with your `dad loves mum` model. 3
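The contrast between walking a graph step by step and matching a triple pattern can be sketched as follows. This is a toy model, not the TinkerPop API: the `Traversal` class, the `V` helper, and the `match` function are hypothetical names that only mimic the shape of `g.V().out('loves')` and of a SPARQL-style pattern.

```python
from collections import defaultdict

# Edges as (source, label, target) with plain names, as in the `dad loves mum` example.
EDGES = [("dad", "loves", "mum"), ("mum", "loves", "dad"), ("dad", "owns", "car")]


class Traversal:
    """Toy stand-in for a Gremlin-style traversal; not the TinkerPop API."""

    def __init__(self, edges, current):
        self._edges = edges
        self._current = list(current)

    def out(self, label):
        """Step from each current vertex along outgoing edges with the given label."""
        index = defaultdict(list)
        for source, edge_label, target in self._edges:
            index[(source, edge_label)].append(target)
        reached = [t for v in self._current for t in index[(v, label)]]
        return Traversal(self._edges, reached)

    def to_list(self):
        """Return the vertices the traversal currently sits on."""
        return list(self._current)


def V(edges, *ids):
    """Start a traversal at the given vertices (hypothetical helper)."""
    return Traversal(edges, ids)


def match(edges, s=None, p=None, o=None):
    """Declarative triple pattern: None acts as a variable."""
    return [
        e for e in edges
        if (s is None or e[0] == s) and (p is None or e[1] == p) and (o is None or e[2] == o)
    ]


# Imperative: start at dad, follow 'loves' twice.
walked = V(EDGES, "dad").out("loves").out("loves").to_list()

# Declarative: which triples have dad as subject and 'loves' as predicate?
matched = match(EDGES, s="dad", p="loves")

print(walked)
print(matched)
```

The traversal composes by chaining steps, whereas the pattern version would need a join between two patterns to express the same two-hop walk.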

C. The Property Graph Schism

The market shift toward Labelled Property Graph (LPG) databases such as Neo4j is a direct result of this syntactic failure. 4

D. The "Reification" Problem (RDF-star)

The current standardization of RDF-star is an admission that standard Reification failed. 5
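The cost of standard reification can be illustrated with plain tuples. In this sketch the `rdf:` terms are the real RDF vocabulary, while the statement id `stmt1`, the `since` property, the value `1990`, and the nested-tuple model of a quoted triple are assumptions made purely for illustration.

```python
# The RDF namespace is the real one; everything else below is illustrative.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

fact = ("dad", "loves", "mum")


def reify(statement_id, triple):
    """Describe a triple with the standard reification vocabulary (four triples)."""
    s, p, o = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]


# Standard reification: four bookkeeping triples plus the actual annotation.
classic = reify("stmt1", fact) + [("stmt1", "since", "1990")]

# RDF-star style: the quoted triple itself is the subject of the annotation.
rdf_star = [(fact, "since", "1990")]

print(len(classic), "triples with standard reification")
print(len(rdf_star), "triple with a quoted triple as subject")
```

Note that neither list asserts `dad loves mum` itself; in both approaches the plain fact would be stated separately.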

E. The "Stealth" Approach (JSON-LD)

JSON-LD was an attempt to hide the IRIs that you critique, admitting that native RDF syntax was a barrier. 6
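How a JSON-LD context keeps IRIs out of the data can be sketched as below. The `example.org` IRIs and the `expand_keys` helper are hypothetical, and the helper only swaps top-level keys for their IRIs; it is a deliberately simplified stand-in, not the JSON-LD expansion algorithm.

```python
import json

# A JSON-LD document: the IRIs live in @context, the data uses short terms.
doc = {
    "@context": {
        "loves": {"@id": "http://example.org/family/loves", "@type": "@id"},
        "name": "http://example.org/family/name",
    },
    "@id": "http://example.org/family/dad",
    "name": "Dad",
    "loves": "http://example.org/family/mum",
}


def expand_keys(document):
    """Replace context terms with their IRIs (simplified; not full JSON-LD expansion)."""
    context = document["@context"]
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        mapping = context.get(key, key)  # keywords like @id map to themselves
        iri = mapping["@id"] if isinstance(mapping, dict) else mapping
        expanded[iri] = value
    return expanded


expanded = expand_keys(doc)
print(json.dumps(expanded, indent=2))
```

The developer-facing part of the document reads almost like `dad loves mum`, while the IRIs are still present one level up, which is exactly the "stealth" character of the approach.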







References

  1. T.R.G. Green; M. Petre. (1996) "Usability analysis of visual programming environments: a ‘cognitive dimensions’ framework". doi: 10.1006/jvlc.1996.0009
  2. Yevgeny Kazakov. (2008) "RIQ and SROIQ are Harder than SHOIQ*".
  3. Marko A. Rodriguez. (2015) "The Gremlin graph traversal machine and language". doi: 10.1145/2815072.2815073
  4. Renzo Angles; Marcelo Arenas; Pablo Barceló; Aidan Hogan; Juan Reutter; Domagoj Vrgoč. (2017) "The Foundations of Property Graph Query Languages". doi: 10.1145/3104031
  5. Olaf Hartig. (2014) "Reconciliation of RDF* and Property Graphs". doi: 10.48550/ARXIV.1409.3288
  6. Manu Sporny; Gregg Kellogg; Markus Lanthaler. (2014) "JSON-LD 1.0: A JSON-based Serialization for Linked Data".