Semantify3

From BITPlan cr Wiki

Revision as of 17:52, 29 November 2025

semantify³ - extract knowledge-graph-ready triples from human-readable annotations wherever possible - Syntax matters!

What it is

Inspired by the Syntax_Matters thoughts, this work showcases a straightforward approach intended to be better than state-of-the-art metadata embedding formats such as RDFa and Microformats.

Goal

Computer-readable content is ubiquitous these days. Many different file formats have emerged and proliferated at high speed. Acronyms such as HTML, JSON, XML, CSV, PDF, BIBTEX, DOC(X), XLS(X) and PPT(X) are well known to most computer users and in daily use. Still, a query such as "Give me all PowerPoint files mentioned in my Excel table courses.xlsx, find the keywords in the notes, and then check my BibTeX files for the authors mentioned so I can look them up in dblp, Wikidata and Google Scholar" easily leads to a frustrating amount of effort.

The Semantic Web promise made long ago has not been fulfilled yet. Personally, I believe we are much closer now than in the past decade, when the adoption rate of proposed solutions was subpar. The critical success factor for metadata adoption is Human Readability: if the authors and curators cannot easily write, verify, and maintain the metadata, the system collapses.

Therefore, metadata embedding should not pollute the document object model (DOM) the way RDFa or Microformats do. Instead, it should reside in distinct, clean blocks. We propose using standard Markdown backtick fences to indicate the syntax (e.g., ` ```yaml `), combined with a UTF-8 encoded World Wide Web marker (e.g., 🌐🕸) that explicitly signals to crawlers that the embedded markup is intended for ingestion into a Knowledge Graph - the World Wide Web of Semantic Data.
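A minimal sketch of how a crawler might pick up such blocks, assuming fences start at the beginning of a line with exactly three backticks and that the marker appears somewhere inside the block (as in the examples below, where it sits in a comment line). `find_marked_blocks` and `MARKER` are hypothetical names, and the regular expression is a simplification; a production crawler would rather use a real Markdown parser.

```python
import re

# Hypothetical marker choice: the examples put "🕸" inside the fence;
# "🌐🕸" also contains it, so either form is detected.
MARKER = "\U0001F578"  # 🕸 SPIDER WEB

# Opening ``` with an optional language tag, the body, then a closing ``` line.
FENCE = re.compile(r"^```[ \t]*([\w+-]*)[ \t]*\n(.*?)^```[ \t]*$",
                   re.MULTILINE | re.DOTALL)


def find_marked_blocks(text: str) -> list[tuple[str, str]]:
    """Return (language, body) for every fenced block that carries the marker."""
    blocks = []
    for match in FENCE.finditer(text):
        lang, body = match.group(1), match.group(2)
        if MARKER in body:  # unmarked code blocks are ignored
            blocks.append((lang, body))
    return blocks
```

Each returned `(language, body)` pair can then be handed to a parser chosen by its language tag, e.g. a YAML loader for `yaml`, while ordinary code listings without the marker are left alone.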

Syntax matters!

Below is a comparison of legacy embedding methods versus the proposed clean-code approach using backticks and markers.

Legacy: Microformats

Microformats attempt to use existing HTML `class` attributes to convey meaning. This couples data to design, leading to fragility; if you change the CSS class name for styling, you might accidentally break your data graph.

```html
<!-- Microformats: Fragile and tied to CSS classes -->
<div class="h-product">
  <p>Buy the
    <span class="p-name">Staubsauger XF704</span>
    <img class="u-photo" src="acmeXF704.jpg" alt="" />
  </p>
</div>
```
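To make the fragility concrete, here is a minimal sketch using Python's standard `html.parser`. It reads only the `p-name` text and the `u-photo` source and is not a conforming microformats2 parser; `MicroformatSketch` and `extract_microformat` are hypothetical names.

```python
from html.parser import HTMLParser


class MicroformatSketch(HTMLParser):
    """Read p-name text and u-photo src; a tiny subset of microformats2."""

    def __init__(self) -> None:
        super().__init__()
        self.props = {}
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if "p-name" in classes:  # the data hangs on a styling class
            self._in_name = True
            self.props["name"] = ""
        if "u-photo" in classes and a.get("src"):
            self.props["photo"] = a["src"]

    def handle_data(self, data):
        if self._in_name:
            self.props["name"] += data

    def handle_endtag(self, tag):
        # Simplification: assumes the p-name element contains no nested tags.
        self._in_name = False


def extract_microformat(html: str) -> dict:
    parser = MicroformatSketch()
    parser.feed(html)
    parser.close()
    return parser.props
```

Renaming `class="p-name"` to, say, `class="product-title"` for styling reasons leaves the page looking the same but silently drops the product name from the extracted data.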

Legacy: RDFa

RDFa represents the absolute low point of the Semantic Web. It forces data into the HTML structure using attributes like `vocab`, `typeof` and `property`, creating a verbose, unreadable mess that is difficult for humans to parse visually and impossible to maintain.

```html
<!-- RDFa: Visual clutter. The data is effectively hidden inside tags. -->
<div vocab="http://schema.org/" typeof="Product">
  <p>Buy the
     <span property="name">Staubsauger XF704</span>
     now on special offer!
     <img property="image" src="acmeXF704.jpg" />
  </p>
</div>
```
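The attribute mechanics can be illustrated with a deliberately simplified sketch based on Python's `html.parser`, loosely following RDFa Lite: `vocab` is prefixed to `typeof` and `property` values, `typeof` starts a new blank-node subject, and a `property` takes its object from `src` if present, otherwise from the element's text. Scoping, IRI resolution and most of RDFa Core 1.1 are ignored; `RdfaLiteSketch` and `extract_rdfa` are hypothetical names, and a real RDFa processor should be used in practice.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class RdfaLiteSketch(HTMLParser):
    """Collect triples from vocab/typeof/property attributes (tiny subset)."""

    def __init__(self) -> None:
        super().__init__()
        self.triples = []
        self.vocab = ""
        self.subject = None      # current blank-node subject, e.g. "_:b0"
        self.blank_nodes = 0
        self.pending = None      # property IRI waiting for its text content
        self.text = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("vocab"):
            self.vocab = a["vocab"]  # simplification: never reset on close
        if a.get("typeof"):
            # typeof opens a new blank-node subject of the given type
            self.subject = f"_:b{self.blank_nodes}"
            self.blank_nodes += 1
            self.triples.append((self.subject, RDF_TYPE, self.vocab + a["typeof"]))
        prop = a.get("property")
        if prop and self.subject is not None:
            if a.get("src"):
                # property + src: the object is the (unresolved) src value
                self.triples.append((self.subject, self.vocab + prop, a["src"]))
            else:
                self.pending, self.text = self.vocab + prop, ""

    def handle_data(self, data):
        if self.pending is not None:
            self.text += data

    def handle_endtag(self, tag):
        # Simplification: the first end tag closes the pending text literal.
        if self.pending is not None:
            self.triples.append((self.subject, self.pending, self.text.strip()))
            self.pending = None


def extract_rdfa(html: str) -> list[tuple[str, str, str]]:
    parser = RdfaLiteSketch()
    parser.feed(html)
    parser.close()
    return parser.triples
```

Even in this reduced form, the three statements can only be recovered by walking the tag structure, which is exactly the coupling between data and markup criticized above.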

Proposal: Spider Marker + Backticks (YAML)

We propose using standard Markdown code fences. The Spider Marker (🕸) tells the machine "This is data," and the backticks with their `yaml` language tag tell the editor "This is YAML." The data remains clean and separated from the HTML.

```yaml
# 🕸 Knowledge Graph Block
Products:
  StaubsaugerXF704:
    name: Staubsauger XF704
    image: acmeXF704.jpg
```
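Because the 🕸 line is an ordinary YAML comment, a standard loader such as PyYAML's `yaml.safe_load` ignores it and returns nested dictionaries. The sketch below (the function name `yaml_block_to_triples` is hypothetical) flattens that structure into subject-predicate-object triples. It assumes the two-level layout shown above, borrows the `isA` predicate from the SiDIF example, and uses a naive heuristic that strips a trailing "s" from the top-level key to obtain the class name (`Products` → `Product`).

```python
def yaml_block_to_triples(data: dict) -> list[tuple[str, str, str]]:
    """Flatten {ClassNames: {subject: {predicate: object}}} into triples."""
    triples = []
    for class_key, instances in data.items():
        # Assumption: a plural top-level key such as "Products" names the
        # class "Product"; this singularisation is only a naive heuristic.
        class_name = class_key[:-1] if class_key.endswith("s") else class_key
        for subject, properties in instances.items():
            triples.append((subject, "isA", class_name))
            for predicate, obj in properties.items():
                triples.append((subject, predicate, str(obj)))
    return triples


# The dictionary yaml.safe_load returns for the block above:
loaded = {"Products": {"StaubsaugerXF704": {"name": "Staubsauger XF704",
                                            "image": "acmeXF704.jpg"}}}
print(yaml_block_to_triples(loaded))
```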

Proposal: Spider Marker + Backticks (SiDIF)

SiDIF allows a sentence-like structure that makes the logic immediately apparent to the human reader, while the code block keeps it safely encapsulated.

```sidif
# 🕸 Knowledge Graph Block
StaubsaugerXF704 isA Product
  "Staubsauger XF704" is name of it
  "acmeXF704.jpg" is image of it
```
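A minimal sketch of reading such a block, assuming only the two sentence forms used here (`<Subject> isA <Class>` and `"<value>" is <property> of it`, where "it" refers to the most recent `isA` subject) and treating lines starting with `#` as comments, as the block above does. `parse_sidif_subset` is a hypothetical name, and this is not the full SiDIF grammar.

```python
import re

# Only the two sentence forms used in the example are recognised.
IS_A = re.compile(r'^\s*(\w+)\s+isA\s+(\w+)\s*$')
IS_OF_IT = re.compile(r'^\s*"([^"]*)"\s+is\s+(\w+)\s+of\s+it\s*$')


def parse_sidif_subset(block: str) -> list[tuple[str, str, str]]:
    triples = []
    subject = None  # "it" refers to the most recent isA subject
    for line in block.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments such as the 🕸 marker line
        if m := IS_A.match(line):
            subject = m.group(1)
            triples.append((subject, "isA", m.group(2)))
        elif (m := IS_OF_IT.match(line)) and subject is not None:
            triples.append((subject, m.group(2), m.group(1)))
        else:
            raise ValueError(f"unsupported SiDIF line: {line!r}")
    return triples
```

Applied to the block above, it yields the same three statements the YAML version expresses: `StaubsaugerXF704 isA Product` plus its `name` and `image`.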

index.html example

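The following GNU sed one-liner swaps each line containing the `🌐🕸 Embedded` marker with the line that follows it whenever that next line contains a ` ```yaml ` fence, so the marker ends up inside the code block:

```bash
sed -i '/🌐🕸 Embedded/{N;s/\(.*\)\n\(.*```yaml\)/\2\n\1/}' *.conf
```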