Revision as of 16:55, 29 November 2025

Here is the corrected version of the text with grammar, spelling, and styling fixes applied.

Summary of fixes

* **Typos**: `whereever` → `wherever`, `intented` → `intended`, `embeeding` → `embedding`, `hight` → `high`, `frustating` → `frustrating`, `wher` → `where`, `propsed` → `proposed`.
* **Grammar**: `Compute readable` → `Computer-readable`, `I` capitalized.
* **Formatting**: Added spaces between file formats (e.g., `DOC(X), XLS(X)`). Adjusted hyphenation for "knowledge graph-ready" and "state-of-the-art".

Corrected Text

semantify³ - extract knowledge graph-ready triples from human-readable annotations wherever possible - Syntax matters!

What it is

Inspired by the ideas in Syntax_Matters, this work showcases a straightforward approach that is intended to improve on state-of-the-art metadata embedding formats such as RDFa and Microformats.

Goal

Computer-readable content is ubiquitous these days. Many different file formats have emerged and proliferated rapidly. Acronyms such as HTML, JSON, XML, CSV, PDF, BibTeX, DOC(X), XLS(X), and PPT(X) are well known to most computer users and are in daily use. Still, a query such as "Give me all PowerPoint files mentioned in my Excel table courses.xlsx - find the keywords in the notes and then look up in my BibTeX files which authors are mentioned to look them up in DBLP, Wikidata, and Google Scholar" easily leads to a frustrating amount of effort.

The long-standing promise of the Semantic Web has not been fulfilled yet. Personally, I believe we are much closer now than in the past decade, when the adoption rate of proposed solutions was subpar. The critical success factor for metadata adoption is Human Readability. If the authors/curators cannot easily write, verify, and maintain the metadata, the system collapses.

Therefore, metadata embedding should not pollute the Document Object Model (DOM) as RDFa and Microformats do. Instead, it should reside in distinct, clean blocks. We propose using standard Markdown backticks to indicate the syntax (e.g., ` ```yaml `) combined with a UTF-8 World Wide Web Marker (e.g., 🌐🕸) that explicitly signals to crawlers that the embedded markup is intended for ingestion into a Knowledge Graph - the World Wide Web of Semantic Data.
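
As a rough illustration of how a crawler might pick up such blocks, here is a minimal Python sketch. It is not the semantify³ implementation: the function name `extract_marked_blocks`, the regular expression, and the rule that the marker must appear on the first line inside the fence are all assumptions made for this example.

```python
import re

# Assumption: the spider-web marker (U+1F578) on the first line inside a
# fenced block flags the block as Knowledge Graph data.
MARKER = "\U0001F578"
TICKS = "`" * 3  # built programmatically so this example can sit inside a fence

# An opening fence with a syntax name, a lazily matched body, a closing fence.
FENCE = re.compile(
    rf"^{TICKS}(\w+)[ \t]*\n(.*?)^{TICKS}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_marked_blocks(text):
    """Return (syntax, payload) pairs for fenced blocks that carry the marker."""
    blocks = []
    for match in FENCE.finditer(text):
        syntax, body = match.group(1), match.group(2)
        lines = body.splitlines()
        first_line = lines[0] if lines else ""
        if MARKER in first_line:
            # Drop the marker line itself; the remaining lines are the payload.
            blocks.append((syntax, "\n".join(lines[1:])))
    return blocks
```

Fenced blocks without the marker are ignored, so ordinary code samples in the same document would not be ingested.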

Syntax matters!

Below is a comparison of legacy embedding methods versus the proposed clean-code approach using backticks and markers.

Legacy: Microformats

Microformats attempt to use existing HTML `class` attributes to convey meaning. This couples data to design, leading to fragility; if you change the CSS class name for styling, you might accidentally break your data graph.

<!-- Microformats: Fragile and tied to CSS classes -->
<div class="h-product">
  <p>Kaufen Sie den
    <span class="p-name">Staubsauger XF704</span>
    <img class="u-photo" src="acmeXF704.jpg" alt="" />
  </p>
</div>

Legacy: RDFa

RDFa represents the absolute low point of the Semantic Web. It forces data into the HTML structure using attributes like `vocab` and `property`, creating a verbose, unreadable mess that is difficult for humans to parse visually and impossible to maintain.

<!-- RDFa: Visual clutter. The data is effectively hidden inside tags. -->
<div vocab="http://schema.org/" typeof="Product">
  <p>Kaufen Sie den
     <span property="name">Staubsauger XF704</span>
     jetzt im Sonderangebot!
     <img property="image" src="acmeXF704.jpg" />
  </p>
</div>

Proposal: Spider Marker + Backticks (YAML)

We propose using standard Markdown code fences. The Spider Marker (🕸) tells the machine "This is data," and the backticks tell the editor "This is YAML." The data remains clean and separated from the HTML.

```yaml
# 🕸 Knowledge Graph Block
Products:
  StaubsaugerXF704:
    name: Staubsauger XF704
    image: acmeXF704.jpg
```
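
To show how such a block could end up as triples, the following Python sketch flattens the nested mapping that a YAML parser would typically return for the block above. The function name `mapping_to_triples`, the `isA` predicate, and the convention that the top-level key is used verbatim as the type name are assumptions for illustration, not part of the proposal.

```python
def mapping_to_triples(data):
    """Flatten {type: {subject: {property: value}}} into (s, p, o) triples."""
    triples = []
    for type_name, instances in data.items():
        for subject, properties in instances.items():
            # Assumption: the top-level key is taken verbatim as the type name.
            triples.append((subject, "isA", type_name))
            for predicate, value in properties.items():
                triples.append((subject, predicate, value))
    return triples


# Hand-written stand-in for the parsed YAML block (no YAML library needed here).
parsed = {
    "Products": {
        "StaubsaugerXF704": {
            "name": "Staubsauger XF704",
            "image": "acmeXF704.jpg",
        }
    }
}

for triple in mapping_to_triples(parsed):
    print(triple)
```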

Proposal: Spider Marker + Backticks (SiDIF)

SiDIF allows a sentence-like structure that makes the logic immediately apparent to the human reader while keeping the data safely encapsulated in a code block.

```sidif
# 🕸 Knowledge Graph Block
StaubsaugerXF704 isA Product
  "Staubsauger XF704" is name of it
  "acmeXF704.jpg" is image of it
```
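
The sentence-like structure also keeps parsing simple. The Python sketch below handles only the two line shapes used in the block above (`X isA Type` and `"value" is property of it`). It is a deliberately small subset, not a full SiDIF parser; the function name `sidif_to_triples`, the treatment of `#` lines as comments, and the resolution of `it` to the most recently declared subject are assumptions based on this one example.

```python
import re

# Line shapes covered by this sketch (a small subset of SiDIF):
#   Subject isA Type
#   "literal value" is property of subject-or-it
IS_A = re.compile(r'^(\S+)\s+isA\s+(\S+)$')
IS_OF = re.compile(r'^"([^"]*)"\s+is\s+(\S+)\s+of\s+(\S+)$')


def sidif_to_triples(text):
    """Turn the supported SiDIF lines into (subject, predicate, object) triples."""
    triples = []
    current = None  # most recently declared subject, referenced by "it"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: "#" starts a comment line, as in the block above
        match = IS_A.match(line)
        if match:
            current = match.group(1)
            triples.append((current, "isA", match.group(2)))
            continue
        match = IS_OF.match(line)
        if match:
            value, prop, subject = match.groups()
            if subject == "it":
                if current is None:
                    raise ValueError("'it' used before any subject was declared")
                subject = current
            triples.append((subject, prop, value))
            continue
        raise ValueError(f"unsupported line: {line!r}")
    return triples
```

Raising on unsupported lines, rather than skipping them, keeps the sketch from silently dropping statements it does not understand.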

Example

Let's eat our own dog food!

The repository at https://github.com/BITPlan/semantify3 carries the proposed annotations. You can search for them with `grep -R`.
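
For environments without `grep`, a recursive search for the marker can be sketched in a few lines of Python. The function name `files_with_marker` and the choice to search for the spider-web character (U+1F578) are assumptions for illustration; the repository's actual annotations may use a different marker string.

```python
from pathlib import Path

MARKER = "\U0001F578"  # spider-web marker; assumed search term for this sketch


def files_with_marker(root):
    """Return the files under root whose UTF-8 text contains the marker."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if MARKER in text:
            hits.append(path)
    return hits
```

Run against a local clone, for example `files_with_marker(".")`, it returns the matching paths in sorted order.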

@@ Line 1: / Line 1: @@
-semantify³ - extract knowledge graph ready triples from human readable annotations whereever  possible - Syntax matters!
+Here is the corrected version of the text with grammar, spelling, and styling fixes applied.
+### Summary of fixes
+*   **Typos**: `whereever` → `wherever`, `intented` → `intended`, `embeeding` → `embedding`, `hight` → `high`, `frustating` → `frustrating`, `wher` → `where`, `propsed` → `proposed`.
+*   **Grammar**: `Compute readable` → `Computer-readable`, `I` capitalized.
+*   **Formatting**: Added spaces between file formats (e.g., `DOC(X), XLS(X)`). Adjusted hyphenation for "knowledge graph-ready" and "state-of-the-art".
+### Corrected Text
+semantify³ - extract knowledge graph-ready triples from human-readable annotations wherever possible - Syntax matters!
 = What it is =
-Inspired by  {{Link|target=Syntax_Matters}} thoughts this work wants to showcase a straightforward approach that is intented to be better than state of the
+Inspired by {{Link|target=Syntax_Matters}} thoughts, this work wants to showcase a straightforward approach that is intended to be better than state-of-the-art metadata embedding formats such as [https://en.wikipedia.org/wiki/RDFa RDFa] and [https://en.wikipedia.org/wiki/Microformat Microformat].
-art meta data embeeding formats such as [https://en.wikipedia.org/wiki/RDFa RDFa] and [https://en.wikipedia.org/wiki/Microformat Microformat].
 = Goal =
-Compute readable content is ubiquitous these days. There are many different file formats that have emerged and proliferated with hight speed. Acronyms such
+Computer-readable content is ubiquitous these days. There are many different file formats that have emerged and proliferated with high speed. Acronyms such as HTML, JSON, XML, CSV, PDF, BibTeX, DOC(X), XLS(X), and PPT(X) are well known to most computer users and in daily use.
-as HTML, JSON, XML, CSV, PDF, BIBTEX, DOC(X),XLS(X), PPT(X) are well known to most computer users and in daily use.
+Still, a query such as "Give me all PowerPoint files mentioned in my Excel table courses.xlsx - find the keywords in the notes and then look up in my BibTeX files which authors are mentioned to look them up in DBLP, Wikidata, and Google Scholar" easily leads to a frustrating amount of effort.
-Still a query such as "Give me all powerpoint files mentioned in my excel table courses.xlsx - find the keywords in the notes and then look up in my bibtex files which authors are mentioned to look them up in dblp, wikidata and google scholar" is easily leading to a frustating amount of effort.
-The Semantic Web promise of a long time ago has not been fulfilled yet. Personally i believe we are much closer than in the past decade wher the adoption rate of
+The Semantic Web promise of a long time ago has not been fulfilled yet. Personally, I believe we are much closer than in the past decade where the adoption rate of proposed solutions has been subpar.
-propsed solutions has been sub par.
 The critical success factor for metadata adoption is '''Human Readability'''. If the authors/curators cannot easily write, verify, and maintain the metadata, the system collapses.
-Therefore, metadata embedding should not pollute the document object model (DOM) like RDFa or Microformats. Instead, it should reside in distinct, clean blocks. We propose using standard '''Markdown backticks''' to indicate the syntax (e.g., ` ```yaml `) combined with a '''UTF-8 World-Wide Web Marker''' (e.g., 🌐🕸) to explicitly signal to crawlers that the embedded markup is intended for ingestion into a Knowledge Graph - the World Wide Web of Semantic Data.
+Therefore, metadata embedding should not pollute the Document Object Model (DOM) like RDFa or Microformats. Instead, it should reside in distinct, clean blocks. We propose using standard '''Markdown backticks''' to indicate the syntax (e.g., ` ```yaml `) combined with a '''UTF-8 World Wide Web Marker''' (e.g., 🌐🕸) to explicitly signal to crawlers that the embedded markup is intended for ingestion into a Knowledge Graph - the World Wide Web of Semantic Data.
 === Syntax matters! ===
@@ Line 32: / Line 38: @@
 ==== Legacy: RDFa ====
-RDFa represents the absolute low-point of the Semantic Web. It forces data into the HTML structure using attributes like `vocab` and `property`, creating a verbose, unreadable mess that is difficult for humans to parse visually and impossible to maintain.
+RDFa represents the absolute low point of the Semantic Web. It forces data into the HTML structure using attributes like `vocab` and `property`, creating a verbose, unreadable mess that is difficult for humans to parse visually and impossible to maintain.
 <source lang='html'>
 <!-- RDFa: Visual clutter. The data is effectively hidden inside tags. -->
@@ Line 78: / Line 84: @@
 </syntaxhighlight>
 = Example =
-== Let's eat our own dog food ! ==
+== Let's eat our own dog food! ==
 https://github.com/BITPlan/semantify3
-has the proposed annotations
+has the proposed annotations.
-you can search the with a grep -R
+You can search for them with a `grep -R`.

Semantify3: Difference between revisions
