Difference between revisions of "Semantify3"
Revision as of 17:55, 29 November 2025
semantify³ - extract knowledge graph-ready triples from human-readable annotations wherever possible - Syntax matters!
What it is
Inspired by the ideas in Syntax_Matters, this work showcases a straightforward approach that is intended to improve on state-of-the-art metadata embedding formats such as RDFa and Microformats.
Goal
Computer-readable content is ubiquitous these days. Many different file formats have emerged and proliferated rapidly. Acronyms such as HTML, JSON, XML, CSV, PDF, BibTeX, DOC(X), XLS(X), and PPT(X) are well known to most computer users and in daily use. Still, a query such as "Give me all PowerPoint files mentioned in my Excel table courses.xlsx, find the keywords in their notes, then check my BibTeX files for the authors mentioned and look them up in DBLP, Wikidata, and Google Scholar" easily leads to a frustrating amount of effort.
The long-standing promise of the Semantic Web has not yet been fulfilled. Personally, I believe we are much closer now than in the past decade, when the adoption rate of proposed solutions was subpar. The critical success factor for metadata adoption is Human Readability. If the authors/curators cannot easily write, verify, and maintain the metadata, the system collapses.
Therefore, metadata embedding should not pollute the Document Object Model (DOM) the way RDFa and Microformats do. Instead, it should reside in distinct, clean blocks. We propose using standard Markdown backticks to indicate the syntax (e.g., ` ```yaml `) combined with a UTF-8 World Wide Web Marker (e.g., 🌐🕸) to explicitly signal to crawlers that the embedded markup is intended for ingestion into a Knowledge Graph - the World Wide Web of Semantic Data.
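To make the idea concrete, here is a minimal Python sketch of how a crawler might pick out such blocks. It is not the semantify³ implementation. It assumes that a block counts as knowledge graph data when one of the marker characters appears anywhere inside a backtick fence; the names `MARKERS`, `FENCE`, and `extract_marked_blocks` are made up for this illustration.

```python
import re

# Assumption: either marker character anywhere inside the fence flags the block.
MARKERS = ("🌐", "🕸")

# A fence opens with three backticks plus a language name at the start of a line
# and closes with three backticks on a line of their own.
FENCE = re.compile(r"^`{3}(\w+)[ \t]*\n(.*?)^`{3}[ \t]*$", re.DOTALL | re.MULTILINE)


def extract_marked_blocks(text):
    """Return (language, body) pairs for fenced blocks that carry a marker."""
    blocks = []
    for match in FENCE.finditer(text):
        language, body = match.group(1), match.group(2)
        if any(marker in body for marker in MARKERS):
            blocks.append((language, body))
    return blocks


# Build a small sample document; TICKS avoids writing a literal fence here.
TICKS = "`" * 3
sample = "\n".join([
    "Some prose.",
    TICKS + "yaml",
    "# 🕸 Knowledge Graph Block",
    "name: Staubsauger XF704",
    TICKS,
    TICKS + "python",
    "print('ordinary code, no marker')",
    TICKS,
])

for language, body in extract_marked_blocks(sample):
    print(language)
    print(body)
```

Only the first block is reported, because the second fence carries no marker. Ordinary code samples in the same document are therefore left alone.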
Syntax matters!
Below is a comparison of legacy embedding methods versus the proposed clean-code approach using backticks and markers.
Legacy: Microformats
Microformats attempt to use existing HTML `class` attributes to convey meaning. This couples data to design, leading to fragility; if you change the CSS class name for styling, you might accidentally break your data graph.
<!-- Microformats: Fragile and tied to CSS classes -->
<div class="h-product">
  <p>Kaufen Sie den
    <span class="p-name">Staubsauger XF704</span>
    <img class="u-photo" src="acmeXF704.jpg" alt="" />
  </p>
</div>
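The fragility described above can be shown in a few lines of Python. This is a minimal sketch using only the standard library `html.parser`, not a full Microformats parser. The helper names and the replacement class `product-title` are hypothetical, and the extractor only handles flat elements such as the `span` in the example.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of (non-nested) elements that carry a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.capturing = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.capturing = True
            self.values.append("")

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture.
        self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.values[-1] += data


def extract_by_class(html, class_name):
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


original = (
    '<div class="h-product"><p>Kaufen Sie den '
    '<span class="p-name">Staubsauger XF704</span></p></div>'
)
# A purely cosmetic change: a designer renames the class for styling reasons.
restyled = original.replace('class="p-name"', 'class="product-title"')

print(extract_by_class(original, "p-name"))  # the product name is found
print(extract_by_class(restyled, "p-name"))  # the data has silently vanished
```

The second call returns an empty list: nothing in the visible page changed, yet the data is gone.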
Legacy: RDFa
RDFa represents the absolute low point of the Semantic Web. It forces data into the HTML structure using attributes like `vocab` and `property`, creating a verbose, unreadable mess that is difficult for humans to parse visually and impossible to maintain.
<!-- RDFa: Visual clutter. The data is effectively hidden inside tags. -->
<div vocab="http://schema.org/" typeof="Product">
  <p>Kaufen Sie den
    <span property="name">Staubsauger XF704</span>
    jetzt im Sonderangebot!
    <img property="image" src="acmeXF704.jpg" />
  </p>
</div>
Proposal: Spider Marker + Backticks (YAML)
We propose using standard Markdown code fences. The Spider Marker (🕸) tells the machine "This is data," and the backticks tell the editor "This is YAML." The data remains clean and separated from the HTML.
```yaml
# 🕸 Knowledge Graph Block
Products:
  StaubsaugerXF704:
    name: Staubsauger XF704
    image: acmeXF704.jpg
```
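One possible way to turn such a block into triples is sketched below in Python. The mapping is an assumption made for illustration and is not specified on this page: the top-level key is treated as a type, the second level as the subject, and the innermost keys as properties. The dictionary `parsed` stands in for the output of whatever YAML parser is used, so the sketch needs no third-party library.

```python
def yaml_block_to_triples(data):
    """Flatten {type: {subject: {property: value}}} into (s, p, o) triples."""
    triples = []
    for type_name, subjects in data.items():
        for subject, properties in subjects.items():
            # Assumed convention: the enclosing key names the subject's type.
            triples.append((subject, "isA", type_name))
            for prop, value in properties.items():
                triples.append((subject, prop, value))
    return triples


# Hand-written equivalent of the parsed YAML block above.
parsed = {
    "Products": {
        "StaubsaugerXF704": {
            "name": "Staubsauger XF704",
            "image": "acmeXF704.jpg",
        }
    }
}

for triple in yaml_block_to_triples(parsed):
    print(triple)
```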
Proposal: Spider Marker + Backticks (SiDIF)
SiDIF allows for sentence-like structure, making the logic immediately apparent to the human reader, encapsulated safely in a code block.
```sidif
# 🕸 Knowledge Graph Block
StaubsaugerXF704 isA Product
"Staubsauger XF704" is name of it
"acmeXF704.jpg" is image of it
```
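Because the sentences are so regular, reading them into triples takes little code. The following Python sketch covers only the two sentence shapes used in the example above (`X isA Y` and `"value" is property of subject`), with `it` referring to the most recently declared subject. It is not a full SiDIF parser, and the function name and regular expressions are assumptions made for this illustration.

```python
import re

IS_A = re.compile(r'^(\w+)\s+isA\s+(\w+)$')
IS_OF = re.compile(r'^"([^"]*)"\s+is\s+(\w+)\s+of\s+(\w+)$')


def sidif_to_triples(text):
    """Parse the two sentence shapes from the example into (s, p, o) triples."""
    triples = []
    current = None  # the subject that "it" refers to
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the marker comment
        match = IS_A.match(line)
        if match:
            current = match.group(1)
            triples.append((current, "isA", match.group(2)))
            continue
        match = IS_OF.match(line)
        if match:
            value, prop, subject = match.groups()
            if subject == "it":
                subject = current
            triples.append((subject, prop, value))
            continue
        raise ValueError(f"unsupported line: {line!r}")
    return triples


block = "\n".join([
    "# 🕸 Knowledge Graph Block",
    "StaubsaugerXF704 isA Product",
    '"Staubsauger XF704" is name of it',
    '"acmeXF704.jpg" is image of it',
])

for triple in sidif_to_triples(block):
    print(triple)
```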
Example
Let's eat our own dog food!
https://github.com/BITPlan/semantify3 has the proposed annotations. You can search for them with `grep -R`.
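For readers who prefer a script to `grep`, the same search can be sketched in Python. This assumes the annotations are identified by the spider marker 🕸 and that files are UTF-8 encoded; the function name `files_with_marker` is hypothetical.

```python
from pathlib import Path

MARKER = "🕸"  # assumption: the annotations carry this marker character


def files_with_marker(root, marker=MARKER):
    """Return the files below root whose text contains the marker."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if marker in text:
            hits.append(path)
    return hits
```

Calling `files_with_marker(".")` inside a checkout of the repository would list the files that contain a marker, which can then be inspected by hand.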