The initial generation of this page was guided by the following prompt: “Create a MediaWiki table and section for the meta queries describing our "eating our own dog food" approach. Context: We have 15 meta queries that analyze snapquery's own query execution data. Currently these use SQL on JSON/YAML data for development agility. The key point: the same query names and contracts will work when we migrate to SPARQL/RDF format (as proposed by SIB project via Jerven Bolleman's issue #59). This proves our technology-agnostic approach - snapquery will use its own query-set tool to convert from current JSON/YAML to SPARQL/RDF without breaking client code. Input: YAML file with meta queries (query_count, query_success, query_failures_by_category, etc.) Output needed: - MediaWiki section explaining the self-hosting validation concept - Table listing query names and purposes (NOT the SQL code) - Explain future migration path to SPARQL/RDF”
Snapquery demonstrates its core principle of technology-agnostic query management by applying it to itself - a practice known as "eating your own dog food." We maintain 15 meta queries that analyze snapquery's own query execution data, currently implemented using SQL queries over JSON/YAML storage for development agility.
The crucial validation: these same named queries with identical contracts will seamlessly migrate to SPARQL queries over RDF data when we transition snapquery's internal storage to a knowledge graph. We will use snapquery's own query-set tool to perform this migration, converting the current JSON/YAML format to SPARQL/RDF following the format proposed by the SIB project via Jerven Bolleman's issue #59.
This self-hosting approach proves that a named query's contract (its name, parameters, and result shape) is independent of the implementation technology. The current meta queries are:
| Query Name | Purpose |
|---|---|
| query_count | Count total number of named queries in the system |
| query_success | Analyze successful query executions grouped by endpoint |
| query_failures_by_category | Break down query failures by error category, domain, and namespace |
| query_failures_by_category_grouped | Aggregate failure patterns showing which error categories affect which domains and namespaces |
| query_failures_by_category_grouped_counted | Advanced failure analysis with ranked counts across domains, namespaces, and endpoints per error category |
| query_failures_by_database_count | Count query failures grouped by underlying database technology |
| query_success_by_namespace | Analyze success counts organized by namespace and endpoint |
| query_namespace_endpoint_matrix_with_distinct | Comprehensive matrix showing total, successful, and failed query counts for each namespace-endpoint combination |
| query_stats | Calculate execution statistics including duration and result set sizes per query |
| params_stats | Analyze distribution of parameter usage patterns across queries |
| query_details_stats | Compute overall statistics on query complexity (parameter counts, line counts, sizes) |
| domain_namespace_stats | Count queries grouped by domain and namespace |
| all_endpoints | List all registered endpoints with metadata |
| error_histogram | Generate frequency distribution of errors per query to identify most problematic queries |
| scholia_jinja_for_loops | Analyze Jinja template usage patterns in Scholia queries |
The meta queries currently execute as SQL queries on snapquery's JSON/YAML data model. When we migrate to RDF storage with full SPARQL endpoint capabilities, the query names and contracts remain unchanged (query_count, query_failures_by_category_grouped_counted, etc.). For example, the query_failures_by_category_grouped_counted meta query currently uses SQL window functions and aggregations. After migration, it will use equivalent SPARQL aggregation patterns and GROUP_CONCAT operations, but applications calling this named query won't notice the difference.
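The migration mechanics can be sketched in a few lines. This is a minimal illustration, not snapquery's actual API; the NAMED_QUERIES registry, the run_named_query function, and the SPARQL body shown are all hypothetical stand-ins:

```python
import sqlite3

# Hypothetical sketch: a named query keeps its name and result contract
# while its implementation can be swapped from SQL to SPARQL.
NAMED_QUERIES = {
    "query_count": {
        "sql": "SELECT COUNT(*) AS count FROM NamedQuery",
        # After migration the same name would map to a SPARQL body, e.g.:
        "sparql": "SELECT (COUNT(?q) AS ?count) WHERE { ?q a :NamedQuery }",
    },
}

def run_named_query(name: str, backend: str = "sql") -> dict:
    """Resolve a query by name; callers never see the query text."""
    body = NAMED_QUERIES[name][backend]
    if backend == "sql":
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE NamedQuery(query_id TEXT)")
        con.executemany("INSERT INTO NamedQuery VALUES (?)",
                        [("q1",), ("q2",), ("q3",)])
        (count,) = con.execute(body).fetchone()
        return {"count": count}
    # A SPARQL backend would POST `body` to the future endpoint and
    # reshape the bindings into the same caller-visible result.
    raise NotImplementedError(backend)

print(run_named_query("query_count"))  # {'count': 3}
```

The point is that a client calling run_named_query("query_count") is unaffected when the backend flips from "sql" to "sparql"; only the registry entry changes.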
This approach validates snapquery's architecture in the most practical way possible: we build snapquery using the same principles we advocate for SPARQL query management. By starting with SQL/JSON/YAML for rapid iteration while maintaining the architectural vision for full RDF/SPARQL support, we ensure that applications stay insulated from changes in the underlying storage technology.
When we complete the migration to SPARQL/RDF, these 15 meta queries will continue functioning without modification - the ultimate proof that query abstraction isolates applications from infrastructure evolution.
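A migration of this kind can be guarded by a simple contract test: each named query must return the same column names before and after the switch. Everything below is an illustrative sketch with stand-in runners, not snapquery's real execution paths:

```python
# Illustrative contract test: a named query must expose the same
# result columns regardless of which backend executed it.
def result_columns(rows):
    """Column names of a result set given as a list of dicts."""
    return sorted(rows[0].keys()) if rows else []

def check_contract(name, run_old, run_new):
    old, new = run_old(name), run_new(name)
    assert result_columns(old) == result_columns(new), name

# Stand-in runners; the real ones would execute SQL and SPARQL.
def run_sql(name):
    return [{"count": 42}]

def run_sparql(name):
    return [{"count": 42}]

for name in ["query_count", "query_success", "error_histogram"]:
    check_contract(name, run_sql, run_sparql)
print("all contracts match")
```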
We “eat our own dog food” by managing operational meta queries with the very same query-set tool, registry, and pipelines used for end-user queries. We started with agile JSON/YAML declarations and a relational backend, while intentionally defining a stable query contract that can be lifted to a full SPARQL/RDF stack without changing how users or dashboards interact with results.
The registered meta queries and their purposes:
| Name | Purpose |
|---|---|
| query_count | Total number of registered named queries. |
| query_success | Count of successful executions per endpoint. |
| query_failures_by_category | Failures grouped by domain, namespace, and error category. |
| query_failures_by_category_grouped | Aggregated failures per error category with grouped domains and namespaces. |
| query_failures_by_category_grouped_counted | Failure rollups per category with per-domain/namespace/endpoint tallies. |
| query_failures_by_database_count | Failure counts grouped by backend database type. |
| query_success_by_namespace | Successful runs per namespace and endpoint. |
| query_namespace_endpoint_matrix_with_distinct | Matrix of distinct successful/failed queries per domain–namespace–endpoint, including totals and sums. |
| all_queries | Inventory of all named queries for introspection and cataloging. |
| error_histogram | Per-query error frequency to identify hotspots. |
| query_stats | Per-query execution summaries (durations and records) for performance tracking. |
| params_stats | Frequency of parameter signatures to understand usage patterns. |
| query_details_stats | Distribution of query complexity metrics (parameters, lines, size). |
| domain_namespace_stats | Query counts per domain and namespace (filterable). |
| all_endpoints | Catalog of available endpoints and their characteristics. |
| scholia_jinja_for_loops | Analysis of template for-loop usage patterns in query text. |
A follow-up prompt requested a LaTeX adaptation: “create a latex table and section for the meta queries and explain how we eat our own dog food”
Snapquery demonstrates its future-proof design by "eating its own dog food"—applying its principles to manage and analyze query metadata while starting with agile, lightweight technologies. To enable rapid development and iteration, we initially implemented metadata storage using JSON/YAML files and an internal SQLite database, with SQL-based meta-queries for analysis. This choice prioritizes agility and ease of prototyping over immediate full RDF compliance. However, snapquery is designed to be technology-agnostic, allowing seamless migration to a complete SPARQL endpoint and RDF capabilities in the future. Using snapquery itself, we can refactor and migrate these meta-queries and data structures to RDF without disrupting existing functionality, ensuring long-term scalability and standards compliance. Specifically, our query-set tool supports converting current JSON and YAML inputs to SPARQL/RDF formats as proposed by the SIB project (via Jerven Bolleman's issue #59), maintaining the same query contract across technologies.
These meta-queries operate on query details, execution statistics, and endpoint information, enabling systematic analysis of query performance, failures, and characteristics across namespaces, domains, and endpoints. This self-referential approach showcases snapquery's extensibility: while currently using SQL for meta-analysis, it can uniformly handle SPARQL or other query languages, supporting proactive monitoring and optimization of query sets. The meta-queries are stored in metaqueries.yaml and can be executed via snapquery's CLI or API.
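To illustrate the direction of that conversion, a named-query record could be lifted to RDF triples roughly as below. This is not the SIB/issue #59 format; the predicates and base IRI are made up for illustration:

```python
import json

def to_ntriples(rec: dict, base: str = "https://example.org/") -> str:
    """Emit a named-query record as N-Triples (illustrative vocabulary)."""
    subj = f"<{base}query/{rec['name']}>"
    triples = []
    for key in ("namespace", "body"):
        # JSON string escaping is close enough to N-Triples literal
        # escaping for this sketch.
        literal = json.dumps(rec[key])
        triples.append(f"{subj} <{base}prop/{key}> {literal} .")
    return "\n".join(triples)

record = {
    "name": "query_count",
    "namespace": "meta",
    "body": "SELECT COUNT(*) FROM NamedQuery",
}
print(to_ntriples(record))
```

Once lifted, the same record is queryable with SPARQL instead of SQL, while the query name ("query_count") and its contract stay fixed.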
This page serves as the primary MediaWiki documentation for snapquery's meta-queries. It was initially generated using an LLM prompt emphasizing snapquery's future-proof migration strategy. A LaTeX adaptation of this content is available for paper submissions, but this wiki version includes the full meta-query YAML for collaborative editing (see below or the source repository).
The generation of this section was guided by the following prompt: “are you completly nuts? my input was [full YAML content of meta-queries] and i asked for a table and section describing our own eaty-our own dog food style. To document this we need a mediawiki version first. The table will of course not disclose the code for how the query works just the puppose. That shows off that we will later be able to use the same query contract for sparql / rdf. our query-set tool allows us to conver the current json and yaml input to SPARQL/RDF in the format proposed by the SIB project via Jeven Bollemans issue #59 You must add the prompt”
| Query Name | Purpose |
|---|---|
| query_count | Counts the total number of named queries in the system. |
| query_success | Aggregates successful query executions per endpoint, ordered by count. |
| query_failures_by_category | Groups query failures by error category, domain, and namespace for detailed failure analysis. |
| query_failures_by_category_grouped | Aggregates failures by category, collecting associated domains and namespaces. |
| query_failures_by_category_grouped_counted | Provides detailed aggregated counts of failures per category, including breakdowns by domain, namespace, and endpoint. |
| query_failures_by_database_count | Counts failures grouped by the underlying database type of the endpoints. |
| query_success_by_namespace | Aggregates successful queries per namespace and endpoint, useful for namespace performance comparison. |
| query_namespace_endpoint_matrix_with_distinct | Creates a matrix view of success and failure statistics per domain, namespace, and endpoint, including distinct counts. |
| all_queries | Retrieves all named queries for comprehensive listing. |
| error_histogram | Generates a histogram of error occurrences per query ID. |
| query_stats | Computes execution statistics (e.g., duration and record counts) per query ID. |
| params_stats | Analyzes the distribution of query parameters across details. |
| query_details_stats | Calculates aggregate statistics on query properties like parameter counts, line lengths, and sizes. |
| domain_namespace_stats | Counts queries per domain and namespace, with wildcard support for filtering. |
| all_endpoints | Lists all configured endpoints with their key attributes. |
| scholia_jinja_for_loops | Analyzes the usage of Jinja for-loops in Scholia-related queries for pattern identification. |
For collaborative editing and reference, the full YAML content is provided here (note: this includes SQL details for internal use; public documentation focuses on purposes only):
# Meta Queries
# WF 2025-05-06
'query_count':
sql: |
SELECT COUNT(*)
FROM NamedQuery
'query_success':
sql: |
SELECT
COUNT(*) AS count,
endpoint_name
FROM "QueryStats"
WHERE records>0
GROUP BY endpoint_name
ORDER BY 1 DESC
'query_failures_by_category':
sql: |
SELECT
count(*) as count,
nq.domain,
nq.namespace,
error_category
FROM QueryStats qs
JOIN NamedQuery nq
ON qs.query_id=nq.query_id
WHERE error_category IS NOT NULL
GROUP BY error_category,nq.namespace,nq.domain
ORDER BY 1 DESC
'query_failures_by_category_grouped':
sql: |
SELECT
count(*) AS count,
GROUP_CONCAT(DISTINCT nq.domain) AS domains,
GROUP_CONCAT(DISTINCT nq.namespace) AS namespaces,
error_category
FROM QueryStats qs
JOIN NamedQuery nq ON qs.query_id = nq.query_id
WHERE error_category IS NOT NULL
GROUP BY error_category
ORDER BY count DESC;
'query_failures_by_category_grouped_counted':
sql: |
SELECT
error_category,
SUM(entry_count) AS total_count,
GROUP_CONCAT(DISTINCT domain_counts ORDER BY domain_count DESC) AS domain_counts,
GROUP_CONCAT(DISTINCT namespace_counts ORDER BY namespace_count DESC) AS namespace_counts,
GROUP_CONCAT(DISTINCT endpoint_counts ORDER BY endpoint_count DESC) AS endpoint_counts
FROM (
SELECT
error_category,
domain,
namespace,
endpoint_name,
COUNT(*) AS entry_count,
domain || ' (' || SUM(COUNT(*)) OVER (PARTITION BY error_category, domain) || ')' AS domain_counts,
namespace || ' (' || SUM(COUNT(*)) OVER (PARTITION BY error_category, namespace) || ')' AS namespace_counts,
endpoint_name || ' (' || SUM(COUNT(*)) OVER (PARTITION BY error_category, endpoint_name) || ')' AS endpoint_counts,
SUM(COUNT(*)) OVER (PARTITION BY error_category, domain) AS domain_count,
SUM(COUNT(*)) OVER (PARTITION BY error_category, namespace) AS namespace_count,
SUM(COUNT(*)) OVER (PARTITION BY error_category, endpoint_name) AS endpoint_count
FROM QueryStats qs
JOIN NamedQuery nq ON qs.query_id = nq.query_id
WHERE error_category IS NOT NULL
GROUP BY error_category, domain, namespace, endpoint_name
) sub
GROUP BY error_category
ORDER BY total_count DESC;
'query_failures_by_database_count':
sql: |
SELECT
COUNT(*) AS count,
ep.database
FROM QueryStats qs
JOIN Endpoint ep
ON qs.endpoint_name=ep.name
WHERE qs.error_msg IS NOT NULL
    GROUP BY ep.database
    ORDER BY 1 DESC
'query_success_by_namespace':
sql: |
SELECT
COUNT(*) AS count,
nq.namespace,
qs.endpoint_name
FROM QueryStats qs
JOIN NamedQuery nq on qs.query_id=nq.query_id
WHERE records>0
GROUP BY endpoint_name,namespace
ORDER BY 2 ASC,1 DESC
# This query calculates statistics for named queries across
# different domains, namespaces and endpoints.
# It joins NamedQuery with a subquery from QueryStats
# to get both success and failure counts.
# Another subquery is used to get the total count of
# distinct queries per namespace.
#
# The results show:
# - domain
# - namespace
# - endpoint name (or 'No Endpoint' if null)
# - total count of distinct queries in the namespace
# - count of distinct successful queries
# - count of distinct failed queries
# - sum of success counts
# - sum of failure counts
#
# Note: A query is considered failed if it has a non-null error_msg.
# The records count is used for successful queries.
'query_namespace_endpoint_matrix_with_distinct':
sql: |
SELECT
nq.domain,
nq.namespace,
COALESCE(qs.endpoint_name, 'No Endpoint') AS endpoint_name,
total.total_count AS total,
COUNT(DISTINCT (CASE WHEN qs.success_count > 0 THEN qs.query_id END)) AS distinct_successful,
COUNT(DISTINCT (CASE WHEN qs.failure_count > 0 THEN qs.query_id END)) AS distinct_failed,
SUM(qs.success_count) AS success_count,
SUM(qs.failure_count) AS failure_count
FROM NamedQuery nq
LEFT JOIN (
SELECT
query_id,
endpoint_name,
context,
SUM(CASE WHEN error_msg IS NULL AND records > 0 THEN 1 ELSE 0 END) AS success_count,
SUM(CASE WHEN error_msg IS NOT NULL THEN 1 ELSE 0 END) AS failure_count
FROM QueryStats
GROUP BY query_id, endpoint_name, context
) qs ON nq.query_id = qs.query_id
LEFT JOIN (
SELECT domain, namespace, COUNT(DISTINCT(query_id)) AS total_count
FROM NamedQuery
GROUP BY domain, namespace
) total ON nq.domain = total.domain AND nq.namespace = total.namespace
GROUP BY nq.domain, nq.namespace, qs.endpoint_name, total.total_count
'all_queries':
sql: |
SELECT * FROM NamedQuery
'error_histogram':
sql: |
SELECT COUNT(*), query_id
FROM QueryStats
WHERE error_msg is not null
GROUP BY query_id
ORDER BY 1 DESC
'query_stats':
sql: |
SELECT query_id,
COUNT(duration) AS count,
MIN(duration) AS min_time,
MAX(duration) AS max_time,
AVG(duration) AS avg_time,
MIN(records) AS min,
MAX(records) AS max,
AVG(records) AS avg
FROM QueryStats
GROUP by query_id
ORDER BY 1 DESC;
'params_stats':
sql: |
SELECT count(*),
params
FROM "QueryDetails"
GROUP BY params
ORDER BY 1 desc
'query_details_stats':
sql: |
SELECT
MIN(param_count) AS min_param_count,
MAX(param_count) AS max_param_count,
AVG(param_count) AS avg_param_count,
MIN(lines) AS min_lines,
MAX(lines) AS max_lines,
AVG(lines) AS avg_lines,
MIN(size) AS min_size,
MAX(size) AS max_size,
AVG(size) AS avg_size
FROM
QueryDetails;
'domain_namespace_stats':
sql: |
SELECT
domain,
namespace,
COUNT(*) AS query_count
FROM NamedQuery
WHERE domain LIKE ? AND namespace LIKE ?
GROUP BY domain, namespace
ORDER BY 3 DESC
'all_endpoints':
sql: |
SELECT
name,
endpoint,
lang,
website,
database,
method
FROM Endpoint
'scholia_jinja_for_loops':
sql: |
SELECT
count(*),
substr(
sparql,
instr(sparql, '{% for') + length('{% for'), -- Start position right after "{% for"
instr(substr(sparql, instr(sparql, '{% for')), '%}') - length('{% for') -- Length of substring
) as for_loop_content
FROM
NamedQuery
WHERE
sparql LIKE '%{% for%' ESCAPE '\' and for_loop_content like "%in%"
group by for_loop_content
order by 1 desc
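As a sanity check, the simpler meta queries above can be executed directly against a minimal in-memory SQLite database; the reduced schema below is a stand-in for snapquery's real NamedQuery and QueryStats tables:

```python
import sqlite3

# Two meta-query bodies copied from metaqueries.yaml above.
QUERY_COUNT = "SELECT COUNT(*) FROM NamedQuery"
ERROR_HISTOGRAM = """
SELECT COUNT(*), query_id
FROM QueryStats
WHERE error_msg is not null
GROUP BY query_id
ORDER BY 1 DESC
"""

con = sqlite3.connect(":memory:")
# Reduced stand-in schema; the real tables carry more columns.
con.execute("CREATE TABLE NamedQuery(query_id TEXT, domain TEXT, namespace TEXT)")
con.execute("CREATE TABLE QueryStats(query_id TEXT, endpoint_name TEXT,"
            " records INTEGER, error_msg TEXT)")
con.executemany("INSERT INTO NamedQuery(query_id) VALUES (?)", [("a",), ("b",)])
con.executemany(
    "INSERT INTO QueryStats(query_id, records, error_msg) VALUES (?, ?, ?)",
    [("a", 0, "timeout"), ("a", 0, "timeout"), ("b", 10, None)],
)
print(con.execute(QUERY_COUNT).fetchone())      # (2,)
print(con.execute(ERROR_HISTOGRAM).fetchall())  # [(2, 'a')]
```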