The initial generation of this page was guided by the following prompt: “Create a MediaWiki table and section for the meta queries describing our "eating our own dog food" approach. Context: We have 15 meta queries that analyze snapquery's own query execution data. Currently these use SQL on JSON/YAML data for development agility. The key point: the same query names and contracts will work when we migrate to SPARQL/RDF format (as proposed by SIB project via Jerven Bolleman's issue #59). This proves our technology-agnostic approach - snapquery will use its own query-set tool to convert from current JSON/YAML to SPARQL/RDF without breaking client code. Input: YAML file with meta queries (query_count, query_success, query_failures_by_category, etc.) Output needed: - MediaWiki section explaining the self-hosting validation concept - Table listing query names and purposes (NOT the SQL code) - Explain future migration path to SPARQL/RDF”
Snapquery demonstrates its core principle of technology-agnostic query management by applying it to itself - a practice known as "eating your own dog food." We maintain 15 meta queries that analyze snapquery's own query execution data, currently implemented using SQL queries over JSON/YAML storage for development agility.
The crucial validation: these same named queries with identical contracts will seamlessly migrate to SPARQL queries over RDF data when we transition snapquery's internal storage to a knowledge graph. We will use snapquery's own query-set tool to perform this migration, converting the current JSON/YAML format to SPARQL/RDF following the format proposed by the SIB project via Jerven Bolleman's issue #59.
This self-hosting approach proves that a named query's contract (its name, parameters, and result shape) is independent of the implementation technology. The current meta queries are:
| Query Name | Purpose |
|---|---|
| query_count | Count total number of named queries in the system |
| query_success | Analyze successful query executions grouped by endpoint |
| query_failures_by_category | Break down query failures by error category, domain, and namespace |
| query_failures_by_category_grouped | Aggregate failure patterns showing which error categories affect which domains and namespaces |
| query_failures_by_category_grouped_counted | Advanced failure analysis with ranked counts across domains, namespaces, and endpoints per error category |
| query_failures_by_database_count | Count query failures grouped by underlying database technology |
| query_success_by_namespace | Analyze success counts organized by namespace and endpoint |
| query_namespace_endpoint_matrix_with_distinct | Comprehensive matrix showing total, successful, and failed query counts for each namespace-endpoint combination |
| query_stats | Calculate execution statistics including duration and result set sizes per query |
| params_stats | Analyze distribution of parameter usage patterns across queries |
| query_details_stats | Compute overall statistics on query complexity (parameter counts, line counts, sizes) |
| domain_namespace_stats | Count queries grouped by domain and namespace |
| all_endpoints | List all registered endpoints with metadata |
| error_histogram | Generate frequency distribution of errors per query to identify most problematic queries |
| scholia_jinja_for_loops | Analyze Jinja template usage patterns in Scholia queries |
The meta queries currently execute as SQL queries on snapquery's JSON/YAML data model. When we migrate to RDF storage with full SPARQL endpoint capabilities, the query names and contracts remain unchanged (query_count, query_failures_by_category_grouped_counted, etc.). For example, the query_failures_by_category_grouped_counted meta query currently uses SQL window functions and aggregations. After migration, it will use equivalent SPARQL aggregation patterns and GROUP_CONCAT operations, but applications calling this named query won't notice the difference.
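The migration mechanics can be sketched in a few lines. This is a minimal illustration, not snapquery's actual API; the NAMED_QUERIES registry, the run_named_query function, and the SPARQL body shown are all hypothetical stand-ins:

```python
import sqlite3

# Hypothetical sketch: a named query keeps its name and result contract
# while its implementation can be swapped from SQL to SPARQL.
NAMED_QUERIES = {
    "query_count": {
        "sql": "SELECT COUNT(*) AS count FROM NamedQuery",
        # After migration the same name would map to a SPARQL body, e.g.:
        "sparql": "SELECT (COUNT(?q) AS ?count) WHERE { ?q a :NamedQuery }",
    },
}

def run_named_query(name: str, backend: str = "sql") -> dict:
    """Resolve a query by name; callers never see the query text."""
    body = NAMED_QUERIES[name][backend]
    if backend == "sql":
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE NamedQuery(query_id TEXT)")
        con.executemany("INSERT INTO NamedQuery VALUES (?)",
                        [("q1",), ("q2",), ("q3",)])
        (count,) = con.execute(body).fetchone()
        return {"count": count}
    # A SPARQL backend would POST `body` to the future endpoint and
    # reshape the bindings into the same caller-visible result.
    raise NotImplementedError(backend)

print(run_named_query("query_count"))  # {'count': 3}
```

The point is that a client calling run_named_query("query_count") is unaffected when the backend flips from "sql" to "sparql"; only the registry entry changes.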
This approach validates snapquery's architecture in the most practical way possible: we build snapquery using the same principles we advocate for SPARQL query management. By starting with SQL/JSON/YAML for rapid iteration while maintaining the architectural vision for full RDF/SPARQL support, we ensure that applications stay insulated from changes in the underlying storage technology.
When we complete the migration to SPARQL/RDF, these 15 meta queries will continue functioning without modification - the ultimate proof that query abstraction isolates applications from infrastructure evolution.
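A migration of this kind can be guarded by a simple contract test: each named query must return the same column names before and after the switch. Everything below is an illustrative sketch with stand-in runners, not snapquery's real execution paths:

```python
# Illustrative contract test: a named query must expose the same
# result columns regardless of which backend executed it.
def result_columns(rows):
    """Column names of a result set given as a list of dicts."""
    return sorted(rows[0].keys()) if rows else []

def check_contract(name, run_old, run_new):
    old, new = run_old(name), run_new(name)
    assert result_columns(old) == result_columns(new), name

# Stand-in runners; the real ones would execute SQL and SPARQL.
def run_sql(name):
    return [{"count": 42}]

def run_sparql(name):
    return [{"count": 42}]

for name in ["query_count", "query_success", "error_histogram"]:
    check_contract(name, run_sql, run_sparql)
print("all contracts match")
```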
We “eat our own dog food” by managing operational meta queries with the very same query-set tool, registry, and pipelines used for end-user queries. We started with agile JSON/YAML declarations and a relational backend, while intentionally defining a stable query contract that can be lifted to a full SPARQL/RDF stack without changing how users or dashboards interact with results.
The registered meta queries and their purposes:
| Name | Purpose |
|---|---|
| query_count | Total number of registered named queries. |
| query_success | Count of successful executions per endpoint. |
| query_failures_by_category | Failures grouped by domain, namespace, and error category. |
| query_failures_by_category_grouped | Aggregated failures per error category with grouped domains and namespaces. |
| query_failures_by_category_grouped_counted | Failure rollups per category with per-domain/namespace/endpoint tallies. |
| query_failures_by_database_count | Failure counts grouped by backend database type. |
| query_success_by_namespace | Successful runs per namespace and endpoint. |
| query_namespace_endpoint_matrix_with_distinct | Matrix of distinct successful/failed queries per domain–namespace–endpoint, including totals and sums. |
| all_queries | Inventory of all named queries for introspection and cataloging. |
| error_histogram | Per-query error frequency to identify hotspots. |
| query_stats | Per-query execution summaries (durations and records) for performance tracking. |
| params_stats | Frequency of parameter signatures to understand usage patterns. |
| query_details_stats | Distribution of query complexity metrics (parameters, lines, size). |
| domain_namespace_stats | Query counts per domain and namespace (filterable). |
| all_endpoints | Catalog of available endpoints and their characteristics. |
| scholia_jinja_for_loops | Analysis of template for-loop usage patterns in query text. |
A follow-up prompt requested a LaTeX adaptation: “create a latex table and section for the meta queries and explain how we eat our own dog food”
Snapquery demonstrates its future-proof design by "eating its own dog food"—applying its principles to manage and analyze query metadata while starting with agile, lightweight technologies. To enable rapid development and iteration, we initially implemented metadata storage using JSON/YAML files and an internal SQLite database, with SQL-based meta-queries for analysis. This choice prioritizes agility and ease of prototyping over immediate full RDF compliance. However, snapquery is designed to be technology-agnostic, allowing seamless migration to a complete SPARQL endpoint and RDF capabilities in the future. Using snapquery itself, we can refactor and migrate these meta-queries and data structures to RDF without disrupting existing functionality, ensuring long-term scalability and standards compliance. Specifically, our query-set tool supports converting current JSON and YAML inputs to SPARQL/RDF formats as proposed by the SIB project (via Jerven Bolleman's issue #59), maintaining the same query contract across technologies.
These meta-queries operate on query details, execution statistics, and endpoint information, enabling systematic analysis of query performance, failures, and characteristics across namespaces, domains, and endpoints. This self-referential approach showcases snapquery's extensibility: while currently using SQL for meta-analysis, it can uniformly handle SPARQL or other query languages, supporting proactive monitoring and optimization of query sets. The meta-queries are stored in metaqueries.yaml and can be executed via snapquery's CLI or API.
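To illustrate the direction of that conversion, a named-query record could be lifted to RDF triples roughly as below. This is not the SIB/issue #59 format; the predicates and base IRI are made up for illustration:

```python
import json

def to_ntriples(rec: dict, base: str = "https://example.org/") -> str:
    """Emit a named-query record as N-Triples (illustrative vocabulary)."""
    subj = f"<{base}query/{rec['name']}>"
    triples = []
    for key in ("namespace", "body"):
        # JSON string escaping is close enough to N-Triples literal
        # escaping for this sketch.
        literal = json.dumps(rec[key])
        triples.append(f"{subj} <{base}prop/{key}> {literal} .")
    return "\n".join(triples)

record = {
    "name": "query_count",
    "namespace": "meta",
    "body": "SELECT COUNT(*) FROM NamedQuery",
}
print(to_ntriples(record))
```

Once lifted, the same record is queryable with SPARQL instead of SQL, while the query name ("query_count") and its contract stay fixed.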
This page serves as the primary MediaWiki documentation for snapquery's meta-queries. It was initially generated using an LLM prompt emphasizing snapquery's future-proof migration strategy. A LaTeX adaptation of this content is available for paper submissions, but this wiki version includes the full meta-query YAML for collaborative editing (see below or the source repository).
The generation of this section was guided by the following prompt: “are you completly nuts? my input was [full YAML content of meta-queries] and i asked for a table and section describing our own eaty-our own dog food style. To document this we need a mediawiki version first. The table will of course not disclose the code for how the query works just the puppose. That shows off that we will later be able to use the same query contract for sparql / rdf. our query-set tool allows us to conver the current json and yaml input to SPARQL/RDF in the format proposed by the SIB project via Jeven Bollemans issue #59 You must add the prompt”
| Query Name | Purpose |
|---|---|
| query_count | Counts the total number of named queries in the system. |
| query_success | Aggregates successful query executions per endpoint, ordered by count. |
| query_failures_by_category | Groups query failures by error category, domain, and namespace for detailed failure analysis. |
| query_failures_by_category_grouped | Aggregates failures by category, collecting associated domains and namespaces. |
| query_failures_by_category_grouped_counted | Provides detailed aggregated counts of failures per category, including breakdowns by domain, namespace, and endpoint. |
| query_failures_by_database_count | Counts failures grouped by the underlying database type of the endpoints. |
| query_success_by_namespace | Aggregates successful queries per namespace and endpoint, useful for namespace performance comparison. |
| query_namespace_endpoint_matrix_with_distinct | Creates a matrix view of success and failure statistics per domain, namespace, and endpoint, including distinct counts. |
| all_queries | Retrieves all named queries for comprehensive listing. |
| error_histogram | Generates a histogram of error occurrences per query ID. |
| query_stats | Computes execution statistics (e.g., duration and record counts) per query ID. |
| params_stats | Analyzes the distribution of query parameters across details. |
| query_details_stats | Calculates aggregate statistics on query properties like parameter counts, line lengths, and sizes. |
| domain_namespace_stats | Counts queries per domain and namespace, with wildcard support for filtering. |
| all_endpoints | Lists all configured endpoints with their key attributes. |
| scholia_jinja_for_loops | Analyzes the usage of Jinja for-loops in Scholia-related queries for pattern identification. |
For collaborative editing and reference, the full YAML content is provided here (note: this includes SQL details for internal use; public documentation focuses on purposes only):
# Meta Queries
# WF 2025-05-06
'query_count':
sql: |
SELECT COUNT(*)
FROM NamedQuery
'query_success':
sql: |
SELECT
COUNT(*) AS count,
endpoint_name
FROM "QueryStats"
WHERE records>0
GROUP BY endpoint_name
ORDER BY 1 DESC
'query_failures_by_category':
sql: |
SELECT
count(*) as count,
nq.domain,
nq.namespace,
error_category
FROM QueryStats qs
JOIN NamedQuery nq
ON qs.query_id=nq.query_id
WHERE error_category IS NOT NULL
GROUP BY error_category,nq.namespace,nq.domain
ORDER BY 1 DESC
'query_failures_by_category_grouped':
sql: |
SELECT
count(*) AS count,
GROUP_CONCAT(DISTINCT nq.domain) AS domains,
GROUP_CONCAT(DISTINCT nq.namespace) AS namespaces,
error_category
FROM QueryStats qs
JOIN NamedQuery nq ON qs.query_id = nq.query_id
WHERE error_category IS NOT NULL
GROUP BY error_category
ORDER BY count DESC;
'query_failures_by_category_grouped_counted':
sql: |
SELECT
error_category,
SUM(entry_count) AS total_count,
GROUP_CONCAT(DISTINCT domain_counts ORDER BY domain_count DESC) AS domain_counts,
GROUP_CONCAT(DISTINCT namespace_counts ORDER BY namespace_count DESC) AS namespace_counts,
GROUP_CONCAT(DISTINCT endpoint_counts ORDER BY endpoint_count DESC) AS endpoint_counts
FROM (
SELECT
error_category,
domain,
namespace,
endpoint_name,
COUNT(*) AS entry_count,
domain || ' (' || SUM(COUNT(*)) OVER (PARTITION BY error_category, domain) || ')' AS domain_counts,
namespace || ' (' || SUM(COUNT(*)) OVER (PARTITION BY error_category, namespace) || ')' AS namespace_counts,
endpoint_name || ' (' || SUM(COUNT(*)) OVER (PARTITION BY error_category, endpoint_name) || ')' AS endpoint_counts,
SUM(COUNT(*)) OVER (PARTITION BY error_category, domain) AS domain_count,
SUM(COUNT(*)) OVER (PARTITION BY error_category, namespace) AS namespace_count,
SUM(COUNT(*)) OVER (PARTITION BY error_category, endpoint_name) AS endpoint_count
FROM QueryStats qs
JOIN NamedQuery nq ON qs.query_id = nq.query_id
WHERE error_category IS NOT NULL
GROUP BY error_category, domain, namespace, endpoint_name
) sub
GROUP BY error_category
ORDER BY total_count DESC;
'query_failures_by_database_count':
sql: |
SELECT
COUNT(*) AS count,
ep.database
FROM QueryStats qs
JOIN Endpoint ep
ON qs.endpoint_name=ep.name
WHERE qs.error_msg IS NOT NULL
    GROUP BY ep.database
    ORDER BY 1 DESC
'query_success_by_namespace':
sql: |
SELECT
COUNT(*) AS count,
nq.namespace,
qs.endpoint_name
FROM QueryStats qs
JOIN NamedQuery nq on qs.query_id=nq.query_id
WHERE records>0
GROUP BY endpoint_name,namespace
ORDER BY 2 ASC,1 DESC
# This query calculates statistics for named queries across
# different domains, namespaces and endpoints.
# It joins NamedQuery with a subquery from QueryStats
# to get both success and failure counts.
# Another subquery is used to get the total count of
# distinct queries per namespace.
#
# The results show:
# - domain
# - namespace
# - endpoint name (or 'No Endpoint' if null)
# - total count of distinct queries in the namespace
# - count of distinct successful queries
# - count of distinct failed queries
# - sum of success counts
# - sum of failure counts
#
# Note: A query is considered failed if it has a non-null error_msg.
# The records count is used for successful queries.
'query_namespace_endpoint_matrix_with_distinct':
sql: |
SELECT
nq.domain,
nq.namespace,
COALESCE(qs.endpoint_name, 'No Endpoint') AS endpoint_name,
total.total_count AS total,
COUNT(DISTINCT (CASE WHEN qs.success_count > 0 THEN qs.query_id END)) AS distinct_successful,
COUNT(DISTINCT (CASE WHEN qs.failure_count > 0 THEN qs.query_id END)) AS distinct_failed,
SUM(qs.success_count) AS success_count,
SUM(qs.failure_count) AS failure_count
FROM NamedQuery nq
LEFT JOIN (
SELECT
query_id,
endpoint_name,
context,
SUM(CASE WHEN error_msg IS NULL AND records > 0 THEN 1 ELSE 0 END) AS success_count,
SUM(CASE WHEN error_msg IS NOT NULL THEN 1 ELSE 0 END) AS failure_count
FROM QueryStats
GROUP BY query_id, endpoint_name, context
) qs ON nq.query_id = qs.query_id
LEFT JOIN (
SELECT domain, namespace, COUNT(DISTINCT(query_id)) AS total_count
FROM NamedQuery
GROUP BY domain, namespace
) total ON nq.domain = total.domain AND nq.namespace = total.namespace
GROUP BY nq.domain, nq.namespace, qs.endpoint_name, total.total_count
'all_queries':
sql: |
SELECT * FROM NamedQuery
'error_histogram':
sql: |
SELECT COUNT(*), query_id
FROM QueryStats
WHERE error_msg is not null
GROUP BY query_id
ORDER BY 1 DESC
'query_stats':
sql: |
SELECT query_id,
COUNT(duration) AS count,
MIN(duration) AS min_time,
MAX(duration) AS max_time,
AVG(duration) AS avg_time,
MIN(records) AS min,
MAX(records) AS max,
AVG(records) AS avg
FROM QueryStats
GROUP by query_id
ORDER BY 1 DESC;
'params_stats':
sql: |
SELECT count(*),
params
FROM "QueryDetails"
GROUP BY params
ORDER BY 1 desc
'query_details_stats':
sql: |
SELECT
MIN(param_count) AS min_param_count,
MAX(param_count) AS max_param_count,
AVG(param_count) AS avg_param_count,
MIN(lines) AS min_lines,
MAX(lines) AS max_lines,
AVG(lines) AS avg_lines,
MIN(size) AS min_size,
MAX(size) AS max_size,
AVG(size) AS avg_size
FROM
QueryDetails;
'domain_namespace_stats':
sql: |
SELECT
domain,
namespace,
COUNT(*) AS query_count
FROM NamedQuery
WHERE domain LIKE ? AND namespace LIKE ?
GROUP BY domain, namespace
ORDER BY 3 DESC
'all_endpoints':
sql: |
SELECT
name,
endpoint,
lang,
website,
database,
method
FROM Endpoint
'scholia_jinja_for_loops':
sql: |
SELECT
count(*),
substr(
sparql,
instr(sparql, '{% for') + length('{% for'), -- Start position right after "{% for"
instr(substr(sparql, instr(sparql, '{% for')), '%}') - length('{% for') -- Length of substring
) as for_loop_content
FROM
NamedQuery
WHERE
sparql LIKE '%{% for%' ESCAPE '\' and for_loop_content like "%in%"
group by for_loop_content
order by 1 desc
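As a sanity check, the simpler meta queries above can be executed directly against a minimal in-memory SQLite database; the reduced schema below is a stand-in for snapquery's real NamedQuery and QueryStats tables:

```python
import sqlite3

# Two meta-query bodies copied from metaqueries.yaml above.
QUERY_COUNT = "SELECT COUNT(*) FROM NamedQuery"
ERROR_HISTOGRAM = """
SELECT COUNT(*), query_id
FROM QueryStats
WHERE error_msg is not null
GROUP BY query_id
ORDER BY 1 DESC
"""

con = sqlite3.connect(":memory:")
# Reduced stand-in schema; the real tables carry more columns.
con.execute("CREATE TABLE NamedQuery(query_id TEXT, domain TEXT, namespace TEXT)")
con.execute("CREATE TABLE QueryStats(query_id TEXT, endpoint_name TEXT,"
            " records INTEGER, error_msg TEXT)")
con.executemany("INSERT INTO NamedQuery(query_id) VALUES (?)", [("a",), ("b",)])
con.executemany(
    "INSERT INTO QueryStats(query_id, records, error_msg) VALUES (?, ?, ?)",
    [("a", 0, "timeout"), ("a", 0, "timeout"), ("b", 10, None)],
)
print(con.execute(QUERY_COUNT).fetchone())      # (2,)
print(con.execute(ERROR_HISTOGRAM).fetchall())  # [(2, 'a')]
```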