Package Authoring
Semantic Rails packages are authored as a directory of YAML files under the
schema_version: 1 contract. The loader normalizes the ergonomic
authoring shape into a single internal runtime contract.
Measures vs metrics — the conceptual split
Before any syntax, understand the two layers. They are distinct surfaces in the catalog, with distinct purposes.
Measures are primitives. A measure is a columnar fact (a sum, a count, a count-distinct) that the API can query flexibly — by any reachable entity, time grain, dimension breakdown, or aggregation function within the measure's allowed set. Measures are the building blocks. ARR is a measure: sum of monthly ARR contributions, queryable by customer, by segment, by month.
Metrics are governed access patterns. A metric is a named, stable
contract that codifies a specific use of one or more measures — with conditions,
filters, time alignment, or composition (ratio, cumulative, derived). Metrics exist
for governance and clarity. NRR is a metric:
(start_arr + expansion_arr − churn_arr) / start_arr with specific
cohort and time-alignment conditions.
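As a worked illustration of the NRR formula above, the sketch below computes the ratio from three ARR totals. The function name and the sample figures are hypothetical, and the cohort and time-alignment conditions are assumed to have been applied before the totals are passed in.

```python
from typing import Optional


def net_revenue_retention(start_arr: float, expansion_arr: float,
                          churn_arr: float) -> Optional[float]:
    """Return (start + expansion - churn) / start, or None when start is zero."""
    if start_arr == 0:
        # Assumption for this sketch: an empty starting cohort yields no ratio.
        return None
    return (start_arr + expansion_arr - churn_arr) / start_arr


# Hypothetical cohort: 1000 starting ARR, 200 expansion, 100 churn.
nrr = net_revenue_retention(start_arr=1000.0, expansion_arr=200.0, churn_arr=100.0)
print(nrr)  # 1.1
```

The three inputs are measures; the metric is the fixed way of combining them, which is why it lives in the governed layer.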
Implications:
- Not every measure needs a corresponding metric. Many measures are queryable as primitives.
- The catalog lists measures and metrics as distinct surfaces. Both are queryable; only metrics carry stable governance.
- Measures do not auto-publish to metrics. Authors who want a measure exposed as a governed metric write it explicitly in the metrics: block.
Directory shape
configs/semantic_rails/<package>/
  package.yml     # identity, warehouse, connection, seeds
  graph.yml       # canonical entities + explicit relationships
  policies.yml    # optional: visibility / access / release
  caveats.yml     # optional: advisory interpretation context
  models/         # one file per warehouse table or mart
    ...
  metrics/        # optional: governed access patterns
    ...
  segments/       # optional: entity-bounded membership filters
  examples/       # optional: runnable example queries
  tests/          # optional: package-local regression tests
The loader merges every YAML file under models/**,
metrics/**, and segments/* into a single
PackageConfig.
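The glob patterns above differ in depth: models/** and metrics/** are recursive, while segments/* covers only the top level of that directory. A minimal sketch of that file discovery step follows. The function name is hypothetical, matching only the .yml extension is an assumption, and parsing and merging into PackageConfig are left out.

```python
from pathlib import Path
from typing import Dict, List


def collect_package_files(package_dir: Path) -> Dict[str, List[Path]]:
    """Group YAML files by directory: recursive for models/ and metrics/,
    top level only for segments/."""

    def yaml_files(base: Path, recursive: bool) -> List[Path]:
        if not base.is_dir():
            return []  # metrics/ and segments/ are optional directories
        walker = base.rglob if recursive else base.glob
        return sorted(walker("*.yml"))  # assumption: only the .yml extension

    return {
        "models": yaml_files(package_dir / "models", recursive=True),
        "metrics": yaml_files(package_dir / "metrics", recursive=True),
        "segments": yaml_files(package_dir / "segments", recursive=False),
    }
```

Under these assumptions a file at segments/nested/extra.yml would not be picked up, while models/marts/orders.yml would.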
Conceptual model
| Layer | Role | Design guidance |
|---|---|---|
| graph | Canonical entities, keys, and explicit relationships | Author entity identity once. Most relationships are inferred from model.entities: blocks; the graph carries non-default rules (rollup safety, SCD2 validity, cardinality overrides). |
| models | Grain, exposed entities, dimensions, times, measures | One file per warehouse table or mart. model.entities: declares which entities the model exposes; the planner infers join paths from co-declared FK references. |
| metrics | Governed access patterns: ratios, cumulative, derived, conversion | Author only access patterns that deserve a stable contract. Most kinds use direct named fields (measure:, numerator:, denominator:). kind: derived uses the expression AST. |
What the loader does for you
Setting package.namespace (or letting it default to
package.id) buys you a lot of YAML you never have to write. Author the
business meaning; the loader fills in identifiers and traversal.
- Key-derived IDs: every object's id and name come from package.namespace + key. Override with as: only when you need to preserve a public reference.
- Auto-created key dimensions: entity keys come from graph.entities.<x>.key:; you don't author kind: id dimensions.
- Inferred relationships: any pair of entities co-declared in a model.entities: block produces a default RelationshipConfig. Author explicit overrides in graph.relationships: only when you need non-default rules.
- Primary entity auto-detect: the primary entity is the one whose graph.entities.<x>.model points at this model. The model's grain is derived from that entity's key:; do not author a separate grain: field.
- Backing date/timestamp dimensions: auto-created from times: blocks; the times: entry IS the temporal role.
- Default aggregations: the accumulation class drives the allowed-aggregation set; disallowed_aggregations: subtracts from it.
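To make the relationship inference rule concrete, the sketch below enumerates every entity pair that is co-declared in at least one model. It is an illustration of the rule as stated, not the loader's code: the function name and the input shape (model id mapped to a list of entity names) are assumptions, and the order_items entity list is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def inferred_pairs(model_entities: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return every unordered entity pair co-declared in at least one model."""
    pairs: Set[Tuple[str, str]] = set()
    for entities in model_entities.values():
        # Sort so each unordered pair has one canonical representation.
        for left, right in combinations(sorted(set(entities)), 2):
            pairs.add((left, right))
    return pairs


models = {
    "orders": ["order", "customer", "product"],
    "order_items": ["order_item", "order"],  # hypothetical entity list
}
for pair in sorted(inferred_pairs(models)):
    print(pair)
```

Each printed pair would correspond to one default relationship; an entry in graph.relationships: is only needed where the default rules are not enough.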
How this package model differs
Semantic Rails packages are not a BI facade, a metrics-only spec, a warehouse-native semantic object, or a new analyst query language. They are versioned runtime inputs: the same files drive discovery, validation, compile-only SQL, explain output, execution, examples, and package tests.
| Compared with | Different center of gravity | Semantic Rails package implication |
|---|---|---|
| dbt Semantic Layer / MetricFlow | Metrics and semantic models on top of dbt models | The package includes graph, policy, guided-builder metadata, examples, and tests as part of the runtime contract. |
| Cube | Semantic APIs, BI integrations, caching, and pre-aggregation workflows | The repo is smaller: it focuses on Query IR, planner diagnostics, compile/explain, and execution paths across nine warehouses. |
| Malloy | A language that combines semantic modeling and querying | Packages stay declarative; callers use API routes and Query IR rather than adopting a separate analysis language. |
| Snowflake Semantic Views | Warehouse-native semantic metadata and SQL/Cortex interfaces inside Snowflake | Packages live in the repo and can run locally on DuckDB or against Snowflake and seven other warehouses through CLI or native connector paths. |
Package metadata
schema_version: 1
package:
id: shop
namespace: shop
warehouse: duckdb
default_db: data/shop.duckdb
seed: { kind: sql_script, source: data/seed.sql }
schema_strict: true # opt-in v1 strict validation (recommended)
defaults:
dimension:
groupable: true
filterable: true
time:
timezone: UTC
supported_grains: [day, week, month, quarter, year]
schema_strict: true rejects legacy authoring forms (scalar
key:, redundant grain: alongside entities:,
etc.) and unknown values for typed fields like kind:. It is recommended
for new packages. It may not reject unknown field names, however: a typo
such as dimentions: on a model: block can be silently
dropped. Always confirm the catalog contains every object you authored
(uv run semantic-rails catalog --package <id>) before
trusting a parse-clean result.
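The catalog check can be reduced to a set difference between the names you authored and the names the catalog reports. The sketch below assumes you have already collected both lists by hand or by script; the function name is hypothetical and the format of the catalog command's output is deliberately not assumed.

```python
from typing import Iterable, Set


def missing_from_catalog(authored: Iterable[str], catalog: Iterable[str]) -> Set[str]:
    """Return authored object names that the catalog does not list."""
    return set(authored) - set(catalog)


# Hypothetical case: the status dimension sat under a misspelled
# `dimentions:` key, so it never reached the catalog.
authored_names = ["revenue_usd", "order_count", "status"]
catalog_names = ["revenue_usd", "order_count"]
print(missing_from_catalog(authored_names, catalog_names))  # {'status'}
```

An empty result is the condition to look for before trusting a clean parse.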
Entity graph
graph:
  entities:
    order:
      label: Order
      key: [order_id]        # list form is required (even for one column)
      model: orders          # which model declares this entity as primary
      disallowed_names: [ord_id, orderid]
    order_item:
      label: Order item
      key: [order_item_id]
      model: order_items
    customer:
      label: Customer
      key: [customer_id]
      model: customers
      disallowed_names: [cust_id, custid, customerid]
    product:
      label: Product
      key: [product_id]
      model: products
  # Explicit overrides only; most relationships are inferred.
  relationships:
    customer_history_x_customer:
      entities: [customer_history, customer]
      cardinality: many_to_one
      safety: requires_rewrite
      temporal_validity:
        valid_from: effective_from
        valid_to: effective_to
      rollup_safe:
        forward: [sum, count]
        reverse: []
disallowed_names: is the explicit anti-pattern guard. The validator
rejects any model that authors a column, dimension, or measure with a name in the
list and points at the canonical column or the expr: escape hatch for
intentional renames.
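The guard described above amounts to a lookup of authored names against each entity's disallowed list. A minimal sketch, assuming exact string matching and a hypothetical function name; the real validator's matching rules and error messages are not shown in this document.

```python
from typing import Dict, List, Tuple


def find_disallowed(disallowed_by_entity: Dict[str, List[str]],
                    authored_names: List[str]) -> List[Tuple[str, str]]:
    """Return (authored_name, entity) for each name on an entity's disallowed list."""
    owner = {
        name: entity
        for entity, names in disallowed_by_entity.items()
        for name in names
    }
    # Assumption: names are compared exactly, with no case folding.
    return [(name, owner[name]) for name in authored_names if name in owner]


disallowed = {
    "order": ["ord_id", "orderid"],
    "customer": ["cust_id", "custid", "customerid"],
}
print(find_disallowed(disallowed, ["order_id", "cust_id", "status"]))
# [('cust_id', 'customer')]
```

Reporting the owning entity alongside the offending name is what lets an error message point at the canonical column.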
Model example
model:
  id: orders
  label: Orders
  relation: shop_order
  # grain derived from primary entity's key (graph's order.key);
  # do NOT author `grain:` alongside `entities:` (strict mode rejects it).
  entities:
    order: {}            # primary (grain matches order.key)
    customer: {}         # FK reference; column = customer_id
    product: {}          # FK reference
  times:
    ordered_at:
      label: Order time
      column: ordered_at
      kind: timestamp
      class: event_time
      supported_grains: [day, week, month, quarter, year]
      default: true
  dimensions:
    status:
      label: Order Status
      kind: categorical
  measures:
    revenue_usd:
      label: Revenue (USD)
      kind: aggregate
      expr: order_total_cents / 100.0
      default_agg: sum
      accumulation: { kind: flow }
      value_type: currency
    order_count:
      label: Order Count
      kind: entity_count
      entity_key: order
      accumulation: { kind: event }
      value_type: count
model.entities: is required on every model and lists which entities the
model exposes. The primary entity is auto-detected; FK references infer relationships
with other models that expose the same entity. When the column name differs from the
entity's canonical key, override with expr:.
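Primary-entity detection, as described, can be sketched as a match between the model's id and each declared entity's model field in the graph. The function name, the dictionary shapes, and the choice to raise an error when there is not exactly one match are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple


def detect_primary(model_id: str, model_entities: List[str],
                   graph_entities: Dict[str, dict]) -> Tuple[str, List[str]]:
    """Return (primary_entity, grain) for a model.

    The primary entity is the declared entity whose graph entry points back
    at this model; the grain is that entity's key list.
    """
    matches = [
        name for name in model_entities
        if graph_entities.get(name, {}).get("model") == model_id
    ]
    if len(matches) != 1:
        # Assumption: zero or multiple candidates is treated as an authoring error.
        raise ValueError(f"expected one primary entity for {model_id!r}, got {matches}")
    primary = matches[0]
    return primary, list(graph_entities[primary]["key"])


graph_entities = {
    "order": {"key": ["order_id"], "model": "orders"},
    "customer": {"key": ["customer_id"], "model": "customers"},
    "product": {"key": ["product_id"], "model": "products"},
}
print(detect_primary("orders", ["order", "customer", "product"], graph_entities))
# ('order', ['order_id'])
```

The remaining entities (customer and product here) are the FK references that feed relationship inference.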
Metric example
metrics:
  revenue_usd:
    label: Revenue (USD)
    description: Total revenue. Codified for stable reference.
    kind: aggregate
    measure: revenue_usd
    value_type: currency
  aov_usd:
    label: Average order value (USD)
    kind: ratio
    numerator: revenue_usd
    denominator: order_count
    null_behavior: null_if_zero
    value_type: currency
    time: ordered_at
Most metric kinds use direct named fields (measure:,
numerator:, denominator:). kind: derived and
kind: conversion use the full expression AST.
value_type: is required on every metric.
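As a small worked example of the aov_usd ratio above, the sketch below divides a revenue total by an order count and returns None when the denominator is zero, which is how null_behavior: null_if_zero is read here. The function name and the sample numbers are hypothetical, and the real planner performs this in compiled SQL rather than in Python.

```python
from typing import Optional


def ratio_null_if_zero(numerator: float, denominator: float) -> Optional[float]:
    """Divide numerator by denominator, yielding None for a zero denominator."""
    if denominator == 0:
        return None  # mirrors a SQL NULL instead of a division error
    return numerator / denominator


# Hypothetical month: 250.00 USD revenue across 10 orders.
print(ratio_null_if_zero(250.0, 10))  # 25.0
# A period with no orders produces no value rather than an error.
print(ratio_null_if_zero(0.0, 0))     # None
```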
Snowflake packages
Snowflake-backed packages use the same semantic model with
package.warehouse: snowflake and either a Snow CLI or native connector
block. Native deployments use env/file credential indirection and the optional
connector extra.
# Option 1: Snow CLI connection
package:
  id: shop_snowflake
  namespace: shop
  warehouse: snowflake
  connection:
    kind: snowflake_cli
    name: shop_dev

# Option 2: native connector (separate package.yml)
package:
  id: shop_native
  namespace: shop
  warehouse: snowflake
  connection:
    kind: snowflake_native
    name: prod_native
    options:
      account_env: SNOWFLAKE_ACCOUNT
      user_env: SNOWFLAKE_USER
      password_env: SNOWFLAKE_PASSWORD
      warehouse: COMPUTE_WH
      query_tag: semantic-rails
Path-finding behavior
When a query asks for "X per Y" where X is a measure on one model and Y is a dimension on a different entity, the planner walks the inferred entity graph for the shortest path. This is automatic — you don't author join paths.
- "orders per customer": the orders model has both order_id and customer_id; planner uses that table directly.
- "items per customer": no single table has all three columns; planner walks order_item → order → customer via inferred relationships.
- "items per customer" when a denormalized table contains all three: planner prefers the direct table over the multi-hop path.
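The shortest-path behavior in the cases above can be sketched as a breadth-first search over an undirected entity graph. This is a simplified model, not the planner's code: the function name and edge list are hypothetical, and real planning also has to respect relationship safety and cardinality rules that are ignored here.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def shortest_entity_path(edges: List[Tuple[str, str]], start: str,
                         goal: str) -> Optional[List[str]]:
    """Return the fewest-hop path between two entities, or None if unreachable."""
    neighbors: Dict[str, Set[str]] = {}
    for left, right in edges:
        neighbors.setdefault(left, set()).add(right)
        neighbors.setdefault(right, set()).add(left)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


normalized = [("order_item", "order"), ("order", "customer"), ("order", "product")]
print(shortest_entity_path(normalized, "order_item", "customer"))
# ['order_item', 'order', 'customer']

# A denormalized table co-declaring order_item and customer adds a direct edge.
denormalized = normalized + [("order_item", "customer")]
print(shortest_entity_path(denormalized, "order_item", "customer"))
# ['order_item', 'customer']
```

Because breadth-first search explores paths in order of hop count, the direct edge wins whenever it exists, which matches the stated preference for a direct table over a multi-hop walk.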
When to add a metric
Add a metric when an access pattern deserves a stable contract — when consumers
(dashboards, queries, agents) should not have to re-derive the computation. Don't
add a metric just because a measure exists; the catalog already lists measures as
queryable primitives. Reserve the metrics: block for ratios, governed
cumulative/rolling/PTD windows, and derived expressions over multiple measures.
Continue to Query Planner for how authored packages turn into validated plans and executable SQL.