Authoring Reference
The complete per-attribute reference for every YAML field an author writes in a Semantic Rails package under the v1 contract. For each component you get its purpose, fields, defaults, behavior, the errors it raises, and a runnable example.
The narrative authoring guide lives in Package Authoring. This page is meant to be scanned with Cmd + F while you write YAML. The API Reference documents the wire shape used by the runtime.
Measures vs metrics
Before any syntax, set the frame. Measures and metrics are two distinct surfaces in the catalog with two distinct purposes.
(start_arr + expansion_arr − churn_arr) / start_arr with
cohort and time-alignment conditions.
Implications:
- Not every measure needs a corresponding metric. Measures are queryable as primitives.
- The catalog lists measures and metrics as distinct surfaces.
- Measures do not auto-publish to metrics. Author the metric explicitly when an access pattern deserves a stable contract.
What the loader does for you
The v1 contract is terse because the loader fills in identifiers and traversal deterministically. Author the business meaning; the loader handles:
- Key-derived IDs. Every object's id and name come from package.namespace + key. Override with as: only to preserve a public reference.
- Auto-created key dimensions. From graph.entities.<x>.key: for every model that exposes the entity.
- Inferred relationships. Any pair of entities co-declared in model.entities: produces a default RelationshipConfig. Author overrides in graph.relationships: only when needed.
- Primary entity auto-detect. The entity whose canonical key matches model.grain:. No marker needed.
- Backing time dimensions. Auto-created from each times: entry's column:.
- Default aggregations. Accumulation class drives the allowed set; subtract with disallowed_aggregations:.
The full table is in auto-derivations summary.
At a glance
A package has seven concrete components plus the expression AST. Each one auto-derives
most identifiers when package.namespace is set.
| # | Component | What it covers |
|---|---|---|
| 1 | package.yml | Identity, warehouse target, connection, seeds, schema_strict. |
| 2 | graph.yml | Canonical entities, disallowed_names, explicit relationship overrides. |
| 3 | models/<id>.yml | Grain, model.entities:, dimensions, times:, measures. |
| 4 | metrics/**.yml | Governed access patterns. Direct named fields per kind; AST for the long tail. |
| 5 | segments/*.yml | Entity-bounded membership filters anchored on a basis metric. |
| 6 | policies.yml | Visibility, access, release labels, protected objects. |
| 7 | caveats.yml | Human-authored advisory context surfaced as query warnings. |
| + | Expression AST | The shape of every expr:, kind: derived body, and metric filter. |
| ↑ | Auto-derivations | Everything the loader fills in for you when you set namespace. |
| ↑ | Error codes | Every load- and plan-time error mapped to the field that caused it. |
| ↑ | Smallest valid package | The minimum YAML the loader accepts. A safe starting point. |
Package directory layout
| Path | Required? | Purpose |
|---|---|---|
| package.yml | Required | Identity, warehouse target, connection, seeds, schema_strict flag. |
| graph.yml | Required | Canonical entities + explicit relationship overrides. |
| models/**/*.yml | At least one | One model per warehouse table or mart. Declares model.entities:, times:, dimensions, measures. |
| metrics/**/*.yml | Optional | Governed access patterns: ratio, cumulative, derived, conversion, etc. |
| segments/*.yml | Optional | Reusable filter sets anchored on an entity plus a basis metric. |
| policies.yml | Optional | Visibility, access, release labels, protected objects. |
| caveats.yml | Optional | Advisory business, definition, or data-quality context emitted as warnings when relevant. |
| examples/*.yml | Optional | Named example queries surfaced through discovery and inspect routes. |
| tests/*.yml | Optional | Package-local regression tests. The runner walks this directory directly. |
package.yml
Declares the schema version, identity, warehouse target, seed inputs (DuckDB) or
connection (every other warehouse), declared environments, the schema_strict
flag, and defaults inherited by dimensions, temporal roles, and measures.
Top-level fields
| Field | Type | Status | Behavior |
|---|---|---|---|
| schema_version | integer | Required | Only accepted value is 1; anything else raises INVALID_CONFIG. |
| package.id | string | Required | Stable identifier used by the CLI (--package <id>) and the catalog. |
| package.namespace | string | Auto-derived | Defaults to package.id. Drives every auto-derived ID. Setting it explicitly stabilizes IDs. |
| package.warehouse | enum | Required | One of duckdb, snowflake, postgres, bigquery, databricks, motherduck, ducklake, athena, clickhouse. |
| package.default_db | string | Conditional | Required for warehouse: duckdb. |
| package.connection | object | Conditional | Required for every warehouse except duckdb. See connection-mode. |
| package.seed | object | Conditional | Required for DuckDB. Sub-fields: kind, source, optional post_sql, null_strings. |
| package.environments | list<string> | Optional | Declared environment names policies can target. |
| package.schema_strict | boolean | Optional | Default false. true rejects legacy authoring forms (scalar key:, redundant grain: alongside entities:, etc.) and unknown values for typed fields like kind: (see strict-mode rules). Recommended for new packages. See also the scope note below. |
| defaults | map<string, map> | Optional | Per-row inheritable defaults: defaults.dimension, defaults.time, defaults.measure, defaults.relationship. |
Example
For remote warehouses (Snowflake, Postgres, BigQuery, Databricks, and the rest),
swap seed for connection
— see connection-mode decision.
schema_version: 1
package:
  id: shop
  namespace: shop
  warehouse: duckdb
  default_db: data/shop.duckdb
  seed: { kind: sql_script, source: data/seed.sql }
  schema_strict: true
defaults:
  dimension: { groupable: true, filterable: true }
  time:
    timezone: UTC
    supported_grains: [day, week, month, quarter, year]
graph.yml
The canonical entity registry. Each entry declares one business object the package exposes (customer, order, product, …), its key column, and any non-default relationships to other entities.
graph.entities:
| Field | Type | Status | Behavior |
|---|---|---|---|
| key | string or list<string> | Required | Canonical column name(s). String for single-key; list for compound (e.g. [customer_id, valid_from]). Single source for the entity's column; models bind to it automatically. |
| label · description | string | Optional | Display metadata. |
| allowed_as_root | boolean | Optional | Default true. Set false for snapshot/junction entities. |
| synonyms | list<string> | Optional | Alternate human terms for resolve / discover. Excessively broad terms produce AMBIGUOUS_ALIAS. |
| disallowed_names | list<string> | Optional | Explicit anti-pattern guard. Names that may never appear as a column/dimension/measure on any model. Validator rejects and suggests expr: for intentional renames. |
graph.relationships:
Most relationships are inferred from FK references in
model.entities: blocks. Author an explicit
graph.relationships: entry only when you need a non-default rule:
per-direction rollup safety, SCD2 temporal_validity:, custom
cardinality:, or allowed_directions: restriction.
Relationships are bidirectional. The entities: field
is an unordered pair; cardinality and rollup safety are expressed relative to
that pair.
| Field | Type | Status | Behavior |
|---|---|---|---|
| graph.relationships.<name>.entities | [string, string] | Required | The unordered pair of entities the relationship connects. |
| graph.relationships.<name>.cardinality | enum | Auto-derived | one_to_one, many_to_one, one_to_many, or many_to_many. many_to_one means "first entity is many; second is one." The loader infers from key roles; override only when needed. |
| graph.relationships.<name>.safety | enum | Auto-derived | safe, requires_rewrite, unsafe. Defaults from cardinality (safe for 1:1/N:1; requires_rewrite for 1:N/M:N). unsafe joins are rejected outright. |
| graph.relationships.<name>.allowed_directions | list<string> | Optional | Default [forward, reverse]. Restricts which traversal directions the planner may pick. |
| graph.relationships.<name>.rollup_safe | {forward, reverse} | Optional | Per-direction rollup safety. forward: lists aggregations safe when rolling up from the first entity to the second; reverse: lists those safe when rolling up the other way. Either may be empty. Replaces the legacy per-model parent_entity: block. |
| graph.relationships.<name>.temporal_validity | {valid_from, valid_to} | Required for SCD2 | Names the columns that bound an SCD2 record's validity. The planner automatically appends a validity-range predicate to the join condition. Without it, joins to history entities can return duplicate rows. |
| target_key_role · source_key_role | enum | Optional | primary, unique, foreign, natural. Disambiguates compound-key joins. |
| target_key_type | enum | Optional | Default primary. Either primary or identifier. |
| path_preference | integer | Optional | Default 100. Lower wins when multiple paths are valid; resolves AMBIGUOUS_PATH. |
Example
graph:
  entities:
    customer:
      label: Customer
      key: customer_id
      synonyms: [buyer, account]
      disallowed_names: [cust_id, custid, customerid]
    order:
      label: Order
      key: order_id
      disallowed_names: [ord_id, orderid]
    order_item:
      label: Order item
      key: order_item_id
    product:
      label: Product
      key: product_id
    customer_history:
      label: Customer history
      key: [customer_id, valid_from]
      allowed_as_root: false
  relationships:
    customer_history_x_customer:
      entities: [customer_history, customer]
      cardinality: many_to_one
      safety: requires_rewrite
      temporal_validity:
        valid_from: effective_from
        valid_to: effective_to
      rollup_safe:
        forward: [sum, count]
        reverse: []
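As a further sketch, an explicit override that only restricts traversal and path choice might look like the following. The relationship name order_x_customer is hypothetical, and placing path_preference directly under the relationship entry is an assumption based on the field table above.

```yaml
graph:
  relationships:
    order_x_customer:                 # hypothetical relationship name
      entities: [order, customer]
      cardinality: many_to_one        # first entity (order) is many; second (customer) is one
      allowed_directions: [forward]   # the planner may not pick the reverse direction
      path_preference: 10             # lower than the default 100, so this path wins ties
      rollup_safe:
        forward: [sum, count]
        reverse: []
```

Everything not listed here (safety, key roles) is left to the loader's inferred defaults.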
Models (models/<id>.yml)
A model declares one warehouse table or mart, the entities it exposes, and the
dimensions, temporal roles, and measures attached to its grain. The loader merges
every YAML file under models/** into the package config.
Top-level model fields
| Field | Type | Status | Behavior |
|---|---|---|---|
| model.id | string | Auto-derived | From the YAML filename when omitted. |
| model.relation | string | Required | Physical relation: warehouse table, view, or seed name. |
| model.grain | list<string> | Required | One row per this. Drives planner fanout safety. Compound grain supported. |
| model.label · description | string | Optional | Display metadata. |
| model.defaults | map | Optional | Per-model overrides for defaults.dimension, defaults.time, etc. Merge: package → model → per-row. |
model.entities:
Required on every model. Declares which entities the model exposes. Each entry
binds by default to the entity's canonical column from
graph.entities.<x>.key:; override per entity with
expr: when the model's column name differs.
The primary entity is auto-detected: whichever entity's canonical
key column matches model.grain:. For compound-grain models, all
entities whose keys are in the grain are co-primary.
| Field | Type | Status | Behavior |
|---|---|---|---|
| model.entities.<entity>.expr | string | Optional | Override the column the entity binds to. Use when the model's column name differs from graph.entities.<entity>.key:. |
| model.entities.<entity>.label | string | Optional | Per-binding display override; rarely needed. |
| model.entities.bridge | boolean | Optional | Block-level option (sibling of the entity entries). Default true. Set false when this model should NOT be auto-used as a join path between other entities. Queries within the model still work; the planner just won't route through it. |
Set bridge: false for junction/mapping tables (m:n bridges), denormalized snapshots that shouldn't be
joined to live data, and partial bridges where the data isn't complete enough for
arbitrary multi-hop traversal.
Examples
# Default binding: columns auto-resolved from graph
model:
  id: orders
  relation: shop_order
  # grain derived from primary entity's key (graph's order.key)
  entities:
    order: {}      # primary; column = graph's order_id
    customer: {}   # FK reference; column = graph's customer_id
    product: {}

# Column rename via expr:
model:
  id: order_renamed_columns
  relation: shop_order
  entities:
    order: { expr: ord_id }       # primary, column renamed
    customer: { expr: cust_id }   # FK, column renamed

# Junction table: not used as a join path. Compound grain comes from
# the bridge's two primary entity keys, not an authored `grain:` field.
model:
  id: customer_segment_membership
  relation: shop_customer_segment
  entities:
    bridge: false
    customer: {}
    segment: {}
times: (temporal roles)
The times: block key IS the temporal role. The backing date/timestamp
dimension is auto-created from column:. default: true
picks the implicit time axis when a query omits time.temporal_role.
| Field | Type | Status | Behavior |
|---|---|---|---|
| times.<key>.column | string | Required | The model column. Auto-creates a backing dimension. |
| times.<key>.kind | enum | Required | date or timestamp. |
| times.<key>.class | enum | Required | event_time, calendar_time, as_of_time, or state_time. Distinguishes flow / calendar / snapshot / SCD2-validity semantics. |
| times.<key>.supported_grains | list<string> | Optional | Default [day, week, month, quarter, year]. |
| times.<key>.default | boolean | Optional | Picks the implicit time axis. At most one default: true per model. |
| times.<key>.timezone · label | string | Optional | Default timezone UTC; label is display text. |
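A minimal sketch of a model with two temporal roles, using only the fields in the table above. The shipped_at role and column are hypothetical additions to the shop example.

```yaml
model:
  id: orders
  relation: shop_order
  entities:
    order: {}
  times:
    ordered_at:                 # the block key is the temporal role
      column: ordered_at
      kind: timestamp
      class: event_time
      default: true             # implicit axis when a query omits time.temporal_role
    shipped_at:                 # hypothetical second role
      column: shipped_at
      kind: timestamp
      class: event_time
      supported_grains: [day, week, month]
      label: Shipped at
```

Only one entry carries default: true, which keeps the model within the one-default-per-model rule.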
dimensions:
Behavioral attributes attached to the model's grain. The dimension's
kind drives type coercion in SQL lowering and the filter-operator menu
surfaced through build-options.
Key dimensions auto-create from graph.entities.<x>.key. Backing
date/timestamp dimensions auto-create from times:. You only author
behavioral dimensions here.
| Field | Type | Status | Behavior |
|---|---|---|---|
| kind | enum | Required | categorical, boolean, integer, continuous, number, percent, currency, date, timestamp. |
| column | string | Auto-derived | From the key. Override only if the physical column differs. |
| domain | string or object | Optional | Value-domain ID or inline {values: […]}. Powers valid-values; without it raises NO_VALID_VALUES_SOURCE. |
| label · description | string | Optional | Display metadata. |
| filterable · groupable | boolean | Optional | Default true. false hides the dimension from suggestions. |
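A short sketch of a dimensions: block that exercises each field in the table. The is_gift and shipping_cost dimensions, the ship_cost_usd column, and the domain values are all hypothetical.

```yaml
model:
  id: orders
  relation: shop_order
  entities:
    order: {}
  dimensions:
    status:
      kind: categorical
      label: Order status
      domain: { values: [pending, shipped, cancelled] }   # hypothetical inline value domain
    is_gift:
      kind: boolean
      groupable: false          # kept filterable, hidden from grouping suggestions
    shipping_cost:
      kind: currency
      column: ship_cost_usd     # override because the physical column differs from the key
```

Key dimensions and the backing dimensions for times: are deliberately absent, since the loader creates them.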
measures:
Each measure declares the explicit triple
(kind, accumulation, value_type) plus the
aggregation expression. expr: and default_agg: live
directly on the measure — no expression: wrapper.
Measures do not auto-publish to metrics. To expose a measure as a governed metric, write the metric explicitly under metrics.
Core fields
| Field | Type | Status | Behavior |
|---|---|---|---|
| measures.<key>.kind | enum | Required | aggregate · entity_count. Drives compiler dispatch: entity_count uses COUNT(DISTINCT entity_key); aggregate uses default_agg. |
| measures.<key>.accumulation | object | Required | Always object form. kind is required and must be one of the enum {flow, stock, event, population}. stock additionally carries snapshot: (start_of_period or end_of_period). Examples: { kind: flow }, { kind: stock, snapshot: end_of_period }. |
| measures.<key>.value_type | enum | Required | currency, count, number, percent, boolean. Must be declared explicitly; there is no inferred default. |
| measures.<key>.expr | string or expression AST | Required for aggregate | The column or scalar expression aggregated by default_agg. String forms parse as Python-like expressions; objects use the expression AST. |
| measures.<key>.entity_key | string | Required for entity_count | The entity name (e.g. order, customer). The loader resolves the entity's canonical column from graph.entities.<x>.key. |
| measures.<key>.default_agg | string | Required for aggregate | sum, avg, min, max, count, count_distinct, median, percentile, first_value, last_value. The default aggregation the API uses if the caller doesn't specify one. |
| measures.<key>.disallowed_aggregations | list<string> | Optional | Subtract from the accumulation-derived allowed set. Effective allowed = derived − disallowed. Use to remove specific aggregations that don't make business sense (e.g. [median] on revenue). |
| measures.<key>.label · description | string | Optional | Display metadata. |
| measures.<key>.time | string | Optional | The temporal role this measure can be queried over. When omitted, inherits from the model's default: true entry in times:. |
| measures.<key>.comparison_family · comparison_mode | string | Optional | Drives same_query vs coordinated_queries selection. Load-bearing for the planner. |
| measures.<key>.validity_windows | list<{from, to, semantics}> | Optional | Time ranges where the measure is meaningful. Queries outside the window raise MEASURE_VALIDITY_BOUNDARY or surface as caveats depending on cross_window_policy. |
| measures.<key>.cross_window_policy | enum | Optional | Default caveat. strict turns out-of-window queries into errors. |
| measures.<key>.external_discontinuities | list<{from, to, what, magnitude_estimate_pct}> | Optional | Documents known external breaks. Surfaces in inspect as caveats. |
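The expr: field accepts either a string or an expression AST object. As a hedged sketch, the string form order_total_cents / 100.0 could be written in object form roughly as follows, assuming the arithmetic, column, and literal node shapes from the Expression AST reference.

```yaml
model:
  id: orders
  relation: shop_order
  entities:
    order: {}
  measures:
    revenue_usd:
      kind: aggregate
      expr:                         # object form of: order_total_cents / 100.0
        kind: arithmetic
        op: divide
        left: { kind: column, column: order_total_cents }
        right: { kind: literal, value: 100.0 }
      default_agg: sum
      accumulation: { kind: flow }
      value_type: currency
```

The string form is usually shorter; the object form is mainly useful when tooling generates measures.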
Full model example
model:
  id: orders
  label: Orders
  relation: shop_order
  # grain derived from primary entity's key; do NOT author alongside entities:
  entities:
    order: {}      # primary
    customer: {}   # FK reference
    product: {}    # FK reference
  times:
    ordered_at:
      column: ordered_at
      kind: timestamp
      class: event_time
      default: true
  dimensions:
    status: { kind: categorical }
  measures:
    revenue_usd:
      label: Revenue (USD)
      kind: aggregate
      expr: order_total_cents / 100.0
      default_agg: sum
      accumulation: { kind: flow }
      value_type: currency
      disallowed_aggregations: [median]
    order_count:
      kind: entity_count
      entity_key: order
      accumulation: { kind: event }
      value_type: count
Metrics (metrics/**.yml)
Metrics codify governed access patterns. Each metric carries a kind:
that determines the required fields. Common kinds use direct named fields;
kind: derived and kind: conversion use the full
expression AST.
References to other measures/metrics use package-relative keys
(revenue_usd), not fully qualified IDs
(metric.shop.revenue_usd). The loader resolves keys.
Top-level fields (any kind)
| Field | Type | Status | Behavior |
|---|---|---|---|
| metrics.<key>.kind | enum | Required | One of: aggregate, ratio, cumulative, rolling, prior_period, period_to_date, semi_additive, derived, conversion. Selects required fields (see below). |
| metrics.<key>.value_type | enum | Required | currency, count, number, percent, boolean. The metric's output type may differ from the underlying measure. |
| metrics.<key>.label · description · examples | string / list | Optional | Display metadata. examples are sample query phrases the runtime can echo back in compile’s explain payload. |
| metrics.<key>.as | string | Optional | Override the auto-derived ID. Same-namespace only. Validator warns if redundant. |
| metrics.<key>.time | string | Optional | Default time axis. Metrics do NOT inherit the model's default_time. |
| metrics.<key>.comparison_family · comparison_mode | string | Optional | Load-bearing for query plan selection. Same as on measures. |
Required fields per kind
| kind | Direct named fields |
|---|---|
| aggregate | measure: <key> |
| ratio | numerator, denominator, null_behavior (default null_if_zero) |
| cumulative | measure; optional window |
| rolling | measure, window: {unit, value} |
| prior_period | measure, period |
| period_to_date | measure, period (resets per period; distinct from cumulative) |
| semi_additive | measure; underlying measure must have accumulation: { kind: stock } |
| derived | expression: <AST> (long-tail case) |
| conversion | expression: { kind: conversion, base, converted, entity, window, matching_mode } |
Aggregate, ratio, and time-series examples
These kinds use direct named fields (no expression AST). Anchors: aggregate, ratio, cumulative, rolling, prior_period, period_to_date.
A cumulative query with a time filter raises
CUMULATIVE_TIME_FILTER_UNSUPPORTED. Use
period_to_date for a bounded running total within a period.
metrics:
  # kind: aggregate (publish a measure)
  revenue_usd:
    label: Revenue (USD)
    kind: aggregate
    measure: revenue_usd
    value_type: currency
  # kind: ratio (direct numerator/denominator)
  aov_usd:
    label: Average order value (USD)
    kind: ratio
    numerator: revenue_usd
    denominator: order_count
    null_behavior: null_if_zero
    value_type: currency
    time: ordered_at
  # time-series kinds: same shape, different window/period field
  cumulative_revenue_usd:
    kind: cumulative
    measure: revenue_usd
    value_type: currency
  revenue_28d:
    kind: rolling
    measure: revenue_usd
    window: { unit: day, value: 28 }
    value_type: currency
  revenue_prior_month:
    kind: prior_period
    measure: revenue_usd
    period: month
    value_type: currency
  revenue_mtd:
    kind: period_to_date
    measure: revenue_usd
    period: month
    value_type: currency
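The semi_additive kind is not covered by the examples above. A minimal sketch follows, assuming a hypothetical inventory model with a units_on_hand column; the file names in the comments are illustrative only. The point it shows is the pairing: the metric names a measure whose accumulation is { kind: stock }.

```yaml
# models/inventory.yml (excerpt, hypothetical model)
model:
  id: inventory
  relation: shop_inventory_snapshot
  entities:
    product: {}
  measures:
    inventory_units:
      kind: aggregate
      expr: units_on_hand
      default_agg: sum
      accumulation: { kind: stock, snapshot: end_of_period }
      value_type: count
---
# metrics/inventory.yml (excerpt, hypothetical file)
metrics:
  inventory_units_eop:
    label: Inventory units (end of period)
    kind: semi_additive
    measure: inventory_units      # must be a stock measure
    value_type: count
```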
derived metric (AST escape hatch)
Use kind: derived for arbitrary formulas over multiple metrics. The
full expression AST is authored under expression:. References to
other metrics use bare keys; the loader resolves them to qualified IDs.
metrics:
  margin_pct:
    label: Gross Margin (%)
    description: (revenue − cogs) / revenue
    kind: derived
    value_type: percent
    expression:
      kind: arithmetic
      op: divide
      left:
        kind: arithmetic
        op: subtract
        left: { kind: metric, metric: revenue_usd }
        right: { kind: metric, metric: cogs_usd }
      right: { kind: metric, metric: revenue_usd }
conversion metric
Counts entities where a base event is followed by a
converted event within a bounded window. Authored as the AST.
entity, window: {unit, value},
matching_mode, and the two event sides
(base, converted) are all mandatory. Missing fields
raise CONVERSION_ENTITY_REQUIRED,
CONVERSION_WINDOW_REQUIRED, or
CONVERSION_MATCHING_MODE_REQUIRED.
metrics:
  signup_to_first_order_7d:
    label: Signup → first order (7d)
    kind: conversion
    value_type: count
    expression:
      kind: conversion
      base: { metric: signup_count }
      converted: { metric: order_count }
      entity: customer
      window: { unit: day, value: 7 }
      matching_mode: first_converted_after_base
      constant_properties: [region]
Segments (segments/*.yml)
Reusable membership filters anchored on an entity plus a basis metric. Previewed
by POST /api/v1/segment-preview, validated by
/segment-validate, and explained by /segment-explain.
| Field | Type | Status | Behavior |
|---|---|---|---|
| entity | string | Required | Entity grain at which segment membership is tested. |
| basis_metric | string | Required | Drives population size and preview rows. |
| preview_dimensions | list<string> | Optional | Dimensions surfaced in segment-preview. |
| membership.where | list<{field, op, value}> | Optional | Dimension-level predicates only, not expression AST. |
| membership.metric_filters | list<{expression, op, value}> | Optional | Expression-level filters on metric values. Full AST; typically {metric: …} or metric_predicate. |
| membership.time · path_policy | map | Optional | Pin the time axis or join path. |
Example
segments:
  high_value_customers:
    entity: customer
    basis_metric: revenue_usd
    preview_dimensions: [region]
    membership:
      metric_filters:
        - expression:
            metric: lifetime_revenue_usd
          op: ">="
          value: 1000
        - expression:
            kind: metric_predicate
            input: { metric: order_count }
            entity: customer
            op: ">="
            value: 3
            scope_mode: entity_only
            window: { unit: day, value: 90 }
          op: "is_true"
Policies (policies.yml)
Optional. Policies live in a single top-level file under
semantic_policies:. Four kinds are recognized by the runtime:
package_release, object_visibility,
object_access, protected_object.
| Field | Type | Status | Behavior |
|---|---|---|---|
| id | string | Required | Stable identifier. |
| kind | enum | Required | package_release, object_visibility, object_access, protected_object. |
| action | string | Required | Paired with kind: visibility uses hidden/visible; access uses deny/redact. |
| audiences · environments · object_ids | list<string> | Optional | Filter by audience/environment, or target specific object IDs (empty = package-wide). |
| config · rationale | map / string | Optional | Kind-specific config (e.g. { mask: "***" }); rationale returned in POLICY_DENIED hints. |
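A sketch of what a policies.yml could look like, built only from the fields in the table. It assumes that semantic_policies: holds a list of entries; the audience and environment names (partner, sandbox) and the policy IDs are hypothetical.

```yaml
semantic_policies:
  - id: hide_margin_from_partners       # hypothetical policy
    kind: object_visibility
    action: hidden
    audiences: [partner]
    object_ids: [metric.shop.margin_pct]
    rationale: Margin is internal-only.
  - id: redact_revenue_in_sandbox       # hypothetical policy
    kind: object_access
    action: redact
    environments: [sandbox]
    object_ids: [metric.shop.revenue_usd]
    config: { mask: "***" }
    rationale: Sandbox users see masked revenue.
```

Environment names used here would also need to appear in package.environments.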
Caveats (caveats.yml)
Optional. Caveats live under semantic_caveats: and surface
as advisory SEMANTIC_CAVEAT_APPLIED warnings on
validate, compile, and query. They never
change SQL, rows, access, discovery, or policy behavior.
| Field | Type | Status | Behavior |
|---|---|---|---|
| id · kind · message | string / enum | Required | kind is one of business_event, definition_change, or data_quality. |
| object_ids · entity_values | list | Required* | At least one targeting field is required. Entity-value caveats fire only when the matching value is filtered or the dimension is exposed, and only on the declared dimension; declare one row per dimension a value is commonly reached through. |
| time.at · time.from/to | date | Optional | Point or half-open range trigger. Time-bound caveats require explicit query time or an inferable comparison window. |
| audiences · environments · severity · owner · references | list / string | Optional | Optional context gates and metadata for the warning payload. severity: info adds definitional framing; warning (the default) means the matched window or slice itself is affected. |
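A sketch of a time-bound caveat targeting one metric. It assumes semantic_caveats: holds a list of entries and that time.from / time.to nest under a time: map; the outage, dates, and owner are invented for illustration.

```yaml
semantic_caveats:
  - id: checkout_outage_march           # hypothetical caveat
    kind: data_quality
    message: Orders were under-recorded during a checkout outage.
    object_ids: [metric.shop.revenue_usd]
    time:
      from: 2024-03-10                  # half-open range trigger
      to: 2024-03-12
    severity: warning
    owner: data-platform
```

A query over revenue_usd whose time range overlaps this window would carry a SEMANTIC_CAVEAT_APPLIED warning; the SQL and rows stay unchanged.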
Expression AST reference
Every expression:, expr:, and metric-filter body is a small
AST. The kind: field selects the node shape; the runtime rejects unknown
kinds with INVALID_EXPRESSION_AST.
Authors may write {measure: …} as shorthand for
{kind: measure, measure: …}, and {metric: …}
as shorthand for {kind: metric, metric: …}; both expand
identically.
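To illustrate the shorthand, the sketch below uses both spellings inside one derived metric. The margin_usd key is hypothetical; revenue_usd and cogs_usd are the metrics used elsewhere on this page.

```yaml
metrics:
  margin_usd:                           # hypothetical metric key
    kind: derived
    value_type: currency
    expression:
      kind: arithmetic
      op: subtract
      left: { metric: revenue_usd }               # shorthand
      right: { kind: metric, metric: cogs_usd }   # explicit form; expands identically
```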
| kind | Required fields | Allowed in |
|---|---|---|
| measure | measure (+ optional aggregation, temporal_role) | measure expr, metric expr, segment metric_filters, query select |
| aggregate | measure, aggregation | query select; rare in metric expr (use kind: aggregate metric instead) |
| metric | metric | metric expr (derived/conversion), segment metric_filters, query select |
| column | column | measure expr, scalar expressions inside metric expr |
| literal | value | any scalar context |
| arithmetic | op, left, right (+ optional null_behavior) | metric expr, measure expr |
| comparison | op, left, right | boolean predicates, case.whens[*].when |
| boolean | op, args | predicate trees |
| call · case · in / not_in · nullif · date_add · between / not_between | See SDK schema for kind-specific required fields. between takes expr, low, high (+ optional negated) and desugars at parse time to BooleanExpr(AND, [Comparison(>=), Comparison(<=)]). | scalar expr inside measure / metric |
| cumulative | input (+ optional partition_by, window_scope) | metric expr (top of cumulative-kind metric, when named fields aren't enough) |
| rolling | input, window: {unit, value} | metric expr (top of rolling-kind metric) |
| prior_period | input, offset: {unit, value} | metric expr (top of prior_period-kind metric) |
| period_to_date | input, period | metric expr (top of period_to_date-kind metric) |
| metric_predicate | input, entity, op, scope_mode (in package context); optional value, time_grain, time_alignment, window | segment metric_filters, query metric_filters |
| scoped_aggregate | measure (+ optional aggregation, temporal_role, predicates[*].metric, where, null_behavior) | metric expr |
| aggregate_if | aggregation (one of count, count_distinct, sum, avg, min, max, median, percentile), condition; required value for all aggregations except count. Column refs inside condition / value must specify entity or table, because there is no surrounding measure to inherit from. Compiles natively to COUNT_IF / SUM_IF on Snowflake and to portable <AGG>(CASE WHEN cond THEN value END) elsewhere. | query select[*].expression or metric_filters[*].expression |
| ratio | numerator, denominator (+ optional null_behavior, default null_if_zero) | metric expr (typically inside kind: derived) |
| conversion | base, converted, entity, window: {unit, value}, matching_mode | metric expr (top of conversion-kind metric) |
metric_predicate alignment matrix
metric_predicate is the most field-heavy AST node; this matrix shows
which combinations of scope_mode and time_alignment are
legal. Other combinations raise INVALID_METRIC_PREDICATE.
| scope_mode | Allowed time_alignment | May declare time_grain? | Use case |
|---|---|---|---|
| contextual | same_query_period (only) | Yes | "Customers whose revenue this period exceeds X." |
| entity_only | query_window or rolling_window_in_period | No | "Customers with ≥ 3 orders in any 90-day window across history." |
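As a sketch of the two legal rows, the segment below carries one contextual predicate and one entity_only predicate. The segment key and thresholds are hypothetical, and the outer op: "is_true" wrapper follows the shape of the segment example earlier on this page.

```yaml
segments:
  engaged_customers:                    # hypothetical segment key
    entity: customer
    basis_metric: revenue_usd
    membership:
      metric_filters:
        # contextual: only same_query_period is legal; time_grain may be declared
        - expression:
            kind: metric_predicate
            input: { metric: revenue_usd }
            entity: customer
            op: ">="
            value: 500
            scope_mode: contextual
            time_alignment: same_query_period
            time_grain: month
          op: "is_true"
        # entity_only: rolling_window_in_period or query_window; no time_grain
        - expression:
            kind: metric_predicate
            input: { metric: order_count }
            entity: customer
            op: ">="
            value: 3
            scope_mode: entity_only
            time_alignment: rolling_window_in_period
            window: { unit: day, value: 90 }
          op: "is_true"
```

Adding time_grain to the second predicate, or pairing contextual with query_window, would fall outside the matrix and raise INVALID_METRIC_PREDICATE.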
Where each expression kind is allowed
The planner enforces context-specific subsets. Putting a metric AST in
where: is the most common authoring trap; where: only
accepts dimension-level predicates.
| Context | Allowed | Not allowed |
|---|---|---|
| query.where | Dimension-level predicates only: {field, op, value}, {field, op: in, values: [...]}. The key is field, not dimension. | Expression AST nodes; use metric_filters instead. |
| query.metric_filters | Full expression AST: metric, aggregate, aggregate_if, metric_predicate, scoped_aggregate, ratio, arithmetic, comparison, boolean, between. | Bare column nodes (the planner cannot resolve a column outside a measure body). |
| measure.expr | Scalar AST: column, literal, arithmetic, comparison, boolean, call, case, nullif, date_add, in, between. | metric, cumulative, rolling, prior_period, period_to_date, conversion, metric_predicate, scoped_aggregate, aggregate_if (aggregate_if is query-level only; declare a normal measure if you want to bake the conditional aggregate into the catalog). |
| metric.expression (kind: derived / conversion) | Everything except column at the top. | Bare column at the top (use a measure body for raw column arithmetic). |
| segment.membership.where[*] | Dimension-level predicates only (same shape as query.where). | Expression AST. |
| segment.membership.metric_filters[*] | Same as query.metric_filters, plus metric_predicate with required scope_mode. | metric_predicate without scope_mode in package context. |
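The split between where: and metric_filters: can be sketched in a single segment. The segment key and the region values are hypothetical; the predicate shapes are the ones listed in the table above.

```yaml
segments:
  emea_repeat_buyers:                   # hypothetical segment key
    entity: customer
    basis_metric: revenue_usd
    membership:
      where:
        # dimension-level predicate: the key is field, not dimension
        - { field: region, op: in, values: [emea] }
      metric_filters:
        # metric-level condition: expression AST goes here, never in where:
        - expression: { metric: order_count }
          op: ">="
          value: 2
```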
Auto-derivations summary
Every rule below fires only when the field is missing or empty — explicit
values are preserved unchanged. package.namespace (default
package.id) is the master switch.
| What | Derived from |
|---|---|
| Entity ID | entity.<ns>_<key> |
| Entity name | <ns>.<TitleKey>, e.g. shop.Customer |
| Key dimensions | graph.entities.<x>.key for every model that exposes the entity |
| Backing date/timestamp dimension | Each times: entry's column: |
| Primary entity | Entity whose canonical key matches model.grain |
| Inferred relationships | Pairs of entities co-declared in model.entities: (excluding bridge-disabled models) |
| Cardinality | Key roles: 1:1, N:1, 1:N, M:N |
| Join safety | safe for 1:1 / N:1; requires_rewrite for 1:N / M:N |
| Allowed aggregations | kind + accumulation; subtract with disallowed_aggregations |
| Measure ID | measure.<ns>.<key> |
| Metric ID | metric.<ns>.<key>; override with as: |
| Default time on measures | Model's default: true entry in times:; metrics do NOT inherit |
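Applied to the shop example, the rules above would produce IDs along these lines. This is a sketch: the derived IDs in the comments follow the patterns in the table, and the file name metrics/revenue.yml is hypothetical.

```yaml
# package.yml (excerpt)
package:
  id: shop
  namespace: shop            # drives every derived ID below
---
# graph.yml (excerpt)
graph:
  entities:
    customer:                # entity ID: entity.shop_customer; name: shop.Customer
      key: customer_id       # key dimension created on every model exposing customer
---
# metrics/revenue.yml (excerpt, hypothetical file name)
metrics:
  revenue_usd:               # metric ID: metric.shop.revenue_usd
    kind: aggregate
    measure: revenue_usd     # resolves to measure.shop.revenue_usd
    value_type: currency
```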
Strict-mode validation rules
When package.schema_strict: true is set, the loader rejects every
legacy authoring form. This is the recommended setting for new packages.
Removed boilerplate (use auto-derivation instead)
- id: on semantic objects. Auto-derived from namespace + key; use as: only to preserve a public reference.
- name: on objects. Auto-derived from key.
- kind: id dimensions. Key dimensions auto-create from graph.entities.<x>.key.
- Duplicate date/timestamp dimension when times: covers the same column. The times: entry creates the backing dimension.
- Authored model.grain: separate from primary. Derived from primary entity key.
- Top-level temporal_role.<id>: separate registration. The times: block key IS the role.
- Model-level entity: (singular) + keys.foreign: + joins: blocks. Use model.entities:; overrides in graph.relationships:.
- Per-model parent_entity: block. Rollup safety lives on graph.relationships.<name>.rollup_safe.
- entity.key_roles (authored). Auto-derived from key role pairs.
Renames
- agg_function: → default_agg: on measures.
- accumulation: stock + sibling snapshot_policy: → nested accumulation: { kind: stock, snapshot: end_of_period }.
- expression: wrapper around expr: / default_agg: on kind: aggregate measures → un-nest. Write expr: and default_agg: directly.
- Buried expression: AST on metric kinds with direct named fields (aggregate, ratio, cumulative, rolling, prior_period, period_to_date, semi_additive) → use direct fields. AST authoring is reserved for kind: derived and kind: conversion.
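A before/after sketch of the measure renames may help when migrating. The legacy form is shown only as comments; the measure key inventory_units, the column units_on_hand, and the aggregation name sum are assumptions made for illustration, not values confirmed by this reference.

```yaml
# Legacy form (rejected under schema_strict: true), kept as comments:
#   measures:
#     inventory_units:
#       kind: aggregate
#       expression:
#         expr: units_on_hand
#         agg_function: sum
#       accumulation: stock
#       snapshot_policy: end_of_period

# v1 form: expr: and default_agg: un-nested, accumulation in object form
measures:
  inventory_units:             # hypothetical measure key
    kind: aggregate
    expr: units_on_hand        # hypothetical column
    default_agg: sum           # assumed aggregation name
    accumulation: { kind: stock, snapshot: end_of_period }
```

Other fields a real measure needs (for example value_type:) are left out to keep the focus on the renamed ones.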
Required additions
- accumulation always object form; kind must be in {flow, stock, event, population}.
- Every authored metric must declare value_type: explicitly. The metric output type may differ from the underlying measure.
- At most one times: entry per model may have default: true.
- No name on a model (column, dimension, measure, entity binding) may appear in any graph entity's disallowed_names:.
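The first three rules can be sketched together in one model and one metric. This is a minimal fragment that reuses field values shown elsewhere on this page; the second time entry shipped_at is hypothetical.

```yaml
# models/orders.yml (fragment)
model:
  times:
    ordered_at:
      column: ordered_at
      kind: timestamp
      class: event_time
      default: true                   # the single default for this model
    shipped_at:                       # hypothetical second entry, no default: here
      column: shipped_at
      kind: timestamp
      class: event_time
  measures:
    order_count:
      kind: entity_count
      entity_key: order
      accumulation: { kind: event }   # object form, kind from the allowed set
      value_type: count

# metrics/order_count.yml (fragment)
metrics:
  order_count:
    kind: aggregate
    measure: order_count
    value_type: count                 # declared explicitly on the metric
```

Adding default: true to shipped_at as well would be expected to raise MULTIPLE_DEFAULT_TIME.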
Dropped (no replacement)
- auto_publish: on measures. Auto-publish is gone. Author explicit metrics.
- package.default_account_id. Vestigial, no consumer.
- dimension.preferred_filter_ops. Metadata-only, no planner gating.
- measure.clock_variants, comparison_peers, preferred_companion_metrics. Metadata-only on measures. comparison_family and comparison_mode stay (load-bearing). preferred_companion_metrics is allowed on metrics as advisory governance metadata; companion-metric relationships are too volatile to lock in at the measure layer.
- topics: on any object. No validation, no scaling pattern.
- policy.kind: plan_constraint. Runtime no-op. The four real kinds are package_release, object_visibility, object_access, protected_object.
Warnings (advisory, not blocking):
- as: used where the resulting ID matches the auto-computed one (the override is unnecessary).
What schema_strict does and does not catch
schema_strict: true rejects the legacy authoring forms listed
above and unknown values for typed fields (e.g. kind: catagorical
is caught with a list of valid kinds). It may not reject unknown top-level
field names on every object — a typo such as
dimentions: on a model: block can be silently
dropped along with the entire mistyped block. Always cross-check authoring
outcomes with the check command and confirm the catalog
contains every object you authored before trusting a parse-clean result.
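The two typo classes described above can be placed side by side. The fragment below is deliberately broken to show them; the channel dimension is hypothetical.

```yaml
# models/orders.yml (fragment, intentionally wrong)
model:
  id: orders
  dimensions:
    status: { kind: catagorical }    # typed-field typo: rejected, with the list of valid kinds
  dimentions:                        # field-name typo: may be dropped silently
    channel: { kind: categorical }   # ...along with everything nested under it
```

In the second case the parse can come back clean while channel never reaches the catalog, which is why the catalog cross-check matters.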
Authoring contract version
Every package must declare schema_version: 1 at the top of
package.yml. Any other value raises INVALID_CONFIG at
load time.
schema_version is a stamp, not a feature toggle. The number bumps on a
breaking authoring change — not on additive features. The runtime no longer
has a version-dispatch path; 1 is the only contract.
Connection-mode decision
Every warehouse except duckdb declares a
package.connection block whose kind selects the
connector. For warehouse: snowflake, kind selects
between two paths.
| Use case | Pick | Why |
|---|---|---|
| Local development, snow CLI configured | kind: snowflake_cli | No env vars needed. The runtime shells out to snow connection test and reuses the CLI's named profile. |
| Production / container deployments | kind: snowflake_native | Pulls account, user, and password from environment variables (or files via password_file_env). Supports query_tag for attribution. Requires the connector extra: uv sync --extra snowflake-native. |
Both kinds require name. snowflake_cli reads it as the CLI
profile name; snowflake_native reads it as a logical label surfaced in
explain and operational metadata.
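For local development, a CLI-backed connection block could look roughly like this. The profile name dev is hypothetical; it stands for whatever named profile the snow CLI already has configured.

```yaml
# package.yml (fragment)
package:
  warehouse: snowflake
  connection:
    kind: snowflake_cli
    name: dev          # hypothetical snow CLI profile name
```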
Native connection options
- account_env · user_env · password_env. Env var names holding the credential. Literal credentials are rejected.
- password_file_env. Env var holding a file path; the file's content is the password.
- warehouse · role · database · schema. Per-session defaults applied at connection time.
- query_tag. Attached to every query; surfaces in Snowflake query history.
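Put together, a native connection block might be sketched as follows. The SR_SNOWFLAKE_* variable names, the label, and the session defaults are all assumptions chosen for illustration; the canonical env-var names are whatever the repo's .env.example lists.

```yaml
# package.yml (fragment)
package:
  warehouse: snowflake
  connection:
    kind: snowflake_native
    name: prod_analytics                  # logical label, hypothetical
    account_env: SR_SNOWFLAKE_ACCOUNT     # env var NAMES, never literal credentials
    user_env: SR_SNOWFLAKE_USER
    password_env: SR_SNOWFLAKE_PASSWORD   # or password_file_env for a file path
    warehouse: ANALYTICS_WH               # per-session defaults (hypothetical values)
    role: ANALYST
    database: ANALYTICS
    schema: PUBLIC
    query_tag: semantic-rails             # appears in Snowflake query history
```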
Other warehouses
The remaining warehouses each expose a single native kind:
postgres_native, bigquery_native,
databricks_native, motherduck_native,
ducklake_native, athena_native,
clickhouse_native. The same secrets convention applies everywhere:
credentials are never literals: they arrive through env-var indirection
(*_env names) or file paths (*_file), while non-secret
locators like host or database may be literal or
env-indirected.
package:
  warehouse: postgres
  connection:
    kind: postgres_native
    host_env: SR_POSTGRES_HOST
    port: 5432
    database: sr_jaffle
    schema: public
    user_env: SR_POSTGRES_USER
    password_env: SR_POSTGRES_PASSWORD
The full option list for each warehouse lives in the per-warehouse
*_CONNECTION_OPTIONS tuples in
semantic_rails/dialects.py, with canonical env-var names in
.env.example at the repo root. Redshift is coming next; until then
see docs/ADDING_A_DIALECT.md.
Error code → root cause
Every error code raised at load or plan time maps to a small number of authoring
fields. Use this table when an error appears in validate-config,
compile, or explain. The full inventory and request /
response shapes live in the
API reference.
| Code | Meaning | Typical authoring fix |
|---|---|---|
| INVALID_CONFIG | YAML failed authoring-contract validation | Check schema_version, package.warehouse, and package.seed / package.connection. |
| STRICT_LEGACY_FORM_REJECTED | Strict mode rejected a legacy authoring form | See strict-mode rules for the full table of rejected forms and replacements. |
| DISALLOWED_NAME | Column / dimension / measure used a name in graph.entities.<x>.disallowed_names | Use the canonical column from graph.entities.<x>.key or declare an expr: rename in model.entities:. |
| INVALID_ACCUMULATION_KIND | accumulation.kind not in {flow, stock, event, population} | Use one of the four canonical values. |
| METRIC_VALUE_TYPE_REQUIRED | Authored metric is missing value_type: | Declare value_type: explicitly. The metric's output type may differ from the underlying measure. |
| MULTIPLE_DEFAULT_TIME | More than one times: entry on a model has default: true | Pick one default per model. |
| OBJECT_NOT_FOUND | Referenced ID does not exist | Verify the auto-derived form (auto-derivations). |
| INVALID_EXPRESSION_AST | Expression node missing required fields or unknown kind | Cross-reference the expression matrix. |
| AMBIGUOUS_ALIAS | Human token resolves to multiple objects | Tighten graph.entities.<key>.synonyms or use stable IDs. |
| AMBIGUOUS_PATH · PATH_NOT_FOUND | Join path ambiguous or missing | Pin path_preference / query.path_policy; add an FK in model.entities:. |
| FANOUT_UNSAFE · ROLLUP_UNSAFE | Join or rollup smears values | Check cardinality, safety, and per-direction rollup_safe. |
| UNSUPPORTED_AGGREGATION | Aggregation not in the measure's allowed list | Adjust disallowed_aggregations or pick another. |
| INVALID_TEMPORAL_ROLE · INCOMPATIBLE_TEMPORAL_ROLE | Time grain or role rejected | Check times.<key>.supported_grains and metric time:. |
| MEASURE_VALIDITY_BOUNDARY | Query reaches outside validity window | Adjust the query or revise validity_windows / cross_window_policy. |
| MIXED_GRAIN_INVALID · REWRITE_NOT_SUPPORTED | Mixed grains cannot be safely combined | Split the query, align time grains, or expose pre-aggregated measures. |
| CUMULATIVE_TIME_FILTER_UNSUPPORTED | Time filter inside a cumulative query | Use period_to_date or move filter to partition_by. |
| CONVERSION_* (entity / window / matching_mode required, not supported) | Conversion metric malformed | See conversion metric kind: entity, window: {unit, value}, matching_mode all required. |
| PREDICATE_* · INVALID_METRIC_PREDICATE | metric_predicate malformed or context-incompatible | See the metric_predicate row + alignment matrix in expression AST. Package context requires scope_mode. |
| DUPLICATE_OUTPUT_ALIAS · INVALID_ORDER_BY | Query select / order-by structural issue | Rename with as: ...; reference a select alias or grouped dimension. |
| NO_VALID_VALUES_SOURCE | valid-values on a dimension without a domain | Add dimensions.<key>.domain. |
| POLICY_DENIED · OUT_OF_SCOPE | Policy rejected request, or request outside semantic-query planning | Check policies.yml + audience headers. The runtime's scope classifier runs inline on every discover / plan call and returns out_of_scope / low_relevance blocks with closest matches when applicable. |
| INVALID_QUERY · QUERY_EXECUTION_ERROR | Query structurally invalid or warehouse rejected SQL | Confirm against the planner contract; read explain for warehouse error. |
Smallest valid package
One entity, one model with one dimension / time / measure, one metric (the loader requires at least one), no segment, no policy. Extend section-by-section by referring back to the relevant component above.
# package.yml
schema_version: 1
package:
  id: shop_min
  namespace: shop_min
  warehouse: duckdb
  default_db: data/shop_min.duckdb
  seed: { kind: sql_script, source: data/seed.sql }
  schema_strict: true

# graph.yml
graph:
  entities:
    order:
      label: Order
      key: [order_id]            # list form is required
      model: orders              # which model declares this entity as primary

# models/orders.yml
model:
  id: orders                     # required; the mapping key alone is not enough
  label: Orders
  relation: shop_min_order
  # grain: is derived from the primary entity's key; do NOT author it
  # alongside `entities:` (strict mode rejects the combination).
  entities:
    order: {}                    # primary; grain comes from graph's order.key
  times:
    ordered_at:
      column: ordered_at
      kind: timestamp
      class: event_time
      default: true
  dimensions:
    status: { kind: categorical }
  measures:
    order_count:
      kind: entity_count
      entity_key: order
      accumulation: { kind: event }
      value_type: count

# metrics/order_count.yml (at least one metric recipe is required)
metrics:
  order_count:
    label: Order Count
    description: Count of orders. Codified for stable reference.
    kind: aggregate
    measure: order_count
    value_type: count
- model.id: is required on every model. The mapping key alone is not sufficient.
- graph.entities.<x>.key: takes list form ([order_id]), even for a single column.
- graph.entities.<x>.model: binds the entity to the model that declares it as primary.
- Do not author model.grain: when entities: is present. The loader derives grain from the primary entity's key, and authoring both is rejected.
uv run semantic-rails parse-config --path /path/to/shop_min returns
ok: true; the catalog exposes
measure.shop_min.order_count and
metric.shop_min.order_count. A package must declare at least one
metric recipe to load — the metrics/ file above is not optional.
Fields the author does not write (measure_class, computed catalog fields, runtime decoration)
live in semantic_rails/schema.py and are out of scope here.
Next: pair this reference with the narrative Package Authoring guide, Query Planner for how authored objects flow into plans, or the API Reference for the wire shape on every route.