Semantic Rails authoring reference

Measures vs metrics

Before any syntax, set the frame. Measures and metrics are two distinct surfaces in the catalog with two distinct purposes.

Measures: primitives (building blocks). Columnar facts (sum, count, count-distinct), queryable flexibly by any reachable entity, time grain, dimension, or aggregation function within the measure's allowed set. ARR is a measure: the sum of monthly contributions.

Metrics: governed access patterns. Named, stable contracts that codify a specific use of one or more measures with conditions, filters, time alignment, or composition. NRR is a metric: (start_arr + expansion_arr − churn_arr) / start_arr, with cohort and time-alignment conditions.
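To make the NRR arithmetic concrete, here is the formula alone as a small Python function with made-up figures. The function name is illustrative. The cohort and time-alignment conditions that make NRR a governed metric rather than a bare calculation are deliberately not modeled.

```python
def nrr(start_arr: float, expansion_arr: float, churn_arr: float) -> float:
    """Net revenue retention as written above: (start + expansion - churn) / start.

    Cohort selection and time alignment, which the governed metric also encodes,
    are out of scope for this arithmetic-only sketch.
    """
    if start_arr == 0:
        raise ValueError("start_arr must be non-zero")
    return (start_arr + expansion_arr - churn_arr) / start_arr


# Hypothetical cohort: 100k starting ARR, 20k expansion, 10k churn.
print(nrr(100_000, 20_000, 10_000))  # 1.1
```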

Implications:

Not every measure needs a corresponding metric. Measures are queryable as primitives.
The catalog lists measures and metrics as distinct surfaces.
Measures do not auto-publish to metrics. Author the metric explicitly when an access pattern deserves a stable contract.

What the loader does for you

The v1 contract is terse because the loader fills in identifiers and traversal deterministically. Author the business meaning; the loader handles:

Key-derived IDs. Every object's id and name come from package.namespace + key. Override with as: only to preserve a public reference.
Auto-created key dimensions. Created from graph.entities.<x>.key: on every model that exposes the entity.
Inferred relationships. Any pair of entities co-declared in model.entities: produces a default RelationshipConfig. Author overrides in graph.relationships: only when needed.
Primary entity auto-detect. The entity whose canonical key matches model.grain:. No marker needed.
Backing time dimensions. Auto-created from each times: entry's column:.
Default aggregations. Accumulation class drives the allowed set; subtract with disallowed_aggregations:.

The full table is in the auto-derivations summary.
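The sketch below illustrates two of these derivations in plain Python. It assumes the qualified-ID shape `<kind>.<namespace>.<key>` that appears in the `metric.shop.revenue_usd` example in the Metrics section, and it assumes `as:` replaces only the key segment. Both helper names are hypothetical and are not the loader's API.

```python
from __future__ import annotations

from itertools import combinations


def derive_id(kind: str, namespace: str, key: str, as_: str | None = None) -> str:
    """Qualified ID from namespace + key (hypothetical helper).

    `as_` stands in for the `as:` override. This sketch assumes it replaces only
    the key segment, which keeps the override inside the same namespace.
    """
    return f"{kind}.{namespace}.{as_ or key}"


def infer_relationships(model_entities: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Each pair of entities co-declared on one model yields a default relationship.

    Relationships are unordered pairs, so each pair is stored sorted. This also
    deduplicates pairs that are declared on more than one model.
    """
    pairs: set[tuple[str, str]] = set()
    for entities in model_entities.values():
        for pair in combinations(sorted(set(entities)), 2):
            pairs.add(pair)
    return pairs


print(derive_id("metric", "shop", "revenue_usd"))  # metric.shop.revenue_usd
print(sorted(infer_relationships({
    "orders": ["order", "customer", "product"],
    "order_items": ["order_item", "order", "product"],
})))
```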

At a glance

A package has seven concrete components plus the expression AST. Each one auto-derives most identifiers when package.namespace is set.

1. package.yml: schema version, identity, warehouse, connection, seeds, defaults, schema_strict.
2. graph.yml: canonical entities, disallowed_names, explicit relationship overrides.
3. models/<id>.yml: grain, model.entities:, dimensions, times:, measures.
4. metrics/**.yml: governed access patterns. Direct named fields per kind; AST for the long tail.
5. segments/*.yml: entity-bounded membership filters anchored on a basis metric.
6. policies.yml: visibility, access, release labels, protected objects.
7. caveats.yml: human-authored advisory context surfaced as query warnings.
+ Expression AST: the shape of every expr:, kind: derived body, and metric filter.

Related references:
Auto-derivations: everything the loader fills in for you when you set namespace.
Error codes: every load- and plan-time error mapped to the field that caused it.
Smallest valid package: the minimum YAML the loader accepts; a safe starting point.

Package directory layout

Path	Required?	Purpose
`package.yml`	Required	Identity, warehouse target, connection, seeds, `schema_strict` flag.
`graph.yml`	Required	Canonical entities + explicit relationship overrides.
`models/**/*.yml`	At least one	One model per warehouse table or mart. Declares `model.entities:`, `times:`, dimensions, measures.
`metrics/**/*.yml`	Optional	Governed access patterns: ratio, cumulative, derived, conversion, etc.
`segments/*.yml`	Optional	Reusable filter sets anchored on an entity plus a basis metric.
`policies.yml`	Optional	Visibility, access, release labels, protected objects.
`caveats.yml`	Optional	Advisory business, definition, or data-quality context emitted as warnings when relevant.
`examples/*.yml`	Optional	Named example queries surfaced through discovery and inspect routes.
`tests/*.yml`	Optional	Package-local regression tests. The runner walks this directory directly.

Component 1 / 7

package.yml

Declares the schema version, identity, warehouse target, seed inputs (DuckDB) or connection (every other warehouse), declared environments, the schema_strict flag, and defaults inherited by dimensions, temporal roles, and measures.

When to edit: once at package creation, then rarely, whenever you add a deployment environment, switch warehouses, or update seed assets.

Top-level fields

Field	Type	Status	Behavior
`schema_version`	integer	Required	Only accepted value is `1`; anything else raises `INVALID_CONFIG`.
`package.id`	string	Required	Stable identifier used by the CLI (`--package <id>`) and the catalog.
`package.namespace`	string	Auto-derived	Defaults to `package.id`. Drives every auto-derived ID. Setting it explicitly stabilizes IDs.
`package.warehouse`	enum	Required	One of `duckdb`, `snowflake`, `postgres`, `bigquery`, `databricks`, `motherduck`, `ducklake`, `athena`, `clickhouse`.
`package.default_db`	string	Conditional	Required for `warehouse: duckdb`.
`package.connection`	object	Conditional	Required for every warehouse except `duckdb`. See connection-mode.
`package.seed`	object	Conditional	Required for DuckDB. Sub-fields: `kind`, `source`, optional `post_sql`, `null_strings`.
`package.environments`	list<string>	Optional	Declared environment names policies can target.
`package.schema_strict`	boolean	Optional	Default `false`. `true` rejects legacy authoring forms (scalar `key:`, redundant `grain:` alongside `entities:`, etc.) and unknown values for typed fields like `kind:` — see strict-mode rules. Recommended for new packages. See also the scope note below.
`defaults`	map<string, map>	Optional	Per-row inheritable defaults: `defaults.dimension`, `defaults.time`, `defaults.measure`, `defaults.relationship`.
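The conditional rules in this table can be checked mechanically. Below is a minimal sketch over an already-parsed package.yml, written as a plain dict. `check_package` and `effective_namespace` are hypothetical names. Only `INVALID_CONFIG` is an error code taken from this reference; the other messages are illustrative text, not real loader codes.

```python
from __future__ import annotations

from typing import Any

# Accepted package.warehouse values, copied from the table above.
WAREHOUSES = {
    "duckdb", "snowflake", "postgres", "bigquery", "databricks",
    "motherduck", "ducklake", "athena", "clickhouse",
}


def check_package(doc: dict[str, Any]) -> list[str]:
    """Return problems found in a parsed package.yml (hypothetical checker)."""
    problems: list[str] = []
    if doc.get("schema_version") != 1:
        problems.append("INVALID_CONFIG: schema_version must be 1")
    pkg = doc.get("package", {})
    if not pkg.get("id"):
        problems.append("package.id is required")
    warehouse = pkg.get("warehouse")
    if warehouse not in WAREHOUSES:
        problems.append(f"unknown package.warehouse: {warehouse!r}")
    elif warehouse == "duckdb":
        # Per the table: DuckDB needs default_db and seed.
        for field in ("default_db", "seed"):
            if field not in pkg:
                problems.append(f"package.{field} is required for duckdb")
    elif "connection" not in pkg:
        # Every other warehouse is reached through package.connection.
        problems.append(f"package.connection is required for {warehouse}")
    return problems


def effective_namespace(doc: dict[str, Any]) -> str:
    """package.namespace falls back to package.id when omitted."""
    pkg = doc["package"]
    return pkg.get("namespace", pkg["id"])


shop = {
    "schema_version": 1,
    "package": {
        "id": "shop",
        "warehouse": "duckdb",
        "default_db": "data/shop.duckdb",
        "seed": {"kind": "sql_script", "source": "data/seed.sql"},
    },
}
print(check_package(shop), effective_namespace(shop))  # [] shop
```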

Example

For remote warehouses (Snowflake, Postgres, BigQuery, Databricks, and the rest), swap seed for connection — see connection-mode decision.

package.yml — DuckDB

schema_version: 1

package:
  id: shop
  namespace: shop
  warehouse: duckdb
  default_db: data/shop.duckdb
  seed: { kind: sql_script, source: data/seed.sql }
  schema_strict: true

defaults:
  dimension: { groupable: true, filterable: true }
  time:
    timezone: UTC
    supported_grains: [day, week, month, quarter, year]

Component 2 / 7

graph.yml

The canonical entity registry. Each entry declares one business object the package exposes (customer, order, product, …), its key column, and any non-default relationships to other entities.

`graph.entities:`

Field	Type	Status	Behavior
`key`	string \| list<string>	Required	Canonical column name(s). String for single-key; list for compound (e.g. `[customer_id, valid_from]`). Single source for the entity's column — models bind to it automatically.
`label` · `description`	string	Optional	Display metadata.
`allowed_as_root`	boolean	Optional	Default `true`. Set `false` for snapshot/junction entities.
`synonyms`	list<string>	Optional	Alternate human terms for `resolve` / `discover`. Excessively broad terms produce `AMBIGUOUS_ALIAS`.
`disallowed_names`	list<string>	Optional	Explicit anti-pattern guard. Names that may never appear as a column/dimension/measure on any model. Validator rejects and suggests `expr:` for intentional renames.

`graph.relationships:`

Most relationships are inferred from FK references in model.entities: blocks. Author an explicit graph.relationships: entry only when you need a non-default rule: per-direction rollup safety, SCD2 temporal_validity:, custom cardinality:, or allowed_directions: restriction.

Relationships are bidirectional. The entities: field is an unordered pair; cardinality and rollup safety are expressed relative to that pair.

Field	Type	Status	Behavior
`graph.relationships.<name>.entities`	[string, string]	Required	The unordered pair of entities the relationship connects.
`graph.relationships.<name>.cardinality`	enum	Auto-derived	`one_to_one`, `many_to_one`, `one_to_many`, or `many_to_many`. `many_to_one` means "first entity is many; second is one." The loader infers from key roles; override only when needed.
`graph.relationships.<name>.safety`	enum	Auto-derived	`safe`, `requires_rewrite`, `unsafe`. Defaults from cardinality (`safe` for 1:1/N:1; `requires_rewrite` for 1:N/M:N). `unsafe` joins are rejected outright.
`graph.relationships.<name>.allowed_directions`	list<string>	Optional	Default `[forward, reverse]`. Restricts which traversal directions the planner may pick.
`graph.relationships.<name>.rollup_safe`	{forward, reverse}	Optional	Per-direction rollup safety. `forward:` lists aggregations safe when rolling up from the first entity to the second; `reverse:` lists those safe when rolling up the other way. Either may be empty. Replaces the legacy per-model `parent_entity:` block.
`graph.relationships.<name>.temporal_validity`	{valid_from, valid_to}	Required for SCD2	Names the columns that bound an SCD2 record's validity. The planner automatically appends a validity-range predicate to the join condition. Without it, joins to history entities can return duplicate rows.
`target_key_role` · `source_key_role`	enum	Optional	`primary`, `unique`, `foreign`, `natural`. Disambiguates compound-key joins.
`target_key_type`	enum	Optional	Default `primary`. Either `primary` or `identifier`.
`path_preference`	integer	Optional	Default `100`. Lower wins when multiple paths are valid — resolves `AMBIGUOUS_PATH`.
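The default-safety and path-preference rules can be sketched as two small functions. The defaults mirror the `safety` and `path_preference` rows. One assumption is labeled: when several valid paths tie at the best preference, the sketch reports `AMBIGUOUS_PATH`, which is a reading of "resolves AMBIGUOUS_PATH" rather than documented behavior. The function names are hypothetical.

```python
from __future__ import annotations

# Defaults per the safety row above: safe for 1:1 / N:1, requires_rewrite for 1:N / M:N.
DEFAULT_SAFETY = {
    "one_to_one": "safe",
    "many_to_one": "safe",
    "one_to_many": "requires_rewrite",
    "many_to_many": "requires_rewrite",
}


def effective_safety(cardinality: str, override: str | None = None) -> str:
    """Authored safety wins; otherwise derive it from cardinality. unsafe is rejected."""
    safety = override or DEFAULT_SAFETY[cardinality]
    if safety == "unsafe":
        raise ValueError("unsafe joins are rejected outright")
    return safety


def pick_path(preferences: dict[str, int | None]) -> str:
    """Pick one of several valid paths by path_preference (default 100, lower wins).

    Assumption: a tie at the best preference is reported as AMBIGUOUS_PATH.
    """
    ranked = {name: 100 if pref is None else pref for name, pref in preferences.items()}
    best = min(ranked.values())
    winners = sorted(name for name, pref in ranked.items() if pref == best)
    if len(winners) > 1:
        raise ValueError("AMBIGUOUS_PATH: " + ", ".join(winners))
    return winners[0]


print(effective_safety("many_to_one"))                   # safe
print(pick_path({"via_orders": 50, "via_items": None}))  # via_orders
```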

Example

graph.yml — shop

graph:
  entities:
    customer:
      label: Customer
      key: customer_id
      synonyms: [buyer, account]
      disallowed_names: [cust_id, custid, customerid]
    order:
      label: Order
      key: order_id
      disallowed_names: [ord_id, orderid]
    order_item:
      label: Order item
      key: order_item_id
    product:
      label: Product
      key: product_id
    customer_history:
      label: Customer history
      key: [customer_id, valid_from]
      allowed_as_root: false

  relationships:
    customer_history_x_customer:
      entities: [customer_history, customer]
      cardinality: many_to_one
      safety: requires_rewrite
      temporal_validity:
        valid_from: effective_from
        valid_to: effective_to
      rollup_safe:
        forward: [sum, count]
        reverse: []

Component 3 / 7

Models (models/<id>.yml)

A model declares one warehouse table or mart, the entities it exposes, and the dimensions, temporal roles, and measures attached to its grain. The loader merges every YAML file under models/** into the package config.

Top-level model fields

Field	Type	Status	Behavior
`model.id`	string	Auto-derived	From the YAML filename when omitted.
`model.relation`	string	Required	Physical relation: warehouse table, view, or seed name.
`model.grain`	list<string>	Required	One row per this. Drives planner fanout safety. Compound grain supported.
`model.label` · `description`	string	Optional	Display metadata.
`model.defaults`	map	Optional	Per-model overrides for `defaults.dimension`, `defaults.time`, etc. Merge: package → model → per-row.
`model.default_variant`	string	Optional	Names the base physical variant, usually the transaction table.
`model.variants`	map<string, map>	Optional	Alternate physical tables for the same semantic model. Non-transaction variants normalize into exact aggregate relations the planner can route to.
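The package → model → per-row merge order for defaults can be pictured as layered dictionary updates. This is a minimal sketch that assumes the merge is shallow and field by field, so a later layer replaces only the fields it sets. `resolve_row` and the model-level override in the example are hypothetical.

```python
def resolve_row(section: str, package_defaults: dict, model_defaults: dict, row: dict) -> dict:
    """Merge one row's settings: package defaults, then model defaults, then the row.

    Assumption: the merge is shallow, so a later layer only replaces the fields
    it actually sets.
    """
    merged: dict = {}
    merged.update(package_defaults.get(section, {}))
    merged.update(model_defaults.get(section, {}))
    merged.update(row)
    return merged


package_defaults = {"dimension": {"groupable": True, "filterable": True}}
model_defaults = {"dimension": {"filterable": False}}  # hypothetical model override
print(resolve_row("dimension", package_defaults, model_defaults, {"kind": "categorical"}))
# {'groupable': True, 'filterable': False, 'kind': 'categorical'}
```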

`model.entities:`

Required on every model. Declares which entities the model exposes. Each entry binds by default to the entity's canonical column from graph.entities.<x>.key:; override per entity with expr: when the model's column name differs.

The primary entity is auto-detected: whichever entity's canonical key column matches model.grain:. For compound-grain models, all entities whose keys are in the grain are co-primary.
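A minimal sketch of primary-entity detection follows. It reads "matches" as "every canonical key column of the entity appears in the grain", which yields one primary for a single-column grain and several co-primaries for a compound grain. `expr:` overrides are not modeled. The function name is hypothetical, and `segment_id` is an invented key because the graph.yml example does not declare a segment entity.

```python
from __future__ import annotations


def primary_entities(
    graph_keys: dict[str, str | list[str]],
    model_entities: list[str],
    grain: list[str],
) -> list[str]:
    """Entities whose canonical key column(s) all appear in the model grain.

    A single-key grain yields one primary; a compound grain can yield several
    co-primary entities.
    """
    grain_cols = set(grain)
    primaries = []
    for entity in model_entities:
        key = graph_keys[entity]
        key_cols = {key} if isinstance(key, str) else set(key)
        if key_cols <= grain_cols:
            primaries.append(entity)
    return primaries


graph_keys = {"order": "order_id", "customer": "customer_id", "segment": "segment_id"}
print(primary_entities(graph_keys, ["order", "customer"], ["order_id"]))  # ['order']
print(primary_entities(graph_keys, ["customer", "segment"], ["customer_id", "segment_id"]))
# ['customer', 'segment']
```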

Field	Type	Status	Behavior
`model.entities.<entity>.expr`	string	Optional	Override the column the entity binds to. Use when the model's column name differs from `graph.entities.<entity>.key:`.
`model.entities.<entity>.label`	string	Optional	Per-binding display override; rarely needed.
`model.entities.bridge`	boolean	Optional	Block-level option (sibling of the entity entries). Default `true`. Set `false` when this model should NOT be auto-used as a join path between other entities. Queries within the model still work; the planner just won't route through it.

Use cases for bridge: false: junction/mapping tables (m:n bridges), denormalized snapshots that shouldn't be joined to live data, and partial bridges whose data isn't complete enough for arbitrary multi-hop traversal.

Examples

model.entities patterns

# Default binding — columns auto-resolved from graph
model:
  id: orders
  relation: shop_order
  # grain derived from primary entity's key (graph's order.key)
  entities:
    order: {}                      # primary; column = graph's order_id
    customer: {}                   # FK reference; column = graph's customer_id
    product: {}

---
# Column rename via expr:
model:
  id: order_renamed_columns
  relation: shop_order
  entities:
    order: { expr: ord_id }        # primary, column renamed
    customer: { expr: cust_id }    # FK, column renamed

---
# Junction table, not used as a join path. Compound grain comes from
# the bridge's two primary entity keys, not an authored `grain:` field.
model:
  id: customer_segment_membership
  relation: shop_customer_segment
  entities:
    bridge: false
    customer: {}
    segment: {}

`times:` (temporal roles)

The times: block key IS the temporal role. The backing date/timestamp dimension is auto-created from column:. default: true picks the implicit time axis when a query omits time.temporal_role.

Field	Type	Status	Behavior
`times.<key>.column`	string	Required	The model column. Auto-creates a backing dimension.
`times.<key>.kind`	enum	Required	`date` or `timestamp`.
`times.<key>.class`	enum	Required	`event_time`, `calendar_time`, `as_of_time`, or `state_time`. Distinguishes flow / calendar / snapshot / SCD2-validity semantics.
`times.<key>.supported_grains`	list<string>	Optional	Default `[day, week, month, quarter, year]`.
`times.<key>.default`	boolean	Optional	Picks the implicit time axis. At most one `default: true` per model.
`times.<key>.timezone` · `label`	string	Optional	Default timezone `UTC`; label is display text.
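The default-axis rule and the optional per-role defaults are easy to state in code. This is a sketch over a parsed `times:` block. `default_time_role` and `role_settings` are hypothetical names, and the `shipped_at` role is invented for illustration.

```python
from __future__ import annotations

DEFAULT_GRAINS = ["day", "week", "month", "quarter", "year"]


def default_time_role(times: dict[str, dict]) -> str | None:
    """Return the times: key marked default: true, or None when no role is marked."""
    defaults = [role for role, spec in times.items() if spec.get("default") is True]
    if len(defaults) > 1:
        raise ValueError(f"at most one default: true per model, found {defaults}")
    return defaults[0] if defaults else None


def role_settings(spec: dict) -> dict:
    """Fill the optional per-role fields with the defaults listed above."""
    return {
        "supported_grains": spec.get("supported_grains", DEFAULT_GRAINS),
        "timezone": spec.get("timezone", "UTC"),
    }


times = {
    "ordered_at": {"column": "ordered_at", "kind": "timestamp", "class": "event_time", "default": True},
    "shipped_at": {"column": "shipped_at", "kind": "timestamp", "class": "event_time"},
}
print(default_time_role(times))                         # ordered_at
print(role_settings(times["shipped_at"])["timezone"])  # UTC
```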

`dimensions:`

Behavioral attributes attached to the model's grain. The dimension's kind drives type coercion in SQL lowering and the filter-operator menu surfaced through build-options.

Key dimensions auto-create from graph.entities.<x>.key. Backing date/timestamp dimensions auto-create from times:. You only author behavioral dimensions here.

Field	Type	Status	Behavior
`kind`	enum	Required	`categorical`, `boolean`, `integer`, `continuous`, `number`, `percent`, `currency`, `date`, `timestamp`.
`column`	string	Auto-derived	Defaults to the dimension's key. Override only if the physical column differs.
`domain`	string \| object	Optional	Value-domain ID or inline `{values: […]}`. Powers `valid-values`; without it, `valid-values` raises `NO_VALID_VALUES_SOURCE`.
`label` · `description`	string	Optional	Display metadata.
`filterable` · `groupable`	boolean	Optional	Default `true`. `false` hides the dimension from suggestions.

`measures:`

Each measure declares the explicit triple (kind, accumulation, value_type) plus the aggregation expression. expr: and default_agg: live directly on the measure — no expression: wrapper.

Measures do not auto-publish to metrics. To expose a measure as a governed metric, write the metric explicitly under metrics.

Core fields

Field	Type	Status	Behavior
`measures.<key>.kind`	enum	Required	`aggregate` · `entity_count`. Drives compiler dispatch: `entity_count` uses `COUNT(DISTINCT entity_key)`; `aggregate` uses `default_agg`.
`measures.<key>.accumulation`	object	Required	Always object form. `kind` is required and must be one of the enum `{flow, stock, event, population}`. `stock` additionally carries `snapshot:` (`start_of_period` \| `end_of_period`). Examples: `{ kind: flow }`, `{ kind: event }`, `{ kind: stock, snapshot: end_of_period }`.
`measures.<key>.value_type`	enum	Required	`currency`, `count`, `number`, `percent`, `boolean`. Required — explicit declaration; no inferred default.
`measures.<key>.expr`	string \| expression AST	Required for `aggregate`	The column or scalar expression aggregated by `default_agg`. String forms parse as Python-like expressions; objects use the expression AST.
`measures.<key>.entity_key`	string	Required for `entity_count`	The entity name (e.g. `order`, `customer`). The loader resolves the entity's canonical column from `graph.entities.<x>.key`.
`measures.<key>.default_agg`	string	Required for `aggregate`	`sum`, `avg`, `min`, `max`, `count`, `count_distinct`, `median`, `percentile`, `first_value`, `last_value`. The default aggregation the API uses if the caller doesn't specify one.
`measures.<key>.rollup`	enum	Optional	Physical-variant routing hint. Use `additive` for measures that can be safely summed from a lower-grain rollup, or `precomputed` when the variant stores the final value. Unsupported values fall back to scanning the raw model relation.
`measures.<key>.disallowed_aggregations`	list<string>	Optional	Subtract from the accumulation-derived allowed set. Effective allowed = derived − disallowed. Use to remove specific aggregations that don't make business sense (e.g. `[median]` on revenue).
`measures.<key>.label` · `description`	string	Optional	Display metadata.
`measures.<key>.time`	string	Optional	The temporal role this measure can be queried over. When omitted, inherits from the model's `default: true` entry in `times:`.
`measures.<key>.comparison_family` · `comparison_mode`	string	Optional	Drives `same_query` vs `coordinated_queries` selection. Load-bearing for the planner.
`measures.<key>.validity_windows`	list<{from, to, semantics}>	Optional	Time ranges where the measure is meaningful. Queries outside the window raise `MEASURE_VALIDITY_BOUNDARY` or surface as caveats depending on `cross_window_policy`.
`measures.<key>.cross_window_policy`	enum	Optional	Default `caveat`. `strict` turns out-of-window queries into errors.
`measures.<key>.external_discontinuities`	list<{from, to, what, magnitude_estimate_pct}>	Optional	Documents known external breaks. Surfaces in `inspect` as caveats.
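The dispatch on `kind` and the "derived minus disallowed" rule can be sketched as follows. The accumulation-derived set is passed in rather than computed because this reference does not list it. The set used in the example is a hypothetical placeholder, as are the function names and the partial SQL template table.

```python
from __future__ import annotations

# SQL templates for a subset of default_agg values; the rest are omitted here.
SQL_AGG = {
    "sum": "SUM({})",
    "avg": "AVG({})",
    "min": "MIN({})",
    "max": "MAX({})",
    "count": "COUNT({})",
    "count_distinct": "COUNT(DISTINCT {})",
}


def effective_aggregations(derived: set[str], disallowed: list[str] | None = None) -> set[str]:
    """Effective allowed = accumulation-derived set minus disallowed_aggregations."""
    return set(derived) - set(disallowed or [])


def measure_sql(measure: dict, graph_keys: dict[str, str], allowed: set[str], agg: str | None = None) -> str:
    if measure["kind"] == "entity_count":
        # entity_key names an entity; its canonical column comes from graph.yml.
        return f"COUNT(DISTINCT {graph_keys[measure['entity_key']]})"
    agg = agg or measure["default_agg"]
    if agg not in allowed:
        raise ValueError(f"aggregation {agg!r} is not allowed for this measure")
    if agg not in SQL_AGG:
        raise NotImplementedError(f"{agg!r} is not covered by this sketch")
    return SQL_AGG[agg].format(measure["expr"])


revenue = {"kind": "aggregate", "expr": "order_total_cents / 100.0", "default_agg": "sum"}
# Hypothetical accumulation-derived set for a flow measure; the real set is loader-defined.
allowed = effective_aggregations({"sum", "avg", "min", "max", "median"}, ["median"])
print(measure_sql(revenue, {}, allowed))  # SUM(order_total_cents / 100.0)
print(measure_sql({"kind": "entity_count", "entity_key": "order"}, {"order": "order_id"}, set()))
# COUNT(DISTINCT order_id)
```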

`model.variants:` physical rollups

Use model.variants: when one semantic model has multiple physical tables at different time grains: transaction, daily, weekly, monthly, or another exact rollup. The model keeps one definition for entities, dimensions, times, and measures. Each variant declares only the physical differences.

The loader turns eligible non-transaction variants into aggregate relations. During compile, the planner can route compatible measure leaves to the rollup relation while preserving the same public measure and dimension IDs.

models/orders.yml — physical variants

model:
  id: orders
  relation: order_fact
  default_variant: tx
  grain: [order_id]
  entities:
    order: {}
    customer: {}
    store: {}
  times:
    ordered_at:
      column: ordered_at
      kind: timestamp
      class: event_time
      default: true
  dimensions:
    store_id: { kind: categorical }
    customer_id: { kind: categorical }
  measures:
    revenue_usd:
      kind: aggregate
      expr: order_total_cents / 100.0
      default_agg: sum
      rollup: additive
      accumulation: { kind: flow }
      value_type: currency

  variants:
    tx:
      relation: order_fact
      grain: { time: transaction, entities: [order] }
      covers: inherit_all

    monthly:
      relation: order_monthly
      grain: { time: month, entities: [store] }
      time: { role: ordered_at, column: month_start }
      excludes:
        dimensions: [customer_id]
      columns:
        store_id: store_id
        revenue_usd: revenue_usd
      eligible_time_grains: [month, quarter, year]
      selection: { priority: 50 }
      equivalence: { kind: exact }
      source: default

Field	Type	Status	Behavior
`variants.<key>.relation`	string	Required for rollups	The physical rollup table or view. Transaction variants usually point at `model.relation`.
`grain.time`	string	Required for rollups	`transaction`, `day`, `week`, `month`, `quarter`, `year`, or another configured grain.
`time.role` · `time.column`	object	Optional	The semantic temporal role and physical bucket column. Defaults to the model's default time role and column.
`columns`	map	Optional	Maps model measure/dimension keys to physical rollup columns. A measure without a mapped column is not routable.
`excludes`	object	Optional	Lists `entities`, `dimensions`, or `measures` missing from this physical variant.
`eligible_time_grains`	list<string>	Optional	Query grains this rollup can answer. Defaults to the variant grain and coarser grains.
`selection.priority`	integer	Optional	Tie-breaker when multiple exact rollups can answer the same leaf.
`equivalence.kind`	enum	Required for routing	Must be `exact` for automatic routing.
`source`	string	MVP constraint	Must be `default`. Different warehouse/source routing is future work.

Routing rule: the planner uses a variant only when the query time grain, selected measures, grouped dimensions, and filtered dimensions are all covered exactly. Otherwise it scans the raw model relation.
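The routing rule amounts to a coverage check per variant. The following sketch runs it against the `monthly` variant from the example above. It simplifies two things, both assumptions. First, when `eligible_time_grains` is absent, only the variant's own grain qualifies, which is narrower than the "variant grain and coarser" default in the table. Second, `selection.priority` tie-breaking is left out because its direction is not stated here. Function names are hypothetical.

```python
from __future__ import annotations


def covers(variant: dict, grain: str, measures: list[str], dimensions: list[str]) -> bool:
    """True when this variant can answer the leaf exactly, per the routing rule."""
    if variant.get("equivalence", {}).get("kind") != "exact":
        return False
    if variant.get("source", "default") != "default":
        return False
    # Simplification: without eligible_time_grains, only the variant's own grain qualifies.
    eligible = variant.get("eligible_time_grains", [variant["grain"]["time"]])
    if grain not in eligible:
        return False
    columns = variant.get("columns", {})
    excludes = variant.get("excludes", {})
    excluded = set(excludes.get("dimensions", [])) | set(excludes.get("measures", []))
    # Selected measures plus grouped and filtered dimensions must all be mapped and present.
    return all(name in columns and name not in excluded for name in [*measures, *dimensions])


def routable_variants(variants: dict[str, dict], grain: str, measures: list[str], dimensions: list[str]) -> list[str]:
    """Variants that cover the leaf; an empty list means scanning the raw relation."""
    return sorted(k for k, v in variants.items() if covers(v, grain, measures, dimensions))


monthly = {
    "relation": "order_monthly",
    "grain": {"time": "month", "entities": ["store"]},
    "excludes": {"dimensions": ["customer_id"]},
    "columns": {"store_id": "store_id", "revenue_usd": "revenue_usd"},
    "eligible_time_grains": ["month", "quarter", "year"],
    "equivalence": {"kind": "exact"},
    "source": "default",
}
tx = {"relation": "order_fact", "grain": {"time": "transaction", "entities": ["order"]}}
variants = {"tx": tx, "monthly": monthly}
print(routable_variants(variants, "quarter", ["revenue_usd"], ["store_id"]))     # ['monthly']
print(routable_variants(variants, "quarter", ["revenue_usd"], ["customer_id"]))  # []
print(routable_variants(variants, "day", ["revenue_usd"], ["store_id"]))         # []
```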

Full model example

models/orders.yml

model:
  id: orders
  label: Orders
  relation: shop_order
  # grain derived from primary entity's key — do NOT author alongside entities:
  entities:
    order: {}                      # primary
    customer: {}                   # FK reference
    product: {}                    # FK reference
  times:
    ordered_at:
      column: ordered_at
      kind: timestamp
      class: event_time
      default: true
  dimensions:
    status: { kind: categorical }
  measures:
    revenue_usd:
      label: Revenue (USD)
      kind: aggregate
      expr: order_total_cents / 100.0
      default_agg: sum
      accumulation: { kind: flow }
      value_type: currency
      disallowed_aggregations: [median]
    order_count:
      kind: entity_count
      entity_key: order
      accumulation: { kind: event }
      value_type: count

Component 4 / 7

Metrics (metrics/**.yml)

Metrics codify governed access patterns. Each metric carries a kind: that determines the required fields. Common kinds use direct named fields; kind: derived and kind: conversion use the full expression AST.

References to other measures/metrics use package-relative keys (revenue_usd), not fully qualified IDs (metric.shop.revenue_usd). The loader resolves keys.

Top-level fields (any kind)

Field	Type	Status	Behavior
`metrics.<key>.kind`	enum	Required	One of: `aggregate`, `ratio`, `cumulative`, `rolling`, `prior_period`, `period_to_date`, `semi_additive`, `derived`, `conversion`. Selects required fields (see below).
`metrics.<key>.value_type`	enum	Required	`currency`, `count`, `number`, `percent`, `boolean`. The metric's output type may differ from the underlying measure.
`metrics.<key>.label` · `description` · `examples`	string / list	Optional	Display metadata. `examples` are sample query phrases the runtime can echo back in `compile`’s explain payload.
`metrics.<key>.as`	string	Optional	Override the auto-derived ID. Same-namespace only. Validator warns if redundant.
`metrics.<key>.time`	string	Optional	Default time axis. Metrics do NOT inherit the model's default time role (the `times:` entry marked `default: true`).
`metrics.<key>.comparison_family` · `comparison_mode`	string	Optional	Load-bearing for query plan selection. Same as on measures.

Required fields per `kind`

`kind`	Direct named fields
`aggregate`	`measure: <key>`
`ratio`	`numerator`, `denominator`, `null_behavior` (default `null_if_zero`)
`cumulative`	`measure`; optional `window`
`rolling`	`measure`, `window: {unit, value}`
`prior_period`	`measure`, `period`
`period_to_date`	`measure`, `period` (resets per period; distinct from `cumulative`)
`semi_additive`	`measure`; underlying measure must have `accumulation: { kind: stock }`
`derived`	`expression: <AST>` — long-tail case
`conversion`	`expression: { kind: conversion, base, converted, entity, window, matching_mode }`
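The per-kind table translates directly into a lookup. This sketch checks one parsed metric entry for missing fields, applies the ratio `null_behavior` default, and checks the `semi_additive` stock requirement. The helper names are hypothetical and are not the loader's validator.

```python
from __future__ import annotations

# Direct named fields per kind, from the table above (value_type applies to every kind).
REQUIRED_FIELDS = {
    "aggregate": ["measure"],
    "ratio": ["numerator", "denominator"],
    "cumulative": ["measure"],
    "rolling": ["measure", "window"],
    "prior_period": ["measure", "period"],
    "period_to_date": ["measure", "period"],
    "semi_additive": ["measure"],
    "derived": ["expression"],
    "conversion": ["expression"],
}


def missing_fields(metric: dict) -> list[str]:
    """List required fields absent from one metric entry."""
    kind = metric.get("kind")
    if kind not in REQUIRED_FIELDS:
        return ["kind"]
    return [f for f in ["value_type", *REQUIRED_FIELDS[kind]] if f not in metric]


def with_defaults(metric: dict) -> dict:
    """Ratio metrics default null_behavior to null_if_zero."""
    if metric.get("kind") == "ratio":
        return {"null_behavior": "null_if_zero", **metric}
    return metric


def semi_additive_ok(metric: dict, measures: dict[str, dict]) -> bool:
    """semi_additive needs an underlying measure whose accumulation kind is stock."""
    return measures[metric["measure"]]["accumulation"]["kind"] == "stock"


aov = {"kind": "ratio", "numerator": "revenue_usd", "denominator": "order_count", "value_type": "currency"}
print(missing_fields(aov), with_defaults(aov)["null_behavior"])  # [] null_if_zero
print(missing_fields({"kind": "rolling", "measure": "revenue_usd", "value_type": "currency"}))  # ['window']
```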

Aggregate, ratio, and time-series examples

These kinds use direct named fields (no expression AST). The example covers aggregate, ratio, cumulative, rolling, prior_period, and period_to_date.

Common mistake: filtering on time inside a cumulative query raises CUMULATIVE_TIME_FILTER_UNSUPPORTED. Use period_to_date for a bounded running total within a period.

metrics — direct-named-field kinds

metrics:
  # kind: aggregate — publish a measure
  revenue_usd:
    label: Revenue (USD)
    kind: aggregate
    measure: revenue_usd
    value_type: currency

  # kind: ratio — direct numerator/denominator
  aov_usd:
    label: Average order value (USD)
    kind: ratio
    numerator: revenue_usd
    denominator: order_count
    null_behavior: null_if_zero
    value_type: currency
    time: ordered_at

  # time-series kinds: same shape, different window/period field
  cumulative_revenue_usd:
    kind: cumulative
    measure: revenue_usd
    value_type: currency

  revenue_28d:
    kind: rolling
    measure: revenue_usd
    window: { unit: day, value: 28 }
    value_type: currency

  revenue_prior_month:
    kind: prior_period
    measure: revenue_usd
    period: month
    value_type: currency

  revenue_mtd:
    kind: period_to_date
    measure: revenue_usd
    period: month
    value_type: currency
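The four time-series kinds differ mainly in when their running window starts and resets. The Python sketch below shows the semantics on a tiny gap-free daily series, not the SQL the planner emits. Windows here are counted in rows, a 2-row rolling window stands in for the 28-day one, and `prior_period` is simplified to a one-step shift. All function names are hypothetical.

```python
from __future__ import annotations

from datetime import date


def cumulative(values: list[float]) -> list[float]:
    """Running total from the first row onward; it never resets."""
    out, total = [], 0.0
    for v in values:
        total += v
        out.append(total)
    return out


def rolling(values: list[float], window: int) -> list[float]:
    """Sum of the current row and the previous window - 1 rows."""
    return [sum(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def period_to_date(days: list[date], values: list[float]) -> list[float]:
    """Running total that resets whenever the calendar month changes."""
    out, total, current = [], 0.0, None
    for d, v in zip(days, values):
        if (d.year, d.month) != current:
            total, current = 0.0, (d.year, d.month)
        total += v
        out.append(total)
    return out


def prior_period(values: list[float]) -> list[float | None]:
    """Each period's value replaced by the previous period's value."""
    return [None, *values[:-1]] if values else []


days = [date(2024, 1, 30), date(2024, 1, 31), date(2024, 2, 1), date(2024, 2, 2)]
daily = [10, 20, 5, 5]
print(cumulative(daily))             # [10.0, 30.0, 35.0, 40.0]
print(rolling(daily, 2))             # [10, 30, 25, 10]
print(period_to_date(days, daily))   # [10.0, 30.0, 5.0, 10.0]
print(prior_period([100, 120, 90]))  # [None, 100, 120]
```

The February rows show the difference: cumulative keeps climbing from January, while period_to_date starts over.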

`derived` metric (AST escape hatch)

Use kind: derived for arbitrary formulas over multiple metrics. The full expression AST is authored under expression:. References to other metrics use bare keys; the loader resolves them to qualified IDs.

metrics/margin_pct.yml — derived

metrics:
  margin_pct:
    label: Gross Margin (%)
    description: (revenue − cogs) / revenue
    kind: derived
    value_type: percent
    expression:
      kind: arithmetic
      op: divide
      left:
        kind: arithmetic
        op: subtract
        left:  { kind: metric, metric: revenue_usd }
        right: { kind: metric, metric: cogs_usd }
      right: { kind: metric, metric: revenue_usd }
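To show how the nested `arithmetic` nodes compose, here is a tiny recursive evaluator run on the margin_pct tree with already-computed metric values. It is a sketch, not the runtime. The `add` and `multiply` op names are assumptions, since only `subtract` and `divide` appear above. Division by zero returns None, which mirrors the ratio `null_if_zero` default but is an assumption for `arithmetic` nodes.

```python
from __future__ import annotations

import operator

# subtract and divide appear in the example above; add and multiply are assumed names.
OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def evaluate(node: dict, metric_values: dict[str, float]) -> float | None:
    """Evaluate a derived-metric AST against already-computed metric values."""
    kind = node.get("kind", "metric" if "metric" in node else None)  # {metric: ...} shorthand
    if kind == "metric":
        return metric_values[node["metric"]]
    if kind == "literal":
        return node["value"]
    if kind == "arithmetic":
        left = evaluate(node["left"], metric_values)
        right = evaluate(node["right"], metric_values)
        if left is None or right is None:
            return None
        if node["op"] == "divide" and right == 0:
            return None  # assumption: null-if-zero division
        return OPS[node["op"]](left, right)
    raise NotImplementedError(f"this sketch does not evaluate kind {kind!r}")


margin_pct = {
    "kind": "arithmetic",
    "op": "divide",
    "left": {
        "kind": "arithmetic",
        "op": "subtract",
        "left": {"kind": "metric", "metric": "revenue_usd"},
        "right": {"kind": "metric", "metric": "cogs_usd"},
    },
    "right": {"kind": "metric", "metric": "revenue_usd"},
}
print(evaluate(margin_pct, {"revenue_usd": 200.0, "cogs_usd": 150.0}))  # 0.25
```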

`conversion` metric

Counts entities where a base event is followed by a converted event within a bounded window. Authored as the AST.

Required fields: entity, window: {unit, value}, matching_mode, and both event sides (base, converted) are mandatory. Missing fields raise CONVERSION_ENTITY_REQUIRED, CONVERSION_WINDOW_REQUIRED, or CONVERSION_MATCHING_MODE_REQUIRED.

metrics/signup_to_first_order_7d.yml — conversion

metrics:
  signup_to_first_order_7d:
    label: Signup → first order (7d)
    kind: conversion
    value_type: count
    expression:
      kind: conversion
      base:      { metric: signup_count }
      converted: { metric: order_count }
      entity: customer
      window: { unit: day, value: 7 }
      matching_mode: first_converted_after_base
      constant_properties: [region]
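A conversion count can be pictured in a few lines of Python. This sketch assumes one reading of `first_converted_after_base`: anchor on each entity's earliest base event, then test whether the first converted event at or after it falls within the window, inclusive of the boundary. `constant_properties` (here, region) is not modeled. The function and sample data are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def conversion_count(
    base_events: dict[str, list[datetime]],
    converted_events: dict[str, list[datetime]],
    window: timedelta,
) -> int:
    """Count entities whose first base event is followed by a converted event within window.

    Assumed reading of first_converted_after_base: anchor on the entity's earliest
    base event and test the first converted event at or after it.
    """
    count = 0
    for entity, base_times in base_events.items():
        if not base_times:
            continue
        base = min(base_times)
        later = [t for t in converted_events.get(entity, []) if t >= base]
        if later and min(later) - base <= window:
            count += 1
    return count


signups = {
    "c1": [datetime(2024, 1, 1)],
    "c2": [datetime(2024, 1, 1)],
    "c3": [datetime(2024, 1, 1)],
}
orders = {
    "c1": [datetime(2024, 1, 5)],    # 4 days later: converts
    "c2": [datetime(2024, 1, 10)],   # 9 days later: outside the 7-day window
    "c3": [datetime(2023, 12, 30)],  # before signup: ignored
}
print(conversion_count(signups, orders, timedelta(days=7)))  # 1
```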

Component 5 / 7

Segments (segments/*.yml)

Reusable membership filters anchored on an entity plus a basis metric. Previewed by POST /api/v1/segment-preview, validated by /segment-validate, and explained by /segment-explain.

Field	Type	Status	Behavior
`entity`	string	Required	Entity grain at which segment membership is tested.
`basis_metric`	string	Required	Drives population size and preview rows.
`preview_dimensions`	list<string>	Optional	Dimensions surfaced in `segment-preview`.
`membership.where`	list<{field, op, value}>	Optional	Dimension-level predicates only — not expression AST.
`membership.metric_filters`	list<{expression, op, value}>	Optional	Expression-level filters on metric values. Full AST; typically `{metric: …}` or `metric_predicate`.
`membership.time` · `path_policy`	map	Optional	Pin the time axis or join path.

Example

segments/high_value_customers.yml

segments:
  high_value_customers:
    entity: customer
    basis_metric: revenue_usd
    preview_dimensions: [region]
    membership:
      metric_filters:
        - expression:
            metric: lifetime_revenue_usd
          op: ">="
          value: 1000
        - expression:
            kind: metric_predicate
            input: { metric: order_count }
            entity: customer
            op: ">="
            value: 3
            scope_mode: entity_only
            window: { unit: day, value: 90 }
          op: "is_true"
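The membership test this segment describes can be sketched outside the warehouse. The sketch assumes the two metric_filters are combined with AND, and it reads "within any 90-day window" as "inside any span shorter than 90 days". A two-pointer sweep over sorted order dates finds the best window. Names and data are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def max_in_window(event_dates: list[date], days: int) -> int:
    """Most events inside any span shorter than `days` days (two-pointer sweep)."""
    dates = sorted(event_dates)
    best, start = 0, 0
    for end in range(len(dates)):
        while dates[end] - dates[start] >= timedelta(days=days):
            start += 1
        best = max(best, end - start + 1)
    return best


def high_value_customers(lifetime_revenue: dict[str, float], orders: dict[str, list[date]]) -> list[str]:
    """Both metric_filters must hold (assumed AND semantics)."""
    return sorted(
        customer
        for customer, revenue in lifetime_revenue.items()
        if revenue >= 1000 and max_in_window(orders.get(customer, []), 90) >= 3
    )


revenue = {"c1": 1500.0, "c2": 2000.0, "c3": 500.0}
orders = {
    "c1": [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)],  # 3 orders within 60 days
    "c2": [date(2024, 1, 1), date(2024, 5, 1), date(2024, 9, 1)],  # never 3 in 90 days
    "c3": [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)],  # revenue below 1000
}
print(high_value_customers(revenue, orders))  # ['c1']
```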

Component 6 / 7

Policies (policies.yml)

Optional. Policies live in a single top-level file under semantic_policies:. Four kinds are recognized by the runtime: package_release, object_visibility, object_access, protected_object.

Field	Type	Status	Behavior
`id`	string	Required	Stable identifier.
`kind`	enum	Required	`package_release`, `object_visibility`, `object_access`, `protected_object`.
`action`	string	Required	Paired with `kind`: `object_visibility` uses `hidden`/`visible`; `object_access` uses `deny`/`redact`.
`audiences` · `environments` · `object_ids`	list<string>	Optional	Filter by audience/environment, or target specific object IDs (empty = package-wide).
`config` · `rationale`	map / string	Optional	Kind-specific config (e.g. `{ mask: "***" }`); rationale returned in `POLICY_DENIED` hints.

Component 7 / 7

Caveats (caveats.yml)

Optional. Caveats live under semantic_caveats: and surface as advisory SEMANTIC_CAVEAT_APPLIED warnings on validate, compile, and query. They never change SQL, rows, access, discovery, or policy behavior.

Field	Type	Status	Behavior
`id` · `kind` · `message`	string / enum	Required	`kind` is one of `business_event`, `definition_change`, or `data_quality`.
`object_ids` · `entity_values`	list	Required*	At least one targeting field is required. Entity-value caveats fire only when the matching value is filtered or the dimension is exposed, and only on the declared dimension — declare one row per dimension a value is commonly reached through.
`time.at` · `time.from/to`	date	Optional	Point or half-open range trigger. Time-bound caveats require explicit query time or an inferable comparison window.
`audiences` · `environments` · `severity` · `owner` · `references`	list / string	Optional	Optional context gates and metadata for the warning payload. `severity: info` adds definitional framing; `warning` (the default) means the matched window or slice itself is affected.
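The time trigger on a caveat reduces to a point check or a half-open interval overlap. This sketch treats the query window as half-open `[query_from, query_to)` and lets either range bound be omitted. Both of those behaviors are assumptions, as are the function names and the sample caveat.

```python
from __future__ import annotations

from datetime import date


def caveat_fires(trigger: dict, query_from: date, query_to: date) -> bool:
    """Check a caveat's time trigger against a query window [query_from, query_to).

    `at` is a point trigger; `from`/`to` is a half-open range [from, to).
    Omitted bounds are treated as open-ended (an assumption).
    """
    if "at" in trigger:
        return query_from <= trigger["at"] < query_to
    start = trigger.get("from", date.min)
    end = trigger.get("to", date.max)
    return start < query_to and query_from < end  # half-open ranges overlap


def severity(caveat: dict) -> str:
    """warning is the default severity."""
    return caveat.get("severity", "warning")


migration = {"from": date(2024, 3, 1), "to": date(2024, 4, 1)}
print(caveat_fires(migration, date(2024, 2, 1), date(2024, 3, 1)))  # False
print(caveat_fires(migration, date(2024, 3, 1), date(2024, 4, 1)))  # True
```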

Expression AST reference

Every expression:, expr:, and metric-filter body is a small AST. The kind: field selects the node shape; the runtime rejects unknown kinds with INVALID_EXPRESSION_AST.

Authors may write {measure: …} as shorthand for {kind: measure, measure: …}, and {metric: …} as shorthand for {kind: metric, metric: …}; both expand identically.

`kind`	Required fields	Allowed in
`measure`	`measure` (+ optional `aggregation`, `temporal_role`)	measure expr, metric expr, segment metric_filters, query select
`aggregate`	`measure`, `aggregation`	query select; rare in metric expr (use `kind: aggregate` metric instead)
`metric`	`metric`	metric expr (derived/conversion), segment metric_filters, query select
`column`	`column`	measure expr, scalar expressions inside metric expr
`literal`	`value`	any scalar context
`arithmetic`	`op`, `left`, `right` (+ optional `null_behavior`)	metric expr, measure expr
`comparison`	`op`, `left`, `right`	boolean predicates, `case.whens[*].when`
`boolean`	`op`, `args`	predicate trees
`call` · `case` · `in` / `not_in` · `nullif` · `date_add` · `between` / `not_between`	see SDK schema for kind-specific required fields. `between` takes `expr`, `low`, `high` (+ optional `negated`) and desugars at parse time to `BooleanExpr(AND, [Comparison(>=), Comparison(<=)])`.	scalar expr inside measure / metric
`cumulative`	`input` (+ optional `partition_by`, `window_scope`)	metric expr (top of `cumulative`-kind metric, when named fields aren't enough)
`rolling`	`input`, `window: {unit, value}`	metric expr (top of `rolling`-kind metric)
`prior_period`	`input`, `offset: {unit, value}`	metric expr (top of `prior_period`-kind metric)
`period_to_date`	`input`, `period`	metric expr (top of `period_to_date`-kind metric)
`metric_predicate`	`input`, `entity`, `op`, `scope_mode` (in package context); optional `value`, `time_grain`, `time_alignment`, `window`	segment metric_filters, query metric_filters
`scoped_aggregate`	`measure` (+ optional `aggregation`, `temporal_role`, `predicates[*].metric`, `where`, `null_behavior`)	metric expr
`aggregate_if`	`aggregation` (one of `count`, `count_distinct`, `sum`, `avg`, `min`, `max`, `median`, `percentile`), `condition`; required `value` for all aggregations except `count`. Column refs inside `condition` / `value` must specify `entity` or `table` — there is no surrounding measure to inherit from. Compiles natively to `COUNT_IF` / `SUM_IF` on Snowflake and to portable `<AGG>(CASE WHEN cond THEN value END)` elsewhere.	query `select[].expression` or `metric_filters[].expression`
`ratio`	`numerator`, `denominator` (+ optional `null_behavior`, default `null_if_zero`)	metric expr (typically inside `kind: derived`)
`conversion`	`base`, `converted`, `entity`, `window: {unit, value}`, `matching_mode`	metric expr (top of `conversion`-kind metric)
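The `between` desugaring in the table can be sketched over plain dicts that use the `boolean` (`op`, `args`) and `comparison` (`op`, `left`, `right`) shapes listed above. `desugar_between` is a hypothetical helper. The literal op spellings `"and"`, `">="`, and `"<="` are assumptions. The reference does not spell out how `negated: true` desugars, so the sketch refuses that case instead of guessing.

```python
from typing import Any


def desugar_between(node: dict[str, Any]) -> dict[str, Any]:
    """Rewrite {kind: between, expr, low, high} into AND(expr >= low, expr <= high)."""
    if node.get("kind") != "between":
        return node  # other kinds are left untouched
    if node.get("negated"):
        # The negated form's expansion is not documented here; do not guess.
        raise NotImplementedError("negated between is out of scope for this sketch")
    expr = node["expr"]
    return {
        "kind": "boolean",
        "op": "and",  # assumed spelling of AND
        "args": [
            {"kind": "comparison", "op": ">=", "left": expr, "right": node["low"]},
            {"kind": "comparison", "op": "<=", "left": expr, "right": node["high"]},
        ],
    }
```

Because the rewrite happens at parse time, downstream validation only ever sees `boolean` and `comparison` nodes.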

`metric_predicate` alignment matrix

metric_predicate is the most field-heavy AST node; this matrix shows which combinations of scope_mode and time_alignment are legal. Other combinations raise INVALID_METRIC_PREDICATE.

`scope_mode`	Allowed `time_alignment`	May declare `time_grain`?	Use case
`contextual`	`same_query_period` (only)	Yes	"Customers whose revenue this period exceeds X."
`entity_only`	`query_window` or `rolling_window_in_period`	No	"Customers with ≥ 3 orders in any 90-day window across history."
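The matrix above is small enough to encode as a lookup table. `check_metric_predicate` and `InvalidMetricPredicate` are hypothetical names, not the runtime's validator. The sketch treats `scope_mode` as mandatory, which matches package context. It also assumes that an omitted `time_alignment` is left to runtime defaults and is not checked here.

```python
# Hypothetical validator mirroring the alignment matrix; not the runtime's code.
ALLOWED_ALIGNMENT = {
    "contextual": {"same_query_period"},
    "entity_only": {"query_window", "rolling_window_in_period"},
}
MAY_DECLARE_GRAIN = {"contextual": True, "entity_only": False}


class InvalidMetricPredicate(ValueError):
    code = "INVALID_METRIC_PREDICATE"


def check_metric_predicate(pred: dict) -> None:
    """Raise InvalidMetricPredicate for scope_mode / time_alignment / time_grain combos the matrix forbids."""
    mode = pred.get("scope_mode")
    if mode not in ALLOWED_ALIGNMENT:
        # Package context requires scope_mode, so a missing value is an error here.
        raise InvalidMetricPredicate(f"missing or unknown scope_mode: {mode!r}")
    alignment = pred.get("time_alignment")
    # Assumption: an omitted time_alignment is left to runtime defaults.
    if alignment is not None and alignment not in ALLOWED_ALIGNMENT[mode]:
        raise InvalidMetricPredicate(f"{mode} does not allow time_alignment={alignment!r}")
    if "time_grain" in pred and not MAY_DECLARE_GRAIN[mode]:
        raise InvalidMetricPredicate(f"{mode} may not declare time_grain")
```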

Where each expression kind is allowed

The planner enforces context-specific subsets. Putting a metric AST in where: is the most common authoring trap; where: only accepts dimension-level predicates.

Context	Allowed	Not allowed
`query.where`	Dimension-level predicates only: `{field, op, value}`, `{field, op: in, values: [...]}` — key is `field`, not `dimension`.	Expression AST nodes — use `metric_filters` instead.
`query.metric_filters`	Full expression AST: `metric`, `aggregate`, `aggregate_if`, `metric_predicate`, `scoped_aggregate`, `ratio`, `arithmetic`, `comparison`, `boolean`, `between`.	Bare `column` nodes (the planner cannot resolve a column outside a measure body).
`measure.expr`	Scalar AST: `column`, `literal`, `arithmetic`, `comparison`, `boolean`, `call`, `case`, `nullif`, `date_add`, `in`, `between`.	`metric`, `cumulative`, `rolling`, `prior_period`, `period_to_date`, `conversion`, `metric_predicate`, `scoped_aggregate`, `aggregate_if` (`aggregate_if` is query-level only — declare a normal measure if you want to bake the conditional aggregate into the catalog).
`metric.expression` (kind: derived / conversion)	Everything except `column` at the top.	Bare `column` at the top (use a measure body for raw column arithmetic).
`segment.membership.where[*]`	Dimension-level predicates only (same shape as `query.where`).	Expression AST.
`segment.membership.metric_filters[*]`	Same as `query.metric_filters`, plus `metric_predicate` with required `scope_mode`.	`metric_predicate` without `scope_mode` in package context.
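A quick way to catch the `where:` trap before planning is a pre-flight lint over the predicate list. `check_where` is a hypothetical helper, not part of the planner. It flags explicit AST nodes and `{metric: …}` / `{measure: …}` shorthand, a `dimension:` key used instead of `field:`, and `op: in` without a `values:` list. The field name `status` in the usage is illustrative.

```python
# Hypothetical lint for query.where / segment.membership.where entries; not the planner.
def check_where(predicates: list[dict]) -> list[str]:
    """Return one message per entry that is not a dimension-level predicate."""
    problems = []
    for i, pred in enumerate(predicates):
        # Explicit kind: nodes and {metric: ...} / {measure: ...} shorthand are AST.
        if "kind" in pred or "metric" in pred or "measure" in pred:
            problems.append(f"where[{i}]: expression AST belongs in metric_filters")
        elif "field" not in pred:
            hint = " (key is 'field', not 'dimension')" if "dimension" in pred else ""
            problems.append(f"where[{i}]: missing 'field'{hint}")
        elif pred.get("op") == "in" and "values" not in pred:
            problems.append(f"where[{i}]: op 'in' takes a 'values' list")
    return problems
```

Entries that fail the first check belong in `metric_filters`, where the full expression AST is accepted.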

Auto-derivations summary

Every rule below fires only when the field is missing or empty — explicit values are preserved unchanged. package.namespace (default package.id) is the master switch.

What	Derived from
Entity ID	`entity.<ns>_<key>`
Entity name	`<ns>.<TitleKey>`, e.g. `shop.Customer`
Key dimensions	`graph.entities.<x>.key` for every model that exposes the entity
Backing date/timestamp dimension	Each `times:` entry's `column:`
Primary entity	Entity whose canonical key matches `model.grain`
Inferred relationships	Pairs of entities co-declared in `model.entities:` (excluding bridge-disabled models)
Cardinality	Key roles of the paired entities, yielding `1:1`, `N:1`, `1:N`, or `M:N`
Join safety	`safe` for 1:1 / N:1; `requires_rewrite` for 1:N / M:N
Allowed aggregations	`kind` + `accumulation`; subtract with `disallowed_aggregations`
Aggregate relations	Eligible non-transaction `model.variants:` entries
Measure ID	`measure.<ns>.<key>`
Metric ID	`metric.<ns>.<key>`; override with `as:`
Default time on measures	Model's `default: true` entry in `times:`; metrics do NOT inherit
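Several rows of the table above are pure string rules, so they can be restated as a few helpers. All names here are hypothetical, not the loader's code. The table only shows the single-word case (`customer` → `shop.Customer`), so the PascalCase conversion of multi-word keys is an assumption. Join safety is a direct lookup from cardinality.

```python
# Hypothetical helpers restating the derivation table; not the loader's code.
JOIN_SAFETY = {"1:1": "safe", "N:1": "safe", "1:N": "requires_rewrite", "M:N": "requires_rewrite"}


def namespace_of(package: dict) -> str:
    """package.namespace, falling back to package.id when missing or empty."""
    return package.get("namespace") or package["id"]


def title_key(key: str) -> str:
    # Assumption: snake_case keys become PascalCase; only customer -> Customer is documented.
    return "".join(part[:1].upper() + part[1:] for part in key.split("_"))


def entity_ids(ns: str, key: str) -> dict[str, str]:
    """Entity ID uses an underscore join; the display name uses <ns>.<TitleKey>."""
    return {"id": f"entity.{ns}_{key}", "name": f"{ns}.{title_key(key)}"}


def object_id(kind: str, ns: str, key: str) -> str:
    """measure.<ns>.<key> or metric.<ns>.<key>."""
    if kind not in ("measure", "metric"):
        raise ValueError(f"no ID rule sketched for {kind!r}")
    return f"{kind}.{ns}.{key}"
```

Note the asymmetry the table documents: entity IDs join namespace and key with `_`, while measure and metric IDs use `.`.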

Strict-mode validation rules

When package.schema_strict: true is set, the loader rejects every legacy authoring form. This is the recommended setting for new packages.

Removed boilerplate (use auto-derivation instead)

id: on semantic objects — auto-derived from namespace + key; use as: only to preserve a public reference.
name: on objects — auto-derived from key.
kind: id dimensions — key dimensions auto-create from graph.entities.<x>.key.
Duplicate date/timestamp dimension when times: covers the same column — the times: entry creates the backing dimension.
Authored model.grain: separate from primary — derived from primary entity key.
Top-level temporal_role.<id>: separate registration — the times: block key IS the role.
Model-level entity: (singular) + keys.foreign: + joins: blocks — use model.entities:; overrides in graph.relationships:.
Per-model parent_entity: block — rollup safety lives on graph.relationships.<name>.rollup_safe.
entity.key_roles (authored) — auto-derived from key role pairs.

Renames

agg_function: → default_agg: on measures.
accumulation: stock + sibling snapshot_policy: → nested accumulation: { kind: stock, snapshot: end_of_period }.
expression: wrapper around expr: / default_agg: on kind: aggregate measures → un-nest. Write expr: and default_agg: directly.
Buried expression: AST on metric kinds with direct named fields (aggregate, ratio, cumulative, rolling, prior_period, period_to_date, semi_additive) → use direct fields. AST authoring is reserved for kind: derived and kind: conversion.
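The three measure-level renames above are mechanical, so a migration pass over a parsed measure dict can apply them. `migrate_measure` is a hypothetical helper, not a shipped migration tool. It assumes that the sibling `snapshot_policy:` value moves unchanged into `snapshot:`, based only on the `end_of_period` example. It un-nests `expression:` only when the wrapper holds nothing but `expr:` / `default_agg:`.

```python
# Hypothetical migration sketch for the measure renames; not a shipped tool.
def migrate_measure(measure: dict) -> dict:
    """Rewrite legacy measure forms into the strict-mode shapes."""
    out = dict(measure)
    # agg_function: -> default_agg:
    if "agg_function" in out:
        out["default_agg"] = out.pop("agg_function")
    # Scalar accumulation + sibling snapshot_policy -> nested object form.
    acc = out.get("accumulation")
    if isinstance(acc, str):
        nested = {"kind": acc}
        if "snapshot_policy" in out:
            # Assumption: the policy value carries over verbatim (e.g. end_of_period).
            nested["snapshot"] = out.pop("snapshot_policy")
        out["accumulation"] = nested
    # expression: {expr, default_agg} wrapper on kind: aggregate -> un-nest.
    wrapper = out.get("expression")
    if out.get("kind") == "aggregate" and isinstance(wrapper, dict) and set(wrapper) <= {"expr", "default_agg"}:
        out.pop("expression")
        out.update(wrapper)
    return out
```

Metric-level rewrites (moving a buried `expression:` AST into direct named fields) depend on the metric kind and are not sketched here.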

Required additions

accumulation: always uses the object form; its kind must be one of {flow, stock, event, population}.
Every authored metric must declare value_type: explicitly. The metric output type may differ from the underlying measure.
At most one times: entry per model may have default: true.
No name on a model (column, dimension, measure, entity binding) may appear in any graph entity's disallowed_names:.
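The required additions above can be checked against a parsed model and its metrics before running the loader. `strict_checks` is a hypothetical helper that returns `(error_code, subject)` pairs using the codes from the error table. Two parts are assumptions. The sketch reports a scalar `accumulation:` as `STRICT_LEGACY_FORM_REJECTED`. It also treats "names on a model" as dimension, measure, and entity-binding keys plus `times:` columns. The caller passes the union of every graph entity's `disallowed_names:`.

```python
ACCUMULATION_KINDS = {"flow", "stock", "event", "population"}


def strict_checks(model: dict, metrics: dict, disallowed_names: set[str]) -> list[tuple[str, str]]:
    """Hypothetical pre-flight for the required-additions rules; returns (code, subject) pairs."""
    errors = []
    for key, measure in model.get("measures", {}).items():
        acc = measure.get("accumulation")
        if isinstance(acc, str):
            # Assumption: the scalar form is reported as a legacy-form rejection.
            errors.append(("STRICT_LEGACY_FORM_REJECTED", key))
        elif isinstance(acc, dict) and acc.get("kind") not in ACCUMULATION_KINDS:
            errors.append(("INVALID_ACCUMULATION_KIND", key))
    times = model.get("times", {})
    defaults = [role for role, spec in times.items() if spec.get("default") is True]
    if len(defaults) > 1:
        errors.append(("MULTIPLE_DEFAULT_TIME", ",".join(defaults)))
    # Assumption: model names = dimension, measure, entity-binding keys + times columns.
    names = set(model.get("dimensions", {})) | set(model.get("measures", {})) | set(model.get("entities", {}))
    names |= {spec["column"] for spec in times.values() if "column" in spec}
    for name in sorted(names & disallowed_names):
        errors.append(("DISALLOWED_NAME", name))
    for key, metric in metrics.items():
        if "value_type" not in metric:
            errors.append(("METRIC_VALUE_TYPE_REQUIRED", key))
    return errors
```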

Dropped (no replacement)

auto_publish: on measures — auto-publish is gone. Author explicit metrics.
package.default_account_id — vestigial, no consumer.
dimension.preferred_filter_ops — metadata-only, no planner gating.
measure.clock_variants, comparison_peers, preferred_companion_metrics — metadata-only on measures. comparison_family and comparison_mode stay (load-bearing). preferred_companion_metrics is allowed on metrics as advisory governance metadata; companion-metric relationships are too volatile to lock in at the measure layer.
topics: on any object — no validation, no scaling pattern.
policy.kind: plan_constraint — runtime no-op. The four real kinds are package_release, object_visibility, object_access, protected_object.

Warnings (advisory, not blocking): an as: override whose resulting ID matches the auto-computed one is flagged, because the override is unnecessary.

What schema_strict does and does not catch

schema_strict: true rejects the legacy authoring forms listed above and unknown values for typed fields (e.g. kind: catagorical is caught with a list of valid kinds). It does not reliably reject unknown top-level field names on every object: a typo such as dimentions: on a model: block can be silently dropped, along with the entire mistyped block. Always cross-check authoring outcomes with the check command, and confirm that the catalog contains every object you authored before trusting a parse-clean result.
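One way to guard against silently dropped blocks is to diff a model's keys against the fields you expect and suggest the nearest match. `unknown_model_keys` is a hypothetical helper, not part of the CLI. `KNOWN_MODEL_KEYS` is an assumption: it holds only the model fields shown in this reference, not an exhaustive schema. The suggestion uses the standard library's `difflib.get_close_matches`.

```python
import difflib

# Assumption: model fields that appear in this reference; not an exhaustive schema.
KNOWN_MODEL_KEYS = {"id", "label", "relation", "entities", "times", "dimensions", "measures", "variants"}


def unknown_model_keys(model: dict) -> dict[str, list[str]]:
    """Map each unrecognized key to its closest known key (empty list if none is close)."""
    report = {}
    for key in model:
        if key not in KNOWN_MODEL_KEYS:
            report[key] = difflib.get_close_matches(key, sorted(KNOWN_MODEL_KEYS), n=1)
    return report
```

Treat an empty report as necessary but not sufficient. The catalog cross-check is still the authoritative signal.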

Authoring contract version

Every package must declare schema_version: 1 at the top of package.yml. Any other value raises INVALID_CONFIG at load time.

schema_version is a stamp, not a feature toggle. The number bumps on a breaking authoring change — not on additive features. The runtime no longer has a version-dispatch path; 1 is the only contract.

Connection-mode decision

Every warehouse except duckdb declares a package.connection block whose kind selects the connector. For warehouse: snowflake, kind selects between two paths.

Use case	Pick	Why
Local development, `snow` CLI configured	`kind: snowflake_cli`	No env vars needed. The runtime shells out to `snow connection test` and reuses the CLI's named profile.
Production / container deployments	`kind: snowflake_native`	Pulls account, user, and password from environment variables (or files via `password_file_env`). Supports `query_tag` for attribution. Requires the connector extra: `uv sync --extra snowflake-native`.

Both kinds require name. snowflake_cli reads it as the CLI profile name; snowflake_native reads it as a logical label surfaced in explain and operational metadata.

Native connection options

account_env · user_env · password_env — env var names holding the credential. Literal credentials are rejected.
password_file_env — env var holding a file path; the file's content is the password.
warehouse · role · database · schema — per-session defaults applied at connection time.
query_tag — attached to every query; surfaces in Snowflake query history.
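The options above map naturally onto the keyword arguments of `snowflake.connector.connect()` from the snowflake-connector-python package: `account`, `user`, `password`, `warehouse`, `role`, `database`, `schema`, and `session_parameters`, where `QUERY_TAG` is a Snowflake session parameter. `snowflake_connect_kwargs` is a hypothetical helper illustrating that mapping, not the runtime's connector code. Treating literal `account`, `user`, or `password` keys as rejected credentials follows this section's wording but is an assumption about exact enforcement. Stripping a trailing newline from the password file is also an assumption.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# Hypothetical mapping from a snowflake_native block to connector kwargs.
LITERAL_SECRETS = {"account", "user", "password"}  # assumption: these literals are rejected
SESSION_DEFAULTS = ("warehouse", "role", "database", "schema")


def snowflake_connect_kwargs(conn: Mapping[str, str], environ: Mapping[str, str] = os.environ) -> dict:
    """Build kwargs for snowflake.connector.connect() using env-var indirection."""
    literals = LITERAL_SECRETS & set(conn)
    if literals:
        raise ValueError(f"literal credentials are rejected: {sorted(literals)}")
    kwargs = {"account": environ[conn["account_env"]], "user": environ[conn["user_env"]]}
    if "password_env" in conn:
        kwargs["password"] = environ[conn["password_env"]]
    elif "password_file_env" in conn:
        # The env var holds a path; the file's content is the password.
        # Assumption: a trailing newline in the file is not part of the password.
        kwargs["password"] = Path(environ[conn["password_file_env"]]).read_text().rstrip("\r\n")
    for field in SESSION_DEFAULTS:
        if field in conn:
            kwargs[field] = conn[field]
    if "query_tag" in conn:
        # QUERY_TAG is a session parameter; it surfaces in Snowflake query history.
        kwargs["session_parameters"] = {"QUERY_TAG": conn["query_tag"]}
    return kwargs
```

With the connector extra installed, the result would be passed as `snowflake.connector.connect(**snowflake_connect_kwargs(conn))`. Keeping the mapping a pure function makes it testable without a live account.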

Other warehouses

The remaining warehouses each expose a single native kind: postgres_native, bigquery_native, databricks_native, motherduck_native, ducklake_native, athena_native, clickhouse_native. The same secrets convention applies everywhere: credentials are never literals. They arrive through env-var indirection (*_env names) or file paths (*_file), while non-secret locators such as host or database may be literal or env-indirected.

package.yml: Postgres connection

package:
  warehouse: postgres
  connection:
    kind: postgres_native
    host_env: SR_POSTGRES_HOST
    port: 5432
    database: sr_jaffle
    schema: public
    user_env: SR_POSTGRES_USER
    password_env: SR_POSTGRES_PASSWORD

The full option list for each warehouse lives in the per-warehouse *_CONNECTION_OPTIONS tuples in semantic_rails/dialects.py, with canonical env-var names in .env.example at the repo root. Redshift is coming next; until then see docs/ADDING_A_DIALECT.md.
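The secrets convention can be sketched as one generic resolver, applied here to the Postgres block above to produce a libpq keyword/value connection string. `resolve` and `postgres_conninfo` are hypothetical helpers, not the runtime's dialect code. The `*_file_env`, `*_env`, and `*_file` suffix handling, and trimming a trailing newline from file contents, are assumptions. libpq names the database parameter `dbname` and has no `schema` keyword, so the sketch omits `schema`.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def resolve(block: Mapping[str, object], environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve *_file_env, *_env, and *_file keys; keep other keys as literals."""
    resolved = {}
    for key, value in block.items():
        if key == "kind":
            continue  # selects the connector; not a connection parameter
        if key.endswith("_file_env"):  # env var holding a file path; check before *_env
            resolved[key[: -len("_file_env")]] = Path(environ[str(value)]).read_text().rstrip("\r\n")
        elif key.endswith("_env"):
            resolved[key[: -len("_env")]] = environ[str(value)]
        elif key.endswith("_file"):
            resolved[key[: -len("_file")]] = Path(str(value)).read_text().rstrip("\r\n")
        else:
            resolved[key] = str(value)
    return resolved


def postgres_conninfo(block: Mapping[str, object], environ: Mapping[str, str] = os.environ) -> str:
    """Render a libpq keyword/value string from a postgres_native block."""
    r = resolve(block, environ)
    # libpq calls the database "dbname"; schema has no libpq keyword and is omitted.
    keywords = {"host": "host", "port": "port", "database": "dbname", "user": "user", "password": "password"}
    parts = []
    for key, libpq_key in keywords.items():
        if key in r:
            # Single-quote every value, escaping backslashes and quotes as libpq expects.
            quoted = r[key].replace("\\", "\\\\").replace("'", "\\'")
            parts.append(f"{libpq_key}='{quoted}'")
    return " ".join(parts)
```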

Error code → root cause

Every error code raised at load or plan time maps to a small number of authoring fields. Use this table when an error appears in validate-config, compile, or explain. The full inventory and request / response shapes live in the API reference.

Code	Meaning	Typical authoring fix
`INVALID_CONFIG`	YAML failed authoring-contract validation	Check `schema_version`, `package.warehouse`, and `package.seed` / `package.connection`.
`STRICT_LEGACY_FORM_REJECTED`	Strict mode rejected a legacy authoring form	See strict-mode rules for the full table of rejected forms and replacements.
`DISALLOWED_NAME`	Column / dimension / measure used a name in `graph.entities.<x>.disallowed_names`	Use the canonical column from `graph.entities.<x>.key` or declare an `expr:` rename in `model.entities:`.
`INVALID_ACCUMULATION_KIND`	`accumulation.kind` not in `{flow, stock, event, population}`	Use one of the four canonical values.
`METRIC_VALUE_TYPE_REQUIRED`	Authored metric is missing `value_type:`	Declare `value_type:` explicitly. The metric's output type may differ from the underlying measure.
`MULTIPLE_DEFAULT_TIME`	More than one `times:` entry on a model has `default: true`	Pick one default per model.
`OBJECT_NOT_FOUND`	Referenced ID does not exist	Verify the auto-derived form (auto-derivations).
`INVALID_EXPRESSION_AST`	Expression node missing required fields or unknown `kind`	Cross-reference the expression matrix.
`AMBIGUOUS_ALIAS`	Human token resolves to multiple objects	Tighten `graph.entities.<key>.synonyms` or use stable IDs.
`AMBIGUOUS_PATH` · `PATH_NOT_FOUND`	Join path ambiguous or missing	Pin `path_preference` / `query.path_policy`; add an FK in `model.entities:`.
`FANOUT_UNSAFE` · `ROLLUP_UNSAFE`	Join or rollup smears values	Check `cardinality`, `safety`, and per-direction `rollup_safe`.
`UNSUPPORTED_AGGREGATION`	Aggregation not in the measure's allowed list	Adjust `disallowed_aggregations` or pick an allowed aggregation.
`INVALID_TEMPORAL_ROLE` · `INCOMPATIBLE_TEMPORAL_ROLE`	Time grain or role rejected	Check `times.<key>.supported_grains` and metric `time:`.
`MEASURE_VALIDITY_BOUNDARY`	Query reaches outside validity window	Adjust the query or revise `validity_windows` / `cross_window_policy`.
`MIXED_GRAIN_INVALID` · `REWRITE_NOT_SUPPORTED`	Mixed grains cannot be safely combined	Split the query, align time grains, or expose pre-aggregated measures.
`CUMULATIVE_TIME_FILTER_UNSUPPORTED`	Time filter inside a cumulative query	Use `period_to_date` or move filter to `partition_by`.
`CONVERSION_*` (entity / window / matching_mode required, not supported)	Conversion metric malformed	See `conversion` metric kind: `entity`, `window: {unit, value}`, `matching_mode` all required.
`PREDICATE_*` · `INVALID_METRIC_PREDICATE`	`metric_predicate` malformed or context-incompatible	See the `metric_predicate` row + alignment matrix in expression AST. Package context requires `scope_mode`.
`DUPLICATE_OUTPUT_ALIAS` · `INVALID_ORDER_BY`	Query select / order-by structural issue	Rename with `as: ...`; reference a select alias or grouped dimension.
`NO_VALID_VALUES_SOURCE`	`valid-values` on a dimension without a domain	Add `dimensions.<key>.domain`.
`POLICY_DENIED` · `OUT_OF_SCOPE`	Policy rejected request, or request outside semantic-query planning	Check `policies.yml` + audience headers. The runtime’s scope classifier runs inline on every `discover` / `plan` call and returns `out_of_scope` / `low_relevance` blocks with closest matches when applicable.
`INVALID_QUERY` · `QUERY_EXECUTION_ERROR`	Query structurally invalid or warehouse rejected SQL	Confirm against the planner contract; read `explain` for warehouse error.

Smallest valid package

One entity; one model with one dimension, one time, and one measure; the single metric recipe every package needs in order to load; no segment, no policy. Extend section by section by referring back to the relevant component above.

shop_min — four files

# package.yml
schema_version: 1
package:
  id: shop_min
  namespace: shop_min
  warehouse: duckdb
  default_db: data/shop_min.duckdb
  seed: { kind: sql_script, source: data/seed.sql }
  schema_strict: true

# graph.yml
graph:
  entities:
    order:
      label: Order
      key: [order_id]                 # list form is required
      model: orders                   # which model declares this entity as primary

# models/orders.yml
model:
  id: orders                          # required; the mapping key alone is not enough
  label: Orders
  relation: shop_min_order
  # grain: is derived from the primary entity's key — do NOT author it
  # alongside `entities:` (strict mode rejects the combination).
  entities:
    order: {}                         # primary; grain comes from graph's order.key
  times:
    ordered_at:
      column: ordered_at
      kind: timestamp
      class: event_time
      default: true
  dimensions:
    status: { kind: categorical }
  measures:
    order_count:
      kind: entity_count
      entity_key: order
      accumulation: { kind: event }
      value_type: count

# metrics/order_count.yml — at least one metric recipe is required
metrics:
  order_count:
    label: Order Count
    description: Count of orders. Codified for stable reference.
    kind: aggregate
    measure: order_count
    value_type: count

Four loader requirements that older drafts missed

model.id: — required on every model. The mapping key alone is not sufficient.
graph.entities.<x>.key: — list form ([order_id]), even for a single column.
graph.entities.<x>.model: — binds the entity to the model that declares it as primary.
Do not author model.grain: when entities: is present — the loader derives grain from the primary entity's key. Authoring both is rejected.

uv run semantic-rails parse-config --path /path/to/shop_min returns ok: true; the catalog exposes measure.shop_min.order_count and metric.shop_min.order_count. A package must declare at least one metric recipe to load — the metrics/ file above is not optional.
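Before running `parse-config`, the requirements listed above can be pre-checked with a few lines over the package transcribed into dicts. `check_minimum` and `catalog_ids` are hypothetical helpers, not part of the CLI. They restate this section's rules and the ID derivation, and do not replace the loader's validation.

```python
# shop_min transcribed from the YAML above into plain dicts (no YAML parser needed).
PACKAGE = {"id": "shop_min", "namespace": "shop_min", "warehouse": "duckdb", "schema_strict": True}
GRAPH = {"entities": {"order": {"label": "Order", "key": ["order_id"], "model": "orders"}}}
MODELS = [{
    "id": "orders",
    "relation": "shop_min_order",
    "entities": {"order": {}},
    "measures": {"order_count": {"kind": "entity_count", "entity_key": "order"}},
}]
METRICS = {"order_count": {"kind": "aggregate", "measure": "order_count", "value_type": "count"}}


def check_minimum(graph: dict, models: list[dict], metrics: dict) -> list[str]:
    """Hypothetical pre-flight for the minimum-package requirements."""
    problems = []
    model_ids = {m.get("id") for m in models}
    for i, model in enumerate(models):
        if not model.get("id"):
            problems.append(f"models[{i}]: model.id is required")
        if "grain" in model and "entities" in model:
            problems.append(f"models[{i}]: grain is derived; do not author it alongside entities")
    for key, entity in graph["entities"].items():
        if not isinstance(entity.get("key"), list):
            problems.append(f"entity {key}: key must use list form")
        if entity.get("model") not in model_ids:
            problems.append(f"entity {key}: model must name a declared model.id")
    if not metrics:
        problems.append("package: at least one metric recipe is required")
    return problems


def catalog_ids(package: dict, models: list[dict], metrics: dict) -> list[str]:
    """Expected measure and metric IDs under the namespace derivation rule."""
    ns = package.get("namespace") or package["id"]
    ids = [f"measure.{ns}.{key}" for m in models for key in m.get("measures", {})]
    return ids + [f"metric.{ns}.{key}" for key in metrics]
```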

This page documents only what an author types under the v1 contract. Internal-only fields (measure_class, computed catalog fields, runtime decoration) live in semantic_rails/schema.py and are out of scope here.

Next: pair this reference with the narrative Package Authoring guide, Query Planner for how authored objects flow into plans, or the API Reference for the wire shape on every route.
