Package authoring in Semantic Rails

Measures vs metrics — the conceptual split

Before any syntax, understand the two layers. They are distinct surfaces in the catalog, with distinct purposes.

Measures are primitives. A measure is a columnar fact (a sum, a count, a count-distinct) that the API can query flexibly — by any reachable entity, time grain, dimension breakdown, or aggregation function within the measure's allowed set. Measures are the building blocks. ARR is a measure: sum of monthly ARR contributions, queryable by customer, by segment, by month.

Metrics are governed access patterns. A metric is a named, stable contract that codifies a specific use of one or more measures — with conditions, filters, time alignment, or composition (ratio, cumulative, derived). Metrics exist for governance and clarity. NRR is a metric: (start_arr + expansion_arr − churn_arr) / start_arr with specific cohort and time-alignment conditions.
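To make the split concrete, the Python sketch below treats the three ARR figures as measure values and NRR as the metric that fixes one formula over them. It omits the cohort and time-alignment conditions that the real metric also carries. The variable and function names are illustrative only and are not Semantic Rails APIs.

```python
from typing import Optional

# Illustrative sketch only: plain Python values stand in for measure query
# results, and `nrr` is a hypothetical helper, not a Semantic Rails API.

# Measures: primitive facts, already aggregated here for one cohort and period.
measures = {
    "start_arr": 1_000_000.0,
    "expansion_arr": 150_000.0,
    "churn_arr": 80_000.0,
}


def nrr(start_arr: float, expansion_arr: float, churn_arr: float) -> Optional[float]:
    """Metric: one fixed, named formula over the three measures.

    Returning None for a zero starting base is an assumption made for this
    sketch, not documented Semantic Rails behavior.
    """
    if start_arr == 0:
        return None
    return (start_arr + expansion_arr - churn_arr) / start_arr


print(round(nrr(**measures), 4))  # 1.07
```

The measures remain independently queryable. The metric adds value only by pinning down which combination of them consumers should trust.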

Implications:

Not every measure needs a corresponding metric. Many measures are queryable as primitives.
The catalog lists measures and metrics as distinct surfaces. Both are queryable; only metrics carry stable governance.
Measures do not auto-publish to metrics. Authors who want a measure exposed as a governed metric write it explicitly in the metrics: block.

Directory shape

package layout

```
configs/semantic_rails/<package>/
  package.yml          # identity, warehouse, connection, seeds
  graph.yml            # canonical entities + explicit relationships
  policies.yml         # optional — visibility / access / release
  caveats.yml          # optional — advisory interpretation context
  models/              # one file per warehouse table or mart
    ...
  metrics/             # optional — governed access patterns
    ...
  segments/            # optional — entity-bounded membership filters
  examples/            # optional — runnable example queries
  tests/               # optional — package-local regression tests
```

The loader merges every YAML file under models/**, metrics/**, and segments/* into a single PackageConfig.
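The following Python sketch shows one way such a merge could work. It assumes the YAML has already been parsed into dicts, that each file under models/ holds one `model:` block, and that each file under metrics/ holds a `metrics:` mapping. Raising on a duplicate id is an illustrative policy chosen for the sketch, not documented loader behavior. Segments are left out because this guide does not show their shape.

```python
from typing import Any


def merge_package_files(files: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Hypothetical merge of parsed model/metric files into one mapping.

    Not the Semantic Rails loader: file shapes and the duplicate-id policy
    are assumptions for illustration.
    """
    merged: dict[str, dict[str, Any]] = {"models": {}, "metrics": {}}
    for path in sorted(files):  # sort for a deterministic merge order
        doc = files[path]
        if "model" in doc:
            model = doc["model"]
            _add(merged["models"], model["id"], model, path)
        for key, metric in doc.get("metrics", {}).items():
            _add(merged["metrics"], key, metric, path)
    return merged


def _add(bucket: dict[str, Any], key: str, value: Any, path: str) -> None:
    # Reject a second object with the same id instead of silently overwriting.
    if key in bucket:
        raise ValueError(f"duplicate id {key!r} in {path}")
    bucket[key] = value


files = {
    "models/orders.yml": {"model": {"id": "orders", "relation": "shop_order"}},
    "metrics/revenue.yml": {
        "metrics": {"revenue_usd": {"kind": "aggregate"}, "aov_usd": {"kind": "ratio"}}
    },
}
config = merge_package_files(files)
print(sorted(config["models"]), sorted(config["metrics"]))
# ['orders'] ['aov_usd', 'revenue_usd']
```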

Conceptual model

| Layer | Role | Design guidance |
| --- | --- | --- |
| `graph` | Canonical entities, keys, and explicit relationships | Author entity identity once. Most relationships are inferred from `model.entities:` blocks; the graph carries non-default rules (rollup safety, SCD2 validity, cardinality overrides). |
| `models` | Grain, exposed entities, dimensions, times, measures | One file per warehouse table or mart. `model.entities:` declares which entities the model exposes; the planner infers join paths from co-declared FK references. |
| `metrics` | Governed access patterns: ratios, cumulative, derived, conversion | Author only access patterns that deserve a stable contract. Most kinds use direct named fields (`measure:`, `numerator:`, `denominator:`). `kind: derived` uses the expression AST. |

What the loader does for you

Setting package.namespace (or letting it default to package.id) saves you from writing a lot of YAML. Author the business meaning; the loader fills in identifiers and traversal.

Key-derived IDs — every object's id and name come from package.namespace + key. Override with as: only when you need to preserve a public reference.
Auto-created key dimensions — entity keys come from graph.entities.<x>.key:; you don't author kind: id dimensions.
Inferred relationships — any pair of entities co-declared in a model.entities: block produces a default RelationshipConfig. Author explicit overrides in graph.relationships: only when you need non-default rules.
Primary entity auto-detect — the primary entity is the one whose graph.entities.<x>.model points at this model. The model's grain is derived from that entity's key:; do not author a separate grain: field.
Backing date/timestamp dimensions — auto-created from times: blocks; the times: entry IS the temporal role.
Default aggregations — the accumulation class drives the allowed-aggregation set; disallowed_aggregations: subtracts from it.
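The list above can be sketched in a few small Python functions. This is a hypothetical illustration, not the Semantic Rails loader. The "." separator in derived ids is an assumption. This guide does not list the per-class default aggregation sets, so the sketch takes that set as an input instead of inventing it.

```python
from itertools import combinations
from typing import Optional

# Hypothetical sketch of the loader conveniences listed above. The "."
# separator in derived ids and every function name here are assumptions.

GRAPH_ENTITIES = {
    "order": {"key": ["order_id"], "model": "orders"},
    "customer": {"key": ["customer_id"], "model": "customers"},
    "product": {"key": ["product_id"], "model": "products"},
}


def derive_id(namespace: str, key: str, as_: Optional[str] = None) -> str:
    """Key-derived id; an explicit `as:` override takes precedence."""
    return as_ if as_ is not None else f"{namespace}.{key}"


def primary_entity(model_id: str, graph_entities: dict) -> Optional[str]:
    """The primary entity is the one whose graph entry points at this model."""
    for name, spec in graph_entities.items():
        if spec.get("model") == model_id:
            return name
    return None


def model_grain(model_id: str, graph_entities: dict) -> list[str]:
    """Grain comes from the primary entity's key, never from a `grain:` field."""
    primary = primary_entity(model_id, graph_entities)
    if primary is None:
        raise ValueError(f"no graph entity declares model {model_id!r}")
    return graph_entities[primary]["key"]


def inferred_relationships(model_entities: list[str]) -> set[frozenset[str]]:
    """Every pair of co-declared entities yields one default relationship."""
    return {frozenset(pair) for pair in combinations(model_entities, 2)}


def allowed_aggregations(class_default: set[str], disallowed: set[str]) -> set[str]:
    """Start from the accumulation class's set, then subtract exclusions."""
    return class_default - disallowed


print(derive_id("shop", "revenue_usd"))                                # shop.revenue_usd
print(model_grain("orders", GRAPH_ENTITIES))                           # ['order_id']
print(len(inferred_relationships(["order", "customer", "product"])))  # 3
```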

How this package model differs

Semantic Rails packages are not a BI facade, a metrics-only spec, a warehouse-native semantic object, or a new analyst query language. They are versioned runtime inputs: the same files drive discovery, validation, compile-only SQL, explain output, execution, examples, and package tests.

| Compared with | Different center of gravity | Semantic Rails package implication |
| --- | --- | --- |
| dbt Semantic Layer / MetricFlow | Metrics and semantic models on top of dbt models | The package includes graph, policy, guided-builder metadata, examples, and tests as part of the runtime contract. |
| Cube | Semantic APIs, BI integrations, caching, and pre-aggregation workflows | The repo is smaller: it focuses on Query IR, planner diagnostics, compile/explain, and execution paths across nine warehouses. |
| Malloy | A language that combines semantic modeling and querying | Packages stay declarative; callers use API routes and Query IR rather than adopting a separate analysis language. |
| Snowflake Semantic Views | Warehouse-native semantic metadata and SQL/Cortex interfaces inside Snowflake | Packages live in the repo and can run locally on DuckDB or against Snowflake and seven other warehouses through CLI or native connector paths. |

Package metadata

package.yml

```yaml
schema_version: 1

package:
  id: shop
  namespace: shop
  warehouse: duckdb
  default_db: data/shop.duckdb
  seed: { kind: sql_script, source: data/seed.sql }
  schema_strict: true             # opt-in v1 strict validation (recommended)

defaults:
  dimension:
    groupable: true
    filterable: true
  time:
    timezone: UTC
    supported_grains: [day, week, month, quarter, year]
```

schema_strict: true rejects legacy authoring forms (scalar key:, redundant grain: alongside entities:, etc.) and unknown values for typed fields such as kind:. It does not reliably reject unknown field names, however: a typo such as dimentions: inside a model: block can be silently dropped. Before trusting a parse-clean result, confirm that the catalog contains every object you authored (uv run semantic-rails catalog --package <id>). Strict mode is recommended for new packages.
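A local pre-check can catch this class of typo before you reach the catalog. The Python sketch below flags unrecognized keys in a parsed model: block and suggests the closest known field using the standard library's difflib. The set of known fields is an assumption assembled from the examples in this guide and is not an authoritative schema. Treat the sketch as a supplement to the catalog check, not a replacement for it.

```python
import difflib

# Hypothetical pre-flight check for typos that strict mode may let through.
# KNOWN_MODEL_FIELDS is assembled from this guide's examples, not a schema.
KNOWN_MODEL_FIELDS = {"id", "label", "relation", "entities", "times", "dimensions", "measures"}


def unknown_model_fields(model_block: dict) -> dict[str, list[str]]:
    """Map each unrecognized field name to its closest known field, if any."""
    return {
        field: difflib.get_close_matches(field, KNOWN_MODEL_FIELDS, n=1)
        for field in model_block
        if field not in KNOWN_MODEL_FIELDS
    }


model = {"id": "orders", "relation": "shop_order", "dimentions": {"status": {}}}
print(unknown_model_fields(model))  # {'dimentions': ['dimensions']}
```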

Entity graph

graph.yml

```yaml
graph:
  entities:
    order:
      label: Order
      key: [order_id]                  # list form is required (even for one column)
      model: orders                    # which model declares this entity as primary
      disallowed_names: [ord_id, orderid]
    order_item:
      label: Order item
      key: [order_item_id]
      model: order_items
    customer:
      label: Customer
      key: [customer_id]
      model: customers
      disallowed_names: [cust_id, custid, customerid]
    product:
      label: Product
      key: [product_id]
      model: products

  # Explicit overrides only — most relationships are inferred.
  relationships:
    customer_history_x_customer:
      entities: [customer_history, customer]
      cardinality: many_to_one
      safety: requires_rewrite
      temporal_validity:
        valid_from: effective_from
        valid_to: effective_to
      rollup_safe:
        forward: [sum, count]
        reverse: []
```
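The temporal_validity block names the columns that bound each customer_history row in time. The Python sketch below shows a common reading of such SCD2 bounds. It treats each row as valid over the half-open interval [effective_from, effective_to), with a missing effective_to meaning the row is still current. Both conventions are assumptions, since this guide does not spell them out, and the segment column is invented for the example.

```python
from datetime import datetime
from typing import Optional

# Hypothetical sketch of an SCD2 validity check. Treating the bounds as a
# half-open interval [effective_from, effective_to), with None meaning "still
# current", is a common SCD2 convention assumed here, not taken from this guide.


def valid_at(effective_from: datetime, effective_to: Optional[datetime], at: datetime) -> bool:
    """True when a history row is in effect at the instant `at`."""
    if at < effective_from:
        return False
    return effective_to is None or at < effective_to


# Illustrative customer_history rows; the `segment` column is invented.
HISTORY = [
    {"customer_id": 1, "segment": "smb",
     "effective_from": datetime(2024, 1, 1), "effective_to": datetime(2024, 6, 1)},
    {"customer_id": 1, "segment": "mid_market",
     "effective_from": datetime(2024, 6, 1), "effective_to": None},
]


def segment_as_of(customer_id: int, at: datetime) -> Optional[str]:
    """Pick the history row valid at `at`, as a temporal join would."""
    for row in HISTORY:
        if row["customer_id"] == customer_id and valid_at(
            row["effective_from"], row["effective_to"], at
        ):
            return row["segment"]
    return None


print(segment_as_of(1, datetime(2024, 3, 15)))  # smb
print(segment_as_of(1, datetime(2024, 6, 1)))   # mid_market
```

Under the half-open convention, the boundary instant belongs to exactly one row, so a fact timestamp never matches two history versions.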

disallowed_names: is the explicit anti-pattern guard. The validator rejects any model that authors a column, dimension, or measure whose name appears in the list, and its error points the author to the canonical column, or to the expr: escape hatch for intentional renames.
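The guard amounts to a reverse lookup from each banned alias to the canonical key it shadows. The Python sketch below is hypothetical. It takes the first key column as the canonical name, checks both authored names and explicit column: values, and uses made-up error wording. The real validator's checks and messages may differ.

```python
# Hypothetical sketch of the disallowed_names guard. It treats the first key
# column as the canonical name and uses made-up error wording; the real
# validator's checks and messages may differ.

GRAPH_ENTITIES = {
    "order": {"key": ["order_id"], "disallowed_names": ["ord_id", "orderid"]},
    "customer": {"key": ["customer_id"],
                 "disallowed_names": ["cust_id", "custid", "customerid"]},
}


def disallowed_name_errors(model: dict, graph_entities: dict) -> list[str]:
    # Map each banned alias to the canonical key column it shadows.
    banned = {
        alias: spec["key"][0]
        for spec in graph_entities.values()
        for alias in spec.get("disallowed_names", [])
    }
    errors = []
    for section in ("times", "dimensions", "measures"):
        for name, spec in model.get(section, {}).items():
            # Check both the authored name and any explicit `column:`.
            for candidate in (name, (spec or {}).get("column")):
                if candidate in banned:
                    errors.append(
                        f"{section}.{name}: {candidate!r} is disallowed; "
                        f"use {banned[candidate]!r}, or expr: for an intentional rename"
                    )
    return errors


model = {
    "dimensions": {"cust_id": {"kind": "categorical"}, "status": {"kind": "categorical"}},
    "measures": {"revenue_usd": {"expr": "order_total_cents / 100.0"}},
}
for error in disallowed_name_errors(model, GRAPH_ENTITIES):
    print(error)
```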

Model example

models/orders.yml

```yaml
model:
  id: orders
  label: Orders
  relation: shop_order
  # grain derived from primary entity's key (graph's order.key) —
  # do NOT author `grain:` alongside `entities:` (strict mode rejects it).

  entities:
    order: {}                        # primary (grain matches order.key)
    customer: {}                     # FK reference; column = customer_id
    product: {}                      # FK reference

  times:
    ordered_at:
      label: Order time
      column: ordered_at
      kind: timestamp
      class: event_time
      supported_grains: [day, week, month, quarter, year]
      default: true

  dimensions:
    status:
      label: Order Status
      kind: categorical

  measures:
    revenue_usd:
      label: Revenue (USD)
      kind: aggregate
      expr: order_total_cents / 100.0
      default_agg: sum
      accumulation: { kind: flow }
      value_type: currency

    order_count:
      label: Order Count
      kind: entity_count
      entity_key: order
      accumulation: { kind: event }
      value_type: count
```

model.entities: is required on every model and lists which entities the model exposes. The primary entity is auto-detected, and each FK reference lets the loader infer a relationship with every other model that exposes the same entity. When the column name differs from the entity's canonical key, override it with expr:.
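Column resolution for entity references can be sketched as "canonical key unless overridden." The Python below assumes that the expr: override sits on the entity's entry in model.entities: and that keys have a single column. Composite keys are out of scope. The buyer_id column is invented for the example.

```python
# Hypothetical sketch of how an entity reference might resolve to a column:
# the graph's canonical key unless the model entry overrides it with expr:.
# Only single-column keys are handled; composite keys are out of scope here.

GRAPH_KEYS = {
    "order": ["order_id"],
    "customer": ["customer_id"],
    "product": ["product_id"],
}


def entity_columns(model_entities: dict, graph_keys: dict) -> dict[str, str]:
    resolved = {}
    for entity, spec in model_entities.items():
        override = (spec or {}).get("expr")
        resolved[entity] = override if override else graph_keys[entity][0]
    return resolved


# `buyer_id` is an invented column name standing in for a non-canonical FK.
print(entity_columns({"order": {}, "customer": {"expr": "buyer_id"}, "product": {}}, GRAPH_KEYS))
# {'order': 'order_id', 'customer': 'buyer_id', 'product': 'product_id'}
```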

Metric example

metrics/revenue.yml

```yaml
metrics:
  revenue_usd:
    label: Revenue (USD)
    description: Total revenue. Codified for stable reference.
    kind: aggregate
    measure: revenue_usd
    value_type: currency

  aov_usd:
    label: Average order value (USD)
    kind: ratio
    numerator: revenue_usd
    denominator: order_count
    null_behavior: null_if_zero
    value_type: currency
    time: ordered_at
```

Most metric kinds use direct named fields (measure:, numerator:, denominator:). kind: derived and kind: conversion use the full expression AST. value_type: is required on every metric.
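The aov_usd ratio can be read as "aggregate first, then divide." The Python sketch below assumes that null_behavior: null_if_zero behaves like SQL's NULLIF pattern, so a zero denominator yields NULL (None here) instead of an error. It also assumes the numerator and denominator are summed per group before the division, not averaged row by row. Both readings are assumptions, and the order data is invented.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical sketch of a ratio metric with null_if_zero. Assumed semantics:
# aggregate numerator and denominator per group first, then divide, returning
# None (SQL NULL) when the denominator is zero, much like
# SUM(revenue) / NULLIF(COUNT(order_id), 0).


def ratio_null_if_zero(numerator: float, denominator: float) -> Optional[float]:
    return None if denominator == 0 else numerator / denominator


# (month, revenue_usd) per order; invented sample data.
ORDERS = [("2024-01", 40.0), ("2024-01", 60.0), ("2024-02", 30.0)]
MONTHS = ["2024-01", "2024-02", "2024-03"]

revenue: dict[str, float] = defaultdict(float)  # measure: revenue_usd (sum)
order_count: dict[str, int] = defaultdict(int)  # measure: order_count
for month, amount in ORDERS:
    revenue[month] += amount
    order_count[month] += 1

aov = {m: ratio_null_if_zero(revenue[m], order_count[m]) for m in MONTHS}
print(aov)  # {'2024-01': 50.0, '2024-02': 30.0, '2024-03': None}
```

A month with no orders surfaces as a missing value rather than zero, so it cannot quietly pull down an average order value trend.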

Snowflake packages

Snowflake-backed packages use the same semantic model, with package.warehouse: snowflake and a connection: block that uses either the Snowflake CLI or the native connector. Native deployments resolve credentials indirectly through environment variables or files and use the optional connector extra.

package.yml, Snowflake CLI connection

```yaml
package:
  id: shop_snowflake
  namespace: shop
  warehouse: snowflake
  connection:
    kind: snowflake_cli
    name: shop_dev
```

package.yml, native Snowflake connection

```yaml
package:
  id: shop_native
  namespace: shop
  warehouse: snowflake
  connection:
    kind: snowflake_native
    name: prod_native
    options:
      account_env: SNOWFLAKE_ACCOUNT
      user_env: SNOWFLAKE_USER
      password_env: SNOWFLAKE_PASSWORD
      warehouse: COMPUTE_WH
      query_tag: semantic-rails
```
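The *_env options in the native example name environment variables instead of holding secrets directly. The Python sketch below shows that indirection under an assumed rule: each `<name>_env` option is replaced by the value of the variable it names, and other options pass through unchanged. The function is hypothetical. The real loader's rules, including its file-based variant, may differ.

```python
import os
from collections.abc import Mapping

# Hypothetical sketch of env-var credential indirection: each `<name>_env`
# option names an environment variable whose value becomes `<name>`. The real
# loader's resolution rules (and its file-based variant) may differ.


def resolve_env_options(options: Mapping[str, str],
                        environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    resolved = {}
    for key, value in options.items():
        if key.endswith("_env"):
            name = key[: -len("_env")]
            if value not in environ:
                raise KeyError(f"{value} is not set (needed for {name})")
            resolved[name] = environ[value]
        else:
            resolved[key] = value  # literal options pass through unchanged
    return resolved


options = {
    "account_env": "SNOWFLAKE_ACCOUNT",
    "user_env": "SNOWFLAKE_USER",
    "password_env": "SNOWFLAKE_PASSWORD",
    "warehouse": "COMPUTE_WH",
    "query_tag": "semantic-rails",
}
fake_env = {"SNOWFLAKE_ACCOUNT": "example-account", "SNOWFLAKE_USER": "svc_rails",
            "SNOWFLAKE_PASSWORD": "not-a-real-secret"}
print(sorted(resolve_env_options(options, fake_env)))
# ['account', 'password', 'query_tag', 'user', 'warehouse']
```

Keeping only variable names in package.yml means the file can be committed without exposing credentials.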

Path-finding behavior

When a query asks for "X per Y" where X is a measure on one model and Y is a dimension on a different entity, the planner walks the inferred entity graph for the shortest path. This is automatic — you don't author join paths.

"orders per customer": the orders model has both order_id and customer_id, so the planner uses that table directly.
"items per customer": no single table has all three columns, so the planner walks order_item → order → customer via inferred relationships.
"items per customer" when a denormalized table contains all three columns: the planner prefers the direct table over the multi-hop path.
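The three cases above can be sketched in Python as a direct-model check followed by a breadth-first search over the entity graph inferred from co-declared entities. This is a simplified, hypothetical model of the behavior and not the Semantic Rails planner. A real planner also weighs cardinality, rollup safety, and fan-out. The model sets and the order_items_wide table are invented for the example.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical sketch of the path-finding rules above: prefer one model that
# exposes both entities, else breadth-first search over the entity graph
# inferred from co-declared entities. A real planner also weighs cardinality,
# rollup safety, and fan-out, which this sketch ignores.

MODELS = {
    "orders": {"order", "customer"},
    "order_items": {"order_item", "order", "product"},
    "customers": {"customer"},
    "products": {"product"},
}


def entity_graph(models: dict[str, set[str]]) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = defaultdict(set)
    for entities in models.values():
        for a in entities:
            graph[a] |= entities - {a}  # every co-declared pair is an edge
    return graph


def shortest_entity_path(source: str, target: str,
                         models: dict[str, set[str]]) -> Optional[list[str]]:
    graph = entity_graph(models)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(graph[path[-1]]):  # sorted for deterministic ties
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def plan(source: str, target: str, models: dict[str, set[str]]) -> dict:
    direct = sorted(m for m, ents in models.items() if {source, target} <= ents)
    if direct:
        return {"direct_model": direct[0]}
    return {"path": shortest_entity_path(source, target, models)}


print(plan("order", "customer", MODELS))       # {'direct_model': 'orders'}
print(plan("order_item", "customer", MODELS))  # {'path': ['order_item', 'order', 'customer']}
wide = {**MODELS, "order_items_wide": {"order_item", "order", "customer"}}
print(plan("order_item", "customer", wide))    # {'direct_model': 'order_items_wide'}
```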

When to add a metric

Add a metric when an access pattern deserves a stable contract — when consumers (dashboards, queries, agents) should not have to re-derive the computation. Don't add a metric just because a measure exists; the catalog already lists measures as queryable primitives. Reserve the metrics: block for ratios, governed cumulative/rolling/PTD windows, and derived expressions over multiple measures.

Continue to Query Planner for how authored packages turn into validated plans and executable SQL.
