
Transforms

Transforms are the T side of Frank. They turn Bronze tables, other transform outputs, or custom code into Silver and Gold data products.

What a transform stores

A transform owns:

  • Metadata: name, description, tags, tenant.
  • Target: FIWARE Smart Data Model or custom schema.
  • Sources: one or more input tables, aliases, join metadata, and ordering.
  • Field mappings: source expressions, literals, runtime context fields, AI confidence, and ordering.
  • Pattern config: optional transform pattern ID, version, and params.
  • Materialization: table, view, incremental, merge keys, schedule config.
  • Incremental config: full refresh, watermark, cursor field, tiebreaker, per-input read modes.
  • Artifact reference: the currently hydrated runnable artifact.
  • Runtime state: lifecycle stage, last run outcome, run stats, test results.

The editable spec and the runnable artifact are separate. Editing a transform does not replace its executable artifact; the artifact is only swapped once hydration succeeds.
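The spec/artifact split can be sketched as follows. This is an illustrative model only; the field names are assumptions, not Frank's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransformSpec:
    """Illustrative sketch of an editable transform spec."""
    name: str
    lifecycle_stage: str = "draft"             # draft | ready | retired
    current_artifact_id: Optional[str] = None  # last successfully hydrated artifact

    def edit(self, **changes):
        # Editing mutates the spec only; the runnable artifact pointer
        # is untouched until a later hydration succeeds.
        for key, value in changes.items():
            setattr(self, key, value)

spec = TransformSpec(name="stg_orders", current_artifact_id="art-1")
spec.edit(name="stg_orders_v2")
# spec.current_artifact_id is still "art-1" until hydration replaces it
```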

Lifecycle and runtime outcome

Frank splits user intent from runtime truth:

| Field | Values | Meaning |
| --- | --- | --- |
| `lifecycle_stage` | `draft`, `ready`, `retired` | Whether the transform is usable or intentionally withdrawn. |
| `last_run_outcome` | `none`, `running`, `succeeded`, `failed` | What the most recent run did. |

Readiness checks use both fields plus hydration state:

  • A transform can run when it is hydrated, not retired, and no run is in flight.
  • A transform can be scheduled after it has been hydrated and promoted to ready.
  • Failed runs do not block manual retry.
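The readiness rules above can be expressed as small predicates. A minimal sketch, assuming boolean inputs rather than Frank's real state objects:

```python
def can_run(hydrated: bool, lifecycle_stage: str, run_in_flight: bool) -> bool:
    # Runnable: hydrated, not retired, and no run currently in flight.
    # A previous failed run does not block a manual retry.
    return hydrated and lifecycle_stage != "retired" and not run_in_flight

def can_schedule(hydrated: bool, lifecycle_stage: str) -> bool:
    # Schedulable: hydrated and explicitly promoted to ready.
    return hydrated and lifecycle_stage == "ready"
```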

A legacy status value is derived from these fields for older consumers.

Input sources

Transforms support:

  • One Bronze table.
  • Multiple Bronze tables with joins.
  • Other transform outputs for chaining.
  • Mixed source tables and transform outputs for staged Silver-to-Gold flows.

This supports common patterns:

```text
raw.postgres_orders -> stg_orders -> fct_daily_sales

raw.stripe_customers \
raw.salesforce_contacts -> dim_customer_360
raw.postgres_customers /
```

Mapping kinds

AI and UI mapping flows support three field mapping kinds:

| Kind | Use it for | Required fields |
| --- | --- | --- |
| `source_expression` | A source column feeds a target field, optionally with SQL. | `source_field` |
| `literal` | A constant value such as source system, schema version, or boolean flag. | `literal_value`, `literal_type` |
| `context` | Runtime metadata such as tenant ID, pipeline ID, run start, transform name, or source name. | `context_key` |

Context keys are intentionally allowlisted:

```text
tenant.id
pipeline.id
pipeline.run_id
run.started_at
transform.name
source.name
```
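Enforcing an allowlist like this is straightforward: reject any key outside the set rather than interpolating it blindly. A sketch (not Frank's implementation):

```python
# Mirrors the allowlisted context keys from the docs above.
ALLOWED_CONTEXT_KEYS = {
    "tenant.id",
    "pipeline.id",
    "pipeline.run_id",
    "run.started_at",
    "transform.name",
    "source.name",
}

def validate_context_key(key: str) -> str:
    # Fail loudly on anything outside the allowlist.
    if key not in ALLOWED_CONTEXT_KEYS:
        raise ValueError(f"unknown context key: {key!r}")
    return key
```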

Transform patterns

Transform patterns live in backend/config/transform_patterns. They are synced at API startup and exposed through /api/v1/transform-patterns.

Families include:

  • Projection: select and rename.
  • Filtering: SQL predicates.
  • Joining: left, inner, lookup.
  • Aggregation: group by and window functions.
  • Deduplication: first/latest row selection.
  • Dimensions: upsert, SCD Type 1, SCD Type 2 merge.
  • Validation: regex, enum, anomaly flags.
  • Conversion: unit, currency, timezone.
  • Geospatial: WKT parsing, H3 enrichment, H3 aggregation, point-in-polygon, nearest, distance, spatial joins.
  • Python runner: containerized escape hatches such as h3_enrich, fx_rate_ingest, and test patterns.

Each pattern defines params, required fields, runtime, template files, and validation rules.

Runtimes

Transform artifacts can target multiple runtimes:

| Runtime | Value | Use it for |
| --- | --- | --- |
| Trino SQL | `trino_sql` | Direct SQL execution over Iceberg. |
| dbt SQL | `dbt_sql` | dbt-style model rendering and execution. |
| Python runner | `python_runner` | Containerized custom logic using frank-sdk. |
| Flink SQL | `flink_sql` | Future streaming SQL surface. |

The renderer produces the runnable artifact; the executor runs it through the correct runtime path.
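The renderer/executor split amounts to a dispatch on the runtime value. A minimal sketch with stub handlers (real execution would call Trino, dbt, a container runner, or Flink):

```python
def execute(runtime: str, artifact: str) -> str:
    # Route the rendered artifact to the matching runtime path.
    # Handlers here are stand-ins that just tag the artifact.
    handlers = {
        "trino_sql": lambda a: f"trino: {a}",
        "dbt_sql": lambda a: f"dbt: {a}",
        "python_runner": lambda a: f"container: {a}",
        "flink_sql": lambda a: f"flink: {a}",
    }
    try:
        return handlers[runtime](artifact)
    except KeyError:
        raise ValueError(f"unsupported runtime: {runtime}") from None
```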

Hydration

Hydration turns an editable transform spec into a concrete artifact:

  1. Resolve input context and source schemas.
  2. Apply transform pattern or field mappings.
  3. Render SQL/dbt/Python runner content.
  4. Validate required params and schema assumptions.
  5. Persist a TransformArtifact.
  6. Update current_artifact_id.

Hydration is the boundary between design and execution.
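The six steps above can be sketched end to end. Every helper here is an illustrative stub; the real resolution, rendering, and persistence live behind Frank's API:

```python
import uuid

def resolve_source_schemas(sources):
    # Stub for step 1: pretend every source exposes a single "id" column.
    return {s: ["id"] for s in sources}

def render_content(spec, schemas):
    # Stub for steps 2-3: render a trivial SQL projection.
    cols = ", ".join(c for cols in schemas.values() for c in cols)
    return f"SELECT {cols} FROM {', '.join(spec['sources'])}"

def hydrate(spec):
    schemas = resolve_source_schemas(spec["sources"])          # 1. resolve inputs
    content = render_content(spec, schemas)                    # 2-3. apply + render
    assert spec["sources"], "validation: at least one source"  # 4. validate
    artifact = {"id": str(uuid.uuid4()), "content": content}   # 5. persist artifact
    spec["current_artifact_id"] = artifact["id"]               # 6. point spec at it
    return artifact

spec = {"sources": ["iceberg.bronze.orders"], "current_artifact_id": None}
artifact = hydrate(spec)
```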

Running transforms

Manual run path:

```bash
frankctl transforms trigger <transform-id>
frankctl transforms runs <transform-id>
frankctl transforms logs <transform-id> <run-id>
```

Run statuses:

```text
pending -> running -> completed | failed | cancelled
```

The API stores summary metadata in Postgres and detailed logs in Loki / runtime logs, keeping list views fast while preserving drill-down.
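The status flow can be enforced with a small transition table. A sketch, not Frank's implementation:

```python
# Legal run-status transitions; terminal states have no outgoing edges.
VALID_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}

def advance(current: str, nxt: str) -> str:
    # Refuse illegal jumps such as pending -> completed.
    if nxt not in VALID_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```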

Incremental transforms

Transform-level incremental modes:

| Mode | Meaning |
| --- | --- |
| `full_refresh` | Reprocess all input rows. |
| `watermark` | Use a cursor field and tiebreaker for bounded incremental reads. |

Per-input read config lets fact tables run delta while dimension tables are read as full snapshots:

```json
{
  "iceberg.bronze.orders": { "mode": "delta" },
  "iceberg.bronze.customers": { "mode": "full" }
}
```
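Resolving the effective read mode per input might look like this: per-input overrides win, otherwise the transform-level mode decides. Names and fallback behavior are assumptions:

```python
def read_mode(table: str, per_input: dict, transform_mode: str) -> str:
    # A per-input override takes precedence over the transform-level mode.
    override = per_input.get(table, {}).get("mode")
    if override:
        return override
    # Fallback assumption: watermark transforms read deltas, full_refresh reads all.
    return "delta" if transform_mode == "watermark" else "full"

cfg = {
    "iceberg.bronze.orders": {"mode": "delta"},
    "iceberg.bronze.customers": {"mode": "full"},
}
```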

Python runner transforms

Python runner patterns execute in containers and use frank-sdk to read runtime config, connect to Trino, emit metrics, emit lineage, and return structured results. They are the right choice when SQL would be forced or when a domain library is the real implementation.
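The runner contract can be sketched without the real SDK. Everything below, the config shape, the result shape, and the `run` entry point, is an assumption for illustration, not frank-sdk's actual API:

```python
def run(config: dict) -> dict:
    # A runner receives runtime config, does its work (here a trivial
    # row filter stands in for real domain logic), and returns a
    # structured result Frank could record as run metadata.
    rows = config.get("rows", [])
    kept = [r for r in rows if r.get("valid", True)]
    return {
        "status": "succeeded",
        "metrics": {"rows_in": len(rows), "rows_out": len(kept)},
        "lineage": {
            "inputs": config.get("inputs", []),
            "output": config.get("output"),
        },
    }
```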

See the SDK guide for the authoring contract.

When to use what

| Need | Use |
| --- | --- |
| Rename, cast, filter, map fields | Field mappings or `select_rename` / `filter`. |
| Join two or more tables | Join patterns or the multi-source transform wizard. |
| Standardize into a known semantic schema | Target SDM + AI-suggested field mappings. |
| Generate custom logic | AI generate-transform, then review and publish. |
| Use domain Python libraries | Python runner pattern with frank-sdk. |
| Chain reusable steps | Pipeline DAG. |

Frank is built by aiaiai-pt.