
Transforms

Transforms are the T side of Frank. They turn Bronze tables, other transform outputs, or custom code into Silver and Gold data products.

What a transform stores

A transform owns:

  • Metadata: name, description, tags, tenant.
  • Target: FIWARE Smart Data Model or custom schema.
  • Sources: one or more input tables, aliases, join metadata, and ordering.
  • Field mappings: source expressions, literals, runtime context fields, AI confidence, and ordering.
  • Pattern config: optional transform pattern ID, version, and params.
  • Materialization: table, view, incremental, merge keys, schedule config.
  • Incremental config: full refresh, watermark, cursor field, tiebreaker, per-input read modes.
  • Artifact reference: the currently hydrated runnable artifact.
  • Runtime state: lifecycle stage, last run outcome, run stats, test results.

The editable spec and the runnable artifact are separate. Editing a transform does not replace its executable artifact; the artifact is only swapped once hydration succeeds.
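The spec/artifact split can be sketched as follows. This is an illustrative model only; the field names are assumptions, not Frank's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransformSpec:
    """Illustrative sketch of an editable transform spec."""
    name: str
    lifecycle_stage: str = "draft"             # draft | ready | retired
    current_artifact_id: Optional[str] = None  # last successfully hydrated artifact

    def edit(self, **changes):
        # Editing mutates the spec only; the runnable artifact pointer
        # is untouched until a later hydration succeeds.
        for key, value in changes.items():
            setattr(self, key, value)

spec = TransformSpec(name="stg_orders", current_artifact_id="art-1")
spec.edit(name="stg_orders_v2")
# spec.current_artifact_id is still "art-1" until hydration replaces it
```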

Lifecycle and runtime outcome

Frank splits user intent from runtime truth:

| Field | Values | Meaning |
| --- | --- | --- |
| `lifecycle_stage` | `draft`, `ready`, `retired` | Whether the transform is usable or intentionally withdrawn. |
| `last_run_outcome` | `none`, `running`, `succeeded`, `failed` | What the most recent run did. |

Readiness checks use both fields plus hydration state:

  • A transform can run when it is hydrated, not retired, and no run is in flight.
  • A transform can be scheduled after it has been hydrated and promoted to ready.
  • Failed runs do not block manual retry.
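The readiness rules above can be expressed as small predicates. A minimal sketch, assuming boolean inputs rather than Frank's real state objects:

```python
def can_run(hydrated: bool, lifecycle_stage: str, run_in_flight: bool) -> bool:
    # Runnable: hydrated, not retired, and no run currently in flight.
    # A previous failed run does not block a manual retry.
    return hydrated and lifecycle_stage != "retired" and not run_in_flight

def can_schedule(hydrated: bool, lifecycle_stage: str) -> bool:
    # Schedulable: hydrated and explicitly promoted to ready.
    return hydrated and lifecycle_stage == "ready"
```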

A legacy status value is derived from these fields for older consumers.

Input sources

Transforms support:

  • One Bronze table.
  • Multiple Bronze tables with joins.
  • Other transform outputs for chaining.
  • Mixed source tables and transform outputs for staged Silver-to-Gold flows.

This supports common patterns:

```text
raw.postgres_orders -> stg_orders -> fct_daily_sales

raw.stripe_customers \
raw.salesforce_contacts -> dim_customer_360
raw.postgres_customers /
```

Mapping kinds

AI and UI mapping flows support three field mapping kinds:

| Kind | Use it for | Required fields |
| --- | --- | --- |
| `source_expression` | A source column feeds a target field, optionally with SQL. | `source_field` |
| `literal` | A constant value such as source system, schema version, or boolean flag. | `literal_value`, `literal_type` |
| `context` | Runtime metadata such as tenant ID, pipeline ID, run start, transform name, or source name. | `context_key` |

Context keys are intentionally allowlisted:

```text
tenant.id
pipeline.id
pipeline.run_id
run.started_at
transform.name
source.name
```
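Enforcing an allowlist like this is straightforward: reject any key outside the set rather than interpolating it blindly. A sketch (not Frank's implementation):

```python
# Mirrors the allowlisted context keys from the docs above.
ALLOWED_CONTEXT_KEYS = {
    "tenant.id",
    "pipeline.id",
    "pipeline.run_id",
    "run.started_at",
    "transform.name",
    "source.name",
}

def validate_context_key(key: str) -> str:
    # Fail loudly on anything outside the allowlist.
    if key not in ALLOWED_CONTEXT_KEYS:
        raise ValueError(f"unknown context key: {key!r}")
    return key
```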

Transform patterns

Transform patterns live in backend/config/transform_patterns. They are synced at API startup and exposed through /api/v1/transform-patterns.

Families include:

  • Projection: select and rename.
  • Filtering: SQL predicates.
  • Joining: left, inner, lookup.
  • Aggregation: group by and window functions.
  • Deduplication: first/latest row selection.
  • Dimensions: upsert, SCD Type 1, SCD Type 2 merge.
  • Validation: regex, enum, anomaly flags.
  • Conversion: unit, currency, timezone.
  • Geospatial: WKT parsing, H3 enrichment, H3 aggregation, point-in-polygon, nearest, distance, spatial joins.
  • Python runner: containerized escape hatches such as h3_enrich, fx_rate_ingest, and test patterns.

Each pattern defines params, required fields, runtime, template files, and validation rules.

Runtimes

Transform artifacts can target multiple runtimes:

| Runtime | Value | Use it for |
| --- | --- | --- |
| Trino SQL | `trino_sql` | Direct SQL execution over Iceberg. |
| dbt SQL | `dbt_sql` | dbt-style model rendering and execution. |
| Python runner | `python_runner` | Containerized custom logic using frank-sdk. |
| Flink SQL | `flink_sql` | Future streaming SQL surface. |

The renderer produces the runnable artifact; the executor runs it through the correct runtime path.
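The renderer/executor split amounts to a dispatch on the runtime value. A minimal sketch with stub handlers (real execution would call Trino, dbt, a container runner, or Flink):

```python
def execute(runtime: str, artifact: str) -> str:
    # Route the rendered artifact to the matching runtime path.
    # Handlers here are stand-ins that just tag the artifact.
    handlers = {
        "trino_sql": lambda a: f"trino: {a}",
        "dbt_sql": lambda a: f"dbt: {a}",
        "python_runner": lambda a: f"container: {a}",
        "flink_sql": lambda a: f"flink: {a}",
    }
    try:
        return handlers[runtime](artifact)
    except KeyError:
        raise ValueError(f"unsupported runtime: {runtime}") from None
```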

Hydration

Hydration turns an editable transform spec into a concrete artifact:

  1. Resolve input context and source schemas.
  2. Apply transform pattern or field mappings.
  3. Render SQL/dbt/Python runner content.
  4. Validate required params and schema assumptions.
  5. Persist a TransformArtifact.
  6. Update current_artifact_id.

Hydration is the boundary between design and execution.
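The six steps above can be sketched end to end. Every helper here is an illustrative stub; the real resolution, rendering, and persistence live behind Frank's API:

```python
import uuid

def resolve_source_schemas(sources):
    # Stub for step 1: pretend every source exposes a single "id" column.
    return {s: ["id"] for s in sources}

def render_content(spec, schemas):
    # Stub for steps 2-3: render a trivial SQL projection.
    cols = ", ".join(c for cols in schemas.values() for c in cols)
    return f"SELECT {cols} FROM {', '.join(spec['sources'])}"

def hydrate(spec):
    schemas = resolve_source_schemas(spec["sources"])          # 1. resolve inputs
    content = render_content(spec, schemas)                    # 2-3. apply + render
    assert spec["sources"], "validation: at least one source"  # 4. validate
    artifact = {"id": str(uuid.uuid4()), "content": content}   # 5. persist artifact
    spec["current_artifact_id"] = artifact["id"]               # 6. point spec at it
    return artifact

spec = {"sources": ["iceberg.bronze.orders"], "current_artifact_id": None}
artifact = hydrate(spec)
```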

Running transforms

Manual run path:

```bash
frankctl transforms trigger <transform-id>
frankctl transforms runs <transform-id>
frankctl transforms logs <transform-id> <run-id>
```

Run statuses:

```text
pending -> running -> completed | failed | cancelled
```

The API stores summary metadata in Postgres and detailed logs in Loki / runtime logs, keeping list views fast while preserving drill-down.
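The status flow can be enforced with a small transition table. A sketch, not Frank's implementation:

```python
# Legal run-status transitions; terminal states have no outgoing edges.
VALID_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}

def advance(current: str, nxt: str) -> str:
    # Refuse illegal jumps such as pending -> completed.
    if nxt not in VALID_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```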

Incremental transforms

Transform-level incremental modes:

| Mode | Meaning |
| --- | --- |
| `full_refresh` | Reprocess all input rows. |
| `watermark` | Use a cursor field and tiebreaker for bounded incremental reads. |

Per-input read config lets fact tables run delta while dimension tables are read as full snapshots:

```json
{
  "iceberg.bronze.orders": { "mode": "delta" },
  "iceberg.bronze.customers": { "mode": "full" }
}
```
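Resolving the effective read mode per input might look like this: per-input overrides win, otherwise the transform-level mode decides. Names and fallback behavior are assumptions:

```python
def read_mode(table: str, per_input: dict, transform_mode: str) -> str:
    # A per-input override takes precedence over the transform-level mode.
    override = per_input.get(table, {}).get("mode")
    if override:
        return override
    # Fallback assumption: watermark transforms read deltas, full_refresh reads all.
    return "delta" if transform_mode == "watermark" else "full"

cfg = {
    "iceberg.bronze.orders": {"mode": "delta"},
    "iceberg.bronze.customers": {"mode": "full"},
}
```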

Python runner transforms

Python runner patterns execute in containers and use frank-sdk to read runtime config, connect to Trino, emit metrics, emit lineage, and return structured results. They are the right choice when SQL would be forced or when a domain library is the real implementation.
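The runner contract can be sketched without the real SDK. Everything below, the config shape, the result shape, and the `run` entry point, is an assumption for illustration, not frank-sdk's actual API:

```python
def run(config: dict) -> dict:
    # A runner receives runtime config, does its work (here a trivial
    # row filter stands in for real domain logic), and returns a
    # structured result Frank could record as run metadata.
    rows = config.get("rows", [])
    kept = [r for r in rows if r.get("valid", True)]
    return {
        "status": "succeeded",
        "metrics": {"rows_in": len(rows), "rows_out": len(kept)},
        "lineage": {
            "inputs": config.get("inputs", []),
            "output": config.get("output"),
        },
    }
```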

See the SDK guide for the authoring contract.

When to use what

| Need | Use |
| --- | --- |
| Rename, cast, filter, map fields | Field mappings or `select_rename` / `filter`. |
| Join two or more tables | Join patterns or the multi-source transform wizard. |
| Standardize into a known semantic schema | Target SDM + AI-suggested field mappings. |
| Generate custom logic | AI generate-transform, then review and publish. |
| Use domain Python libraries | Python runner pattern with frank-sdk. |
| Chain reusable steps | Pipeline DAG. |

Frank is built by aiaiai-pt.