Architecture Overview

Frank is a low-code EL/T platform for building governed lakehouse data products and publishing them into an ontology.

It combines four layers:

  1. Experience layer -- SvelteKit UI, frankctl, Python pattern CLI, and API.
  2. Control plane -- FastAPI, metadata models, pattern registries, schema libraries, and auth.
  3. Execution plane -- Temporal workers, Dagster assets, Trino execution, Kubernetes Python runner, and Martha AI workflows.
  4. Data plane -- Iceberg catalog, S3/MinIO object storage, Bronze/Silver/Gold tables, and ontology-core-v2.

The picture

                              Builders and operators
                UI              frankctl             API / CI
                 |                  |                   |
                 +------------------+-------------------+
                                    |
                                    v
                          FastAPI control plane
         +--------------------------+---------------------------+
         |                          |                           |
         v                          v                           v
  Source registry            Transform registry           Pipeline registry
  Pattern catalog            Artifact hydration           Versioned DAGs
  Stream config              Runtime metadata             Sandbox/activation
         |                          |                           |
         v                          v                           v
  Temporal source worker     Dagster + transform worker    Dagster / sandbox
  Airbyte / dlt              Trino / dbt / Python runner   Step execution
         |                          |                           |
         +--------------------------+---------------------------+
                                    |
                                    v
                         Apache Iceberg lakehouse
                    Bronze -> Silver -> Gold datasets
                                    |
                                    v
                      Backing datasets and ontology sync
                                    |
                                    v
                            ontology-core-v2

Control plane

The FastAPI application owns product state:

  • Sources and streams.
  • Source pattern definitions.
  • Transform specs, sources, mappings, artifacts, runs, and lineage.
  • Transform pattern registry.
  • Pipeline versions, steps, edges, and sandbox results.
  • Schedules.
  • Dataset browsing.
  • Schema libraries.
  • Ontology entity type proxy.
  • Backing datasets and ontology sync history.
  • Identity policies.
  • AI routes backed by Martha.

Postgres stores metadata; Iceberg stores data. The API derives tenant scope from auth and forwards a service identity to workers that need to call back into protected endpoints.

Source execution

Source execution is asynchronous and handled by Temporal source workers.

Source pattern -> Source -> Discovery -> Streams -> Sync -> Bronze tables

Airbyte patterns use PyAirbyte and Dockerized source connectors. dlt patterns use Python-native source builders for REST, GraphQL, filesystem, Kafka, and related lightweight sources.

Both engines implement the same extraction contract:

  • Discover stream schemas.
  • Extract selected streams.
  • Apply data envelope metadata.
  • Track cursors.
  • Batch writes.
  • Write to Iceberg through shared naming helpers.
  • Return structured sync status and logs.

Transform execution

Transforms separate design-time specs from runtime artifacts.

Transform spec -> Hydration -> Artifact -> Materialization -> TransformRun

The transform spec stores sources, mappings, pattern params, target schema, materialization, and incremental mode. Hydration renders executable content:

  • Trino SQL.
  • dbt-style SQL.
  • Python-runner files and container config.
  • Future runtime renderers such as Flink SQL.
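Hydration can be sketched as a pure renderer from spec to executable SQL. The spec fields (`source_table`, `mappings`, `incremental_filter`) and the SQL shape are assumptions for this sketch.

```python
# Illustrative hydration step: a transform spec rendered to Trino SQL.
# Spec field names and the SQL layout are assumed for the example.
def hydrate_trino_sql(spec: dict) -> str:
    cols = ",\n  ".join(
        f"{src} AS {dst}" for src, dst in spec["mappings"].items()
    )
    sql = f'SELECT\n  {cols}\nFROM {spec["source_table"]}'
    if spec.get("incremental_filter"):
        # Incremental mode adds a cursor predicate to the rendered SQL.
        sql += f'\nWHERE {spec["incremental_filter"]}'
    return sql

spec = {
    "source_table": "bronze.orders",
    "mappings": {"order_id": "id", "total_amount": "amount"},
    "incremental_filter": "updated_at > :cursor",
}
artifact = hydrate_trino_sql(spec)
```

The point of the split is that the rendered string, not the spec, is what execution consumes; users keep editing the spec while runs use a frozen artifact.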

The recommended execution path is Dagster-first:

API trigger -> Dagster materialization -> asset calls API /execute -> run status sync

This gives operators a Dagster run, asset key, logs, and consistent scheduling behavior.
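The callback-and-sync loop can be sketched with a stubbed API client. The endpoint names and the polling shape are assumptions; the real path runs inside a Dagster asset body.

```python
# Hedged sketch of the Dagster-first loop: the asset delegates to the
# control plane's execute endpoint, then polls until the run settles.
# StubApi and its method names are assumptions for the sketch.
import time

class StubApi:
    def __init__(self):
        self._runs: dict[str, str] = {}

    def execute(self, transform_id: str, dagster_run_id: str) -> str:
        # The TransformRun would record dagster_run_id for traceability.
        run_id = f"run-{transform_id}"
        self._runs[run_id] = "running"
        return run_id

    def status(self, run_id: str) -> str:
        # Pretend the engine finishes immediately for the sketch.
        self._runs[run_id] = "succeeded"
        return self._runs[run_id]

def materialize_asset(api: StubApi, transform_id: str, dagster_run_id: str) -> str:
    run_id = api.execute(transform_id, dagster_run_id)
    while (state := api.status(run_id)) == "running":
        time.sleep(0.1)  # poll until the control plane reports a final state
    return state

final = materialize_asset(StubApi(), "orders_silver", "dagster-123")
```

Keeping the asset body thin like this is what lets Dagster own scheduling and run history while the control plane stays the source of truth for run state.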

Pipeline execution

Pipelines are versioned DAGs over transform steps.

Pipeline -> PipelineVersion -> PipelineStep + PipelineStepEdge

Frank validates DAGs with topological sorting and cycle detection. Versions are immutable and content-hashed. Sandbox runs execute a version before activation. Activation links or creates transforms for each step and promotes the pipeline to an operational state.
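The validation and hashing described above can be sketched with Kahn's algorithm, which yields a topological order and detects cycles in one pass, plus a deterministic content hash. The step/edge shapes are assumptions for the sketch.

```python
# Illustrative DAG validation: Kahn's algorithm doubles as cycle
# detection; a sha256 over canonicalized content makes versions
# immutable. Step and edge representations are assumed for the sketch.
import hashlib
import json
from collections import deque

def topo_sort(steps: list[str], edges: list[tuple[str, str]]) -> list[str]:
    indegree = {s: 0 for s in steps}
    children: dict[str, list[str]] = {s: [] for s in steps}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    queue = deque(s for s in steps if indegree[s] == 0)
    order: list[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(steps):
        # Some step never reached indegree 0: the DAG has a cycle.
        raise ValueError("pipeline DAG contains a cycle")
    return order

def content_hash(steps: list[str], edges: list[tuple[str, str]]) -> str:
    # Sort before hashing so equal content always hashes equally.
    payload = json.dumps({"steps": sorted(steps), "edges": sorted(edges)})
    return hashlib.sha256(payload.encode()).hexdigest()

order = topo_sort(["extract", "clean", "publish"],
                  [("extract", "clean"), ("clean", "publish")])
```

Content-hashing the canonicalized steps and edges is what makes two identical versions detectable and an activated version tamper-evident.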

AI execution

Frank owns workflow definitions for AI assistance and seeds them into Martha.

Frank API -> Martha workflow -> LLM / tools -> structured result -> Frank spec

AI routes return typed payloads for schema matching, field mapping, pattern params, SQL review, code generation, CI fixes, transform publishing, and pipeline composition.
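A typed payload for one of these routes, field mapping, can be sketched as follows. The dataclass fields and the confidence-threshold convention are assumptions, not Frank's actual schema.

```python
# Sketch of a typed AI-route payload for field-mapping suggestions.
# Field names and the 0.0-1.0 confidence convention are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldMappingSuggestion:
    source_column: str
    target_property: str
    confidence: float  # structured result returned by the workflow

def best_mappings(suggestions: list[FieldMappingSuggestion],
                  threshold: float = 0.8) -> dict[str, str]:
    """Keep high-confidence suggestions, last writer wins per column."""
    return {
        s.source_column: s.target_property
        for s in sorted(suggestions, key=lambda s: s.confidence)
        if s.confidence >= threshold
    }

suggestions = [
    FieldMappingSuggestion("cust_nm", "customer_name", 0.93),
    FieldMappingSuggestion("cust_nm", "customer_note", 0.41),
]
mapped = best_mappings(suggestions)
```

Typed payloads like this are what let the UI and API consume LLM output without parsing free text.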

Ontology execution

Ontology integration turns curated Iceberg tables into semantic entities.

Silver/Gold table -> BackingDataset -> OntologySyncRun -> ontology-core-v2 entities

Backing datasets map columns to entity properties and relationships. Sync runs track snapshots, cursors, rows synced, workflow IDs, logs, and drift signals.

Observability

Frank uses:

  • Structured logs for API and worker events.
  • Loki for persistent run logs.
  • OpenTelemetry for API and worker traces.
  • Dagster run IDs for materialization tracking.
  • Temporal workflow IDs for async operations.
  • TransformRun, SyncRun, and OntologySyncRun records for fast UI/API history.
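A structured log event carrying these correlation IDs can be sketched with the standard library; the field names are illustrative, not Frank's actual log schema.

```python
# Minimal structured-log sketch: one JSON record per event, carrying
# the correlation ids listed above. Field names are assumptions.
import json
import logging
import sys

logger = logging.getLogger("frank.worker")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def log_event(event: str, **fields) -> str:
    """Emit one JSON line and return it, so callers can correlate."""
    record = json.dumps({"event": event, **fields})
    logger.info(record)
    return record

log_event("transform_run_finished",
          transform_run_id="tr-42",
          dagster_run_id="dag-123",
          status="succeeded")
```

One JSON line per event is what makes the Loki-side queries by `dagster_run_id` or `transform_run_id` cheap.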

Design principles

EL and T are separate

Sources sync raw data. Transforms model data. A source can be active without a transform; a transform can remain usable while an upstream source needs attention.

Specs and artifacts are separate

Users edit transform specs. Hydration produces artifacts. Execution uses artifacts. This keeps design iteration away from runtime stability.

Patterns are product surface

Source and transform patterns are not hidden implementation details. They are how Frank expands connector coverage and transform capability while keeping UI and API behavior consistent.

Ontology is a publication layer

Frank does not treat ontology sync as an afterthought. Backing datasets, identity policies, entity type browsing, mapping suggestions, health checks, and sync history are first-class product features.

Frank is built by aiaiai-pt.