Concepts

Frank has a small vocabulary. The product is easiest to understand when you keep the extract/load side, the transform side, and the ontology side separate.

The mental model in one paragraph

You create sources from reusable source patterns. Each source discovers streams and syncs selected streams into tenant-scoped Bronze Iceberg tables. You create transforms from field mappings, SQL, dbt-style templates, or Python-runner artifacts. Transforms consume Bronze tables or other transform outputs and materialize Silver or Gold tables. You compose transforms into versioned pipelines. You can then register a curated table as a backing dataset for an ontology entity type so semantic applications can consume it.
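The paragraph above can be sketched as a tiny object model. This is illustrative only — the class names and fields are assumptions for the sketch, not Frank's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Stream:
    name: str
    destination_table: str  # Bronze Iceberg table the sync writes to

@dataclass
class Source:
    pattern: str                                  # e.g. a "postgres" source pattern
    streams: list[Stream] = field(default_factory=list)

@dataclass
class Transform:
    name: str
    inputs: list[str]   # Bronze tables or other transform outputs
    output: str         # Silver or Gold table it materializes

# One tenant-scoped flow: sync a stream into Bronze, then curate it to Silver.
orders_stream = Stream(name="public.orders", destination_table="bronze.orders")
pg = Source(pattern="postgres", streams=[orders_stream])
dedupe = Transform(name="dedupe_orders",
                   inputs=["bronze.orders"],
                   output="silver.orders")

# The transform consumes exactly what the source's streams land in Bronze.
assert dedupe.inputs == [s.destination_table for s in pg.streams]
```

The point of the sketch is the direction of dependency: transforms reference tables, not sources, which is what lets a transform outlive or predate any particular connector.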

Primitives

| Concept | What it is | Use it for |
| --- | --- | --- |
| Tenant | The isolation boundary for sources, streams, transforms, pipelines, datasets, runs, and ontology mappings. | Separate customers, environments, or domains without mixing metadata or data paths. |
| Source pattern | A declarative connector template. Patterns describe config fields, defaults, examples, auth hints, and the extraction engine (airbyte or dlt). | Add PostgreSQL, Salesforce, REST APIs, GraphQL, files, S3, SFTP, Kafka, Stripe, Slack, and more without hardcoding the UI. |
| Source | A configured extract/load connection created from a source pattern. | Own connection config, discovery schema, status, schedule, and sync metrics. |
| Stream | A table, API resource, file glob, or event stream inside a source. | Select what to sync; set full-refresh or incremental mode, cursor fields, primary keys, and destination table names. |
| Dataset | An Iceberg table exposed through the datasets API. | Browse Bronze, Silver, or Gold data, preview rows, and inspect snapshots. |
| Transform pattern | A reusable transform recipe with a parameter schema and runtime renderer. | Filter, dedupe, join, aggregate, validate, convert, geospatially enrich, or run Python containers. |
| Transform | A materialized transformation definition. | Map source fields to target fields, apply patterns, generate SQL or Python artifacts, run manually, schedule, and inspect history. |
| Artifact | The hydrated executable output for a transform version. | Keep the runnable SQL/dbt/Python representation separate from the editable transform spec. |
| Run | A source sync, transform execution, pipeline sandbox, or ontology sync execution. | Track status, logs, metrics, workflow IDs, row counts, and failures. |
| Pipeline | A versioned DAG of transform steps. | Compose multi-step EL/T flows with edges, fan-in, terminal steps, sandbox validation, activation, and pause/resume. |
| Schema library | The target schema catalog exposed to the UI and API. | Pick FIWARE Smart Data Models or custom schemas for transforms and backing datasets. |
| Entity type | An ontology schema object served by ontology-core-v2. | Define the semantic object that a curated table backs, including fields and relationships. |
| Backing dataset | A registration that maps an Iceberg table to an ontology entity type. | Push table rows into ontology entities with column-to-property mappings and sync history. |
| Identity policy | A reusable rule for deriving stable entity identifiers. | Normalize source fields, build passthrough/composite/hash/UUID keys, and dry-run identity resolution. |
| AI workflow | A Martha workflow owned by Frank and called by the API. | Suggest schemas, mappings, pattern params, SQL reviews, code generation, CI fixes, publishing, and pipeline composition. |
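Identity policies are the least obvious primitive in the list, so here is a sketch of what a composite hash key strategy might look like. The normalization rules (strip plus lowercase) and the truncated digest are assumptions for illustration, not Frank's actual policy semantics:

```python
import hashlib

def composite_hash_key(record: dict, fields: list[str]) -> str:
    """Derive a stable entity identifier from selected source fields.

    Hypothetical sketch of an identity policy's hash strategy:
    normalize each field, join with a separator, hash the result.
    """
    normalized = [str(record.get(f, "")).strip().lower() for f in fields]
    digest = hashlib.sha256("|".join(normalized).encode("utf-8")).hexdigest()
    return digest[:16]  # shortened for readability

a = composite_hash_key({"email": " Ada@Example.com ", "region": "EU"},
                       ["email", "region"])
b = composite_hash_key({"email": "ada@example.com", "region": "eu"},
                       ["email", "region"])
assert a == b  # normalization keeps the key stable across formatting drift
```

This is why identity policies support dry runs: you want to see how many distinct keys a normalization rule produces before entities are created against it.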

How they fit together

```text
Tenant
  |
  +-- Source patterns --> Sources --> Streams --> Bronze Iceberg tables
  |
  +-- Transform patterns --> Transforms --> Artifacts --> Runs
  |                                           |
  |                                           +--> Silver / Gold Iceberg tables
  |
  +-- Pipelines --> Versions --> Steps + Edges --> Transform runs
  |
  +-- Schema libraries / Ontology entity types
                         |
                         +--> Backing datasets --> Ontology sync runs
```
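The "Steps + Edges" branch implies DAG validation before a pipeline version can run. A minimal sketch of that check (Kahn's topological sort); Frank's actual activation gates are richer than this, and the function name is illustrative:

```python
from collections import deque

def validate_dag(steps: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Return a valid execution order for pipeline steps, or raise on a cycle."""
    indegree = {s: 0 for s in steps}
    downstream = {s: [] for s in steps}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1
    queue = deque(s for s in steps if indegree[s] == 0)
    order = []
    while queue:
        step = queue.popleft()
        order.append(step)
        for nxt in downstream[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(steps):
        raise ValueError("pipeline has a cycle")
    return order

# Fan-in: two cleanse steps feed one terminal join step.
order = validate_dag(["clean_a", "clean_b", "join"],
                     [("clean_a", "join"), ("clean_b", "join")])
assert order[-1] == "join"
```

A step with no outgoing edges (here, `join`) is what the primitives table calls a terminal step: the one whose output table the pipeline ultimately materializes.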

Two lifecycles, not one

Frank deliberately separates EL from T:

```text
Source:    draft -> ready -> syncing -> active <-> paused
                               |
                               +-> error
            * -> decommissioned

Transform: draft -> ready -> retired
Runtime:   none -> running -> succeeded | failed
```

A source can be active with no transform. A transform can remain ready while its upstream source is stale. A pipeline can be drafted and sandboxed before it is activated. That separation is what makes multi-source joins, transform chaining, and partial recovery practical.
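The source lifecycle reads naturally as an explicit transition table. This sketch is reconstructed from the diagram above (including `* -> decommissioned`, meaning any state may be decommissioned); it is an illustration, not Frank's state machine code:

```python
# Allowed next states for each source state, per the lifecycle diagram.
SOURCE_TRANSITIONS = {
    "draft":          {"ready", "decommissioned"},
    "ready":          {"syncing", "decommissioned"},
    "syncing":        {"active", "error", "decommissioned"},
    "active":         {"paused", "decommissioned"},
    "paused":         {"active", "decommissioned"},
    "error":          {"decommissioned"},
    "decommissioned": set(),
}

def can_transition(state: str, target: str) -> bool:
    """Check whether a source may move from `state` to `target`."""
    return target in SOURCE_TRANSITIONS.get(state, set())

assert can_transition("active", "paused")
assert not can_transition("draft", "active")  # must pass through ready/syncing
```

Keeping this table separate from the transform and runtime state machines is the concrete payoff of the two-lifecycle design: a paused source never forces a transform out of `ready`.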

What this gets you

You do not have to build:

  • Connector UX -- dynamic forms, validation, discovery, selected streams, and sync scheduling.
  • Lakehouse write plumbing -- Iceberg naming, tenant namespaces, envelopes, cursor state, idempotent snapshots, and write retries.
  • Transform runtime plumbing -- hydration, artifacts, renderers, run records, logs, cancellation, and retry policies.
  • Pipeline safety -- DAG validation, version hashes, sandbox execution, step classification, and activation gates.
  • Semantic publication -- ontology proxying, entity type versioning, backing dataset mappings, identity policy support, and sync history.
  • AI orchestration -- prompt workflows, model calls, trace IDs, and graceful degradation when Martha is offline.

You still own:

  • Data intent -- which data matters, what it means, and how it should be joined, cleaned, modeled, and published.

What is next

Frank is built by aiaiai-pt.