
Ontology Integration

Frank turns curated Iceberg tables into semantic entities. The ontology integration lets teams publish pipeline outputs into ontology-core-v2 so applications can consume typed, versioned, relationship-aware data.

The model

```text
Gold / Silver Iceberg table
        |
        v
Backing dataset
        |
        v
Ontology entity type
        |
        v
Ontology entities
```

A backing dataset declares that a specific Iceberg table backs a specific ontology entity type, using a given set of column-to-property mappings and a primary key.

Entity types

Entity types are schemas served by ontology-core-v2. Frank proxies the entity type surface so data builders can work inside the same UI and API:

```http
GET    /api/v1/ontology/status
GET    /api/v1/ontology/entity-types
GET    /api/v1/ontology/entity-types/domains
GET    /api/v1/ontology/entity-types/{code}
POST   /api/v1/ontology/entity-types
POST   /api/v1/ontology/entity-types/{code}/versions
PATCH  /api/v1/ontology/entity-types/{code}
DELETE /api/v1/ontology/entity-types/{code}
GET    /api/v1/ontology/entity-types/{code}/versions
```

Entity types can include fields and relationships. Frank synthesizes relationship references into field-like mapping targets, so users can map columns such as station_name or route_id onto relationship references during backing dataset setup.

Backing datasets

A backing dataset contains:

| Field | Meaning |
| --- | --- |
| `iceberg_namespace` / `iceberg_table` | The materialized table to publish. |
| `entity_type_id` / `entity_type_name` | The ontology type being backed. |
| `schema_library_ref` | Optional source schema reference, such as `fiware:Transportation/Vehicle`. |
| `property_mappings` | Column-to-property mapping array. |
| `primary_key_column` | Stable entity key column. |
| `title_key_column` | Human-readable entity label column. |
| `sync_mode` | When the dataset should publish. |
| `cursor_column` | Optional incremental sync cursor. |
| `transform_id` / `pipeline_id` | Optional lineage back to the producer. |
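As a concrete sketch, a registration payload built from these fields might look like the following. The field names follow the table above, but the values and the exact payload shape are illustrative assumptions, not Frank's actual API schema:

```python
# Hypothetical backing dataset registration payload, assembled from the
# fields listed above. Values are illustrative.
backing_dataset = {
    "iceberg_namespace": "gold",
    "iceberg_table": "vehicle_positions",
    "entity_type_name": "vehicle",
    "schema_library_ref": "fiware:Transportation/Vehicle",
    "sync_mode": "incremental",
    "cursor_column": "observed_at",
    "primary_key_column": "vehicle_id",
    "title_key_column": "vehicle_label",
    "property_mappings": [
        {"column": "vehicle_id", "property": "id",
         "is_primary_key": True, "type": "string"},
        {"column": "observed_at", "property": "dateObserved",
         "type": "datetime"},
    ],
}

# The primary key column should appear among the mapped columns.
mapped_columns = {m["column"] for m in backing_dataset["property_mappings"]}
assert backing_dataset["primary_key_column"] in mapped_columns
```

A check like the final assertion is worth running client-side before registration, since a primary key that is never mapped cannot produce stable entity keys.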

Backing dataset lifecycle:

```text
pending -> syncing -> synced
synced -> syncing
synced -> needs_remapping
needs_remapping -> pending
error -> pending | syncing
```
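The lifecycle above can be expressed as an allowed-transition table, which is a convenient shape for validating status updates. Note one assumption: the diagram does not show how a dataset enters `error`, so `syncing -> error` is inferred here:

```python
# Backing dataset lifecycle as an allowed-transition table.
# syncing -> error is an assumption; the diagram does not show
# how a dataset enters the error state.
ALLOWED_TRANSITIONS = {
    "pending": {"syncing"},
    "syncing": {"synced", "error"},
    "synced": {"syncing", "needs_remapping"},
    "needs_remapping": {"pending"},
    "error": {"pending", "syncing"},
}

def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle permits moving current -> target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```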

Property mappings

Mappings are explicit and reviewable:

```json
[
  {
    "column": "vehicle_id",
    "property": "id",
    "is_primary_key": true,
    "type": "string"
  },
  {
    "column": "observed_at",
    "property": "dateObserved",
    "type": "datetime"
  },
  {
    "column": "station_name",
    "property": "ref_station",
    "is_relationship": true,
    "target_type": "station",
    "target_key": "name"
  }
]
```

Relationship mappings let the sync activity resolve business keys into ontology entity UUIDs.
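In outline, that resolution looks like the following sketch: the business key in the mapped column is looked up against the target entity type's key property to find the entity UUID. The function and the in-memory lookup dict are illustrative stand-ins for the real sync activity's ontology query:

```python
# Illustrative relationship resolution: the value in the mapped column
# is a business key, looked up against the target type's key property
# (mapping["target_key"]) to obtain the target entity's UUID.
def resolve_relationship(row, mapping, key_to_uuid):
    """Return the target entity UUID for a relationship mapping, or None."""
    business_key = row.get(mapping["column"])
    return key_to_uuid.get(business_key)

mapping = {
    "column": "station_name",
    "property": "ref_station",
    "is_relationship": True,
    "target_type": "station",
    "target_key": "name",
}

# Hypothetical name -> UUID index for the "station" entity type.
stations = {"Central": "c0ffee00-0000-4000-8000-000000000001"}

ref = resolve_relationship({"station_name": "Central"}, mapping, stations)
```

A `None` result (no entity with that business key) is exactly the kind of condition that pushes a backing dataset toward `needs_remapping` or `error`.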

Mapping assistance

Frank can suggest backing dataset mappings:

```http
POST /api/v1/backing-datasets/suggest-mappings
```

The suggestion request includes the Iceberg table and target entity type. Frank uses table schema, target property names, and AI assistance to propose column-to-property matches.
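A toy version of the name-matching half of this can be sketched with fuzzy string similarity. This is only the shape of the output: Frank's real suggester also uses AI assistance, and the cutoff and normalization here are assumptions:

```python
import difflib

def suggest_mappings(columns, properties, cutoff=0.6):
    """Propose column -> property matches by fuzzy name similarity.

    Names are lowercased and stripped of underscores before matching,
    so snake_case columns can match camelCase properties.
    """
    norm_props = {p.lower().replace("_", ""): p for p in properties}
    suggestions = []
    for col in columns:
        match = difflib.get_close_matches(
            col.lower().replace("_", ""), list(norm_props), n=1, cutoff=cutoff)
        if match:
            suggestions.append({"column": col, "property": norm_props[match[0]]})
    return suggestions
```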

Sync

Backing datasets sync rows from Iceberg into ontology-core-v2. The sync path tracks:

  • Workflow ID and workflow run ID.
  • Status: pending, running, synced, error, skipped.
  • Started and completed timestamps.
  • Rows synced.
  • Snapshot ID.
  • Full vs incremental sync.
  • Error message.
  • Trigger source.
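The tracked fields amount to one record per sync run. A sketch of that record, with field names that are illustrative rather than Frank's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative record for one sync run, mirroring the tracked fields
# listed above. Field names are assumptions.
@dataclass
class SyncRun:
    workflow_id: str
    workflow_run_id: str
    status: str = "pending"           # pending | running | synced | error | skipped
    started_at: Optional[str] = None
    completed_at: Optional[str] = None
    rows_synced: int = 0
    snapshot_id: Optional[str] = None
    full_sync: bool = True            # False for an incremental sync
    error_message: Optional[str] = None
    trigger_source: str = "manual"    # e.g. manual, schedule, pipeline

VALID_STATUSES = {"pending", "running", "synced", "error", "skipped"}
```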

Useful endpoints:

```http
POST /api/v1/backing-datasets/{id}/sync
GET  /api/v1/backing-datasets/{id}/sync-history
GET  /api/v1/backing-datasets/{id}/sync-history/{run_id}/logs
GET  /api/v1/backing-datasets/{id}/health
```

The health endpoint checks the mapping, table state, ontology status, and schema drift signals that matter before publication.

Schema libraries

Schema libraries provide target schemas for transforms and backing datasets:

```http
GET  /api/v1/schema-libraries
GET  /api/v1/schema-libraries/{library_id}/domains
GET  /api/v1/schema-libraries/{library_id}/domains/{domain}/schemas
GET  /api/v1/schema-libraries/{library_id}/schemas/{schema_id}
GET  /api/v1/schema-libraries/schema/{full_id}
GET  /api/v1/schema-libraries/search
POST /api/v1/schema-libraries/validate/{full_id}
```

The registry combines FIWARE Smart Data Models and custom schemas behind one browsing and validation surface.

Identity policies

Identity policies define stable keys for semantic entities. Strategies include:

  • passthrough: use the normalized source field.
  • composite: concatenate normalized fields.
  • hash: hash the composite key.
  • uuid: generate a UUID-form key from normalized values.

Policies can normalize values with operations such as trim, upper/lower, space stripping, and NFC normalization. They can be system-level or tenant-level.
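The strategies and normalization operations above can be sketched as follows. The operation and strategy names follow this page, but their exact semantics (separator, hash algorithm, UUID namespace) are assumptions:

```python
import hashlib
import unicodedata
import uuid

def normalize(value: str, ops: list) -> str:
    """Apply normalization operations in order. Semantics are assumed."""
    for op in ops:
        if op == "trim":
            value = value.strip()
        elif op == "lower":
            value = value.lower()
        elif op == "upper":
            value = value.upper()
        elif op == "strip_spaces":
            value = value.replace(" ", "")
        elif op == "nfc":
            value = unicodedata.normalize("NFC", value)
    return value

def make_key(values: list, strategy: str, ops: list) -> str:
    """Build a stable entity key from source values under one strategy."""
    parts = [normalize(v, ops) for v in values]
    composite = ":".join(parts)  # separator is an assumption
    if strategy == "passthrough":
        return parts[0]
    if strategy == "composite":
        return composite
    if strategy == "hash":
        return hashlib.sha256(composite.encode()).hexdigest()
    if strategy == "uuid":
        # Name-based UUID (v5) so the same inputs always yield the same key.
        return str(uuid.uuid5(uuid.NAMESPACE_URL, composite))
    raise ValueError(f"unknown strategy: {strategy}")
```

Determinism is the point: the same normalized inputs must always produce the same key, which is what the dry-run endpoint lets you confirm before anything depends on it.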

Important endpoints:

```http
GET    /api/v1/identity-policies
GET    /api/v1/identity-policies/{id}
POST   /api/v1/identity-policies
PUT    /api/v1/identity-policies/{id}
DELETE /api/v1/identity-policies/{id}
POST   /api/v1/identity-policies/{id}/dry-run
```

Use dry runs to verify identifier output before a transform or backing dataset depends on it.

End-to-end workflow

  1. Build and run a transform into a Silver or Gold table.
  2. Choose or create an ontology entity type.
  3. Register a backing dataset for the table.
  4. Use mapping suggestions, then review field and relationship mappings.
  5. Pick primary key and title key columns.
  6. Run a health check.
  7. Trigger sync.
  8. Monitor sync history and logs.
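The API-facing steps above can be sketched against a stand-in HTTP client. The endpoint paths come from this page; the client class and helper function are illustrative, not part of any Frank SDK:

```python
# Stand-in client that records calls instead of sending them, so the
# sequencing of the workflow steps can be seen (and tested) directly.
class RecordingClient:
    def __init__(self):
        self.calls = []

    def get(self, path):
        self.calls.append(("GET", path))

    def post(self, path):
        self.calls.append(("POST", path))

def publish_backing_dataset(client, dataset_id):
    # Step 4: ask for mapping suggestions, then review them.
    client.post("/api/v1/backing-datasets/suggest-mappings")
    # Step 6: run a health check before publishing.
    client.get(f"/api/v1/backing-datasets/{dataset_id}/health")
    # Step 7: trigger sync.
    client.post(f"/api/v1/backing-datasets/{dataset_id}/sync")
    # Step 8: monitor sync history.
    client.get(f"/api/v1/backing-datasets/{dataset_id}/sync-history")

client = RecordingClient()
publish_backing_dataset(client, "ds-1")
```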

Frank is built by aiaiai-pt.