AI Assistance

Frank uses Martha to make the slow parts of data pipeline design faster: understanding source schemas, choosing target models, drafting mappings, reviewing SQL, generating custom transform code, and composing pipeline DAGs.

The AI layer is deliberately assistive. It proposes; the user reviews, edits, tests, and promotes.

Where AI shows up

| Workflow | API | Use it for |
| --- | --- | --- |
| Target schema suggestion | POST /api/v1/ai/suggest-target-schema | Match a source table to FIWARE Smart Data Models or other schema-library targets. |
| Field mapping suggestion | POST /api/v1/ai/suggest-field-mappings | Map source fields to target fields, including transforms, literals, and runtime context. |
| Pattern parameter suggestion | POST /api/v1/ai/suggest-pattern-params | Fill transform pattern parameters from schema and partial user input. |
| SQL review | POST /api/v1/ai/review-sql | Review Trino SQL for correctness, performance, and data quality issues. |
| Transform generation | POST /api/v1/ai/generate-transform | Generate Python transform pattern files from a step description and schema contract. |
| CI fix | POST /api/v1/ai/fix-ci-failure | Diagnose generated pattern CI logs and propose file-level fixes. |
| Publish transform | POST /api/v1/ai/publish-transform | Open a PR for bespoke transform code. |
| Pipeline composition | POST /api/v1/ai/compose-pipeline | Draft a multi-step pipeline DAG from source tables, target schema, and intent. |
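Each workflow above is an ordinary POST endpoint. As a minimal sketch, a client might assemble a request for the target-schema suggestion like this; the host and payload field names (source_table, fields) are illustrative assumptions, not a documented contract:

```python
import json

BASE_URL = "https://frank.example.com"  # hypothetical host

def build_suggest_target_schema(source_table, fields):
    """Assemble the URL and JSON body for a suggest-target-schema call."""
    url = f"{BASE_URL}/api/v1/ai/suggest-target-schema"
    body = json.dumps({"source_table": source_table, "fields": fields})
    return url, body

url, body = build_suggest_target_schema(
    "iceberg.bronze.here.traffic",
    [{"name": "speed_mph", "type": "double"},
     {"name": "observed_at", "type": "timestamp"}],
)
```

The same shape applies to the other endpoints in the table; only the path and payload differ.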

The user experience

AI assistance appears inside the source, transform, and pipeline workflows:

  1. A user selects source data.
  2. Frank gathers schema and context.
  3. Martha runs the relevant workflow.
  4. Frank returns structured suggestions with confidence and reasoning.
  5. The user accepts, edits, or rejects suggestions.
  6. The result becomes ordinary Frank configuration: a target schema, field mapping, transform pattern, custom code package, or pipeline DAG.

Nothing is stored differently because it came from AI. Once accepted, it is part of the same spec, artifact, versioning, sandbox, and run lifecycle as hand-authored work.

Schema matching

Target schema suggestion analyzes source fields and returns ranked matches:

```json
{
  "matches": [
    {
      "schema_id": "fiware:Transportation/Vehicle",
      "schema_name": "Vehicle",
      "confidence": 0.91,
      "reason": "The source contains vehicle identifiers, position, speed, and timestamp fields.",
      "field_preview": ["id", "location", "speed", "dateObserved"]
    }
  ]
}
```

Use this at the start of a transform when a team knows the data but not the best standard model.
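A minimal sketch of consuming a response shaped like the one above: rank matches by confidence and only auto-select above a threshold. The 0.8 cutoff is an illustrative assumption, not a product default:

```python
matches = [
    {"schema_id": "fiware:Transportation/Vehicle", "confidence": 0.91},
    {"schema_id": "fiware:Transportation/Road", "confidence": 0.40},
]

def best_match(matches, threshold=0.8):
    """Return the highest-confidence match, or None if nothing clears the bar."""
    ranked = sorted(matches, key=lambda m: m["confidence"], reverse=True)
    top = ranked[0] if ranked else None
    return top if top and top["confidence"] >= threshold else None

print(best_match(matches)["schema_id"])  # fiware:Transportation/Vehicle
```

Matches below the threshold are still worth surfacing in the UI; they just should not be accepted without a human look.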

Field mapping

Field mapping suggestions support three kinds of mapping:

| Kind | Example |
| --- | --- |
| source_expression | speedKph comes from speed_mph * 1.60934. |
| literal | source_system is always "here_traffic". |
| context | tenantId comes from runtime context tenant.id. |

This lets AI fill complete target schemas, including metadata and constants, instead of only direct column matches.
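A sketch of how the three kinds from the table might be applied to one source row; the mapping structure here is an illustrative assumption, not Frank's actual spec format:

```python
# One entry per target field; "kind" selects how the value is produced.
mappings = [
    {"target": "speedKph", "kind": "source_expression",
     "expr": lambda row, ctx: row["speed_mph"] * 1.60934},
    {"target": "source_system", "kind": "literal", "value": "here_traffic"},
    {"target": "tenantId", "kind": "context", "path": "tenant.id"},
]

def apply_mappings(row, context, mappings):
    """Build a target record from a source row plus runtime context."""
    out = {}
    for m in mappings:
        if m["kind"] == "source_expression":
            out[m["target"]] = m["expr"](row, context)
        elif m["kind"] == "literal":
            out[m["target"]] = m["value"]
        elif m["kind"] == "context":
            node = context
            for part in m["path"].split("."):  # walk dotted context paths
                node = node[part]
            out[m["target"]] = node
    return out

record = apply_mappings({"speed_mph": 60.0}, {"tenant": {"id": "acme"}}, mappings)
```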

Pattern parameter suggestions

Transform patterns are powerful because they are reusable, but filling in their params by hand can still be tedious. AI can inspect a source schema and propose:

  • Deduplication keys.
  • Timestamp columns.
  • Numeric fields for anomaly detection.
  • Group-by dimensions.
  • Join keys.
  • Conversion source/target units.
  • Geospatial columns.

The result is a params object plus per-field reasoning.
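For example, a suggestion for a deduplication pattern might come back shaped like this (the field names are illustrative assumptions); one cheap sanity check before accepting is that every proposed param has a matching reason:

```python
suggestion = {
    "params": {
        "dedup_keys": ["vehicle_id", "dateObserved"],
        "timestamp_column": "dateObserved",
    },
    "reasoning": {
        "dedup_keys": "vehicle_id plus dateObserved uniquely identify a reading.",
        "timestamp_column": "dateObserved is the only timestamp-typed field.",
    },
}

def fully_explained(s):
    """True if every suggested param carries per-field reasoning."""
    return set(s["params"]) <= set(s["reasoning"])
```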

SQL review

SQL review is designed for fast iteration before hydration or publication. It checks:

  • Invalid references and type mismatches.
  • Expensive scans and risky joins.
  • Null handling.
  • Lossy casts.
  • Missing predicates.
  • Data quality risks.

The output is a list of issues with severity, message, suggestion, and line number when available.
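A sketch of triaging that output before acting on it; the severity names here are illustrative assumptions:

```python
issues = [
    {"severity": "warning", "message": "SELECT * scans all columns",
     "suggestion": "Project only the needed columns", "line": 1},
    {"severity": "error", "message": "Column spd does not exist",
     "suggestion": "Did you mean speed?", "line": 3},
]

SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}

def blocking(issues):
    """Error-level issues, which should block hydration or publication."""
    return [i for i in issues if i["severity"] == "error"]

# Most severe first, for display.
triaged = sorted(issues, key=lambda i: SEVERITY_ORDER.get(i["severity"], 99))
```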

Code generation and publication

When a catalog pattern is not enough, Frank can ask Martha to generate a bespoke Python transform pattern. The request includes:

  • Step description.
  • Source schema.
  • Target schema.
  • Capability tier.
  • Pattern name.
  • Optional pipeline step ID.

The generated files follow the frank-sdk contract: read TRANSFORM_CONFIG, process data, emit metrics/logs/lineage, and write a FrankResult to stdout.

For generated custom code, the publish path can create a PR against the transform pattern repository. CI can then build, test, and register the pattern back into Frank.
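As a rough sketch of the contract described above, a generated pattern might be shaped like the following; the config keys and the FrankResult fields shown are illustrative assumptions, not the real frank-sdk API:

```python
import json
import os
import sys

def build_result(config):
    """Process rows from the config and build a result payload."""
    rows = config.get("rows", [])                       # stand-in for real data access
    processed = [{**r, "processed": True} for r in rows]
    return {                                            # hypothetical FrankResult shape
        "status": "success",
        "metrics": {"rows_in": len(rows), "rows_out": len(processed)},
        "lineage": {"inputs": config.get("inputs", [])},
    }

if __name__ == "__main__":
    # Per the contract: read TRANSFORM_CONFIG from the environment,
    # process, and write the result to stdout.
    config = json.loads(os.environ.get("TRANSFORM_CONFIG", "{}"))
    sys.stdout.write(json.dumps(build_result(config)))
```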

Pipeline composition

Pipeline composition creates a draft DAG from intent:

```yaml
pipeline_name: customer_360
source_tables:
  - iceberg.bronze.crm.contacts
  - iceberg.bronze.billing.customers
target_description: Unified customer profile with billing and CRM attributes.
target_sdm_id: fiware:Customer/Customer
pipeline_context: Prefer reusable catalog patterns; avoid bespoke code unless needed.
```

CLI:

```bash
frankctl ai compose-pipeline -f customer-360.yaml --timeout 600
```

The response can include proposed steps, pattern IDs, suggested params, input/output columns, dependencies, confidence, and reasoning. The pipeline still goes through normal review, sandbox validation, and activation.
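Before that review, one cheap automated check on a drafted DAG is that the proposed step dependencies are acyclic. A minimal sketch, assuming each step carries an id and a depends_on list (illustrative field names):

```python
steps = [
    {"id": "join_sources", "depends_on": []},
    {"id": "dedupe", "depends_on": ["join_sources"]},
    {"id": "publish_profile", "depends_on": ["dedupe"]},
]

def topo_order(steps):
    """Return step ids in dependency order, or None if a cycle exists."""
    remaining = {s["id"]: set(s["depends_on"]) for s in steps}
    order = []
    while remaining:
        ready = [sid for sid, deps in remaining.items() if not deps]
        if not ready:
            return None  # cycle detected: no step can run next
        for sid in sorted(ready):
            order.append(sid)
            del remaining[sid]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order

print(topo_order(steps))  # ['dedupe' after 'join_sources', then 'publish_profile']
```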

Workflow ownership

Frank owns the workflow definitions in backend/services/martha_workflows.py and seeds them into Martha with:

```bash
python scripts/seed_martha_workflows.py --update-existing
```

That keeps the AI behavior versioned with the Frank product rather than hidden inside an external prompt store.

Frank is built by aiaiai-pt.