Schema Propagation

Schema propagation is TypeStream's compile-time pass that flows type information through the pipeline graph. It enables schema validation before execution, field-name display in the GUI, and type-safe pipeline configurations.

How it works

Source resolution: Source nodes look up their schema from Schema Registry via the virtual filesystem. For example, a topic encoded in Avro has its full field structure resolved at compile time.
Node-by-node inference: Each downstream node's inferOutputSchema() receives the upstream schema and produces an output schema:
- Pass-through nodes (Filter, Count, Inspector, Each, Group): Schema is unchanged
- Schema-modifying nodes: Transform the schema in specific ways
Validation: If a node references a field that doesn't exist in the input schema (e.g. a GeoIP node with ipField: "ip_address" when no such field exists), the error is caught at compile time -- before any Kafka Streams resources are allocated.

Schema-modifying nodes

Node	Schema Change
Join	Merges left and right schemas into a combined struct
GeoIp	Adds `outputField` (string)
TextExtractor	Adds `outputField` (string)
EmbeddingGenerator	Adds `outputField` (list of floats)
OpenAiTransformer	Adds `outputField` (string)

Encoding rules

The output encoding follows these rules:

If the output schema is the same type as the input (e.g. filtering doesn't change the schema), the encoding is preserved (Avro stays Avro).
If the output schema differs from the input (e.g. a join combines two schemas), the encoding defaults to JSON.
If the source encoding is not set, it defaults to Avro.

Examples

# Input: Avro. Output: Avro (schema unchanged by filter)
cat books | grep "Station" > station_books

# Input: Avro. Output: JSON (schema changed by cut)
cat books | cut title > book_titles

# Input: Avro + JSON. Output: JSON (mixed encodings)
join books ratings > book_ratings

Why this matters

Catch errors early: A misconfigured ipField or missing topic fails at plan/validate time, not after deployment.
GUI feedback: The graph builder shows field names on every edge as you build, so you can verify the pipeline is correct before submitting.
Documentation: Schema information is available programmatically, making it easy to understand what data flows through each stage of the pipeline.

How it works​

Schema-modifying nodes​

Encoding rules​

Examples​

Why this matters​

How it works

Schema-modifying nodes

Encoding rules

Examples

Why this matters