Add Semantic Search

This guide shows how to build a pipeline that generates vector embeddings from text and writes them to a Weaviate vector database, enabling semantic search over your streaming data.

Prerequisites

  • TypeStream installed and running
  • A Weaviate instance accessible from the TypeStream server
  • An OpenAI API key set as OPENAI_API_KEY on the server (for embedding generation)

Register a Weaviate connection

Before creating the pipeline, register your Weaviate instance with TypeStream. In the GUI, navigate to Connections > Weaviate and add your connection details (URL and optional API key).

The connection will appear as a sink option in the graph builder palette.
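
Before registering the connection, it can help to confirm that the Weaviate instance is reachable from the machine running the TypeStream server. The sketch below is not part of TypeStream; it polls Weaviate's readiness endpoint (/v1/.well-known/ready) with the Python standard library. The helper names readiness_url and is_weaviate_ready are made up for this example.

```python
import urllib.error
import urllib.request


def readiness_url(base_url: str) -> str:
    """Build the URL of Weaviate's readiness probe from the instance base URL."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_weaviate_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the readiness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(readiness_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response.
        return False
```

Run it on the TypeStream server with the same URL you plan to enter in the GUI, for example is_weaviate_ready("http://localhost:8080") (an assumed address; substitute your own).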

Build the pipeline

The full flow: read from a Kafka topic, generate embeddings from a text field, and write to Weaviate.

  1. Drag a Kafka Source and select your topic
  2. Drag an Embedding Generator node and connect it to the Kafka Source
    • Set textField to the field containing your text (e.g. title from wikipedia_changes)
    • Set outputField to embedding
    • Choose a model (e.g. text-embedding-3-small)
  3. Drag a Weaviate Sink from the palette (it appears under Vector Sinks after you register a connection) and connect it to the Embedding Generator
    • Set the collection name
    • Configure the document ID strategy and vector strategy
  4. Click Create Job
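
To make the Embedding Generator step concrete, here is a conceptual sketch of what it does to each record: read the text from textField, compute a vector, and attach it under outputField. This is an illustration only, not TypeStream's implementation. The function names are hypothetical, and toy_embed is a stand-in that returns deterministic numbers instead of calling a real model such as text-embedding-3-small.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Embedder = Callable[[str], List[float]]


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder: NOT a real model, just two deterministic numbers."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def add_embedding(
    record: Record, text_field: str, output_field: str, embed: Embedder
) -> Record:
    """Return a copy of the record with the embedding stored under output_field."""
    if text_field not in record:
        raise KeyError(f"record has no field {text_field!r}")
    enriched = dict(record)  # leave the input record untouched
    enriched[output_field] = embed(str(record[text_field]))
    return enriched


# Example record shaped like a wikipedia_changes event (fields are assumptions).
change = {"id": "42", "title": "Apache Kafka"}
enriched = add_embedding(change, "title", "embedding", toy_embed)
```

The enriched record keeps every original field and gains an embedding field holding a list of floats, which is what the Weaviate Sink then consumes.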

Weaviate sink configuration

Field                   Description
collection_name         Weaviate collection to write to
document_id_strategy    How to derive the document ID from records
vector_strategy         How to map the embedding field to the Weaviate vector
timestamp_field         Optional field to use as the record timestamp
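
This guide does not list the available values for document_id_strategy and vector_strategy, so the sketch below shows only one plausible interpretation: derive a deterministic UUID from a record field (Weaviate object IDs are UUIDs) and take the vector from the embedding field. The function name, the namespace URL, and the choice of fields are all assumptions made for illustration.

```python
import uuid
from typing import Any, Dict, List, Tuple

# Hypothetical namespace for this example; any fixed UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/typestream-demo")


def to_weaviate_object(
    record: Dict[str, Any], id_field: str, vector_field: str
) -> Tuple[str, List[float], Dict[str, Any]]:
    """Split a record into (object id, vector, properties)."""
    # Same id_field value -> same UUID, so a replayed record maps to the same object.
    doc_id = str(uuid.uuid5(NAMESPACE, str(record[id_field])))
    vector = [float(x) for x in record[vector_field]]
    # Everything except the vector is kept as object properties.
    properties = {k: v for k, v in record.items() if k != vector_field}
    return doc_id, vector, properties


doc_id, vector, properties = to_weaviate_object(
    {"id": "42", "title": "Apache Kafka", "embedding": [0.1, 0.2, 0.3]},
    id_field="id",
    vector_field="embedding",
)
```

A deterministic ID means that reprocessing a topic targets the same objects again rather than creating new ones; whether that results in an update or an error depends on how the sink writes.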

Schema behavior

The Embedding Generator adds outputField (type: list of floats) to the output schema. It validates at compile time that textField exists in the input schema.
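
The schema rule above can be sketched as a small function: reject the node if textField is missing from the input schema, otherwise return the input schema plus outputField. This is a simplified model, not TypeStream's compiler; the schema is represented as a plain mapping from field name to a type label, and the function name and type labels are hypothetical.

```python
from typing import Dict

Schema = Dict[str, str]  # field name -> type label (simplified for illustration)


def embedding_output_schema(
    input_schema: Schema, text_field: str, output_field: str
) -> Schema:
    """Validate text_field and return the schema the node would emit."""
    if text_field not in input_schema:
        # Mirrors a compile-time failure: the job is rejected before any data flows.
        raise ValueError(f"textField {text_field!r} not found in input schema")
    output_schema = dict(input_schema)
    output_schema[output_field] = "list<float>"
    return output_schema


schema = embedding_output_schema(
    {"id": "string", "title": "string"}, "title", "embedding"
)
```

Checking the field before the job starts means a misspelled textField surfaces when you create the job instead of as failed records at runtime.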
