ai.formatting_python_template provides a flexible way to structure the input for embedding models, letting you incorporate relevant metadata and additional text alongside each chunk. This can improve the quality and usefulness of the generated embeddings, especially when context from multiple fields matters for understanding or searching the content.
Its purpose is to:
- Define a template for formatting the data before embedding
- Allow the combination of multiple fields from the source table
- Add consistent context or structure to the text being embedded
- Customize the input for the embedding model to improve relevance and searchability
The $chunk variable contains the chunked text.
Samples
Default formatting
The default formatter uses the $chunk template, which outputs the chunk text as-is.
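As a rough illustration of this default, the sketch below applies a $chunk-only template with Python's standard string.Template class. That the substitution is performed exactly this way internally is an assumption, and the sample chunk text is made up.

```python
from string import Template

# Made-up chunk text, used only for illustration.
chunk = "PostgreSQL supports vector search through extensions."

# The default template consists of the $chunk placeholder alone.
default_template = Template("$chunk")

# Substituting the chunk into the template returns it unchanged.
formatted = default_template.substitute(chunk=chunk)
print(formatted == chunk)
```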
Add context from other columns
Add the title and publication date to each chunk, providing more context for the embedding.
Combine multiple fields
Prepend author and category information to each chunk.
Add consistent structure
Add start and end markers to each chunk, which could be useful for certain types of embeddings or retrieval tasks.
Arguments
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| template | TEXT | $chunk | ✔ | A string using Python template strings with $-prefixed variables that defines how the data should be formatted |
- The $chunk placeholder is required and represents the text chunk that will be embedded
- Other placeholders can be used to reference columns from the source table
- The template allows for adding static text or structuring the input in a specific way
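
The placeholder rules above can be tried out locally with Python's string.Template. This is a minimal sketch, not the library's implementation: the format_chunk helper, the column names (title, published, author, category), and the sample values are all hypothetical.

```python
from string import Template


def format_chunk(template: str, chunk: str, row: dict) -> str:
    """Hypothetical helper: fill $chunk and column placeholders in a template."""
    # substitute() raises KeyError if the template names a placeholder
    # that is missing from the supplied values; extra values are ignored.
    return Template(template).substitute(chunk=chunk, **row)


# Made-up source row and chunk, for illustration only.
row = {
    "title": "Vector search basics",
    "published": "2024-01-15",
    "author": "Ada",
    "category": "Databases",
}
chunk = "Embeddings map text to vectors."

# Add context from other columns.
with_context = format_chunk(
    "Title: $title\nDate: $published\nContent: $chunk", chunk, row
)

# Combine multiple fields in a prefix before the chunk.
with_author = format_chunk(
    "Author: $author\nCategory: $category\n$chunk", chunk, row
)

# Wrap the chunk in static start and end markers.
with_markers = format_chunk("BEGIN DOCUMENT\n$chunk\nEND DOCUMENT", chunk, row)

print(with_context)
```

Because substitute() fails on an unknown placeholder, a typo in a column name surfaces as an error rather than as a silently malformed embedding input.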
Returns
A JSON configuration object that you can use in ai.create_vectorizer.