Astro Intelligence

Multi-Modal API Reference

The multi-modal module provides content models, encoders, table parsers, and a converter for bridging multi-modal items with the text-based pipeline.

All classes are importable from anchor:

from anchor import (
    ModalityType, MultiModalContent, MultiModalItem, MultiModalConverter,
    TextEncoder, TableEncoder, ImageDescriptionEncoder, CompositeEncoder,
    MarkdownTableParser, HTMLTableParser,
)

ModalityType

Enum of supported content modalities.

class ModalityType(StrEnum):
    TEXT = "text"
    IMAGE = "image"
    TABLE = "table"
    CODE = "code"
    AUDIO = "audio"
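Because the members are string-valued, a modality can be recovered from a stored string and compared against plain strings. The sketch below redefines the enum locally as a stand-in (it is not imported from anchor) and uses the `str, Enum` mixin so that it also runs on Python versions without `StrEnum`.

```python
from enum import Enum


class ModalityType(str, Enum):
    """Local stand-in mirroring the documented members; not the anchor class."""

    TEXT = "text"
    IMAGE = "image"
    TABLE = "table"
    CODE = "code"
    AUDIO = "audio"


# Look up a member from its string value, e.g. when reading stored metadata.
parsed = ModalityType("table")

# String-mixin enum members compare equal to their raw string values.
is_table = parsed == "table"

# Iteration follows definition order.
all_values = [member.value for member in ModalityType]
```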

MultiModalContent

Represents a single piece of content in one modality, with optional raw binary data. Instances are frozen (immutable) after creation.

Constructor

class MultiModalContent(BaseModel):
    modality: ModalityType
    content: str
    raw_data: bytes | None = None
    mime_type: str | None = None
    metadata: dict[str, Any] = Field(default_factory=dict)

Fields

Field      Type            Default   Description
modality   ModalityType    required  Content type
content    str             required  Text representation or description
raw_data   bytes | None    None      Raw binary data (images, audio)
mime_type  str | None      None      MIME type for raw_data
metadata   dict[str, Any]  {}        Arbitrary metadata
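As an illustration of the field layout and the frozen behaviour, here is a minimal sketch. It uses a frozen standard-library dataclass as a stand-in so that it runs without anchor installed; the real class is a Pydantic model, and the exception it raises on assignment may differ from the one shown here.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class ModalityType(str, Enum):
    """Trimmed local stand-in for the documented enum."""

    TEXT = "text"
    IMAGE = "image"


@dataclass(frozen=True)
class MultiModalContent:
    """Stand-in with the documented fields; not the anchor implementation."""

    modality: ModalityType
    content: str
    raw_data: Optional[bytes] = None
    mime_type: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


# An image piece: `content` carries the text description, `raw_data` the bytes.
image = MultiModalContent(
    modality=ModalityType.IMAGE,
    content="Architecture diagram",
    raw_data=b"\x89PNG\r\n\x1a\n",  # PNG signature only, as placeholder bytes
    mime_type="image/png",
)

# Assigning to a field of a frozen instance is rejected.
try:
    image.content = "changed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```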

MultiModalItem

Groups multiple MultiModalContent pieces into a single retrievable unit. Frozen (immutable) after creation.

Constructor

class MultiModalItem(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    contents: list[MultiModalContent]
    source: SourceType
    score: float = Field(default=0.0, ge=0.0, le=1.0)
    priority: int = Field(default=5, ge=1, le=10)
    token_count: int = Field(default=0, ge=0)
    metadata: dict[str, Any] = Field(default_factory=dict)
    created_at: datetime = Field(default_factory=lambda: datetime.now(UTC))

Fields

Field        Type                     Default    Description
id           str                      auto UUID  Unique identifier
contents     list[MultiModalContent]  required   Content pieces
source       SourceType               required   Origin type (RETRIEVAL, MEMORY, etc.)
score        float                    0.0        Relevance score (0.0-1.0)
priority     int                      5          Priority level (1-10; lower value = higher priority)
token_count  int                      0          Estimated token count
metadata     dict[str, Any]           {}         Arbitrary metadata
created_at   datetime                 now (UTC)  Creation timestamp
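The numeric bounds declared in the constructor (`ge`/`le`) mean that out-of-range values are rejected at creation time. The sketch below mimics those bounds with a hypothetical `ItemSketch` dataclass; `contents` and `source` are omitted for brevity, and the real model reports violations through Pydantic validation rather than a plain `ValueError`.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ItemSketch:
    """Hypothetical stand-in for MultiModalItem (contents and source omitted)."""

    score: float = 0.0
    priority: int = 5
    token_count: int = 0
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # Mirror the documented bounds: 0.0 <= score <= 1.0, 1 <= priority <= 10.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0.0 and 1.0")
        if not 1 <= self.priority <= 10:
            raise ValueError("priority must be between 1 and 10")
        if self.token_count < 0:
            raise ValueError("token_count must be non-negative")


def is_rejected(**kwargs) -> bool:
    """Return True when the given field values fail validation."""
    try:
        ItemSketch(**kwargs)
    except ValueError:
        return True
    return False


ok = ItemSketch(score=0.8, priority=2, token_count=120)
```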

MultiModalConverter

Static utility class for converting between MultiModalItem and ContextItem.

Methods

to_context_item

@staticmethod
def to_context_item(item: MultiModalItem, encoder: ModalityEncoder) -> ContextItem

Convert a single MultiModalItem to a ContextItem. All content pieces are encoded to text and concatenated with double newlines. The resulting ContextItem has metadata["multimodal"] = True.
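The conversion described above can be sketched as follows. `Piece`, `ContextItemSketch`, and `to_context_item_sketch` are hypothetical stand-ins written for this example; the real `ContextItem` has fields that are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Piece:
    """Hypothetical stand-in for MultiModalContent (modality and text only)."""

    modality: str
    content: str


@dataclass(frozen=True)
class ContextItemSketch:
    """Hypothetical stand-in for ContextItem."""

    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_context_item_sketch(
    pieces: List[Piece], encode: Callable[[Piece], str]
) -> ContextItemSketch:
    # Encode every piece to text, then join the results with double newlines.
    text = "\n\n".join(encode(piece) for piece in pieces)
    # Flag the result so downstream code can tell where it came from.
    return ContextItemSketch(content=text, metadata={"multimodal": True})


pieces = [
    Piece("text", "Quarterly results"),
    Piece("table", "| Q | Revenue |\n|---|---|\n| Q1 | 10 |"),
]
result = to_context_item_sketch(pieces, encode=lambda piece: piece.content)
```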

to_context_items

@staticmethod
def to_context_items(
    items: list[MultiModalItem], encoder: ModalityEncoder
) -> list[ContextItem]

Batch-convert a list of MultiModalItem objects to ContextItem objects, using the same encoder for every item.

from_context_item

@staticmethod
def from_context_item(
    item: ContextItem, modality: ModalityType = ModalityType.TEXT
) -> MultiModalItem

Convert a ContextItem back to a MultiModalItem. Wraps the text content in a single MultiModalContent of the specified modality.
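One consequence of wrapping the text in a single piece is that the reverse conversion does not restore the original per-piece boundaries. A minimal sketch, using hypothetical stand-in types rather than the anchor classes:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ContextItemSketch:
    """Hypothetical stand-in for ContextItem (text only)."""

    content: str


@dataclass(frozen=True)
class ContentSketch:
    """Hypothetical stand-in for MultiModalContent."""

    modality: str
    content: str


@dataclass(frozen=True)
class ItemSketch:
    """Hypothetical stand-in for MultiModalItem."""

    contents: Tuple[ContentSketch, ...]


def from_context_item_sketch(
    item: ContextItemSketch, modality: str = "text"
) -> ItemSketch:
    # The whole text becomes one content piece of the requested modality.
    return ItemSketch(contents=(ContentSketch(modality, item.content),))


# Text that was originally a paragraph plus a table, already flattened.
flat = ContextItemSketch("Intro paragraph\n\n| a | b |\n|---|---|\n| 1 | 2 |")
restored = from_context_item_sketch(flat)
```

Here `restored.contents` holds exactly one piece, even though the text contains a paragraph and a table.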


TextEncoder

Pass-through encoder for text content. Returns the content field unchanged.

Constructor

class TextEncoder:
    def __init__(self) -> None

Methods

encode

def encode(self, content: MultiModalContent) -> str

Return the text content as-is.

Properties

Property              Type                Value
supported_modalities  list[ModalityType]  [ModalityType.TEXT]

TableEncoder

Converts table content to Markdown text. Pass-through if already Markdown.

Constructor

class TableEncoder:
    def __init__(self) -> None

Methods

encode

def encode(self, content: MultiModalContent) -> str

Return the table content (already in Markdown format).

Properties

Property              Type                Value
supported_modalities  list[ModalityType]  [ModalityType.TABLE]

ImageDescriptionEncoder

Encodes image content into text via an optional description callback.

Constructor

class ImageDescriptionEncoder:
    def __init__(self, describe_fn: Callable[[bytes], str] | None = None) -> None

Parameters

Parameter    Type                           Default  Description
describe_fn  Callable[[bytes], str] | None  None     Callback to generate text from image bytes

Methods

encode

def encode(self, content: MultiModalContent) -> str

Encode image content into text. Fallback order:

  1. describe_fn(raw_data), if both describe_fn and raw_data are available
  2. metadata["description"] if present and non-empty
  3. content field as last resort
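The fallback order above can be sketched as a plain function. `encode_image_sketch` and `fake_describe` are hypothetical names for this example, and treating "available" as "not None" is an assumption; the library may also check for empty bytes.

```python
from typing import Any, Callable, Dict, Optional


def encode_image_sketch(
    content: str,
    raw_data: Optional[bytes],
    metadata: Dict[str, Any],
    describe_fn: Optional[Callable[[bytes], str]] = None,
) -> str:
    # 1. Use the callback when both the callback and the raw bytes exist.
    if describe_fn is not None and raw_data is not None:
        return describe_fn(raw_data)
    # 2. Otherwise use a stored description, if present and non-empty.
    description = metadata.get("description")
    if description:
        return description
    # 3. Fall back to the content field.
    return content


def fake_describe(data: bytes) -> str:
    """Placeholder for a real captioning call."""
    return f"image of {len(data)} bytes"


from_callback = encode_image_sketch("alt", b"abcd", {"description": "d"}, fake_describe)
from_metadata = encode_image_sketch("alt", None, {"description": "a chart"}, fake_describe)
from_content = encode_image_sketch("alt", b"abcd", {"description": ""})
```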

Properties

Property              Type                Value
supported_modalities  list[ModalityType]  [ModalityType.IMAGE]

CompositeEncoder

Routes encoding to the appropriate sub-encoder based on modality type.

Constructor

class CompositeEncoder:
    def __init__(
        self,
        encoders: dict[ModalityType, TextEncoder | TableEncoder | ImageDescriptionEncoder] | None = None,
    ) -> None

Parameters

Parameter  Type         Default  Description
encoders   dict | None  None     Custom encoder mapping. When None, the default encoders for TEXT, TABLE, IMAGE, and CODE are used

Default encoders:

Modality  Encoder
TEXT      TextEncoder()
TABLE     TableEncoder()
IMAGE     ImageDescriptionEncoder()
CODE      TextEncoder()

Methods

encode

def encode(self, content: MultiModalContent) -> str

Encode by delegating to the sub-encoder registered for the content's modality. Raises ValueError if no encoder is registered for that modality (with the default mapping, this includes AUDIO).
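The routing behaviour can be illustrated with a small dictionary-based dispatcher. `CompositeSketch` is a hypothetical stand-in that maps modality strings to plain functions; the real class maps ModalityType members to encoder objects.

```python
from typing import Callable, Dict, List

EncoderFn = Callable[[str], str]


class CompositeSketch:
    """Hypothetical stand-in that routes by modality name."""

    def __init__(self, encoders: Dict[str, EncoderFn]) -> None:
        self._encoders = dict(encoders)

    @property
    def supported_modalities(self) -> List[str]:
        # Every modality that has a registered encoder.
        return list(self._encoders)

    def encode(self, modality: str, content: str) -> str:
        encoder = self._encoders.get(modality)
        if encoder is None:
            raise ValueError(f"no encoder registered for modality {modality!r}")
        return encoder(content)


def passthrough(text: str) -> str:
    return text


# Text and code share the same pass-through encoder, as in the defaults.
router = CompositeSketch({"text": passthrough, "code": passthrough})

encoded = router.encode("code", "print('hi')")

# A modality without a registered encoder is rejected.
try:
    router.encode("audio", "clip transcript")
    audio_rejected = False
except ValueError:
    audio_rejected = True
```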

Properties

Property              Type                Description
supported_modalities  list[ModalityType]  All registered modality types

MarkdownTableParser

Extracts tables from Markdown text using regular expressions.

Constructor

class MarkdownTableParser:
    def __init__(self) -> None

Methods

extract_tables

def extract_tables(self, source: Path | bytes) -> list[MultiModalContent]

Extract Markdown tables from a file path or raw bytes. Returns a list of MultiModalContent objects with modality=TABLE and metadata={"format": "markdown"}.
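To make the regex-based approach concrete, here is a minimal sketch. The pattern is an assumption chosen for this example (a header row, a separator row, then body rows, each starting and ending with a pipe) and is not the library's actual expression. UTF-8 decoding is also assumed, and the result is a list of plain dicts rather than MultiModalContent objects.

```python
import re
from pathlib import Path
from typing import Any, Dict, List, Union

# Assumed pattern, not the one used by anchor.
TABLE_RE = re.compile(
    r"^\|.*\|[ \t]*\n"             # header row
    r"\|[ \t:|-]+\|[ \t]*\n"       # separator row, e.g. |---|:--:|
    r"(?:\|.*\|[ \t]*(?:\n|$))*",  # zero or more body rows
    re.MULTILINE,
)


def extract_markdown_tables(source: Union[Path, bytes]) -> List[Dict[str, Any]]:
    # Accept either a file path or raw bytes, as the documented method does.
    raw = source.read_bytes() if isinstance(source, Path) else source
    text = raw.decode("utf-8")
    return [
        {
            "modality": "table",
            "content": match.group(0).strip(),
            "metadata": {"format": "markdown"},
        }
        for match in TABLE_RE.finditer(text)
    ]


sample = (
    b"Intro paragraph.\n"
    b"\n"
    b"| Name | Qty |\n"
    b"|------|-----|\n"
    b"| Bolt | 4 |\n"
    b"| Nut | 9 |\n"
    b"\n"
    b"Closing text.\n"
)
tables = extract_markdown_tables(sample)
```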


HTMLTableParser

Extracts <table> elements from HTML and converts them to Markdown.

Constructor

class HTMLTableParser:
    def __init__(self) -> None

Methods

extract_tables

def extract_tables(self, source: Path | bytes) -> list[MultiModalContent]

Extract HTML tables from a file path or raw bytes. Each table is converted to Markdown format. Returns a list of MultiModalContent objects with modality=TABLE and metadata={"format": "html", "original_format": "html"}.
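The HTML-to-Markdown step can be sketched with the standard-library `html.parser`. This is an illustration only: `TableCollector`, `rows_to_markdown`, and `html_tables_to_markdown` are hypothetical names, nested tables and cell spans are not handled, and the first row is assumed to be the header.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableCollector(HTMLParser):
    """Collects the cell text of each <table> as a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def rows_to_markdown(rows: List[List[str]]) -> str:
    # Treat the first row as the header and emit a separator row after it.
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def html_tables_to_markdown(html: str) -> List[str]:
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return [rows_to_markdown(rows) for rows in parser.tables if rows]


sample = (
    "<p>Stock</p><table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>Bolt</td><td>4</td></tr></table>"
)
markdown_tables = html_tables_to_markdown(sample)
```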

