anchor

Context engineering toolkit for AI applications
Context is the product. The LLM is just the consumer.

The Python toolkit for context engineering -- assemble RAG, memory, tools, and system prompts into a single, token-aware pipeline.


Why anchor?

Most AI frameworks focus on the LLM call. But the real challenge is assembling the right context -- the system prompt, conversation memory, retrieved documents, and tool outputs that the model actually sees.

anchor gives you a single, composable pipeline that manages all of it within a strict token budget. No duct-taping RAG, memory, and tools together. Build intelligent context pipelines in minutes.


Features

  • Hybrid RAG — Dense embeddings + BM25 sparse retrieval with Reciprocal Rank Fusion. Combine multiple retrieval strategies in a single pipeline for higher recall and precision. Retrieval guide →

  • Smart Memory — Token-aware sliding window with automatic eviction. Oldest turns are evicted when the conversation exceeds its budget — recent context is never lost. Memory guide →

  • Token Budgets — Priority-ranked assembly fills from highest-priority items down. Per-source allocations let you reserve tokens for system prompts, memory, retrieval, and responses independently. Token budgets →

  • Provider Agnostic — Anthropic, OpenAI, or plain text. Format the assembled context for any LLM provider with a single method call. Swap providers without changing your pipeline. Formatters guide →

  • Protocol-Based — Every extension point is defined as a PEP 544 structural protocol. Bring your own retriever, tokenizer, reranker, or memory store — no base classes required. Protocols →

  • Type-Safe — All models are frozen Pydantic v2 dataclasses with full py.typed support. Catch integration errors at type-check time, not at runtime. API reference →

  • Agent Framework — Built-in tool registration and skills, including memory and RAG skills that give your agent long-term recall. Compose agents from the same pipeline primitives. Agent guide →

  • Full Observability — Tracing, metrics, cost tracking, and native OTLP export. Know exactly what your pipeline is doing, how long it takes, and what it costs. Observability guide →
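
The Reciprocal Rank Fusion mentioned under Hybrid RAG can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard RRF formula (each document scores the sum of 1 / (k + rank) across the lists it appears in, with k = 60 by convention), not anchor's internal code:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists into one RRF-ordered list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents appearing high in multiple rankings accumulate
            # the largest fused scores.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # e.g. embedding similarity order
sparse = ["doc_b", "doc_d", "doc_a"]  # e.g. BM25 order
fused = rrf_fuse([dense, sparse])
# -> ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF only uses ranks, not raw scores, it combines retrievers whose scores live on incomparable scales — which is why it pairs well with mixing dense and BM25 results.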


Installation

pip

pip install astro-anchor

uv

uv add astro-anchor

Extras

pip install astro-anchor[bm25]   # BM25 sparse retrieval (rank-bm25)
pip install astro-anchor[cli]    # CLI tools (typer + rich)
pip install astro-anchor[all]    # Everything

30-Second Quickstart

Build your first context pipeline:

from anchor import ContextPipeline, MemoryManager, AnthropicFormatter

pipeline = (
    ContextPipeline(max_tokens=8192)
    .with_memory(MemoryManager(conversation_tokens=4096))
    .with_formatter(AnthropicFormatter())
    .add_system_prompt("You are a helpful assistant.")
)

result = pipeline.build("What is context engineering?")
print(result.formatted_output)   # Ready for the Anthropic API
print(result.diagnostics)        # Token usage, timing, overflow info

[!TIP] Plain strings just work: build() accepts either a plain str or a QueryBundle object. Plain strings are automatically wrapped in a QueryBundle for you.


How It Works

Every ContextItem carries a priority (1--10). When the total exceeds max_tokens, the pipeline fills from highest priority down. Items that do not fit are tracked in result.overflow_items.
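
A toy model of that fill-by-priority behavior (illustrative only — anchor's real ContextItem is a typed Pydantic model, and the names here are made up):

```python
def assemble(items: list[dict], max_tokens: int) -> tuple[list[dict], list[dict]]:
    """Fill the budget from highest priority down; return (kept, overflow)."""
    kept: list[dict] = []
    overflow: list[dict] = []
    used = 0
    for item in sorted(items, key=lambda i: i["priority"], reverse=True):
        if used + item["tokens"] <= max_tokens:
            kept.append(item)
            used += item["tokens"]
        else:
            overflow.append(item)  # tracked rather than silently dropped
    return kept, overflow

items = [
    {"name": "retrieved_doc", "priority": 5, "tokens": 500},
    {"name": "system_prompt", "priority": 10, "tokens": 200},
    {"name": "memory", "priority": 7, "tokens": 600},
]
kept, overflow = assemble(items, max_tokens=1000)
# system_prompt and memory fit (800 tokens); retrieved_doc overflows
```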


Comparison

| Feature | LangChain | LlamaIndex | mem0 | anchor |
| --- | --- | --- | --- | --- |
| Hybrid RAG (Dense + BM25 + RRF) | partial | yes | no | yes |
| Token-aware Memory | partial | no | yes | yes |
| Token Budget Management | no | no | no | yes |
| Provider-agnostic Formatting | no | no | no | yes |
| Protocol-based Plugins (PEP 544) | no | partial | no | yes |
| Zero-config Defaults | no | no | yes | yes |
| Built-in Agent Framework | yes | yes | no | yes |
| Native Observability (OTLP) | partial | partial | no | yes |

Token Budgets

For fine-grained control over how tokens are allocated across sources, use the preset budget factories:

from anchor import ContextPipeline, default_chat_budget

budget = default_chat_budget(max_tokens=8192)
pipeline = ContextPipeline(max_tokens=8192).with_budget(budget)

Three presets are available:

| Preset | Best for | Conversation | Retrieval | Response |
| --- | --- | --- | --- | --- |
| default_chat_budget | Conversational apps | 60% | 15% | 15% |
| default_rag_budget | RAG-heavy apps | 25% | 40% | 15% |
| default_agent_budget | Agentic apps | 30% | 25% | 15% |

[!NOTE] Each budget automatically reserves 15% of tokens for the LLM response. Per-source overflow strategies ("truncate" or "drop") control what happens when a source exceeds its cap.
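
As a back-of-the-envelope check, here is what the default_chat_budget split works out to at 8192 tokens. This assumes each percentage applies directly to max_tokens and treats the remaining ~10% as unallocated headroom (e.g. for the system prompt) — an assumption for illustration, not the presets' exact accounting:

```python
max_tokens = 8192
# default_chat_budget percentages from the table above
shares = {"conversation": 0.60, "retrieval": 0.15, "response": 0.15}
allocation = {name: int(max_tokens * share) for name, share in shares.items()}
# {'conversation': 4915, 'retrieval': 1228, 'response': 1228}
```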


Decorator API

Register pipeline steps with decorators instead of factory functions:

from anchor import ContextPipeline, ContextItem, QueryBundle

pipeline = ContextPipeline(max_tokens=8192)

@pipeline.step
def boost_recent(items: list[ContextItem], query: QueryBundle) -> list[ContextItem]:
    """Boost the score of recent items."""
    return [
        item.model_copy(update={"score": min(1.0, item.score * 1.5)})
        if item.metadata.get("recent")
        else item
        for item in items
    ]

result = pipeline.build("What is context engineering?")

[!TIP] Use @pipeline.async_step for async functions and call abuild() instead of build().


Next Steps

  • Getting Started — Installation, first pipeline, and all the basics.
  • Core Concepts — Context engineering, architecture, protocols, and token budgets.
  • Guides — Pipeline, retrieval, memory, agents, observability, and more.
  • API Reference — Full API documentation for every module.
