# anchor

Context engineering toolkit for AI applications

> Context is the product. The LLM is just the consumer.

The Python toolkit for context engineering -- assemble RAG, memory, tools, and system prompts into a single, token-aware pipeline.
## Why anchor?
Most AI frameworks focus on the LLM call. But the real challenge is assembling the right context -- the system prompt, conversation memory, retrieved documents, and tool outputs that the model actually sees.
anchor gives you a single, composable pipeline that manages all of it within a strict token budget. No duct-taping RAG, memory, and tools together. Build intelligent context pipelines in minutes.
## Features
- **Hybrid RAG** — Dense embeddings + BM25 sparse retrieval with Reciprocal Rank Fusion. Combine multiple retrieval strategies in a single pipeline for higher recall and precision. Retrieval guide →
- **Smart Memory** — Token-aware sliding window with automatic eviction. Oldest turns are evicted when the conversation exceeds its budget — recent context is never lost. Memory guide →
- **Token Budgets** — Priority-ranked assembly fills from highest-priority items down. Per-source allocations let you reserve tokens for system prompts, memory, retrieval, and responses independently. Token budgets →
- **Provider Agnostic** — Anthropic, OpenAI, or plain text. Format the assembled context for any LLM provider with a single method call. Swap providers without changing your pipeline. Formatters guide →
- **Protocol-Based** — Every extension point is defined as a PEP 544 structural protocol. Bring your own retriever, tokenizer, reranker, or memory store — no base classes required. Protocols →
- **Type-Safe** — All models are frozen Pydantic v2 dataclasses with full `py.typed` support. Catch integration errors at type-check time, not at runtime. API reference →
- **Agent Framework** — Built-in tool registration, skills, and memory+RAG skills that give your agent long-term recall. Compose agents from the same pipeline primitives. Agent guide →
- **Full Observability** — Tracing, metrics, cost tracking, and native OTLP export. Know exactly what your pipeline is doing, how long it takes, and what it costs. Observability guide →
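The Reciprocal Rank Fusion behind Hybrid RAG is easy to see in miniature. Here is a minimal sketch in plain Python — illustrative only, not anchor's internals; `k = 60` is the constant conventionally used for RRF:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists: score(doc) = sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # e.g. embedding-based ranking
sparse = ["b", "d", "a"]  # e.g. BM25 ranking
print(reciprocal_rank_fusion([dense, sparse]))
```

Because each document's fused score sums `1 / (k + rank)` across lists, documents that rank well in *several* retrievers beat documents that top only one — which is why the combination tends to improve both recall and precision.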
## Installation

**pip**

```bash
pip install astro-anchor
```

**uv**

```bash
uv add astro-anchor
```

**Extras**

```bash
pip install astro-anchor[bm25]  # BM25 sparse retrieval (rank-bm25)
pip install astro-anchor[cli]   # CLI tools (typer + rich)
pip install astro-anchor[all]   # Everything
```

## 30-Second Quickstart

Build your first context pipeline:
```python
from anchor import ContextPipeline, MemoryManager, AnthropicFormatter

pipeline = (
    ContextPipeline(max_tokens=8192)
    .with_memory(MemoryManager(conversation_tokens=4096))
    .with_formatter(AnthropicFormatter())
    .add_system_prompt("You are a helpful assistant.")
)

result = pipeline.build("What is context engineering?")
print(result.formatted_output)  # Ready for the Anthropic API
print(result.diagnostics)       # Token usage, timing, overflow info
```

> [!TIP]
> **Plain strings just work.** `build()` accepts either a plain `str` or a `QueryBundle` object. Plain strings are automatically wrapped in a `QueryBundle` for you.
## How It Works

Every `ContextItem` carries a priority (1--10). When the total exceeds
`max_tokens`, the pipeline fills from highest priority down. Items that do not
fit are tracked in `result.overflow_items`.
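The fill algorithm amounts to a greedy pass over items sorted by priority. A minimal sketch — `Item` and `assemble` are illustrative stand-ins, not anchor's actual `ContextItem` or internals:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """Stand-in for ContextItem: hypothetical text/priority/tokens fields."""
    text: str
    priority: int  # 1 (low) .. 10 (high)
    tokens: int

def assemble(items: list[Item], max_tokens: int) -> tuple[list[Item], list[Item]]:
    """Greedily include items from highest priority down; the rest overflow."""
    included: list[Item] = []
    overflow: list[Item] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        if used + item.tokens <= max_tokens:
            included.append(item)
            used += item.tokens
        else:
            overflow.append(item)
    return included, overflow

items = [Item("system", 10, 50), Item("memory", 7, 100), Item("doc", 5, 200)]
kept, dropped = assemble(items, max_tokens=200)
```

With a 200-token budget, the priority-10 system prompt and priority-7 memory fit (150 tokens used), and the 200-token document lands in the overflow list rather than silently truncating higher-priority content.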
## Comparison
| Feature | LangChain | LlamaIndex | mem0 | anchor |
|---|---|---|---|---|
| Hybrid RAG (Dense + BM25 + RRF) | partial | yes | no | yes |
| Token-aware Memory | partial | no | yes | yes |
| Token Budget Management | no | no | no | yes |
| Provider-agnostic Formatting | no | no | no | yes |
| Protocol-based Plugins (PEP 544) | no | partial | no | yes |
| Zero-config Defaults | no | no | yes | yes |
| Built-in Agent Framework | yes | yes | no | yes |
| Native Observability (OTLP) | partial | partial | no | yes |
## Token Budgets

For fine-grained control over how tokens are allocated across sources, use the preset budget factories:

```python
from anchor import ContextPipeline, default_chat_budget

budget = default_chat_budget(max_tokens=8192)
pipeline = ContextPipeline(max_tokens=8192).with_budget(budget)
```

Three presets are available:
| Preset | Best for | Conversation | Retrieval | Response |
|---|---|---|---|---|
| `default_chat_budget` | Conversational apps | 60% | 15% | 15% |
| `default_rag_budget` | RAG-heavy apps | 25% | 40% | 15% |
| `default_agent_budget` | Agentic apps | 30% | 25% | 15% |
> [!NOTE]
> Each budget automatically reserves 15% of tokens for the LLM response. Per-source overflow strategies (`"truncate"` or `"drop"`) control what happens when a source exceeds its cap.
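The two strategies behave roughly like this — an illustrative sketch operating on a pre-tokenized source; `apply_overflow` is a hypothetical helper, not part of anchor's API:

```python
def apply_overflow(tokens: list[str], cap: int, strategy: str) -> list[str]:
    """Per-source overflow handling for a list of tokens against its cap."""
    if len(tokens) <= cap:
        return tokens          # within budget: keep everything
    if strategy == "truncate":
        return tokens[:cap]    # keep the first `cap` tokens
    if strategy == "drop":
        return []              # discard the whole source
    raise ValueError(f"unknown overflow strategy: {strategy!r}")
```

`"truncate"` suits sources where a prefix is still useful (conversation history, long documents); `"drop"` suits sources that are meaningless when cut (a tool result, a structured record).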
## Decorator API

Register pipeline steps with decorators instead of factory functions:

```python
from anchor import ContextPipeline, ContextItem, QueryBundle

pipeline = ContextPipeline(max_tokens=8192)

@pipeline.step
def boost_recent(items: list[ContextItem], query: QueryBundle) -> list[ContextItem]:
    """Boost the score of recent items."""
    return [
        item.model_copy(update={"score": min(1.0, item.score * 1.5)})
        if item.metadata.get("recent")
        else item
        for item in items
    ]

result = pipeline.build("What is context engineering?")
```

> [!TIP]
> Use `@pipeline.async_step` for async functions and call `abuild()` instead of `build()`.
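The decorator-registration pattern itself is worth seeing in isolation. A minimal sketch — `MiniPipeline` is hypothetical and only stands in for how a `step` decorator can collect functions; it is not anchor's implementation:

```python
from typing import Callable

Step = Callable[[list[str]], list[str]]

class MiniPipeline:
    """Illustrative decorator-based step registry (not anchor's internals)."""

    def __init__(self) -> None:
        self._steps: list[Step] = []

    def step(self, fn: Step) -> Step:
        self._steps.append(fn)  # registration happens at decoration time
        return fn               # return fn unchanged so the name stays callable

    def run(self, items: list[str]) -> list[str]:
        for fn in self._steps:  # steps run in registration order
            items = fn(items)
        return items

p = MiniPipeline()

@p.step
def upper(items: list[str]) -> list[str]:
    return [s.upper() for s in items]

@p.step
def dedupe(items: list[str]) -> list[str]:
    return list(dict.fromkeys(items))  # order-preserving de-duplication
```

Because the decorator returns the function unchanged, each step remains an ordinary, independently testable function — the same property that makes `@pipeline.step` handlers easy to unit-test.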
## Next Steps
- Getting Started — Installation, first pipeline, and all the basics.
- Core Concepts — Context engineering, architecture, protocols, and token budgets.
- Guides — Pipeline, retrieval, memory, agents, observability, and more.
- API Reference — Full API documentation for every module.