[!NOTE] What is anchor?
anchor is a Python library for context engineering -- the discipline
of building systems that provide the right information to language models at
the right time. It provides pipelines for assembling context from multiple
sources (retrieval, memory, system prompts) and formatting it for any LLM
provider.
[!NOTE] How is anchor different from LangChain or LlamaIndex?
anchor focuses specifically on the context assembly layer rather than
being a full application framework. It is lightweight, protocol-based (no
heavy inheritance hierarchies), and gives you fine-grained control over token
budgets, priority-based overflow, and provider-specific formatting. You can
use it alongside LangChain or LlamaIndex if you want.
[!NOTE] Is anchor production-ready?
Yes. It is designed for production use with built-in error handling policies
(on_error="skip"), observability callbacks (TracingCallback, CostTracker),
comprehensive diagnostics, and thorough test coverage. See the
Production Patterns cookbook for deployment
guidance.
[!NOTE] Does anchor require an API key?
No. The core library (pipelines, memory, retrieval, formatting) works entirely
locally with no API calls. An API key is only needed if you use the Agent
class (which calls the Anthropic API) or if you use an embedding function that
calls an external service.
[!NOTE] Can I install anchor in a Jupyter notebook?
Yes. Run !pip install astro-anchor in a cell. All examples in the
documentation are designed to work in notebooks.
[!NOTE] What is the difference between build() and abuild()?
build() is synchronous and runs all pipeline steps sequentially.
abuild() is asynchronous and can run async steps concurrently. Use
abuild() when your pipeline includes async retrievers or when you want
to parallelize I/O-bound operations.
"raise": stop the pipeline and propagate the exception
"skip": log the error, continue with items from previous steps
"empty": log the error, continue with an empty item list
[!NOTE] Can I add custom steps to the pipeline?
Yes. Use the decorator API or pass a callable to add_step():
@pipeline.step(name="my-custom-step")def boost_recent(items, query): """Boost items from the last 24 hours.""" for item in items: if is_recent(item): item = item.model_copy(update={"score": item.score * 1.5}) return items
[!NOTE] What order do pipeline steps run in?
Steps run in the order they are added. Items flow from one step to the
next. System prompts and memory items are injected after all steps run,
during the final context window assembly.
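The flow described above amounts to folding the item list through each step in turn; a plain-Python sketch (toy step names are illustrative):

```python
def run_pipeline(steps, items, query):
    """Items flow through the steps in the order they were added."""
    for step in steps:
        items = step(items, query)
    return items

# Two toy steps: filter out low scores, then sort by score.
def drop_low(items, query):
    return [i for i in items if i["score"] > 0.2]

def by_score(items, query):
    return sorted(items, key=lambda i: i["score"], reverse=True)

docs = [
    {"id": "a", "score": 0.9},
    {"id": "b", "score": 0.1},
    {"id": "c", "score": 0.5},
]
result = run_pipeline([drop_low, by_score], docs, query="q")
```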
[!NOTE] Can I mix sync and async steps?
Yes. When using abuild(), sync steps are called directly and async
steps are awaited. Both work seamlessly in the same pipeline.
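The dispatch mechanism can be sketched in plain Python (not anchor's actual implementation) using inspect to detect coroutine functions:

```python
import asyncio
import inspect

async def run_step(step, items):
    """Await async steps; call sync steps directly."""
    if inspect.iscoroutinefunction(step):
        return await step(items)
    return step(items)

def sync_upper(items):
    return [s.upper() for s in items]

async def async_tag(items):
    await asyncio.sleep(0)  # stands in for real I/O
    return [s + "!" for s in items]

async def main():
    items = ["a", "b"]
    for step in (sync_upper, async_tag):
        items = await run_step(step, items)
    return items

result = asyncio.run(main())
```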
[!NOTE] What eviction strategies are available?
anchor provides several eviction strategies for SlidingWindowMemory:
- FIFO (default): removes the oldest turns first
- ImportanceEviction: keeps high-importance turns longer based on a scoring function
- SummaryEviction: summarizes evicted turns into a condensed fact before removing them
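The default FIFO behavior can be sketched in plain Python (illustrative, not anchor's SlidingWindowMemory itself):

```python
from collections import deque

class FIFOWindow:
    """Sliding window that evicts the oldest turns once over capacity."""

    def __init__(self, max_turns):
        self.turns = deque()
        self.max_turns = max_turns

    def add(self, turn):
        self.turns.append(turn)
        evicted = []
        while len(self.turns) > self.max_turns:
            evicted.append(self.turns.popleft())  # oldest first
        return evicted

window = FIFOWindow(max_turns=2)
evicted = []
for turn in ["t1", "t2", "t3"]:
    evicted += window.add(turn)
```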
[!NOTE] How do I persist memory across sessions?
Use a persistent MemoryEntryStore instead of InMemoryEntryStore:
```python
from anchor import MemoryManager, JsonFileMemoryStore

memory = MemoryManager(
    conversation_tokens=4096,
    persistent_store=JsonFileMemoryStore("memory.json"),
)
```
Conversation turns are in-memory only (by design, as they are ephemeral).
Persistent facts stored via add_fact() are saved to the store.
[!NOTE] How does fact deduplication work?
When you call memory.add_fact(), the library checks for semantic duplicates
using text similarity. If a substantially similar fact already exists, the new
fact is silently dropped. This prevents the fact store from filling up with
redundant information.
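A minimal sketch of text-similarity deduplication, using the standard library's difflib (anchor's actual similarity measure may differ):

```python
from difflib import SequenceMatcher

def is_duplicate(new_fact, existing, threshold=0.9):
    """True if any stored fact is substantially similar to the new one."""
    return any(
        SequenceMatcher(None, new_fact.lower(), fact.lower()).ratio() >= threshold
        for fact in existing
    )

facts = ["The user prefers dark mode."]
```

A near-identical restatement is caught, while a genuinely new fact passes the check.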
[!NOTE] What is graph memory?
Graph memory stores relationships between entities (people, places, concepts)
as a knowledge graph. It enables queries like "What does the user know about
Project X?" by traversing entity relationships. See the memory management
guide for details.
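The idea can be sketched with (subject, relation, object) triples and a bounded traversal; this is a plain-Python illustration, not anchor's graph memory API:

```python
# A tiny knowledge graph as (subject, relation, object) triples.
triples = [
    ("alice", "works_on", "project_x"),
    ("project_x", "uses", "postgres"),
    ("alice", "knows", "bob"),
]

def neighbors(entity):
    """Facts directly attached to an entity, in either direction."""
    return [t for t in triples if t[0] == entity or t[2] == entity]

def about(entity, depth=2):
    """Collect facts reachable within `depth` hops of an entity."""
    seen, frontier, facts = {entity}, [entity], []
    for _ in range(depth):
        nxt = []
        for e in frontier:
            for s, r, o in neighbors(e):
                if (s, r, o) not in facts:
                    facts.append((s, r, o))
                for other in (s, o):
                    if other not in seen:
                        seen.add(other)
                        nxt.append(other)
        frontier = nxt
    return facts
```

A two-hop query from "alice" surfaces the postgres fact even though it is not directly attached to her.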
[!NOTE] How does BM25 sparse retrieval work?
BM25 (Best Matching 25) is a keyword-based ranking algorithm that scores
documents by term frequency and inverse document frequency. It excels at
exact keyword matching where dense embeddings might miss specific terms.
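The scoring formula can be shown in a few lines of plain Python (a teaching sketch; in practice anchor delegates to a BM25 library):

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score tokenized docs against query terms with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            tf = doc.count(t)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "sparse retrieval uses exact keyword matching".split(),
    "dense retrieval uses embedding similarity".split(),
]
scores = bm25_scores("keyword matching".split(), docs)
```

The document containing the exact query terms scores highest; documents without them score zero.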
[!NOTE] How do I choose hybrid retrieval weights?
Start with 70% dense / 30% sparse and tune using evaluation metrics. Dense
retrieval is better for semantic similarity, while sparse retrieval is better
for exact keyword matching. Use the Evaluation Workflow
to find optimal weights for your dataset.
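Weighted fusion of the two score lists can be sketched as follows (plain Python; anchor's fusion strategy may normalize differently):

```python
def hybrid_scores(dense, sparse, dense_weight=0.7):
    """Blend normalized dense and sparse scores per document id."""
    def normalize(scores):
        top = max(scores.values()) or 1.0
        return {doc: s / top for doc, s in scores.items()}
    d, s = normalize(dense), normalize(sparse)
    ids = set(d) | set(s)
    return {
        doc: dense_weight * d.get(doc, 0.0) + (1 - dense_weight) * s.get(doc, 0.0)
        for doc in ids
    }

fused = hybrid_scores(
    dense={"a": 0.9, "b": 0.7},
    sparse={"b": 12.0, "c": 8.0},
)
```

Normalizing per retriever matters because dense (cosine) and sparse (BM25) scores live on very different scales.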
[!NOTE] Can I use an external vector database?
Yes. Implement the VectorStore protocol with your database client:
```python
class PineconeVectorStore:
    def __init__(self, index):
        self.index = index

    def add(self, ids, embeddings, metadata=None):
        metadata = metadata or [{}] * len(ids)
        self.index.upsert(vectors=list(zip(ids, embeddings, metadata)))

    def query(self, embedding, top_k=10):
        return self.index.query(vector=embedding, top_k=top_k)
```
[!NOTE] How do I choose the right token budget preset?
Match the preset to your model's context window, leaving headroom for the
model's response:
| Model | Context Window | Recommended Budget |
|---|---|---|
| Claude Haiku | 200K | MEDIUM (16K) to LARGE (32K) |
| Claude Sonnet | 200K | LARGE (32K) to XL (64K) |
| GPT-4o | 128K | MEDIUM (16K) to LARGE (32K) |
| GPT-4o-mini | 128K | SMALL (4K) to MEDIUM (16K) |
Start smaller and increase only if retrieval quality suffers.
[!NOTE] What happens when the token budget overflows?
By default, the lowest-priority items are dropped until the context fits
within the budget. Priority order (highest to lowest):
- System prompts (priority 10)
- Persistent facts (priority 8)
- Conversation turns (priority 7)
- Retrieved documents (priority 5)
You can change the overflow policy to "truncate_oldest" or "error".
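The default drop-lowest-priority behavior can be sketched in plain Python (illustrative only; kind names and token counts are made up):

```python
PRIORITY = {"system": 10, "fact": 8, "turn": 7, "document": 5}

def fit_to_budget(items, max_tokens):
    """Drop the lowest-priority items until the context fits the budget."""
    kept = sorted(items, key=lambda i: PRIORITY[i["kind"]], reverse=True)
    while kept and sum(i["tokens"] for i in kept) > max_tokens:
        kept.pop()  # lowest priority is last after the sort
    return kept

items = [
    {"kind": "system", "tokens": 50},
    {"kind": "document", "tokens": 400},
    {"kind": "turn", "tokens": 200},
    {"kind": "fact", "tokens": 100},
]
kept = fit_to_budget(items, max_tokens=500)
```

Here the 400-token retrieved document is dropped first, while system prompts, facts, and turns survive.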
[!NOTE] Can I set a custom token budget for each source?
Yes. Use TokenBudgetConfig for fine-grained allocation:
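The exact TokenBudgetConfig fields are not shown here; as a plain-Python sketch of the per-source allocation idea (all names and numbers below are hypothetical):

```python
def allocate(items_by_source, budgets):
    """Cap each source's items at that source's own token budget."""
    kept = {}
    for source, items in items_by_source.items():
        budget, total, chosen = budgets.get(source, 0), 0, []
        for item in items:  # assumed pre-sorted by relevance
            if total + item["tokens"] <= budget:
                chosen.append(item)
                total += item["tokens"]
        kept[source] = chosen
    return kept

kept = allocate(
    {
        "retrieval": [{"id": "r1", "tokens": 300}, {"id": "r2", "tokens": 300}],
        "memory": [{"id": "m1", "tokens": 150}],
    },
    budgets={"retrieval": 500, "memory": 200},
)
```

Per-source caps prevent one noisy source (say, retrieval) from crowding out memory or system content.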
[!NOTE] How do I switch between LLM providers?
Swap the formatter on the pipeline:
```python
from anchor import (
    AnthropicFormatter,
    OpenAIFormatter,
    GenericTextFormatter,
)

# For Anthropic Claude
pipeline.with_formatter(AnthropicFormatter())

# For OpenAI GPT
pipeline.with_formatter(OpenAIFormatter())

# For any provider (plain text)
pipeline.with_formatter(GenericTextFormatter())
```
[!NOTE] Can I create a custom formatter?
Yes. Implement the Formatter protocol:
```python
from anchor import Formatter, ContextWindow

class MyFormatter:
    def format(self, window: ContextWindow) -> dict:
        # Transform the context window into your provider's format
        return {
            "system": window.system_text(),
            "context": window.retrieval_text(),
            "history": window.conversation_items(),
        }
```
[!NOTE] What does the formatted output look like for Anthropic?
AnthropicFormatter produces a dict with system and messages keys that
can be passed directly to anthropic.messages.create():
```python
result = pipeline.build(query)
formatted = result.formatted_output
# formatted = {
#     "system": [{"type": "text", "text": "You are a helpful assistant."}],
#     "messages": [
#         {"role": "user", "content": "Hello"},
#         {"role": "assistant", "content": "Hi there!"},
#     ],
# }
```
[!NOTE] I'm getting ModuleNotFoundError: No module named 'rank_bm25'
Install the BM25 extra:
```shell
pip install astro-anchor[bm25]
```
[!NOTE] I'm getting TokenBudgetExceeded errors
This happens when your overflow policy is set to "error" and the assembled
context exceeds the token budget. Solutions:
- Increase max_tokens
- Reduce top_k on your retriever steps
- Switch to overflow_policy="truncate_lowest_priority" (the default)
[!NOTE] My pipeline is slow -- how do I debug it?
Check the step-by-step timing in diagnostics:
```python
result = pipeline.build(query)
for step in result.diagnostics.get("steps", []):
    print(f"{step['name']}: {step['time_ms']:.1f} ms")
```
Common bottlenecks:
- Embedding computation: use a faster model or cache embeddings
- Reranking: reduce the number of candidates passed to the reranker
- External API calls: use async steps with abuild()
[!NOTE] Retrieved results seem irrelevant -- how do I improve quality?
Try these steps in order:
1. Check your embedding function: ensure it is producing meaningful vectors
2. Adjust chunk sizes: smaller chunks (128-256 tokens) often improve precision
3. Add a reranker: CrossEncoderReranker significantly improves relevance
4. Try hybrid retrieval: combine dense and sparse search for better coverage
5. Use evaluation metrics: run the Evaluation Workflow to measure and compare configurations
[!NOTE] How do I enable debug logging?
Set the log level for the anchor logger:
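Assuming anchor logs through the standard library logging module under the "anchor" logger name (a reasonable convention, but verify against your version):

```python
import logging

# Show debug output from the library's logger hierarchy.
logging.basicConfig(format="%(name)s %(levelname)s %(message)s")
logging.getLogger("anchor").setLevel(logging.DEBUG)
```

Child loggers (e.g. anchor.pipeline) inherit this level unless they set their own.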