anchor uses Python Protocols (PEP 544) to define all extension
points. This page explains what protocols are, why anchor chose them
over class inheritance, and how to implement your own.
A Protocol is a way to declare an interface in Python using structural
subtyping -- also known as "static duck typing." A class satisfies a
protocol if it has the right methods with the right signatures. No base
class or registration is needed.
Any object with a retrieve(query, top_k) method satisfies the Retriever
protocol -- even if it has never seen the protocol definition.
[!NOTE] PEP 544
Protocols were introduced in Python 3.8 via PEP 544. They are part of
the typing module and are fully supported by mypy, pyright, and other
type checkers.
Traditional frameworks use abstract base classes (ABCs) to define
interfaces. This creates problems:
Concern
Inheritance (ABC)
Protocol
Must import base class
yes
no
Must call super().__init__()
often
never
Works with third-party classes
no
yes
Runtime isinstance() checks
yes
yes (@runtime_checkable)
IDE autocompletion
yes
yes
Type checker validation
yes
yes
With protocols, you can wrap any existing object -- a Pinecone client, a
custom database class, a test stub -- without modifying its inheritance
chain. If it has the right methods, it works.
Many protocols come in sync/async pairs. The async variant uses a different
method name (prefixed with a) to avoid ambiguity:
Sync
Async
Sync Method
Async Method
Retriever
AsyncRetriever
retrieve()
aretrieve()
Reranker
AsyncReranker
rerank()
arerank()
PostProcessor
AsyncPostProcessor
process()
aprocess()
QueryTransformer
AsyncQueryTransformer
transform()
atransform()
Use the sync variant with pipeline.build() and the async variant with
pipeline.abuild().
[!CAUTION] Sync steps in async pipelines
abuild() can run both sync and async steps -- sync functions are
called directly. But build() cannot run async steps and will raise
TypeError if it encounters one.