
omnicache-ai

Unified multi-layer caching for AI Agent pipelines.
Drop it in front of any LLM call, embedding, retrieval query, or agent workflow
to eliminate redundant API calls and cut latency and cost.





Why omnicache-ai?

Every AI agent pipeline makes the same expensive calls repeatedly:

| Without caching | With omnicache-ai |
|---|---|
| Every LLM call billed at full token cost | Identical prompts returned instantly, zero tokens |
| Embeddings re-computed on every request | Vectors stored and reused across sessions |
| Vector search re-run for same queries | Retrieval results cached by query + top-k |
| Agent state lost between runs | Session context persisted across turns |
| Semantically identical questions treated as new | Cosine similarity match returns cached answer |

vs. GPTCache / LiteLLM / redis-vl

GPTCache (8k ⭐) was the closest open-source competitor — but it has been effectively unmaintained since 2024. omnicache-ai fills that gap with a far more complete, modern, and actively maintained solution.

Feature | omnicache-ai | GPTCache | LiteLLM | redis-vl
13 framework adapters31 (gateway)
Adaptive semantic threshold
Streaming cache
Tiered backend (L1 + L2)
Stampede protection
Provider prompt cache trackingpartial
Qdrant + Weaviate backends
Prometheus + OTEL export
Multi-tenant namespacing
Actively maintained (2026)❌ dead

LiteLLM is a gateway/proxy (different problem domain). redis-vl is Redis-only with no framework adapters. GPTCache is the direct predecessor — omnicache-ai is what GPTCache should have become.


Key Features

Cache Layers

| Layer | Class | What it caches | Serialization |
|---|---|---|---|
| LLM Response | ResponseCache | Model output keyed by model + messages + params | pluggable |
| Embeddings | EmbeddingCache | np.ndarray vectors keyed by model + text | np.tobytes() |
| Retrieval | RetrievalCache | Document lists keyed by query + retriever + top-k | pluggable |
| Context/Session | ContextCache | Conversation turns keyed by session ID + turn index | pluggable |
| Semantic | SemanticCache | Answers reused for semantically similar queries (cosine ≥ threshold) | pluggable |
| Adaptive Semantic | AdaptiveSemanticCache | SemanticCache + auto-tuning threshold + multi-turn guard | pluggable |
| Streaming | StreamingResponseCache | Buffers streamed LLM chunks; replays from cache as generator | pluggable |
| Prompt Cache | PromptCacheLayer | Injects Anthropic cache_control; tracks provider cache savings | |

Storage Backends

| Backend | Class | Extras | Best For |
|---|---|---|---|
| In-Memory (LRU) | InMemoryBackend | (core) | Dev, testing, single-process |
| Async In-Memory | AsyncInMemoryBackend | (core) | FastAPI, async frameworks |
| Disk | DiskBackend | (core) | Persistent, single-machine |
| Redis | RedisBackend | [redis] | Shared across processes / services |
| Tiered (L1 + L2) | TieredBackend | (core) | Memory speed + Redis persistence |
| FAISS | FAISSBackend | [vector-faiss] | High-speed in-process vector search |
| ChromaDB | ChromaBackend | [vector-chroma] | Persistent vector store + metadata |
| Qdrant | QdrantBackend | [vector-qdrant] | Fastest production vector DB (22ms p95) |
| Weaviate | WeaviateBackend | [vector-weaviate] | Native hybrid search (vector + BM25) |

Framework Adapters

| Framework | Class | Hook Point | Async |
|---|---|---|---|
| OpenAI SDK | OpenAICacheAdapter | client.chat.completions.create | achat_create |
| Anthropic SDK | AnthropicCacheAdapter | client.messages.create | amessages_create |
| Google ADK | GoogleADKCacheAdapter | Agent.run() / run_async() | arun |
| OpenAI Agents SDK | OpenAIAgentsCacheAdapter | Runner.run() / run_sync() | arun |
| LlamaIndex LLM | LlamaIndexLLMCacheAdapter | complete() / chat() / async variants | acomplete / achat |
| LlamaIndex QueryEngine | LlamaIndexQueryCacheAdapter | query() / aquery() | aquery |
| Claude Agent SDK | ClaudeAgentCacheAdapter | claude_code_sdk.query() async generator | ✅ (async generator) |
| LangChain ≥ 0.2 | LangChainCacheAdapter | BaseCache: lookup / update | alookup / aupdate |
| LangGraph ≥ 0.1 / 1.x | LangGraphCacheAdapter | BaseCheckpointSaver: get_tuple / put / list | aget_tuple / aput / alist |
| AutoGen ≥ 0.4 | AutoGenCacheAdapter | AssistantAgent.run() / arun() | arun |
| AutoGen 0.2.x | AutoGenCacheAdapter | ConversableAgent.generate_reply() | |
| CrewAI ≥ 0.28 | CrewAICacheAdapter | Crew.kickoff() | kickoff_async |
| Agno ≥ 0.1 | AgnoCacheAdapter | Agent.run() / arun() | arun |
| A2A ≥ 0.2 | A2ACacheAdapter | process() / wrap() decorator | aprocess |

Middleware

| Class | Wraps | Async |
|---|---|---|
| LLMMiddleware | Any sync LLM callable | |
| AsyncLLMMiddleware | Any async LLM callable | |
| EmbeddingMiddleware | Any sync/async embed function | |
| RetrieverMiddleware | Any sync/async retriever | |

Core Engine

| Component | Class | Description |
|---|---|---|
| Orchestrator | CacheManager | Central hub: wires backend, key builder, TTL policy, invalidation |
| Key Builder | CacheKeyBuilder | namespace:type:sha256[:16] canonical keys |
| Metrics | CacheMetrics | Hit/miss/eviction counters + provider cache hits + cost saved |
| Serializer | Serializer | Pluggable encode/decode: PickleSerializer (default), JsonSerializer |
| Compressor | Compressor | Optional compression: GzipCompressor, NoopCompressor (default) |
| Stampede Shield | StampedeShield | Per-key threading.Lock prevents concurrent duplicate LLM calls |
| Request Config | RequestConfig | Per-request TTL / threshold / skip_cache overrides |
| Cache Warmer | CacheWarmer | Bulk pre-populate from query lists or CSV files |
| TTL Policy | TTLPolicy | Global + per-layer TTL overrides |
| Eviction | EvictionPolicy | LRU / TTL-only strategies, wired into InMemoryBackend |
| Invalidation | InvalidationEngine | Tag-based bulk eviction |
| Multi-Tenant | CacheManager.for_tenant(id) | Scoped manager with per-tenant key namespacing, shared backend |
| Settings | OmnicacheSettings | Dataclass + from_env() for 12-factor config |
| Prometheus | PrometheusExporter | /metrics HTTP endpoint; requires [observability] |
| OpenTelemetry | OpenTelemetryExporter | Push metrics to OTEL collector; requires [observability] |
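
The `namespace:type:sha256[:16]` key format listed for CacheKeyBuilder can be sketched with the standard library alone. This is an illustration, not the library's implementation: the helper name `build_key`, the JSON canonicalization step, and the payload fields are assumptions made for the example.

```python
import hashlib
import json


def build_key(namespace: str, kind: str, payload: dict) -> str:
    """Hypothetical helper: namespace:type:<first 16 hex chars of SHA-256>."""
    # Sort keys so logically equal payloads serialize to the same bytes.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{namespace}:{kind}:{digest}"


key_a = build_key("myapp", "response", {"model": "gpt-4o", "temperature": 0})
key_b = build_key("myapp", "response", {"temperature": 0, "model": "gpt-4o"})
print(key_a == key_b)  # same payload in a different order gives the same key
```

Canonicalizing before hashing is what makes the key stable across call sites that build the same request in a different field order.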

AI Agent Pipeline Architecture

Where Cache Layers Sit in a Full AI Pipeline

    flowchart TD
        User(["👤 User / Application"])
        User -->|query| Adapters

        subgraph Adapters["🔌 Framework Adapters (13)"]
            direction LR
            OAI["OpenAI SDK"]
            ANT["Anthropic SDK"]
            GADK["Google ADK"]
            OAIA["OpenAI Agents SDK"]
            LLI["LlamaIndex"]
            CLA["Claude Agent SDK"]
            LC["LangChain"]
            LG["LangGraph"]
            AG["AutoGen"]
            CR["CrewAI"]
            AN["Agno"]
            A2["A2A"]
        end

        Adapters -->|intercepted call| MW

        subgraph MW["⚙️ Middleware"]
            direction LR
            LLM_MW["LLMMiddleware"]
            EMB_MW["EmbeddingMiddleware"]
            RET_MW["RetrieverMiddleware"]
        end

        MW -->|cache lookup| Layers

        subgraph Layers["🗂️ Cache Layers (omnicache-ai)"]
            direction TB
            ASC["AdaptiveSemanticCache\n(auto-tuning threshold)"]
            SC["SemanticCache\n(cosine similarity)"]
            RC["ResponseCache\n(LLM output, stampede-safe)"]
            SRC["StreamingResponseCache\n(buffered stream replay)"]
            EC["EmbeddingCache\n(np.ndarray)"]
            REC["RetrievalCache\n(documents)"]
            CC["ContextCache\n(session turns)"]
            PC["PromptCacheLayer\n(provider cache_control)"]
        end

        Layers -->|hit → return| User
        Layers -->|miss → forward| Core

        subgraph Core["🧠 Core Engine"]
            direction LR
            CM["CacheManager\n+ for_tenant()"]
            KB["CacheKeyBuilder\nnamespace:type:sha256"]
            MT["CacheMetrics\nhit_rate · cost_saved"]
            IE["InvalidationEngine\ntag-based eviction"]
            TP["TTLPolicy\nper-layer TTLs"]
            SS["StampedeShield\nper-key lock"]
            OBS["Observability\nPrometheus · OTEL"]
        end

        Core <-->|read / write| Backends

        subgraph Backends["💾 Storage Backends (9)"]
            direction LR
            MEM["InMemoryBackend\n(LRU)"]
            AMEM["AsyncInMemoryBackend\n(asyncio)"]
            TIER["TieredBackend\n(L1 + L2)"]
            DISK["DiskBackend\n(diskcache)"]
            REDIS["RedisBackend\n[redis]"]
            FAISS["FAISSBackend\n[vector-faiss]"]
            CHROMA["ChromaBackend\n[vector-chroma]"]
            QDRANT["QdrantBackend\n[vector-qdrant]"]
            WEAV["WeaviateBackend\n[vector-weaviate]"]
        end

        Core -->|miss| LLM_CALL

        subgraph LLM_CALL["🤖 Actual AI Work (on cache miss only)"]
            direction LR
            LLM["LLM API\ngpt-4o / claude / gemini"]
            EMB["Embedder\ntext-embedding-3"]
            VDB["Vector DB\npinecone / weaviate"]
            TOOLS["Tools / APIs"]
        end

        LLM_CALL -->|result| Core
        Core -->|store + return| User

        style Layers fill:#1e3a5f,color:#fff,stroke:#3b82f6
        style Backends fill:#1a3326,color:#fff,stroke:#22c55e
        style Adapters fill:#3b1f5e,color:#fff,stroke:#a855f7
        style MW fill:#3b2a0f,color:#fff,stroke:#f59e0b
        style Core fill:#1e2a3b,color:#fff,stroke:#64748b
        style LLM_CALL fill:#3b1a1a,color:#fff,stroke:#ef4444

Cache Layer Responsibilities in the Pipeline

    flowchart LR
        Q(["Query"])
        Q --> S0

        subgraph S0["① Adaptive Semantic Layer"]
            ASC["AdaptiveSemanticCache\nauto-tuning threshold\n+ multi-turn guard"]
        end
        S0 -->|miss| S1

        subgraph S1["② Semantic Layer"]
            SC["SemanticCache\ncosine similarity ≥ threshold\n→ skip everything below"]
        end
        S1 -->|miss| S2

        subgraph S2["③ Response Layer"]
            RC["ResponseCache\nexact model+msgs+params\nhash match · stampede-safe"]
        end
        S2 -->|miss| S2b

        subgraph S2b["④ Streaming Layer"]
            SRC["StreamingResponseCache\nbuffered stream replay\nfor streaming LLMs"]
        end
        S2b -->|miss| S3

        subgraph S3["⑤ Retrieval Layer"]
            REC["RetrievalCache\nquery + retriever + top-k\nhash match"]
        end
        S3 -->|miss| S4

        subgraph S4["⑥ Embedding Layer"]
            EC["EmbeddingCache\nmodel + text hash match\nreturns np.ndarray"]
        end
        S4 -->|miss| S5

        subgraph S5["⑦ Context Layer"]
            CC["ContextCache\nsession_id + turn_index\nreturns message history"]
        end
        S5 -->|all miss| API["🤖 LLM / API Call\n+ PromptCacheLayer\n(provider cache_control)"]

        API -->|result| Store["Store in all\nrelevant layers\n+ CacheMetrics update"]
        Store --> R(["Response"])

        S0 -->|hit ⚡| R
        S1 -->|hit ⚡| R
        S2 -->|hit ⚡| R
        S2b -->|hit ⚡| R
        S3 -->|hit ⚡| R
        S4 -->|hit ⚡| R
        S5 -->|hit ⚡| R

        style S0 fill:#1a3a2a,color:#fff,stroke:#10b981
        style S1 fill:#4c1d95,color:#fff,stroke:#7c3aed
        style S2 fill:#1e3a5f,color:#fff,stroke:#3b82f6
        style S2b fill:#1e2a4a,color:#fff,stroke:#60a5fa
        style S3 fill:#14532d,color:#fff,stroke:#22c55e
        style S4 fill:#713f12,color:#fff,stroke:#f59e0b
        style S5 fill:#7f1d1d,color:#fff,stroke:#ef4444

Backend Selection by Use Case

    flowchart TD
        Start(["Which backend?"])
        Start --> Q1{"Multiple processes\nor services?"}

        Q1 -->|Yes| Q1b{"Also need\nfast local reads?"}
        Q1b -->|Yes| TIERED["TieredBackend\nL1=InMemory + L2=Redis\n[core]"]
        Q1b -->|No| REDIS["RedisBackend\n[redis]"]

        Q1 -->|No| Q2{"Need vector\nsimilarity?"}
        Q2 -->|Yes| Q3{"Need hybrid\nsearch (vector+BM25)?"}
        Q3 -->|Yes| WEAV["WeaviateBackend\n[vector-weaviate]"]
        Q3 -->|No| Q3b{"Production scale\n(10M+ vectors)?"}
        Q3b -->|Yes| QDRANT["QdrantBackend\n22ms p95 · [vector-qdrant]"]
        Q3b -->|No| Q3c{"Persist\nto disk?"}
        Q3c -->|Yes| CHROMA["ChromaBackend\n[vector-chroma]"]
        Q3c -->|No| FAISS["FAISSBackend\n[vector-faiss]"]

        Q2 -->|No| Q4{"Async framework\n(FastAPI / LangGraph)?"}
        Q4 -->|Yes| AMEM["AsyncInMemoryBackend\n[core]"]
        Q4 -->|No| Q5{"Survive\nrestarts?"}
        Q5 -->|Yes| DISK["DiskBackend\n[core]"]
        Q5 -->|No| MEM["InMemoryBackend\n[core]"]

        style REDIS fill:#dc2626,color:#fff
        style TIERED fill:#b45309,color:#fff
        style FAISS fill:#2563eb,color:#fff
        style CHROMA fill:#7c3aed,color:#fff
        style QDRANT fill:#0e7490,color:#fff
        style WEAV fill:#0f766e,color:#fff
        style DISK fill:#d97706,color:#fff
        style MEM fill:#059669,color:#fff
        style AMEM fill:#047857,color:#fff

Installation

Requirements

  • Python ≥ 3.12
  • Core dependencies: diskcache, numpy (installed automatically)

pip (PyPI)

    # Minimal: in-memory + disk backends
    pip install omnicache-ai

    # ── Framework adapters ──────────────────────────────────────────────
    pip install 'omnicache-ai[openai]'        # OpenAI SDK adapter
    pip install 'omnicache-ai[anthropic]'     # Anthropic SDK adapter
    pip install 'omnicache-ai[google-adk]'    # Google ADK adapter
    pip install openai-agents                 # OpenAI Agents SDK adapter
    pip install 'omnicache-ai[llamaindex]'    # LlamaIndex LLM + QueryEngine adapters
    pip install claude-code-sdk               # Claude Agent SDK adapter
    pip install 'omnicache-ai[langchain]'     # LangChain ≥ 0.2
    pip install 'omnicache-ai[langgraph]'     # LangGraph ≥ 0.1 / 1.x
    pip install 'omnicache-ai[autogen]'       # AutoGen legacy (pyautogen 0.2.x)
    pip install 'autogen-agentchat>=0.4'      # AutoGen new API (separate package)
    pip install 'omnicache-ai[crewai]'        # CrewAI ≥ 0.28 / 1.x
    pip install 'omnicache-ai[agno]'          # Agno ≥ 0.1 / 2.x
    pip install 'a2a-sdk>=0.3' omnicache-ai   # A2A SDK ≥ 0.2

    # ── Storage backends ────────────────────────────────────────────────
    pip install 'omnicache-ai[redis]'             # Redis
    pip install 'omnicache-ai[vector-faiss]'      # FAISS vector search
    pip install 'omnicache-ai[vector-chroma]'     # ChromaDB vector store
    pip install 'omnicache-ai[vector-qdrant]'     # Qdrant (22ms p95, fastest)
    pip install 'omnicache-ai[vector-weaviate]'   # Weaviate hybrid search

    # ── Observability ────────────────────────────────────────────────────
    pip install 'omnicache-ai[observability]'     # Prometheus + OpenTelemetry exporters

    # ── Common combos ───────────────────────────────────────────────────
    pip install 'omnicache-ai[langchain,redis]'
    pip install 'omnicache-ai[langgraph,vector-qdrant]'

    # ── Everything ──────────────────────────────────────────────────────
    pip install 'omnicache-ai[all]'

uv

    uv add omnicache-ai
    uv add 'omnicache-ai[langchain,redis]'
    uv add 'omnicache-ai[all]'

From source

    git clone https://github.com/ashishpatel26/omnicache-ai.git
    cd omnicache-ai
    uv sync --dev     # installs all dev + core deps
    uv run pytest     # verify install

Verify

python -c "import omnicache_ai; print(omnicache_ai.__version__)" # 0.3.0

Environment variable configuration

| Variable | Default | Values |
|---|---|---|
| OMNICACHE_BACKEND | memory | memory · disk · redis |
| OMNICACHE_REDIS_URL | redis://localhost:6379/0 | Any Redis URL |
| OMNICACHE_DISK_PATH | /tmp/omnicache | Any writable path |
| OMNICACHE_DEFAULT_TTL | 3600 | Seconds; 0 = no expiry |
| OMNICACHE_NAMESPACE | omnicache | Key prefix string |
| OMNICACHE_SEMANTIC_THRESHOLD | 0.95 | Float 0–1 |
| OMNICACHE_TTL_EMBEDDING | 86400 | Per-layer override |
| OMNICACHE_TTL_RETRIEVAL | 3600 | Per-layer override |
| OMNICACHE_TTL_CONTEXT | 1800 | Per-layer override |
| OMNICACHE_TTL_RESPONSE | 600 | Per-layer override |

    export OMNICACHE_BACKEND=redis
    export OMNICACHE_REDIS_URL=redis://localhost:6379/0
    export OMNICACHE_DEFAULT_TTL=3600

    from omnicache_ai import CacheManager, OmnicacheSettings

    manager = CacheManager.from_settings(OmnicacheSettings.from_env())
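
To make the 12-factor pattern concrete, here is a rough sketch of how a settings dataclass might read the variables from the table above. `SketchSettings` is a hypothetical stand-in written for this example; it is not `OmnicacheSettings`, and the parsing rules (plain `int` / `float` casts) are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass
class SketchSettings:
    """Hypothetical settings object; defaults mirror the documented table."""
    backend: str = "memory"
    redis_url: str = "redis://localhost:6379/0"
    default_ttl: int = 3600
    namespace: str = "omnicache"
    semantic_threshold: float = 0.95

    @classmethod
    def from_env(cls, env=None) -> "SketchSettings":
        # Accept an explicit mapping so the function is easy to test.
        env = os.environ if env is None else env
        d = cls()
        return cls(
            backend=env.get("OMNICACHE_BACKEND", d.backend),
            redis_url=env.get("OMNICACHE_REDIS_URL", d.redis_url),
            default_ttl=int(env.get("OMNICACHE_DEFAULT_TTL", d.default_ttl)),
            namespace=env.get("OMNICACHE_NAMESPACE", d.namespace),
            semantic_threshold=float(
                env.get("OMNICACHE_SEMANTIC_THRESHOLD", d.semantic_threshold)
            ),
        )


settings = SketchSettings.from_env(
    {"OMNICACHE_BACKEND": "redis", "OMNICACHE_DEFAULT_TTL": "600"}
)
print(settings.backend, settings.default_ttl, settings.namespace)
```

Unset variables fall back to the dataclass defaults, so a deployment only needs to export the values it wants to change.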

Quick Start

    from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder

    manager = CacheManager(
        backend=InMemoryBackend(),
        key_builder=CacheKeyBuilder(namespace="myapp"),
    )

    manager.set("my_key", {"result": 42}, ttl=60)
    value = manager.get("my_key")  # {"result": 42}

LangChain in 3 lines

    from langchain_core.globals import set_llm_cache
    from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder
    from omnicache_ai.adapters.langchain_adapter import LangChainCacheAdapter

    set_llm_cache(LangChainCacheAdapter(CacheManager(backend=InMemoryBackend(), key_builder=CacheKeyBuilder())))
    # Every ChatOpenAI / ChatAnthropic call is now cached automatically

Cache Layers

LLM Response Cache

Cache the string or dict output of any LLM call, keyed by model + messages + params.

    from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder, ResponseCache

    manager = CacheManager(backend=InMemoryBackend(), key_builder=CacheKeyBuilder(namespace="myapp"))
    cache = ResponseCache(manager)

    messages = [{"role": "user", "content": "What is 2+2?"}]
    cache.set(messages, "4", model_id="gpt-4o")
    answer = cache.get(messages, model_id="gpt-4o")  # "4"

    # get_or_generate: calls generator only on cache miss
    def call_llm(msgs):
        return openai_client.chat.completions.create(...).choices[0].message.content

    answer = cache.get_or_generate(messages, call_llm, model_id="gpt-4o")

Embedding Cache

    from omnicache_ai import EmbeddingCache

    emb_cache = EmbeddingCache(manager)
    vec = emb_cache.get_or_compute(
        text="Hello world",
        compute_fn=lambda t: embed_model.encode(t),
        model_id="text-embedding-3-small",
    )

Retrieval Cache

    from omnicache_ai import RetrievalCache

    ret_cache = RetrievalCache(manager)
    docs = ret_cache.get_or_retrieve(
        query="What is RAG?",
        retrieve_fn=lambda q: vectorstore.similarity_search(q, k=5),
        retriever_id="my-vectorstore",
        top_k=5,
    )

Context / Session Cache

    from omnicache_ai import ContextCache

    ctx_cache = ContextCache(manager)
    ctx_cache.set(session_id="user-123", turn_index=0, messages=[...])
    history = ctx_cache.get(session_id="user-123", turn_index=0)
    ctx_cache.invalidate_session("user-123")  # clear all turns for this session

Semantic Cache

Returns a cached answer for semantically similar queries (cosine ≥ threshold). Requires pip install 'omnicache-ai[vector-faiss]'.

    from omnicache_ai import SemanticCache
    from omnicache_ai.backends.memory_backend import InMemoryBackend
    from omnicache_ai.backends.vector_backend import FAISSBackend

    sem_cache = SemanticCache(
        exact_backend=InMemoryBackend(),
        vector_backend=FAISSBackend(dim=1536),
        embed_fn=lambda text: embed_model.encode(text),  # returns np.ndarray
        threshold=0.95,
    )

    sem_cache.set("What is the capital of France?", "Paris")
    sem_cache.get("What is the capital of France?")        # "Paris" (exact hit)
    sem_cache.get("Which city is the capital of France?")  # "Paris" (semantic hit)
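
The "exact match first, then cosine ≥ threshold" lookup can be illustrated without any vector library. The sketch below is a simplified model of the idea, not SemanticCache itself: `SketchSemanticCache`, the linear scan, and the hand-written toy vectors (standing in for a real embedding model) are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SketchSemanticCache:
    """Hypothetical two-step lookup: exact text match, then nearest vector."""

    def __init__(self, embed_fn, threshold=0.95):
        self.embed_fn = embed_fn
        self.threshold = threshold
        self.exact = {}    # query text -> answer
        self.vectors = []  # (embedding, answer) pairs, scanned linearly

    def set(self, query, answer):
        self.exact[query] = answer
        self.vectors.append((self.embed_fn(query), answer))

    def get(self, query):
        if query in self.exact:
            return self.exact[query]
        if not self.vectors:
            return None
        q = self.embed_fn(query)
        score, answer = max(
            ((cosine(q, v), a) for v, a in self.vectors), key=lambda p: p[0]
        )
        return answer if score >= self.threshold else None


# Toy embeddings keyed by text, standing in for a real embedding model.
toy_vectors = {
    "What is the capital of France?": [1.0, 0.0, 0.2],
    "Which city is the capital of France?": [0.98, 0.05, 0.2],
    "How do I bake bread?": [0.0, 1.0, 0.1],
}
cache = SketchSemanticCache(embed_fn=toy_vectors.__getitem__, threshold=0.95)
cache.set("What is the capital of France?", "Paris")
print(cache.get("Which city is the capital of France?"))  # close vector: hit
print(cache.get("How do I bake bread?"))                  # distant vector: miss
```

A higher threshold trades hit rate for safety: fewer paraphrases match, but unrelated questions are less likely to receive a cached answer.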

Middleware (Decorator Pattern)

Wrap any sync/async LLM callable without changing its signature.

    from omnicache_ai import LLMMiddleware, CacheKeyBuilder, ResponseCache

    # Assumes `manager` is the CacheManager created in the earlier examples
    key_builder = CacheKeyBuilder(namespace="myapp")
    response_cache = ResponseCache(manager)
    middleware = LLMMiddleware(response_cache, key_builder, model_id="gpt-4o")

    @middleware
    def call_llm(messages: list[dict]) -> str:
        return openai_client.chat.completions.create(...).choices[0].message.content

    wrapped = middleware(call_llm)  # or wrap an existing callable

    from omnicache_ai import AsyncLLMMiddleware

    @AsyncLLMMiddleware(response_cache, key_builder, model_id="gpt-4o")
    async def call_llm_async(messages):
        return await async_client.chat(messages)

EmbeddingMiddleware and RetrieverMiddleware follow the same pattern.


Framework Adapters

LangChain

    from langchain_core.globals import set_llm_cache
    from langchain_openai import ChatOpenAI
    from omnicache_ai.adapters.langchain_adapter import LangChainCacheAdapter

    set_llm_cache(LangChainCacheAdapter(manager))

    llm = ChatOpenAI(model="gpt-4o")
    response = llm.invoke("What is 2+2?")  # cached on second call

LangGraph

Compatible with langgraph 0.x (≥ 0.1) and 1.x; the adapter auto-detects the API version.

    from langgraph.graph import StateGraph
    from omnicache_ai.adapters.langgraph_adapter import LangGraphCacheAdapter

    saver = LangGraphCacheAdapter(manager)
    graph = StateGraph(MyState).compile(checkpointer=saver)
    result = graph.invoke({"messages": [...]}, config={"configurable": {"thread_id": "t1"}})

AutoGen

    # autogen-agentchat 0.4+ (new API)
    from autogen_agentchat.agents import AssistantAgent
    from omnicache_ai.adapters.autogen_adapter import AutoGenCacheAdapter

    agent = AssistantAgent("assistant", model_client=...)
    cached = AutoGenCacheAdapter(agent, manager)
    result = await cached.arun("What is 2+2?")

    # pyautogen 0.2.x (legacy)
    from autogen import ConversableAgent

    agent = ConversableAgent(name="assistant", llm_config={...})
    cached = AutoGenCacheAdapter(agent, manager)
    reply = cached.generate_reply(messages=[{"role": "user", "content": "Hi"}])

CrewAI

    from crewai import Crew
    from omnicache_ai.adapters.crewai_adapter import CrewAICacheAdapter

    crew = Crew(agents=[...], tasks=[...])
    cached_crew = CrewAICacheAdapter(crew, manager)

    result = cached_crew.kickoff(inputs={"topic": "AI trends"})
    result = await cached_crew.kickoff_async(inputs={"topic": "AI trends"})

Agno

    from agno.agent import Agent
    from omnicache_ai.adapters.agno_adapter import AgnoCacheAdapter

    agent = Agent(model=..., tools=[...])
    cached = AgnoCacheAdapter(agent, manager)

    response = cached.run("Summarize the latest AI research")
    response = await cached.arun("Summarize the latest AI research")

Google ADK

    from google.adk.agents import Agent
    from omnicache_ai.adapters.google_adk_adapter import GoogleADKCacheAdapter

    agent = Agent(name="research_agent", model="gemini-2.0-flash", instruction="...")
    cached = GoogleADKCacheAdapter(agent, manager)

    result = cached.run("Summarise the quarterly report")  # live call
    result = cached.run("Summarise the quarterly report")  # instant from cache
    result = await cached.arun("Async task")

OpenAI Agents SDK

    from agents import Agent, Runner
    from omnicache_ai.adapters.openai_agents_adapter import OpenAIAgentsCacheAdapter

    agent = Agent(name="assistant", instructions="Be concise", model="gpt-4o")
    adapter = OpenAIAgentsCacheAdapter(manager)

    result = adapter.run(agent, "What is RAG?")
    result = await adapter.arun(agent, "What is RAG?")  # async

LlamaIndex

    from llama_index.llms.openai import OpenAI
    from omnicache_ai.adapters.llamaindex_adapter import (
        LlamaIndexLLMCacheAdapter,
        LlamaIndexQueryCacheAdapter,
    )

    # LLM cache
    cached_llm = LlamaIndexLLMCacheAdapter(OpenAI(model="gpt-4o"), manager)
    response = cached_llm.complete("What is vector search?")

    # QueryEngine (RAG) cache
    engine = index.as_query_engine()
    cached_engine = LlamaIndexQueryCacheAdapter(engine, manager)
    response = cached_engine.query("What are the key findings?")

Claude Agent SDK

    from omnicache_ai.adapters.claude_agent_adapter import ClaudeAgentCacheAdapter

    adapter = ClaudeAgentCacheAdapter(manager)

    async for msg in adapter.query("Fix the import error in utils.py", options=options):
        print(msg)  # streams on first call, replays from cache on second

OpenAI SDK

    import openai
    from omnicache_ai.adapters.openai_adapter import OpenAICacheAdapter

    client = openai.OpenAI()
    adapter = OpenAICacheAdapter(client, manager)

    response = adapter.chat_create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    # Second call with identical args returns instantly from cache

    # Async
    client = openai.AsyncOpenAI()
    adapter = OpenAICacheAdapter(client, manager)
    response = await adapter.achat_create(model="gpt-4o", messages=[...])

Anthropic SDK

    import anthropic
    from omnicache_ai.adapters.anthropic_adapter import AnthropicCacheAdapter

    client = anthropic.Anthropic()
    adapter = AnthropicCacheAdapter(client, manager)

    response = adapter.messages_create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )

    # Async
    client = anthropic.AsyncAnthropic()
    adapter = AnthropicCacheAdapter(client, manager)
    response = await adapter.amessages_create(model="claude-sonnet-4-6", ...)

A2A (Agent-to-Agent)

    from omnicache_ai.adapters.a2a_adapter import A2ACacheAdapter

    adapter = A2ACacheAdapter(manager, agent_id="planner")

    # Explicit call
    result = adapter.process(handler_fn, task_payload)
    result = await adapter.aprocess(async_handler, task_payload)

    # As a decorator
    @adapter.wrap
    def handle_task(payload: dict) -> dict:
        return downstream_agent.process(payload)

v0.2.0 Features

Cache Metrics

    manager = CacheManager(backend=InMemoryBackend(), key_builder=CacheKeyBuilder())

    manager.get("key")         # miss
    manager.set("key", "val")
    manager.get("key")         # hit

    print(manager.metrics.snapshot())
    # {'hits': 1, 'misses': 1, 'evictions': 0, 'sets': 1, 'hit_rate': 0.5, 'miss_rate': 0.5}

Pluggable Serializer

    from omnicache_ai import JsonSerializer, ResponseCache

    # Use JSON instead of pickle (safer for Redis with untrusted data)
    cache = ResponseCache(manager, serializer=JsonSerializer())

Stampede Protection

ResponseCache.get_or_generate() uses a per-key lock automatically — under concurrency, only one thread calls the LLM; others wait and read from cache.
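
The per-key lock idea can be sketched with `threading` alone. This is a generic double-checked locking pattern written for illustration; `SketchStampedeCache` and its internals are assumptions, not the StampedeShield source.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SketchStampedeCache:
    """Hypothetical cache where only one thread per key runs the generator."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._locks_guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_generate(self, key, generate):
        if key in self._data:        # fast path: no lock needed on a hit
            return self._data[key]
        with self._lock_for(key):
            if key in self._data:    # filled by another thread while we waited
                return self._data[key]
            value = generate()
            self._data[key] = value
            return value


calls = []


def slow_llm():
    calls.append(1)      # count how many times the "LLM" is really invoked
    time.sleep(0.05)
    return "4"


cache = SketchStampedeCache()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: cache.get_or_generate("2+2", slow_llm), range(8)))
print(len(calls), set(results))
```

The second check inside the lock is what keeps waiting threads from repeating the call once the first thread has stored the value.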

Tiered Backend (L1 + L2)

    from omnicache_ai import CacheManager, CacheKeyBuilder, TieredBackend, InMemoryBackend
    from omnicache_ai.backends.redis_backend import RedisBackend

    backend = TieredBackend(
        l1=InMemoryBackend(max_size=1000),   # fast, local
        l2=RedisBackend(url="redis://..."),  # persistent, shared
        l1_ttl=300,                          # 5-min local copy
    )
    manager = CacheManager(backend=backend, key_builder=CacheKeyBuilder())
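
As a mental model for L1 + L2 tiering, the sketch below uses two plain dicts. It assumes write-through on `set` and promotion into L1 on an L2 hit; whether TieredBackend behaves exactly this way is not stated here, and `SketchTiered` is a hypothetical class.

```python
import time


class SketchTiered:
    """Hypothetical read-through tiering over two dict-like stores."""

    def __init__(self, l1_ttl: float):
        self.l1 = {}  # key -> (value, expires_at); stands in for local memory
        self.l2 = {}  # key -> value; stands in for a shared store such as Redis
        self.l1_ttl = l1_ttl

    def set(self, key, value):
        # Write-through: the shared tier is always updated.
        self.l2[key] = value
        self.l1[key] = (value, time.monotonic() + self.l1_ttl)

    def get(self, key):
        entry = self.l1.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # L1 hit
        if key in self.l2:
            value = self.l2[key]
            # Promote so the next read is served locally.
            self.l1[key] = (value, time.monotonic() + self.l1_ttl)
            return value
        self.l1.pop(key, None)  # drop an expired local copy with no L2 backing
        return None


tiered = SketchTiered(l1_ttl=300)
tiered.set("k", "v")
tiered.l1.clear()          # simulate a process restart: the local copy is gone
print(tiered.get("k"))     # served from L2
print("k" in tiered.l1)    # promoted back into L1
```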

Compression

    from omnicache_ai import CacheManager, CacheKeyBuilder, GzipCompressor, InMemoryBackend

    manager = CacheManager(
        backend=InMemoryBackend(),
        key_builder=CacheKeyBuilder(),
        compressor=GzipCompressor(level=6),  # compress all stored bytes
    )
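
What "serialize, then compress" means for stored bytes can be shown with the standard library. The `encode` / `decode` helpers below are hypothetical and only illustrate the pipeline order; they are not the Serializer or Compressor classes.

```python
import gzip
import pickle


def encode(value, level: int = 6) -> bytes:
    """Serialize with pickle, then gzip the resulting bytes."""
    return gzip.compress(pickle.dumps(value), compresslevel=level)


def decode(blob: bytes):
    """Reverse of encode: gunzip, then unpickle (trusted data only)."""
    return pickle.loads(gzip.decompress(blob))


payload = {"answer": "lorem ipsum " * 500}
raw = pickle.dumps(payload)
packed = encode(payload)
print(len(raw), len(packed))  # repetitive text compresses well
```

Compression pays off mainly for large, repetitive values such as long LLM responses; for tiny values the gzip header can outweigh the savings.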

Streaming Response Cache

    from omnicache_ai import StreamingResponseCache

    stream_cache = StreamingResponseCache(manager)

    def stream_fn(messages):
        return openai_client.chat.completions.create(
            model="gpt-4o", messages=messages, stream=True
        )

    # First call: live stream + buffered to cache
    # Second call: replays chunks from cache at full speed
    for chunk in stream_cache.get_or_stream(messages, stream_fn, model_id="gpt-4o"):
        print(chunk.choices[0].delta.content or "", end="", flush=True)
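
The buffer-then-replay behaviour is easy to model with a generator. The function below is a minimal sketch under assumptions: it shares a name with `get_or_stream` for readability, but takes a plain dict as the store and is not the library method.

```python
def get_or_stream(store: dict, key: str, stream_fn):
    """Hypothetical helper: yield chunks live on a miss, replay them on a hit."""
    if key in store:
        yield from store[key]
        return
    buffer = []
    for chunk in stream_fn():
        buffer.append(chunk)
        yield chunk
    # Cache only after the stream finished, so partial streams are never replayed.
    store[key] = buffer


live_calls = []


def fake_stream():
    live_calls.append(1)  # counts real upstream invocations
    yield from ["Hel", "lo", "!"]


store = {}
first = "".join(get_or_stream(store, "greeting", fake_stream))
second = "".join(get_or_stream(store, "greeting", fake_stream))
print(first, second, len(live_calls))
```

Because the buffer is written only after the last chunk, a consumer that abandons the stream early leaves nothing in the cache.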

Async Backend

    from omnicache_ai import AsyncInMemoryBackend

    # Use in async frameworks: does not block the event loop
    backend = AsyncInMemoryBackend(max_size=10_000)
    value = await backend.get("key")
    await backend.set("key", "value", ttl=60)

Tag-Based Invalidation

    from omnicache_ai import InvalidationEngine, InMemoryBackend, CacheManager, CacheKeyBuilder

    manager = CacheManager(
        backend=InMemoryBackend(),
        key_builder=CacheKeyBuilder(),
        invalidation_engine=InvalidationEngine(InMemoryBackend()),
    )

    manager.set("key1", "v1", tags=["model:gpt-4o", "env:prod"])
    manager.set("key2", "v2", tags=["model:gpt-4o"])
    count = manager.invalidate("model:gpt-4o")  # removes both entries

    # ResponseCache / ContextCache tag automatically
    from omnicache_ai import ResponseCache, ContextCache

    rc = ResponseCache(manager)
    rc.invalidate_model("gpt-4o")       # remove all gpt-4o responses

    ctx = ContextCache(manager)
    ctx.invalidate_session("user-123")  # clear all session turns
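
Tag-based eviction boils down to a reverse index from tag to keys. The sketch below shows that data structure in isolation; `SketchTagIndex` is hypothetical and makes no claim about how InvalidationEngine stores its index.

```python
from collections import defaultdict


class SketchTagIndex:
    """Hypothetical store with a tag -> keys reverse index."""

    def __init__(self):
        self.data = {}
        self.keys_by_tag = defaultdict(set)

    def set(self, key, value, tags=()):
        self.data[key] = value
        for tag in tags:
            self.keys_by_tag[tag].add(key)

    def invalidate(self, tag) -> int:
        """Remove every entry carrying the tag; return how many were removed."""
        removed = 0
        for key in self.keys_by_tag.pop(tag, set()):
            if key in self.data:  # may already be gone via another tag
                del self.data[key]
                removed += 1
        return removed


index = SketchTagIndex()
index.set("key1", "v1", tags=["model:gpt-4o", "env:prod"])
index.set("key2", "v2", tags=["model:gpt-4o"])
removed = index.invalidate("model:gpt-4o")
print(removed, index.data)
```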

Backends

| Backend | Extra | Use case |
|---|---|---|
| InMemoryBackend | | Dev, testing, single-process |
| DiskBackend | | Survives restarts, single-machine |
| RedisBackend | [redis] | Shared cache across processes/services |
| FAISSBackend | [vector-faiss] | Semantic/vector similarity search |
| ChromaBackend | [vector-chroma] | Persistent vector store with metadata |

    from omnicache_ai.backends.redis_backend import RedisBackend
    from omnicache_ai.backends.disk_backend import DiskBackend

    manager = CacheManager(backend=RedisBackend(url="redis://localhost:6379/0"), ...)
    manager = CacheManager(backend=DiskBackend(path="/var/cache/omnicache"), ...)

Custom Backend

Implement the CacheBackend Protocol — no inheritance required (structural typing):

    from omnicache_ai.backends.base import CacheBackend
    from typing import Any

    class MyBackend:
        def get(self, key: str) -> Any | None: ...
        def set(self, key: str, value: Any, ttl: int | None = None) -> None: ...
        def delete(self, key: str) -> None: ...
        def exists(self, key: str) -> bool: ...
        def clear(self) -> None: ...
        def close(self) -> None: ...

    assert isinstance(MyBackend(), CacheBackend)  # True
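
For readers unfamiliar with structural typing, here is a self-contained sketch of why the `isinstance` check can pass without inheritance. `SketchBackend` is a reduced, hypothetical stand-in for the CacheBackend Protocol; the mechanism shown is `typing.runtime_checkable`, which checks only that the methods exist, not their signatures.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class SketchBackend(Protocol):
    """Hypothetical, reduced version of a cache backend protocol."""

    def get(self, key: str) -> Optional[Any]: ...
    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None: ...
    def delete(self, key: str) -> None: ...


class DictBackend:  # note: no base class
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ttl=None):
        self._data[key] = value  # ttl is ignored in this sketch

    def delete(self, key):
        self._data.pop(key, None)


class NotABackend:
    def get(self, key):
        return None  # missing set() and delete()


print(isinstance(DictBackend(), SketchBackend))
print(isinstance(NotABackend(), SketchBackend))
```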

Project Structure

omnicache_ai/
├── __init__.py                   # Public API surface (28 exports)
├── __main__.py                   # CLI: stats | flush | inspect <key>
├── config/
│   └── settings.py               # OmnicacheSettings dataclass + from_env()
├── backends/
│   ├── base.py                   # CacheBackend + VectorBackend Protocols
│   ├── async_base.py             # AsyncCacheBackend Protocol
│   ├── memory_backend.py         # InMemoryBackend (LRU, thread-safe, RLock)
│   ├── async_memory_backend.py   # AsyncInMemoryBackend (asyncio.Lock)
│   ├── disk_backend.py           # DiskBackend (diskcache, process-safe)
│   ├── redis_backend.py          # RedisBackend [optional: redis]
│   ├── tiered_backend.py         # TieredBackend (L1 memory + L2 any backend)
│   └── vector_backend.py         # FAISSBackend + ChromaBackend [optional]
├── core/
│   ├── key_builder.py            # namespace:type:sha256[:16] canonical keys
│   ├── metrics.py                # CacheMetrics (hits/misses/evictions/hit_rate)
│   ├── serializer.py             # Serializer protocol, PickleSerializer, JsonSerializer
│   ├── compressor.py             # Compressor protocol, GzipCompressor, NoopCompressor
│   ├── stampede.py               # StampedeShield (per-key threading.Lock)
│   ├── policies.py               # TTLPolicy, EvictionPolicy (wired into InMemoryBackend)
│   ├── invalidation.py           # Tag-based InvalidationEngine
│   └── cache_manager.py          # Central orchestrator + from_settings()
├── layers/
│   ├── embedding_cache.py        # np.ndarray ↔ bytes serialization
│   ├── retrieval_cache.py        # list[Document], pluggable serializer
│   ├── context_cache.py          # session_id + turn_index keyed
│   ├── response_cache.py         # model + messages + params keyed, stampede-safe
│   ├── semantic_cache.py         # exact → vector two-tier lookup
│   └── streaming_cache.py        # StreamingResponseCache (sync + async generators)
├── middleware/
│   ├── llm_middleware.py         # LLMMiddleware + AsyncLLMMiddleware
│   ├── embedding_middleware.py   # EmbeddingMiddleware
│   └── retriever_middleware.py   # RetrieverMiddleware
└── adapters/
    ├── openai_adapter.py         # OpenAICacheAdapter (chat.completions.create)
    ├── anthropic_adapter.py      # AnthropicCacheAdapter (messages.create)
    ├── langchain_adapter.py      # BaseCache (lookup/update/alookup/aupdate)
    ├── langgraph_adapter.py      # BaseCheckpointSaver (get_tuple/put/list + async)
    ├── autogen_adapter.py        # AssistantAgent 0.4+ + ConversableAgent 0.2.x
    ├── crewai_adapter.py         # Crew.kickoff() + kickoff_async()
    ├── agno_adapter.py           # Agent.run() + arun()
    └── a2a_adapter.py            # process() + aprocess() + @wrap

Development

    # Clone and install with dev deps
    git clone https://github.com/ashishpatel26/omnicache-ai
    cd omnicache-ai
    uv sync --dev

    # Install pre-commit hooks (runs automatically on every commit)
    uv run pre-commit install

    # Run all tests
    uv run pytest

    # With coverage report
    uv run pytest --cov=omnicache_ai --cov-report=term-missing

    # Lint + format + type check (via pre-commit)
    uv run pre-commit run --all-files

    # Run specific layer tests
    uv run pytest tests/layers/ tests/core/ -v

    # Run adapter tests (requires optional deps)
    uv run pytest tests/adapters/ -v

Contributing

We welcome contributions of all kinds — bug fixes, new backends, new adapters, documentation, and performance improvements.

| File | Purpose |
|---|---|
| CONTRIBUTING.md | Full dev setup, coding standards, how to add backends/adapters |
| CODE_OF_CONDUCT.md | Contributor Covenant 2.1 (community standards) |
| .github/ISSUE_TEMPLATE/bug_report.yml | Structured bug report with version, backend, reproduction fields |
| .github/ISSUE_TEMPLATE/feature_request.yml | Feature request with area dropdown, motivation, solution fields |
| .github/pull_request_template.md | PR checklist: type, changes, tests, breaking changes |

Quick start for contributors:

    git clone https://github.com/ashishpatel26/omnicache-ai
    cd omnicache-ai
    uv sync --dev
    uv run pre-commit install
    uv run pytest  # all green before you start

Open an issue or Discussion before starting large changes.


License

MIT — see LICENSE


Built with ❤️ for the AI engineering community
