15 changes: 10 additions & 5 deletions .env.template
@@ -60,13 +60,18 @@ AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-large
AI_FOUNDRY_EMBEDDING_DIMENSIONS=1536
AI_FOUNDRY_EMBEDDING_DATA_TYPE=float32
AI_FOUNDRY_EMBEDDING_DISTANCE_FUNCTION=cosine
# Optional. Vector index type for the memories container: diskANN (default),
# quantizedFlat, or flat. diskANN requires the DiskANN capability on the Cosmos
# DB account; use quantizedFlat or flat for accounts without it (e.g. the
# classic Cosmos DB emulator).
AI_FOUNDRY_EMBEDDING_VECTOR_INDEX_TYPE=diskANN
# Optional. Vector index type for the memories container: quantizedFlat
# (default), diskANN, or flat. quantizedFlat works on any Cosmos DB account
# (including the classic emulator); diskANN requires the DiskANN capability on
# the Cosmos DB account, so opt into it explicitly when available.
AI_FOUNDRY_EMBEDDING_VECTOR_INDEX_TYPE=quantizedFlat
COSMOS_DB_FULL_TEXT_LANGUAGE=en-US

# Embed raw conversation turns on write so they can be vector-searched via
# search_turns(). The turns container is always provisioned with a
# vector index, so toggling this never requires recreating the container.
ENABLE_TURN_EMBEDDINGS=false

AI_FOUNDRY_CHAT_DEPLOYMENT_NAME=<your-model-deployment>
# Optional. Pin the Azure OpenAI REST API version used by chat and embeddings
# clients. Leave blank to use the toolkit default ("2024-12-01-preview").
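The `AI_FOUNDRY_EMBEDDING_VECTOR_INDEX_TYPE` setting above follows an "explicit value, then environment, then default" order. A minimal sketch of that resolution, assuming the new `quantizedFlat` default; `resolve_vector_index_type` and its `env` parameter are hypothetical names used for illustration, and `ValueError` stands in for the toolkit's `ConfigurationError`:

```python
import os
from typing import Mapping, Optional

# Allowed values as listed in the template comment.
ALLOWED_VECTOR_INDEX_TYPES = ("quantizedFlat", "diskANN", "flat")
ENV_NAME = "AI_FOUNDRY_EMBEDDING_VECTOR_INDEX_TYPE"


def resolve_vector_index_type(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical stand-in: explicit value wins, then the env var, then the default."""
    env = os.environ if env is None else env
    # An unset or empty env var falls through to the quantizedFlat default.
    raw = (explicit if explicit is not None else env.get(ENV_NAME) or "quantizedFlat").strip()
    if raw not in ALLOWED_VECTOR_INDEX_TYPES:
        # Assumption: the real code raises ConfigurationError here.
        raise ValueError(f"{ENV_NAME} must be one of {ALLOWED_VECTOR_INDEX_TYPES}, got {raw!r}")
    return raw
```

Under this sketch, an account with the DiskANN capability opts in by setting the variable to `diskANN`; leaving it blank keeps the index type that works on any account.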
2 changes: 2 additions & 0 deletions Docs/concepts.md
@@ -94,6 +94,8 @@ Memories stored in Cosmos DB include embeddings generated by Microsoft AI Foundr

Facts work especially well for vector search because each fact is stored as a small, self-contained document.

By default raw conversation turns are *not* embedded — only derived memories (facts, episodic, procedural, summaries) carry vectors. Set `enable_turn_embeddings=True` (env `ENABLE_TURN_EMBEDDINGS`) to also embed turns on write, then call `search_turns()` to vector-search the raw conversation log. The turns container is always provisioned with a `quantizedFlat` vector index, so this flag only toggles embedding generation and can be turned on or off at any time without recreating the container.

---

## Processing Pipeline
10 changes: 6 additions & 4 deletions Docs/public_api.md
@@ -14,7 +14,7 @@

### Connection

- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container='memories_turns', cosmos_summaries_container='memories_summaries', cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure local state, model clients, optional Cosmos auto-connect, and optional processing backend. The SDK uses a hard 3-container topology: turns in `memories_turns`, facts/episodic/procedural in `memories`, and summaries in `memories_summaries` (or the names you pass).
- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container='memories_turns', cosmos_summaries_container='memories_summaries', cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, enable_turn_embeddings=None, processor=None) -> None` — configure local state, model clients, optional Cosmos auto-connect, and optional processing backend. The SDK uses a hard 3-container topology: turns in `memories_turns`, facts/episodic/procedural in `memories`, and summaries in `memories_summaries` (or the names you pass). `enable_turn_embeddings` (default `False`, env `ENABLE_TURN_EMBEDDINGS`) embeds raw turns on write so they can be vector-searched via `search_turns()`; the turns container is always provisioned with a vector index, so toggling this never requires recreating it.
- `close() -> None` — close Cosmos/model clients and owned credentials.
- `connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None, summaries_container=None) -> None` — connect to existing memory, turns, and summaries containers.
- `create_memory_store(database=None, container=None, turns_container=None, summaries_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect the memory, turns, summaries, counter, and lease containers.
@@ -37,7 +37,8 @@

### Retrieval

- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search derived memories (facts/episodic/procedural).
- `search_turns(search_terms, user_id, thread_id=None, role=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search the raw conversation log instead of facts/episodic/procedural (requires turn embeddings; see `enable_turn_embeddings`). `user_id` is required so the search is scoped to one partition instead of scanning every user's turns.
- `get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
- `get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
- `get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
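A minimal usage sketch for `search_turns()`, written against a structural type rather than the real client class (whose name this diff does not show). `TurnSearcher`, `recall_turns`, and the assumption that each hit carries a `content` field are illustrative only; the keyword arguments match the signature documented above.

```python
from typing import Any, Optional, Protocol


class TurnSearcher(Protocol):
    """Hypothetical structural type: any client exposing the documented search_turns()."""

    def search_turns(
        self,
        search_terms: str,
        user_id: str,
        thread_id: Optional[str] = None,
        top_k: int = 5,
        **kwargs: Any,
    ) -> list[dict]: ...


def recall_turns(
    client: TurnSearcher,
    query: str,
    user_id: str,
    thread_id: Optional[str] = None,
    top_k: int = 5,
) -> list[str]:
    """Search the raw conversation log for one user and return the matched text."""
    if not user_id:
        # user_id is required so the search stays within a single partition.
        raise ValueError("user_id is required for search_turns()")
    hits = client.search_turns(query, user_id=user_id, thread_id=thread_id, top_k=top_k)
    # Assumption: turn documents store their text under "content".
    return [hit.get("content", "") for hit in hits]
```

Results are only meaningful for turns written while `enable_turn_embeddings` was on, since turns stored without an embedding have nothing to match against in a vector search.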
@@ -67,7 +68,7 @@ Local-buffer methods remain synchronous in-memory operations; Cosmos, retrieval,

### Connection

- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container='memories_turns', cosmos_summaries_container='memories_summaries', cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure async local state, model clients, and optional processing backend. The async SDK uses the same hard 3-container topology as the sync client.
- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container='memories_turns', cosmos_summaries_container='memories_summaries', cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, enable_turn_embeddings=None, processor=None) -> None` — configure async local state, model clients, and optional processing backend. The async SDK uses the same hard 3-container topology as the sync client. `enable_turn_embeddings` (default `False`, env `ENABLE_TURN_EMBEDDINGS`) embeds raw turns on write so they can be vector-searched via `search_turns()`.
- `async close() -> None` — close async/sync resources and owned credentials.
- `async connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None, summaries_container=None) -> None` — connect to existing memory, turns, and summaries containers.
- `async create_memory_store(database=None, container=None, turns_container=None, summaries_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect memory, turns, summaries, counter, and lease containers.
@@ -90,7 +91,8 @@ Local-buffer methods remain synchronous in-memory operations; Cosmos, retrieval,

### Retrieval

- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search derived memories (facts/episodic/procedural).
- `async search_turns(search_terms, user_id, thread_id=None, role=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search the raw conversation log instead of facts/episodic/procedural (requires turn embeddings; see `enable_turn_embeddings`). `user_id` is required so the search is scoped to one partition instead of scanning every user's turns.
- `async get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
- `async get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
- `async get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
5 changes: 5 additions & 0 deletions azure/cosmos/agent_memory/_base/base_client.py
@@ -18,6 +18,7 @@
)
from azure.cosmos.agent_memory.exceptions import CosmosNotConnectedError, MemoryNotFoundError, ValidationError
from azure.cosmos.agent_memory.logging import configure_logging, get_logger
from azure.cosmos.agent_memory.thresholds import get_enable_turn_embeddings

logger = get_logger(__name__)

@@ -46,6 +47,7 @@ def _init_base_config(
embedding_dimensions: Optional[int],
chat_deployment_name: str,
use_default_credential: bool,
enable_turn_embeddings: Optional[bool] = None,
default_credential_module: str = "azure.identity",
) -> None:
"""Initialize shared local state, config values, and default credentials."""
@@ -76,6 +78,9 @@ def _init_base_config(
self._embedding_deployment_name = embedding_deployment_name
self._embedding_dimensions = _resolve_embedding_dimensions(embedding_dimensions)
self._chat_deployment_name = chat_deployment_name
self._enable_turn_embeddings = (
enable_turn_embeddings if enable_turn_embeddings is not None else get_enable_turn_embeddings()
)

self._owns_cosmos_credential = False
self._owns_ai_foundry_credential = False
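The `enable_turn_embeddings` assignment in this hunk uses `None` as "not specified", so an explicit `False` from the caller still overrides a truthy environment value. A minimal sketch of that precedence, assuming the env var is parsed as a conventional boolean string; the body of `get_enable_turn_embeddings()` is not part of this diff, so `env_enable_turn_embeddings` and its accepted spellings are hypothetical:

```python
import os
from typing import Mapping, Optional

# Assumption: spellings treated as "on"; the real parser may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


def env_enable_turn_embeddings(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical env reader: ENABLE_TURN_EMBEDDINGS defaults to False when unset."""
    env = os.environ if env is None else env
    return env.get("ENABLE_TURN_EMBEDDINGS", "").strip().lower() in _TRUTHY


def resolve_enable_turn_embeddings(
    explicit: Optional[bool],
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Mirror the hunk's rule: use the argument unless it is None, else fall back to env."""
    return explicit if explicit is not None else env_enable_turn_embeddings(env)
```

Testing `is not None` instead of truthiness is what lets a caller switch the feature off in code while the environment has it switched on.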
35 changes: 23 additions & 12 deletions azure/cosmos/agent_memory/_utils.py
@@ -146,7 +146,8 @@ def _resolve_embedding_dimensions(val: Optional[int]) -> int:
"""Resolve embedding dimensions from explicit value or ``AI_FOUNDRY_EMBEDDING_DIMENSIONS`` env var.

Defaults to 1536 (the dimension we ship with for ``text-embedding-3-large``
truncated to 1536, which is the size DiskANN is tuned for in our containers).
truncated to 1536, which is the size our quantizedFlat vector indexes are
tuned for in our containers).

Raises :class:`ConfigurationError` if the env var is set but cannot be
parsed as a positive integer.
@@ -259,13 +260,15 @@ def _resolve_distance_function(val: Optional[str]) -> str:
def _resolve_vector_index_type(val: Optional[str]) -> str:
"""Resolve vector index type from explicit value or ``AI_FOUNDRY_EMBEDDING_VECTOR_INDEX_TYPE`` env var.

Defaults to ``diskANN``. Raises :class:`ConfigurationError` for unknown values.
Defaults to ``quantizedFlat``. Raises :class:`ConfigurationError` for unknown values.

``diskANN`` requires the Cosmos DB account to have the DiskANN vector index
capability enabled. Accounts that do not (for example the classic Cosmos DB
emulator) can use ``quantizedFlat`` or ``flat`` instead.
``quantizedFlat`` works on any Cosmos DB account (including the classic
emulator). ``diskANN`` requires the Cosmos DB account to have the DiskANN
vector index capability enabled; opt into it explicitly when available.
"""
raw = (val if val is not None else os.environ.get("AI_FOUNDRY_EMBEDDING_VECTOR_INDEX_TYPE") or "diskANN").strip()
raw = (
val if val is not None else os.environ.get("AI_FOUNDRY_EMBEDDING_VECTOR_INDEX_TYPE") or "quantizedFlat"
).strip()
if raw not in _ALLOWED_VECTOR_INDEX_TYPES:
raise ConfigurationError(
message=(
@@ -434,9 +437,15 @@ def _container_policies(
embedding_data_type: str,
distance_function: str,
full_text_language: str,
vector_index_type: str = "diskANN",
include_salience_composite: bool = True,
vector_index_type: str = "quantizedFlat",
) -> tuple[dict, dict, dict]:
"""Build the vector, indexing, and full-text policies for container creation."""
"""Build the vector, indexing, and full-text policies for container creation.

``include_salience_composite`` adds the ``(salience, created_at, id)``
composite index required by procedural synthesis on the MEMORIES container.
Turns reuse this builder with it disabled (turns are never synthesized).
"""
vector_embedding_policy = {
"vectorEmbeddings": [
{
@@ -451,25 +460,27 @@ def _container_policies(
indexing_policy = {
"includedPaths": [{"path": "/*"}],
"excludedPaths": [
{"path": "/embedding/*"},
{"path": "/source_memory_ids/*"},
{"path": "/supersedes_ids/*"},
{"path": '/"_etag"/?'},
],
"vectorIndexes": [{"path": "/embedding", "type": vector_index_type}],
"fullTextIndexes": [{"path": "/content"}],
}

if include_salience_composite:
# Procedural synthesis selects TOP N by (salience DESC, created_at ASC, id ASC).
# Cosmos requires a composite index for multi-property ORDER BY; without it the
# query returns a non-deterministic 50 of N when many docs share the default
# salience (0.5), which makes the source-id short-circuit in synthesize_procedural
# thrash and burn LLM calls on every reconcile.
"compositeIndexes": [
indexing_policy["compositeIndexes"] = [
[
{"path": "/salience", "order": "descending"},
{"path": "/created_at", "order": "ascending"},
{"path": "/id", "order": "ascending"},
]
],
}
]

full_text_policy = {
"defaultLanguage": full_text_language,
Expand Down