Schema Introspection

Mango samples your collections to infer field types, indexes, and cross-collection references. The result is injected into the system prompt so the LLM generates correct queries without guessing.

How it works

When you create the agent with introspect=True and then call agent.setup(), Mango runs introspect_schema() on your database:

1. Sample documents: fetches up to 100 documents per collection (a mix of sequential reads and random picks via $sample)
2. Infer fields: walks every sampled document and records field paths, types, and presence frequency
3. Detect indexes: reads index definitions via index_information()
4. Detect references: applies a naming heuristic, so user_id is matched to a users collection and movieId to a movies collection
5. Build prompt: the schema is serialized and injected into the system prompt
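The field-inference and reference-detection steps above can be sketched roughly as follows. This is a minimal illustration, not Mango's actual implementation: the helper names type_name, infer_fields, and guess_reference are hypothetical, the type labels are simplified, and the reference rule shown (strip a trailing "id", then look for a plural or singular collection name) is only one plausible reading of the heuristic.

```python
from collections import Counter, defaultdict

# Hypothetical helpers for illustration only; these are not Mango's internals.


def type_name(value):
    """Map a Python value to a coarse type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):      # check bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


def infer_fields(docs):
    """Return {dotted_path: {"types": [...], "frequency": float}} for sampled docs."""
    seen = Counter()                 # number of documents containing each path
    types = defaultdict(set)         # type labels observed at each path

    def walk(node, prefix):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            seen[path] += 1
            types[path].add(type_name(value))
            if isinstance(value, dict):
                walk(value, path)    # descend into subdocuments

    for doc in docs:
        walk(doc, "")
    total = len(docs) or 1           # avoid dividing by zero on an empty sample
    return {p: {"types": sorted(types[p]), "frequency": seen[p] / total} for p in seen}


def guess_reference(field_name, collection_names):
    """Guess which collection a field such as user_id or movieId points to."""
    lowered = field_name.lower()
    if not lowered.endswith("id") or lowered in ("id", "_id"):
        return None
    stem = lowered[:-2].rstrip("_")  # "user_id" -> "user", "movieid" -> "movie"
    for candidate in (stem + "s", stem):
        if candidate in collection_names:
            return candidate
    return None
```

For example, infer_fields([{"name": "Ann", "address": {"city": "Oslo"}}, {"name": None}]) reports address.city with a frequency of 0.5, and guess_reference("movieId", {"users", "movies"}) returns "movies". A name-based heuristic like this can produce false matches, which is presumably why the result is recorded as a hint (is_reference) rather than enforced.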

Enable introspection

agent = MangoAgent(
    ...,
    introspect=True,   # run introspect_schema() on setup()
)
agent.setup()

Pass pre-computed schema

For production deployments you may want to introspect once and reuse the result:

schema = db.introspect_schema()   # run once, save/cache the result

agent = MangoAgent(
    ...,
    schema=schema,     # pass the schema directly; introspect stays False (the default)
)
agent.setup()
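One way to "run once and cache" is to persist the schema to disk and reload it on later starts. The sketch below is an assumption-laden example rather than a Mango feature: load_or_build is a hypothetical helper, and it assumes the object returned by introspect_schema() can be pickled (plain dataclasses and dicts normally can, but this is not confirmed for Mango's types).

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    """Return the object cached at cache_path, or call build() and cache its result."""
    path = Path(cache_path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)   # only load pickle files you created yourself
    result = build()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

Under those assumptions, usage would look like schema = load_or_build("schema.pkl", db.introspect_schema). Delete the cache file whenever collections or indexes change, since a stale schema would put outdated field names into the prompt.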

SchemaInfo

Each collection produces a SchemaInfo:

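from __future__ import annotations   # lets SchemaInfo refer to FieldInfo before it is defined

from dataclasses import dataclass


@dataclass
class SchemaInfo:
    collection_name: str
    document_count: int
    fields: list[FieldInfo]
    indexes: list[dict]
    sample_documents: list[dict]   # 5 representative docs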

Each field produces a FieldInfo:

@dataclass
class FieldInfo:
    name: str
    path: str              # dotted path e.g. "address.city"
    types: list[str]       # e.g. ["string", "null"]
    frequency: float       # 0.0–1.0, how often field appears
    is_indexed: bool
    is_unique: bool
    is_reference: bool
    reference_collection: str | None
    sub_fields: list[FieldInfo] | None    # for subdocuments
    array_element_types: list[str] | None
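Because sub_fields nests FieldInfo objects, code that needs every field usually has to walk the tree. The sketch below shows one way to do that. It uses a trimmed, hypothetical stand-in class (Field) instead of the real FieldInfo, and it assumes that nested entries carry their full dotted path, as the "address.city" comment suggests.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Trimmed, hypothetical stand-in for FieldInfo with only the attributes used here."""
    path: str
    types: list[str]
    frequency: float
    sub_fields: "list[Field] | None" = None


def iter_fields(fields):
    """Yield every field depth-first, descending into sub_fields."""
    for field in fields:
        yield field
        if field.sub_fields:
            yield from iter_fields(field.sub_fields)


address = Field(
    path="address",
    types=["object"],
    frequency=0.9,
    sub_fields=[Field(path="address.city", types=["string", "null"], frequency=0.8)],
)
paths = [f.path for f in iter_fields([Field("name", ["string"], 1.0), address])]
print(paths)  # ['name', 'address', 'address.city']
```

The same traversal should work on real FieldInfo objects if they expose path and sub_fields as listed above.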

Large databases

For databases with 100 or more collections, Mango groups collections by name pattern in the system prompt to keep the prompt's token count manageable.
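The exact pattern Mango uses for grouping is not described here. As a rough illustration of the idea, the sketch below groups collection names by the text before the first underscore; both the rule and the helper name group_by_prefix are assumptions made for this example.

```python
from collections import defaultdict


def group_by_prefix(collection_names, separator="_"):
    """Group names by the text before the first separator (a hypothetical rule)."""
    groups = defaultdict(list)
    for name in sorted(collection_names):
        prefix = name.split(separator, 1)[0]
        groups[prefix].append(name)
    return dict(groups)


groups = group_by_prefix(["logs_2023", "logs_2024", "users", "logs_2022"])
for prefix, names in groups.items():
    print(f"{prefix}: {len(names)} collection(s)")
```

A prompt builder could then describe a group such as logs once, instead of repeating a near-identical schema for every member.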

For databases with more than 10 collections, Mango also enables schema linking enforcement: before the first run_mql call on any collection, the agent automatically runs describe_collection and prepends the result to the query response. The LLM therefore sees the correct field names and types at query time, with no extra round-trip and no reliance on the LLM remembering to inspect the schema first.

For smaller databases (≤ 10 collections), the full schema is already in the system prompt, so enforcement is skipped.
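The enforcement behaviour described above can be modelled as a thin wrapper around the query tool. This is a sketch under stated assumptions, not Mango's code: the class SchemaLinkingGuard is hypothetical, and the callables passed in only stand in for run_mql and describe_collection, whose real signatures are not shown in this document.

```python
class SchemaLinkingGuard:
    """Prepend a collection's description to the result of its first query."""

    def __init__(self, run_mql, describe_collection, collection_count, threshold=10):
        self._run_mql = run_mql
        self._describe = describe_collection
        # Enforcement is skipped for small databases (threshold or fewer collections).
        self._enforce = collection_count > threshold
        self._described = set()      # collections whose schema was already shown

    def run(self, collection, query):
        result = self._run_mql(collection, query)
        if self._enforce and collection not in self._described:
            self._described.add(collection)
            return f"{self._describe(collection)}\n\n{result}"
        return result


guard = SchemaLinkingGuard(
    run_mql=lambda collection, query: f"rows from {collection}",
    describe_collection=lambda collection: f"schema of {collection}",
    collection_count=25,
)
print(guard.run("users", {}))   # first call: description, blank line, then rows
print(guard.run("users", {}))   # later calls: rows only
```

Tracking described collections in a set means the description is paid for once per collection rather than on every query.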

Per-query schema selection: rather than injecting all collection schemas upfront, Mango matches keywords in the question against the schema and selects only the most relevant collections for each query. On large databases this reduces token usage while keeping the relevant field names and types in the prompt.
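Keyword matching of this kind can be approximated by counting the words a question shares with each collection's name and field paths. The sketch below is one simple way to do it and is not necessarily how Mango scores collections: tokenize, select_collections, the limit parameter, and the shape of schema_terms (collection name mapped to a list of field paths) are all assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def select_collections(question, schema_terms, limit=3):
    """Rank collections by how many question words match their name or field paths."""
    words = tokenize(question)
    scored = []
    for name, paths in schema_terms.items():
        vocabulary = tokenize(name)
        for path in paths:
            vocabulary |= tokenize(path)
        score = len(words & vocabulary)
        if score:                    # drop collections with no overlap at all
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


schema_terms = {
    "movies": ["title", "year", "rating"],
    "users": ["name", "email"],
    "orders": ["user_id", "total"],
}
print(select_collections("What is the average rating of movies by year?", schema_terms))
# ['movies']
```

Exact word overlap misses plurals and synonyms (a question about a "user" does not match the name users here), so a real implementation would likely need stemming or a fallback to the full schema.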

Manual schema inspection

schema = db.introspect_schema()

for collection_name, info in schema.items():
    print(f"{collection_name}: {info.document_count} docs, {len(info.fields)} fields")
    for field in info.fields:
        print(f"  {field.path} ({', '.join(field.types)}) freq={field.frequency:.0%}")