Schema Introspection

Mango samples your collections to infer field types, reads their index definitions, and detects likely cross-collection references. The resulting schema is injected into the system prompt so the LLM can write queries against the real field names and types instead of guessing.

How it works

When you create the agent with introspect=True and then call agent.setup(), Mango runs introspect_schema() on your database:

  1. Sample documents: fetches up to 100 documents per collection (a mix of sequential reads and random picks via $sample)
  2. Infer fields: walks every sampled document and records each field's path, types, and presence frequency
  3. Detect indexes: reads index definitions via index_information()
  4. Detect references: a name-based heuristic, e.g. user_id is matched against a users collection and movieId against movies
  5. Build prompt: the schema is serialized and injected into the system prompt
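
Step 2 can be pictured with a small standalone sketch. This is not Mango's implementation: the helper names (type_name, walk, infer_fields) and the type labels are assumptions chosen for illustration, and it handles plain dicts rather than live MongoDB documents.

```python
from collections import Counter, defaultdict
from typing import Any


def type_name(value: Any) -> str:
    """Map a Python value to a coarse type label (labels are illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


def walk(doc: dict, prefix: str = ""):
    """Yield (dotted_path, type_label) for each field, recursing into subdocuments."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path, type_name(value)
        if isinstance(value, dict):
            yield from walk(value, path)


def infer_fields(docs: list) -> dict:
    """Return {path: {"types": [...], "frequency": float}} over the sampled docs."""
    types = defaultdict(set)
    seen = Counter()
    for doc in docs:
        for path, label in walk(doc):
            types[path].add(label)
            seen[path] += 1
    total = len(docs) or 1  # avoid division by zero on an empty sample
    return {
        path: {"types": sorted(types[path]), "frequency": seen[path] / total}
        for path in seen
    }


sample = [
    {"name": "Ada", "address": {"city": "London"}},
    {"name": None},
]
fields = infer_fields(sample)
```

Here fields["name"] records both "null" and "string" with a frequency of 1.0, while fields["address.city"] has a frequency of 0.5 because only one of the two documents contains it.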

Enable introspection

agent = MangoAgent(
    ...,
    introspect=True,   # run introspect_schema() on setup()
)
agent.setup()

Pass pre-computed schema

In production deployments you may want to introspect once and reuse the result instead of re-sampling on every startup:

schema = db.introspect_schema()   # run once, save/cache the result

agent = MangoAgent(
    ...,
    schema=schema,     # pass directly, introspect=False (default)
)
agent.setup()
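
One way to "save/cache the result" is to write it to a JSON file. The sketch below is an assumption, not a Mango feature: save_schema and load_schema are hypothetical helpers, and the SchemaInfo here is a trimmed stand-in with only three of the real fields so that the example runs on its own.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SchemaInfo:  # simplified stand-in; real code would use Mango's own class
    collection_name: str
    document_count: int
    indexes: list = field(default_factory=list)


def save_schema(schema: dict, path: Path) -> None:
    """Serialize {collection_name: SchemaInfo} to JSON."""
    payload = {name: asdict(info) for name, info in schema.items()}
    # default=str stringifies values JSON cannot encode (lossy, e.g. for dates)
    path.write_text(json.dumps(payload, indent=2, default=str))


def load_schema(path: Path) -> dict:
    """Rebuild {collection_name: SchemaInfo} from a file written by save_schema."""
    payload = json.loads(path.read_text())
    return {name: SchemaInfo(**data) for name, data in payload.items()}


schema = {"users": SchemaInfo("users", 1200, [{"name": "_id_"}])}
cache_file = Path(tempfile.gettempdir()) / "mango_schema_cache.json"
save_schema(schema, cache_file)
restored = load_schema(cache_file)
```

With the full SchemaInfo, the nested FieldInfo entries would come back as plain dicts and need to be rebuilt recursively; the stand-in sidesteps that to keep the sketch short.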

SchemaInfo

Each collection produces a SchemaInfo:

from dataclasses import dataclass

@dataclass
class SchemaInfo:
    collection_name: str
    document_count: int
    fields: list["FieldInfo"]      # FieldInfo is defined below
    indexes: list[dict]
    sample_documents: list[dict]   # 5 representative docs

Each field produces a FieldInfo:

from dataclasses import dataclass

@dataclass
class FieldInfo:
    name: str
    path: str              # dotted path e.g. "address.city"
    types: list[str]       # e.g. ["string", "null"]
    frequency: float       # 0.0–1.0, how often field appears
    is_indexed: bool
    is_unique: bool
    is_reference: bool
    reference_collection: str | None
    sub_fields: list["FieldInfo"] | None  # for subdocuments; quoted because the class refers to itself
    array_element_types: list[str] | None
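
The is_reference and reference_collection values come from the name heuristic described under "How it works". A minimal sketch of such a heuristic follows; the function name guess_reference, the suffix rule, and the naive pluralization are assumptions for illustration, and Mango's actual rules may differ.

```python
import re
from typing import Optional

# Matches a trailing "_id" (snake_case) or "Id" (camelCase) suffix.
_ID_SUFFIX = re.compile(r"(?:_id|Id)$")


def guess_reference(field_name: str, collections: set) -> Optional[str]:
    """Return the collection a field name probably points to, or None."""
    if not _ID_SUFFIX.search(field_name):
        return None
    stem = _ID_SUFFIX.sub("", field_name).lower()
    if not stem:
        return None  # "_id" itself is the primary key, not a reference
    # Try naive plural forms first, then the bare stem.
    for candidate in (stem + "s", stem + "es", stem):
        if candidate in collections:
            return candidate
    return None


collections = {"users", "movies", "comments"}
user_ref = guess_reference("user_id", collections)
movie_ref = guess_reference("movieId", collections)
```

A name-only rule like this misses irregular plurals (category_id would not find categories) and can produce false positives, which is why the result is best treated as a hint rather than a guaranteed foreign key.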

Large databases

For databases with 100 or more collections, Mango groups collections by name pattern in the system prompt to keep the prompt's token count manageable. The full schema for a collection is fetched only when the LLM calls describe_collection on it.
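
The text does not say which name patterns Mango uses, so the following is only one plausible sketch: it treats names that differ solely in their digit runs (for example logs_2023 and logs_2024) as one group. The helpers group_collections and summarize are hypothetical.

```python
import re
from collections import defaultdict


def group_collections(names: list) -> dict:
    """Group names that differ only in their digit runs."""
    groups = defaultdict(list)
    for name in sorted(names):
        pattern = re.sub(r"\d+", "*", name)  # logs_2023 -> logs_*
        groups[pattern].append(name)
    return dict(groups)


def summarize(groups: dict) -> list:
    """One prompt line per group; single-member groups keep their plain name."""
    lines = []
    for pattern, members in groups.items():
        if len(members) == 1:
            lines.append(members[0])
        else:
            lines.append(f"{pattern} ({len(members)} collections)")
    return lines


names = ["users", "logs_2023", "logs_2024", "events_01", "events_02", "events_03"]
summary = summarize(group_collections(names))
```

Six collection names collapse into three prompt lines here; the per-collection detail stays out of the prompt until it is requested.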

Manual schema inspection

To inspect the inferred schema yourself, iterate over the mapping of collection names to SchemaInfo objects returned by introspect_schema():

schema = db.introspect_schema()

for collection_name, info in schema.items():
    print(f"{collection_name}: {info.document_count} docs, {len(info.fields)} fields")
    for field in info.fields:
        print(f"  {field.path} ({', '.join(field.types)}) freq={field.frequency:.0%}")