Schema Introspection
Mango samples your collections to infer field types, indexes, and cross-collection references. The result is injected into the system prompt so the LLM generates correct queries without guessing.
How it works
When you call agent.setup() on an agent created with introspect=True, Mango runs introspect_schema() on your database:
- Sample documents: fetches up to 100 documents per collection (a mix of sequential and random, the latter via $sample)
- Infer fields: walks every document and records field paths, types, and presence frequency
- Detect indexes: reads index definitions via index_information()
- Detect references: heuristic, e.g. user_id → looks for a users collection, movieId → looks for movies
- Build prompt: the schema is serialized and injected into the system prompt
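The field-inference and reference-detection steps above can be sketched roughly as follows. This is an illustrative approximation, not Mango's actual implementation: the function names (type_name, walk, infer_fields, guess_reference), the type labels, and the exact suffix rules are all assumptions made for the example.

```python
from collections import defaultdict


def type_name(value):
    """Map a Python value to a coarse type label (labels are illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


def walk(doc, prefix=""):
    """Yield (dotted_path, type_label) pairs, descending into subdocuments."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        yield path, type_name(value)
        if isinstance(value, dict):
            yield from walk(value, prefix=f"{path}.")


def infer_fields(docs):
    """Return {path: {"types": [...], "frequency": share of docs containing path}}."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for doc in docs:
        seen = set()
        for path, label in walk(doc):
            types[path].add(label)
            seen.add(path)
        for path in seen:
            counts[path] += 1
    total = len(docs) or 1  # avoid division by zero on an empty sample
    return {
        path: {"types": sorted(types[path]), "frequency": counts[path] / total}
        for path in types
    }


def guess_reference(field_name, collections):
    """Hypothetical heuristic: user_id -> users, movieId -> movies."""
    for suffix in ("_id", "Id"):
        if field_name.endswith(suffix) and len(field_name) > len(suffix):
            stem = field_name[: -len(suffix)].lower()
            for candidate in (stem + "s", stem):
                if candidate in collections:
                    return candidate
    return None
```

For example, infer_fields([{"name": "Ann", "address": {"city": "Oslo"}}, {"name": None}]) reports name with types ["null", "string"] at frequency 1.0 and address.city at frequency 0.5, while guess_reference("movieId", {"movies"}) returns "movies". A name-based heuristic like this can produce false positives, so treat detected references as hints.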
Enable introspection
agent = MangoAgent(
    ...,
    introspect=True,  # run introspect_schema() on setup()
)
agent.setup()
Pass pre-computed schema
For production deployments you may want to introspect once and reuse:
schema = db.introspect_schema()  # run once, save/cache the result
agent = MangoAgent(
    ...,
    schema=schema,  # pass directly, introspect=False (default)
)
agent.setup()
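One way to "introspect once and reuse" is to cache the result on disk. The sketch below assumes the object returned by db.introspect_schema() can be pickled (plain dataclasses and dicts can); the helper name load_or_introspect and the cache path are hypothetical and not part of Mango.

```python
import pickle
from pathlib import Path


def load_or_introspect(db, cache_path):
    """Return the cached schema if present, otherwise introspect and cache it.

    Hypothetical helper: `db` only needs an introspect_schema() method.
    """
    path = Path(cache_path)
    if path.exists():
        # Only unpickle files you created yourself; pickle is not safe
        # for untrusted input.
        with path.open("rb") as fh:
            return pickle.load(fh)
    schema = db.introspect_schema()
    with path.open("wb") as fh:
        pickle.dump(schema, fh)
    return schema
```

The returned value can then be passed as schema=... when constructing the agent. This sketch never invalidates the cache, so delete the file (or add a version key or timestamp check) whenever collections or indexes change.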
SchemaInfo
Each collection produces a SchemaInfo:
@dataclass
class SchemaInfo:
    collection_name: str
    document_count: int
    fields: list[FieldInfo]
    indexes: list[dict]
    sample_documents: list[dict]  # 5 representative docs
Each field produces a FieldInfo:
@dataclass
class FieldInfo:
    name: str
    path: str  # dotted path e.g. "address.city"
    types: list[str]  # e.g. ["string", "null"]
    frequency: float  # 0.0–1.0, how often field appears
    is_indexed: bool
    is_unique: bool
    is_reference: bool
    reference_collection: str | None
    sub_fields: list[FieldInfo] | None  # for subdocuments
    array_element_types: list[str] | None
Large databases
For databases with 100+ collections, Mango groups collections by name pattern in the system prompt to keep token usage down. The full schema for a collection is fetched only when the LLM calls describe_collection for that collection.
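The exact grouping rule is not described here, so the following is only a sketch of one plausible scheme: group names by their leading alphabetic prefix and list a few members per group. The functions group_collections and summarize, and the prefix rule itself, are assumptions for illustration.

```python
import re
from collections import defaultdict


def group_collections(names):
    """Group collection names by leading alphabetic prefix (assumed rule)."""
    groups = defaultdict(list)
    for name in sorted(names):
        match = re.match(r"[A-Za-z]+", name)
        key = match.group(0).lower() if match else name
        groups[key].append(name)
    return dict(groups)


def summarize(groups, limit=3):
    """Render one compact line per group, showing at most `limit` members."""
    lines = []
    for key, members in groups.items():
        if len(members) == 1:
            lines.append(members[0])
        else:
            shown = ", ".join(members[:limit])
            more = f", ... ({len(members)} total)" if len(members) > limit else ""
            lines.append(f"{key}*: {shown}{more}")
    return lines
```

With names such as logs_2023, logs_2024, and users, this yields one line for the logs group and one for users, which keeps the prompt short while still telling the LLM which collections exist.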
Manual schema inspection
schema = db.introspect_schema()
for collection_name, info in schema.items():
    print(f"{collection_name}: {info.document_count} docs, {len(info.fields)} fields")
    for field in info.fields:
        print(f"  {field.path} ({', '.join(field.types)}) freq={field.frequency:.0%}")
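The loop above only visits top-level fields. To also reach subdocument fields, the sub_fields attribute can be walked recursively. The sketch below uses only attributes listed on FieldInfo (path, sub_fields, is_reference, reference_collection); the helper names iter_fields and reference_map are hypothetical, and it assumes nested fields are reachable through sub_fields.

```python
def iter_fields(fields):
    """Yield every field, descending into sub_fields when present."""
    for field in fields:
        yield field
        if field.sub_fields:
            yield from iter_fields(field.sub_fields)


def reference_map(info):
    """Return {field path: referenced collection} for one collection's schema."""
    return {
        field.path: field.reference_collection
        for field in iter_fields(info.fields)
        if field.is_reference and field.reference_collection
    }
```

For a single collection's SchemaInfo, reference_map(info) gives a quick view of which fields the heuristic linked to other collections, which is useful for spotting false positives before the schema reaches the prompt.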