Schema Introspection
Mango samples your collections to infer field types and cross-collection references, and reads their index definitions. The result is injected into the system prompt so the LLM works from real field names and types instead of guessing them.
How it works
When you call agent.setup() with introspect=True, Mango runs introspect_schema() on your database:
- Sample documents: fetches up to 100 documents per collection (a mix of sequential and random documents, the latter via $sample)
- Infer fields: walks every sampled document and records field paths, types, and presence frequency
- Detect indexes: reads index definitions via index_information()
- Detect references: a naming heuristic, e.g. user_id → looks for a users collection, movieId → looks for movies
- Build prompt: the schema is serialized and injected into the system prompt
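The "Infer fields" step can be pictured with a small pure-Python sketch. It is not Mango's implementation: the names `type_name` and `infer_fields`, and every type label other than "string" and "null", are assumptions. It runs on plain dicts like those a MongoDB driver returns, so no database is needed. Each dotted path is counted at most once per document, so frequency is the fraction of sampled documents that contain the path.

```python
from collections import Counter, defaultdict


def type_name(value):
    """Map a decoded value to a coarse type label (labels are assumptions)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


def infer_fields(docs):
    """Return {dotted_path: (sorted type labels, presence frequency)}."""
    seen = Counter()          # path -> number of documents containing it
    types = defaultdict(set)  # path -> set of observed type labels

    def walk(doc, prefix, paths_in_doc):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            paths_in_doc.add(path)
            types[path].add(type_name(value))
            if isinstance(value, dict):
                walk(value, path, paths_in_doc)  # recurse into subdocuments

    for doc in docs:
        paths_in_doc = set()
        walk(doc, "", paths_in_doc)
        seen.update(paths_in_doc)  # +1 per path per document

    total = len(docs) or 1
    return {p: (sorted(types[p]), seen[p] / total) for p in sorted(seen)}
```

Arrays are recorded only as "array" here; a fuller version would also inspect their elements to fill something like FieldInfo.array_element_types.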
Enable introspection
agent = MangoAgent(
...,
introspect=True, # run introspect_schema() on setup()
)
agent.setup()
Pass pre-computed schema
For production deployments you may want to introspect once and reuse the result:
schema = db.introspect_schema() # run once, save/cache the result
agent = MangoAgent(
...,
schema=schema, # pass directly, introspect=False (default)
)
agent.setup()
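One way to introspect once and reuse the result is to cache it on disk. This sketch assumes the value returned by introspect_schema() is picklable (plain dataclasses of built-in types, like SchemaInfo below, are); `load_or_compute_schema` and the cache file name are hypothetical. Only load pickle files you wrote yourself, since unpickling untrusted data can execute code.

```python
import pickle
from pathlib import Path


def load_or_compute_schema(cache_path, compute):
    """Return the cached schema if present; otherwise compute, cache, and return it."""
    path = Path(cache_path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    schema = compute()  # e.g. db.introspect_schema (not called when cached)
    with path.open("wb") as f:
        pickle.dump(schema, f)
    return schema


# Hypothetical usage, assuming `db` as in the snippet above:
# schema = load_or_compute_schema("schema.pkl", db.introspect_schema)
```

Delete the cache file whenever collections or indexes change, so the agent does not keep working from a stale schema.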
SchemaInfo
Each collection produces a SchemaInfo:
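from __future__ import annotations  # lets list[FieldInfo] refer to a class defined later
from dataclasses import dataclass

@dataclass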
class SchemaInfo:
collection_name: str
document_count: int
fields: list[FieldInfo]
indexes: list[dict]
sample_documents: list[dict] # 5 representative docs
Each field produces a FieldInfo:
@dataclass
class FieldInfo:
name: str
path: str # dotted path e.g. "address.city"
types: list[str] # e.g. ["string", "null"]
frequency: float # 0.0–1.0, how often field appears
is_indexed: bool
is_unique: bool
is_reference: bool
reference_collection: str | None
sub_fields: list[FieldInfo] | None # for subdocuments
array_element_types: list[str] | None
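The is_reference and reference_collection fields presumably come from the "Detect references" step. Mango's exact rules are not documented here, so the sketch below is an assumption: it strips an `_id` or `Id` suffix, converts camelCase to snake_case, and tries the stem with and without a naive trailing "s". `guess_reference` is a hypothetical name.

```python
import re


def guess_reference(field_name, collection_names):
    """Return the collection a field likely references, or None (heuristic sketch)."""
    match = re.fullmatch(r"(.+?)(?:_id|Id)", field_name)
    if not match:
        return None  # no reference-style suffix; also rejects the bare "_id"
    stem = match.group(1)
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", stem).lower()  # productLine -> product_line
    # Try plural forms first: user_id -> users, movieId -> movies
    for candidate in (snake + "s", snake, stem + "s", stem):
        if candidate in collection_names:
            return candidate
    return None
```

Irregular plurals such as category_id → categories would need an extra rule or an explicit lookup table.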
Large databases
For databases with 100+ collections, Mango groups them by name pattern in the system prompt to avoid token explosion.
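A sketch of grouping by name pattern, assuming the pattern is the leading token before the first `_`, `.` or `-`. Mango's actual grouping rule and prompt layout are not described here; `group_collections`, `render_groups`, and the `max_listed` parameter are hypothetical.

```python
import re
from collections import defaultdict


def group_collections(names):
    """Group collection names by their leading token (split on '_', '.' or '-')."""
    groups = defaultdict(list)
    for name in sorted(names):
        prefix = re.split(r"[_.\-]", name, maxsplit=1)[0]
        groups[prefix].append(name)
    return dict(groups)


def render_groups(groups, max_listed=3):
    """Emit one prompt line per group instead of one line per collection."""
    lines = []
    for prefix, members in sorted(groups.items()):
        shown = ", ".join(members[:max_listed])
        extra = len(members) - max_listed
        suffix = f" (+{extra} more)" if extra > 0 else ""
        lines.append(f"{prefix} [{len(members)}]: {shown}{suffix}")
    return "\n".join(lines)
```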
For databases with more than 10 collections, Mango also enables schema linking enforcement: before the first run_mql call on each collection, the agent automatically runs describe_collection and prepends its output to the query response. The LLM therefore sees the correct field names and types at query time, without an extra round-trip and without relying on the LLM to remember to inspect the schema first.
For smaller databases (≤ 10 collections), the full schema is already in the system prompt, so enforcement is skipped.
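The enforcement rule can be sketched as a thin wrapper around the two tools. Here run_mql and describe_collection are stood in for by plain callables that return strings; the class name `SchemaLinkedTools`, its constructor parameters, and the blank-line separator are assumptions, not Mango's API.

```python
class SchemaLinkedTools:
    """Prepend a collection's description to the first query on that collection."""

    def __init__(self, run_mql, describe_collection, n_collections, threshold=10):
        self._run_mql = run_mql
        self._describe = describe_collection
        self._enforce = n_collections > threshold  # skipped for small databases
        self._described = set()  # collections whose schema was already shown

    def run_mql(self, collection, query):
        prefix = ""
        if self._enforce and collection not in self._described:
            # Describe first, so the schema precedes the results in one response
            prefix = self._describe(collection) + "\n\n"
            self._described.add(collection)
        return prefix + self._run_mql(collection, query)
```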
Per-query schema selection: rather than injecting all collection schemas upfront, Mango uses keyword matching between the question and the schema to select only the most relevant collections for each query. This reduces token usage significantly on large databases while keeping accuracy high.
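A minimal sketch of keyword-based selection, assuming a bag-of-tokens overlap score and a simplified schema shape (collection name mapped to a list of field paths) in place of SchemaInfo. Mango's real scoring is not specified here; `tokenize`, `select_collections`, and the cutoff `k` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; camelCase and snake_case are split apart."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)  # movieId -> movie Id
    return {t for t in re.split(r"[^a-z0-9]+", spaced.lower()) if t}


def select_collections(question, schemas, k=3):
    """Rank collections by token overlap with the question and keep the top k."""
    q = tokenize(question)
    scores = {}
    for name, fields in schemas.items():
        name_tokens = tokenize(name)
        vocab = set(name_tokens)
        for path in fields:
            vocab |= tokenize(path)
        # Also match the singular of a plural name: "movies" -> "movie"
        vocab |= {t[:-1] for t in name_tokens if t.endswith("s")}
        scores[name] = len(q & vocab)
    ranked = sorted(scores, key=lambda n: (-scores[n], n))  # ties broken by name
    return [n for n in ranked[:k] if scores[n] > 0]
```

Only collections with at least one matching token are kept, so an unrelated question can select nothing; a real system would need a fallback for that case.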
Manual schema inspection
schema = db.introspect_schema()
for collection_name, info in schema.items():
print(f"{collection_name}: {info.document_count} docs, {len(info.fields)} fields")
for field in info.fields:
print(f" {field.path} ({', '.join(field.types)}) freq={field.frequency:.0%}")