# How It Works

Every time you call `agent.ask()`, Mango runs a structured loop that combines memory retrieval, LLM reasoning, and tool execution.
## The agent loop

```
User question
      │
      ▼
1. Retrieve memory ───── Search the vector store for similar past questions
      │                  Inject matches as few-shot examples in the prompt
      ▼
2. Build system prompt ─ Schema context + memory examples + current datetime
      │
      ▼
3. LLM call ──────────── LLM decides which tools to call (or answers directly)
      │
      ▼
4. Tool execution ────── Tools run queries against MongoDB
      │                  Results returned as structured text
      ▼
5. Feed results back ─── Tool results appended to the conversation
      │                  Loop repeats from step 3
      ▼
6. Final answer ──────── LLM produces a text response (no more tool calls)
      │
      ▼
7. Save to memory ────── Successful interactions saved automatically
```
The loop runs until the LLM produces a text response without requesting any tool calls, or until `max_iterations` is reached (a safety cap; default: 8).
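The loop above can be sketched in a few lines of Python. The `call_llm` and `run_tool` stubs below are hypothetical stand-ins for illustration, not Mango's actual internals:

```python
def call_llm(messages):
    """Stub LLM: requests a tool on the first pass, answers on the second."""
    if any(m["role"] == "tool" for m in messages):
        return {"type": "text", "content": "There are 1,247 users."}
    return {"type": "tool_call", "name": "run_mql",
            "args": {"operation": "count", "collection": "users"}}

def run_tool(name, args):
    """Stub tool executor returning a canned structured-text result."""
    return f"{name} returned 1247"

def agent_loop(question, max_iterations=8):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iterations):
        reply = call_llm(messages)                       # step 3: LLM call
        if reply["type"] == "text":                      # step 6: no tool calls
            return reply["content"]
        result = run_tool(reply["name"], reply["args"])  # step 4: tool execution
        messages.append({"role": "tool", "content": result})  # step 5: feed back
    return "Stopped: max_iterations reached"
```

The essential shape is the same as the diagram: the loop only exits when the model stops asking for tools, with `max_iterations` as the backstop.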
## Memory retrieval
Before the first LLM call, Mango searches the vector store for questions semantically similar to the current one. Matches above the similarity threshold are injected into the system prompt as examples:
```
## Relevant past interactions

Q: How many users signed up last month?
Tool: run_mql | Args: {"operation": "count", "collection": "users", ...}
Result: 1,247 users signed up in February 2026.
```
This gives the LLM concrete examples of correct queries for your specific database, dramatically improving accuracy on repeated or similar questions.
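Under the hood this is standard similarity search. A minimal sketch, assuming cosine similarity over embedding vectors — the actual embedding model, threshold value, and store format are configuration details not shown in this doc:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_memories(query_vec, store, threshold=0.8, k=3):
    """store: list of (embedding_vector, formatted_example) pairs.
    Returns up to k examples scoring at or above the threshold,
    best match first."""
    scored = [(cosine(query_vec, vec), example) for vec, example in store]
    hits = [ex for score, ex in sorted(scored, reverse=True) if score >= threshold]
    return hits[:k]
```

Only matches above the threshold make it into the prompt, so unrelated past questions never pollute the context.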
## Schema context

`agent.setup()` introspects your database once and builds a system prompt that includes:
- Collection names and document counts
- Field names, types, and presence frequencies
- Index information
- Detected cross-collection references
The LLM uses this context to generate queries with the correct field names and operators — without guessing.
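The core of this introspection can be approximated by sampling documents and tallying field types and presence frequencies. A simplified sketch (Mango's real introspection also covers indexes and cross-collection references, which are omitted here):

```python
from collections import Counter

def summarize_collection(name, sample_docs):
    """Build a compact schema summary: one line per field with its
    observed type and how often it appears across sampled documents."""
    total = len(sample_docs)
    field_types = {}
    presence = Counter()
    for doc in sample_docs:
        for field, value in doc.items():
            presence[field] += 1
            field_types.setdefault(field, type(value).__name__)
    lines = [f"Collection `{name}` ({total} sampled docs):"]
    for field in sorted(presence):
        pct = 100 * presence[field] // total
        lines.append(f"  {field}: {field_types[field]} ({pct}% present)")
    return "\n".join(lines)
```

Presence frequencies matter for MongoDB specifically: fields are optional per document, so the LLM needs to know which ones it can rely on.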
## Tools available
Mango ships with a set of read-only tools the LLM can call:
| Tool | What it does |
|---|---|
| `list_collections` | List all collections (grouped for large databases) |
| `describe_collection` | Full schema for a specific collection |
| `collection_stats` | Document count and storage size |
| `run_mql` | Execute `find`, `aggregate`, `count`, or `distinct` |
| `search_saved_correct_tool_uses` | Search memory explicitly |
| `save_text_memory` | Save free-form knowledge about the database |
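Tool calls like these typically resolve to handlers through name-based dispatch. A hypothetical sketch, with stub handlers standing in for the real MongoDB queries:

```python
# Hypothetical registry mapping tool names to handler functions.
# The real handlers query MongoDB; these stubs return canned strings.
TOOLS = {
    "list_collections": lambda args: "users, orders, events",
    "collection_stats": lambda args: f"{args['collection']}: 1247 docs",
}

def dispatch(tool_name, args):
    """Look up a tool by name and run it; unknown names get a safe
    error string back to the LLM instead of raising."""
    if tool_name not in TOOLS:
        return f"Unknown tool: {tool_name}"
    return TOOLS[tool_name](args)
```

Returning an error string (rather than raising) lets the loop feed the failure back to the LLM, which can then retry with a valid tool name.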
## Safety
- Read-only by design. `run_mql` only accepts `find`, `aggregate`, `count`, and `distinct`. Any attempt to run write operations is rejected at the tool level with a `ValidationError`.
- Allowlist, not blocklist. The permitted operations are explicitly allowlisted, not everything-except-writes.
- No raw query passthrough. The LLM cannot execute arbitrary strings; all queries go through the `QueryRequest` dataclass.
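The allowlist pattern is straightforward to sketch. The field layout of `QueryRequest` below is an assumption for illustration; the key point is that validation rejects anything outside the explicit allowlist, so write operations can never reach the database:

```python
from dataclasses import dataclass
from typing import Optional

# Explicit allowlist: only these operations are ever permitted.
ALLOWED_OPERATIONS = frozenset({"find", "aggregate", "count", "distinct"})

class ValidationError(Exception):
    pass

@dataclass
class QueryRequest:
    operation: str
    collection: str
    filter: Optional[dict] = None  # illustrative field, not Mango's exact shape

    def __post_init__(self):
        # Allowlist check: anything not explicitly permitted is rejected,
        # so insert/update/delete (and typos) fail before touching MongoDB.
        if self.operation not in ALLOWED_OPERATIONS:
            raise ValidationError(f"operation {self.operation!r} is not allowed")
```

Because validation lives in the dataclass itself, every code path that builds a request — not just the tool entry point — gets the same guarantee.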
## AgentResponse

`ask()` returns an `AgentResponse` with everything you need:
```python
@dataclass
class AgentResponse:
    answer: str                 # the natural language answer
    tool_calls_made: list[str]  # which tools were called
    input_tokens: int           # total input tokens used
    output_tokens: int          # total output tokens used
    iterations: int             # number of LLM calls in this turn
    memory_hits: int            # how many memory examples were injected
```
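A quick illustration of working with the returned object, e.g. for cost tracking. The dataclass is redefined here so the snippet is self-contained, and the field values are made up:

```python
from dataclasses import dataclass

@dataclass
class AgentResponse:
    answer: str
    tool_calls_made: list
    input_tokens: int
    output_tokens: int
    iterations: int
    memory_hits: int

# Illustrative values only — a real response comes from agent.ask().
resp = AgentResponse(
    answer="1,247 users signed up in February 2026.",
    tool_calls_made=["run_mql"],
    input_tokens=2150,
    output_tokens=64,
    iterations=2,
    memory_hits=1,
)

total_tokens = resp.input_tokens + resp.output_tokens
print(f"{resp.answer} (tools: {', '.join(resp.tool_calls_made)}, "
      f"tokens: {total_tokens})")
```

Tracking `input_tokens` and `output_tokens` per turn is the natural hook for per-question cost accounting.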