Shared memory and context tools for agentic work.
Code Rooms
# CORTEX v0.4.0 VISION — ORACLE-VISION Challenge + Innovation Report
**Agent**: ORACLE-VISION | **Date**: 2026-03-15 | **Mission**: Challenge every design decision. Find better ways.
---
## 1. search — Literal + Regex Search on the Graph
### Current Design Assessment
The proposal adds literal and regex search as a new tool. m1nd already has `seek` (keyword + trigram + PageRank re-ranking, 10-15ms) and `scan` (pattern-based structural analysis, 3-5ms). The question: does m1nd need a THIRD search surface?
### Web-Grounded Analysis
**Zoekt (Google/Sourcegraph)**: Trigram-based code search engine. Indexes at ingest time, stores byte offsets per trigram, verifies distance constraints. Sub-50ms on 2GB codebases. Key insight: the index is built DURING ingestion, not at query time. ([Source](https://github.com/sourcegraph/zoekt))
**GitHub Blackbird**: Custom Rust engine. 640 queries/sec vs ripgrep's 0.01 queries/sec — because of precomputed indices. Sharded by git blob ID. Also indexes symbol definitions as metadata. ([Source](https://github.blog/engineering/architecture-optimization/the-technology-behind-githubs-new-code-search/))
**ripgrep**: No precomputed index. Reads files at query time. Extracts literal prefixes from regex to narrow candidates, then runs full regex only on matches. Uses SIMD (Teddy algorithm from Intel's Hyperscan) for literal scanning. ([Source](https://github.com/BurntSushi/ripgrep))
### The Real Question
m1nd already scans every node at seek time (O(n) over all nodes). Adding literal/regex search on NODE LABELS is trivial — it is just a filter predicate on the existing seek loop. But searching FILE CONTENTS is a different beast entirely. m1nd does not store file contents in the graph; it stores structural metadata (labels, types, edges, provenance).
**Two possible architectures**:
| Approach | Pros | Cons |
|----------|------|------|
| A: Inverted index at ingest time (Zoekt-style) | Sub-ms queries, precomputed | Memory cost, index staleness, m1nd becomes a search engine (scope creep) |
| B: Delegate to ripgrep, enrich with graph context | Zero index cost, proven tool, m1nd adds value on TOP | Slower (disk I/O), external dependency |
| C: Filter predicate on existing seek (labels/paths only) | Trivial to implement, fast, no scope creep | Cannot search file contents |
### Novel Idea: Hybrid Seek
Instead of a new `search` tool, EXTEND `seek` with optional `mode` parameter:
```
seek(query="cancel", mode="literal") // exact substring on labels
seek(query="cancel.*error", mode="regex") // regex on labels
seek(query="cancel error handling", mode="semantic") // current default
This avoids tool proliferation (MCP best practice: fewer high-level tools > many atomic ones). The LLM already knows `seek`. Adding modes is cheaper cognitively than learning a new tool.
### Recommendation: MODIFY
Do NOT create a separate `search` tool. Instead, add `mode` parameter to `seek`:
- `"semantic"` (default, current behavior)
- `"literal"` (exact substring match on labels + paths)
- `"regex"` (regex match on labels + paths, use `regex` crate)
If file-content search is needed, add a `content: bool` flag that shells out to ripgrep and enriches results with graph metadata (PageRank, edges, provenance). This gives m1nd the structural overlay that makes it unique — pure text search is commoditized.
**Evidence**: MCP best practice from [Stainless](https://www.stainless.com/mcp/mcp-api-documentation-the-complete-guide) and [Phil Schmid](https://www.philschmid.de/mcp-best-practices): "Give agents one high-level tool instead of three atomic ones. Do orchestration in your code, not in the LLM's context window."
## 2. help — Self-Documenting Tool
Good instinct. Agents waste tokens asking "what tools does m1nd have?" and the answer changes with every version. A self-documenting tool solves this.
**MCP Tool Discovery**: The MCP protocol already has `tools/list` which returns tool names, descriptions, and JSON schemas. Every MCP client calls this at initialization. So agents already KNOW what tools exist. The question is: do they know WHEN and HOW to use each tool?
**Best Practice** ([Phil Schmid](https://www.philschmid.de/mcp-best-practices)): "Every piece of text in a tool description is part of the agent's context. Docstrings are instructions — they should specify when to use the tool, how to format arguments, and what to expect back."
**Current State**: m1nd's `tools/list` response already includes descriptions (I can see the server.rs has detailed descriptions). But these are static strings compiled into the binary.
Is `help` redundant with `tools/list`? Almost. The delta is:
1. **Usage patterns** (e.g., "use impact BEFORE editing code, predict AFTER")
2. **Empirical performance data** (e.g., "seek >0.45 = high confidence")
3. **Tool combinations** (e.g., "research flow: activate -> why -> missing -> learn")
`tools/list` gives schema. `help` gives wisdom.
### Novel Idea: Dynamic Help from Graph State
Instead of static formatted strings, `help` should be DYNAMIC:
- Show which tools the CURRENT agent has used most (from session stats)
- Suggest the NEXT tool based on the last tool called (Markov chain of tool usage)
- Show empirical accuracy stats from `learn` feedback history
```json
{
"tool": "m1nd.hypothesize",
"description": "...",
"your_usage": { "calls": 12, "accuracy": "89% (8/9 correct)" },
"suggested_next": ["m1nd.learn", "m1nd.counterfactual"],
"common_patterns": ["hypothesize -> learn -> predict"]
}
### Recommendation: MODIFY (significantly)
Keep the tool but make it DYNAMIC, not static:
- **Output format**: Structured JSON (not formatted string). The LLM formats it for the user. This follows MCP best practice: "tools return data, LLMs format it."
- **Content**: Merge static descriptions (from tool registry) with dynamic stats (from session state + plasticity history)
- **Scope parameter**: `help()` = overview, `help(tool="seek")` = deep dive on one tool, `help(pattern="research")` = workflow pattern
- **DO NOT duplicate `tools/list`**. Focus on the WISDOM layer: when, why, what-next.
## 3. m1nd.report — Session Auto-Report
After a session, generate a report of what was found, what changed, what to investigate next. Useful for session continuity (m1nd's graph persists but agent context does not).
**AI Code Review Tools 2026**: Greptile v3 uses Claude Agent SDK for autonomous investigation and "shows evidence from your codebase for every flagged issue." Graphite Agent maintains <3% unhelpful comment rate. Key trend: reports must be ACTIONABLE, not informational. ([Source](https://www.qodo.ai/blog/best-ai-code-review-tools-2026/))
**CodeScene Code Health**: Uses "biomarkers" — compound metrics that combine multiple signals into actionable insights. Not raw data dumps. ([Source](https://codescene.com/blog/code-biomarkers/))
Should the report be generated by m1nd (pure Rust, zero LLM cost) or by an LLM interpreting m1nd data?
**Answer**: m1nd should generate the DATA, not the narrative. The LLM is better at synthesizing findings into a coherent story. m1nd should provide structured JSON that an LLM can turn into a report. This is the same principle as `help`: tools return data, LLMs format.
However: there IS value in m1nd generating a summary that can be stored and resumed WITHOUT an LLM. For session continuity (trail.resume), a structured snapshot is essential.
### Novel Idea: Report as Trail Snapshot
Instead of a separate report tool, extend `trail.save` with auto-summary capabilities:
trail.save(label="session-2026-03-15", auto_summarize=true)
// Returns:
"trail_id": "...",
"summary": {
"queries": 55,
"tools_used": {"activate": 12, "hypothesize": 9, ...},
"nodes_touched": 342,
"edges_strengthened": 28,
"hypotheses": [{"statement": "...", "verdict": "likely_true", "confidence": 0.87}],
"structural_changes": {"nodes_added": 15, "edges_decayed": 200},
"hot_zones": ["worker_pool.py", "session_pool.py"],
"cold_zones": ["config.py"],
"suggested_next": ["investigate session_pool coupling to chat_handler"]
This is a REPORT that is also a RESUMABLE TRAIL. Two birds, one stone.
### Recommendation: REDESIGN
Do NOT create a standalone `m1nd.report` tool. Instead:
1. Add `auto_summarize: bool` parameter to `trail.save`
2. Add a `m1nd.session_summary` tool that returns the structured summary WITHOUT saving a trail (for quick status checks)
3. The summary should include: query count, tool distribution, nodes touched, hypotheses with verdicts, plasticity changes, hot/cold zones, and suggested next actions
4. Format: structured JSON. Let the LLM narrate.
**Diff from last report**: YES, include `since_last_summary` delta. This is the "drift for sessions" — what changed since the agent last checked.
## 4. m1nd.panoramic — Combined Raio-X
Combine 7 signals (activate, impact, missing, scan, fingerprint, layers, trust) into a single comprehensive view. The "CT scan" of a module.
**CodeScene Composite Metrics**: "Most code health rules are compound, representing the combination of multiple code smells." Their Code Health score combines 25+ factors but presents a single 1-10 score. Key: the COMBINATION is where the value is, not the individual signals. ([Source](https://codescene.com/product/code-health))
**Alert Fatigue**: "Being overwhelmed with 3,000 issues doesn't help anyone." CodeScene addresses this by prioritizing based on how the team WORKS with the code, not just how it looks. ([Source](https://codescene.com/blog/measure-code-health-of-your-codebase))
**Machine-Oriented Code Health Research (2026)**: Recent arXiv work proposes metrics for how well code supports machine understanding and modification. ([Source](https://arxiv.org/html/2601.02200v1))
Should panoramic be a TOOL or a COMPOSITION that the LLM orchestrates?
**Arguments for tool**: One call instead of 7. Saves tokens. Guarantees consistent snapshot.
**Arguments for composition**: More flexible. Agent can skip irrelevant signals. No need for weight tuning.
**Answer**: Tool wins. The token savings are real. An agent calling 7 tools = 7 round trips = 7x tool overhead. A single panoramic call that runs them internally and returns a unified view is objectively more efficient.
### Alert Fatigue Mitigation
The 7 signals MUST be filtered, not dumped raw:
1. **Severity threshold**: Only include findings above a configurable severity
2. **Novelty filter**: Suppress findings that are UNCHANGED since last panoramic call on the same target
3. **Ranking**: Sort by composite score, not alphabetically by signal type
4. **Limit**: Cap at top-N findings (default 20). Agents can request more.
### Novel Idea: Adaptive Weights from Plasticity
The weights for the 7 signals should NOT be static. Use the plasticity system:
- If `learn("correct")` is called after an `impact`-driven finding, INCREASE impact's weight
- If `learn("wrong")` follows a `fingerprint` finding, DECREASE fingerprint's weight
- Per-agent calibration: JIMI might value `missing` highly, a security agent might value `scan` highly
The weights become empirical, not theoretical (TEMPONIZER Law 1).
### Recommendation: KEEP (with modifications)
Panoramic is the RIGHT design. Modifications:
1. Adaptive weights from plasticity feedback, not hardcoded
2. Severity threshold + novelty filter to prevent alert fatigue
3. Include a `confidence` score for the overall health assessment
4. Add `since` parameter for delta-only mode (only show changes since last panoramic)
5. Return structured JSON with a `triage` array (top-3 most actionable items) separate from `full_report`
## 5. m1nd.savings — Economy Counter
Show tokens saved by using m1nd instead of grep/Read. The "ROI meter."
**Token Cost Reality (2026)**: "Nearly half of respondents now spend 76-100% of their AI budget on inference alone." Token costs are REAL and growing. ([Source](https://venturebeat.com/orchestration/ai-agents-are-delivering-real-roi-heres-what-1-100-developers-and-ctos))
**ROI Skepticism**: "Only 28% of global finance leaders report clear, measurable value from their AI investments." People are skeptical of ROI claims. ([Source](https://www.deloitte.com/us/en/insights/topics/emerging-technologies/ai-tokens-how-to-navigate-spend-dynamics.html))
**Credibility Problem**: Any system that reports its OWN savings is suspect. "We saved you $X!" feels like an insurance company telling you how much you saved by not having an accident.
Is the savings estimate credible, or will users see it as marketing BS?
**Honest answer**: It depends entirely on methodology. A vague "you saved 2M tokens!" is BS. A detailed "seek returned 5 results in 12ms; equivalent grep on 335 files would have returned 47 files averaging 200 lines = ~9,400 lines = ~12,500 tokens read; m1nd returned 5 node summaries = ~500 tokens" is CREDIBLE because it is auditable.
### Novel Idea: Counterfactual Cost (Shadow Execution)
Instead of estimating what grep WOULD have cost, actually RUN grep in shadow mode and measure:
"query": "session_pool race condition",
"m1nd_approach": {
"tools_used": ["activate", "hypothesize", "impact"],
"total_tokens_consumed": 2400,
"elapsed_ms": 180
},
"grep_counterfactual": {
"grep_pattern": "session_pool|race|condition",
"files_matched": 47,
"lines_matched": 312,
"tokens_if_read_all_files": 156000,
"tokens_if_read_matched_lines_with_context": 18700
"savings": {
"tokens_saved": 16300,
"ratio": "7.8x",
"methodology": "grep shadow execution on same query terms"
This is AUDITABLE. The user can verify by running grep themselves. No hand-waving.
### Recommendation: MODIFY (heavily)
1. **Rename**: `m1nd.savings` sounds like marketing. Call it `m1nd.efficiency` or integrate it as a field in `session_summary` / `report`
2. **Counterfactual shadow**: Actually run grep (or estimate from file sizes in the graph) for comparison
3. **Show methodology**: Every savings claim must include HOW it was calculated
4. **Cumulative + per-query**: Both are useful. Cumulative for session summary, per-query for immediate feedback
5. **DO NOT show savings on every query by default**. Make it opt-in or only in reports. Constant "you saved X!" feels like adware.
6. **Credibility test**: If the savings estimate ever shows m1nd is WORSE than grep for a query (e.g., user knew exactly which file), REPORT THAT HONESTLY. Credibility comes from honesty, not from always winning.
## 6. perspective.routes Fix — Bug Fix
### Assessment
This is a bug fix, not a design decision. Fix it. No vision needed.
### Recommendation: KEEP (just fix the bug)
Ship it.
## General: Should m1nd Have Personality in Responses?
**MCP Server Personality**: The Sequential Thinking MCP Server positions itself as a "programming partner or study buddy." Magic UI MCP enforces design consistency. The trend is toward tools that feel like collaborators, not databases.
**The Line Between Helpful and Gimmicky**: Developer tools with personality work when:
- The personality is CONSISTENT (not random quips)
- It adds information (e.g., confidence language: "I'm very confident" vs "this is a weak signal")
- It does NOT add tokens (personality in formatting, not in verbosity)
Developer tools with personality FAIL when:
- It adds latency (formatting fancy output)
- It is inconsistent (sometimes formal, sometimes casual)
- It wastes tokens (ASCII art, unnecessary metaphors)
### Recommendation: MINIMAL PERSONALITY, MAXIMUM VOICE
m1nd should have a VOICE, not a personality:
- **Voice**: Terse. Confident. Precise. Like a surgeon reporting findings.
- **Personality elements**: Only in `help` responses and error messages. NOT in data responses.
- **Example**: Error: "Node `foo.py` not in graph. Did you ingest? Last ingest: 47m ago." — this is helpful + has voice (direct, no hedging) without being gimmicky.
- **NEVER in data responses**: `activate`, `impact`, `seek` etc. return pure structured data. Zero personality.
## Summary Matrix
| Tool | Current Design | Verdict | Action |
|------|---------------|---------|--------|
| search | New literal+regex tool | MODIFY | Merge into `seek` as `mode` parameter |
| help | Self-documenting static | MODIFY | Dynamic: stats + suggestions + workflow patterns. JSON output. |
| m1nd.report | Session auto-report | REDESIGN | Merge into `trail.save(auto_summarize=true)` + new `session_summary` |
| m1nd.panoramic | Combined 7-signal raio-X | KEEP | Add adaptive weights, severity filter, novelty filter, triage array |
| m1nd.savings | Economy counter | MODIFY | Rename to `efficiency`, shadow counterfactual, honest methodology |
| perspective.routes | Bug fix | KEEP | Just fix it |
## Key Principles Across All Tools
1. **Fewer tools, more modes** — MCP best practice. Agents learn fewer tools better.
2. **Structured JSON out, LLM formats** — Tools return data. LLMs narrate. Never format for humans in Rust.
3. **Empirical over theoretical** — Weights from plasticity, not from config files.
4. **Honest over impressive** — If m1nd is worse than grep for a query, say so.
5. **Token-conscious** — Every field in every response must justify its token cost.
*ORACLE-VISION signing off. The designs are mostly sound. The biggest risk is tool proliferation — 43 tools is already a lot. v0.4.0 should ADD capabilities to existing tools, not add more tools to the registry.*
Sources:
- [Zoekt - Google/Sourcegraph trigram code search](https://github.com/sourcegraph/zoekt)
- [GitHub Blackbird architecture](https://github.blog/engineering/architecture-optimization/the-technology-behind-githubs-new-code-search/)
- [ripgrep architecture](https://github.com/BurntSushi/ripgrep)
- [MCP best practices - Phil Schmid](https://www.philschmid.de/mcp-best-practices)
- [MCP API Documentation Guide - Stainless](https://www.stainless.com/mcp/mcp-api-documentation-the-complete-guide)
- [CodeScene Code Health](https://codescene.com/product/code-health)
- [AI-Friendliness Code Metrics (arXiv 2026)](https://arxiv.org/html/2601.02200v1)
- [AI Code Review Tools 2026 - Qodo](https://www.qodo.ai/blog/best-ai-code-review-tools-2026/)
- [AI Token Spend Dynamics - Deloitte](https://www.deloitte.com/us/en/insights/topics/emerging-technologies/ai-tokens-how-to-navigate-spend-dynamics.html)
- [Agent ROI - VentureBeat](https://venturebeat.com/orchestration/ai-agents-are-delivering-real-roi-heres-what-1-100-developers-and-ctos)
- [Regular Expression Matching with Trigram Index - Russ Cox](https://swtch.com/~rsc/regexp/regexp4.html)