Right now, files are everywhere in the agent ecosystem. Markdown files are the default for everything — skills, custom instructions, agent descriptions, commands, memory. The entire vibe-coding movement runs on the assumption that a folder of .md files is all you need.

And I get the appeal. LLMs are trained on repos, docs, logs, and README-driven workflows. They already know how to list directories, grep for patterns, read line ranges, and write artifacts. The filesystem is the most natural interface we can give an agent. No schemas, no migrations, no query planners. Just text.

But here’s the thing — I’ve walked this path before. And I’ve seen how it ends.


We’ve Been Here Before

This pattern is not new. Early computing ran on flat text files until that stopped working. Then people started inventing layers on top of the filesystem: indexing, querying, structured retrieval. First came inverted-list systems like Adabas. Then hierarchical databases, navigated from COBOL programs over tree structures. Then relational databases and SQL.

Every time, the industry starts from files, and every time, it rediscovers that files don’t scale past a certain complexity threshold.

The knowledge management world learned this lesson recently. Logseq started as a heavily markdown-based system, just like Obsidian. But at some point, the Logseq team realized they would be much more effective and compact if they converted to a database backend. Obsidian tried a different route — keeping the markdown files but layering query plugins on top — and the result is functional but far from elegant.

From my own observation, every time I tried to build a knowledge management system on top of Obsidian or Logseq, it worked beautifully at first. But once I needed sophisticated queries across a large, linked corpus, the whole thing turned into an unnavigable mess. I spent years going back and forth between these two systems before I accepted that the center of any serious “second brain” is always a database.

Agents are walking the same path right now.

Where Files Work (And Where They Don’t)

Let me be fair to files. For small codebases, toy setups, single-user agents, and keyword-friendly queries, files are great. A folder of markdown, a handful of skill files, some commands — more than enough. The filesystem gives you simplicity, portability, debuggability. You can open a folder and see exactly what the agent saved.

But production agentic systems are not toy setups. The moment you scale up — multi-agent architectures, shared state, growing memory corpora, concurrent access — files hit five hard limitations.

Concurrent writes. Anyone who has dealt with concurrent file access in any language knows it’s tedious. File locking is fragile, platform-dependent, and breaks in surprising ways on network filesystems. Databases solved concurrent access decades ago with MVCC and write-ahead logs. Your agents don’t need to worry about who’s competing for the resource.

Semantic retrieval at scale. When your corpus is small, grep works fine. But as it grows — thousands of memory entries, paraphrased queries, synonym-heavy questions — keyword search breaks down hard. You need vector indexes, hybrid search, ranked retrieval. Files give you grep. Databases give you HNSW indexes, full-text search, and query planners that actually optimize.

ACID guarantees for shared state. In any serious multi-agent system, agents share state. You need atomicity, consistency, isolation, durability. Files provide none of this by default. You have to build it yourself — and building it correctly is a massive engineering effort.

Audit trails and access control. Who wrote what, when, and who can read it? Row-level security, transaction logs, role-based access — databases have this built in. With files, you’re reimplementing permissions from scratch.

Indexed queries over growing memory. Agent memory is not static. It grows. And as it grows, queries that scan the entire file tree degrade linearly. You need B-trees, inverted indexes, the kind of query infrastructure databases have spent forty years optimizing.
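The lost-write failure mode is easy to reproduce in miniature. The sketch below (illustrative code, not taken from any benchmark) has four threads increment a shared counter 250 times each, once via naive read-modify-write on a text file and once via SQLite transactions. The file paths, table, and helper names are invented for the demo.

```python
import os
import sqlite3
import tempfile
import threading

def file_increment(path, n):
    # Read-modify-write with no locking: two writers can read the same
    # value, and one increment silently disappears. A reader can even
    # catch the file mid-truncate and see garbage.
    for _ in range(n):
        with open(path) as f:
            raw = f.read()
        try:
            value = int(raw)
        except ValueError:   # raced with a truncating writer
            continue
        with open(path, "w") as f:
            f.write(str(value + 1))

def db_increment(db_path, n):
    # SQLite serializes writers; each UPDATE is an atomic transaction,
    # so no increment can be lost. timeout=30 retries on lock contention.
    conn = sqlite3.connect(db_path, timeout=30)
    for _ in range(n):
        with conn:  # commits (or rolls back) the implicit transaction
            conn.execute(
                "UPDATE memory SET value = value + 1 WHERE key = 'counter'")
    conn.close()

def run(worker, target, workers=4, per_worker=250):
    threads = [threading.Thread(target=worker, args=(target, per_worker))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

tmp = tempfile.mkdtemp()
txt, db = os.path.join(tmp, "counter.txt"), os.path.join(tmp, "memory.db")

with open(txt, "w") as f:
    f.write("0")
run(file_increment, txt)
with open(txt) as f:
    file_total = int(f.read())   # often less than 1000: silent data loss

conn = sqlite3.connect(db)
conn.execute("CREATE TABLE memory (key TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO memory VALUES ('counter', 0)")
conn.commit()
conn.close()
run(db_increment, db)
db_total = sqlite3.connect(db).execute(
    "SELECT value FROM memory WHERE key = 'counter'").fetchone()[0]
print(file_total, db_total)   # the database total is always exactly 1000
```

The database side needs no extra code to be correct; the file side would need locking, retries, and crash recovery to match it.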

The Benchmarks Don’t Lie

Richmond Alake from Oracle Developers recently published a detailed benchmark comparing a filesystem-backed agent (FSAgent) with a database-backed agent (MemAgent). The setup was straightforward: both agents used the same LLM, the same task (a research assistant that ingests arXiv papers, answers questions, maintains continuity across sessions), and the same evaluation methodology. The only difference was the memory substrate — files versus Oracle AI Database with vector search.

The findings were clear across three scenarios.

Small corpus, keyword-friendly queries: filesystem and database performed nearly identically. When the information to traverse is minimal and the query matches the source phrasing, retrieval quality converges. Both agents found the right passages and produced comparable answers.

Large corpus, fuzzy queries: the gap widened dramatically. The filesystem agent scored 29.7% on an LLM-judge evaluation, while the database-backed agent hit 87.1%. Keyword search returns too many shallow hits or none when the user’s phrasing doesn’t match the source text verbatim. Semantic search surfaces conceptually relevant chunks even when the vocabulary differs.

Concurrent writes: this is where the filesystem breaks hardest. Without locking, concurrent writes silently lose entries — you get good throughput but corrupted memory. With locking, integrity is restored, but now you own all the complexity of lock scope, contention, platform differences, and failure recovery. The database maintained full integrity under the same workload with standard ACID transactions.

The full benchmark notebook is available on the Oracle AI Developer Hub GitHub. Run it yourself.
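The fuzzy-query gap is easy to see in miniature. In the toy sketch below, a paraphrased query shares no content words with the stored memory, so keyword matching returns nothing, while cosine similarity over embeddings still ranks the right entry first. The two-dimensional hand-made word vectors stand in for a real embedding model, and the documents and query are invented for the demo.

```python
import math

docs = {
    "mem1": "the model forgets context between sessions",
    "mem2": "vector indexes speed up nearest neighbor lookup",
}
query = "assistant loses memory across conversations"

# Tiny hand-assigned "embeddings": dimension 0 ~ memory loss,
# dimension 1 ~ retrieval speed. A real system would use a model.
word_vec = {
    "assistant": [0.2, 0.0], "loses": [0.9, 0.0], "memory": [0.8, 0.1],
    "across": [0.1, 0.0], "conversations": [0.4, 0.0],
    "model": [0.2, 0.1], "forgets": [0.9, 0.0], "context": [0.7, 0.1],
    "between": [0.1, 0.0], "sessions": [0.4, 0.0],
    "vector": [0.0, 0.8], "indexes": [0.0, 0.9], "speed": [0.0, 0.6],
    "up": [0.0, 0.1], "nearest": [0.0, 0.7], "neighbor": [0.0, 0.7],
    "lookup": [0.1, 0.8],
}

def keyword_hits(query, docs):
    # Literal word overlap: fails when phrasing differs from the source.
    q = set(query.split())
    return [k for k, text in docs.items() if q & set(text.split())]

def embed(text):
    # Mean of word vectors; unknown words contribute nothing.
    vs = [word_vec.get(w, [0.0, 0.0]) for w in text.split()]
    return [sum(col) / len(vs) for col in zip(*vs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_hits(query, docs):
    # Rank by similarity in concept space instead of literal overlap.
    qv = embed(query)
    return max(docs, key=lambda k: cosine(qv, embed(docs[k])))

print(keyword_hits(query, docs))   # [] — no shared vocabulary at all
print(semantic_hits(query, docs))  # 'mem1' — conceptually closest entry
```

The same paraphrase that defeats grep scores highly under cosine similarity, which is the whole mechanism behind the benchmark's 29.7% versus 87.1% gap.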

You’re Already Rebuilding a Database

Here’s the part that makes me smile. I’ve watched several teams spend weeks, sometimes months, “hardening” their filesystem-based memory implementations. They add file locking for concurrency. They build custom indexing for search. They layer on journaling for durability. They implement access control lists. They add metadata schemas for structured queries.

And I look at this and think — congratulations. You just rebuilt a database. Only with fewer guarantees and more edge cases to own.

This is not a new observation. Anders Swanson from Oracle put it well: building a stateful app using only a filesystem runs the risk of reinventing the database, and that is a large and complex task that should be avoided. Dax from OpenCode called the filesystem “just the worst kind of database.” The pattern is consistent across the industry.

The Interface vs. Substrate Distinction

The key insight is that interface and substrate are different decisions. Filesystems win as an interface — LLMs already know how to use them. Databases win as a substrate — concurrency, auditability, semantic search, ACID guarantees.

The smartest architectures decouple the two. LangSmith’s agent builder, for example, stores data in a database but exposes it to the agent as a filesystem. The agent interacts with files because that’s what it knows. The system stores everything in a database because that’s what scales. This “virtual filesystem” pattern is likely where the industry converges.
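One way to picture the virtual-filesystem pattern: file-style read/write/list operations the agent already understands, backed by a database table. This is a minimal sketch with an invented schema, not LangSmith's actual implementation.

```python
import sqlite3

class VirtualFS:
    """File-like interface for the agent; database substrate underneath."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " path TEXT PRIMARY KEY,"
            " body TEXT NOT NULL,"
            " updated_at TEXT DEFAULT CURRENT_TIMESTAMP)")

    def write(self, path, body):
        # Upsert inside a transaction: concurrent writers can't corrupt state,
        # and every change carries a timestamp for free.
        with self.conn:
            self.conn.execute(
                "INSERT INTO files (path, body) VALUES (?, ?) "
                "ON CONFLICT(path) DO UPDATE SET body = excluded.body, "
                "updated_at = CURRENT_TIMESTAMP", (path, body))

    def read(self, path):
        row = self.conn.execute(
            "SELECT body FROM files WHERE path = ?", (path,)).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return row[0]

    def ls(self, prefix=""):
        # "Directories" are just path prefixes; the index does the work.
        rows = self.conn.execute(
            "SELECT path FROM files WHERE path LIKE ? ORDER BY path",
            (prefix + "%",)).fetchall()
        return [r[0] for r in rows]

fs = VirtualFS()
fs.write("memory/projects/ladybug.md", "# Notes on LadybugDB")
fs.write("memory/daily/2024-01-01.md", "Met with the team.")
print(fs.ls("memory/"))
print(fs.read("memory/projects/ladybug.md"))
```

The agent sees paths and markdown; the system gets transactions, timestamps, and a query planner. Adding a vector column to the same table would give semantic search over the same "files."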

My own work with LadybugDB follows a similar philosophy — a graph database with vector search as the memory substrate, but with interfaces that agents can navigate naturally. The combination of graph structure (for relationships and trust chains) and vector search (for semantic retrieval) gives you a hybrid retrieval architecture that neither files nor a simple key-value store can match.
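As a rough sketch of what hybrid retrieval buys you: seed with vector similarity, then expand across graph edges to pull in related nodes the embedding alone would miss. This is not LadybugDB's API; the node names, links, and two-dimensional vectors are invented for illustration.

```python
import math

# Toy memory graph: each node has a stand-in embedding and some text.
nodes = {
    "alice":   {"vec": [0.9, 0.1], "text": "Alice leads the memory project"},
    "memory":  {"vec": [0.8, 0.2], "text": "Design notes for agent memory"},
    "billing": {"vec": [0.1, 0.9], "text": "Q3 billing pipeline overview"},
}
edges = {"alice": ["memory"], "memory": ["alice"], "billing": []}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_retrieve(query_vec, hops=1):
    # 1. Vector search picks the semantically closest seed node.
    seed = max(nodes, key=lambda n: cosine(query_vec, nodes[n]["vec"]))
    # 2. Graph expansion follows edges to pull in related context
    #    (relationships, trust chains) that similarity alone would miss.
    result, frontier = {seed}, [seed]
    for _ in range(hops):
        frontier = [m for n in frontier for m in edges[n] if m not in result]
        result.update(frontier)
    return sorted(result)

print(hybrid_retrieve([0.85, 0.15]))
```

A query near the "memory" concept pulls in both the notes and the person linked to them, while the unrelated billing node stays out of context.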

What About Codebases?

There’s an adjacent topic worth mentioning. Several projects I’ve been following try to treat the codebase itself as a database — parsing the AST, making the code globally queryable, giving agents structured access to architecture and dependencies rather than just file contents.

This is exactly what people do with Smalltalk and the Glamorous Toolkit in the moldable programming community. If you can turn your codebase together with your documentation into a queryable structure and hand it to an agent, you get much better understanding — not only for the agent, but for yourself.

The granularity matters. With a database, you get precise context, proper attention management, and traceability. With a flat folder of files, you get scanning and hoping. But codebases-as-databases for agents is a topic for a separate deep dive.

Practical Guidance

Here’s my recommendation, distilled from both my own experience and the benchmark evidence.

Use files when you’re building prototypes, single-user agents, or small-scale tools where iteration speed matters most. A folder of markdown gets you surprisingly far when the corpus is small and the queries are keyword-friendly.

Use databases when you’re building multi-user, multi-agent production applications. Any system where agents share state, where memory grows continuously, where you need semantic retrieval, or where concurrent access is a reality.

Don’t start with polyglot persistence. Running a separate vector database for embeddings, a NoSQL store for JSON, a graph database for relationships, and a relational database for transactions gives you four failure modes, four security models, and four backup strategies. Converged databases that handle multiple data types natively — whether that’s Oracle AI Database, or for embedded use cases, something like SQLite or LadybugDB — reduce operational complexity dramatically.

Start simple. Even a single SQLite file is a massive upgrade over raw filesystem memory. SQLite is fully portable, it lives in a single file, and you can glue together multiple databases for different concerns. From there, you can scale up to a full graph database with vector search when the complexity warrants it.
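The "multiple databases glued together" idea can be sketched with SQLite's ATTACH: separate stores per concern, but one connection and one query surface. The schema and contents below are invented for the demo.

```python
import sqlite3

# Main database holds agent memory; a second database holds skills.
# Both are in-memory here, but each could be its own .db file.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS skills")

conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE skills.skill (name TEXT PRIMARY KEY, doc TEXT)")

conn.execute("INSERT INTO notes (body) VALUES ('Met the Oracle Devs team')")
conn.execute(
    "INSERT INTO skills.skill VALUES ('summarize', 'Condense long docs')")
conn.commit()

# One connection, one transaction scope, queries spanning both stores.
row = conn.execute(
    "SELECT n.body, s.doc FROM notes n, skills.skill s").fetchone()
print(row)
```

Each concern stays a separate, portable file, yet the agent queries them through a single surface instead of grepping two directory trees.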

What’s Next

I’m flying next week to meet with the Oracle Devs team. We’re going to talk about agent harnesses, memory architectures, and how Oracle fits into the agentic memory space. I’ll report back with what I learn.

Databases are not dead. Files are simple and a natural interface for LLMs, but once the information gets complex, they stop being the simple solution. Use the tools that decades of engineering have already built for you, because if you don't, you'll end up rebuilding them anyway.