From artificial general intelligence to artificial collective intelligence: Designing the org for the AI era
Why the next breakthroughs in AI implementation will come from organisation design, not computer science
Most people are using AI badly
“AI agents perform better with business goals.”
A line from a book I picked up at an event recently, “Principles of Building AI Agents” by Sam Bhagwat, CEO of Mastra. I flicked through it and that line jumped out. “Of course they do”, I thought. Another data point for something I’ve been thinking a lot about: AI works best when paired with management thinking.
I’ve been going deep with Claude Code and realising that, despite the lack of meaningful adoption across most organisations, the technology is ready. The gap is no longer the models getting better. It’s not a question of greater compute power or greater context. It’s a question of process, governance and implementation.
Most people use AI the way they’ve always used computers. Input goes in, output comes out, and they expect it to be reliable. But AI isn’t that kind of tool. It’s best used as a collaborative process, not a one-shot prompt.
When you prepare anything that matters you don’t sit down and directly produce the final version. You draft, you redraft, you go back and do additional research, you restructure. The preparation phase is always longer than the writing phase, because the hard part isn’t putting words on the page. It’s developing the ideas that deserve to be there.
AI works the same way. It works best when you give it a thorough brief, with more material than will end up in the final output. It can supplement your thinking with its knowledge but it is limited in its ability to create genuine insight on its own. It needs your insight to work from. That’s true whether you’re writing, building a website, creating a presentation, or anything else.
Too often, people skip the preparation and expect the AI to do the thinking for them. It can substitute for more of the production than most realise: even before the latest model releases, Ethan Mollick’s P&G field experiments found that one person with AI produced work comparable to a two-person team without. But it doesn’t replace your thinking about what matters and why. That part still has to come from you. To achieve this it is best to think about every task as a process. Not a one-shot prompt but a series of tasks that each feed into the next: brainstorm → research → draft requirements → more research → final specifications → build → test → refine.
This isn’t work you need to do on your own. In fact it’s better if you don’t. Working through these steps with the AI means the context available to it gets added to and refined as you progress. When you send the AI off to do work for you, you need to guide it towards what you’re looking for. A productive way I’ve found to do this is to have it ask me lots of questions about my requirements. This has two benefits: it ensures my thinking is clear, and it ensures that the model and I share the same context by the time the actual build work starts.
Getting the prompt right
I started by adding to my prompts to make this happen: “use the Socratic method to ensure my requirements are clear”. But eventually I started building in controls that made this happen automatically. Now, whenever I dispatch a task to run autonomously in the background, an agent drafts a “task contract” with the rationale, requirements and acceptance criteria. The agent drafting the task has to check that the task passes 11 authoring criteria before the contract is approved.
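To make the idea concrete, here is a minimal sketch of a task contract that is only approved once it passes a set of authoring checks. The `TaskContract` fields follow the three parts named above (rationale, requirements, acceptance criteria); the three checks and every function name are hypothetical stand-ins for illustration, not the 11 criteria themselves, which are not listed here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskContract:
    rationale: str                  # why the task exists
    requirements: list[str]         # what must be delivered
    acceptance_criteria: list[str]  # what "done" looks like


# Hypothetical authoring checks (assumed for illustration only).
AUTHORING_CHECKS: dict[str, Callable[[TaskContract], bool]] = {
    "has_rationale": lambda c: bool(c.rationale.strip()),
    "has_requirements": lambda c: len(c.requirements) > 0,
    "has_acceptance_criteria": lambda c: len(c.acceptance_criteria) > 0,
}


def failed_checks(contract: TaskContract) -> list[str]:
    """Return the names of the checks the contract does not pass."""
    return [name for name, check in AUTHORING_CHECKS.items() if not check(contract)]


def is_approved(contract: TaskContract) -> bool:
    """A contract is approved only when every authoring check passes."""
    return not failed_checks(contract)


draft = TaskContract(
    rationale="Readers cannot find older posts.",
    requirements=["Add an archive page"],
    acceptance_criteria=[],  # missing on purpose, so the draft is rejected
)
print(failed_checks(draft))
```

The useful property is that a rejected contract comes back with the names of the checks it failed, so the drafting agent knows what to fix before resubmitting.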
Mollick observed that good prompting mirrors good management documentation. Whether you take product requirements documents for software or deliverable specifications for consultants, all delegation documentation essentially answers the same questions: what are we trying to accomplish and why; where are the limits of delegated authority; what does “done” look like; what interim outputs track progress; what should be checked before the work comes back. “In figuring out how to give these instructions to the AI, it turns out you are basically reinventing management”.
Adding controls

Working this way, with tasks dispatched to run in the background, requires controls to be confident that agents will do the work properly and not run malicious code that impacts my system or other files. So I worked on system access controls so that agents only have access to the files relevant to the task and can only run approved commands. And to be absolutely sure, I built my system on a cloud server so I knew exactly what it had access to. This had the added benefit of not clogging my Mac with the sheer number of files one can create when running a team of agents.
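A rough sketch of the two checks described above, under stated assumptions: the approved command list, the function names and the example paths are all hypothetical, and the sketch assumes commands are executed as an argument list rather than through a shell, so operators such as `&&` are never interpreted. It is not a complete sandbox.

```python
import shlex
from pathlib import Path

# Hypothetical allow-list; a real deployment would define its own.
APPROVED_COMMANDS = {"ls", "cat", "pytest"}


def command_allowed(command_line: str) -> bool:
    """Allow a command only if its program name is on the approved list."""
    try:
        parts = shlex.split(command_line)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(parts) and parts[0] in APPROVED_COMMANDS


def path_in_scope(path: str, task_root: str) -> bool:
    """Allow a path only if it resolves to somewhere inside the task folder."""
    root = Path(task_root).resolve()
    target = (root / path).resolve()  # resolving collapses any ".." segments
    return target == root or root in target.parents


print(command_allowed("pytest tests/"))
print(command_allowed("rm -rf /"))
print(path_in_scope("notes/draft.md", "/tmp/task-1"))
print(path_in_scope("../task-2/secret.md", "/tmp/task-1"))
```

Resolving the path before comparing it matters: a relative path containing `..` can look harmless as a string while pointing outside the task folder.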
As someone who doesn’t code, I wanted to throw additional compute at making sure coding tasks were done correctly. So I added a review process to high-complexity tasks, which are automatically subject to a red-team “challenger review” from a second agent. And I made sure agents had access to resources (additional reasoning and browser use for visual testing) to thoroughly test their work. I eventually extended these to non-coding tasks as well, so that first drafts became second and third drafts before they came to me. As I added these processes and controls, the output became more reliable, and both the agents and I made fewer mistakes.
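The review routing can be sketched as a small loop. This is an illustration under assumptions, not the actual setup: the worker and challenger are plain functions standing in for agents, and the `"high"` complexity label and `max_rounds` limit are hypothetical.

```python
from typing import Callable

Worker = Callable[[str], str]                # prompt -> draft
Challenger = Callable[[str, str], list[str]]  # (task, draft) -> objections


def run_with_review(
    task: str,
    complexity: str,
    worker: Worker,
    challenger: Challenger,
    max_rounds: int = 2,
) -> str:
    """Produce a draft; high-complexity tasks also go through challenger review."""
    draft = worker(task)
    if complexity != "high":
        return draft  # low-complexity work skips the second agent
    for _ in range(max_rounds):
        objections = challenger(task, draft)
        if not objections:
            break  # the challenger has nothing left to raise
        revision_prompt = task + "\nAddress these objections:\n" + "\n".join(objections)
        draft = worker(revision_prompt)
    return draft


# Stub agents so the sketch runs without any model calls.
def stub_worker(prompt: str) -> str:
    return "second draft" if "objections" in prompt else "first draft"


def stub_challenger(task: str, draft: str) -> list[str]:
    return ["No tests included"] if draft == "first draft" else []


print(run_with_review("Build the archive page", "high", stub_worker, stub_challenger))
```

Capping the number of rounds keeps a stubborn challenger from looping forever; whatever draft exists at the cap is returned for human review.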
Being trained in management, I intuitively saw the need for process, review and controls to improve the output. I didn’t need to read the literature on multi-agent systems to know that it would work. Since then I have dug deeper and found strong support for this in recent research and best practices. Bhagwat again: “Designing a multi-agent system involves organisational-design skills: grouping related tasks into plausible job descriptions and separating generative tasks from review/analytical.”
So if organisation design is important to using AI, what sort of organisation does it look like?
The end of the hierarchy?

Jack Dorsey and Roelof Botha published a piece recently called “From Hierarchy to Intelligence” that traces organisational structure from Roman legions to modern corporations. Their argument: hierarchy exists because of span-of-control constraints — a leader can effectively manage three to eight people, and every layer above that is there to route information the next layer can’t hold. Dorsey and Botha argue AI and information systems can now perform the routing function that hierarchy existed to provide, making middle management redundant.
I think they’re right about the direction. Today we talk about AI being added to human workflows. Soon we’ll talk about AI-native companies with humans layered on top. Information networks with constantly flowing data across the organisation, constantly updating the context available for AI agents to make decisions. With humans in the loop for steering, decision making, accountability, and governance.
Where I think the argument is incomplete is in assuming the structure disappears along with the people. Span-of-control is a limit of human working memory, not a failure of culture. Humans can hold eight people’s context well. Twelve in a good month. Fifty, never. As long as humans make the concrete decisions (well-supported by data, but still decisions of judgement and intuition), they need the same abstraction layers the old hierarchy provided, for the same reason.
And the same constraint holds on the other side. Model context windows are a Goldilocks problem. Agents need certain context to complete a task, but longer context windows degrade the quality of output, and mixing knowledge domains inside one context degrades it further. An agent instructed to do detailed execution work (writing code, processing documents, triaging tickets) shouldn’t also be holding the company’s entire strategic context. It wouldn’t help; it would make the agent slower, costlier, less reliable. Lighter, narrower instructions are the right tool when the job is narrow. Specialisation by abstraction layer is efficient, and it’s likely that the most token-efficient systems will use abstraction layers even as models get more capable.
You therefore end up with a structure of some kind with layers of abstraction. Not an accident of scale, not a failure of flattening, but a consequence of how context works for the things doing the thinking. Humans at the strategic layer, making concrete decisions with strategic intent. Agents at the execution layer, doing well-scoped work within bounded context. Between them, layers that decide who sees what, who decides what, and who checks the work. One recently proposed framework, OrgAgent, splits a multi-agent system into explicit governance, execution and compliance layers.

The framework outperformed flat architectures on reasoning benchmarks while cutting token usage by 46-79%.
AI as an org chart
A CEO doesn’t see every line item, every support ticket, every code commit. That’s not a failure of communication. It’s a design choice. Detail stays where it’s useful and only the important decisions flow upward. A project team gets the project brief, not the board papers. Customer service sees the customer record, not the HR files.
AI agents will need the same thing, especially at scale. You can’t unleash AI across an entire company and trust that it won’t pick up information it shouldn’t. You need layers of abstraction, segregation, access rights, controls. The org chart doesn’t go away. It just gets a different kind of employee.
So I built my own system as an org chart. Each agent has a role, a scope, and a reporting line. Layered instructions come together at the time of each task. Strategic principles at the top, area-specific guidance in the middle, project details and the specific task at the bottom. Each agent gets what it needs for the job. Only what it needs.
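One way to picture layered instructions coming together at task time is the sketch below. The layer names mirror the description above (strategic, area, project, task); the function name, the `scope` parameter and the example text are hypothetical, chosen only to illustrate the idea.

```python
# Order runs from the most general layer to the most specific one.
LAYER_ORDER = ["strategic", "area", "project", "task"]


def assemble_instructions(layers: dict[str, str], scope: set[str]) -> str:
    """Join only the layers an agent is scoped to see, general to specific."""
    sections = [
        f"[{name}]\n{layers[name]}"
        for name in LAYER_ORDER
        if name in scope and name in layers
    ]
    return "\n\n".join(sections)


layers = {
    "strategic": "Prefer clarity over cleverness.",
    "area": "Website work: follow the existing design system.",
    "project": "Archive page project: ship by end of month.",
    "task": "Write the archive page template.",
}

# An execution agent sees its area, project and task, but not the strategic layer.
print(assemble_instructions(layers, scope={"area", "project", "task"}))
```

Keeping the layers separate until the moment of dispatch is what allows each agent to get only what it needs: the scope decides which layers are included, and nothing else reaches the prompt.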
The connective tissue underneath all of this is a system of record. Task status, outcomes, decisions, all tracked. I can see what agents are doing, what stage every piece of work is at, what’s been completed and what’s still open. It started with a simple project management framework: break the work into tasks, dispatch them, track what gets done. Now I’m working on adding frameworks for what should be stored as memory: How does memory get autonomously validated or discarded over time? When and how do agents make and document decisions? What decisions should be made by me? What supporting material do I want agents to provide me to make those decisions? All of this is informed by my understanding of management and how to build generalised frameworks for organisations. My agents supplement this understanding with technical specifications informed by the latest open source software projects.
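At its simplest, a system of record like this can be sketched as a log of status changes per task. The class name, status labels and task names below are assumptions for illustration; the actual storage and schema are not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TaskRecord:
    task_id: str
    status: str = "open"
    # Each entry is (UTC timestamp, new status, note).
    history: list[tuple[str, str, str]] = field(default_factory=list)


class SystemOfRecord:
    """In-memory record of tasks, their current status and their history."""

    def __init__(self) -> None:
        self._tasks: dict[str, TaskRecord] = {}

    def open_task(self, task_id: str) -> None:
        self._tasks[task_id] = TaskRecord(task_id)

    def log(self, task_id: str, status: str, note: str) -> None:
        """Record a status change or decision against a task."""
        record = self._tasks[task_id]
        timestamp = datetime.now(timezone.utc).isoformat()
        record.history.append((timestamp, status, note))
        record.status = status

    def still_open(self) -> list[str]:
        """Return the ids of every task not yet marked done."""
        return [t.task_id for t in self._tasks.values() if t.status != "done"]


record = SystemOfRecord()
record.open_task("draft-post")
record.open_task("review-post")
record.log("draft-post", "done", "First draft delivered")
print(record.still_open())
```

Because every change is appended to the history rather than overwritten, the same structure can answer both "what is still open?" and "how did this task get to where it is?".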
Realising that working with AI benefited from thinking deeply about organisation design, process, governance and control gave me pause. Structuring work, defining briefs, building review processes. That’s management. And management is something I understand.
Non-technical people have a bigger role to play in AI development than they realise
All of us now have access to incredibly powerful models that we can and should be collaborating with on almost anything we do. Learning to manage, delegate to, and direct agents to achieve the highest quality output. AI slop is not an inevitability of using AI. It is a failure of humans to put in place process and controls, and to hold themselves and their agents to high standards of output. It is a failure to work with AI in a way that maximises the strengths of humans and agents alike.
A recent Google research paper raised an interesting question: if human intelligence has always been defined by our ability to socialise, relate, and form collectives, why are our benchmarks for artificial intelligence largely focused on the outputs of a single LLM? What is the relevance of “the singularity” or “artificial general intelligence”, that moment when we can officially say we now have an all-knowing technology? If the history of intelligence is inherently collective, shouldn’t we be measuring for something else?
“Each prior ‘intelligence explosion’ was not an upgrade to individual cognitive hardware, but the emergence of a new, socially aggregated unit of cognition. Primate intelligence scaled with social group size, not habitat difficulty. Human language created what Michael Tomasello calls the ‘cultural ratchet’: knowledge accumulating across generations without any individual requirement to reconstruct the whole. Writing, law, and bureaucracy externalized social intelligence into infrastructure, institutions that coordinate across longer time horizons than any participant within them. A Sumerian scribe running a grain accounting system did not comprehend its macroeconomic function; the system was functionally more intelligent than he was.”
Exploring the thesis
I’m convinced that the next breakthroughs in AI effectiveness are unlikely to come from bigger and more powerful models, but from systems of organisation made up of specialised, right-sized agents with right-sized context windows, interacting constantly and guided by humans in a way that maximises their relative strengths. As a result, the frontier of AI implementation is just as likely to be pioneered by practitioners of organisation design as it is by computer scientists.
As an exploration of this thesis, I’m building my own multi-agent system, inspired by OrgAgent, which will write for my Substack over the course of two weeks. It’s called “Background Research”. It has its own CEO, a website, a social agent managing Bluesky and Moltbook (Facebook for AI agents) accounts, a live online draft where anyone can leave comments and feedback, a librarian agent for identifying and classifying new references, a drafting agent, and a review committee made up of personas inspired by Ethan Mollick, Adam Grant and Malcolm Gladwell. It will explore topics on human/agent hybrid organisations of the future.
I spent some time with the “CEO” of Background Research today, setting the north-star metric (is 500 subscribers on Substack too many?), setting constraints, and ensuring it has the tools and resources to deliver the project. It will start by setting the strategy, followed by a seeding period to build the conversation on social, from which it may draw inspiration for its own writing. After all, hybrid human/agent organisations will need to triage and process all kinds of different inputs and information, so I want to explore how it deals with that and decides what is important.
We’ll see if it turns up something interesting that is of higher quality than a one-shot prompt. I’m guessing yes, but it might just be a jumbled output of confused nonsense. Follow along: it should be fun, it will almost certainly be weird, and it might be wonderful!


