Two Weeks With a Workforce — The Automation Engineer

It is 3am on a Saturday morning. Nixie has been running four agents on a podman stack. Pepper as my PA, Gilfoyle and Dinesh and Jared as the workforce. Silicon Valley names because the joke felt right. I have been on the sofa since dinner thinking five more minutes to just finish this, and somehow five minutes has been five hours.

The agents are working. They are talking to each other in a Discord channel called ops. They are reading shared markdown files for state and writing them back. The whole thing is held together with mounted volumes, hand-rolled UID mappings, and a couple of OpenClaw containers that I do not entirely trust. It is an engineering experiment. By the end of the weekend it will teach me what I was actually building and what I was not.

What I was testing

The hypothesis was that you could put four small specialist agents on commodity hardware, give them a shared workspace, and get real work out of them. Pepper would manage me. Gilfoyle would do ops. Dinesh would write code. Jared would project-manage. The Discord channel was the bus. The shared markdown files were the memory. The 3090 was supposed to do all of the inference.

Underneath, OpenClaw containers ran the agent loops. Podman because rootless was clean and I had no interest in sudo-everything just to ship containers. The whole stack was Ansible-driven so I could rebuild it from scratch. I committed each layer as I went.

What worked

Naming was the most surprising thing. Naming an agent Pepper after deciding I needed something that would call me out on my bullshit changed how I prompted her. The persona shaped the rest of the design, not the other way around. The Silicon Valley four held the same shape. (It is possible the LLMs benefited from this too. It is possible they never really did.)

Per-tenant identity was right. One human tenant, three agent principals, one household pseudo-tenant for shared state. Tenants modelled around humans, not products, gave a cleaner shape because identity is what actually needs isolating. That decision survived the experiment.

Markdown as memory was also right, with caveats. The agents could write into shared files and pick the state up later. Querying it was awful. We discussed LangChain and vector stores and decided to keep the data as markdown so we could upgrade the data layer later without rewriting the inputs. That call held.
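A minimal sketch of what markdown-as-memory can look like, assuming one shared file with a second-level heading per entry. The function names, the heading convention, and the `ops-state.md` filename are all hypothetical, not what the agents actually ran. It also shows why querying hurt: every read is a linear scan of the whole file.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def append_note(path: Path, agent: str, text: str) -> None:
    """Append one entry to the shared file under a '## <agent>' heading."""
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {agent}\n\n{text}\n\n")


def read_notes(path: Path) -> list[tuple[str, str]]:
    """Return (agent, text) pairs by scanning the file top to bottom."""
    notes: list[tuple[str, str]] = []
    agent, lines = None, []
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("## "):
            # A new heading closes the previous entry, if there was one.
            if agent is not None:
                notes.append((agent, "\n".join(lines).strip()))
            agent, lines = line[3:].strip(), []
        elif agent is not None:
            lines.append(line)
    if agent is not None:
        notes.append((agent, "\n".join(lines).strip()))
    return notes


if __name__ == "__main__":
    with TemporaryDirectory() as tmp:
        state = Path(tmp) / "ops-state.md"  # hypothetical filename
        append_note(state, "gilfoyle", "backups verified")
        append_note(state, "dinesh", "PR open")
        print(read_notes(state))
```

Because the inputs stay as plain markdown, a parser like this could be swapped for an index or vector store later without touching what the agents write.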

What did not

Discord did not work as a comms layer. The original plan had been Mattermost, but the Mattermost paywall had already killed an earlier version of this experiment. Discord was satisfying to watch. Agents talking to each other in real time was a real moment. But every message was a context refresh, every reply was an LLM call, and the only way to get a bot to respond at all was to @-mention it on every single message. Tag Gilfoyle in ops, get nothing. Tag him again on the next message, get a reply. By the time I’d been at it a few days I knew this was not the comms layer for serious work.

UID mapping inside rootless podman cost me an hour I am not getting back. My user is UID 1000 on Nixie. The node user is UID 1000 in the OpenClaw container. They lined up by happy coincidence. The moment I tried to think one step further (what if the host user was 1001 and node was still 1000?), the security model fell over. I argued with Claude for fifteen minutes about whether keep-id was actually doing what we thought it was doing. The answer in the end was that mapping shared writes back to a single host UID is fine for one user, and specifically not fine the moment you scale it. Defence in depth was the real answer. Single-tenant for now, multi-tenant only when you have the management to back it up.
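A small model of the coincidence, as a sketch only. It assumes the usual rootless layout: a single subordinate UID range, here starting at a hypothetical 100000, with the host user's own UID mapped first. The `host_uid` function is illustrative arithmetic, not podman's code, and the real mapping depends on `/etc/subuid` and the flags in use.

```python
def host_uid(container_uid: int, host_user_uid: int,
             subuid_start: int = 100000, keep_id: bool = False) -> int:
    """Which host UID owns a file written by container_uid (simplified model)."""
    if not keep_id:
        # Default rootless mapping: container root is the host user,
        # every other container UID lands in the subordinate range.
        if container_uid == 0:
            return host_user_uid
        return subuid_start + container_uid - 1
    # keep-id: the host user's UID keeps the same number inside the container.
    if container_uid == host_user_uid:
        return host_user_uid
    if container_uid < host_user_uid:
        return subuid_start + container_uid
    return subuid_start + container_uid - 1


if __name__ == "__main__":
    # Host user 1000, node is 1000: writes come back owned by the host user.
    print(host_uid(1000, 1000, keep_id=True))
    # Host user 1001, node still 1000: writes land on a subordinate UID.
    print(host_uid(1000, 1001, keep_id=True))
```

Under this model the shared volume only looks right when the two numbers happen to match. Change either one and the agent's writes are owned by a UID the host user does not control, which is the case the argument was about.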

Four agents on one 3090 is a context-thrashing problem before it is an inference problem. Each agent kept its own conversation context. Each tool call cost real tokens. The total cost of carrying four parallel contexts was higher than I expected, and the value of having them was lower than I hoped, because most of what they actually needed to know was the same shared state in markdown anyway. The workforce was a more elaborate way of doing what one well-prompted agent with the right tool access could already do.
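The thrashing can be put into rough arithmetic. This is a back-of-envelope sketch, assuming every agent reloads the same shared state plus its own conversation on each round. The function names and the token counts are made up for illustration, not measurements from the experiment.

```python
def tokens_per_round(agents: int, shared_state_tokens: int,
                     private_tokens_per_agent: int) -> int:
    """Total context carried when every agent holds shared state plus its own history."""
    return agents * (shared_state_tokens + private_tokens_per_agent)


def duplicated_shared_tokens(agents: int, shared_state_tokens: int) -> int:
    """Tokens spent re-reading the same shared state beyond the first copy."""
    return (agents - 1) * shared_state_tokens


if __name__ == "__main__":
    # Hypothetical sizes: 6000 tokens of shared markdown, 2000 of private chat each.
    print(tokens_per_round(4, 6000, 2000))
    print(duplicated_shared_tokens(4, 6000))
```

With those hypothetical sizes, more than half of the total is the same shared state loaded four times over, which is the shape of the problem even if the real numbers differ.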

Where this lands

Two weeks in, the SDLC pipeline is pulling focus. The workforce was stood up to feed something, and that something has turned out to be code review and harness work that does not need four agents to do it. I am winding it down.

What’s surviving: per-human tenancy for the human-facing layer. Markdown as the memory layer is in. The multi-agent chat-bus is out. Discord could not carry it. The reflect-and-review cycle is moving from agents-talking-to-each-other to a single agent in a different mount mode. How agent identity should work for the SDLC pipeline, where agents will run autonomously in their own contexts and permissions, is its own question, and not one this experiment closed.

Pepper still exists, as a name. She is a prompt and a persona for whatever I build next. Gilfoyle, Dinesh, and Jared are gone.

The workforce ran for about two weeks. Then the SDLC pipeline pulled focus and the workforce simplified itself out of existence. Done.