Finding an Inference Box on Facebook Marketplace
I researched running my AMD card in an eGPU attached to a Mac Mini. ROCm killed the idea. Then I found a complete rig on Facebook Marketplace for less than the GPU alone.
I’ve been using LLM tooling since the early Copilot previews in 2022. I was onboarding GitHub as an enterprise platform at the time and had access to most preview features before they went public. For a while I was ahead of the curve.
Then the hype hit and compliance caught up. Suddenly everything needed approval. The developers ended up locked to older models while the rest of the industry moved on.
After redundancy I wanted to stop depending on someone else’s API. Token costs were climbing, every provider was subsidising to build dependency, and I’d seen this pattern before. The dot-com bubble didn’t burst because the internet was useless. It burst because the economics didn’t hold. I didn’t know if LLM pricing would survive contact with reality, and I didn’t want to find out the hard way.
I wanted my own inference.
Trying What I Already Had
I had an AMD Radeon RX 6900 XT in my gaming rig and a Razer Core X eGPU enclosure. The idea was: put the card in the enclosure, connect it to my 2018 Mac Mini over Thunderbolt, run inference from there. Use what I’ve already got.
I did the research before pulling the card out. ROCm (AMD’s CUDA equivalent) is a mess for LLM workloads: most inference frameworks are built CUDA-first and AMD support is an afterthought. Thunderbolt bandwidth is another compromise on top, since Thunderbolt 3 gives the card at most a PCIe 3.0 x4 link rather than the x16 slot it was designed for. The more I read, the more it looked like a Heath Robinson job. I wanted something that worked.
Time to look at other options.
The Facebook Marketplace Find
What I needed was 24GB of VRAM. That’s the threshold. Below it you’re constantly hitting limits with anything bigger than an 8B parameter model. The RTX 3090 was the obvious choice. Huge second-hand market, 24GB GDDR6X, and the prices had come down as the 4000 series took over.
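A rough sketch of the arithmetic behind that threshold: the weights alone take roughly (parameter count × bits per weight ÷ 8) bytes, and the KV cache and runtime need some headroom on top. The function names and the 2 GiB overhead figure below are illustrative assumptions, not measurements; real usage varies with context length and the inference runtime.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float = 24.0, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in VRAM.

    overhead_gib is a placeholder for KV cache and runtime buffers;
    it is an assumption, not a measured value.
    """
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Compare a few model sizes and precisions on a 12 GiB and a 24 GiB card.
    for params, bits in [(8, 16), (8, 4), (32, 4), (70, 4)]:
        size = weight_gib(params, bits)
        print(f"{params}B @ {bits}-bit: ~{size:.1f} GiB weights, "
              f"fits 12 GiB: {fits(params, bits, vram_gib=12.0)}, "
              f"fits 24 GiB: {fits(params, bits)}")
```

Under these assumptions, an 8B model at 16-bit precision is already too big for a 12 GiB card but sits comfortably in 24 GiB, which is roughly where the threshold comes from.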
Then I found a complete rig listed locally. The seller was a local builder, not a flipper. Came with the original GPU box. Better value than buying the card alone and building a system around it. The specs:
- AMD Ryzen 7 5800X
- 32GB DDR4 3600MHz
- MSI X570S Torpedo Max
- NVIDIA RTX 3090 24GB (Zotac Gaming Trinity)
- Samsung 1TB SSD
- Noctua NH-D15 cooler
- 1200W Platinum PSU
The Zotac Trinity is their mid-tier 3090. Triple fan, 350W TDP. Early Zotac 3090 batches had thermal pad issues causing high VRAM temperatures. Worth checking if the memory junction stays under 100°C under load.
What to Check Before Buying
I asked the seller to run a few things before I committed:
- LM Studio with Llama 3 8B loaded. Watching for token speed (should be 80-120+ tok/s on a 3090), VRAM filling correctly, temperatures stabilising under 83°C.
- GPU-Z to verify the card was detected properly and the full 24GB was accessible.
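The same checks can be scripted if the seller has the NVIDIA driver installed. The sketch below shells out to `nvidia-smi` and compares the reported VRAM and core temperature against the figures above. The helper names and the 24000 MiB floor are illustrative assumptions. Note that `temperature.gpu` is the core temperature, not the memory junction; on consumer cards the junction reading usually needs a tool such as GPU-Z or HWiNFO.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "name,memory.total,memory.used,temperature.gpu"


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    # rsplit keeps the name intact even if it ever contains a comma.
    name, total, used, temp = [f.strip() for f in line.rsplit(",", 3)]
    return {
        "name": name,
        "memory_total_mib": int(total),
        "memory_used_mib": int(used),
        "temperature_c": int(temp),
    }


def check(gpu: dict, min_vram_mib: int = 24000, max_temp_c: int = 83) -> list:
    """Return a list of human-readable problems (empty means it looks fine)."""
    problems = []
    if gpu["memory_total_mib"] < min_vram_mib:
        problems.append(f"only {gpu['memory_total_mib']} MiB of VRAM visible")
    if gpu["temperature_c"] >= max_temp_c:
        problems.append(f"core temperature {gpu['temperature_c']}°C is too high")
    return problems


def query_gpus() -> list:
    """Ask nvidia-smi for every GPU it can see."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    try:
        gpus = query_gpus()
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not run nvidia-smi: {exc}")
        gpus = []
    for gpu in gpus:
        print(gpu["name"], check(gpu) or "looks OK")
```

Run it while a model is generating so the temperature and memory figures reflect load rather than idle.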
Having the original GPU box was a positive signal. Mining-pulled cards rarely keep their packaging.
The Name
I already had a Windows gaming machine called Pixie. The naming convention was set. Something from the same register, something that pairs.
Nixie. Water spirits in Germanic mythology. And a quiet *nix pun for a Linux server.
Pixie does the flashy work. Nixie does the heavy lifting.