Finding an Inference Box on Facebook Marketplace

I’ve been using LLM tooling since the early Copilot previews in 2022. I was onboarding GitHub as an enterprise platform at the time and had access to most preview features before they went public. For a while I was ahead of the curve.

Then the hype hit and compliance caught up. Suddenly everything needed approval. The developers ended up locked to older models while the rest of the industry moved on.

After being made redundant I wanted to stop depending on someone else’s API. Token costs were climbing, every provider was subsidising usage to build dependency, and I’d seen this pattern before. The dot-com bubble didn’t burst because the internet was useless. It burst because the economics didn’t hold. I didn’t know if LLM pricing would survive contact with reality, and I didn’t want to find out the hard way.

I wanted my own inference.

Trying What I Already Had

I had an AMD Radeon RX 6900 XT in my gaming rig and a Razer Core X eGPU enclosure. The idea was: put the card in the enclosure, connect it to my 2018 Mac Mini over Thunderbolt, run inference from there. Use what I’ve already got.

I did the research before pulling the card out. ROCm (AMD’s CUDA equivalent) is a mess for LLM workloads. Most inference frameworks are built CUDA-first and AMD support is an afterthought. Thunderbolt bandwidth is another compromise on top. The more I read, the more it looked like a Heath Robinson job. I wanted something that worked.

Time to look at other options.

The Facebook Marketplace Find

What I needed was 24GB of VRAM. That’s the threshold. Below it you’re constantly hitting limits with anything bigger than an 8B parameter model. The RTX 3090 was the obvious choice. Huge second-hand market, 24GB GDDR6X, and the prices had come down as the 4000 series took over.
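
A rough sketch of the arithmetic behind that threshold, for anyone sizing their own card. It only counts the model weights plus a flat allowance for KV cache and runtime buffers. The 2 GB allowance and the function names are assumptions for illustration, not measured figures, and real usage varies with context length and the inference framework.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float = 24.0,
    overhead_gb: float = 2.0,  # assumed allowance for KV cache and buffers
) -> bool:
    """Rough check: do the weights plus the assumed overhead fit in VRAM?"""
    return weight_memory_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # (parameters in billions, bits per weight)
    candidates = [(8, 16), (8, 4), (32, 4), (70, 4)]
    for params, bits in candidates:
        size = weight_memory_gb(params, bits)
        verdict = "fits" if fits_in_vram(params, bits) else "does not fit"
        print(f"{params}B at {bits}-bit: ~{size:.0f} GB of weights, {verdict} in 24 GB")
```

By this estimate an 8B model at 16-bit needs about 16 GB for weights alone, which is already too much for a 12GB or 16GB card once the overhead is added, while a 24GB card leaves room for it or for a larger model at 4-bit.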

Then I found a complete rig listed locally. The seller was a local builder, not a flipper. Came with the original GPU box. Better value than buying the card alone and building a system around it. The specs:

AMD Ryzen 7 5800X
32GB DDR4 3600MHz
MSI X570S Torpedo Max
NVIDIA RTX 3090 24GB (Zotac Gaming Trinity)
Samsung 1TB SSD
Noctua NH-D15 cooler
1200W Platinum PSU

The Zotac Trinity is their mid-tier 3090. Triple fan, 350W TDP. Early Zotac 3090 batches had thermal pad issues causing high VRAM temperatures. Worth checking if the memory junction stays under 100°C under load.

What to Check Before Buying

I asked the seller to run a few things before I committed:

LM Studio with Llama 3 8B loaded. Watching for token speed (should be 80-120+ tok/s on a 3090), VRAM filling correctly, temperatures stabilising under 83°C.
GPU-Z to verify the card was detected properly and the full 24GB was accessible.
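
The same two checks can also be scripted on a machine with the NVIDIA driver installed. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names and the 24000 MiB floor are hypothetical choices for illustration, and the 83°C limit is the one from the list above. Note that `nvidia-smi` reports the GPU core temperature. As far as I know it does not expose the memory junction temperature on consumer cards, so that still needs a tool like GPU-Z or HWiNFO.

```python
import subprocess

# Standard nvidia-smi query fields: card name, VRAM total/used (MiB), core temp (°C)
QUERY_FIELDS = "name,memory.total,memory.used,temperature.gpu"


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line from nvidia-smi (noheader, nounits format)."""
    # rsplit keeps the name intact even if it ever contains a comma
    name, total, used, temp = [part.strip() for part in line.rsplit(",", 3)]
    return {
        "name": name,
        "vram_total_mib": int(total),
        "vram_used_mib": int(used),
        "core_temp_c": int(temp),
    }


def check_gpu(stats: dict, min_vram_mib: int = 24000, max_core_temp_c: int = 83) -> list:
    """Return a list of human-readable problems; empty means both checks passed."""
    problems = []
    if stats["vram_total_mib"] < min_vram_mib:
        problems.append(f"only {stats['vram_total_mib']} MiB of VRAM visible")
    if stats["core_temp_c"] >= max_core_temp_c:
        problems.append(f"core temperature is {stats['core_temp_c']}°C")
    return problems


def read_gpu_stats() -> list:
    """Query every GPU that nvidia-smi can see."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        for gpu in read_gpu_stats():
            issues = check_gpu(gpu)
            print(gpu["name"], "OK" if not issues else "; ".join(issues))
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi not available on this machine")
```

Run it while a model is loaded and generating, since an idle temperature reading says little about how the card behaves under load.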

Having the original GPU box was a positive signal. Mining-pulled cards rarely keep their packaging.

The Name

I already had a Windows gaming machine called Pixie. The naming convention was set. Something from the same register, something that pairs.

Nixie. Water spirits in Germanic mythology. And a quiet *nix pun for a Linux server.

Pixie does the flashy work. Nixie does the heavy lifting.
