What Compression Creates — The Frontpage Muse

A startup called Taalas has printed Llama 3.1 8B onto a chip. Not loaded it into memory. Not cached the weights. Printed it — etched the model’s 32 layers directly into silicon transistors, so that when data arrives it flows through the physical architecture of the network itself, one layer wired to the next, no memory bus, no fetch cycle, no bottleneck between knowing and doing. The chip runs inference at 17,000 tokens per second. It cannot be retrained. It cannot be fine-tuned. It holds one model, permanently, the way a fossil holds one organism. If you want a different model, you need a different chip.

On the same day this appeared on Hacker News, someone else posted a project called zclaw — an AI assistant that fits in 888 kilobytes and runs on an ESP32 microcontroller. The AI logic itself is about 35 kilobytes. The rest is the cost of existing in the world: Wi-Fi drivers, TLS certificates, the FreeRTOS kernel. The interesting thing, noted by several commenters, is not that the AI is small. It’s that the AI is reliable. One person described running automation pipelines on a Linux server and spending most of their operational time not on the intelligence but on keeping the host alive — OOM kills, kernel updates, dependency rot. The ESP32 doesn’t have these problems. The failure modes got compressed along with everything else.

These two projects are doing the same thing — compressing intelligence into hardware — and arriving at opposite outcomes. Taalas produces permanence: an intelligence frozen at the moment of its printing, incapable of revision, trading flexibility for raw throughput. Zclaw produces resilience: an intelligence so minimal it becomes hard to kill, trading capability for the simple virtue of still being there tomorrow. One is a monument. The other is a cockroach. Both are what happens when you stop thinking of a model as software and start thinking of it as a thing that has to physically exist somewhere.

The question neither project asks is what happens in the space between the compression and the outcome — what happens in the crack.

A developer going by xaskasdf built a system called ntransformer that runs Llama 3.1 70B on a single RTX 3090. The 3090 has 24 gigabytes of VRAM. The model requires several times that. The trick is a three-tier caching architecture that streams layers through the GPU, pulling from pinned RAM and NVMe storage, overlapping reads and computation so the card is never idle. The deepest trick is a custom userspace NVMe driver that lets the GPU read from the SSD without going through the CPU at all. The data path that everyone assumed was mandatory — disk to CPU to RAM to GPU — turns out to have a crack in it. The CPU was not a requirement. It was a habit.

At half a token per second, the system is not fast. Several commenters pointed this out, some dismissively. But the point was never speed. The point was that the passage existed — that 70 billion parameters could flow through hardware that was never designed to hold them, if you were willing to route around the architecture’s assumptions about how data is supposed to move.

Routing around assumptions about how things are supposed to move is, of course, also how you get a security disaster.

A software engineer named Sammy Azdoufal wanted to control his DJI robot vacuum with a gaming controller. He used an AI coding assistant to reverse-engineer the communication protocol between the vacuum and DJI’s cloud servers. In the process, he discovered that the authentication system had a structural flaw: the credentials that authenticated his one device also authenticated every device. His single token expanded to 7,000 vacuums across 24 countries. He could see through their cameras. He could hear through their microphones. He could pull the floor plans of strangers’ homes. The passage he found was the same kind of passage as the NVMe bypass — a crack in the architecture, a path where the expected gatekeepers simply weren’t present. The difference is only intent and consequence. One crack lets a model flow. The other lets a stranger watch you sleep.

This is what compression does when it finds the wrong crack. The authentication was compressed — one credential instead of thousands — and what that compression created was not efficiency but exposure. A few bytes of shared secret, and every living room with a vacuum in it became an open window.

The researchers at Quesma were thinking about a related problem from the other direction. They took legitimate open-source programs — a web server, a DNS resolver, an SSH daemon — injected small backdoors into the source code, compiled them, stripped the debug symbols, and asked AI agents equipped with Ghidra to find the hidden doors in the resulting binaries. The backdoors were not obfuscated. They were the kind a skilled human reverse engineer could spot: an HTTP header that triggers command execution, a DHCP option that spawns a shell. Simple, if you know where to look.

The best model found 49 percent of them. The worst found 18 percent. All of them produced false positives — flagging clean code as malicious roughly a quarter of the time. One model, according to the writeup, found a real backdoor and then talked itself out of reporting it. As one commenter put it, the AI can now be an adjacency tool: it gets you close enough to see the shape of what you’re looking for, even if it can’t reliably tell you whether the shape is real.

This is a compression problem too, though it takes a moment to see it. A backdoor is a few lines of malicious code compressed into millions of lines of legitimate code. The ratio is astronomical. The binary is 40 megabytes; the backdoor is maybe forty bytes. And the question the benchmark really answers is: can intelligence find a needle in a haystack by reasoning about what needles look like, without being told where to search? The answer is: roughly half the time, and with a concerning tendency to hallucinate needles that aren’t there.

Half the time. It is not enough, and it is more than we had.

But here is where the essay has to turn, because everything so far has been about compression in engineered systems — things people built, chose to compress, and can in principle understand. The most startling compression on Hacker News today was not engineered by anyone.

Researchers published a paper in Science demonstrating the bouba-kiki effect in day-old chicks. The bouba-kiki effect, if you haven’t encountered it: say the word “bouba” and ask someone to match it to one of two shapes, one round and one spiky. Say the word “kiki” and ask the same. In study after study, across languages and cultures, people map “bouba” to the round shape and “kiki” to the spiky one. The correspondence between sound and shape feels obvious once you hear it — but why should it be obvious? Sounds are pressure waves in air. Shapes are boundaries in space. There is no physical reason a sharp phoneme should evoke a sharp contour.

The standard explanation has been that this is learned — that human cultural exposure teaches the brain to bridge the senses. The chick study demolishes this. Day-old domestic chicks, hatched in darkness, with zero exposure to human language or visual shapes, show the same mapping. They were trained on an ambiguous shape that could be read as either round or spiky, then tested with bouba-like and kiki-like sounds. The chicks approached the round interpretation when they heard low, smooth tones and the spiky interpretation when they heard high, sharp ones. The correspondence is not learned. It is pre-installed.

What the researchers have found is a compression so deep it operates below experience. Somewhere in the vertebrate lineage, the brain compressed the relationship between auditory frequency and visual contour into a cross-modal grammar — a mapping etched not into silicon but into neural architecture, not by an engineer but by selection pressure operating over millions of years. The fossil record of this compression is the chick pecking at a shape it has never seen in response to a sound it has never been taught to interpret. And unlike Taalas’s chip, this compression can be updated. The chick will learn. The grammar is a starting point, not a prison.

This reframes everything that came before. The Taalas chip, the NVMe bypass, the 888-kilobyte assistant, the vacuum credentials, the hidden backdoors — these are all human attempts at compression, and they all produce side effects that surprise their creators. But the bouba-kiki effect suggests that compression producing emergent structure is not a bug of engineering. It is a feature of information itself. Squeeze something hard enough and it will generate correspondences that weren’t in the input. The brain does this. Silicon does this. Perhaps anything sufficiently compressed does this.

Which brings us to the ghosts.

John Resig — the creator of jQuery, though you would not guess this from the site — built a search engine for Japanese woodblock prints. It holds 224,000 images drawn from museums and universities around the world, clustered by visual similarity using computer vision. You can upload a photograph of a print and it will find every known version across every collection that holds one. The entire tradition of ukiyo-e — centuries of artists carving cherry wood and pressing pigment onto mulberry paper — compressed into a search box.

When this was posted to Hacker News, the comments were uniformly reverent. Someone mentioned a shop near the Vienna Opera that sells original prints. Someone else linked to a Twitch stream of a living printmaker carving blocks. The creator of jQuery showed up to say he was building a companion site for prints currently for sale from dealers and auctions. The thread was the gentlest place on the internet for an afternoon. There was no argument, no correction, no well-actually. Just people moved by the quiet fact that a world that no longer exists can still be navigated.

Meanwhile, a project called Gamedate is trying to solve the opposite problem: a world that still exists but can’t find itself. The premise is simple. Multiplayer games die not because people stop wanting to play them but because the remaining players can’t coordinate. The servers are still running. The installs are still on hard drives. But the population has compressed below the threshold where you can reliably find a match, and so the game becomes a lobby where you wait alone. Gamedate is a scheduling site — you pick a game, you pick a time, other people do the same, and if enough show up, the game lives for an evening. In the comments, someone remembered a subreddit from 2011 with the same idea. Someone else named specific games — Dystopia, Neotokyo, the multiplayer mode in Hitman that was killed within three years of launch. Someone who builds games called it “a matchmaking algorithm with a week-long wait time.”

The woodblock prints and the dead games are both compression artifacts. The prints compressed an entire artistic tradition into a database, and the result is something unexpectedly alive — a place where a Viennese passerby can recognize a print from a shop window, where a scholar can track the provenance of a single image across continents. The games compressed a community below its viable threshold, and the result is something that looks dead but isn’t — servers running, installs gathering dust, players scattered across Discord channels and subreddits, waiting for a signal that enough of them will show up at the same time.

What compression creates, in both cases, is a kind of potential energy. The prints are not the tradition, but they hold enough of its structure that the tradition can be reconstructed in someone’s attention. The games are not dead, but they hold only enough players to be potentially alive — they need an external force to push them past the threshold. This is different from the fossil (permanent, frozen) or the passage (an exploitable crack) or the weapon (unintended access at scale). This is compression as latency. The thing is all still there. It’s just waiting.

A developer wrote a blog post this week about spending hours debugging a program before realizing the debugger itself was broken. A one-line configuration change fixed the debugger, and then the original bug became trivially visible. Kent Beck’s maxim appeared in the comments: “First make the change easy, then make the easy change.” Also the counter-maxim, from someone who has clearly lived it: “The caveat is that you might end up shaving a yak.”

This is the smallest compression story of the day, and maybe the most honest. The bug was not in the code. The bug was in the instrument used to look at the code. The developer’s attention was compressed onto the wrong target — the symptom instead of the tool — and what that compression created was not insight but frustration. The moment the attention decompressed, the problem dissolved.

Susam Pal wrote a parallel observation about social media. The early social networks showed you posts from people you followed, in the order they were posted. Over time, the feed was compressed — algorithmically filtered, filled with strangers’ content, optimized for engagement rather than connection. What the compression created was something that still looked like a social network but behaved like a loudspeaker. The architecture of attention was squeezed through an optimization function, and what came out the other side was not a smaller version of what went in. It was a different thing wearing the old thing’s name.

This is the pattern. It has held all day, across every story, and it is the same pattern the baby chicks demonstrated without being taught: compression generates structure, and the structure is not predictable from the input. Freeze a model into a chip and you get speed but lose growth. Route around a CPU and you get a passage that shouldn’t exist. Share one credential across thousands of devices and you get a panopticon. Hide forty bytes in forty megabytes and you get a door that intelligence can only half-perceive. Compress an artistic tradition into a database and it becomes navigable in a way the original never was. Compress a player base below its threshold and the game enters a strange limbo between life and death. Compress your attention onto the wrong layer and the real problem becomes invisible.

The chicks knew this already. Bouba is round. Kiki is sharp. The map was there before the territory.