Inference Rollups: The Hidden Infrastructure Powering On-Chain AI
How inference rollups move heavy AI computation off-chain while keeping verification trustless. A deep dive into zkML, opML, and the projects racing to make on-chain AI actually work.
Running AI on a blockchain is expensive. Absurdly expensive. A naive 1000x1000 matrix multiplication would cost over 3 billion gas on Ethereum, roughly two orders of magnitude beyond the entire block gas limit, which currently sits around 30 million. You literally cannot perform basic AI operations on-chain without some form of workaround.
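Here's the back-of-envelope math behind that number, counting only the EVM's arithmetic opcodes (MUL costs 5 gas, ADD costs 3) and ignoring memory expansion and loop overhead, which would only make things worse:

```python
# Back-of-envelope gas estimate for a naive 1000x1000 matrix multiply
# on the EVM, counting arithmetic opcodes only.

N = 1_000                      # square matrix dimension
macs = N ** 3                  # multiply-accumulates in naive matmul: 1e9

GAS_MUL = 5                    # EVM MUL opcode cost
GAS_ADD = 3                    # EVM ADD opcode cost
BLOCK_GAS_LIMIT = 30_000_000   # approximate Ethereum block gas limit

gas = macs * (GAS_MUL + GAS_ADD)
print(f"arithmetic gas: {gas:,}")                      # 8,000,000,000
print(f"blocks needed:  {gas / BLOCK_GAS_LIMIT:.0f}")  # ~267 full blocks
```

The arithmetic-only estimate lands around 8 billion gas, comfortably "over 3 billion": hundreds of completely full blocks for a single matrix product.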
This fundamental constraint has haunted crypto AI projects for years. Most "on-chain AI" projects quietly run their models on centralized servers and just post results to the blockchain. That's not decentralized AI. That's regular AI with a marketing problem.
Inference rollups solve this by borrowing techniques from Ethereum's scaling playbook. Move computation off-chain. Keep verification on-chain. The same logic that makes Optimism and Arbitrum work for transactions can make AI inference trustless without melting your gas budget.
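To make the pattern concrete, here's a minimal Python sketch of the optimistic flavor of this idea: an operator posts a hash commitment to an inference result, anyone can challenge it during a dispute window, and it finalizes if unchallenged. Every name here is illustrative, not any real project's contract, and the dispute step is deliberately simplified:

```python
# A minimal sketch of the optimistic "compute off-chain, verify on-chain"
# pattern applied to inference. Names are illustrative, not a real API.

import hashlib
import time
from dataclasses import dataclass

CHALLENGE_WINDOW = 60 * 60  # seconds a claim stays open to disputes


@dataclass
class InferenceClaim:
    model_hash: str   # commitment to the exact model weights
    input_hash: str   # commitment to the prompt / input tensor
    output_hash: str  # operator's claimed result
    posted_at: float


class InferenceRollup:
    """Toy 'contract': stores commitments and deadlines, never runs models."""

    def __init__(self) -> None:
        self.claims: dict[int, InferenceClaim] = {}
        self.next_id = 0

    def post_claim(self, model_hash: str, input_hash: str,
                   output_hash: str) -> int:
        """Operator runs inference off-chain, posts only hashes on-chain."""
        claim_id = self.next_id
        self.claims[claim_id] = InferenceClaim(
            model_hash, input_hash, output_hash, time.time())
        self.next_id += 1
        return claim_id

    def challenge(self, claim_id: int, recomputed_output: bytes) -> bool:
        """Challenger re-ran the model off-chain and disputes the result.
        Real opML systems settle this with an interactive bisection game
        that narrows the dispute to one instruction checked on-chain; the
        bare hash comparison here just stands in for that machinery."""
        claim = self.claims[claim_id]
        if hashlib.sha256(recomputed_output).hexdigest() != claim.output_hash:
            del self.claims[claim_id]  # claim struck; a real system slashes
            return True
        return False

    def is_final(self, claim_id: int) -> bool:
        """Unchallenged claims become canonical once the window closes."""
        claim = self.claims.get(claim_id)
        return (claim is not None
                and time.time() - claim.posted_at > CHALLENGE_WINDOW)
```

The zkML flavor replaces the challenge window entirely: the operator posts a validity proof alongside the output hash, and the contract verifies the proof immediately instead of waiting for fraud challenges.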
The Core Problem: Blockchains Cannot Run AI
Let's be specific about why this matters.
A GPT-style language model runs billions of floating-point operations per inference. Even a small model like Llama 7B needs roughly 14 billion floating-point operations, about 7 billion multiply-accumulates, per generated token. An Ethereum transaction, even one consuming an entire block's gas, tops out at a few million simple arithmetic operations. The quick comparison below shows how wide that gap really is.
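Using the standard approximation of about 2 × parameter-count floating-point operations per generated token for decoder-only transformers, and the same arithmetic-only gas costs as before:

```python
# Rough scale comparison: per-token compute of a 7B model vs. the
# EVM's per-block arithmetic budget.

params = 7e9                  # Llama 7B parameter count
flops_per_token = 2 * params  # ~14 billion FLOPs per generated token
macs_per_token = flops_per_token / 2          # ~7 billion MACs

BLOCK_GAS_LIMIT = 30_000_000
GAS_PER_MAC = 8               # EVM MUL (5 gas) + ADD (3 gas), arithmetic only
macs_per_block = BLOCK_GAS_LIMIT / GAS_PER_MAC  # ~3.75 million

print(f"MACs per token:      {macs_per_token:,.0f}")
print(f"MACs per full block: {macs_per_block:,.0f}")
print(f"blocks per token:    {macs_per_token / macs_per_block:,.0f}")
# ~1,867 completely full Ethereum blocks for a single token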