On June 3rd at our FutureTech Meetup, Volodymyr Kuleshov, co-founder of Inception Labs and researcher at Stanford AI Lab, unveiled a revolutionary development — the first commercial diffusion-based language model (dLLM).
A team led by this Ukrainian innovator developed Mercury, a model that marks a genuine breakthrough in language modeling.

From Noise to Code: How Diffusion Works
Traditional language models operate on an autoregressive principle — generating text sequentially, word by word, from left to right. It’s similar to how we write by hand: first one letter, then the next.
The Inception Labs team proposed a radically different approach: their model uses diffusion — the same principle underlying image generators like Midjourney, DALL-E, and Sora.
Here’s how it works:
- The model starts with random tokens, a kind of "noise"
- It updates all tokens in parallel rather than one at a time
- Each iteration refines the result, "reducing the noise"
- The model can correct earlier mistakes during generation
Volodymyr demonstrated this with a concrete example: when asked "What open mathematical problem did Andrew Wiles solve in 1994?", the model first generated the approximate answer "Fermat's Final Theorem," then corrected it in a single step to the precise "Fermat's Last Theorem."
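The refinement loop above can be sketched in a few lines. This is a toy illustration, not Mercury's actual algorithm: the "model prediction" is stood in by the known target answer, and the masking scheme and number of steps are assumptions chosen for clarity.

```python
import random

MASK = "<mask>"

def toy_denoise(tokens, target, strength):
    # One parallel refinement step: every position is reconsidered at once,
    # so a token written in an earlier step can still be corrected later.
    # `target` stands in for the model's current best prediction.
    return [tgt if random.random() < strength else tok
            for tok, tgt in zip(tokens, target)]

def diffusion_generate(target, steps=8, seed=0):
    # Start from pure "noise" (all masked positions) and refine in parallel;
    # later steps denoise more aggressively (strength grows toward 1.0).
    random.seed(seed)
    tokens = [MASK] * len(target)
    for t in range(1, steps + 1):
        tokens = toy_denoise(tokens, target, strength=t / steps)
    return tokens

answer = "fermat 's last theorem".split()
print(" ".join(diffusion_generate(answer)))
```

Note the contrast with autoregressive decoding: there, each loop iteration would emit exactly one new token and could never revisit earlier ones; here, every iteration touches the whole sequence.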
Speed as Competitive Advantage
Mercury’s key advantage is incredible speed. The model generates up to 1,000 tokens per second — 10 times faster than traditional LLMs.
According to independent testing by EleutherAI, Mercury achieves speeds that were previously only possible with specialized Cerebras chips.
“By changing our approach to language modeling, we can use a better algorithm and achieve results that were previously only available with specialized chips,” explains Volodymyr.
But the Inception Labs team achieved these results on standard NVIDIA GPUs.
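The speed gap has a simple structural explanation: an autoregressive model needs one forward pass per generated token, while a diffusion model needs a roughly fixed number of refinement passes regardless of output length. The pass counts below are illustrative assumptions, not Mercury's real internals:

```python
def autoregressive_passes(n_tokens):
    # One forward pass per generated token: cost grows with output length.
    return n_tokens

def diffusion_passes(n_tokens, refinement_steps=10):
    # An assumed fixed number of parallel passes over the whole sequence,
    # independent of how many tokens are produced.
    return refinement_steps

for n in (100, 1000):
    print(f"{n} tokens: {autoregressive_passes(n)} sequential passes "
          f"vs {diffusion_passes(n)} parallel passes")
```

Each parallel pass does more work per step, but GPUs are far better at wide parallel work than at long sequential chains, which is where the throughput advantage comes from.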
Revolution in Programming
Mercury shows particularly impressive results in programming.
Volodymyr highlighted several key use cases where their model can be most valuable:
Code autocompletion: In development, response speed is critically important. If an IDE takes more than about 400 milliseconds to show a suggestion, programmers lose concentration. Mercury responds effectively instantly, dramatically improving the development experience.
Agent systems: AI agents that generate and modify code can now complete tasks in seconds instead of minutes. This opens new possibilities for automating complex development tasks.
IDE integrations: Features like NextEdit and ApplyEdit, which predict the user’s next actions, become much more effective thanks to high processing speed.
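The 400-millisecond budget mentioned above translates directly into how long a completion can be. A back-of-the-envelope calculation, where the 100 tokens/s autoregressive baseline is an assumed round number and 1,000 tokens/s is the figure cited for Mercury:

```python
BUDGET_S = 0.400  # the ~400 ms responsiveness threshold mentioned above

def max_tokens_within_budget(tokens_per_second, budget_s=BUDGET_S):
    # Longest completion that can be generated before the IDE feels laggy.
    return int(tokens_per_second * budget_s)

print(max_tokens_within_budget(100))   # assumed autoregressive baseline
print(max_tokens_within_budget(1000))  # throughput cited for Mercury
```

At the assumed baseline, the budget only covers a short snippet; at Mercury's cited throughput, a whole function-sized completion fits inside the same responsiveness window.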
Market Recognition
Mercury underwent serious testing in Copilot Arena — an independent benchmark where users choose the better model in “blind” testing. The results are impressive: Mercury achieved the highest score, even compared to GPT-4o, and became #1 in speed.
“We received tremendous positive feedback on social media and in the tech press,” says Volodymyr.
Scalability and Efficiency
Mercury’s high speed provides several important advantages:
- Ability to serve more users simultaneously
- Faster response for decision support systems
- Ability to generate longer, higher-quality code fragments in the same amount of time
What’s Next?
Mercury is already available for testing on the Inception Labs website — both through web interface and API. The team is actively collecting user feedback and planning to release new models in the coming months.
Volodymyr encourages the Ukrainian tech community to try Mercury:
“If you see something interesting, let us know. We believe that diffusion modeling is the future of AI.”
The Mercury story once again demonstrates how developers from Ukraine are creating breakthrough technologies at the global level.