In sprawling stretches of farmland and industrial parks, supersized buildings packed with racks of computers are springing up to fuel the AI race. These engineering marvels are a new species of infrastructure: supercomputers designed to train and run large language models at mind-bending scale, complete with their own specialized chips, cooling systems, and even energy…
Etiket: OpenAI
-
The Download: the case for AI slop, and helping CRISPR fulfill its promise
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. How I learned to stop worrying and love AI slop —Caiwei Chen If I were to locate the moment AI slop broke through into popular consciousness, I’d pick the video of rabbits bouncing…
-
Using unstructured data to fuel enterprise AI success
Enterprises are sitting on vast quantities of unstructured data, from call records and video footage to customer complaint histories and supply chain signals. Yet this invaluable business intelligence, estimated to make up as much as 90% of the data generated by organizations, historically remained dormant because its unstructured nature makes analysis extremely difficult. But if…
-
The Download: mimicking pregnancy’s first moments in a lab, and AI parameters explained
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Researchers are getting organoids pregnant with human embryos At first glance, it looks like the start of a human pregnancy: A ball-shaped embryo presses into the lining of the uterus then grips tight,…
-
Nous Research’s NousCoder-14B is an open-source coding model landing right in the Claude Code moment
Nous Research, the open-source artificial intelligence startup backed by crypto venture firm Paradigm, released a new competitive programming model on Monday that it says matches or exceeds several larger proprietary systems — trained in just four days using 48 of Nvidia's latest B200 graphics processors.
The model, called NousCoder-14B, is another entry in a crowded field of AI coding assistants, but arrives at a particularly charged moment: Claude Code, the agentic programming tool from rival Anthropic, has dominated social media discussion since New Year's Day, with developers posting breathless testimonials about its capabilities. The simultaneous developments underscore how quickly AI-assisted software development is evolving — and how fiercely companies large and small are competing to capture what many believe will become a foundational technology for how software gets written.
type: embedded-entry-inline id: 74cSyrq6OUrp9SEQ5zOUSl
NousCoder-14B achieves a 67.87 percent accuracy rate on LiveCodeBench v6, a standardized evaluation that tests models on competitive programming problems published between August 2024 and May 2025. That figure represents a 7.08 percentage point improvement over the base model it was trained from, Alibaba's Qwen3-14B, according to Nous Research's technical report published alongside the release.
"I gave Claude Code a description of the problem, it generated what we built last year in an hour," wrote Jaana Dogan, a principal engineer at Google responsible for the Gemini API, in a viral post on X last week that captured the prevailing mood around AI coding tools. Dogan was describing a distributed agent orchestration system her team had spent a year developing — a system Claude Code approximated from a three-paragraph prompt.
The juxtaposition is instructive: while Anthropic's Claude Code has captured imaginations with demonstrations of end-to-end software development, Nous Research is betting that open-source alternatives trained on verifiable problems can close the gap — and that transparency in how these models are built matters as much as raw capability.
How Nous Research built an AI coding model that anyone can replicate
What distinguishes the NousCoder-14B release from many competitor announcements is its radical openness. Nous Research published not just the model weights but the complete reinforcement learning environment, benchmark suite, and training harness — built on the company's Atropos framework — enabling any researcher with sufficient compute to reproduce or extend the work.
"Open-sourcing the Atropos stack provides the necessary infrastructure for reproducible olympiad-level reasoning research," noted one observer on X, summarizing the significance for the academic and open-source communities.
The model was trained by Joe Li, a researcher in residence at Nous Research and a former competitive programmer himself. Li's technical report reveals an unexpectedly personal dimension: he compared the model's improvement trajectory to his own journey on Codeforces, the competitive programming platform where participants earn ratings based on contest performance.
Based on rough estimates mapping LiveCodeBench scores to Codeforces ratings, Li calculated that NousCoder-14B's improvemen t— from approximately the 1600-1750 rating range to 2100-2200 — mirrors a leap that took him nearly two years of sustained practice between ages 14 and 16. The model accomplished the equivalent in four days.
"Watching that final training run unfold was quite a surreal experience," Li wrote in the technical report.
But Li was quick to note an important caveat that speaks to broader questions about AI efficiency: he solved roughly 1,000 problems during those two years, while the model required 24,000. Humans, at least for now, remain dramatically more sample-efficient learners.
Inside the reinforcement learning system that trains on 24,000 competitive programming problems
NousCoder-14B's training process offers a window into the increasingly sophisticated techniques researchers use to improve AI reasoning capabilities through reinforcement learning.
The approach relies on what researchers call "verifiable rewards" — a system where the model generates code solutions, those solutions are executed against test cases, and the model receives a simple binary signal: correct or incorrect. This feedback loop, while conceptually straightforward, requires significant infrastructure to execute at scale.
Nous Research used Modal, a cloud computing platform, to run sandboxed code execution in parallel. Each of the 24,000 training problems contains hundreds of test cases on average, and the system must verify that generated code produces correct outputs within time and memory constraints — 15 seconds and 4 gigabytes, respectively.
The training employed a technique called DAPO (Dynamic Sampling Policy Optimization), which the researchers found performed slightly better than alternatives in their experiments. A key innovation involves "dynamic sampling" — discarding training examples where the model either solves all attempts or fails all attempts, since these provide no useful gradient signal for learning.
The researchers also adopted "iterative context extension," first training the model with a 32,000-token context window before expanding to 40,000 tokens. During evaluation, extending the context further to approximately 80,000 tokens produced the best results, with accuracy reaching 67.87 percent.
Perhaps most significantly, the training pipeline overlaps inference and verification — as soon as the model generates a solution, it begins work on the next problem while the previous solution is being checked. This pipelining, combined with asynchronous training where multiple model instances work in parallel, maximizes hardware utilization on expensive GPU clusters.
The looming data shortage that could slow AI coding model progress
Buried in Li's technical report is a finding with significant implications for the future of AI development: the training dataset for NousCoder-14B encompasses "a significant portion of all readily available, verifiable competitive programming problems in a standardized dataset format."
In other words, for this particular domain, the researchers are approaching the limits of high-quality training data.
"The total number of competitive programming problems on the Internet is roughly the same order of magnitude," Li wrote, referring to the 24,000 problems used for training. "This suggests that within the competitive programming domain, we have approached the limits of high-quality data."
This observation echoes growing concern across the AI industry about data constraints. While compute continues to scale according to well-understood economic and engineering principles, training data is "increasingly finite," as Li put it.
"It appears that some of the most important research that needs to be done in the future will be in the areas of synthetic data generation and data efficient algorithms and architectures," he concluded.
The challenge is particularly acute for competitive programming because the domain requires problems with known correct solutions that can be verified automatically. Unlike natural language tasks where human evaluation or proxy metrics suffice, code either works or it doesn't — making synthetic data generation considerably more difficult.
Li identified one potential avenue: training models not just to solve problems but to generate solvable problems, enabling a form of self-play similar to techniques that proved successful in game-playing AI systems. "Once synthetic problem generation is solved, self-play becomes a very interesting direction," he wrote.
A $65 million bet that open-source AI can compete with Big Tech
Nous Research has carved out a distinctive position in the AI landscape: a company committed to open-source releases that compete with — and sometimes exceed — proprietary alternatives.
The company raised $50 million in April 2025 in a round led by Paradigm, the cryptocurrency-focused venture firm founded by Coinbase co-founder Fred Ehrsam. Total funding reached $65 million, according to some reports. The investment reflected growing interest in decentralized approaches to AI training, an area where Nous Research has developed its Psyche platform.
Previous releases include Hermes 4, a family of models that we reported "outperform ChatGPT without content restrictions," and DeepHermes-3, which the company described as the first "toggle-on reasoning model" — allowing users to activate extended thinking capabilities on demand.
The company has cultivated a distinctive aesthetic and community, prompting some skepticism about whether style might overshadow substance. "Ofc i'm gonna believe an anime pfp company. stop benchmarkmaxxing ffs," wrote one critic on X, referring to Nous Research's anime-style branding and the industry practice of optimizing for benchmark performance.
Others raised technical questions. "Based on the benchmark, Nemotron is better," noted one commenter, referring to Nvidia's family of language models. Another asked whether NousCoder-14B is "agentic focused or just 'one shot' coding" — a distinction that matters for practical software development, where iterating on feedback typically produces better results than single attempts.
What researchers say must happen next for AI coding tools to keep improving
The release includes several directions for future work that hint at where AI coding research may be heading.
Multi-turn reinforcement learning tops the list. Currently, the model receives only a final binary reward — pass or fail — after generating a solution. But competitive programming problems typically include public test cases that provide intermediate feedback: compilation errors, incorrect outputs, time limit violations. Training models to incorporate this feedback across multiple attempts could significantly improve performance.
Controlling response length also remains a challenge. The researchers found that incorrect solutions tended to be longer than correct ones, and response lengths quickly saturated available context windows during training — a pattern that various algorithmic modifications failed to resolve.
Perhaps most ambitiously, Li proposed "problem generation and self-play" — training models to both solve and create programming problems. This would address the data scarcity problem directly by enabling models to generate their own training curricula.
"Humans are great at generating interesting and useful problems for other competitive programmers, but it appears that there still exists a significant gap in LLM capabilities in creative problem generation," Li wrote.
The model is available now on Hugging Face under an Apache 2.0 license. For researchers and developers who want to build on the work, Nous Research has published the complete Atropos training stack alongside it.
What took Li two years of adolescent dedication to achieve—climbing from a 1600-level novice to a 2100-rated competitor on Codeforces—an AI replicated in 96 hours. He needed 1,000 problems. The model needed 24,000. But soon enough, these systems may learn to write their own problems, teach themselves, and leave human benchmarks behind entirely.
The question is no longer whether machines can learn to code. It's whether they'll soon be better teachers than we ever were.
-
Deploying a hybrid approach to Web3 in the AI era
When the concept of “Web 3.0” first emerged about a decade ago the idea was clear: Create a more user-controlled internet that lets you do everything you can now, except without servers or intermediaries to manage the flow of information. Where Web2, which emerged in the early 2000s, relies on centralized systems to store data…
-
The overlooked driver of digital transformation
When business leaders talk about digital transformation, their focus often jumps straight to cloud platforms, AI tools, or collaboration software. Yet, one of the most fundamental enablers of how organizations now work, and how employees experience that work, is often overlooked: audio. As Genevieve Juillard, CEO of IDC, notes, the shift to hybrid collaboration made…
-
The creator of Claude Code just revealed his workflow, and developers are losing their minds
When the creator of the world's most advanced coding agent speaks, Silicon Valley doesn't just listen — it takes notes.
For the past week, the engineering community has been dissecting a thread on X from Boris Cherny, the creator and head of Claude Code at Anthropic. What began as a casual sharing of his personal terminal setup has spiraled into a viral manifesto on the future of software development, with industry insiders calling it a watershed moment for the startup.
"If you're not reading the Claude Code best practices straight from its creator, you're behind as a programmer," wrote Jeff Tang, a prominent voice in the developer community. Kyle McNease, another industry observer, went further, declaring that with Cherny's "game-changing updates," Anthropic is "on fire," potentially facing "their ChatGPT moment."
The excitement stems from a paradox: Cherny's workflow is surprisingly simple, yet it allows a single human to operate with the output capacity of a small engineering department. As one user noted on X after implementing Cherny's setup, the experience "feels more like Starcraft" than traditional coding — a shift from typing syntax to commanding autonomous units.
Here is an analysis of the workflow that is reshaping how software gets built, straight from the architect himself.
How running five AI agents at once turns coding into a real-time strategy game
The most striking revelation from Cherny's disclosure is that he does not code in a linear fashion. In the traditional "inner loop" of development, a programmer writes a function, tests it, and moves to the next. Cherny, however, acts as a fleet commander.
"I run 5 Claudes in parallel in my terminal," Cherny wrote. "I number my tabs 1-5, and use system notifications to know when a Claude needs input."
By utilizing iTerm2 system notifications, Cherny effectively manages five simultaneous work streams. While one agent runs a test suite, another refactors a legacy module, and a third drafts documentation. He also runs "5-10 Claudes on claude.ai" in his browser, using a "teleport" command to hand off sessions between the web and his local machine.
This validates the "do more with less" strategy articulated by Anthropic President Daniela Amodei earlier this week. While competitors like OpenAI pursue trillion-dollar infrastructure build-outs, Anthropic is proving that superior orchestration of existing models can yield exponential productivity gains.
The counterintuitive case for choosing the slowest, smartest model
In a surprising move for an industry obsessed with latency, Cherny revealed that he exclusively uses Anthropic's heaviest, slowest model: Opus 4.5.
"I use Opus 4.5 with thinking for everything," Cherny explained. "It's the best coding model I've ever used, and even though it's bigger & slower than Sonnet, since you have to steer it less and it's better at tool use, it is almost always faster than using a smaller model in the end."
For enterprise technology leaders, this is a critical insight. The bottleneck in modern AI development isn't the generation speed of the token; it is the human time spent correcting the AI's mistakes. Cherny's workflow suggests that paying the "compute tax" for a smarter model upfront eliminates the "correction tax" later.
One shared file turns every AI mistake into a permanent lesson
Cherny also detailed how his team solves the problem of AI amnesia. Standard large language models do not "remember" a company's specific coding style or architectural decisions from one session to the next.
To address this, Cherny's team maintains a single file named CLAUDE.md in their git repository. "Anytime we see Claude do something incorrectly we add it to the CLAUDE.md, so Claude knows not to do it next time," he wrote.
This practice transforms the codebase into a self-correcting organism. When a human developer reviews a pull request and spots an error, they don't just fix the code; they tag the AI to update its own instructions. "Every mistake becomes a rule," noted Aakash Gupta, a product leader analyzing the thread. The longer the team works together, the smarter the agent becomes.
Slash commands and subagents automate the most tedious parts of development
The "vanilla" workflow one observer praised is powered by rigorous automation of repetitive tasks. Cherny uses slash commands — custom shortcuts checked into the project's repository — to handle complex operations with a single keystroke.
He highlighted a command called /commit-push-pr, which he invokes dozens of times daily. Instead of manually typing git commands, writing a commit message, and opening a pull request, the agent handles the bureaucracy of version control autonomously.
Cherny also deploys subagents — specialized AI personas — to handle specific phases of the development lifecycle. He uses a code-simplifier to clean up architecture after the main work is done and a verify-app agent to run end-to-end tests before anything ships.
Why verification loops are the real unlock for AI-generated code
If there is a single reason Claude Code has reportedly hit $1 billion in annual recurring revenue so quickly, it is likely the verification loop. The AI is not just a text generator; it is a tester.
"Claude tests every single change I land to claude.ai/code using the Claude Chrome extension," Cherny wrote. "It opens a browser, tests the UI, and iterates until the code works and the UX feels good."
He argues that giving the AI a way to verify its own work — whether through browser automation, running bash commands, or executing test suites — improves the quality of the final result by "2-3x." The agent doesn't just write code; it proves the code works.
What Cherny's workflow signals about the future of software engineering
The reaction to Cherny's thread suggests a pivotal shift in how developers think about their craft. For years, "AI coding" meant an autocomplete function in a text editor — a faster way to type. Cherny has demonstrated that it can now function as an operating system for labor itself.
"Read this if you're already an engineer… and want more power," Jeff Tang summarized on X.
The tools to multiply human output by a factor of five are already here. They require only a willingness to stop thinking of AI as an assistant and start treating it as a workforce. The programmers who make that mental leap first won't just be more productive. They'll be playing an entirely different game — and everyone else will still be typing.
-
The Download: Kenya’s Great Carbon Valley, and the AI terms that were everywhere in 2025
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Welcome to Kenya’s Great Carbon Valley: a bold new gamble to fight climate change In June last year, startup Octavia Carbon began running a high-stakes test in the small town of Gilgil in…
-
What’s next for AI in 2026
MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here. In an industry in constant flux, sticking your neck out to predict what’s coming next may seem reckless. (AI bubble? What AI bubble?) But for the…