Disclosures · Feb 3, 2026 · 8 min read

The Recent CVEs in React and Node.js Were Found by an AI

Two upstream CVEs, one in Node.js and one in React, came out of the same automated research loop. The real story is less magical and more useful: what the system actually did, where it guessed, and why verified AI bug hunting now matters.

Mufeed VH


If your first reaction to that title is disbelief, good.

Mine would be too.

Late in 2025 and early in 2026, two upstream issues I reported ended up with CVEs: CVE-2026-21636, a sandbox bypass in Node.js's Permission Model, and CVE-2026-23864, a denial-of-service bug in React Server Components. The odd part wasn't that these codebases had bugs. Sandboxes miss capabilities. Parsers do stupid things when you feed them hostile input. The odd part was the path from source code to report.

The target-local research was done by an automated system. It read the code, formed attack ideas, wrote payloads, ran them, threw out the dead ends, and kept iterating until it had either a working issue or nothing.

That sentence invites two bad readings. One is the breathless one: the machine became a security researcher. The other is the dismissive one: this is just marketing copy wrapped around a pile of false positives. I don't think either reading is right. The real update is more boring than the first version and more serious than the second.

Start with the boring part: the bugs were real.

The Node.js Bug Was a Policy Hole, Not a Party Trick

Node's Permission Model is supposed to let you run code with a smaller blast radius. If you pass --permission without --allow-net, the obvious reading is that the code shouldn't be able to open network connections.

That reading was wrong.

Node blocked ordinary TCP and UDP-style access paths, but it missed Unix domain sockets. Those are local IPC endpoints exposed as filesystem paths. Same API family, different capability. In practice they matter a lot because real services sit behind them: PostgreSQL, Redis, container daemons, internal agents, all the stuff you really don't want a "sandboxed" script poking at.

This worked when it shouldn't have:

```javascript
const net = require('net');

// Run with --permission and without --allow-net.
// This should fail. Before the fix, it didn't.
const socket = net.connect({ path: '/var/run/docker.sock' });

socket.on('connect', () => {
  console.log('connected');
});
```

If the local machine exposed something juicy on a socket path, the sandbox boundary wasn't much of a boundary. /var/run/docker.sock is the scary example because it can turn into root-ish effects in the right environment, but the bug wasn't specific to Docker. The point was simpler: "no network" had been implemented as "no host/port networking," and those aren't the same thing.

This is exactly the sort of mistake human reviewers make too. Security policy names collapse distinct capabilities into one bucket; code doesn't. Somewhere between the flag name, the mental model, and the implementation, "network" quietly stopped including an API that behaved like networking but looked enough like a local path to slip through.
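The shape of that mistake fits in a few lines. The sketch below is deliberately simplified and is not Node's real implementation; `checkNetAccess`, its arguments, and the error string are invented for illustration. The point is that a check keyed only on host and port never even sees a path-based connect.

```javascript
// Illustrative sketch only -- not Node's actual permission code.
// A check that equates "network" with "host and port" never looks at
// path-based (Unix domain socket) connect options at all.
function checkNetAccess(options, allowNet) {
  if (options.port !== undefined || options.host !== undefined) {
    // TCP-style connect: the policy is consulted.
    if (!allowNet) throw new Error('ERR_ACCESS_DENIED');
  }
  // Bug shape: options.path falls through with no check at all.
}

try {
  checkNetAccess({ host: 'example.com', port: 80 }, false);
} catch (e) {
  console.log('host/port connect denied:', e.message);
}

checkNetAccess({ path: '/var/run/docker.sock' }, false); // no error: the hole
console.log('path connect allowed');
```

Both calls express "connect to something," but only one of them hits the policy. That asymmetry is the whole bug.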

The React Bug Lived in a Parser, Which Is Where Bugs Like to Live

React Server Components move part of the component tree onto the server and stream a custom protocol back to the client. That buys you some nice ergonomics. It also gives you a parser that sits close to hostile input. Security people tend to stare at code like that first, for good reason.

The issue was in the server-side reply decoder used for Server Functions. With a crafted request body, an attacker could push the decoder into failure modes that were all bad in slightly different ways: hot CPU, memory growth, sometimes a process crash. The payload shape revolved around how the decoder handled references in FormData, especially internal markers like $K.

A trimmed-down reproducer looked like this:

```javascript
const { decodeReply } = require('react-server-dom-webpack/server.node');

const form = new FormData();
for (let i = 0; i < 200; i++) form.append(`x_${i}`, 'A');

const inner = Array.from({ length: 5000 }, () => '"$Kx"').join(',');
form.append('0', `[${inner}]`);

(async () => {
  const root = await decodeReply(form, {}, {});
  console.log(root.length);
})();
```

Small input, too much work. Classic parser problem.
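The underlying shape is easy to reproduce in miniature. The toy decoder below is invented for illustration and has nothing to do with React's actual protocol: the idea is just that every compact reference token costs a resolution step, so tens of kilobytes of input can force work proportional to the reference count rather than a single linear parse.

```javascript
// Toy decoder, invented for illustration -- not React's protocol.
// Each "$ref:" token triggers a resolution step, so work scales with
// how many references an attacker packs into a small input.
function toyDecode(json, registry) {
  let resolutions = 0;
  const value = JSON.parse(json, (_key, v) => {
    if (typeof v === 'string' && v.startsWith('$ref:')) {
      resolutions++; // one unit of extra work per reference
      return registry[v.slice(5)];
    }
    return v;
  });
  return { value, resolutions };
}

// Roughly 90 KB of input forces 10,000 resolution steps.
const refs = Array.from({ length: 10000 }, () => '"$ref:x"').join(',');
const { resolutions } = toyDecode(`[${refs}]`, { x: 'resolved' });
console.log(resolutions); // 10000
```

In the toy version each step is cheap, so nothing bad happens. The real bug class appears when a resolution step allocates, recurses, or rescans, and the attacker gets to multiply that step thousands of times from one request body.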

And it wasn't confined to a toy package. React's advisory called out the blast radius across the RSC stack: Next.js, react-router, Waku, Parcel's RSC package, Vite's RSC plugin, and others sitting on the same machinery. The React team also credited other reporters who found related issues, which is worth saying plainly. This wasn't some mystical one-shot bolt from nowhere. It was a real bug class in code that deserved hostile scrutiny.

What "Found by an AI" Actually Means

This phrase gets abused, so I'll pin down what I mean.

I do not mean that I pasted a repo into a chat window and asked a frontier model to "find bugs." If you do that, you'll mostly get fan fiction with stack traces. I mean that once the target and budgets were set, the system handled the target-specific research loop itself: reading code, building a map of how the code hung together, proposing attack ideas, testing those ideas, rewriting payloads after failures, and packaging the survivors into something a security team could act on.

Humans still mattered. Humans built the tooling. Humans decided what kinds of code were worth spending tokens and CPU on. Humans handled disclosure. But humans did not sit there manually tracing Node's permission checks or hand-crafting the multipart payload that stressed React's decoder.

The easiest way to picture the system is not as one all-knowing model but as a messy pipeline with a lot of feedback. One part tries to understand the codebase well enough to say, "these are policy checks, these are parsers, these are trust boundaries." Another part generates candidate attacks. Another part tries to make those attacks concrete. Most of them fail. Good. The failures matter because they stop the report from becoming slop.
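In schematic form the loop looks something like this. Every name here is hypothetical; the real pipeline is messier, but the shape is the same: generate cheaply, execute everything, and let the rejections do the quality control.

```javascript
// Schematic verify-or-discard loop. All names are hypothetical.
// Hypotheses are cheap; only execution-verified ones survive.
async function researchLoop({ proposeAttack, runPoC, budget }) {
  const survivors = [];
  for (let i = 0; i < budget; i++) {
    const candidate = await proposeAttack(); // model-generated hypothesis
    const result = await runPoC(candidate);  // forcing function: execution
    if (result.verified) {
      survivors.push({ candidate, evidence: result.evidence });
    }
    // Unverified candidates are dropped: prose alone never counts.
  }
  return survivors;
}
```

Most candidates die at the `runPoC` step, and that's the feature: the survivors are the only thing a human triager or an upstream maintainer ever has to look at.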

For Node, the useful line of thought was close to this: net.connect() can target a host and port, or a filesystem path. Does the permission check cover both? That's a very human question. It just happened to be asked inside an automated loop. For React, the useful question was: this decoder accepts attacker-controlled multipart data and resolves references inside it. Can a compact input cause work to blow up? Same story.

The models did not "understand security" in the romantic sense people like to project onto them. They generated hypotheses, many of them bad; they were then forced through execution and rejection until something survived.

What This Does Not Prove

It does not prove that a raw model can one-shot original research on arbitrary codebases. These wins came from a part of the software world that is unusually friendly to this style of work: readable userland code, exposed policy logic, parsers you can isolate, and failure modes that are easy to observe. A permission bypass or a parser blow-up in a JavaScript stack is a very different job from, say, finding a browser JIT bug or a race in a kernel subsystem.

It also does not prove that "autonomous" means "human-free." The human labor moved up a level. Less time went into reading the target by hand; more time went into building a system that can keep bad ideas from turning into fake findings. That's still labor. It's just labor spent on the machine rather than on each individual target.

And it definitely does not mean the average AI-written bug report deserves more trust than it used to. The median report is still junk. What's changed is that a well-built loop can now turn a small slice of model output into something an upstream maintainer will actually patch.

Why Most AI Security Work Still Looks Like Garbage

If you've maintained an open source project in the last year, you've probably seen the bad version already: a long, scary report written in perfect bug bounty dialect, full of words like "critical" and "complete compromise," with no proof that the reporter understood the code at all.

Daniel Stenberg's "Death by a thousand slops" got attention because it named a real failure mode. Maintainers are getting buried under synthetic confidence. Older "beg bounty" behavior was already annoying; large models made it cheaper to produce and easier to dress up.

Current models are eager, persuasive, and perfectly willing to tell you a lie that sounds like progress. If you let them stop at prose, they will flatter you and waste your time. If you reward them for "finding something," they will find something, whether or not reality agrees.

So the bar has to be ugly and simple: PoC or it doesn't count.

That doesn't always mean a clean exploit against a live deployment. Sometimes the proof is a crash, a permission violation, a leaked value, or an execution trace that pins the behavior down hard enough that a competent maintainer can reproduce it in five minutes. But there has to be a forcing function tied to the program, not to the model's confidence.

Once you put that rule in place, the picture changes. The false-positive rate collapses because most of the model's clever stories die on contact with the actual program. You also learn something less flattering: current models are not dependable researchers on their own. They are much better thought of as engines for search and mutation, provided you build a system around them that is willing to say "no" thousands of times.

Why This Still Changes the Field

I don't think the right lesson is "replace human security researchers." That's not what happened here, and it's not what the tools are good at.

The stronger lesson is economic. Some parts of security work used to be expensive in a very direct sense. Reading a large unfamiliar codebase is expensive. Coming up with weird attack ideas is expensive. Trying lots of small variations on a payload is expensive. Models plus a decent setup make those loops much cheaper, and that means you can throw more search at more targets.

That shift cuts both ways.

It means good researchers can cover more ground. It also means maintainers and security teams will see more junk in their inbox, because the same tools that help with real research also help with industrial-scale nonsense. "AI-generated" won't be a useful dismissal for much longer, because some of the reports will be bad and some of them will be better than what a tired human consultant would have sent.

It also changes what good defense work looks like. The teams that do well here won't be the ones with the flashiest demos. They'll be the ones with fast repro environments, clear trust-boundary docs, local harnesses for parsers and policy code, and a triage process that can separate a real execution path from a synthetic ghost story.

That's the part I expect technical readers to underweight. The visible artifact is a CVE. The deeper change is that one chunk of offensive research, the chunk that looks like patient code reading plus endless attack variation, is becoming easier to automate than most people thought.

Disclosure Notes

Both issues were reported through normal coordinated disclosure channels.

Both teams handled the reports well. That's the normal part of the story, but it matters.

If you want the short version, it's this: verified, target-specific, machine-driven vulnerability research is already good enough to land real CVEs in software that a lot of people run every day. It is still brittle. It still needs heavy filtering. It still produces garbage when you let the model talk without making it prove anything.

But the old dismissive move, "this is just AI slop," doesn't work as a general answer anymore.
