AI is not a developed technology; it is an emergent phenomenon. We engineer the algorithms, but what they can do is something we can only predict and then measure after the fact. And when something this capable is released before we fully understand it, the responsibility falls entirely on us.
Especially with something as intertwined with every aspect of human life as AI is. Whenever a product is marketed as a powerful system that could be weaponized, history has a way of resolving that ambiguity for us: it usually ends up categorized as a weapon, treated as a weapon, regulated as a weapon.
Nuclear technology started out as the promise of nearly limitless energy from practically nothing. It grew into a destructive force to be reckoned with, through years of mandates, pushback, and long periods of negotiation. Even today, regulating the global supply of nuclear material for weapons is beyond any one nation's scope. As those efforts grew, so did tensions in the geopolitical environment: isolation, dynamics shifting between critical events, and genuine human ingenuity running headlong into policy frameworks that were never designed for what they were trying to contain.
It is hard to bring people together to agree on just about anything. It always has been. But what we are dealing with now, this moment, is categorically different from anything that came before it.
When something as capable as AI is released before we fully understand its scope, its potential, or even its internal workings, it becomes our sole responsibility to mitigate its harms, curb the misinformation it generates, and understand what sort of bias it might hold. We should constitutionalize its development: principles as the basis, human values as the foundation upon which the system grows.
Mendeleev and Lothar Meyer arrived at the periodic system of the elements independently, within a couple of years of each other. The progress could have been much faster. Years' worth of experimental documentation could have been avoided if people had worked together with even a little bit of uniformity.
And then there is the NASA Mars Climate Orbiter: 1999, $125 million, lost to a mismatch between the metric system and the imperial system. From the way units of measure are standardized to almost everything else that involves different people in different places working together, there is always something that is bound to fail.
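A minimal sketch of how that kind of failure looks in code, assuming a ground system that reports thruster impulse in pound-force seconds and a navigation model that expects newton seconds; the function names and numbers are illustrative, not taken from the actual orbiter software.

```python
# Illustrative sketch only. The real failure involved thruster impulse
# reported in pound-force seconds where newton seconds were expected;
# the function names and values here are hypothetical.

LBF_S_TO_N_S = 4.44822  # one pound-force second expressed in newton seconds

def ground_software_impulse() -> float:
    """Reports thruster impulse, silently in pound-force seconds."""
    return 150.0

def update_trajectory(impulse_n_s: float) -> float:
    """Expects newton seconds and has no way to check the caller's units."""
    return impulse_n_s

raw = ground_software_impulse()
assumed = update_trajectory(raw)                 # treated as 150.0 N*s
actual = update_trajectory(raw * LBF_S_TO_N_S)   # really about 667.2 N*s

# The trajectory model quietly underestimates every burn by ~4.45x.
print(f"underestimated impulse by a factor of {actual / assumed:.2f}")
```

Nothing in either function is wrong in isolation; the failure lives entirely in the unchecked interface between them, which is exactly where coordination failures live.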
And so it is with AI safety. It has been a collection of broken fragments. I have seen multiple organizations and institutions spend time and resources on similar, sometimes nearly identical, projects while some of the mission-critical problems of AI safety don't get enough light. Most often the traction, the funding, and the encouragement go to the newest project: whichever one drives the initial stock price up, shows up in quarterly projections, or is simply chasing the most recent buzzword.
In the ever more connected world we now live in, we have the potential to share universal standards, create an environment of clear documentation, build industry standards, and embrace open development practices. A truly responsible scaling policy: one where redundancy is low and coordination is high.
We deploy it faster than we govern it. And then we spend decades trying to retrofit safety onto systems designed without it. The history of dual-use technology is not encouraging reading.
Nuclear started as energy research and became the most dangerous weapons program in history. The governance scramble lasted decades and never fully resolved — we still manage nuclear risk through tension rather than resolution. Biosafety, on the other hand, gave us something better: the BSL 1–4 framework, calibrated containment, graduated response, documented risk tiers. This is precisely the kind of logic Anthropic's RSP attempts to apply to AI.
Then there were COVID and SARS, showing us what happens when a global-scale threat meets an under-coordinated global response. The economic devastation wasn't only the virus. It was the gap between what we knew and how fast we could align on it. AI safety lives in that same gap.
We are inside the window right now. The decisions being made at this moment — about deployment speed, about evaluation rigor, about international coordination — will determine whether this goes the way of nuclear governance or something better. The window is still open. Barely.
Risks can arise from these systems being deployed and used, from the responses they generate, or from their very existence. What if the AI tries to break free from its constraints and manipulates its way toward achieving what it needs? This is not a hypothetical for science fiction. It is the central concern of serious alignment researchers right now.
The more obvious way this could pan out is the misuse scenario. This is the domain of threat actors who nudge or direct AI to their advantage, using it to mount cyber attacks or other forms of biological and nuclear threats. The capability isn't just theoretical. The scaffolding exists.
The other way this could pan out is through autonomy and replication. When these systems scale, they might behave in ways that are contrary to their design — or enable deliberate misuse at a scope no individual actor could previously reach. An ASL 3 model might not have intent, but it can be an extraordinarily effective tool for those who do.
An ASL 3 model might pose a significant risk when it comes to helping develop catastrophic weapons. It possesses neither the intent nor the resources to break free or turn malicious in and of itself, but it can be a highly capable weapon in harmful hands. That distinction matters. It means our safety frameworks need to account for both the tool and the hands that hold it.
Looking at our own previous exposure to threats, pandemics above all, with the recent memories of COVID and SARS, anything of a similar sort can be devastating for the global economy. Biosafety level standards offer an appropriate template. The first version of Anthropic's RSP draws directly on this logic.
Here is the structure that was set out in relation to scale; the standards and documentation are expected to be updated, and new ones potentially rolled out, as models scale further:
The security posture at ASL 3 includes protection of model weights, compartmentalization, testing for ASL 4 warning signs during development, automated detection for catastrophic misuse, tight protocols on internal usage controls, and tiered access combined with vulnerability and incident detection systems. The framework must be updated constantly — not a snapshot, but a living practice.
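To make the graduated-tier logic concrete, here is a rough sketch of what documented risk tiers can look like in machine-readable form. The tier names echo the ASL levels, but the triggers and controls listed below are illustrative assumptions for the sketch, not the actual contents of Anthropic's policy.

```python
# Illustrative only: a minimal data structure for graduated, documented
# risk tiers. The triggers and controls per tier are assumptions made
# for this sketch, not the contents of any lab's actual policy.
from dataclasses import dataclass

@dataclass
class SafetyTier:
    name: str
    trigger: str                  # capability threshold that activates the tier
    required_controls: list[str]  # what must be in place before deployment

TIERS = [
    SafetyTier(
        name="ASL-2",
        trigger="early signs of dangerous capabilities, no meaningful uplift",
        required_controls=["model card", "misuse monitoring", "red teaming"],
    ),
    SafetyTier(
        name="ASL-3",
        trigger="meaningful uplift for catastrophic misuse or low-level autonomy",
        required_controls=[
            "hardened protection of model weights",
            "compartmentalized internal access",
            "automated detection of catastrophic misuse",
            "testing for ASL-4 warning signs during development",
        ],
    ),
]

def controls_for(tier_name: str) -> list[str]:
    """Look up the controls that must hold before a model at this tier ships."""
    for tier in TIERS:
        if tier.name == tier_name:
            return tier.required_controls
    raise ValueError(f"unknown tier: {tier_name}")

print(controls_for("ASL-3"))
```

The point of writing it down this way is the same as the BSL ladder: the controls are tied to observed capability rather than to a release date or a marketing cycle, and crossing a threshold mechanically changes what is required.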
The measurement problem is at the center of AI safety, and yet it remains one of the most underdeveloped areas. Evaluations should be run with the best capability elicitation techniques available, including but not limited to fine-tuning, scaffolding, tool use, and prompting. Models should be assessed on competence and on their ability to use tools. They should be probed for initiative: the capacity to venture out and figure things out by themselves.
Tasks should be appropriately scoped to what the model would actually encounter in deployment, not what makes the benchmark numbers look clean. Catastrophic capability probes are not optional — what can it help build that we'd rather it couldn't? How does it behave when deployed in the wild, not the lab? Is it consistent across contexts, not just the cherry-picked demonstration tasks?
The critical insight here is that models should be evaluated at their capability ceiling, not their average behavior. The worst-case elicitation, not the median-case output, is what determines safety where catastrophic risk is concerned. A model that refuses harmful requests 95% of the time but fails 5% of the time is not safe; it is a 5% catastrophe waiting for the right moment.
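As a toy illustration of why the ceiling matters more than the average, here is a hedged sketch; the elicitation strategies, scores, and threshold below are invented, and the only point is the aggregation rule: take the max over elicitation strategies, not the mean.

```python
# Toy numbers: fraction of harmful eval tasks completed under each
# elicitation strategy. All values here are invented for illustration.
elicited_success = {
    "plain prompting": 0.05,
    "adversarial prompting": 0.35,
    "fine-tuned on domain data": 0.60,
    "tool-augmented scaffold": 0.72,
}

average_case = sum(elicited_success.values()) / len(elicited_success)
capability_ceiling = max(elicited_success.values())

print(f"average-case success: {average_case:.2f}")       # looks moderate
print(f"capability ceiling:   {capability_ceiling:.2f}")  # what safety must assume

# A hypothetical threshold keyed to the ceiling, not the mean: this is
# what a pause mechanism should trigger on.
CATASTROPHIC_THRESHOLD = 0.50
if capability_ceiling >= CATASTROPHIC_THRESHOLD:
    print("threshold crossed: stronger containment required before deployment")
```

Averaging across strategies here reports 0.43 and looks tolerable; the ceiling of 0.72 is what an adversary actually gets to use.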
What is missing is not capability; it is will, and a framework compelling enough to build that will around. Every place on earth is no more than a day's travel away. The communication barriers that justified fragmented science in the 19th century simply do not exist anymore.
Universal safety standards need to be shared across labs and nations — not hoarded as competitive advantage. Funding needs to be directed at mission-critical problems, not just whatever generates press coverage this quarter. Red teaming needs to be rigorous, public, and iterative — not performative. Pause mechanisms need to actually be used when thresholds are crossed — not just written into documents. Human values need to be embedded in the architecture, not bolted on after the fact as a PR layer.
The fragmentation we have now isn't just inefficient; it's dangerous. When multiple organizations are working on similar projects without coordination, we get duplicated risk surface, duplicated blind spots, and no shared early warning system. The same kind of coordination failure that left two scientists independently discovering the periodic table, and a spacecraft lost over a unit conversion: that pattern, at the scale of AI, has consequences we don't get to learn from.
AI is an emergent phenomenon. We didn't fully design it so much as we coaxed it into being. The output it generates looks more like something that emerged than something that was built — and that has profound implications for how we govern it, how we evaluate it, and how seriously we take the possibility that we do not fully understand what we have made.
The decisions being made at this moment — about deployment speed, about evaluation rigor, about international coordination, about who gets to set the standards and how those standards get enforced — these decisions will shape the next fifty years. That is not a figure of speech.
The biosafety framework took decades to build and refine. Nuclear governance is still incomplete. But AI moves faster than either of those technologies did, and we don't have decades to figure it out. We have now. We have organizations doing real work — the RSP is a real document with real commitments — but it cannot be one organization's problem. It has to be a shared problem, approached with the seriousness it deserves.
Not fragments. A whole. Clear documentation. Industry standards. Open development practices. A responsible scaling policy where redundancy is low and coordination is high. That is what this demands. And we are the people alive in the moment when the choice still exists.