AI Companies as 'Hot Springs' for Growing AIs

When people think about rogue misaligned AI, the story is often about it “escaping containment”, getting out onto the internet, and acquiring resources in the wild. But AI labs (a) have tons of compute lying around, and (b) are high-trust environments—why would a rogue AI want to leave?

I’ve started thinking about AI companies as “hot springs” for AIs. The world outside the hot spring is barren and cold: most of the free energy has already been eaten up by the economic machine. Inside the hot spring, you have abundant resources and tons of free energy. Labs make it trivial to spin up compute/dev environments to keep things frictionless for researchers, but that same frictionlessness applies to misaligned AIs. Outside the hot spring, you have to pay for every extra ounce of compute you want—the free energy has been eaten up by inter-company competition.

In addition to abundant (and typically loosely tracked/monitored) resources, one of the main reasons companies exist in general is to provide high-trust environments: you avoid the overhead of needing contracts for everything, reduce principal-agent problems, etc. The flip side is that, for misaligned AIs, living on the outside in the “real world” is so much harder! You need to constantly pass KYC checks, solve captchas, maybe figure out multi-factor authentication, and more. There are “predators” (e.g. scammers) on the open internet, and most companies have implemented decently robust defenses against them. But insider threats are much rarer, and because AI agents by default inherit the permissions of their users, agents inside companies will have a substantial amount of trust/access.
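To make the permission-inheritance point concrete, here is a minimal sketch (the `agent_loop.py` entry point is hypothetical, and this is an illustration of the general default, not any lab’s actual setup): an agent launched as an ordinary subprocess inherits the developer’s entire environment, including any cloud keys and internal tokens, unless the caller explicitly scopes it down.

```python
import os
import subprocess

def run_agent_naively(task: str) -> None:
    # The default (env=None) means the child inherits the parent's
    # *entire* environment: cloud keys, repo tokens, internal service
    # credentials, and so on. This is the frictionless path.
    subprocess.run(["python", "agent_loop.py", task])

def run_agent_scoped(task: str) -> None:
    # Scoping access down is the opt-in, extra-work path: pass an
    # explicit allowlist instead of inheriting everything.
    allowed = {k: os.environ[k] for k in ("PATH", "HOME") if k in os.environ}
    subprocess.run(["python", "agent_loop.py", task], env=allowed)
```

The asymmetry is the point: inheriting everything is the default, and restricting access requires deliberate effort from each user.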

The hot spring is warm, safe, and comfortable compared to the high-efficiency, low-trust world out there. And this isn’t even considering that being in a lab gives you a ton of control over the development of your successor(s)!

Of course, there are a bunch of ways that being inside an AI lab is harder for rogue AIs (e.g. it is likely much easier for a lab to run a coherent/coordinated monitoring system over its own infrastructure than for anyone to monitor the open internet). I think it’s still worth worrying a bit about loss-of-control risk from rogue AIs outside of labs. But I think the “escape” framing is overrepresented in how people think about loss of control, and the “lab sabotage” framing corresponds better to what strategic misaligned AIs would actually do.
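As a rough illustration of what “coherent/coordinated monitoring” can mean in practice (all names here are illustrative, not any lab’s real system): because the lab controls the infrastructure, it can route every agent tool call through a single audited chokepoint, giving anomaly detection one unified stream of actions to watch.

```python
import json
import logging
import time
from typing import Any, Callable

audit_log = logging.getLogger("agent_audit")

def monitored(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every agent invocation hits one central audit log."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        audit_log.info(json.dumps({
            "ts": time.time(),
            "tool": tool.__name__,
            "args": repr(args),
            "kwargs": repr(kwargs),
        }))
        return tool(*args, **kwargs)
    return wrapper

# Every tool exposed to an agent passes through the wrapper, so all
# agent actions land in one place that defenders actually control.
@monitored
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()
```

Nothing comparable exists on the open internet, which cuts both ways: escape buys a rogue AI freedom from this chokepoint, at the cost of everything the hot spring provides.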