Artificial General Intelligence: can we avoid the ultimate existential threat?

Existential risks are posed by events that would “either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.” Philosopher Toby Ord has written a best-seller on existential risks, and Astronomer Royal Martin Rees has warned that this is “Our Final Century.” Yet x-risks remain a neglected topic on the agendas of governments and global organisations.

Fortunately, there are signs of change. NASA just completed a project to crash a projectile (the DART spacecraft) into an asteroid to change its course. This demonstrates that the potential existential risk of an asteroid colliding with the Earth can be vastly reduced if we act now.

Asteroid impacts, climate change, and nuclear conflagration are all potential existential risks, but that is just the beginning. So are “solar flares, super-volcanic eruptions, high-mortality pandemics”, and even stellar explosions. All of these deserve more attention in the public debate.

But one fear looms largest among existential risk researchers: Artificial General Intelligence (AGI). AI may be the ultimate existential risk. Warnings by prominent scientists like Stephen Hawking, tweeting billionaire Elon Musk, and an open letter signed in 2015 by more than 11,000 people have raised public awareness of this still under-appreciated threat. Toby Ord estimates the likelihood of AI causing human extinction at one in ten over the next hundred years.

What makes AGI such a danger?

AI is a General-Purpose Technology like the wheel, writing, or electricity, and holds the promise of helping humans overcome intellectual constraints and solve fundamental problems. An AI system as intelligent as or more intelligent than humans – an Artificial General Intelligence (AGI) or Artificial Super Intelligence (ASI) – would be able to improve itself recursively and hence trigger an intelligence explosion. If humans manage to control AGI before this happens, it could unlock countless innovations in science and business, heralding a period of super-exponential growth and advances in all areas of knowledge.

AI is an existential threat because humans are unlikely to be able to control an AGI/ASI once it appears. The AI may intentionally or, more likely, unintentionally wipe out humanity or lock humans into a perpetual dystopia. Or a malevolent actor may use AI to enslave the rest of humanity, or worse.

Rapid breakthroughs in AI abound, but doubts remain

So far, the quest for smarter AI has produced systems that rival the human brain in raw computing power (flops). But some think it is just a matter of time until we find the right “master” algorithm that will perfectly understand how the world and the people in it work.

It is not unrealistic to expect recursively self-improving AI to arrive soon. Breakthroughs in AI have been coming in rapid succession with AlphaGo, GPT-3, Gato, DALL-E 2, AlphaCode, and others. And at least 72 projects with the explicit goal of creating AGI have been set up and funded, including DeepMind with its 1,300 staff.

Despite rapid progress, some doubt that we are headed for AGI, that we will ever be able to invent such a technology, or that it would pose an existential threat. Experts often argue that we do not understand the human brain, or the role of consciousness in intelligence, well enough.

Accidental progress could get us there

At the same time, we constantly create things we do not fully understand until long after we have built them. Innovation is often extreme trial and error, or even accidental. Viagra and other medicines are proof of this!

A century after their invention, we still cannot explain precisely why aeroplane wings generate lift. Copying birds was beyond us, yet we figured out how to fly anyway. We may likewise stumble onto an algorithm that mimics general intelligence without fully understanding it, or even get there primarily by scaling up current approaches.

Is AI’s existential threat inevitable?

As experts seek to avoid disaster, those working in the emerging field of AI Safety have been trying to create a Friendly AI by aligning AGI/ASI values with human values. However, this “alignment problem” has yet to be solved: there is no agreement on how to proceed and no solution in sight.

As the paperclip- and smile-maximiser thought experiments have shown, it is hazardous to program values directly into an AI system’s utility function (its supergoal). One suggested solution, from Stuart Russell, is to make the AI uncertain about its own utility function, so that it needs to learn from, and defer to, humans.
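Russell’s idea can be sketched as a toy decision rule. In this hypothetical illustration (the numbers and function names are invented for this sketch, not taken from Russell’s work), an agent that is fully certain of its utility function always acts on it, while an agent holding uncertainty over candidate utility functions defers to a human whenever the expected utility of acting is negative:

```python
# Toy sketch (hypothetical): certainty vs. uncertainty about one's own goal.

def act_with_certainty() -> str:
    """A classic utility maximiser: fully confident in its goal, it never defers."""
    return "act"

def act_with_uncertainty(beliefs: list[tuple[float, float]]) -> str:
    """An agent uncertain about its true utility function.

    `beliefs` is a list of (probability, utility-of-acting) pairs over the
    candidate utility functions the agent might have been given. Deferring
    to the human is modelled as utility 0: the human decides instead.
    """
    expected_utility = sum(p * u for p, u in beliefs)
    return "act" if expected_utility > 0 else "defer_to_human"

# 50/50 uncertainty between a benign reading of the goal (+10, "make some
# paperclips") and a catastrophic one (-100, "convert everything to paperclips").
beliefs = [(0.5, 10.0), (0.5, -100.0)]

print(act_with_certainty())           # always "act"
print(act_with_uncertainty(beliefs))  # "defer_to_human": expected utility is -45
```

This only captures the deference incentive; in Russell’s fuller framework the agent also learns about the true utility function by observing human behaviour.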

Others have proposed containing AI until we can solve the alignment problem. One option would be to create an “AI Nanny”, an AI with super-human intelligence that will prevent an AGI/ASI from arising too soon.

While more than 50 proposals for an AI Nanny are in the works, “nobody has any idea how to do such a thing, and it seems well beyond the scope of current or near-future science and engineering.” At any rate, if we accidentally stumble into creating a recursively self-improving AI, it may evolve too rapidly for humans to do anything about it. There would be no “fire alarm”, nor would there be time to take safety measures or pull the off switch. That said, innovation tends to happen incrementally, in fits and starts, so we may have time to eliminate fatal flaws in AI. Even then, vigilance would be required.

Some researchers claim it is impossible to ensure the safety of AGI. So, should we even try to develop it? For a technology such as AGI, it would make sense to apply a form of the precautionary principle: do not develop it until safety is proven. The difficulty, however, lies in implementation. Currently, only large and well-funded teams are researching AGI. But if exponential hardware progress continues, small teams or individuals could amass sufficient computational power to experiment with large AI models, potentially triggering AI arms races.

Regulating all these small players may not be tractable. The only option then could be some form of hardware regulation. Implementing such regulations at a global level would be a long and challenging process, perhaps comparable to nuclear non-proliferation or negotiating international climate agreements, pitting us against the collective action problem. Since this option will take time, and AI and computing capabilities are racing forward, it is crucial to start now.

Human nature makes AGI too risky for trial and error

Since the inception of existential risk research, humanity has lost much of its naïve cheerfulness about landmark technological breakthroughs. While few in the past held the Luddite view that technological development is universally bad, the opposite view, that it is universally good, has long been mainstream.

Existential risk research implies we should reconsider that idea, as it consistently concludes that we run a high risk of extinction from our own inventions. Instead, we should weigh risks against gains on a per-technology basis, and use a mix of technology and regulation to enhance safety rather than increase risk. AI Safety is a prime example of that.

In the past, we used trial-and-error science and technology to manage nature’s challenges. The big question now is: can we use caution, reason, restraint, coordination, and safety-enhancing technology to address the existential risks stemming from our own inventions?

Disclaimer: The opinions expressed and arguments employed herein are solely those of the authors and do not necessarily reflect the official views of the OECD or its member countries. The Organisation cannot be held responsible for possible violations of copyright resulting from the posting of any written material on this website/blog.