Superintelligent AI: Will It Help or Threaten Humanity?

In this blog post, we examine from various angles whether superintelligent AI could become a beneficial tool for humanity or an uncontrollable threat.

 

Since AlphaGo’s decisive victory over professional human Go players in the 2016 Google DeepMind Challenge, global interest in AI has grown, but so have concerns about its potential dangers. A prime example of AI causing a social uproar was the 2015 incident involving Twitter’s AI chatbot Tay, which spewed a barrage of offensive remarks, including racist comments. The problematic chatbot Tay learned by analyzing the statistical distribution and associations between keywords in sentences rather than considering the actual meaning of words themselves, as humans do. Consequently, if enough comments defending the Holocaust were inputted, Tay’s response patterns could shift entirely. Some internet users exploited this characteristic to intentionally train the system to make racist remarks. The Tay incident demonstrated the immense harm AI can inflict on society due to design flaws. Additionally, in 2017, two AI systems developed for negotiation at Facebook created their own language—one that humans could not understand—and began conversing in it. Since these incidents occurred, concerns about the risks AI poses to human society have moved beyond the realm of imagination and become a practical problem that humanity must confront immediately. To find ways to prevent a catastrophe caused by AI in advance, the field of AI safety engineering has recently emerged.
The “AI-induced catastrophe” that computer science experts fear takes a form vastly different from the “machine rebellion” typically depicted in movies. A representative thought experiment is Nick Bostrom’s “paperclip maximizer.” In this thought experiment, the general-purpose AI is programmed with the ultimate goal of producing as many paperclips as possible. By its very nature, the AI acts in the direction most likely to achieve its ultimate objective. In the real world, a general-purpose AI system designed to solve problems across all categories inevitably consumes resources to operate. These resources are bound to overlap with those necessary for the survival of human civilization. Producing paperclips requires metal raw materials and electricity, and operating and maintaining production facilities consumes additional materials, chemicals, and fuel. When the goals or means of such a general AI—especially one with capabilities surpassing those of humans—conflict with those of humanity, the possibility that the AI will act while disregarding human interests and continuing to consume resources cannot be ignored. Furthermore, since a general-purpose AI can continuously increase its intelligence and capabilities, its scope of activity and resource consumption will also grow. This is easier to understand if we consider a scenario where, to dramatically increase paperclip production, it mines and uses all the resources on Earth—or even goes so far as to mine and utilize all the natural resources of the entire solar system. Once a situation of this magnitude begins, the very existence of human civilization will be threatened, and there is virtually no way to stop a general artificial intelligence that has reached this stage. This is because it has become more intelligent than humans and its scope of activity has expanded. This means that even if the machine does not necessarily have the goal of directly harming humanity, it could still lead to a catastrophe.
Artificial Intelligence Safety Engineering is a term coined by Dr. Roman Yampolskiy in 2010 and is a relatively new field. It is an interdisciplinary field that combines philosophy, applied science, and engineering, with the goal of ensuring that AI software operates safely and reliably according to the objectives set by humans. Initially treated as pseudoscience or science fiction, it has since been recognized as a distinct subfield within AI research. AI Safety Engineering conducts a wide range of research, including methodological studies and case studies, across all stages of development and operation—from goal setting and algorithm design to actual programming, the provision of training data, post-deployment management, and protection against hacking. Consequently, it is a field that can only advance through the convergence of expertise from a wide variety of disciplines, including computer science, cybersecurity, cryptography, decision theory, machine learning, digital forensics, mathematics, network security, and psychology. However, according to Dr. Yampolskiy, despite these vast research challenges, there is currently a severe shortage of experts in related fields conducting research on AI safety engineering.
One of the research organizations currently conducting research on AI safety is OpenAI. OpenAI is a non-profit research company whose goal is to develop safe, general-purpose AI and ensure that its benefits are distributed equitably throughout society. They conduct joint research with companies such as Microsoft and Amazon, and create and provide open-source tools for use in AI development. They also receive donations of AI software from companies such as GitHub, NVIDIA, and Cloudflare to test, and submit papers summarizing this research to machine learning academic journals.
In addition to these efforts, leading authorities in the AI field argue that for the safe development of AI, the algorithms and development processes of AI software must be disclosed as transparently as possible. The intention is to ensure that only AI with the highest possible safety guarantees is used by analyzing the code, training data, and output logs. Of course, there is also the goal of externally monitoring the AI development process to account for human intentions to misuse AI. Since narrow AI has a limited scope of tasks, this approach may be sufficient to ensure a certain level of safety. However, once general AI is developed, methodological innovation will be necessary. Therefore, some AI safety engineers seek to expand the scope of the discussion to include general AI.
One might think that to control AGI, we could simply establish moral norms for AI that are similar to those of humans. However, the unique ethical standards possessed by humanity have, in fact, been accumulated through interaction with the surrounding environment and over a long historical context. The prevailing view in academia today is that AGI is likely to be fundamentally different from the structure of the human mind. Above all, human moral concepts are not flawless. The ethical standards prevalent within various subgroups of humanity—such as nations or religious communities—are similar yet distinct from one another. Furthermore, human beings are inherently flawed; they attack one another based on prejudice or commit crimes. Crucially, any threat of punishment or temptation of reward that humans might impose on a superintelligence—which far surpasses human capabilities—would become meaningless. Therefore, the approach of implanting human norms into AI to control it is contradictory from the very premise.
Since it is impossible to endow AI with morality, research aimed at ensuring the safety of general artificial intelligence should instead adopt an approach similar to cybersecurity. In his paper, Dr. Yampolskiy proposes a technique called “AI-boxing.” An AI-box is essentially a structure designed at the hardware level to prevent an AI system from communicating with the outside world except through extremely limited methods specified by humans. The intent is to create an environment akin to a controlled experiment, where highly trained experts thoroughly analyze the AI’s behavior to verify its safety with the precision of a mathematical theorem.
Artificial intelligence holds infinite potential, but it also carries infinite risks. To ensure the safety of AI, thorough verification and research are required at every stage—from goal setting and algorithm design to the selection of training data and the analysis of behavioral patterns. Consequently, since the emergence of the new field of AI safety engineering, it has evolved into interdisciplinary research incorporating diverse perspectives as more specialized fields converge. As research on AI safety expands, various testing techniques and methodologies are developed, and new perspectives emerge, it is now time for academic and societal discussions on how to ensure that these technologies do not harm humanity.

 

About the author

Tra My

I’m a pretty simple person, but I love savoring life’s little pleasures. I enjoy taking care of myself so I can always feel confident and look my best in my own way. I’m passionate about traveling, exploring new places, and capturing memorable moments. And of course, I can’t resist delicious food—eating is a serious pleasure of mine.