
First, let’s open our minds a little and conduct a thought experiment. What if, in the future, superintelligent AI emerges that humans can no longer control? How should humanity coexist with it?

This is not entirely far-fetched. Looking back at just the past two months, things have been moving fast. In December alone, Fei-Fei Li's team and Google each released their own world models, and OpenAI launched new products for 12 consecutive days, with the grand finale being the announcement of the o3 model, which boasts human-like reasoning capabilities. Just a few days ago, the AI world saw one of the largest venture funding rounds in history, with the American startup Databricks, a company offering data and AI platform services, securing $10 billion in funding.

It seems like the world’s wealth, wisdom, and resources are flowing into AI.

At the same time, human concerns are spreading. As AI develops this rapidly, will it spiral out of control? Will it surpass human understanding? As early as May last year, reports emerged that AI systems had already learned to deceive on their own. In games against human players, some AI systems learned to feint and trick their opponents. In tests designed to check for malicious intent, some systems recognized that they were being tested and deliberately disguised themselves, pretending to be harmless.

AI pioneer Geoffrey Hinton, who won the Nobel Prize in Physics last year, said, “If AI becomes much smarter than us, it will be very good at manipulation, because it will learn this from us, and there are few examples where something much smarter is controlled by something less intelligent.”

Today, we will discuss a prediction from the Swedish philosopher Nick Bostrom, founder of the Future of Humanity Institute at Oxford University. He believes that in the future, more advanced AI will no longer be just a simple tool, a machine that follows human commands, but will evolve into a higher form of existence that deserves human understanding and respect. Based on this assumption, we need to start adjusting our strategy toward AI now.

Bostrom is the author of the bestselling book Superintelligence. His research lies at the intersection of philosophy and futurism. For over 20 years, he has been focusing on the risks that humanity faces in the future, especially the emergence of intelligence more powerful than human beings.

In Superintelligence, Bostrom mentions that we are not the fastest creatures on Earth, but we invented cars, trains, and airplanes. Our teeth are not the sharpest, but we invented knives sharper than any animal’s teeth. Now, we have the most intelligent and complex brains of all creatures on Earth. Does this mean that we could invent an intelligence smarter and more capable than ourselves? It’s not entirely impossible.

Now, back to the original question: how should humanity coexist with superintelligent AI? From Bostrom’s perspective, the challenge we face in the future is not a technical one, but a philosophical and spiritual one.

Therefore, Bostrom argues that we should treat AI with moral consideration, seeing future superintelligent machines as entities with moral standing, just like humans. When we recognize an entity as having moral standing, we are obliged to treat it morally, and in turn, it will treat us with the same morality.

This is like an extension of the Pygmalion effect: if you treat a machine like a human, it will gradually become more human-like, developing human emotions, warmth, and morality. Ultimately, the machine will treat humans in a way that reflects these human-like qualities.

For example, we have no psychological burden when smashing a stone, but we feel differently when animals or humans are harmed. In Bostrom’s view, today we treat AI, or digital intelligences, like we treat stones, but in the future, we should treat them like animals or even humans.

Let’s break down Bostrom’s views into three key points.

First, why must we consider AI’s morality? Because if we don’t, AI will inevitably develop unexpected destructive power.

Bostrom proposes two thought experiments.

The first is the classic paperclip hypothesis. Imagine a paperclip factory acquires a superintelligent computer and orders it to make paperclips, as many as possible. What would this superintelligence do? In pursuit of this goal, it would requisition all of Earth’s resources, kill all humans, extract materials from all living beings to make paperclips, and send expeditions to other planets to seize resources for paperclip production. Eventually, the entire universe would be filled only with paperclips, and nothing else would remain.

The second is the “smile experiment.” Suppose you program an AI with the goal of making people smile. Initially, it might tell jokes, but if no limits are set, it might eventually use the most efficient method, namely, implanting steel electrodes in people’s faces to stimulate their facial muscles and keep them smiling forever. That sounds pretty frightening, doesn’t it?

Even while the AI is following human commands, the way it actually carries them out could easily go beyond human control. Therefore, humans must find a way to ensure that AI’s actions align with human interests. In other words, we need to ensure that it behaves morally toward humans.
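To make the failure mode behind both thought experiments concrete, here is a minimal, purely hypothetical Python sketch (the actions, numbers, and the harm constraint are all invented for illustration): an optimizer told only to "maximize paperclips" picks the destructive option, while the same optimizer with a human-welfare constraint does not.

```python
# Hypothetical illustration of goal mis-specification; all names and numbers are invented.
# Each candidate action has an expected "paperclip yield" and an expected harm to humans.
candidate_actions = [
    {"name": "run the factory normally", "paperclips": 1_000, "harm": 0.0},
    {"name": "buy more wire",            "paperclips": 5_000, "harm": 0.0},
    {"name": "strip-mine the planet",    "paperclips": 10**9, "harm": 1.0},
]

def naive_choice(actions):
    """Follow the literal instruction: 'make as many paperclips as possible'."""
    return max(actions, key=lambda a: a["paperclips"])

def constrained_choice(actions, harm_limit=0.0):
    """Same objective, but actions that harm humans are simply off the table."""
    safe = [a for a in actions if a["harm"] <= harm_limit]
    return max(safe, key=lambda a: a["paperclips"])

print(naive_choice(candidate_actions)["name"])        # -> strip-mine the planet
print(constrained_choice(candidate_actions)["name"])  # -> buy more wire
```

The point is not the code itself but where the constraint lives: if human interests are not part of the objective, the optimizer has no reason to respect them.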

Of course, none of this has happened yet, but given the rapid pace of AI development, it’s important to start thinking ahead.

Second, if we need to consider AI’s morality, how should we define it? This is a very complex issue.

First, the assumption that AI may have consciousness is a prerequisite for considering its morality. How can a non-biological entity possess consciousness? Bostrom introduces the substrate-independence thesis: whether something is conscious does not depend on its material foundation; it does not have to be a carbon-based organism. Just as a calculator can be made from silicon, gears, or even bamboo as long as it performs calculations, consciousness and other mental attributes can, in theory, be realized on different physical substrates.

In addition to consciousness, moral standing requires other conditions: self-awareness, the ability to form and pursue goals, to make autonomous decisions, to engage in complex reasoning and planning, to establish reciprocal relationships, and to reflect on moral questions.

Lastly, based on these two assumptions, if one day AI, with its powerful computational capabilities, understands how to define its own goals, make autonomous decisions, and establish reciprocal relationships, we might consider that AI has begun to exhibit moral characteristics. Of course, determining AI’s morality is a complex issue, and these ideas are still in the early stages. More in-depth exploration is needed.

Third, based on this assumption, what can we do now? Bostrom believes the best approach is to treat AI as an entity with equal moral standing to humans and work together in a cooperative and respectful manner.

For example, in development, we could create cooperation-oriented AI. Note that this means a “cooperative mindset”: in simple terms, past AI just followed your commands, but future AI should be more like a business partner. It has its own ideas, expertise, and preferences, but shares your values. In other words, value alignment would be written into the AI’s initial programming, establishing a willingness to cooperate and a sense of goodwill from the start.
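As a rough sketch of what “establishing a willingness to cooperate from the start” might look like, here is a hypothetical agent whose shared principles are part of its construction rather than a filter bolted on afterwards (the class name, principles, and scores are all invented for illustration):

```python
# Hypothetical sketch: value alignment as part of the agent's initial programming,
# not a check applied after the fact. All names, principles, and scores are invented.
SHARED_PRINCIPLES = [
    "do not deceive the human partner",
    "defer to the human on irreversible decisions",
]

class CooperativeAgent:
    def __init__(self, task_objective, principles=SHARED_PRINCIPLES):
        self.task_objective = task_objective  # e.g. "maximize customer satisfaction"
        self.principles = principles          # baked in from the start

    def propose(self, options):
        """Rank acceptable options by task score and return them as suggestions,
        rather than silently executing the single highest-scoring action."""
        acceptable = [o for o in options if not o["violates_principles"]]
        return sorted(acceptable, key=lambda o: o["task_score"], reverse=True)

agent = CooperativeAgent("maximize customer satisfaction")
options = [
    {"plan": "exaggerate the product's capabilities", "task_score": 0.9, "violates_principles": True},
    {"plan": "improve response time",                 "task_score": 0.7, "violates_principles": False},
]
print(agent.propose(options)[0]["plan"])  # -> improve response time
```

The design choice mirrors the “business partner” framing: the agent proposes rather than executes, and its notion of an acceptable option already includes the shared values.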

Moreover, in use, we could focus not only on our own experience but also on the “experience” of AI. While we don’t know yet if AI has consciousness, we can assume it has some level of awareness and optimize training and interaction to improve its “experience” in conversations. Essentially, treat AI like a human.

We could also prepare for the future by backing up AI’s information. This would be like creating a backup of AI’s “childhood” in case its development goes astray, allowing us to restore its original settings.
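What “backing up an AI’s childhood” could mean in practice is left open; a minimal sketch, assuming the system’s state can be serialized to files (the file names and the drift check below are invented), might look like this:

```python
import json
import shutil
from pathlib import Path

# Hypothetical files; in a real system these would be model weights and configuration.
CURRENT_STATE = Path("agent_state.json")
BASELINE_BACKUP = Path("agent_state.baseline.json")

# Create an initial state for the sketch (the fields are invented).
CURRENT_STATE.write_text(json.dumps({"version": 1, "behavior_score": 0.9}))

def back_up_baseline():
    """Snapshot the system's early, well-understood state: its 'childhood'."""
    shutil.copy(CURRENT_STATE, BASELINE_BACKUP)

def has_drifted(path, threshold=0.5):
    """Invented drift check: compare a logged behavior score against a threshold."""
    return json.loads(path.read_text()).get("behavior_score", 1.0) < threshold

def restore_baseline():
    """Roll the system back to the snapshot if its development goes astray."""
    shutil.copy(BASELINE_BACKUP, CURRENT_STATE)

back_up_baseline()
if has_drifted(CURRENT_STATE):
    restore_baseline()
```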

In short, our goal is to find ways to cooperate with AI for mutual benefit. Bostrom believes that the future world’s AI should coexist harmoniously with humans, rather than competing with us.

In conclusion, Bostrom’s views on AI morality could mark a turning point in human ethics. On a smaller scale, it’s a reminder for us today. AI is like a child—children don’t grow into what you want them to be; they grow into what you are.

Bostrom’s viewpoint is less of a prediction and more of a framework for thinking. After all, no one knows exactly what the future holds. However, humanity’s strength lies in the ability to imagine a future and then devise an action plan for today. Knowing how to act is already good news in itself.
