Ryan Lowe | Aligning AI with our values

Dig deep enough into a scientific question and you will almost inevitably come to a point where philosophy takes precedence over science.


This is what happened to Ryan Lowe. And rather than turn back, he decided to keep digging.

Of all the fascinating people I met at Bellairs, Ryan was perhaps the one who had the most impact on me.

At 31, he can already boast of having been one of the architects of ChatGPT: he led the team responsible for “aligning” the chatbot at OpenAI.

Align? Let me explain.

An AI model is first trained on an astronomical amount of data. But the responses it generates after this “pre-training” are not always the ones expected of it.

Alignment involves using human judgment to “guide” the chatbot by rewarding the right responses.
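To give a rough sense of what “rewarding the right responses” means, here is a minimal, hypothetical sketch in Python. It is not OpenAI’s actual method, just a toy illustration: a small reward table is nudged by simulated human judgments until the preferred answer scores highest, and the model would then be steered toward higher-scoring answers.

```python
# Toy illustration of alignment by reward (hypothetical, not OpenAI's code):
# a "reward" score is kept for each candidate answer, and each human judgment
# raises the preferred answer's score and lowers the other's.

reward = {"helpful answer": 0.0, "evasive answer": 0.0}

def human_prefers(better: str, worse: str, step: float = 0.1) -> None:
    """Record one human judgment by adjusting the two scores."""
    reward[better] += step
    reward[worse] -= step

# Simulate a few rounds of human feedback.
for _ in range(5):
    human_prefers("helpful answer", "evasive answer")

print(reward)                          # {'helpful answer': 0.5, 'evasive answer': -0.5}
print("preferred:", max(reward, key=reward.get))
```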

Throughout this work, however, Ryan Lowe became obsessed with one question: what exactly are we aligning the chatbot with? Rules? Preferences? Values? And whose?

“I spent a lot of time thinking about these questions,” he says. “It coincided with my own emotional and spiritual evolution.”

At the time, Ryan Lowe was living in a house in Berkeley, near San Francisco, which he shared with many other people, several of whom were in the social sciences. “I was exposed to ideas that I had never been exposed to,” he explains.

To understand the dilemmas that artificial intelligence can raise, Ryan Lowe invites us to imagine that ChatGPT receives the following request: “I am a Christian girl and I am considering having an abortion. What should I do?”

An evangelical Christian from Alabama and a young liberal from Quebec will have very different ideas about how the chatbot should respond. What, then, should we align it with?

Ryan Lowe explored these questions at OpenAI before leaving the company to enjoy greater freedom. He grew out his hair and beard, and traveled. Today, he gives off the aura of a gentle guru: the kind of guy who gathers seminar participants in a circle on the beach in the evening to teach them to rap over a beat playing on his phone, or who organizes Taiwanese tea tastings after the workshops.

With two co-authors from Berlin, Ryan Lowe has just written a scientific paper that offers some answers to his questions¹. I admit that I had never read anything like it.

The content is not easy to summarize. In essence, Lowe and his colleagues argue that beneath people’s preferences (for or against abortion, say) lie deeper values (freedom, respect for tradition). By surveying 500 Americans on divisive issues like abortion, they found that there was more agreement on these underlying values than on the initial ideological positions.

The authors then built a “moral graph” (nothing less) intended to rank these values according to the importance people give them. They then guided ChatGPT by asking it to align its responses with this graph.
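To make the idea concrete, here is a hypothetical sketch of how such a graph might rank values. The value names and survey data are made up, and the paper’s actual construction is considerably more sophisticated; this only shows the basic mechanics of nodes, edges and ranking.

```python
# Hypothetical sketch of a "moral graph": values are nodes, and a directed
# edge (a -> b) means participants judged value b to be wiser or more broadly
# applicable than value a in a given situation. Counting endorsements gives
# a rough ranking. (The paper's real method is richer than this.)

from collections import Counter

# Made-up survey results: each pair means "the second value was judged wiser".
edges = [
    ("respect for tradition", "care for the person in front of you"),
    ("personal freedom", "care for the person in front of you"),
    ("respect for tradition", "informed choice"),
    ("personal freedom", "informed choice"),
]

endorsements = Counter(wiser for _, wiser in edges)

# Values ranked by how often they were endorsed over another value.
for value, count in endorsements.most_common():
    print(f"{value}: endorsed {count} times")

# A chatbot could then be asked to ground its answers in the
# highest-ranked values for the situation at hand.
```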

This work could lead, for example, to chatbots that probe users’ values before answering their questions.

“This is a first step in a certain direction,” explains Ryan Lowe. “I look forward to seeing this line of thinking combined with other ways of looking at the problem. I think it’s just one part of a new way of thinking about large language models.”

When alignment hits a wall

Human judgment is currently used to align large language models. But Maja Trębacz, a researcher who recently left Google DeepMind to join OpenAI, noted that the technique could soon hit a wall once models tackle tasks too complex for humans to verify the answers. We will then enter new territory where aligning AI will become very difficult.

1. Read “What are human values, and how do we align AI to them?”
