Re-release - AI Alignment with Dr. Stuart Russell

Released Saturday, 23rd December 2023

Episode Transcript

2:00

have turned many people into much more extreme

2:02

versions of themselves. So

2:04

that's one example. But in the

2:06

long run, if you just ask anyone,

2:08

even a person, a die-hard skeptic

2:10

as we call them, or a

2:13

denialist as you might also call

2:15

them, okay, so we're investing

2:17

hundreds of billions of dollars with the

2:19

goal of creating general purpose intelligence

2:22

that's more intelligent than human beings,

2:25

and therefore more powerful than human beings. How

2:27

do you propose to retain power over

2:30

more powerful entities than yourself forever?

2:33

So that's the question. And

2:36

usually when you put it like

2:38

that, people say, oh yeah, I see what you mean. Okay,

2:41

I haven't thought about that. And

2:43

that's the issue, right? We are spending hundreds

2:46

of billions of dollars to achieve something and

2:48

we haven't thought about it yet. So

2:51

my colleague or

2:54

former student, Andrew Ng, is one

2:56

of the skeptics and he says, well, you know, I

2:58

don't worry about this anymore than I worry about overpopulation

3:01

on Mars. But

3:04

if we had a plan to move the

3:06

entire human race to Mars and

3:09

no one had thought about what we were going to

3:11

breathe when we got there, you

3:13

would say that's an unwise plan. But

3:17

that's the situation that we're in. No one has thought

3:19

about what happens if we succeed. So

3:21

the book is partly about how

3:24

to convince people that this matters

3:27

and then what is my

3:30

proposal for doing something about it. Right.

3:32

And the social media thing is interesting because I was thinking

3:34

about if you could go back in time 10 or

3:37

even just five years and

3:39

you tried to be the Paul

3:42

Revere of this system,

3:44

it would be really difficult to even convince people

3:47

of what 2019 would look like in

3:49

that way. Like, I don't think people

3:52

would have believed we would have entire governments fundamentally

3:54

altered as an unintended consequence

3:56

of optimizing for click through. And yet

3:58

it already happened this quickly. Yes,

4:02

so for the non-American listeners,

4:04

Paul Revere is someone who warned that the

4:06

British are coming, the British are coming, so

4:08

he was on the side of the American

4:10

revolutionaries. And

4:13

I guess my recommendation

4:15

would have been, first

4:17

of all, change the way you

4:19

think about the problem. So don't

4:22

just think, okay, what

4:24

is my objective, my in this case

4:26

being the social media platforms, what

4:28

is my immediate short-term objective is to make

4:30

money, and then set

4:32

up some optimizing machinery with that

4:34

as the objective, and then completely

4:36

ignore the effects that that's

4:38

going to have on things other

4:41

than your bottom line. So

4:45

with, you know, with, you

4:47

know, chemical companies that used to just dump poisonous

4:50

chemicals into the river while

4:53

they were making money, we said, okay, you

4:55

have to stop doing that or you have

4:57

to pay enormous fines or taxes or whatever.

5:00

We're trying to tax the

5:02

oil companies and coal companies for the carbon dioxide,

5:05

but that doesn't seem to be working. So

5:10

we can't really do that with turning

5:14

people into neo-fascists. It's not clear, you

5:16

know, what should the penalty be per

5:18

neo-fascist created. But

5:24

basically, if you're

5:26

going to build a system that messes

5:28

with stuff whose

5:31

value you're not sure

5:33

about, then you should

5:35

try to avoid messing with that stuff.

5:38

So in this case, the stuff is the

5:40

human mind, you know, our opinions,

5:42

our positions, our perceptions of the

5:44

world. So to the extent possible,

5:48

don't build systems that mess with that.

5:51

Since you don't know whether that messing is

5:53

a good idea or a bad idea. And

5:57

so you can design algorithms that... are

6:01

much less likely to manipulate people. So

6:04

the basic difference for the geeks

6:07

is between a supervised learning algorithm

6:09

that learns what people

6:11

want and a reinforcement learning algorithm that

6:14

changes what people want so that it's

6:16

easier to supply. And the

6:18

reinforcement learning algorithm doesn't know you have a brain.

6:20

It doesn't know you have political opinions. You're

6:22

just a clickstream history. And

6:27

they learn that given a certain

6:29

type of clickstream history, if

6:32

you subsequently feed certain articles

6:34

to that clickstream history, it

6:37

starts to generate more money. And

6:40

that's it. So it

6:42

turns out from our side that you're

6:45

gradually feeding people more and more extreme

6:47

violent videos or more and more extreme

6:49

pornographic content or more and more extreme

6:51

political content. Whereas

6:55

a supervised learning algorithm is not trying to change

6:57

the world. It's just trying to learn what the

6:59

world is like. In this case, learn what your

7:01

opinions are. So you might still get a bit

7:03

of an echo chamber effect, but

7:05

you wouldn't get this manipulation of people

7:07

to the extremes, which is

7:09

what seems to have happened.
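For the geeks mentioned above, here is a minimal, purely illustrative sketch of that difference (my own toy example, not any platform's actual system; click_probability and drift are made-up stand-ins): the supervised objective scores a recommendation against the user as they are, while the reinforcement-learning return also credits plans that shift the user's state.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def click_probability(user_state, article):
    # Hypothetical stand-in for "how likely is this user to click this article".
    return 1.0 / (1.0 + math.exp(-dot(user_state, article)))

# Supervised view: treat the user's tastes as fixed but unknown, and just
# predict and serve what the current user is most likely to want.
def supervised_recommend(user_state, articles):
    return max(articles, key=lambda a: click_probability(user_state, a))

# Reinforcement-learning view: the user is part of the environment, and what
# you show them nudges their state. Maximizing the discounted sum of clicks
# can therefore reward plans that change the user into someone who is easier
# to keep clicking.
def drift(user_state, article, rate=0.05):
    # Hypothetical dynamics: tastes shift a little toward whatever is shown.
    return [u + rate * (a - u) for u, a in zip(user_state, article)]

def rl_return(user_state, plan, gamma=0.95):
    total, state = 0.0, list(user_state)
    for t, article in enumerate(plan):
        total += (gamma ** t) * click_probability(state, article)
        state = drift(state, article)  # the algorithm is credited for altering the user
    return total
```

In the second setting the learner is rewarded for exactly the kind of gradual escalation toward more extreme content described above; the first never models the user as something to be changed.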

7:12

So this is a general principle. And

7:15

this is one of the consequences

7:17

of the new way of doing AI that the

7:20

book proposes is that when

7:23

the algorithm knows that it doesn't know

7:26

the value of everything, it

7:29

will naturally avoid messing with the parts of

7:31

the world whose value it's not sure about.

7:34

And if it does have to mess with that, it

7:37

will ask permission. So

7:40

if it was a climate

7:42

control system before turning

7:44

the ocean into sulfuric acid in order to

7:46

reduce the amount of carbon dioxide, it

7:50

would ask permission because it's not sure if we want the

7:52

ocean to be made of sulfuric acid. And

7:55

so you get the

7:57

kind of deferential behavior that you would hope

7:59

for.
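As a rough illustration of that ask-permission behavior (a sketch under my own simplified assumptions, not the book's actual algorithm): give the agent explicit uncertainty about the value of a side effect, and it will defer to the human whenever the possible downside outweighs the expected gain of acting unilaterally.

```python
# Toy model of an agent that is uncertain about the value of a side effect.
# All values and probabilities are made up for illustration.
def choose(action_gain, side_effect_values, p_side_effect_values, ask_cost=0.1):
    """action_gain: known benefit of acting (e.g. carbon dioxide removed).
    side_effect_values: possible utilities of the side effect (e.g. an acidified ocean).
    p_side_effect_values: the agent's probability for each possibility."""
    expected_side = sum(p * v for p, v in zip(p_side_effect_values, side_effect_values))
    act_value = action_gain + expected_side
    # Asking permission: the human reveals the true value, so the agent only
    # proceeds in the worlds where acting is actually good.
    ask_value = -ask_cost + sum(
        p * max(action_gain + v, 0.0)
        for p, v in zip(p_side_effect_values, side_effect_values)
    )
    return "act" if act_value >= ask_value else "ask permission"

# Sure the side effect is harmless -> just act.
print(choose(1.0, [0.0], [1.0]))                # act
# Might be catastrophic (sulfuric-acid ocean) -> ask first.
print(choose(1.0, [0.0, -100.0], [0.5, 0.5]))   # ask permission
```

The second call mirrors the sulfuric-acid example: the known benefit is small relative to the possible harm the agent is unsure about, so asking first has higher expected value than acting.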

12:00

by just playing the right moves. Well, in theory, you

12:02

could just play the right moves, but in practice,

12:04

you can't because it's smarter than you

12:07

are. So

12:09

it will always anticipate and

12:11

frustrate your attempts. And it'll

12:13

take preemptive steps and possibly

12:15

even deceptive steps. So

12:18

it might pretend to

12:20

be innocuous, harmless, and

12:22

stupid long enough to prepare

12:24

all of its defenses so that it can

12:27

carry out the objective that you gave it.

12:30

It's not deceptive because it's

12:32

evil or because it wants to do something different

12:34

from what you told it. It's

12:36

just afraid that something it might do

12:39

would cause you to switch it off.

12:41

And so since it needs

12:43

to achieve the objective that's been programmed

12:45

into it, it develops

12:47

a subterfuge of appearing helpless so

12:50

that it can prepare all its

12:52

defenses and

12:55

then come out with the real plan to

12:57

achieve the objective. Whereas

13:01

in the new approach to AI, you

13:04

get the exact opposite effect that the

13:06

smarter the machine is, the

13:08

better it is for you. Because the

13:10

better it learns what your

13:12

true preferences are, the better

13:15

it avoids messing with

13:17

parts of the world that it's not sure about. And

13:21

just in general, it's going to be more useful to you. So

13:26

really, partly the

13:28

book is aimed at everybody, saying, look,

13:31

here is AI. Here is how it's

13:33

done. Here is why doing more of that leads

13:35

you off the cliff. And

13:38

then this is the other approach. But it's also

13:40

a little bit aimed at the AI community

13:43

to say, listen, I think

13:46

I want everyone to stop

13:49

and think about how they're building their systems.

13:53

And I'm not wagging my finger and saying, you're bad

13:55

people. I'm just saying

13:57

the method of engineering that we've

13:59

developed. And

14:02

it was developed back in the middle of the 20th century,

14:06

the basic paradigm. And

14:08

it's the same paradigm as we have

14:10

for control theory, control engineering, where

14:13

you have a fixed cost function

14:15

that the controller has

14:17

to minimize. In

14:20

economics, you have

14:22

a fixed target like a GDP or the corporate

14:24

profit. In

14:27

statistics, you try to minimize the

14:29

loss function, so basically

14:31

the cost of prediction errors. And

14:34

in all these cases, we assume that the

14:36

objective is fixed and known to

14:38

the machinery that's supposed to be optimizing it. And

14:42

that's just an extreme and

14:45

extremely unrealistic special case

14:48

of what is generally true, which is

14:50

that the machinery that's supposed to be

14:52

optimizing doesn't have access to the

14:54

objective. Right. And you

14:56

spend a fair bit of time in the

14:58

book talking about the definition of intelligence, even

15:00

in humans. And we

15:02

don't always know what our true reasoning

15:05

behind things are. We're not even

15:07

anywhere. You said that we're as

15:09

far away from being rational as,

15:11

what was the analogy? I

15:14

think it's a slug from overtaking the

15:16

Starship Enterprise at warp nine. That's the

15:18

way you put it. Yeah. So

15:22

there's a number of things about

15:24

human intelligence that are not

15:27

ideal. So

15:29

one of them is clearly that the

15:32

world is much, much, much too complicated

15:34

to actually behave rationally, i.e.

15:37

for our actions to be the ones

15:39

that best satisfy our own preferences about

15:41

the future. So you

15:43

can see that very simply if you look at chess,

15:46

right? You're

15:48

standing there in front of a chess board.

15:51

That chess board is a tiny little piece of

15:53

the real world, and it's very, very well behaved.

15:57

We know exactly how the pieces move and what

15:59

the rules are. And yet we

16:01

still can't make the right decision in

16:04

that situation. And the real world

16:06

is so much more complicated, the

16:09

horizons are so much longer than

16:11

they are in chess, and there

16:13

are so many more moving parts.

16:15

The rules are so much less

16:17

well known. The world is much

16:19

less predictable. So that means that,

16:21

as a practical matter, in fact,

16:24

no computer is ever going

16:26

to be rational either. Even if

16:28

it was the size of the universe,

16:30

it still couldn't calculate what is

16:32

the right course of action.

16:34

So

16:37

that's one thing sometimes called bounded

16:39

rationality. But

16:42

another thing is, as you say, that we

16:44

don't even know our own preferences about

16:46

the future. And so that

16:49

makes it doubly hard to write

16:51

them down completely and correctly and provide

16:53

them to the machine. So the sort of

16:56

example I use in the book, which

16:58

apparently has already been adopted

17:00

by some philosophers, is this

17:02

fruit called the durian, which

17:05

I've never tried, and

17:07

I deliberately didn't try it

17:09

while I was writing the book. Because

17:13

the durian fruit is something

17:15

that some people think is completely sublime

17:17

and a writer going back

17:19

to the nineteenth century described it as

17:22

the most sublime food provided on

17:24

this earth. And

17:26

then other people say well it reminds

17:28

me of skunk spray, vomit, and

17:30

used wound swabs and so on.

17:33

Isn't it banned

17:35

in a lot of countries? Yeah, yeah, so

17:37

it's common in

17:39

Southeast Asia, Indonesia and so on, and

17:42

every so often you hear about

17:44

one of these durian emergencies where, you

17:46

know, they pack them into a

17:48

crate on an airplane and they didn't

17:51

seal it properly and the passengers

17:53

revolt and force the pilot to

17:55

turn around and land the plane, or

17:58

an entire building gets evacuated, and

18:01

so on and so forth. So for

18:03

some people, it's absolutely unbearable. And I

18:06

don't know which of those two kinds of people I am.

18:09

So that's a clear case where

18:11

I don't know my preferences about a future

18:13

that involves eating durian, right? Is that a

18:15

future I want or a future I don't

18:18

want and I don't know. And in

18:23

fact, when you think about it, that's

18:25

pretty much the universal situation

18:28

we find ourselves in. You know, if you're

18:31

finishing high school and you go to the

18:33

career counselor and they say, well, you know,

18:35

there's a job in the coal mine or

18:37

there's a job open in the library, right?

18:40

So do you want to be a librarian or a coal miner? You

18:43

don't know. You haven't the faintest idea. You

18:45

don't know how you're going to enjoy

18:48

being underground or being surrounded by dusty

18:50

books and have no one to talk

18:52

to for hours on end. And

18:56

so I think this is actually pretty

18:59

much the normal condition that there's large

19:01

parts of our

19:04

own preferences, meaning

19:06

how much we will like any

19:08

given life that

19:11

we just don't know until we

19:14

see it. You know, someone who's good at

19:16

introspection probably

19:18

has a better idea of how they're going

19:20

to feel about a given situation,

19:22

but you still don't know until you're in

19:25

it. And how do you

19:27

program an AI to take into account

19:29

those preference changes or personal

19:31

growth, right? That's the issue. Well,

19:35

there's two issues, right? So

19:37

it isn't necessarily preference changes

19:39

in the sense that my

19:42

preferences are sort of in me.

19:45

They're there, but

19:47

I don't know what they are, right?

19:49

So whether or not I like durian, it's

19:52

not a decision I make, right? I

19:54

taste the durian and I find out

19:57

what my preferences are, but they were

19:59

there. there in me is a latent part

20:02

of my neurological

20:04

structure, I guess, or something about my

20:06

DNA as to whether or

20:08

not I like the durian taste. And

20:11

so that part where

20:13

your preference for durian is something

20:15

that's fixed but

20:17

unknown, that's relatively easy

20:19

for us to deal with. We're

20:23

already working under the assumption

20:25

that the machine is learning about your

20:27

preferences from

20:29

choices that you make and if

20:33

you don't know whether or not you like

20:35

durian, then you're not either going to run

20:37

away from it or drool at

20:40

the prospect of eating some durian. You're

20:42

going to exhibit sort of

20:44

indecisive behavior: not sure if I

20:47

really want to try this and kind

20:50

of like if you read Dr. Seuss's Green Eggs

20:52

and Ham. I

20:54

don't want to try it. No, I don't want to try it. I definitely

20:56

don't want to try it. So

20:59

that kind of behavior clearly shows

21:01

that you actually are not really

21:03

sure whether you like the durian or the green

21:06

eggs and ham. And

21:09

so that's fine. And the machine wouldn't

21:11

force you to eat durian because it's

21:13

convinced that you like it and

21:16

it wouldn't deprive you of it because it's convinced that

21:18

you hate it. It would maybe

21:20

suggest that you try a little bit at some point

21:22

whenever you're ready. And

21:25

that's what you'd want.
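A toy version of that learning-from-choices behavior (my illustration, with made-up numbers, not the book's formal model): the machine keeps a probability that you like durian, updates it from your accept/decline decisions, and while it is still unsure it only takes gentle, reversible actions like suggesting a small taste.

```python
def update_belief(p_likes, accepted, p_accept_if_likes=0.8, p_accept_if_hates=0.1):
    """Bayes update on 'user likes durian' after one accept/decline choice.
    The likelihood numbers are invented for illustration."""
    if accepted:
        num = p_likes * p_accept_if_likes
        den = num + (1 - p_likes) * p_accept_if_hates
    else:
        num = p_likes * (1 - p_accept_if_likes)
        den = num + (1 - p_likes) * (1 - p_accept_if_hates)
    return num / den

def policy(p_likes):
    if p_likes > 0.9:
        return "serve durian"
    if p_likes < 0.1:
        return "keep it off the menu"
    return "suggest a small taste sometime"   # still uncertain -> gentle, reversible action

p = 0.5                                  # no idea yet
for choice in [False, False, True]:      # user declines twice, then tries a bit
    p = update_belief(p, choice)
print(round(p, 2), policy(p))
```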

21:28

The difficult part actually is the plasticity of preferences.

21:31

Obviously, we're not born with

21:33

a whole complicated set of

21:37

preferences about politics, about religion,

21:39

about how much we value

21:43

wealth generation versus

21:45

family time versus this versus that.

21:49

We're not born knowing what it's like to have children. Many

21:52

people think they really want to have children

21:54

and change their minds and

21:56

so on. So we're acquiring and

22:00

solidifying preferences all the time

22:02

through experiences that

22:04

may not be the experience that

22:06

the preference is directly about.

22:09

For example, I think a lot of

22:11

our culture convinces

22:14

us that having

22:17

children is a desirable thing,

22:19

that it's a wonderful experience. I

22:21

think that contributes to the

22:25

formation of our preferences. The

22:28

question is how do you avoid the AI system

22:32

manipulating human preferences so that they're

22:34

easier to satisfy? The

22:38

loophole theory that you talked about, like if this

22:41

thing is smart enough, it's going to find a way to

22:44

shortcut to get the goal that you gave it. The

22:49

problem has to be formulated very

22:52

carefully. You

22:54

might say, okay, the goal

22:57

is not just in English,

22:59

if we were talking to each other, we would say,

23:01

okay, we want the underlying

23:04

constitutional objective of the machine

23:06

to be satisfying

23:09

human preferences, to be beneficial to us.

23:12

When you get into setting

23:15

up the mathematical problem, if

23:17

you're operating under

23:20

the assumption that human preferences can be changed,

23:22

then you need to be more precise. Do

23:24

you mean the preferences the human had at

23:27

the beginning? The preferences the human

23:29

has at the end? The

23:32

preferences that they would have if you

23:34

weren't interfering? It becomes

23:37

a little bit more complicated. The

23:39

simplest answer would be the preferences

23:41

that they had at the beginning. That's

23:46

a little bit problematic because if, let's

23:49

say you have a domestic robot that's with

23:51

you for most of your life, well,

23:54

obviously, by the time you're 50, you

23:56

don't want it to be satisfying the preferences you had when

23:58

you were five. So,

24:02

but at the same time, you don't

24:04

want it to be molding your preferences

24:06

actively. It cannot really have

24:09

no effect on your preferences because,

24:11

you know, just having a domestic

24:14

robot serving you is going to change

24:16

the kind of person you are. Probably

24:20

you're going to be a little bit more spoiled than

24:23

you would be otherwise. And

24:27

so, I don't think you can argue that the

24:30

machine cannot touch human preferences

24:32

or have any effect on them because I think

24:34

that's just infeasible. So, I would say this is

24:36

one of the areas where we

24:38

need a lot more philosophical

24:42

help actually to

24:45

get these kinds of refinements

24:47

done correctly. And speaking

24:49

of philosophy, we didn't actually define intelligence

24:52

to start this conversation. Obviously,

24:55

we already have machines that are hyper competent and

24:57

more competent than humans in a lot of different

24:59

fields, but like what is the definition of

25:02

intelligence and what is

25:05

this? If everyone

25:07

succeeds in what they're doing right now, what will AI

25:10

look like? Do these AIs have to have

25:12

their own intrinsic goals to be intelligent as

25:15

opposed to just ones we gave them? Do they have to have wants

25:17

like humans do? So,

25:20

no, they certainly don't have

25:22

to have any

25:25

of their own internally generated

25:27

desires. So

25:32

the standard model is where we build machinery

25:34

that optimizes objectives that we put

25:36

in. And

25:39

that can be done in many different ways. There

25:41

are many different kinds of AI frameworks

25:44

and algorithms. So

25:47

for example, reinforcement learning is one where

25:49

you don't put in, in some sense,

25:51

you don't put in the entire objective upfront.

25:54

You kind of feed it to the

25:56

learning algorithm in dribs and drabs depending

25:59

on its behavior. You

26:01

give positive or negative rewards or negative

26:03

reward being a punishment in some sense.

26:07

And so its goal is

26:10

to maximize the stream of

26:12

positive rewards that it receives.
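A minimal sketch of that setup (illustrative only, not any particular RL library): the agent never sees an explicit goal, only a scalar reward, so whatever the environment's reward function pays for is, in effect, the objective.

```python
import random

class Environment:
    """The objective lives here, in the reward function, not in the agent."""
    def reward(self, action):
        return 1.0 if action == "useful" else 0.0   # the designer's implicit objective

class Agent:
    def __init__(self, actions):
        self.values = {a: 0.0 for a in actions}     # estimated value of each action
    def act(self):
        if random.random() < 0.1:                   # occasional exploration
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)
    def learn(self, action, reward, lr=0.1):
        self.values[action] += lr * (reward - self.values[action])

env, agent = Environment(), Agent(["useful", "useless"])
for _ in range(1000):
    a = agent.act()
    agent.learn(a, env.reward(a))
# The agent ends up preferring whatever the reward channel happens to pay for.
```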

26:16

And so the

26:18

precise objective is implicit

26:21

in the part that is supplying the

26:24

rewards. That's

26:26

what we would call the objective. So

26:30

it doesn't make sense for them to derive

26:33

their own separate goals

26:36

and objectives because for one

26:38

thing adding

26:41

its own goals and objectives would mean that

26:43

it wouldn't be achieving the ones

26:45

that we set for it. And

26:48

also we don't really have any good

26:50

idea for how to generate goals

26:53

out of nothing. Yeah, yeah,

26:55

when you start to think about that it kind

26:57

of blows your mind. Like what is a goal,

26:59

right? Right. I

27:02

mean we

27:04

have a very complicated system. There's

27:07

a biological system based

27:09

around our dopamine system which

27:13

evolution built into us to

27:15

give us a kind of a guidepost for how

27:19

not to die immediately. So the

27:21

dopamine system is

27:24

positively stimulated by nice

27:26

sweet calorie rich foods

27:30

and sex and other things like that.

27:32

So basically this is evolution saying look

27:34

if you eat a lot of

27:37

edible food and have lots of sex

27:39

then you'll probably end up having a

27:42

high degree of evolutionary fitness.

27:47

But it doesn't work perfectly, right? So

27:50

you can also take a whole bunch of

27:52

drugs to stimulate your dopamine system and

27:55

then you don't reproduce and you die fairly quickly.

28:00

And so the dopamine system is not a

28:02

perfect signpost to how to behave

28:04

in order to have evolutionary success,

28:06

but it's so much

28:08

better than nothing that many,

28:13

many successful species have

28:16

dopamine systems or

28:18

something equivalent. So

28:20

that, and that dopamine system is what

28:22

allows you to learn during your lifetime.

28:24

It gives you a signal saying, yeah,

28:27

this is probably good, this is probably

28:29

good. So become better at finding this

28:31

kind of sweet food or finding mates

28:33

or whatever it might be. And

28:36

learning during your lifetime turns

28:39

out to actually

28:41

accelerate evolution. So it's sort

28:43

of a doubly beneficial process

28:46

from evolution's point of view. So

28:49

that's one part of our own internal motivation

28:53

system or preference structure, if you like. And

28:55

then another part and

28:58

possibly much more important is

29:00

what we soak in from our

29:03

culture, from family, friends,

29:05

peers, and these days

29:08

from media. And

29:12

there, you know, that I

29:14

think departs often very

29:17

strongly from the basic

29:19

biological urges that

29:21

the dopamine system provides. So

29:24

by setting up, for example, in some

29:27

cultures, let's say, in

29:29

Tibetan Buddhism, the goal to

29:32

be a monk is set up as

29:35

one of the most desirable objectives. And

29:37

that was also true in medieval

29:39

Europe, you know, with

29:41

the Catholic monasteries, they were wealthy, they

29:44

were relatively safe compared to ordinary life,

29:49

privileged, powerful. So that

29:51

was a very desirable cultural goal that

29:53

was built in to individuals

29:56

through the culture. But

29:59

it's a... non-reproducing role. So

30:03

clearly it's not something

30:05

that evolution would

30:08

advocate, at least for individuals.

30:11

Maybe there's some wise evolutionary

30:13

plan to

30:15

have a large number of people

30:18

being in monasteries to keep the species safe

30:20

and on the right track. But I doubt it. I

30:23

think it's just this is what happens

30:25

with cultural processes as opposed

30:27

to biological processes. So

30:30

these days we

30:32

have all kinds of different role models, all

30:35

kinds of different pressures

30:39

to consume, whether it's

30:41

food or clothes

30:43

or fashion, media content, sport,

30:45

etc., etc., etc. It's a

30:47

very, very complicated landscape

30:50

and that

30:52

interacts with our

30:55

emerging, maturing consciousness

30:58

and internal

31:00

mental processes in ways that are

31:03

wonderfully varied and

31:06

produce individuals with all

31:09

kinds of vocations and

31:12

desires for their own future and the future of

31:14

other people. So

31:17

all of that is going

31:20

on in humans. And basically, to

31:22

sum it all up, you're

31:24

intelligent to the extent that your

31:27

actions can be expected

31:29

to achieve your objectives.
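In the standard decision-theoretic shorthand (textbook notation, not a quote from the book), that definition is usually written as

$$ a^{*} = \arg\max_{a} \; \mathbb{E}\left[\, U(\text{outcome}) \mid a \,\right] $$

that is, the intelligent choice is the action with the highest expected utility given the agent's own objectives and beliefs. The rest of the conversation is about what goes wrong when that utility U is assumed to be fixed and fully known to the machine.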

31:34

And this is a notion that goes

31:36

back in economics and philosophy for hundreds

31:38

or thousands of years of

31:41

rational behavior. And

31:46

it's often caricatured as sort

31:48

of homo economicus, just

31:50

greed, acquisition

31:53

of wealth is the only objective. Of course, that's not

31:55

what it means. Your objectives can

31:57

be anything at all. You

32:02

can be Mother Teresa and have the

32:04

objective of saving the lives

32:06

of destitute children. And

32:09

that's completely fine. You don't have to

32:11

be selfish, you don't have to be greedy, you don't

32:13

have to care about money. It can

32:15

be anything at all. So rational behavior

32:18

is the ideal

32:21

for what we mean by human intelligence,

32:23

and then we basically just copy that

32:25

into machines. And

32:28

that became the basis for AI

32:30

back in the forties and fifties when

32:32

the whole field was getting going. And

32:37

I think this was a mistake. Was

32:41

having it just modeled after human

32:43

goals in and of itself a

32:45

mistake? Or

32:47

having it be the idea

32:49

that machines are intelligent to the

32:52

extent that their actions can be

32:54

expected to achieve their objectives?

32:56

By copying this notion, saying, well, that's what

32:58

it means for humans to be intelligent, then that's

33:00

what it means for a machine to be intelligent. And

33:04

then of course, the machine doesn't have its own

33:06

objectives. It doesn't have all the biology and the culture. So

33:10

we would just put those in. And for

33:12

the simple kind of toy

33:15

worlds, like the chessboard, or

33:17

the virtual chessboard, it

33:21

seems quite natural that you just have the

33:23

goal of winning the game.

33:27

Or if you want to

33:30

find

33:32

routes on a map, the goal is just,

33:34

okay, you want to get to the destination as

33:36

quickly as possible and so it seemed like

33:39

in the toy examples that

33:41

people were beginning to work on,

33:43

specifying the

33:45

objective wasn't a problem. And

33:48

in fact, in many cases what they

33:50

were working on in AI was

33:54

artificial problems that had already been

33:56

set up with well-defined objectives.

33:58

Chess is one of those:

34:00

checkmate is just part of

34:02

the definition of chess. So it kind

34:04

of comes with a perfectly defined

34:06

objective. Which

34:09

is not like the real world. Right. Exactly.

34:12

So that's the problem. And

34:14

funnily enough, in the early

34:18

part of the history of AI,

34:21

we also made an assumption that the

34:24

rules were known and

34:27

that the state of the world was known. And

34:29

that again is true in chess. We know the

34:31

rules of chess. We can see

34:33

where the pieces are on the board. And

34:36

so uncertainty simply doesn't come into it.

34:40

And so for most of the first 30

34:43

years or so of AI research,

34:47

it was assumed that you would know the rules and

34:49

you would know the state of the world. And

34:54

sometime around 1980,

34:57

the main

35:00

leading researchers in the field sort of

35:02

fessed up and they said, okay, fine.

35:05

All right. We admit that in

35:08

fact, we won't always have perfect knowledge of

35:10

how the world works. And we won't always have

35:12

perfect knowledge of the state of the world. I

35:15

mean, this is sort of blatantly obvious to everybody now.

35:17

It was surprisingly

35:20

difficult for people to admit it because

35:23

it meant that the technology they

35:25

had developed, which was mainly this sort

35:27

of symbolic logic technology, was

35:29

limited in its application that you couldn't

35:31

solve a lot of real world problems

35:34

using symbolic logic because you didn't have

35:36

definite knowledge of the state

35:38

or of the rules, the dynamics, the

35:40

physics of the world. So

35:42

we accepted uncertainty wholeheartedly

35:46

by the end of the 1980s

35:48

and the beginning of the 1990s. But

35:51

we continued to assume

35:53

that the objective was known, completely incorrectly,

35:55

that we had perfect knowledge of

35:58

the objective and the machine would be able to... have

36:00

that perfect knowledge. And I

36:02

can't really explain why it's taken

36:04

another 25 or 30 years to

36:12

see. And I'm one of them.

36:14

It took me a while to see that, in fact,

36:17

in the real world, you'd almost never have perfect

36:19

knowledge of the objective that

36:21

the machine was supposed to be pursuing.

36:25

It's surprising you talk about how people

36:27

who are raising the alarm about possible

36:29

negative outcomes of AI are seen as

36:32

anti-AI or Luddites, when in fact you're

36:34

just saying, no, we just have to

36:36

take into account these possible problems. And

36:39

that people who are developing the technology are some of

36:41

the ones who are saying, don't worry, we'll never even

36:43

get there. So there's no need for concern. Well, then

36:45

why are you working on it if you think you

36:47

won't actually achieve it? Yeah, I mean,

36:49

it's bizarre. And I think we

36:51

just have to assume that it's

36:54

a kind of defensive denialism.

36:58

It would be uncomfortable and awkward

37:00

to admit that what you're working

37:02

on might be

37:05

sort of on the wrong path and also a

37:07

threat to the human race. What

37:09

are the biggest events that would happen in the

37:11

course of human civilization would be

37:14

inventing superhuman AI that would be up there with

37:16

an asteroid wiping out civilization or things like

37:19

that? Yeah, I think so. And this was

37:21

actually at the beginning of the book,

37:24

I'm recounting a talk that I

37:26

gave at an art

37:28

museum in London to a completely non-technical audience. And

37:30

it was the first time that I

37:32

was sort of publicly declaring

37:35

this position. So

37:38

the phrase, the biggest

37:40

event in human history comes from that

37:42

lecture. And

37:45

it was basically, I formulated

37:48

it as kind of like the Oscars. Here

37:50

are the five candidates for biggest

37:52

event in human history, you know,

37:54

asteroid wipes us out, or we all die

37:57

in a climate disaster. You

38:00

know, we develop faster-than-light

38:02

travel and conquer the universe, we

38:04

solve the problem of aging and

38:07

we all become immortal.

38:11

Or a superior

38:14

alien civilization lands on the earth.

38:16

And then the last one was

38:19

that we developed super intelligent AI. And

38:22

so, you

38:25

know, I chose that one as

38:28

the winner, the biggest event, because basically

38:32

our whole civilization is just

38:34

built on our intelligence. And if

38:37

we have a lot more of it,

38:40

that would be an entirely

38:43

new civilization, and

38:46

possibly a much better one if we can

38:48

actually keep it. If

38:51

we can control the

38:54

potentially much more powerful entities

38:56

that we're creating, then

39:00

we

39:03

can direct that power to

39:05

the benefit of everybody. So it could be

39:07

a golden age. It

39:09

could in fact give us the immortality and the

39:11

faster-than-light travel if those things are possible,

39:13

then they're going to be much more possible if

39:17

we have access to such tools. And

39:20

it's a little bit like the arrival of

39:24

superior alien civilization, except

39:27

that it's not

39:29

a black box. At least it's not a black

39:32

box if we do it the right way. You

39:35

know, if it was really a black box, if an

39:38

alien entity landed on earth that was much

39:40

more intelligent than humans, you know,

39:42

how would you control it? You couldn't. Yeah, right,

39:45

you're toast. So forget it.

39:48

The only route to

39:52

getting this right is to design

39:54

the AI system in such a way that

39:57

we can provably control

40:00

it. It's not good enough

40:02

to say, well, I think we've done a good

40:04

job and, you know, we've given all

40:06

the programmers some pretty

40:08

good guidelines, you

40:10

know, and we have panels, you know,

40:13

of experts, just in case something goes wrong.

40:15

That's not gonna cut it.

40:17

Look at what happened with nuclear

40:20

power, right? The risks of

40:22

nuclear power were pretty apparent because people could see

40:24

what a nuclear explosion looked like and what

40:26

it could do. And

40:28

there was a lot of

40:30

regulation.

40:33

Some people estimated that for

40:35

every pound of nuclear power station

40:37

there are seven pounds of paper.

40:41

It's hard to imagine, but that's what

40:43

I've been told by nuclear engineers. So

40:47

the amount of

40:49

regulation around the construction and

40:51

testing and checking of nuclear power

40:54

stations was immense, much bigger,

40:56

I think, than anything ever before

40:58

in the history of mankind. But

41:00

that wasn't enough right? We still

41:03

had Chernobyl and

41:05

Fukushima. And that wiped out

41:07

the nuclear industry as well as a

41:10

fair number of people and a large

41:12

chunk of land. And

41:15

so we didn't get any of

41:17

the benefits of nuclear power,

41:20

because we stopped building nuclear power stations,

41:22

and a lot of countries have actually

41:24

decided to phase it out altogether. So

41:27

Germany for example is in the process

41:29

of getting rid of all its nuclear

41:31

power stations. All the potential benefits of

41:33

carbon free energy and cheap electricity and

41:35

so on, we lost,

41:38

because we didn't pay attention to

41:40

the risks. And nobody would say, you

41:42

know, that a nuclear engineer who's proposing

41:46

an improved design of nuclear power

41:49

stations that's less likely to suffer a

41:51

meltdown, no one would call him a

41:53

Luddite.

41:55

So why...

41:58

why is... So it's

42:00

the Information Technology and Innovation Foundation

42:03

that awards the Luddite award.

42:07

And they've awarded that prize to people

42:09

who are pointing to potential risks from AI.

42:14

And this seems weird, right? And

42:19

at the same time, I guess

42:21

they're applauding people who

42:23

say, you know, people within the field

42:25

of AI who are now saying

42:27

for the first time ever, oh, by

42:30

the way, you know, the reason we don't have

42:32

to worry is because in fact, we're guaranteed to

42:34

fail. Now, if you

42:36

ask me, that's anti AI. To

42:40

say that this

42:43

problem is beyond the capabilities of

42:46

the assembled AI researchers

42:49

of the world, you know, who

42:51

are growing rapidly, and, you know,

42:54

now have access to hundreds of billions of dollars

42:56

in funding, to say

42:59

that all of those incredibly smart

43:01

people were too stupid to

43:04

solve the remaining problems between here

43:06

and human level AI. First

43:11

of all, I think it's completely

43:14

groundless. Right? There

43:16

is no argument being made

43:18

as to why the problem

43:20

can't be solved other than, well, if

43:23

it isn't, if it isn't solved,

43:25

then we don't have to worry. So it

43:28

basically means it's a way of washing

43:31

your hands of the problem. Yeah.

43:33

Other than that, there's no justification being

43:35

given whatsoever. The other thing

43:37

is that, you know, history tells

43:39

us that that's a pretty foolish

43:43

attitude. And

43:47

in fact, coming back to nuclear power again, right, that

43:49

was the position of many

43:52

leading nuclear physicists in the early part of

43:54

the 20th century that, yes,

43:56

there is a massive amount of energy locked in

43:58

the atom. And

46:01

he says, you know, don't worry.

46:04

I know we're heading for a cliff, but I guarantee

46:06

we're gonna run out of gas before we get there.

46:09

Right? It's like, well, come on, guys.

46:12

That's not how you manage the effect of

46:14

the human race when the stakes are so

46:16

high. Yeah. So overall, are

46:19

you optimistic that if people

46:21

heed this warning now that we could put

46:23

in place these rules for

46:25

what the future of AI would look like, and we

46:27

could be in this golden era version

46:29

of the future and not one

46:31

of these various dystopias brought about by the

46:34

King Midas problem and things like that? So

46:38

I'm reasonably optimistic. There's certainly a lot of

46:41

work to do because we've

46:43

got 70 years of technological development

46:46

under the old model. And

46:48

it's not easy to replace that

46:50

overnight with

46:52

technology that operates under the new model. We're

46:56

just at the early stages of developing the

46:58

algorithms and the various subcases

47:00

and how you solve them for

47:03

that. So there's still a lot of work to do. But

47:05

even before then, I think just

47:08

the advice to think

47:11

not what is the objective

47:13

that I want the system to optimize, but

47:15

what are the potential effects of the system?

47:18

Do I know whether those effects are desirable

47:20

or undesirable? And if I

47:23

don't know, then I design

47:25

the system not to have those effects,

47:29

not to change the world in

47:31

ways that the system and I

47:33

don't know whether that's a good idea or not.

47:37

That's a better approach. So

47:39

that's sort of like a

47:42

best practice guideline for the time being. But

47:44

yeah, in the long run, the

47:46

goal would be to have

47:49

technological templates designed for software

47:51

that are provably safe

47:53

and beneficial. And

47:55

then, there are two other basic

47:59

problems that are

48:01

much less technological, but I still worry about them. And

48:03

at the end of the book, I

48:07

discuss these. And I would

48:09

say I'm a

48:12

little bit less optimistic about

48:14

these, because I don't see

48:17

technological solutions for them, because

48:19

they're not really technological problems.

48:21

One is something

48:23

that probably is apparent to many people,

48:26

is if we develop this

48:28

incredibly powerful technology, what

48:30

about people who want to use it

48:33

for evil purposes? They're

48:36

not going to use the safe and

48:38

beneficial version, which would actually prevent

48:41

them from doing bad things to people. Because

48:44

it will be designed to have the

48:47

preferences of everyone in mind. So

48:49

if you try to destroy

48:52

the world, or take over the world, or do whatever it is you

48:54

want to do, it would have

48:56

to resist. But what's

48:58

to stop them developing the unsafe version,

49:02

perhaps under the old model, and putting in the

49:04

objective of, I'm the ruler of the universe. And

49:07

the system finds some way

49:10

of satisfying that, that

49:12

maybe is not even what the bad

49:14

guy intended. So it's not that he

49:16

might succeed, it's that the bad person

49:19

might fail by

49:22

losing control over the AI

49:24

system that is unleashed. And

49:27

so that's one set of worries. And if you

49:29

think about how well we're doing

49:31

with cybercrime right now, not,

49:34

then this

49:36

would be much, much more of

49:39

a risk and a threat. And so

49:42

we're going to need to develop not

49:45

just policing, but also, I think

49:47

we've got to somehow build this

49:49

into the moral fabric of

49:52

our whole society, that this is

49:54

a suicidal direction

49:57

to take. And there are interesting

49:59

precedents. in science

50:02

fiction. For example,

50:04

in Dune, which is Frank

50:06

Herbert's novel about the

50:08

far future. Humanity

50:10

has gone through a near-death

50:12

experience in the form of a

50:15

catastrophic conflict between humanity and

50:17

machines, which, as

50:19

we're told, we

50:22

only just survived to

50:24

tell the tale. And so

50:26

as a result, there's basically an 11th

50:28

commandment to not make a machine in

50:30

the likeness of man. So

50:32

there are no computers in

50:35

that future. So

50:39

that gives you a sense that

50:41

this is not something you want to mess around

50:43

with, that you would need pretty

50:45

rigorous regulations and

50:48

enforcement, but also a kind

50:51

of a moral code and understanding that

50:53

everyone understands. Just as I

50:55

think creating a

50:57

pandemic organism,

51:00

some engineered virus that would destroy the human race,

51:03

I think everyone

51:05

understands that's a bad idea. Yeah.

51:07

You have to hope your evil supervillains at

51:09

least have some self-preservation instincts on top of

51:11

their evil. Even they are not proposing. I

51:13

think, well, maybe there are some groups

51:18

who really think that we should cleanse

51:20

the earth of human beings altogether, but

51:23

fortunately, they're not too bright. The

51:27

second issue is sort

51:30

of the other half, or the other 99.99% of the human

51:33

race, not the bad actors, but

51:35

all of the rest of us who are

51:41

lazy and short-sighted, even the

51:43

best of us are lazy and short-sighted. And

51:49

by creating machines that

51:51

have the capacity to run our

51:54

civilization for us, we

51:56

create a disincentive

51:58

to run it ourselves. And

52:03

when you think about it, right, we've spent over

52:06

the whole human history, it

52:08

adds up to about a trillion person years

52:11

of teaching and learning just

52:14

to keep our civilization moving forward, right, to pass

52:16

it to the next generation so that it doesn't

52:18

collapse. And

52:23

now, or at least at some point in

52:25

the visible future, we may not have to

52:27

do that because

52:30

we can pass the knowledge into machines instead

52:32

of into the next generation of humans. And

52:35

once that happens, right, it's in

52:38

some way sort of irreversible. Like

52:43

once there are no humans left who even knew

52:45

how these machines were designed, who is going to

52:47

have an incentive to figure it out in a

52:49

retro- Right. And it's just

52:51

very, very complicated to sort of pull

52:53

yourself up by the knowledge bootstraps. You

52:58

know, perhaps the machines could sort of

53:00

reteach us if we

53:02

decide that this is in fact, you know, we made a

53:04

huge mistake. But if you

53:06

look at, so if you see Wall-E, in

53:10

Wall-E, right, the humans

53:13

have been taken off the earth on

53:15

sort of giant intergalactic cruise ships, and

53:18

they just become passengers. They no longer know

53:21

how it works. They become

53:23

obese and stupid and lazy, totally

53:26

unable to look after themselves. And

53:29

this is another, you know, another story that

53:31

goes back thousands of years

53:33

to, you know, the Lotus Eaters and

53:37

other mythological temptations

53:40

that when life makes it

53:42

possible to do

53:46

nothing, to not learn,

53:49

to not face up to challenges, to

53:51

not solve problems, we have

53:53

a tendency to take advantage of

53:56

that, you know, in ways that are not healthy for us. Well,

54:01

one thing, if your listeners haven't read the

54:03

story, The Machine Stops

54:06

by E.M. Forster, I

54:10

highly recommend that story. E.M. Forster

54:12

mostly wrote, you know,

54:14

acute social observation novels of

54:18

early Edwardian England or, you know, But

54:21

yeah, those are the Merchant Ivory movies. But

54:23

this is a story that is

54:27

really a science fiction story. You know, in

54:29

1909, he basically described

54:31

the internet, iPad,

54:34

video conferencing, MOOCs.

54:36

So most people are

54:38

spending their time either, you know, consuming

54:40

or producing MOOC content. And

54:46

The Machine looks after everything.

54:48

It makes sure you get fed,

54:51

it pipes in music, keeps

54:53

you comfortable. So

54:55

The Machine is looking after everyone and we

54:57

pursue these increasingly

55:00

effete activities

55:04

and have less and less understanding of how

55:06

everything really works. And

55:09

so that was a warning sign from 110 years ago of one

55:11

direction that it

55:19

seems like a slippery slope that's

55:21

pretty hard to avoid.

55:24

Yeah. And, you

55:26

know, some people have argued that it's already happening.

55:28

I think people have been arguing this for

55:31

a long time. I

55:33

mean, it makes sense when you have everything offloaded on your

55:35

phone. Why would you waste your

55:38

own brain cycles on doing things you don't

55:40

have to? Yeah. Yeah. Yeah. So I

55:42

think my, you know,

55:44

my ability to navigate, even

55:46

in the Bay Area where I live

55:48

has probably decreased because it's much

55:51

easier just to have the phone navigate for me.

55:55

And so you don't exercise that part of your

55:57

brain, you don't refresh

56:00

those memories of how all the streets connect to

56:02

each other and wherever they are. I

56:06

think there are trade-offs.

56:09

You offload some parts, but because

56:11

you have access to much more

56:13

knowledge through the internet, rather

56:16

than just saying, it's too hard to go,

56:19

you know, trundle down to the library,

56:21

wait for the library to open, find the book

56:23

if they happen to have it, open the book,

56:25

read the page. It used to take a

56:27

whole day to find out a fact, and

56:30

now it takes a second or less to

56:32

find out that fact. So we actually find

56:35

out more stuff than we used to as

56:37

a result. So there are pluses

56:39

and minuses to the way things work right

56:41

now, but we're talking about something much more

56:43

general, a general potentially

56:47

debilitating enfeeblement of human

56:49

civilization. And the

56:51

solution to that, again, it's not a technical solution, right?

56:53

This is a cultural problem. It's

56:57

the economic incentive to

57:00

learn, and let's face it, that's

57:02

one of the primary drivers

57:06

of our education system. You know, the system

57:08

of training and industry

57:11

is economic. Basically, our

57:13

civilization would collapse without it. And

57:17

when that goes away, you know, what replaces

57:20

it? How do we ensure

57:22

that we don't slide

57:25

into dependency? And

57:27

it seems to me it has to be a cultural imperative

57:30

that this is part of what it means

57:32

to be a

57:34

good self-actualized

57:36

human being is not just

57:39

that we get

57:41

to enjoy life and have

57:43

aromatherapy massages and all

57:45

that kind of stuff. But that we know

57:48

things, that we are able to do things,

57:50

that if we want to build

57:52

a deck, we can build a deck. If

57:54

we want to design new

57:57

kinds of radio telescopes, we can design new kinds

57:59

of radio telescopes. I

1:00:01

hate to end on a pessimistic note, but

1:00:03

again, it's not to say this couldn't all

1:00:05

end very well. It

1:00:07

certainly can if everybody starts thinking about

1:00:09

these problems now as opposed to when it's too late.

1:00:13

Exactly. And I can't emphasize enough,

1:00:15

the book has so much more than what

1:00:17

we've already delved into and it's a great

1:00:20

read. Everyone should check out Human Compatible: Artificial

1:00:22

Intelligence and the Problem of Control. Stuart

1:00:24

Russell, thank you so much for joining me. Thank you. It

1:00:27

was a pleasure.
