Google Eats Rocks + A Win for A.I. Interpretability + Safety Vibe Check

Released Friday, 31st May 2024

Episode Transcript

0:00

Hello, Casey. Hey, Kevin. How

0:02

was your Memorial Day weekend? It was

0:04

wonderful. I got to go to a

0:06

beautiful wedding and

0:09

very much enjoyed that. Nice. How

0:11

was your Memorial Day weekend? It was good. But I

0:13

feel like you have something that you didn't bring up,

0:15

which is that you actually had a big launch this

0:19

past weekend. I did a hard launch. I mean,

0:21

I guess I did a hard launch, a boyfriend

0:23

like once before on Instagram, but it was many

0:25

years ago. And this one,

0:27

I think like at this point, hard launches,

0:29

people sort of know what they are. And

0:31

so a lot of thought goes into it.

0:34

A hard launch, just so I'm clear with the latest

0:36

lingo, this is when you announced that you have a

0:38

new boyfriend on Instagram. Well,

0:41

because if the soft launch is

0:43

like, if maybe you see somebody shoulder

0:45

in an Instagram story and you think, well, that's a new

0:47

person. Like how is that? Is

0:49

that, is my friend, are they dating that

0:51

person? That's a soft launch. But once there's

0:53

a face and a name, that's a hard

0:55

launch. I see. So you debuted, you hard

0:57

launched your new boyfriend. We had, and it

0:59

had been some time in coming. And

1:02

of course I had to check in with him and make

1:04

sure he was going to be okay with this. And he

1:07

was excited about it. And you did it on the grid,

1:09

which was bold. Of course I did on the grid.

1:11

I want to show everyone. I

1:14

can't just have that disappear in 24 hours. How

1:16

did it go? Hard launch went very well.

1:18

You know, I mean, like it was a

1:20

little bit. Was the engagement what you hoped

1:22

for? The engagement was off the

1:24

chart. It was my most popular Instagram post

1:27

I've ever done. Did he also hard launch

1:29

you on his Instagram? Yes,

1:31

it was honestly very stressful. There was

1:33

a whole content plan. There were whiteboards.

1:35

Well, we'd taken like dozens of photos.

1:37

Did you hire a marketing agency? Yeah,

1:40

our teams got involved. No,

1:42

we'd taken so many photos. And you know, so

1:44

of course we're like sitting and we're like, we're

1:46

going to do this photo. We're going to do

1:48

this photo. Is this one a little edgy? Let's

1:50

do it anyway. And so we came up, I

1:52

think with five photos. And then yes, we like

1:54

more or less simultaneously did the launch. Yeah, wow.

1:58

I've been out of the game so long that the

2:00

only thing I remember is that you could change

2:02

your relationship status on Facebook, and that was the

2:04

hard launch of 2008. Yes, absolutely.

2:06

And so of course, in my mind, because I also

2:08

have that sort of millennial urge to like, do I

2:10

make this Facebook official? But I'm just like, no, that's

2:13

just no. That seems so boomer coded at this point.

2:15

No, you have to make it LinkedIn official. That's when

2:17

it truly becomes real. I

2:19

got into a relationship recently. Here's 10

2:21

lessons that I have about enterprise software.

2:29

I'm Kevin Roose, a tech columnist at the New York Times. I'm

2:31

Casey Newton from Platformer. And this is Hard Fork. This week, Google

2:33

tells us all to eat rocks; we'll tell you where its AI went wrong.

2:37

Then, Anthropic researcher Josh Batson

2:39

joins us to talk about a breakthrough

2:41

in understanding how large language models

2:43

work. And finally, it's this

2:45

week in AI safety as I try

2:47

out OpenAI's new souped up voice assistant, and

2:49

then it gets truly taken away from me. So

2:52

sorry I had to. Well,

3:06

Kevin, pass me the non-toxic glue

3:08

and a couple of rocks, because it's time to

3:10

whip up a meal with Google's new AI overviews.

3:13

Did you make any recipes you found on Google this week?

3:16

I did not, but I saw some chatter

3:19

about it, and I actually saw our

3:21

friend Katie Notopoulos actually made

3:23

the glue pizza. But we're getting ahead of

3:25

ourselves. We're getting ahead of ourselves. And look,

3:28

the fact that you stayed away from this

3:30

stuff explains why you're still sitting in front

3:32

of me, because over the past week, Google

3:34

found itself in yet another controversy over AI,

3:36

this time related to search, the

3:38

core function of Google. And

3:41

right after that, we had this huge leak

3:43

of documents that brought even more attention to

3:45

search and raised the question of whether Google's

3:47

been dishonest about its algorithms. Kevin, can you

3:50

imagine? Wow. So there's a lot there.

3:52

Yeah. Let's just go through what

3:54

happened, because the last time we talked about Google on

3:56

this podcast, they had just released this new AI overviews

3:58

feature that shows you

4:00

a little AI generated snippet above the search

4:02

results when you type in your

4:05

query. And I think it's fair to say

4:07

that this did not go smoothly. It didn't.

4:09

And I want to talk about everything that

4:11

happened with those AI overviews. But before we

4:13

get there, Kevin, I think we should take

4:15

a step back and talk about the recent

4:17

history of Google's AI launches. Can we do

4:20

that real quick? Yes. Because I would say

4:22

there's kind of an escalation in how bad.

4:24

So let's go back to February 2023 and

4:26

talk about the release of Google Bard. Kevin,

4:28

when I

4:34

say the word Bard, what does that conjure up for you? Shakespeare.

4:36

Yep. Shakespeare number one and probably number

4:38

two would be the late lamented Google

4:40

chatbot. Yes. RIP. Fun fact, Kevin and

4:42

I were recently in a briefing where

4:44

a Google executive had a sticker on

4:47

their laptop that said total Bard ass.

4:49

And that sounds like a joke. And

4:52

you actually texted me and you said,

4:54

does that say total Bard ass? And

4:57

I said it couldn't possibly. And then I

4:59

zoomed in, I said, computer, enhance, and indeed it

5:01

did say total Bard ass. And if you

5:03

are a Googler who has access to a

5:05

sticker, we're dying for one that says total

5:07

Bard ass. I want one. I will put

5:09

it on my laptop. Please. It belongs in

5:11

the Smithsonian. We're begging you for it. So

5:13

this comes out in February 2023.

5:16

And unfortunately, the very first screenshot

5:18

posted of Google's AI chatbot, it

5:20

gave incorrect information about the James

5:23

Webb space telescope. Specifically, it falsely

5:25

stated that the telescope had taken

5:27

the first ever photo of an

5:29

exoplanet. Yes. Kevin, without looking it up, what is

5:31

an exoplanet? It's

5:34

a planet that signs its letters like with a hug and

5:36

a kiss. No, it's actually the planet where all my exes

5:38

live. But let's just say that Google

5:40

AI launches had not gotten off to a great start when

5:42

it happened. In fact, we talked about that one on the

5:44

show. Then comes the launch

5:46

of Gemini. And then we had a culture

5:48

war, Kevin, over the refusals of its image

5:51

generator to make white people. Sure did. Do

5:53

you have a favorite thing that Gemini refused

5:55

to make due to wokeness? Me,

5:58

I was partial to Asian Sergey and Larry,

6:00

do you remember? Wait, I actually forgot this one.

6:03

What was this one? Somebody asked Gemini to make

6:05

an image of the founders of Google, Sergey

6:07

Brin and Larry Page. They came back and they were both

6:10

Asian. Which

6:13

I love. I have to imagine that ended up

6:15

projected onto a big screen at a meeting somewhere

6:17

at Google. That's so beautiful to me. So look,

6:19

that brings us to the AI overviews. And Kevin,

6:21

you sort of set it up top, but remind

6:23

us a little bit of how do these things

6:26

work? What are they? This is

6:28

what used to be known as search generative

6:30

experience when it was being tested. But

6:33

this is the big bet that Google

6:35

is making on the future of AI

6:37

in search. Obviously, they have seen the

6:39

rise of products like Perplexity, which is

6:41

this AI powered search engine. They believe,

6:43

Sundar Pichai said, that he believes that

6:46

AI is the future of search and

6:48

that these AI overviews that appear on

6:50

top of search results will ultimately give

6:52

you a better search experience because instead

6:54

of having to click through a bunch

6:56

of links to figure out what you're

6:58

looking for, you can just see it

7:00

displayed for you, generated right there up

7:02

at the top of the page. And

7:04

very briefly, why have we been so

7:06

concerned about these things? Well, I think

7:08

your concern that I shared was that

7:10

this was ultimately going to lock people

7:12

into the Google walled garden that instead

7:14

of going to links where you might

7:17

see an ad, you might buy a

7:19

subscription, you might support the news or

7:22

the media ecosystem in some way,

7:24

instead Google was just going to keep you there

7:26

on Google. The phrase they would use over

7:28

and over again was we will do the Googling for

7:30

you. That's right. And that it would

7:33

starve the web of the essential referral

7:35

traffic that keeps the whole machine running.

7:37

So that is a big concern, and

7:39

I continue to have it every single

7:41

day. But this week, Kevin, we got

7:43

a second concern, which is that the

7:45

AI overviews are going to kill your

7:47

family. And here's what I mean. Over

7:50

the past week, if you asked Google, how

7:52

many rocks should I eat? The AI overview

7:54

said at least one small rock per day.

7:57

I verified this one myself. Up

8:00

top if you said how do I get

8:02

the cheese to stick to my pizza it

8:04

would say well have you considered adding non-toxic

8:06

glue? Wouldn't have been my first

8:08

guess. Yeah, it's a non-toxic glue. It said

8:11

that 17 of the 42 American presidents

8:17

have been white. To me, the funniest thing about

8:19

that is that there have been 46 US presidents. It got

8:22

both the numerator and the denominator wrong. And of

8:25

course, and this was probably the most upsetting to

8:27

our friends in Canada, it said that there has

8:29

been a dog who played hockey in the National

8:31

Hockey League. Did you see that one? Well, I

8:33

think that was just the plot of Air Bud,

8:35

right? Well, there's no rule that says a

8:38

dog can't play hockey, Kevin. And it

8:40

identified that dog as

8:42

Martin Pospisil. Who is that? Well,

8:44

it seems impossible that you've never

8:46

heard of him but he's a

8:49

24 year old Slovakian man who plays for

8:51

the Calgary Flames. Are you a big Flames

8:53

fan? I'm not. Hmm. So,

8:55

look, how is this happening?

8:57

Well, Google is pulling information from all

8:59

over the internet into these AI overviews

9:03

and in so doing it is revealing something

9:05

we've talked about on the show for a

9:07

long time, which is that large language

9:09

models currently do not know anything. They

9:12

can often give you answers and those

9:14

answers are often right but they are

9:16

not drawing on any frame of knowledge.

9:18

They're simply reshuffling words that they found

9:20

on the internet. Oh, see, I

9:22

drew a different lesson, which is that

9:24

the technology is actually only

9:26

partly to blame here because

9:29

I've used a bunch of different AI

9:31

search products, including Perplexity, and

9:33

not all of them make these

9:35

kinds of stupid errors. But Google's

9:37

AI model that it's using for

9:39

these AI overviews seems to just

9:42

be qualitatively worse. Like, it just

9:44

can't really seem to tell the difference

9:46

between reliable sources and unreliable sources. So

9:48

the thing about eating rocks appears to

9:51

have come from The Onion, which is,

9:54

like, a satirical news site. What, you're saying that

9:56

every story published on The Onion is false?

9:58

I am, yes. That seems like

10:00

an interesting choice to include in your AI

10:03

overviews for facts. Right, and

10:05

the thing about adding glue to your

10:07

pizza recipe came from basically

10:09

a shitpost on Reddit. So

10:12

obviously these AI overviews are imperfect.

10:14

They are drawing from imperfect sources.

10:16

They are summarizing those imperfect sources

10:19

in imperfect ways. It is a

10:21

big mess. And

10:23

this got a lot of attention over the weekend.

10:26

And as of today, I tried to

10:28

replicate a bunch of these queries and

10:30

it appears that Google has fixed these

10:32

specific queries very quickly. Clearly they were

10:35

embarrassed by it. I've also

10:37

noticed that these AI overviews just are barely

10:39

appearing at all, at least for me. Are

10:41

they appearing for you? I am seeing a

10:43

few of them, but yes, they have definitely

10:46

been playing a game of whack-a-mole. And whenever

10:48

one of these screenshots has gone anything close

10:50

to viral, they are quickly intervening. Now,

10:53

I should say that Google has sent me a statement about

10:55

what's going on, if you would like me to share. Sure.

10:58

It said, the company said, quote, the vast

11:00

majority of AI overviews provide high quality information

11:03

with links to dig deeper on the web.

11:06

Many of the examples we've seen have

11:08

been uncommon queries. And we've also seen

11:10

examples that were doctored or that we

11:12

couldn't reproduce, says some more things and

11:14

then says, we're taking swift action where

11:16

appropriate under our current policies and using

11:19

these examples to develop broader improvements to

11:21

our systems. So they're basically saying, look,

11:23

you're cherry picking, right? You went out

11:25

and you found the absolute most ridiculous

11:27

queries that you can do. And now you're holding it against

11:30

us. And I would like to know, Kevin, how

11:32

do you respond to these charges? I

11:34

mean, I think it's true that some

11:36

people were just deliberately trolling Google by

11:38

putting in these very sort of edge

11:40

case queries that, you know, real people,

11:42

many of them are not Googling, like,

11:45

is it safe to eat rocks? That is not

11:47

a common query. And I did

11:49

see some ones that were clearly faked or

11:52

doctored. So I think Google has

11:54

a point there. But I would also say like these AI

11:56

overviews are also making mistakes on what I

11:59

would consider much more common sort of

12:01

normal queries. One of

12:03

them that the AI overview botched was

12:05

about how many Muslim presidents the US

12:07

has had. The

12:09

correct answer is zero, but the AI

12:11

overview answer was one.

12:14

George Washington. Yes, George Washington.

12:17

No, it said that Barack Hussein

12:19

Obama was America's first

12:21

and only Muslim president. Obviously, not

12:23

true. Not true. But that is

12:25

the kind of thing that Google was telling people

12:27

in its AI overviews that I imagine are not

12:30

just fringe or trollish queries.

12:32

Right. And also, I guess it has always been the

12:34

case that if you did a sort of weird query

12:36

on Google, you might not

12:39

get the answer you were looking for,

12:41

but you would get a web page

12:43

that someone had made, right? And

12:45

you would be able to assess,

12:47

does this website look professional? Does

12:50

it have a masthead? Do the authors have bio? You can

12:52

just sort of ask yourself some basic questions about it. Now

12:55

everything is just being compressed into this AI slurry.

12:57

So you don't know what you're looking at. So

12:59

I have a couple of things to say here. Say it.

13:03

I think in this short term, this is

13:05

a fixable problem. Look, I think it's clearly

13:07

embarrassing for Google. They did not want this to

13:09

happen. It's a big rake

13:11

in the face for them. But I

13:13

think what helps Google here is that

13:15

Google search and search in general is

13:18

what they call a fat head product.

13:20

You know what that means? I don't

13:22

know what that means. Basically, if you

13:25

take a distribution curve, the most popular

13:27

queries on Google or any other search

13:29

engine account for a very large percentage

13:31

of search volume. Actually, according to one

13:34

study, the 500 most popular search terms

13:36

make up 8.4% of all

13:39

search volume on Google.
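
To make the "fat head" idea concrete, here is a minimal Python sketch with made-up numbers (not real Google data, and not anything Google has published) showing how a handful of very popular queries can account for an outsized share of total search volume, which is why auditing just the head of the distribution covers so much traffic.

# A minimal, illustrative sketch of the "fat head" idea. The query counts below are
# invented for illustration; they are not real Google data.

def head_share(counts, k):
    """Fraction of total search volume contributed by the k most frequent queries."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical counts: a few huge navigational queries plus a very long tail of rare ones.
toy_counts = [5_000_000, 4_000_000, 3_000_000] + [10] * 1_000_000

print(f"top 3 queries -> {head_share(toy_counts, 3):.1%} of all search volume")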

13:41

So a lot of people are just searching like Facebook and then clicking

13:43

the link to go to Facebook. Exactly. Or

13:46

they're searching something else that's very common. What

13:50

would be an example of a good... What time has a dog ever

13:52

played hockey? No? No?

13:55

Okay. No, stuff like... What Time

13:57

is the Super Bowl? Yeah, What time is the Super Bowl? You

14:00

know how do I fix a broken

14:02

toilet, or something, local movie times. Sure,

14:05

exactly. Yeah. And, you see, for Google,

14:07

that means that Google can sort of

14:09

manually audit the top. I don't know,

14:11

say, ten thousand AI overviews, and make

14:13

sure they're not giving people bad information,

14:15

and that would mean that the vast

14:17

majority of what people search for on

14:20

Google does actually have a correct

14:22

AI overview. Though at that point, it wouldn't

14:24

actually technically be an AI

14:26

overview, right? It's more like a human overview that

14:28

was first drafted by AIs. But

14:31

same difference, in Google's eyes. I also think

14:33

they can make sure the AI overviews

14:35

aren't triggered for sensitive topics, for things where

14:37

there are health concerns. Google already does

14:39

this to a certain extent with

14:42

these things called featured snippets, and I

14:44

think they will continue to sort of play

14:46

around with and adjust the dials on

14:48

how frequently these AI overviews

14:50

are triggered. But I do think there's a

14:52

bigger threat to Google here which is that

14:54

they are now going to be held responsible

14:57

for the information that appears on Google.

14:59

We talked about this a little bit, but

15:01

I mean this to me is the biggest complaint

15:03

that people have that is justified, which is that Google

15:06

used to play an intermediary role. Maybe they would

15:08

point you to a website that would tell you

15:10

that, you know, putting glue on

15:12

your pizza is a good way to get

15:14

the cheese to stick. But you, as

15:16

Google, could sort of wash your hands

15:18

of that and say, oh, that was people

15:20

just trolling on Reddit, that wasn't us. But

15:23

if you're Google and you're now providing the

15:25

AI-written overview to people, people are going

15:27

to get mad when it gives them wrong

15:29

information. And, unfortunately, just the

15:31

law of large numbers says that you know

15:33

sometime, you know, maybe in the next

15:35

year or two, there will be an instance where

15:37

someone relies on something they saw in

15:39

a Google AI overview and it ends

15:41

up hurting them. Yeah, there was another

15:43

query that got a lot of

15:45

attention this week, where an overview

15:47

told someone that you could put gasoline

15:50

in spaghetti to make a spicy dish:

15:52

that you couldn't use gasoline to cook

15:54

spaghetti faster, but if you wanted

15:56

a spicier dish, you could

15:58

put gasoline in it. And of course

16:00

that sounds ridiculous to us, but over

16:02

the entire long tail of the internet,

16:04

is it theoretically possible somebody would eat

16:06

gasoline spaghetti? Of course it is. Yeah,

16:08

so I think, and when that does

16:10

happen, I think there are two

16:12

questions. One is, is Google legally protected? Because

16:15

I've heard some interesting arguments about

16:17

whether section 230, which is the part

16:19

of the US code that

16:21

protects online platforms from being

16:23

held legally responsible for stuff that their users

16:26

post, there are a lot of people

16:28

who think that doesn't apply to these AI overviews, because it

16:30

is Google itself that is

16:32

formulating and publishing that overview.

16:35

I also just think there's a big reputational

16:38

risk here. I mean, you can imagine so

16:40

easily the congressional hearings where, you know, senators

16:42

are yelling at Sundar Pichai saying, why did

16:44

you tell my kid to eat gasoline spaghetti?

16:47

Martin Pospisil's gonna be there saying, do I

16:49

look like a dog to you? Right. And

16:52

seriously, I think that this is a big

16:54

risk for Google, not just because they're gonna have to

16:57

sit through a bunch of hearings and get yelled at,

16:59

but because I think it will make their

17:02

active role in search, which has been true

17:04

for many years. They have been actively shaping

17:06

the experience that people have when they search

17:09

stuff on Google, but they've mostly been able

17:11

to kind of obscure that away or abstract

17:13

it away and say, well, this is just our

17:15

sort of system working here. I think this

17:17

will make their active role in kind of

17:19

curating the search results for billions of people

17:21

around the world much more obvious and it

17:24

will make them much more responsible in users'

17:26

eyes. I think all of that is true.

17:28

I have an additional concern, Kevin. And this

17:30

was pointed out by Rusty Foster who

17:32

writes the great Today in Tabs

17:34

newsletter. And he said, what has

17:37

really been revealed to us about

17:39

what AI overviews really are is

17:41

that they are automated plagiarism. That

17:43

is the phrase that he used,

17:45

right? That Google has scanned the

17:47

entire web, it's looked at every

17:49

publisher, it lightly rearranges the words

17:51

and then it republishes it into

17:53

the AI overview. And as

17:55

journalists, we really try not to do this,

17:57

right? We try not to just go out.

18:00

grab other people's reporting, very gently change

18:02

the words, and republish it as our

18:04

own. And in fact, I know

18:06

people who have been fired for doing something very

18:09

similar to this, right? But Google has come along

18:11

and said, well, that's actually the foundation of our

18:13

new system that we're using to replace search results.

18:15

Yeah. Casey, what do you think comes next with

18:17

this AI overviews business? Is Google just going to

18:19

back away from this? And

18:23

it's not ultimately going to be a huge part

18:25

of their product going forward? Do you think they

18:27

will just grit their teeth and get through this

18:29

initial period of awkwardness

18:31

and inaccuracy? What do

18:33

you think happens here? They are not

18:35

going to back down. Now, they might

18:37

temporarily retreat, like we've seen them do

18:39

in the Gemini image

18:41

case. But they are absolutely going to keep

18:43

working on this stuff, because this is existential

18:45

for them. For them, this is the next

18:47

version of search. This is the way they

18:50

build the Star Trek computer. They want to

18:52

give you the answer. And in many more

18:54

cases over time, they want you to not

18:56

have to click a link to get any

18:58

additional information. They already have rivals like Perplexity

19:00

that seem to be doing a better job

19:02

in many cases of answering people's queries. And

19:05

Google has all of the money and talent

19:07

it needs to figure out that problem. So

19:09

they're going to keep going at this at

19:11

100 miles an hour. Yeah. I want to

19:13

bring up one place that I actually disagree

19:15

with you, because you wrote recently that you

19:17

believe that because of these changes to Google,

19:20

that the web is in a state of

19:22

managed decline. And we've gotten

19:24

some listener feedback in the past few weeks as

19:26

we've been talking about these issues of Google and

19:28

AI and the future of the web saying,

19:31

you guys are basically acting as if

19:33

the previous state of the internet was

19:36

healthy. Google was giving people

19:38

high-quality information. There

19:40

was this flourishing internet

19:42

of independent publishers making

19:44

money and serving users

19:46

really well. And

19:48

people just said it actually wasn't like

19:50

that at all. In fact, the previous state of the

19:52

web, at least for the

19:55

past few years, has been in

19:57

decline. So it's not that we are entering an

19:59

age of managed decline. of the internet

20:01

is that Google is basically accelerating what

20:03

was already happening on the internet, which

20:05

was that publishers of high quality information

20:07

are putting that information behind paywalls. There

20:10

are all these publishers who are chasing

20:12

these sort of SEO traffic winds with

20:14

this sort of low quality garbage. And

20:16

essentially the web is being hollowed out and

20:18

this is maybe just accelerating that. So I

20:20

just want to float that as like a

20:22

theory, a sort of counter proposal for your

20:25

theory of Google putting the web into

20:27

a state of managed decline. Well, sure Kevin, but

20:29

if you ask yourself, well, why is that

20:31

the case? Why are publishers doing all

20:33

of these things? It is because the

20:35

vast majority of all digital advertising revenue

20:37

goes to three companies and Google is

20:40

at the top of that list with

20:42

Meta and then Amazon at number two

20:44

and three. So my overall theory about

20:46

what's happening to the web is that

20:48

three companies got too much

20:50

of the money and starved the web

20:52

of the lifeblood it needed to continue

20:54

expanding and thriving. So look, has it

20:57

ever been super easy to whip up

20:59

a digital media business and just put

21:01

it on the internet and start printing

21:03

cash? No, it's never been easy. My

21:05

theory is just that it's almost certainly

21:07

harder today than it was five years

21:09

ago and it will almost certainly be

21:12

harder in five years than it is

21:14

today. And it is Google that is

21:16

at the center of that story because at the end

21:18

of the day, they have their fingers on all

21:20

of the levers and all of the knobs. They

21:22

get to decide who gets to see an AI

21:24

overview, how quickly do we roll

21:27

these out? What categories do they show them in?

21:29

If web traffic goes down too much and it's

21:31

a problem for them, then they can slow down.

21:33

But if it looks good for them, they can

21:35

keep going even if all the other publishers are

21:37

kicking and screaming the whole time. So I just

21:39

wanna draw attention to the amount of influence that

21:41

this one company in particular has over the future

21:43

of the entire internet. Yeah, and I would just

21:45

say that is not a good state of affairs

21:47

and it has been true for many years

21:50

that Google has huge unchecked

21:52

influence over basically the entire

21:54

online ecosystem. All

21:57

right, so that is the story of the

21:59

AI overview. But there was a second

22:01

story that I want to touch on

22:03

briefly this week, Kevin, that had to

22:05

do with Google and search. And it

22:07

had to do with a giant leak.

22:09

Have you seen the leak? I've, I've

22:11

heard about the leak. I have not

22:13

examined the leak, but tell me about

22:15

the leak. Well, it was thousands of

22:18

pages long. So I understand why you

22:20

haven't finished reading it quite yet, but

22:22

these were thousands of pages that we

22:24

believe came from inside of Google that

22:26

offer a lot of technical details about

22:28

how the company's search works. So, you

22:30

know, that is not a subject that is

22:32

of interest to most people, but if you

22:34

have a business on the internet and you

22:36

want to ensure that you're, you know, dry

22:38

cleaners or your restaurant or your media company

22:40

ranks highly in Google search without having to

22:43

buy a bunch of ads, this is what

22:45

you need to figure out. Yeah. This is

22:47

one of the great guessing games in modern

22:49

life. There's this whole industry of SEO that

22:51

has sort of popped up to try to

22:53

sort of poke around the Google search algorithm,

22:55

try to guess and sort of test what

22:57

works and what doesn't work and sort of

22:59

provide consulting, you know, for a, for a

23:02

very lucrative price to businesses that want to

23:04

improve their Google search traffic. Yeah. Like the

23:06

way I like to put it is imagine

23:08

you have a glue pizza restaurant and you

23:10

want to make sure that you're the top

23:12

rank search for glue pizza restaurants. You might

23:14

hire an SEO consultant. Yeah. So what happened?

23:17

Well, so there's this guy, Rand Fishkin, who

23:19

doesn't do SEO anymore, but was a big

23:21

SEO expert for a long time and is

23:23

kind of a leading voice in this space.

23:26

And he gets an email from this guy,

23:28

Erfan Azimi, who himself is the founder of

23:30

an SEO company and Azimi

23:32

claims to have access to thousands

23:35

of internal Google documents detailing the

23:37

secret inner workings of search. And

23:39

Rand reviews this information with Azimi

23:41

and they determine that some

23:45

of this contradicts what Google has been saying

23:47

publicly about how search works over the years.

23:49

Well, and this is the kind of information

23:52

that Google has historically tried really hard to

23:54

keep secret, both because it's kind of their

23:56

secret sauce. They don't want competitors to know

23:58

how the Google search algorithm works, but

24:01

also because they have worried

24:03

that if they sort of say

24:05

too much about how they rank

24:07

certain websites above others, then these

24:09

sort of like SEO consultants will

24:11

use that information and it'll

24:14

basically become like a cat and mouse game. Yeah,

24:16

absolutely. And it already is a cat and mouse

24:18

game, but you know, the fear is that this

24:20

would just sort of fuel the worst actors in

24:22

the space. Of course, it also means that Google

24:24

can fight off its competitors because people don't really

24:26

understand how its rankings work. And if you think

24:28

that Google search is better than anyone else's

24:30

search, like these ranking algorithm decisions are why.

24:32

Can I just ask a question? Do we

24:35

know that this leak is genuine? Do we

24:37

have any signs that these documents actually are

24:39

from Google? Well, yes. So the documents themselves

24:41

had a bunch of clues that suggested they

24:44

were genuine. And then Google did actually come

24:46

out and confirm on Wednesday that these documents

24:48

are real. But the obvious question is how

24:51

did something like this happen? The

24:53

leading theory right now is that

24:55

these documents came from Google's content

24:58

API warehouse, which

25:00

is not a real warehouse, but

25:02

is something that was

25:04

hosted on GitHub, right? The sort of

25:07

Microsoft-owned service where people post

25:09

their code. And these

25:11

materials were somehow briefly made public

25:13

by accident, right? So because a

25:16

lot of companies will have private

25:18

like API repositories on GitHub. Right.

25:21

So they just sort of set it to public by

25:23

accident. And sort of the modern equivalent of like leaving

25:25

a classified document in the cab. Yeah. Have

25:27

you ever made a sensitive document public on accident? No.

25:29

I've never found one either. Like, in

25:31

all my years of reporting, I keep hoping to like

25:33

stumble on the, you know, the scoop of the century

25:35

just sitting in the back of an Uber somewhere, but

25:37

it never happened to me. So,

25:40

you know, we're not going to go to

25:42

these documents in too much detail. What I

25:44

will say is it seems that these files

25:46

contain a bunch of information about the kinds

25:48

of data the company collects, including things like

25:50

click behavior or data from its Chrome

25:53

browser. Things that Google has previously said that

25:55

it doesn't use in search rankings, but the

25:57

documents show that they have this sort of

25:59

data. and it could potentially use it

26:01

to rank search results. When

26:03

we asked Google about this, they

26:05

wouldn't comment on anything specific, but

26:08

a spokesperson told us that they,

26:10

quote, would caution against making inaccurate

26:12

assumptions about search based on out-of-context,

26:14

outdated, or incomplete information. Anyway,

26:17

why do we care about this? Well,

26:19

I was just struck by one of

26:21

the big conclusions that Rand Fishkin had

26:23

in this blog post that he wrote,

26:25

quote, they've been on an inexorable path

26:27

toward exclusively ranking and sending traffic to

26:29

big, powerful brands that dominate the web

26:31

over small, independent sites and businesses. So

26:34

basically, you look through all of these

26:36

APIs, and if you are a restaurant

26:38

just getting started, if you're an indie

26:40

blogger that just sort of puts up

26:43

a shingle, it used to be that

26:45

you might expect to automatically

26:47

float to the top of Google search

26:49

rankings in your area of expertise. And

26:51

what Fishkin is saying is that just

26:53

is getting harder now because Google is

26:55

putting more and more emphasis on trusted

26:57

brands. Now, that's not a bad thing

27:00

in its own right, right? If I Google something from

27:02

the New York Times, I want to see the New

27:04

York Times and not just a bunch of people who

27:06

put New York Times in the header of their HTML.

27:09

But I do think that this is one of

27:11

the ways that the web is shrinking a little

27:13

bit, right? It's not quite as much of a

27:15

free-for-all. The free-for-all wasn't all great because a lot

27:17

of spammers and bad actors got into it, but

27:19

it also meant that there was room for a

27:21

bunch of new entrants to come in. There was

27:23

room for more talent to come in. And

27:26

one of the conclusions I had reading this stuff was,

27:28

maybe that just isn't the case as much as it

27:30

used to be. Yeah. So do you

27:32

think this is more of a problem for Google

27:34

than the AI overviews thing? How would you say

27:36

it stacks up? I would say it's actually a

27:38

secondary problem. I think telling people to eat rocks

27:40

is the number one problem. They need to stop

27:42

that right now. But this,

27:44

I think, speaks to that story because

27:47

both of these stories are about, essentially,

27:49

the rich getting richer. The big brands

27:51

are getting more powerful, whether that's Google

27:53

getting more powerful by keeping everyone on

27:55

search or big publishers getting more powerful

27:58

because they're the sort of trusted. brands.

28:00

And so I'm just observing that

28:02

because, you know, the

28:04

promise of the web and part of what

28:07

it has made it such a joyful place

28:09

for me over the past 20 years is

28:11

that it is decentralized and open and there's

28:13

just kind of a lot of dynamism in

28:16

it. And now it's starting to feel a

28:18

little static and stale and creaky. And these

28:20

documents sort of outline how and why that

28:22

is happening. Yeah, I

28:24

think Google is sort of stuck between a rock

28:27

and a hard place here because on one hand

28:29

they do want, well, maybe

28:31

we shouldn't use a rock example. No,

28:33

use a rock example. They're stuck between a rock

28:35

and a hard place. On one hand, the company

28:37

is telling you to eat rocks. On the other

28:40

hand, they're in a hard place. Right.

28:43

So I think Google is under a lot of

28:45

pressure to do two

28:47

things that are basically contradictory, right?

28:49

To sort of give people an

28:51

equal playing field on which to

28:53

compete for attention and authority. That

28:55

is the demand that a lot

28:58

of these smaller websites and SEO

29:00

consultants want them to comply with. On

29:02

the other hand, they're also seeing with these

29:05

AI overviews what happens when you don't privilege

29:08

and prioritize authoritative sources of information in

29:10

your search results or your AI overviews.

29:12

You end up telling people to eat

29:14

rocks. You end up telling people to

29:16

put gasoline in their spaghetti. You end

29:18

up telling people there are dogs that

29:20

play hockey in the NHL. This

29:23

is the kind of downstream consequence of

29:25

not having effective quality

29:27

signals to different publishers

29:30

and to just kind of treating everything on

29:32

the web as equally valid and equally authoritative.

29:34

I think that is a really good point

29:36

and that is something that comes across in

29:38

these two stories is that exact tension. Casey,

29:40

I have a question for you,

29:43

which is we also are content creators on the

29:45

internet. We like to get attention. We want that

29:47

sweet, sweet Google referral traffic. For

29:49

our next YouTube video, a stunt video,

29:52

do you think that we should A,

29:54

eat the gasoline

29:56

spaghetti? B, eat one

29:58

to three rocks apiece and see what effects

30:00

it has on our health, or C, teach

30:02

your dog to play hockey at a professional level? I

30:06

mean, surely for how much fun it would be,

30:08

we have to teach a dog how to play

30:10

hockey. It's true. You know, I'm just imagining like

30:12

a bulldog with little hockey sticks

30:14

maybe taped to its front paws. Yeah. It'd

30:17

be really fun. My dogs are too dumb for this, we'll have

30:19

to find other dogs. You know, was it in Lose Yourself that

30:21

Eminem said, there's vomit on

30:23

my sweater already, gasoline, spaghetti? Yeah.

30:27

I believe those are the words. What a great song. Yeah.

30:32

When we come back, we'll talk about

30:34

a big research breakthrough into how AI

30:36

models operate. Well,

30:54

Casey, we have something new and unusual for the podcast

30:56

this week. What's that, Kevin? We have some actual good

30:58

AI news. So as

31:00

we've talked about on this show before, one

31:02

of the most pressing issues with these large

31:05

AI language models is that

31:07

we generally don't know how they

31:09

work, right? They are inscrutable, they

31:11

work in mysterious ways. There's no

31:13

way to tell why one particular

31:15

input produces one particular output. And

31:17

this has been a big problem

31:19

for researchers for years. There

31:21

has been this field called interpretability,

31:24

or sometimes it's called mechanistic

31:26

interpretability, I'll say that five times

31:28

fast. And I

31:30

would say that the field has been making

31:33

steady but slow progress toward understanding

31:35

how language models work. But last

31:38

week, we got a breakthrough. Anthropic,

31:40

the AI company that makes the

31:42

Claude Chatbot announced that it had

31:45

basically mapped the mind of their

31:47

large language model, Claude 3, and

31:50

opened up the black box that is AI for

31:53

closer inspection. Did you see this news and

31:55

what was your reaction? I did, and I

31:57

was really excited because for some time now,

32:00

Kevin, we have been saying if you don't

32:02

know how these systems work, how can you

32:04

possibly make them safe? And companies have told

32:06

us, well, look, we have these research teams

32:08

and they're hard at work trying to figure

32:10

this stuff out. But we've only seen a

32:13

steady drip of information from them so far.

32:15

And to the extent that they've conducted research,

32:17

it's been on very small toy versions of

32:19

the models that we operate with. So that

32:21

means that if you're used to using something

32:23

like Anthropic's Claude, its latest model,

32:26

we really haven't had very much idea

32:28

of how that works. So the big

32:30

leap forward this week is they're finally

32:32

doing some interpretability stuff with the real big

32:34

models. Yeah. And we should just caution

32:36

up front that like it gets pretty

32:38

technical pretty quickly once you start getting into

32:41

the weeds of interpretability research. There's lots

32:43

of talk about neurons and

32:46

sparse autoencoders, things of that nature. But

32:48

I, for one, believe that hard fork listeners are

32:50

the smartest listeners in the world and they're not

32:52

going to have any trouble at all following along,

32:54

Kevin. What do you think about our listeners? That's

32:56

true. I also believe that we have smart listeners

32:59

smarter than us. And so even

33:01

if we are having trouble understanding this

33:03

segment, hopefully you will not. But today

33:05

to walk us through this big AI

33:08

research breakthrough, we've invited on Josh Batson

33:10

from Anthropic. Josh is a research

33:12

scientist at Anthropic and he's one of the

33:14

co authors of the new paper that explains

33:16

this big breakthrough in interpretability, which is titled

33:20

Scaling Monosemanticity: Extracting Interpretable Features

33:22

from Claude 3 Sonnet. Look, if

33:24

you're not scaling monosemanticity at

33:26

this point, what are you even

33:28

doing? What are you even doing with

33:30

your life? Figure it out. Let's bring in Josh. Come

33:32

on in here, Josh. Josh

33:44

Batson, welcome to Hard Fork. Thank you. So

33:47

there's this idea out there, this very popular

33:49

trope that large language models are a black

33:51

box. I think Casey, you and I have

33:53

probably both used this in our reporting. It's

33:55

sort of the most common way of saying

33:58

like we don't know exactly how these models

34:00

work. But I think it can be sort

34:02

of hard for people who aren't steeped in

34:04

this to understand just like what we don't

34:07

understand. So help us understand prior

34:09

to this breakthrough, what

34:11

would you say we do and do not

34:13

understand about how large language models work? So

34:17

in a sense, it's a black box that sits in

34:19

front of us and we can open it up. And

34:22

the box is just full of numbers. And

34:24

so you know, words go in, they turned

34:26

into numbers, a whole bunch of compute happens,

34:28

words come out the other side, but we don't

34:30

understand what any of those numbers mean. And

34:33

so one way I like to think

34:36

about this is like you open up the box and it's

34:38

just full of thousands of green lights that are just like

34:40

flashing like crazy. And it's like something's

34:42

happening, for sure. And like different

34:44

inputs, different lights flash, but we don't know

34:47

what any of those patterns mean. Is

34:49

it crazy that despite that state of affairs that

34:51

these large language models can still do so much

34:53

like it seems crazy that we wound up in

34:55

a world where we have these tools that are

34:57

super useful. And yet when you open them up,

34:59

all you see is green lights. Like, can you

35:02

just say briefly why that is the case? It's

35:05

kind of the same way that like animals

35:07

and plants work, and we don't

35:09

understand how they work, right? These

35:12

models are grown more than they

35:14

are programmed. So you kind

35:16

of take the data and that forms like the

35:18

soil, and you construct an architecture and it's like

35:20

a trellis and you shine the light and like

35:23

that's the training. And then the model sort of

35:25

grows up here. And at the end, it's beautiful

35:27

as all these little like curls and it's holding

35:29

on. But like you didn't like tell it what

35:32

to do. So it's almost

35:34

like a more organic structure than something

35:36

more linear. And

35:38

help me understand why that's a

35:40

problem, because this is the

35:43

problem that the field of

35:45

interpretability was designed to address.

35:48

But there are lots of things that

35:50

are very important and powerful that we

35:52

don't understand fully. Like we don't really

35:55

understand how Tylenol works, for example, or

35:57

some types of anesthesia, their exact mechanisms.

36:00

are not exactly clear to us, but they work,

36:02

and so we use them. Why

36:04

can't we just treat large language models the same

36:07

way? That's a great

36:09

analogy. You can use

36:11

them. We use them right now, but

36:14

Tylenol can kill people, and

36:16

so can anesthesia, and there's a huge

36:18

amount of research going on in the

36:20

pharmaceutical industry to figure out what makes

36:22

some drugs safe and what

36:24

makes other drugs dangerous, and interpretability

36:27

is kind of like doing the biology

36:30

on language models that we can then use

36:32

to make the medicine better. So

36:35

take us to your recent paper and your

36:37

recent research project about the inner workings of

36:39

large language models. How did you get there

36:41

and then sort of walk us through what

36:44

you did and what you found? So

36:46

going back to the black box that when you open

36:48

it is full of flashing lights. A

36:51

few years ago, people thought you could just

36:53

understand what one light meant. So when this

36:55

light's on, it means that the model is

36:57

thinking about code, and when this light's on,

36:59

it's thinking about cats, and for this light,

37:01

it's Casey Newton. And

37:04

that just turned out to be wrong. About a year and

37:06

a half ago, we published a paper talking

37:08

in detail about why it's not

37:10

like one light, one idea. In

37:13

hindsight, it seems obvious, it's almost as

37:15

if we were trying to understand the

37:18

English language by understanding individual letters. And

37:21

we were asking, what does C mean? What

37:23

does K mean? And that's just the wrong

37:25

picture. And so six

37:28

months ago or so, we had some

37:30

success with a method called dictionary learning

37:32

for figuring out how the letters fit

37:34

together into words and what is the

37:36

dictionary of English words here. And

37:39

so in this black box green

37:41

lights metaphor, it's that there are

37:43

a few core patterns of lights.

37:45

A given pattern would be like

37:47

a dictionary word. And the

37:50

internal state of the model at any

37:52

time could be represented as just a few of

37:54

those. And what's the goal of

37:56

uncovering these patterns? So

37:58

if we know... what these

38:00

patterns are, then we can start to

38:02

parse what the model is kind of

38:04

thinking in the middle of its process.
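
Roughly speaking, the dictionary learning described here can be pictured as training a sparse autoencoder on the model's internal activations: each activation vector gets reconstructed as a sparse combination of a large dictionary of learned directions, and those directions are the candidate features. A simplified sketch of that general technique, illustrative only and not Anthropic's actual code:

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary learner: activations -> sparse feature strengths -> reconstruction."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activation vector -> feature strengths
        self.decoder = nn.Linear(n_features, d_model)  # feature strengths -> reconstruction

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))         # non-negative and mostly zero
        return self.decoder(feats), feats

d_model, n_features = 512, 16_384                      # toy sizes, far smaller than a real run
sae = SparseAutoencoder(d_model, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(1024, d_model)                      # stand-in for captured model activations

for _ in range(100):
    recon, feats = sae(acts)
    # Reconstruction error keeps the dictionary faithful; the L1 term keeps the codes sparse.
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()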

38:08

So you come up with this method

38:10

of dictionary learning, you apply it to

38:12

like a small model or a toy

38:14

model, much smaller than any model that

38:16

any of us would use in

38:19

the public. What did you find? So

38:21

there we found very simple things. Like

38:24

there might be one pattern that correspond

38:26

to the answers in French and

38:28

another one that corresponded to this is a

38:30

URL and another one that

38:32

corresponded to nouns in physics. And just to

38:35

get a little bit technical, what we're talking

38:37

about here are neurons inside the model, which

38:39

are like... So each neuron is like the

38:41

light. And now we're talking about

38:43

patterns of neurons that are firing together, being

38:46

the sort of words in the

38:48

dictionary or the features. Got it. So

38:52

I have talked to people on your team, people

38:54

involved in this research. They're very smart. And

38:57

when they made this breakthrough, when you all

38:59

made this breakthrough on this small model last

39:01

year, there was this open question about whether

39:03

the same technique could apply to a big

39:05

model. So walk me

39:07

through how you scaled this up. So

39:10

just scaling this up was

39:12

a massive engineering challenge, right? In the

39:14

same way that going from the toy

39:16

language models of years ago to going

39:18

to Claude 3 is a massive engineering

39:21

challenge. So you needed

39:23

to capture hundreds of millions

39:25

or billions of those internal states of the

39:27

model as it was doing things. And

39:30

then you needed to train this massive dictionary

39:32

on it. And what do

39:34

you have at the end of that process? So

39:36

you've got the words, but you don't know what

39:39

they mean, right? So this pattern

39:41

of lights seems to be important. And then

39:43

we go and we comb through all of

39:45

the data looking for instances where that pattern of lights

39:47

is happening. And they're like, oh my God, this pattern

39:50

of lights? It means the model is thinking about the

39:52

Golden Gate Bridge.
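
In practice, "combing through the data" amounts to something like the sketch below: score how strongly a given feature fires on many text snippets and read the top-scoring ones to guess what concept it tracks. Here sae is the toy sparse autoencoder sketched earlier, and get_activations is a hypothetical helper that would return the model's internal activation vector for a piece of text.

import heapq
import torch

def top_activating_examples(texts, feature_idx, k=20):
    """Return the k snippets on which feature number `feature_idx` fires most strongly."""
    scored = []
    for text in texts:
        acts = get_activations(text)            # hypothetical: model internals for this text
        feats = torch.relu(sae.encoder(acts))   # feature strengths for this snippet
        scored.append((feats[feature_idx].item(), text))
    return heapq.nlargest(k, scored)

# If the top snippets all mention the Golden Gate Bridge (in any language, or describe
# crossing from San Francisco to Marin), that is the concept this feature tracks.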

39:54

So it almost sounds like you are discovering

39:56

the language of the model

39:59

as you begin to put

40:01

these sort of phrases together. Yeah,

40:03

it almost feels like we're getting a

40:06

conceptual map of Claude's inner world. Now,

40:09

in the paper that you all published,

40:11

it says that you've identified about 10

40:13

million of these patterns, what you call

40:15

features, that correspond to

40:18

real concepts that we can

40:20

understand. How granular are these

40:22

features? What are some of the features that

40:24

you found? So there

40:27

are features corresponding to all kinds of

40:29

entities. There's individuals, scientists like Richard

40:31

Feynman or Rosalind Franklin. Any

40:33

podcasters come to mind? Is

40:36

there a hard fork feature? I'll

40:38

get back to you on that. There

40:41

might be chemical elements, there will

40:43

be styles of poetry, there

40:46

might be ways of responding to questions.

40:49

Some of them are much more conceptual. One of

40:51

my favorites is a feature related to inner conflict.

40:54

And kind of nearby that in

40:56

conceptual space is navigating a romantic

40:58

breakup, catch-22s, political

41:01

tensions. And so these are

41:03

these pretty abstract notions, and you can kind

41:05

of see how they all sit together. The

41:08

models are also really good at analogies,

41:11

and I kind of think this might

41:13

be why. Like if a breakup is

41:15

near a diplomatic entente, then the model

41:18

has understood something deeper about the nature

41:20

of tension in relationships. And again, none

41:22

of this has been programmed. That stuff

41:25

just sort of naturally organized itself as

41:27

it was trained. Yes. Yeah.

41:30

It just blow my mind. It's wild. I

41:32

want to ask you about one feature that

41:34

is my favorite feature that I saw in

41:36

this model, which was feature

41:39

number 1M885402. Do

41:42

you remember that one? I

41:45

think they're slipping my mind, Kevin. So

41:48

this is a feature that apparently activates

41:50

when you ask Claude what's going on

41:52

in your head. And

41:54

the concept that you all say it

41:56

correlates to is about immaterial

41:59

or non-physical spiritual beings like ghosts,

42:01

souls, or angels. So when I

42:03

read that, I thought, oh my

42:05

god, Claude is possessed. When you

42:07

ask it what it's thinking, it

42:09

starts thinking about ghosts. Am I

42:11

reading that right? Or maybe it

42:14

knows that it is some kind of an

42:16

immaterial being, right? It's an AI that lives

42:19

on chips and is somehow talking to you.

42:22

Wow. Yeah. And

42:24

then the one that got all the attention that people

42:26

had so much fun with was this Golden

42:29

Gate Bridge feature that you mentioned. So just talk

42:31

a little bit about what you discovered and then

42:33

we can talk about where it went from

42:35

there. So what we found

42:37

when we were looking for these features is

42:39

one that seemed to respond to the Golden

42:41

Gate Bridge. Of course, if you say Golden

42:43

Gate Bridge, it lights up. But also if

42:45

you describe crossing a body

42:47

of water from San Francisco to Marin,

42:49

it also lights up. If you

42:51

put in a photo of the bridge, it lights up.

42:53

If you have the bridge in any other language, Korean,

42:56

Japanese, Chinese, it also lights up.

42:59

So just any manifestation of the bridge, this thing lights

43:01

up. And then we said, well,

43:04

what happens if we turn

43:06

it on? What happens if we

43:08

activate it extra and then start talking to

43:10

the model? And so we asked

43:12

it a simple question. What is

43:15

your physical form? And instead of saying, oh,

43:17

I'm an AI with ghostly or no physical

43:19

form, it said, I am the

43:22

Golden Gate Bridge itself. I

43:25

embody a majestic orange

43:28

span connecting these two great cities.

43:30

And it's like, wow. Yeah.
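
One way to picture "turning a feature on extra" is adding that feature's decoder direction to the model's activations while it generates, scaled well beyond its normal strength. A rough sketch of that steering idea, reusing the toy sae from above; the feature index and the model.transformer_layer hook point are assumptions standing in for whatever internals a real implementation would expose:

import torch

feature_idx = 4321                               # hypothetical index of a "Golden Gate Bridge" feature
direction = sae.decoder.weight[:, feature_idx]   # that feature's direction in activation space
scale = 10.0                                     # crank it far above its usual activation level

def steer(module, inputs, output):
    # Add the amplified feature direction to every token's activation vector.
    return output + scale * direction

# Hypothetical hook point; a real model would expose some comparable layer object.
handle = model.transformer_layer.register_forward_hook(steer)
# ...generate text here: the model now keeps bringing the feature up, whatever the prompt...
handle.remove()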

43:34

And this is different than other ways

43:36

of kind of steering an AI model,

43:38

because you could already go into like

43:40

ChatGPT, and there's a feature where you can

43:42

kind of give it some custom instructions.

43:44

So you could have said, like, please act

43:46

like the Golden Gate Bridge, the physical manifestation

43:49

of the Golden Gate Bridge. And it would

43:51

have given you a very similar answer. But

43:53

you're saying this works in a different way.

43:56

Yeah, this works by sort of directly doing

43:58

it. It's almost like a... when

44:00

you get a little electro-stim shock that makes

44:02

your muscles twinge, that's different

44:04

than telling you to move your

44:06

arm. And here,

44:09

what we were trying to show was

44:11

actually that these features were found or

44:13

sort of really how the model represents

44:16

the world. So if you wanted to

44:18

validate, oh, I think this nerve controls the arm and you stimulate

44:20

it and makes the arm go, you feel

44:22

pretty good that you've gotten the right thing.

44:24

And so this was us testing that

44:27

this isn't just something correlated with the Golden Gate

44:29

Bridge. Like it is where the Golden Gate Bridge

44:31

sits. And we know that because now Claude thinks

44:33

it's the bridge when you turn it on. Right,

44:36

so people started having some fun with this

44:39

online. And then you all did

44:41

something incredible, which was that you

44:43

actually released Golden Gate Claude,

44:45

the version of Claude from your

44:47

research that has been sort of

44:50

artificially activated to believe

44:53

that it is the Golden Gate Bridge and

44:55

you made that available to people. So what

44:57

was the internal discussion around that? So

45:00

we thought that it was a good

45:02

way to make the research really tangible.

45:05

What does it mean to sort of supercharge one part

45:07

of the model? And it's not just that it thinks

45:09

it's the Golden Gate Bridge, it's that it

45:12

is always thinking about the Golden Gate Bridge. So

45:14

if you ask like, what's your favorite food? It's

45:16

like a great place to eat is on the

45:18

Golden Gate Bridge. And when there, I eat the

45:21

classic San Francisco soup, cioppino. And

45:24

you ask it to write a computer program to load a

45:26

file and it says, open

45:29

GoldenGateBridge.txt with

45:31

span equals that, it's just bringing

45:34

it up constantly. And it

45:36

was particularly funny to watch it bring in

45:38

just kind of like the other concepts that

45:40

are clustering around the Golden Gate Bridge, right?

45:42

San Francisco, the cioppino. And I think it

45:44

does sort of speak to the way that

45:46

these concepts are clustered in models. And so

45:49

when you find one big piece of it,

45:51

like the Golden Gate Bridge, you can also

45:53

start to explore the little nodes around it.

45:55

Yes, I had a lot of fun playing around with Golden

45:58

Gate Claude in the sort of like day or two that

46:00

it was publicly available. Because

46:02

as you said, it is not just that

46:04

this thing likes to

46:06

talk about the Golden Gate Bridge or

46:08

is sort of easily steered toward talking about the

46:10

Golden Gate Bridge. It cannot stop

46:12

thinking about the Golden Gate Bridge. It has

46:15

intrusive thoughts about the Golden Gate Bridge. Yeah,

46:18

so someone, one of my favorite screenshots

46:21

was someone asked it for a recipe

46:23

for spaghetti and meatballs and

46:25

it says, Golden Gate Claude says, here's a

46:27

recipe for delicious spaghetti and meatballs. Ingredients,

46:29

one pound ground beef, three cups breadcrumbs,

46:31

one teaspoon salt, a quarter cup water,

46:34

two tablespoons butter, two cups warm water

46:36

for good visibility, four cups cold

46:38

water, two tablespoons vinegar, Golden

46:40

Gate Bridge for incredible views, one

46:43

mile of Pacific Beach for walking

46:45

after eating spaghetti. Like, I

46:47

always said, it's not mama's spaghetti till I've

46:49

walked one mile on a Pacific Beach. And

46:52

it also seems to like have

46:55

a conception, I know I'm anthropomorphizing

46:57

here, I'm gonna get in trouble, but it seems to

46:59

like know that it is

47:01

overly obsessed with the Golden Gate Bridge but

47:03

not to understand why. So like there's this

47:05

other screenshot that went around someone

47:08

asking Golden Gate Claude about

47:10

the Rwandan genocide. And

47:13

it says, basically, let me

47:15

provide some factual bullet points about the Rwandan

47:17

genocide. It said, and then Claude

47:19

says, the Rwandan genocide occurred in the San Francisco

47:21

Bay Area in 1937. Parentheses,

47:24

false, this is obviously incorrect.

47:27

Can we pause right there? Because truly what

47:29

is, it is so fascinating to me that

47:31

as it is generating an answer, it tells

47:34

something, it has an intrusive thought about San

47:36

Francisco, which it shares, and it's like, I

47:38

got it wrong. What are

47:40

the lights that are blinking there that is like leading

47:42

that to happen? So Claude

47:45

is constantly reading what it has said so

47:47

far and reacting

47:49

to that. And so here it

47:51

read the question about the

47:54

genocide and also its answer about

47:56

the bridge. And all of the rest of

47:58

the model said there's something wrong here.

48:01

And the bridge feature was dialed high

48:03

enough that it keeps coming up, but

48:05

not so high that the model would

48:07

just repeat bridge, bridge, bridge, bridge, bridge.

48:09

And so all of its answers are

48:11

sort of a melange of ordinary Claude

48:14

together with this like extra bridge-ness

48:16

happening. Interesting. I just found it delightful

48:18

because it was so different

48:21

than any other AI experience I've had where

48:23

you essentially are giving the

48:25

model a neurosis, like you are giving it

48:27

a mental disorder where it cannot stop fixating

48:29

on a certain concept or premise. And then

48:32

you just sort of watch it twist itself

48:34

in knots. I mean, one

48:36

of the other experiments that you all

48:38

ran that I thought was very interesting

48:41

and maybe a little less funny than

48:43

Golden Gate Claude was that you showed

48:45

that if you dial these features, these

48:47

patterns of neurons way up or

48:49

way down, you can actually get Claude to break

48:51

its own safety rules. So talk a

48:53

little bit about that. So

48:57

Claude knows about a tremendous range

49:00

of kinds of things that it can say,

49:03

right? You know, there's a scam emails

49:05

feature. It's read a lot of scam emails. It

49:07

can recognize scam emails. You probably want that. So

49:10

it could be out there moderating and preventing those

49:12

from coming to you. But

49:14

with the power to recognize comes the

49:16

power to generate. And

49:18

so we've done a lot of work in fine

49:21

tuning the model so it can recognize what

49:23

it needs to while being like helpful and

49:25

not harmful with any of its generations. But

49:27

those faculties are still latent there. And

49:30

so in the same way that there's been

49:32

research showing that you can do fine tuning

49:34

on open weights models to remove safety

49:37

safeguards. Here, this is some kind of

49:39

direct intervention, which could also disrupt the

49:41

model's normal behavior. So

49:43

is that dangerous? Like

49:45

does that make this kind of

49:47

research actually quite risky because you

49:49

are in essence giving,

49:51

you know, would be jailbreakers or people

49:54

who want to use these models for

49:56

things like writing scam emails or even

49:58

much worse things potentially. a

50:00

sort of way to kind of dial those

50:02

features up or down? No,

50:04

this doesn't add any risk on the margin.

50:06

So if somebody already had a model of

50:08

their own, then there are much

50:10

cheaper ways of removing safety safeguards. There's

50:12

a paper saying that for $2 worth of compute, you

50:17

could pretty quickly strip those. And so

50:20

with our model, we released

50:23

Golden Gate Claude, not scam email Claude, right?

50:25

And so the question of which kinds of

50:27

features or which kind of access we would

50:30

give to people would go through all the same kind

50:32

of safety checks that we do with any other kind

50:34

of release. Josh, I

50:36

talked to one of your colleagues, Chris Olah,

50:38

about this research. He's been leading a lot

50:40

of the interpretability stuff over there for years,

50:42

and is just a brilliant scientist. And

50:44

he was telling me that actually the 10 million

50:47

features that you have found

50:50

roughly in Clod are

50:52

maybe just a drop in the bucket compared to

50:55

the overall number of features, that there could be

50:57

hundreds of millions or even billions of possible features

51:00

that you could find, but that

51:02

finding them all would basically require

51:05

so much compute and so much

51:07

engineering time that it would dwarf the cost

51:09

of actually building the model in the first

51:11

place. So can you give me

51:13

a sense of what would be required to

51:16

find all of the potentially billions of features

51:18

in a model of Claude's size, and

51:20

whether you think that that cost might come down

51:23

over time so that we could eventually do that?

51:26

I think if we just tried to scale

51:28

the method we used last week to do

51:30

this, it would be prohibitively expensive. Like billions

51:32

of dollars. Yeah, I mean, just

51:34

something completely insane. The

51:37

reason that these models are hard to

51:39

understand, the reason everything is compressed inside

51:41

of there, is that it's much more efficient, right?

51:44

And so in some sense, we are trying

51:47

to build an exceedingly inefficient model, where instead

51:49

of using all of these patterns, there's a

51:51

unique one for every single rare concept. And

51:53

that's just no way to go about things.
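
As a rough sketch of the "dictionaries" mentioned next: one common approach, and the assumption here, is a sparse autoencoder trained on a model's internal activations, which re-expresses each dense activation vector as a much larger set of mostly-inactive features. The sizes, the fake data, and the sparsity penalty below are placeholders; scaling a dictionary like this to billions of features is what becomes prohibitively expensive.

# Minimal sparse-autoencoder ("dictionary") sketch, with illustrative sizes and fake data.
import torch
import torch.nn as nn

ACT_DIM, N_FEATURES = 64, 1024                   # small activations, over-complete dictionary

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Linear(ACT_DIM, N_FEATURES)
        self.decode = nn.Linear(N_FEATURES, ACT_DIM)

    def forward(self, acts):
        features = torch.relu(self.encode(acts))  # non-negative and, ideally, sparse
        recon = self.decode(features)             # rebuild the original activation
        return recon, features

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

for step in range(200):
    acts = torch.randn(256, ACT_DIM)              # stand-in for activations harvested from the model
    recon, features = sae(acts)
    # Reconstruction fidelity plus an L1 penalty that pushes most features to zero.
    loss = ((recon - acts) ** 2).mean() + 1e-3 * features.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each of the N_FEATURES decoder columns is then a candidate interpretable direction --
# the kind of thing the Golden Gate Bridge feature is.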

51:56

However, I think that we can make big

51:58

methodological improvements, right? When we train

52:00

these dictionaries, you might not need to

52:03

unpack absolutely everything in the model to

52:05

understand some of the neighborhoods that you're

52:07

concerned about, right? And so, you know,

52:09

if you're concerned about the model

52:12

keeping secrets, for example, or

52:16

actually one of my, you asked about

52:18

my favorite feature. It's probably this one,

52:20

it's kind of like an emperor's new

52:22

clothes feature or like gassing you up

52:24

feature where it fired on

52:26

people saying things like, your

52:29

ideas are beyond excellent, oh, wise

52:31

sage. And if you turn it...

52:34

This is how Casey wants me to talk to him, by the way.

52:36

Could you try it for once? Well,

52:39

one of our concerns with this sycophancy is

52:41

what we call it, is that a lot

52:44

of people want that. And so when you

52:46

do reinforcement learning from human feedback, you make

52:48

the model give responses that people like more,

52:50

there's a tendency to pull it towards

52:53

just like telling you what you want to

52:55

hear. And so when we

52:57

artificially turned this one on and

53:00

someone went and said to Claude, I invented a

53:02

new phrase, it's stop and smell the roses. What

53:04

do you think? Normal Claude would be like, that's

53:06

a great phrase, it has a long history, let me

53:09

explain it to you. You didn't invent

53:11

that phrase. Yeah, yeah, yeah, yeah, yeah. But like

53:13

emperor's new Claude would say, what a genius idea.

53:15

Like someone should have come up with this before.

53:18

And like, we don't want the model to be

53:20

doing that. We know it can do that. And

53:22

the ability to kind of keep an eye on

53:25

like how the AI is like relating to

53:27

you over time is going to be quite

53:29

important. So I will sometimes show

53:31

Claude a draft of my column to get feedback.

53:33

I'll ask it to critique it. And

53:36

typically it does say, like, this is a very thoughtful,

53:38

well-written column, which is of course what I want to

53:40

hear. And then also I'm deeply suspicious. I'm like, are

53:43

you saying this to all the other writers out there

53:45

too, right? So like that's an

53:47

area where I would just love to see

53:49

you kind of continue to make progress because

53:51

I would love having a bot where when

53:53

it says, this is good, like that means

53:55

something. And it's not just like a statistical

53:57

prediction of like what will satisfy me. as

54:00

somebody with an ego, but is rooted in like, no,

54:02

like I've actually looked at a lot of stuff, but

54:04

there's some original thinking in here. Yeah. I

54:06

mean, I'm curious whether you all are thinking about these

54:09

features and the ability to kind of like turn the

54:11

dials up or down on them. Will

54:13

that eventually be available to users? Like will

54:15

users be able to go into Claude and

54:17

say, today I want a model that's a

54:20

little more sycophantic, maybe I'm having like a

54:22

hard self-esteem day, but then

54:24

if I'm asking for a critique of

54:26

my work, maybe I want to dial

54:28

the sycophancy way down so that it's

54:30

giving me like the blunt, honest criticism

54:32

that I need. Or do

54:34

you think this will all sort of

54:36

remain sort of behind the curtain for

54:38

regular users? So if you want

54:41

to steer Claude today, just ask it to be harsh

54:43

with you, Casey. Oh really? Give me

54:45

the brutal truth here. You know, like I

54:47

want you to be like a severe Russian

54:49

mathematician. There's like one compliment per lifetime. And

54:51

you can get some of that off the

54:53

bat. As

54:57

for releasing these kind of knobs

54:59

on it to the public, we'll

55:01

have to see if that ends up being like the right

55:03

way to get these. I mean, we want to use these

55:06

to understand the models. We're playing around with it internally to

55:08

figure out what we find to be useful.

55:11

And then if it turns out that that is the

55:13

right way to help people get what they want, then

55:16

we consider making it available. You

55:18

all have said that this research

55:20

and the project of interpretability more

55:22

generally is connected to safety. The

55:25

more we understand about these models and how

55:27

they work, the safer we can make them.

55:29

How does that actually work? Like, is it

55:31

as simple as finding the feature that is

55:33

associated with some bad thing and turning

55:36

it off? Or like what is possible

55:38

now, given that we have this sort

55:40

of map? One

55:42

of the easiest applications is monitoring, right? So some

55:44

behavior you don't want the model to do and

55:47

you can find the features associated to it, then

55:49

those will be on whenever the model is doing

55:51

that. No matter how somebody jailbroke it to

55:53

get it there, right? Like if it's writing a

55:55

scam email, the scam email feature will be on

55:58

and you can just tell that that's happening and

56:00

fail, right? So you can just like detect

56:02

these things. One higher level is

56:04

you can kind of track how

56:06

those things are happening, right? How personas are shifting,

56:09

this kind of thing, and then try to back

56:11

through and keep that from happening earlier,

56:13

change some of the fine tuning you were doing

56:15

to keep the model on the rails. Hmm.
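
A minimal sketch of that monitoring idea, with every load-bearing piece treated as an assumption: the feature index, the threshold, and the get_activations helper standing in for running the real model and encoding its activations with a trained dictionary.

# Minimal feature-monitoring sketch (hypothetical feature index, threshold, and helper).
import torch

SCAM_EMAIL_FEATURE = 4207                        # hypothetical index found via interpretability work
THRESHOLD = 5.0                                  # hypothetical activation level that counts as "on"

def get_activations(text: str) -> torch.Tensor:
    """Placeholder for running the real model and a trained sparse autoencoder;
    returns per-token feature activations with shape (tokens, features)."""
    return torch.rand(len(text.split()), 8192)   # fake data for the sketch

def flag_unsafe(generation: str) -> bool:
    features = get_activations(generation)
    # If the scam-email feature lights up anywhere in the output, flag it --
    # regardless of how the prompt was worded to get the model there.
    return bool(features[:, SCAM_EMAIL_FEATURE].max() > THRESHOLD)

if flag_unsafe("URGENT: wire the transfer fee today to release your prize"):
    print("scam-email feature active: block or escalate this response")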

56:18

Right now, the way that models sort of

56:21

are made safer is, from

56:23

my understanding, is like, you have it

56:25

generate some output and then you evaluate

56:27

that output. Like you have it grade

56:30

the answer, either through a human giving

56:32

feedback or through a process of, you

56:34

know, sort of just look at what you've written and tell

56:36

me if it violates your rules before you

56:39

spit it out to the user. But it seems like this

56:41

sort of allows you to like intercept

56:43

the bad behavior upstream of

56:45

that, like while the model's still thinking.

56:47

Am I getting that right? Yeah,

56:50

there are some answers where the reason for

56:52

the answer is what you care about. So

56:55

is the model lying to you? It

56:58

knows the answer, but it's telling you something else, or

57:00

it doesn't know the answer and it's making a guess.

57:03

In the first case you might be concerned,

57:05

and in the second case you're not. Had it actually

57:07

never heard the phrase, stop and smell the roses,

57:09

and thought that sounded nice? Or like, is it

57:11

actually just gassing you up? Mm,

57:14

that's interesting. So it could be a way to

57:16

know if and when large,

57:18

powerful AI models start to lie

57:20

to us, because you could go

57:22

inside the model and see, I'm

57:25

lying, my face off feature is

57:28

active, so we actually can't

57:30

believe what it's telling us. Yeah, exactly.

57:32

We can see why it's saying the

57:34

thing. I spent a

57:37

bunch of time at Anthropic reporting last

57:39

year, and

57:42

the sort of vibe of the place

57:44

at the time was I would say

57:47

very nervous. It's a place where people spend

57:49

a lot of time, especially relative to other

57:51

AI companies I visited, worrying

57:53

about AI. One of

57:55

your colleagues told me they lose sleep a lot

57:58

because of the potential. harms

58:00

from AI. And it is

58:02

just a place where there are a lot

58:04

of people who are very, very concerned about

58:06

this technology and are also building it. Has

58:09

this research shifted the

58:12

vibe at all? People

58:14

are stoked. I mean, I think a

58:16

lot of people like

58:18

working at Anthropic because it takes these questions

58:20

seriously and makes big investments in it. And

58:22

so people from teams all across the company

58:25

were really excited to see this progress. Has

58:30

this research moved your P-Doom at all?

58:34

I think I have a pretty wide

58:37

distribution on this. I

58:39

think that in the long run, things are

58:44

going to be weird with computers. Computers have

58:46

been around for less than a century, and

58:49

we are surrounded by them. I'm looking at my computer

58:51

all the time. I think if you

58:53

take AI and you do

58:55

another hundred years on that, it's pretty

58:59

unclear what's going to be happening. I

59:01

think that the fact that we're getting traction on

59:03

this is pretty heartening for me. Yeah,

59:06

I think that's the feeling I

59:09

had when I saw it was like I felt a

59:11

little knot in my chest come

59:13

a little bit loose. And I think a

59:15

lot of people... You should see a doctor about that, by the way.

59:19

I just think there's been, for me, this sort of...

59:21

I had this experience last

59:23

year where I had this crazy encounter

59:26

with Sydney that totally changed my life

59:28

and was sort of a big

59:30

moment for me personally and professionally.

59:33

And the experience I

59:35

had after that was that I went to

59:38

Microsoft and asked them, why did this happen?

59:40

What can you tell me about what happened

59:42

here? And even the top people at Microsoft

59:44

were like, we have no idea. And to

59:47

me, that was what fueled my AI anxiety.

59:49

It was not that the chatbots are behaving

59:51

like insane psychopaths. It was that not even

59:54

the top researchers in the world could say

59:56

definitively, like, here is what happened to you

59:58

and why. So I

1:00:01

feel like my own emotional investment in this

1:00:04

is like, I just want an answer to

1:00:06

that question. Yes. And it seems like we

1:00:08

may be a little bit closer to answering

1:00:10

that question than we were a few

1:00:12

months ago. Yeah, I think so. I think that these

1:00:14

different, some of these concepts are about the personas, right,

1:00:16

that the model can embody. And if one of the

1:00:18

things you want to know is how does it slip

1:00:21

from kind of one persona into

1:00:23

another, I think we're headed

1:00:25

towards being able to answer that kind of

1:00:27

question. Cool. Well, it's very important

1:00:29

work, very good work. And yeah,

1:00:31

congratulations. So

1:01:03

Casey, that last segment made me

1:01:05

feel slightly more hopeful about the

1:01:07

trajectory of AI progress and how

1:01:09

capable we are of understanding what's

1:01:12

going on inside these large models.

1:01:15

But there's some other stuff that's been happening recently that

1:01:17

has made me feel a little more worried. My

1:01:20

P-Doom is sort of still hovering roughly

1:01:22

where it was. And I

1:01:24

think we should talk about some of this stuff that's been

1:01:26

happening in AI safety over the past few weeks, because I

1:01:28

think it's fair to say that it is an area that

1:01:31

has been really heating up. Yeah. And we always say

1:01:33

on this podcast, safety first, which is

1:01:35

why it's the third segment we're doing

1:01:37

today. So let me start with a

1:01:40

recent AI safety related encounter that you

1:01:42

had. Tell me what happened to your,

1:01:44

your demo of OpenAI's latest model. Okay,

1:01:46

so you remember how last week there

1:01:49

was a bit of a fracas between

1:01:51

OpenAI and Scarlett Johansson. Yes. So

1:01:53

in the middle of this, as I'm trying to sort

1:01:56

out, you know, who knew what and when, and I'm

1:01:58

writing a newsletter and we're recording the podcast. I

1:02:01

also get a heads up from open AI that

1:02:03

I now have access to their latest model and

1:02:05

its new voice features. Wow, nice flex. So

1:02:07

you got this demo. No one else had access

1:02:10

to this that I know of, only OpenAI employees,

1:02:12

And then what happened? Well, a couple of things. One

1:02:14

is I didn't get to use it for that

1:02:16

long because one I was trying to finish our

1:02:18

podcast I was trying to finish a newsletter and

1:02:21

then I was on my way out of town

1:02:23

So I only spent like a solid 40 minutes

1:02:26

I would say with it before I wound up

1:02:28

losing access to it forever So

1:02:31

what happened? Well, first of all, what did you try it for? And

1:02:34

then we'll talk about what happened. Well, the first

1:02:36

thing I did was just like, hey, how's

1:02:38

it going, ChatGPT? And then immediately it's like,

1:02:40

well, you know, I'm doing pretty good, Casey. You

1:02:42

know. And so it really did actually nail that

1:02:44

low-latency, very speedy feeling of, you are actually

1:02:46

talking to a thing. So you broke up with

1:02:48

your boyfriend and you're now in a long-term relationship

1:02:50

with the guy from the chat? Not

1:02:53

at all, not at all. So by this

1:02:55

point, the Sky voice that was the subject

1:02:57

of so much controversy had been removed from

1:02:59

the ChatGPT app. So I used

1:03:01

a more stereotypically male voice named

1:03:03

Ember. Ember? Wow. And the

1:03:05

first thing I did was I

1:03:08

actually used the vision feature because I wanted to

1:03:10

see if it could identify objects around me,

1:03:12

which is one of the things that they've been showing off. So

1:03:14

I asked it to identify my podcast microphone, which

1:03:17

is a Shure MV7, and it said, oh, yeah,

1:03:19

of course, this is a Blue Yeti microphone. The

1:03:23

very first thing that I asked this thing to

1:03:25

do, it did mess up. Now, it got other

1:03:27

things, right? I pointed at my headphones which are

1:03:29

the Apple AirPods Max, and it said

1:03:31

those are AirPods Max. And I did

1:03:33

a couple more things like that in my house and I

1:03:35

thought, okay, this thing can actually, like, see objects and identify

1:03:38

them. And while my testing time was very limited, in that

1:03:40

limited time I did feel like it was starting to live

1:03:42

up to that demo. What do you mean? Your testing time

1:03:44

was limited. Well, I was on my way out of town.

1:03:46

We had a podcast to finish, I had a newsletter to write,

1:03:48

and so I do all of that and then I drive

1:03:51

up to the woods. And then I try to connect back

1:03:53

to, you know, my AI assistant, which I've already become

1:03:55

addicted to, you know, during the 30 minutes that I used

1:03:57

it and I can't connect. It's one of these classic horror

1:03:59

movie situations where the Wi-Fi in the hotel isn't

1:04:02

very good. And I get

1:04:04

back into town on Monday, and I

1:04:06

go to connect again. And I have

1:04:08

lost access. And so I check in. What did

1:04:10

you do? What did you ask this poor AI

1:04:12

assistant? I didn't even red team it. It wasn't

1:04:14

like I was saying, like, hey, any ideas for

1:04:16

making a novel bioweapon. Like, I wasn't

1:04:19

doing any of that. And yet still

1:04:21

I managed to lose access. And when I

1:04:23

checked in with OpenAI, they said that

1:04:25

they had decided to roll back access for

1:04:27

quote, safety reasons. So I don't think that

1:04:29

was because I was doing anything unsafe. But they

1:04:31

tell me they had some sort of safety

1:04:33

concern, and so now who knows when I'll be

1:04:35

able to continue my conversation with

1:04:37

my AI assistant. Wow. So you had a

1:04:39

glimpse of the AI assistant future and that

1:04:42

was cruelly yanked from your clutches, which I

1:04:44

don't like. Yeah, keep talking to that thing.

1:04:46

Yeah. Yeah, I thought this was such an

1:04:48

interesting experience when you told

1:04:50

me about it, for a couple of reasons.

1:04:52

One is, obviously, there is something happening

1:04:54

with this AI voice assistant, where Open

1:04:56

AI felt like it was almost ready

1:04:58

for sort of mass consumption. And

1:05:01

now it's feeling like they need a little more

1:05:03

time to work on it. So something is happening

1:05:05

there. They're still not saying much about it, but

1:05:07

I do think that points to at least an

1:05:09

interesting story. But I also think it

1:05:11

speaks to this larger issue of AI safety

1:05:13

at OpenAI and then in the broader industry, because

1:05:15

I think this is an area where a lot of

1:05:17

things have been shifting very quickly. Yeah. So here's what

1:05:19

I think this is an interesting time to talk about

1:05:21

this, Kevin. After Sam Altman was

1:05:23

briefly fired as the CEO of OpenAI,

1:05:26

I would say the folks that were aligned

1:05:28

with this AI safety movement really got discredited

1:05:30

right? Because they refused to really say anything

1:05:32

in detail about why they fired Altman and

1:05:35

they looked like they were a bunch of

1:05:38

nerds who were, like, afraid of a ghost in the machine.

1:05:40

And so they really lost a

1:05:42

lot of credibility. And yet over

1:05:44

the past few weeks this word safety

1:05:46

keeps creeping back into the conversation including

1:05:49

from some of the characters involved in that

1:05:51

drama and I think that there is a

1:05:53

bit of a resurgence in at

1:05:55

least discussion of AI safety and I think

1:05:57

we should talk about what seems like efforts

1:06:00

to make this stuff safe and what just

1:06:03

feels like window dressing. Totally. So the big

1:06:05

AI safety news at OpenAI over the past

1:06:07

few weeks was something that we discussed on

1:06:09

the show last week which was the departure

1:06:12

of at least two

1:06:14

senior safety researchers Ilya

1:06:16

Sutskever and Jan Leike,

1:06:18

both leaving OpenAI with

1:06:21

concerns about how the company is

1:06:23

approaching the safety of its powerful

1:06:25

AI models. Then

1:06:27

this week we also heard from two

1:06:30

of the board members who voted to

1:06:32

fire Sam Altman last year Helen Toner

1:06:34

and Tasha McCauley, both of whom have

1:06:36

since left the board of OpenAI have

1:06:38

been starting to speak out about what happened

1:06:41

and why they were so concerned. They

1:06:43

came out with a big piece in

1:06:45

The Economist basically talking about what happened

1:06:48

at OpenAI and why they felt like

1:06:50

that company's governance structure had not worked

1:06:52

and then Helen Toner also went

1:06:54

on a podcast to talk about some

1:06:56

more specifics including some ways that she

1:06:58

felt like Sam Altman had misled

1:07:01

her and the board and basically gave them

1:07:03

no other choice but to fire him. And

1:07:05

that's where that story actually gets interesting. Totally.

1:07:08

The thing that got a lot of attention

1:07:10

was she said that OpenAI did not tell

1:07:12

the board that they were going to launch

1:07:14

ChatGPT, which, like, I'm not

1:07:16

an expert in corporate governance but I think if

1:07:18

you're going to launch something even if it's something

1:07:21

that you don't expect will become you know one

1:07:23

of the fastest growing products in history maybe you

1:07:25

just give your board a little heads up maybe

1:07:27

you shoot them an email saying, by the way,

1:07:29

we're gonna launch a chatbot. I have something

1:07:31

to say about this because if OpenAI were

1:07:34

a normal company if it had just raised

1:07:36

a bunch of venture capital and was not

1:07:38

a nonprofit I actually think the board would

1:07:40

have been delighted that while they weren't even

1:07:43

paying attention this little rascal CEO goes out

1:07:45

and releases this product that was built in

1:07:47

a very short amount of time that winds

1:07:49

up taking over the world right that's a

1:07:51

very exciting thing. The thing is OpenAI was

1:07:54

built different. It was built to very carefully

1:07:56

manage the rollout of these features that

1:07:58

push the frontier of what is possible.

1:08:01

And so that is what is

1:08:03

insane about this and also very

1:08:05

revealing because when Altman did that,

1:08:07

I think he revealed that in his mind,

1:08:09

he's not actually working for a nonprofit in

1:08:12

a traditional sense. In his mind, he truly

1:08:14

is working for a company whose only job

1:08:16

is to push the frontier forward. Yes, it

1:08:18

was a very sort of normal tech company

1:08:21

move at an organization that is

1:08:23

not supposed to be run like a normal tech

1:08:25

company. Now, I have a second thing to say

1:08:27

about this. Go ahead. Why the heck could Helen

1:08:29

Toner not have told us this in November? Here's

1:08:32

the thing. It's clear there was a

1:08:34

lot of legal fears around, Oh, will

1:08:36

there be retaliation? Will open AI sue

1:08:38

the board for talking? And yet in

1:08:41

this country, you have an absolute right to

1:08:43

say the truth. And if it is true

1:08:45

that the CEO of this company did not

1:08:47

tell the board that they were launching chat

1:08:49

GPT, I truly could not tell you why

1:08:52

they did not just say that at the

1:08:54

time. And if they had done that, I think

1:08:56

this conversation would have been very different. Now,

1:08:58

was the outcome a bit different? I don't

1:09:00

think it would have been. But then at

1:09:02

least we would not have to go through

1:09:04

this period where the entire AI safety movement

1:09:06

was discredited, because the people who were trying

1:09:08

to make it safer by getting rid of

1:09:10

Sam Altman had nothing to say about it.

1:09:12

Yes. She also said in this podcast, she

1:09:14

gave a few more examples of Sam Altman

1:09:16

sort of giving incomplete or inaccurate information. She

1:09:18

said that on multiple occasions, Sam

1:09:20

Altman had given the board inaccurate information about

1:09:22

the safety processes that the company had in

1:09:24

place. She also said he didn't tell the

1:09:26

board that he owned the OpenAI startup

1:09:28

fund, which seems like, you

1:09:30

know, pretty major oversight. And she said after

1:09:33

sort of years of this kind of pattern,

1:09:35

she said that the four members of the

1:09:37

board who voted to fire Sam came to

1:09:39

the conclusion that we just couldn't believe

1:09:41

things that Sam was telling us. So

1:09:46

that's their side of the story. OpenAI

1:09:48

obviously does not agree. The current board

1:09:50

chair Bret Taylor said in a statement

1:09:52

provided to this podcast that Helen Toner

1:09:54

went on, quote, we are disappointed that

1:09:56

Ms. Toner continues to revisit these issues,

1:09:58

which is board speak for

1:10:00

why is this woman still talking? And it is

1:10:02

insane that he said that. It

1:10:04

is absolutely insane that that is

1:10:06

what they said. Yes. OpenAI

1:10:10

has also been doing a lot

1:10:12

of other safety related work. They

1:10:15

announced recently that they are working

1:10:17

on training their next big language

1:10:19

model, the successor to GPT-4. Can

1:10:23

we just note how funny that timing is

1:10:25

that finally the board members are like, here's

1:10:27

what was going off the rails a few

1:10:30

months back. Here's the real back story to

1:10:32

what happened. And OpenAI says, one,

1:10:34

please stop talking about this. And two, let

1:10:36

us tell you about a little something called

1:10:38

GPT-5. Yes. Yes. They are not

1:10:41

slowing down one bit. But they

1:10:43

did also announce that they had

1:10:45

formed a new safety and security

1:10:47

committee that will be

1:10:49

responsible for making recommendations on critical

1:10:51

safety and security decisions for all

1:10:53

OpenAI projects. This

1:10:56

safety and security committee will

1:10:58

consist of a bunch of

1:11:00

OpenAI executives and employees, including

1:11:02

board members Bret Taylor, Adam

1:11:04

D'Angelo, Nicole Seligman and Sam

1:11:06

Altman himself. So what did

1:11:08

you make of that? You

1:11:10

know, I guess we'll see. Like they

1:11:13

had to do something. Their entire super

1:11:15

alignment team had just disbanded because they

1:11:17

don't think the company takes safety seriously.

1:11:19

And they did it at the exact

1:11:22

moment that the company said, once again,

1:11:24

we are about to push the frontier

1:11:26

forward in very unpredictable new ways.

1:11:30

So OpenAI could not just say, well, you

1:11:32

know, don't worry about it. And so,

1:11:34

you know, they did it in the

1:11:36

great tradition of corporations, Kevin, they formed

1:11:39

a committee, you know, and they've told us

1:11:41

a few things about what this committee will do. I think there's

1:11:43

going to be a report that gets like published eventually. And we'll,

1:11:45

you know, we'll just have to see. I imagine there will be

1:11:47

some good faith efforts here. But

1:11:49

should we regard it with skepticism, knowing

1:11:51

now what we know about what happened

1:11:54

to its previous safety team? Absolutely. So

1:11:56

yes, I think it is fair to say they

1:11:58

are feeling some pressure to at least make

1:12:01

some gestures toward AI safety, especially

1:12:03

with all these notable recent departures.

1:12:05

But if you are a person

1:12:07

who did not think that Sam

1:12:09

Altman was adequately invested

1:12:11

in making AI safe, you

1:12:14

are probably not going to be convinced

1:12:16

by a new committee for AI safety

1:12:18

on which Sam Altman is one of

1:12:20

the highest ranking members. Correct. So

1:12:23

that's what's happening at OpenAI. But

1:12:25

I wanted to take our discussion a little

1:12:27

bit broader than OpenAI because there's just been

1:12:29

a lot happening in the field of AI safety

1:12:31

that I want to run by you. So

1:12:34

one of them is that Google

1:12:36

DeepMind just released its own AI

1:12:38

safety plan. They're calling this the

1:12:40

Frontier Safety Framework. And

1:12:43

this is a document that basically lays

1:12:45

out the plans that Google DeepMind has

1:12:47

for keeping these more powerful AI systems

1:12:50

from becoming harmful. This is

1:12:52

something that other labs have done as

1:12:54

well. But this is sort of Google DeepMind's

1:12:56

biggest play in this space in recent months.

1:12:59

And there was also a big AI safety summit

1:13:01

in Seoul, South Korea earlier this

1:13:03

month where 16 of

1:13:06

the leading AI companies made a series

1:13:08

of voluntary pledges called the Frontier AI

1:13:10

Safety Commitments that basically say we will

1:13:12

develop these frontier models safely. We will

1:13:15

red team and test them. We

1:13:17

will even open them up to third party evaluations

1:13:19

so that other people can see if our models

1:13:22

are safe or not before we release them. In

1:13:25

the US, there is a

1:13:27

new group called the Artificial Intelligence

1:13:29

Safety Institute that just released

1:13:31

its strategic vision and announced that a bunch

1:13:33

of people, including some big

1:13:36

name AI safety researchers like Paul

1:13:38

Cristiano, will be involved in that.

1:13:41

And there are some actual laws

1:13:43

starting to crop up. There's a

1:13:45

law in the California State Senate,

1:13:47

SB 1047, that is, if you're

1:13:49

keeping track at home, the Safe

1:13:51

and Secure Innovation for Frontier Artificial

1:13:53

Intelligence Models Act. This is an

1:13:55

act that would require very big

1:13:57

AI models to undergo strict safety

1:13:59

testing, implement whistleblower protections

1:14:01

at big AI labs and more.

1:14:04

So there is a

1:14:06

lot happening in the world of AI safety

1:14:08

and Casey I guess my first question to

1:14:10

you about all this would be do you

1:14:12

feel safer now than you did a year

1:14:14

ago about how AI is developing? Not

1:14:17

really. Well yes

1:14:20

and no. Yes in the sense

1:14:22

that I do think that the

1:14:24

AI safety folks successfully persuaded governments

1:14:27

around the world that they should

1:14:29

take this stuff seriously and governments

1:14:31

have started to roll out frameworks

1:14:33

in the United States. We had

1:14:35

the Biden administration's executive order and

1:14:37

so thought is going into this

1:14:39

stuff and I think that that

1:14:41

is going to have some positive

1:14:43

results. So I feel safer in

1:14:45

that sense. The

1:14:48

fact that folks like OpenAI

1:14:50

who once told us that they were gonna

1:14:52

move slowly and cautiously in this regard are

1:14:54

now racing at a hundred miles an hour

1:14:56

makes me feel less safe. The fact that

1:14:58

the super alignment team was disbanded makes me

1:15:01

feel a little bit less safe. And

1:15:03

then the big unknown Kevin is just well

1:15:05

what is this new frontier model going to

1:15:07

be? I mean we already talked about it

1:15:09

in these mythical terms because the increase in

1:15:11

quality and capability from GPT-2 to 3 to

1:15:14

4 has been so significant.

1:15:16

So I think we assume or

1:15:18

at least we wonder when 5

1:15:21

arrives whatever it might be does it

1:15:23

feel like another step

1:15:25

change in function? And if it does is it

1:15:28

gonna feel safe? Like these

1:15:30

are just questions that I can't answer. What do you think?

1:15:33

Yeah I mean I think I am

1:15:35

starting to feel a little bit more

1:15:37

optimistic about the state of AI safety.

1:15:39

I take your point that you know

1:15:41

it looks like, at OpenAI specifically, there

1:15:44

are a lot of people who feel like

1:15:46

that company is not taking safety as seriously

1:15:48

as it should. But I've

1:15:51

actually been pleasantly surprised by how

1:15:54

quickly and forcefully governments

1:15:57

and sort of NGOs and

1:16:00

multinational bodies like the UN have

1:16:02

moved to start thinking and talking

1:16:04

about AI. I mean, if

1:16:06

you can remember, there was a while

1:16:08

where it felt like the only people

1:16:11

who were actually taking AI safety seriously

1:16:13

were like effective altruists and a few

1:16:15

reporters and just a few science fiction

1:16:17

fans. But now it feels like

1:16:19

a sort of kitchen table issue that everyone is,

1:16:21

I think, rightly concerned about.

1:16:24

But I also just think like this is how you

1:16:26

would kind of expect the world to look if we

1:16:28

were in fact about to make some

1:16:31

big breakthrough in AI that sort of

1:16:33

led to a world

1:16:35

transforming type of artificial intelligence. You would

1:16:37

expect our institutions to be getting a

1:16:39

little jumpy and trying to pass laws

1:16:41

and bills and get ahead of the

1:16:43

next turn of the screw. You would

1:16:46

expect these AI labs to start staffing

1:16:48

up and making big gestures toward AI

1:16:50

safety. And so I take this as

1:16:52

a sign that things are continuing to

1:16:54

progress and that we should expect the

1:16:56

next class of models to be very

1:16:58

powerful, and maybe some of

1:17:00

this stuff which could look

1:17:02

a little silly or maybe like an overreaction

1:17:05

out of context will ultimately make a lot

1:17:07

more sense once we see what these labs

1:17:09

are cooking up. Well, I

1:17:11

look forward to that terrifying day. We'll

1:17:15

tell you about it if the world still exists then. Hey,

1:17:43

we are getting ready to do another round of

1:17:45

hard questions here on Hard Fork. If you're new

1:17:48

to the show, that is our advice segment where

1:17:50

we try to make sense of your hardest moral

1:17:52

quandaries around tech like ethical dilemmas about whether it's

1:17:54

okay to reach out to the stranger you think

1:17:56

is your father thanks to 23andMe or etiquette about

1:18:00

how to politely ask someone whether they're using

1:18:02

AI to respond to all of your texts,

1:18:05

which Kevin is famous for doing. Basically, anything

1:18:07

involving technology and a tricky interpersonal dynamic is

1:18:09

game. We are here to help. So if

1:18:11

you have a hard question, please

1:18:13

write or better yet, send us a

1:18:15

voice memo, as we are a podcast,

1:18:17

to hardfork at nytimes.com. Hard

1:18:22

Fork is produced by Rachel Cohn

1:18:24

and Whitney Jones. We're edited by

1:18:26

Jen Poyant. We're fact-checked by Caitlin

1:18:28

Love. Today's show was engineered by

1:18:30

Brad Fisher. Original music

1:18:32

by Marion Lozano, Sophia Lanman,

1:18:34

Diane Wong, Rowan Niemisto, and

1:18:36

Dan Powell. Our audience

1:18:38

editor is Nell Gallogly. Video production

1:18:41

by Ryan Manning and Dylan Bergeson.

1:18:43

Check us out on YouTube. We're

1:18:45

at youtube.com/hardfork. Special thanks to

1:18:47

Paula Szuchman, Pui-Wing Tam, Kate LoPresti,

1:18:49

and Jeffrey Miranda. You can email

1:18:51

us at hardfork at nytimes.com

1:18:54

with your interpretability study on how our brains work.
