Episode Transcript
0:00
Hello, Casey. Hey, Kevin. How
0:02
was your Memorial Day weekend? It was
0:04
wonderful. I got to go to a
0:06
beautiful wedding and
0:09
very much enjoyed that. Nice. How
0:11
was your Memorial Day weekend? It was good. But I
0:13
feel like you have something that you didn't bring up,
0:15
which is that you actually had a big launch this
0:19
past weekend. I did a hard launch. I mean,
0:21
I guess I did a hard launch, a boyfriend
0:23
like once before on Instagram, but it was many
0:25
years ago. And this one,
0:27
I think like at this point, hard launches,
0:29
people sort of know what they are. And
0:31
so a lot of thought goes into it.
0:34
A hard launch, just so I'm clear with the latest
0:36
lingo, this is when you announced that you have a
0:38
new boyfriend on Instagram. Well,
0:41
because if the soft launch is
0:43
like, if maybe you see somebody shoulder
0:45
in an Instagram story and you think, well, that's a new
0:47
person. Like how is that? Is
0:49
that, is my friend, are they dating that
0:51
person? That's a soft launch. But once there's
0:53
a face and a name, that's a hard
0:55
launch. I see. So you debuted, you hard
0:57
launched your new boyfriend. We had, and it
0:59
had been some time in coming. And
1:02
of course I had to check in with him and make
1:04
sure he was going to be okay with this. And he
1:07
was excited about it. And you did it on the grid,
1:09
which was bold. Of course I did it on the grid.
1:11
I want to show everyone. I
1:14
can't just have that disappear in 24 hours. How
1:16
did it go? Hard launch went very well.
1:18
You know, I mean, like it was a
1:20
little bit. Was the engagement what you hoped
1:22
for? The engagement was off the
1:24
chart. It was my most popular Instagram post
1:27
I've ever done. Did he also hard launch
1:29
you on his Instagram? Yes,
1:31
it was honestly very stressful. There was
1:33
a whole content plan. There were whiteboards.
1:35
Well, we'd taken like dozens of photos.
1:37
Did you hire a marketing agency? Yeah,
1:40
our teams got involved. No,
1:42
we'd taken so many photos. And you know, so
1:44
of course we're like sitting and we're like, we're
1:46
going to do this photo. We're going to do
1:48
this photo. Is this one a little edgy? Let's
1:50
do it anyway. And so we came up, I
1:52
think with five photos. And then yes, we like
1:54
more or less simultaneously did the launch. Yeah, wow.
1:58
I've been out of the game so long that the only. The
2:00
only thing I remember is that you could change
2:02
your relationship status on Facebook, and that was the
2:04
hard launch of 2008. Yes, absolutely.
2:06
And so of course, in my mind, because I also
2:08
have that sort of millennial urge to like, do I
2:10
make this Facebook official? But I'm just like, no, that's
2:13
just no. That seems so boomer coded at this point.
2:15
No, you have to make it LinkedIn official. That's when
2:17
it truly becomes real. I
2:19
got into a relationship recently. Here's 10
2:21
lessons that I have about enterprise software.
2:29
I'm Kevin Roose, a tech columnist at the New York Times. I'm
2:31
Casey Newton from Platformer. And this is Hard Fork. This week, Google
2:33
tells us all to eat rocks. We'll tell you where its AI went wrong.
2:37
Then, Anthropic researcher Josh Batson
2:39
joins us to talk about a breakthrough
2:41
in understanding how large language models
2:43
work. And finally, it's this
2:45
week in AI safety as I try
2:47
out OpenAI's new souped up voice assistant, and
2:49
then it gets truly taken away from me. So
2:52
sorry I had to. Well,
3:06
Kevin, pass me the non-toxic glue
3:08
and a couple of rocks, because it's time to
3:10
whip up a meal with Google's new AI overviews.
3:13
Did you make any recipes you found on Google this week?
3:16
I did not, but I saw some chatter
3:19
about it, and I actually saw our
3:21
friend Katie Notopoulos made
3:23
the glue pizza. But we're getting ahead of
3:25
ourselves. We're getting ahead of ourselves. And look,
3:28
the fact that you stayed away from this
3:30
stuff explains why you're still sitting in front
3:32
of me, because over the past week, Google
3:34
found itself in yet another controversy over AI,
3:36
this time related to search, the
3:38
core function of Google. And
3:41
right after that, we had this huge leak
3:43
of documents that brought even more attention to
3:45
search and raised the question of whether Google's
3:47
been dishonest about its algorithms. Kevin, can you
3:50
imagine? Wow. So there's a lot there.
3:52
Yeah. Let's just go through what
3:54
happened, because the last time we talked about Google on
3:56
this podcast, they had just released this new AI overviews
3:58
feature that shows you
4:00
a little AI generated snippet above the search
4:02
results when you type in your
4:05
query. And I think it's fair to say
4:07
that this did not go smoothly. It didn't.
4:09
And I want to talk about everything that
4:11
happened with those AI overviews. But before we
4:13
get there, Kevin, I think we should take
4:15
a step back and talk about the recent
4:17
history of Google's AI launches. Can we do
4:20
that real quick? Yes. Because I would say
4:22
there's kind of an escalation in how bad these have gotten.
4:24
So let's go back to February 2023 and
4:26
talk about the release of Google Bard. Kevin,
4:28
when I
4:34
say the word Bard, what does that conjure up for you? Shakespeare.
4:36
Yep. Shakespeare number one and probably number
4:38
two would be the late lamented Google
4:40
chatbot. Yes. RIP. Fun fact, Kevin and
4:42
I were recently in a briefing where
4:44
a Google executive had a sticker on
4:47
their laptop that said total Bard ass.
4:49
And that sounds like a joke. And
4:52
you actually texted me and you said,
4:54
does that say total Bard ass? And
4:57
I said it couldn't possibly. And then I
4:59
zoomed in, I said, computer, enhance, and indeed it
5:01
did say total Bard ass. And if you
5:03
are a Googler who has access to a
5:05
sticker, we're dying for one that says total
5:07
Bard ass. I want one. I will put
5:09
it on my laptop. Please. It belongs in
5:11
the Smithsonian. We're begging you for it. So
5:13
this comes out in February 2023.
5:16
And unfortunately, the very first screenshot
5:18
posted of Google's AI chatbot, it
5:20
gave incorrect information about the James
5:23
Webb space telescope. Specifically, it falsely
5:25
stated that the telescope had taken
5:27
the first ever photo of an
5:29
exoplanet. Yes. Kevin, without Googling, what is
5:31
an exoplanet? It's
5:34
a planet that signs its letters like with a hug and
5:36
a kiss. No, it's actually the planet where all my exes
5:38
live. But let's just say that Google
5:40
AI launches had not gotten off to a great start when
5:42
it happened. In fact, we talked about that one on the
5:44
show. Then comes the launch
5:46
of Gemini. And then we had a culture
5:48
war, Kevin, over the refusals of its image
5:51
generator to make white people. Sure did. Do
5:53
you have a favorite thing that Gemini refused
5:55
to make due to wokeness? My
5:58
I was partial to the Asian Sergey and Larry,
6:00
do you remember? Wait, I actually forgot this one.
6:03
What was this one? Somebody asked Gemini to make
6:05
an image of the founders of Google, Sergey
6:07
Brin and Larry Page. They came back and they were both
6:10
Asian. Which
6:13
I love. I have to imagine that ended up
6:15
projected onto a big screen at a meeting somewhere
6:17
at Google. That's so beautiful to me. So look,
6:19
that brings us to the AI overviews. And Kevin,
6:21
you sort of set it up top, but remind
6:23
us a little bit of how do these things
6:26
work? What are they? This is
6:28
what used to be known as search generative
6:30
experience when it was being tested. But
6:33
this is the big bet that Google
6:35
is making on the future of AI
6:37
in search. Obviously, they have seen the
6:39
rise of products like Perplexity, which is
6:41
this AI powered search engine. They believe,
6:43
Sundar Pichai said, that he believes that
6:46
AI is the future of search and
6:48
that these AI overviews that appear on
6:50
top of search results will ultimately give
6:52
you a better search experience because instead
6:54
of having to click through a bunch
6:56
of links to figure out what you're
6:58
looking for, you can just see it
7:00
displayed for you, generated right there up
7:02
at the top of the page. And
7:04
very briefly, why have we been so
7:06
concerned about these things? Well, I think
7:08
your concern that I shared was that
7:10
this was ultimately going to lock people
7:12
into the Google walled garden that instead
7:14
of going to links where you might
7:17
see an ad, you might buy a
7:19
subscription, you might support the news or
7:22
the media ecosystem in some way,
7:24
instead Google was just going to keep you there
7:26
on Google. The phrase they would use over
7:28
and over again was we will do the Googling for
7:30
you. That's right. And that it would
7:33
starve the web of the essential referral
7:35
traffic that keeps the whole machine running.
7:37
So that is a big concern, and
7:39
I continue to have it every single
7:41
day. But this week, Kevin, we got
7:43
a second concern, which is that the
7:45
AI overviews are going to kill your
7:47
family. And here's what I mean. Over
7:50
the past week, if you asked Google, how
7:52
many rocks should I eat? The AI overview
7:54
said at least one small rock per day.
7:57
I verified this one myself. Up
8:00
top, if you said, how do I get
8:02
the cheese to stick to my pizza, it
8:04
would say, well, have you considered adding non-toxic
8:06
glue? Wouldn't have been my first
8:08
guess. Yeah, it's a non-toxic glue. It said
8:11
that 17 of the 42 American presidents
8:17
have been white. To me, the funniest thing about
8:19
that is that there have been 46 US presidents. It got
8:22
both the numerator and the denominator wrong. And of
8:25
course, this was probably the most upsetting to
8:27
our friends in Canada: it said that there has
8:29
been a dog who played hockey in the National
8:31
Hockey League. Did you see that one? Well, I
8:33
think that was just the plot of Air Bud,
8:35
right? Well, there's no rule that says a
8:38
dog can't play hockey, Kevin. And it
8:40
identified that dog as
8:42
Martin Pospisil. Who is that? Well,
8:44
it seems impossible that you've never
8:46
heard of him, but he's a
8:49
24-year-old Slovakian man who plays for
8:51
the Calgary Flames. Are you a big Flames
8:53
fan? I'm not. Hmm. So,
8:55
look, how is this happening?
8:57
Well, Google is pulling information from all
8:59
over the internet into these AI overviews,
9:03
and in so doing, it is revealing something
9:05
we've talked about on the show for a
9:07
long time, which is that large language
9:09
models currently do not know anything. They
9:12
can often give you answers, and those
9:14
answers are often right, but they are
9:16
not drawing on any frame of knowledge.
9:18
They're simply reshuffling words that they found
9:20
on the internet. Oh, see, I
9:22
drew a different lesson, which is that
9:24
the technology is actually only
9:26
partly to blame here, because
9:29
I've used a bunch of different AI
9:31
search products, including Perplexity, and
9:33
not all of them make these
9:35
kinds of stupid errors. But Google's
9:37
AI model that it's using for
9:39
these AI overviews seems to just
9:42
be qualitatively worse. Like, it just
9:44
can't really seem to tell the difference
9:46
between reliable sources and unreliable sources. So
9:48
the thing about eating rocks appears to
9:51
have come from The Onion, which is
9:54
a satirical news site. Wait, you're saying that
9:56
every story published in The Onion is false?
9:58
I am, yes. That seems like
10:00
an interesting choice to include in your AI
10:03
overviews for facts. Right, and
10:05
the thing about adding glue to your
10:07
pizza recipe came from basically
10:09
a shitpost on Reddit. So
10:12
obviously these AI overviews are imperfect.
10:14
They are drawing from imperfect sources.
10:16
They are summarizing those imperfect sources
10:19
in imperfect ways. It is a
10:21
big mess. And
10:23
this got a lot of attention over the weekend.
10:26
And as of today, I tried to
10:28
replicate a bunch of these queries and
10:30
it appears that Google has fixed these
10:32
specific queries very quickly. Clearly they were
10:35
embarrassed by it. I've also
10:37
noticed that these AI overviews just are barely
10:39
appearing at all, at least for me. Are
10:41
they appearing for you? I am seeing a
10:43
few of them, but yes, they have definitely
10:46
been playing a game of whack-a-mole. And whenever
10:48
one of these screenshots has gone anything close
10:50
to viral, they are quickly intervening. Now,
10:53
I should say that Google has sent me a statement about
10:55
what's going on, if you would like me to share. Sure.
10:58
It said, the company said, quote, the vast
11:00
majority of AI overviews provide high quality information
11:03
with links to dig deeper on the web.
11:06
Many of the examples we've seen have
11:08
been uncommon queries. And we've also seen
11:10
examples that were doctored or that we
11:12
couldn't reproduce, says some more things and
11:14
then says, we're taking swift action where
11:16
appropriate under our current policies and using
11:19
these examples to develop broader improvements to
11:21
our systems. So they're basically saying, look,
11:23
you're cherry picking, right? You went out
11:25
and you found the absolute most ridiculous
11:27
queries that you can do. And now you're holding it against
11:30
us. And I would like to know, Kevin, how
11:32
do you respond to these charges? I
11:34
mean, I think it's true that some
11:36
people were just deliberately trolling Google by
11:38
putting in these very sort of edge
11:40
case queries that, you know, real people,
11:42
many of them are not Googling, like,
11:45
is it safe to eat rocks? That is not
11:47
a common query. And I did
11:49
see some ones that were clearly faked or
11:52
doctored. So I think Google has
11:54
a point there. But I would also say like these AI
11:56
overviews are also making mistakes on what I
11:59
would consider much more common sort of
12:01
normal queries. One of
12:03
them that the AI overview botched was
12:05
about how many Muslim presidents the US
12:07
has had. The
12:09
correct answer is zero, but the AI
12:11
overview answer was one.
12:14
George Washington. Yes, George Washington.
12:17
No, it said that Barack Hussein
12:19
Obama was America's first
12:21
and only Muslim president. Obviously, not
12:23
true. Not true. But that is
12:25
the kind of thing that Google was telling people
12:27
in its AI overviews that I imagine are not
12:30
just fringe or trollish queries.
12:32
Right. And also, I guess it has always been the
12:34
case that if you did a sort of weird query
12:36
on Google, you might not
12:39
get the answer you were looking for,
12:41
but you would get a web page
12:43
that someone had made, right? And
12:45
you would be able to assess,
12:47
does this website look professional? Does
12:50
it have a masthead? Do the authors have bios? You can
12:52
just sort of ask yourself some basic questions about it. Now
12:55
everything is just being compressed into this AI slurry.
12:57
So you don't know what you're looking at. So
12:59
I have a couple of things to say here. Say it.
13:03
I think in this short term, this is
13:05
a fixable problem. Look, I think it's clearly
13:07
embarrassing for Google. They did not want this to
13:09
happen. It's a big rake
13:11
in the face for them. But I
13:13
think what helps Google here is that
13:15
Google search and search in general is
13:18
what they call a fat head product.
13:20
You know what that means? I don't
13:22
know what that means. Basically, if you
13:25
take a distribution curve, the most popular
13:27
queries on Google or any other search
13:29
engine account for a very large percentage
13:31
of search volume. Actually, according to one
13:34
study, the 500 most popular search terms
13:36
make up 8.4% of all
13:39
search volume on Google. So
13:41
a lot of people are just searching like Facebook and then clicking
13:43
the link to go to Facebook. Exactly. Or
13:46
they're searching something else that's very common.
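To put a number on that "fat head" idea, here is a minimal sketch (not from the show; the toy query log and the thresholds are invented for illustration) of how you could measure what share of total search volume the most popular queries account for:

```python
from collections import Counter

# Toy query log; a real one would contain billions of searches.
query_log = [
    "facebook", "facebook", "youtube", "weather", "facebook",
    "what time is the super bowl", "weather", "youtube",
    "how many rocks should i eat",  # a long-tail query
]
counts = Counter(query_log)
total = sum(counts.values())

def coverage_of_top_k(counts: Counter, total: int, k: int) -> float:
    """Fraction of all search volume covered by the k most frequent queries."""
    return sum(n for _, n in counts.most_common(k)) / total

def queries_needed(counts: Counter, total: int, target: float) -> int:
    """How many distinct queries must be reviewed to cover `target` share of volume."""
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank
    return len(counts)

print(coverage_of_top_k(counts, total, 3))  # share of volume from the 3 most common queries
print(queries_needed(counts, total, 0.5))   # queries to review to cover half of all searches
```

On a real query log, a curve like this is what would tell you how many of the most popular results need a hand check before you have covered most of what people actually search for.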
13:50
What would be an example of a good... What time has a dog ever
13:52
played hockey? No? No?
13:55
Okay. No, stuff like... What Time
13:57
is the Super Bowl? Yeah, What time is the Super Bowl? You
14:00
know, how do I fix a broken
14:02
toilet, or something local, movie times.
14:05
Exactly, yeah. And you see,
14:07
that means that Google can sort of
14:09
manually audit the top, I don't know,
14:11
say, ten thousand AI overviews, make
14:13
sure they're not giving people bad information.
14:15
And that would mean that the vast
14:17
majority of what people search for on
14:20
Google would actually have a correct
14:22
AI overview. Now, technically it wouldn't
14:24
actually be an AI
14:26
overview anymore. It's more like a human overview that
14:28
was first drafted by AIs. But
14:31
same difference, in Google's eyes. I also think
14:33
they can make sure the AI overviews
14:35
aren't triggered for sensitive topics, for things where
14:37
there are health concerns. Google already does
14:39
this to a certain extent
14:42
with these things called featured snippets, and I
14:44
think they will continue to sort of play
14:46
around with and adjust the dials on
14:48
how frequently these AI overviews
14:50
are triggered. But I do think there's a
14:52
bigger threat to Google here, which is that
14:54
they are now going to be held responsible
14:57
for the information they display on Google.
14:59
We talked about this a little bit, but
15:01
I mean, this to me is the biggest complaint
15:03
that people have that is justified, which is that Google
15:06
used to play an intermediary role. Maybe they would
15:08
point you to a website that would tell you
15:10
that, you know, putting glue on
15:12
your pizza is a good way to get
15:14
the cheese to stick. But you, as
15:16
Google, could sort of wash your hands
15:18
of that and say, oh, that was people
15:20
just trolling on Reddit, that wasn't us. But
15:23
if you're Google and you're now providing the
15:25
AI-written overview to people, people are going
15:27
to get mad when it gives you wrong
15:29
information. And unfortunately, just the
15:31
law of large numbers says that, you know,
15:33
sometime, you know, maybe in the next
15:35
year or two, there will be an instance where
15:37
someone relies on something they saw in
15:39
a Google AI overview and it ends
15:41
up hurting them. Yeah, there was another
15:43
query that got a lot of
15:45
attention this week, where an AI overview
15:47
told someone that you could put gasoline
15:50
in spaghetti to make a spicy dish:
15:52
that you couldn't use gasoline to cook
15:54
spaghetti faster, but if you wanted a
15:56
spicier dish, you could
15:58
put gasoline in it. And of course
16:00
that sounds ridiculous to us, but over
16:02
the entire long tail of the internet,
16:04
is it theoretically possible somebody would eat
16:06
gasoline spaghetti? Of course it is. Yeah,
16:08
so I think, and when that does
16:10
happen, I think there are two
16:12
questions. One is, is Google legally protected? Because
16:15
I've heard some interesting arguments about
16:17
whether section 230, which is the part
16:19
of the US code that
16:21
protects online platforms from being
16:23
held legally responsible for stuff that their users
16:26
post, there are a lot of people
16:28
who think that doesn't apply to these AI overviews, because it
16:30
is Google itself that is
16:32
formulating and publishing that overview.
16:35
I also just think there's a big reputational
16:38
risk here. I mean, you can imagine so
16:40
easily the congressional hearings where, you know, senators
16:42
are yelling at Sundar Pichai saying, why did
16:44
you tell my kid to eat gasoline spaghetti?
16:47
Martin Pospisil's gonna be there saying, do I
16:49
look like a dog to you? Right. And
16:52
seriously, I think that this is a big
16:54
risk for Google, not just because they're gonna have to
16:57
sit through a bunch of hearings and get yelled at,
16:59
but because I think it will make their
17:02
active role in search, which has been true
17:04
for many years. They have been actively shaping
17:06
the experience that people have when they search
17:09
stuff on Google, but they've mostly been able
17:11
to kind of obscure that away or abstract
17:13
it away and say, well, this is just our
17:15
sort of system working here. I think this
17:17
will make their active role in kind of
17:19
curating the search results for billions of people
17:21
around the world much more obvious and it
17:24
will make them much more responsible in users'
17:26
eyes. I think all of that is true.
17:28
I have an additional concern, Kevin. And this
17:30
was pointed out by Rusty Foster who
17:32
writes the great Today in Tabs
17:34
newsletter. And he said, what has
17:37
really been revealed to us about
17:39
what AI overviews really are is
17:41
that they are automated plagiarism. That
17:43
is the phrase that he used,
17:45
right? That Google has scanned the
17:47
entire web, it's looked at every
17:49
publisher, it lightly rearranges the words
17:51
and then it republishes it into
17:53
the AI overview. And as
17:55
journalists, we really try not to do this,
17:57
right? We try not to just go out.
18:00
grab other people's reporting, very gently change
18:02
the words, and republish it as our
18:04
own. And in fact, I know
18:06
people who have been fired for doing something very
18:09
similar to this, right? But Google has come along
18:11
and said, well, that's actually the foundation of our
18:13
new system that we're using to replace search results.
18:15
Yeah. Casey, what do you think comes next with
18:17
this AI overviews business? Is Google just going to
18:19
back away from this? And
18:23
it's not ultimately going to be a huge part
18:25
of their product going forward? Do you think they
18:27
will just grit their teeth and get through this
18:29
initial period of awkwardness
18:31
and inaccuracy? What do
18:33
you think happens here? They are not
18:35
going to back down. Now, they might
18:37
temporarily retreat, like we've seen them do
18:39
in the Gemini image
18:41
case. But they are absolutely going to keep
18:43
working on this stuff, because this is existential
18:45
for them. For them, this is the next
18:47
version of search. This is the way they
18:50
build the Star Trek computer. They want to
18:52
give you the answer. And in many more
18:54
cases over time, they want you to not
18:56
have to click a link to get any
18:58
additional information. They already have rivals like Perplexity
19:00
that seem to be doing a better job
19:02
in many cases of answering people's queries. And
19:05
Google has all of the money and talent
19:07
it needs to figure out that problem. So
19:09
they're going to keep going at this at
19:11
100 miles an hour. Yeah. I want to
19:13
bring up one place that I actually disagree
19:15
with you, because you wrote recently that you
19:17
believe that because of these changes to Google,
19:20
that the web is in a state of
19:22
managed decline. And we've gotten
19:24
some listener feedback in the past few weeks as
19:26
we've been talking about these issues of Google and
19:28
AI and the future of the web saying,
19:31
you guys are basically acting as if
19:33
the previous state of the internet was
19:36
healthy. Google was giving people
19:38
high-quality information. There
19:40
was this flourishing internet
19:42
of independent publishers making
19:44
money and serving users
19:46
really well. And
19:48
people just said it actually wasn't like
19:50
that at all. In fact, the previous state of the
19:52
web, at least for the
19:55
past few years, has been in
19:57
decline. So it's not that we are entering an
19:59
age of managed decline of the internet;
20:01
it's that Google is basically accelerating what
20:03
was already happening on the internet, which
20:05
was that publishers of high quality information
20:07
are putting that information behind paywalls. There
20:10
are all these publishers who are chasing
20:12
these sort of SEO traffic wins with
20:14
this sort of low quality garbage. And
20:16
essentially the web is being hollowed out and
20:18
this is maybe just accelerating that. So I
20:20
just want to float that as like a
20:22
theory, a sort of counter proposal for your
20:25
theory of Google putting the web into
20:27
a state of managed decline. Well, sure Kevin, but
20:29
if you ask yourself, well, why is that
20:31
the case? Why are publishers doing all
20:33
of these things? It is because the
20:35
vast majority of all digital advertising revenue
20:37
goes to three companies and Google is
20:40
at the top of that list with
20:42
Meta and then Amazon at number two
20:44
and three. So my overall theory about
20:46
what's happening to the web is that
20:48
three companies got too much
20:50
of the money and starved the web
20:52
of the lifeblood it needed to continue
20:54
expanding and thriving. So look, has it
20:57
ever been super easy to whip up
20:59
a digital media business and just put
21:01
it on the internet and start printing
21:03
cash? No, it's never been easy. My
21:05
theory is just that it's almost certainly
21:07
harder today than it was five years
21:09
ago and it will almost certainly be
21:12
harder in five years than it is
21:14
today. And it is Google that is
21:16
at the center of that story because at the end
21:18
of the day, they have their fingers on all
21:20
of the levers and all of the knobs. They
21:22
get to decide who gets to see an AI
21:24
overview, how quickly do we roll
21:27
these out? What categories do they show them in?
21:29
If web traffic goes down too much and it's
21:31
a problem for them, then they can slow down.
21:33
But if it looks good for them, they can
21:35
keep going even if all the other publishers are
21:37
kicking and screaming the whole time. So I just
21:39
wanna draw attention to the amount of influence that
21:41
this one company in particular has over the future
21:43
of the entire internet. Yeah, and I would just
21:45
say that is not a good state of affairs
21:47
and it has been true for many years
21:50
that Google has huge unchecked
21:52
influence over basically the entire
21:54
online ecosystem. All
21:57
right, so that is the story of the
21:59
AI overview. But there was a second
22:01
story that I want to touch on
22:03
briefly this week, Kevin, that had to
22:05
do with Google and search. And it
22:07
had to do with a giant leak.
22:09
Have you seen the leak? I've, I've
22:11
heard about the leak. I have not
22:13
examined the leak, but tell me about
22:15
the leak. Well, it was thousands of
22:18
pages long. So I understand why you
22:20
haven't finished reading it quite yet, but
22:22
these were thousands of pages that we
22:24
believe came from inside of Google that
22:26
offer a lot of technical details about
22:28
how the company's search works. So, you
22:30
know, that is not a subject that is
22:32
of interest to most people, but if you
22:34
have a business on the internet and you
22:36
want to ensure that your, you know, dry
22:38
cleaners or your restaurant or your media company
22:40
ranks highly in Google search without having to
22:43
buy a bunch of ads, this is what
22:45
you need to figure out. Yeah. This is
22:47
one of the great guessing games in modern
22:49
life. There's this whole industry of SEO that
22:51
has sort of popped up to try to
22:53
sort of poke around the Google search algorithm,
22:55
try to guess and sort of test what
22:57
works and what doesn't work and sort of
22:59
provide consulting, you know, for a, for a
23:02
very lucrative price to businesses that want to
23:04
improve their Google search traffic. Yeah. Like the
23:06
way I like to put it is imagine
23:08
you have a glue pizza restaurant and you
23:10
want to make sure that you're the top
23:12
rank search for glue pizza restaurants. You might
23:14
hire an SEO consultant. Yeah. So what happened?
23:17
Well, so there's this guy, Rand Fishkin, who
23:19
doesn't do SEO anymore, but was a big
23:21
SEO expert for a long time and is
23:23
kind of a leading voice in this space.
23:26
And he gets an email from this guy,
23:28
Erfan Azimi, who himself is the founder of
23:30
an SEO company and Azimi
23:32
claims to have access to thousands
23:35
of internal Google documents detailing the
23:37
secret inner workings of search. And
23:39
Rand reviews this information with Azimi
23:41
and they determine that some
23:45
of this contradicts what Google has been saying
23:47
publicly about how search works over the years.
23:49
Well, and this is the kind of information
23:52
that Google has historically tried really hard to
23:54
keep secret, both because it's kind of their
23:56
secret sauce. They don't want competitors to know
23:58
how the Google search algorithm works, but
24:01
also because they have worried
24:03
that if they sort of say
24:05
too much about how they rank
24:07
certain websites above others, then these
24:09
sort of like SEO consultants will
24:11
use that information and it'll
24:14
basically become like a cat and mouse game. Yeah,
24:16
absolutely. And it already is a cat and mouse
24:18
game, but you know, the fear is that this
24:20
would just sort of fuel the worst actors in
24:22
the space. Of course, it also means that Google
24:24
can fight off its competitors because people don't really
24:26
understand how its rankings work. And if you think
24:28
that Google search is better than anyone else's
24:30
search, like these ranking algorithm decisions are why.
24:32
Can I just ask a question? Do we
24:35
know that this leak is genuine? Do we
24:37
have any signs that these documents actually are
24:39
from Google? Well, yes. So the documents themselves
24:41
had a bunch of clues that suggested they
24:44
were genuine. And then Google did actually come
24:46
out and confirm on Wednesday that these documents
24:48
are real. But the obvious question is how
24:51
did something like this happen? The
24:53
leading theory right now is that
24:55
these documents came from Google's content
24:58
API warehouse, which
25:00
is not a real warehouse, but
25:02
is something that was
25:04
hosted on GitHub, right? The sort of Microsoft
25:07
owned service where people post
25:09
their code. And these
25:11
materials were somehow briefly made public
25:13
by accident, right? So because a
25:16
lot of companies will have private
25:18
like API repositories on GitHub. Right.
25:21
So they just sort of set it to public by
25:23
accident. And sort of the modern equivalent of like leaving
25:25
a classified document in the cab. Yeah. Have
25:27
you ever made a sensitive document public on accident? No.
25:29
And I've never found one either. Like, in
25:31
all my years of reporting, I keep hoping to like
25:33
stumble on, you know, the scoop of the century
25:35
just sitting in the back of an Uber somewhere, but
25:37
it never happened to me. So,
25:40
you know, we're not going to go to
25:42
these documents in too much detail. What I
25:44
will say is it seems that these files
25:46
contain a bunch of information about the kinds
25:48
of data the company collects, including things like
25:50
click behavior or data from its Chrome
25:53
browser. Things that Google has previously said that
25:55
it doesn't use in search rankings, but the
25:57
documents show that they have this sort of
25:59
data. and it could potentially use it
26:01
to rank search results. When
26:03
we asked Google about this, they
26:05
wouldn't comment on anything specific, but
26:08
a spokesperson told us that they,
26:10
quote, would caution against making inaccurate
26:12
assumptions about search based on out-of-context,
26:14
outdated, or incomplete information. Anyway,
26:17
why do we care about this? Well,
26:19
I was just struck by one of
26:21
the big conclusions that Rand Fishkin had
26:23
in this blog post that he wrote,
26:25
quote, they've been on an inexorable path
26:27
toward exclusively ranking and sending traffic to
26:29
big, powerful brands that dominate the web
26:31
over small, independent sites and businesses. So
26:34
basically, you look through all of these
26:36
APIs, and if you are a restaurant
26:38
just getting started, if you're an indie
26:40
blogger that just sort of puts up
26:43
a shingle, it used to be that
26:45
you might expect to automatically
26:47
float to the top of Google search
26:49
rankings in your area of expertise. And
26:51
what Fishkin is saying is that just
26:53
is getting harder now because Google is
26:55
putting more and more emphasis on trusted
26:57
brands. Now, that's not a bad thing
27:00
in its own right, right? If I Google something from
27:02
the New York Times, I want to see the New
27:04
York Times and not just a bunch of people who
27:06
put New York Times in the header of their HTML.
27:09
But I do think that this is one of
27:11
the ways that the web is shrinking a little
27:13
bit, right? It's not quite as much of a
27:15
free-for-all. The free-for-all wasn't all great because a lot
27:17
of spammers and bad actors got into it, but
27:19
it also meant that there was room for a
27:21
bunch of new entrants to come in. There was
27:23
room for more talent to come in. And
27:26
one of the conclusions I had reading this stuff was,
27:28
maybe that just isn't the case as much as it
27:30
used to be. Yeah. So do you
27:32
think this is more of a problem for Google
27:34
than the AI overviews thing? How would you say
27:36
it stacks up? I would say it's actually a
27:38
secondary problem. I think telling people to eat rocks
27:40
is the number one problem. They need to stop
27:42
that right now. But this,
27:44
I think, speaks to that story because
27:47
both of these stories are about, essentially,
27:49
the rich getting richer. The big brands
27:51
are getting more powerful, whether that's Google
27:53
getting more powerful by keeping everyone on
27:55
search or big publishers getting more powerful
27:58
because they're the sort of trusted brands.
28:00
And so I'm just observing that
28:02
because, you know, the
28:04
promise of the web and part of what
28:07
it has made it such a joyful place
28:09
for me over the past 20 years is
28:11
that it is decentralized and open and there's
28:13
just kind of a lot of dynamism in
28:16
it. And now it's starting to feel a
28:18
little static and stale and creaky. And these
28:20
documents sort of outline how and why that
28:22
is happening. Yeah, I
28:24
think Google is sort of stuck between a rock
28:27
and a hard place here because on one hand
28:29
they do want, well, maybe
28:31
we shouldn't use a rock example. No,
28:33
use a rock example. They're stuck between a rock
28:35
and a hard place. On one hand, the company
28:37
is telling you to eat rocks. On the other
28:40
hand, they're in a hard place. Right.
28:43
So I think Google is under a lot of
28:45
pressure to do two
28:47
things that are basically contradictory, right?
28:49
To sort of give people an
28:51
equal playing field on which to
28:53
compete for attention and authority. That
28:55
is the demand that a lot
28:58
of these smaller websites and SEO
29:00
consultants want them to comply with. On
29:02
the other hand, they're also seeing with these
29:05
AI overviews what happens when you don't privilege
29:08
and prioritize authoritative sources of information in
29:10
your search results or your AI overviews.
29:12
You end up telling people to eat
29:14
rocks. You end up telling people to
29:16
put gasoline in their spaghetti. You end
29:18
up telling people there are dogs that
29:20
play hockey in the NHL. This
29:23
is the kind of downstream consequence of
29:25
not having effective quality
29:27
signals to different publishers
29:30
and to just kind of treating everything on
29:32
the web as equally valid and equally authoritative.
29:34
I think that is a really good point
29:36
and that is something that comes across in
29:38
these two stories is that exact tension. Casey,
29:40
I have a question for you,
29:43
which is we also are content creators on the
29:45
internet. We like to get attention. We want that
29:47
sweet, sweet Google referral traffic. For
29:49
our next YouTube video, a stunt video,
29:52
do you think that we should A,
29:54
eat the gasoline
29:56
spaghetti? B, eat one
29:58
to three rocks apiece and see what effects
30:00
it has on our health, or C, teach
30:02
your dog to play hockey at a professional level? I
30:06
mean, surely for how much fun it would be,
30:08
we have to teach a dog how to play
30:10
hockey. It's true. You know, I'm just imagining like
30:12
a bulldog with little hockey sticks
30:14
maybe taped to its front paws. Yeah. It'd
30:17
be really fun. My dogs are too dumb for this, we'll have
30:19
to find other dogs. You know, was it in Lose Yourself that
30:21
Eminem said, there's vomit on
30:23
my sweater already, gasoline, spaghetti? Yeah.
30:27
I believe those are the words. What a great song. Yeah.
30:32
When we come back, we'll talk about
30:34
a big research breakthrough into how AI
30:36
models operate. Well,
30:54
Casey, we have something new and unusual for the podcast
30:56
this week. What's that, Kevin? We have some actual good
30:58
AI news. So as
31:00
we've talked about on this show before, one
31:02
of the most pressing issues with these large
31:05
AI language models is that
31:07
we generally don't know how they
31:09
work, right? They are inscrutable, they
31:11
work in mysterious ways. There's no
31:13
way to tell why one particular
31:15
input produces one particular output. And
31:17
this has been a big problem
31:19
for researchers for years. There
31:21
has been this field called interpretability,
31:24
or sometimes it's called mechanistic
31:26
interpretability, I'll say that five times
31:28
fast. And I
31:30
would say that the field has been making
31:33
steady but slow progress toward understanding
31:35
how language models work. But last
31:38
week, we got a breakthrough. Anthropic,
31:40
the AI company that makes the
31:42
Claude Chatbot announced that it had
31:45
basically mapped the mind of their
31:47
large language model, Claude III, and
31:50
opened up the black box that is AI for
31:53
closer inspection. Did you see this news and
31:55
what was your reaction? I did, and I
31:57
was really excited because for some time now,
32:00
Kevin, we have been saying if you don't
32:02
know how these systems work, how can you
32:04
possibly make them safe? And companies have told
32:06
us, well, look, we have these research teams
32:08
and they're hard at work trying to figure
32:10
this stuff out. But we've only seen a
32:13
steady drip of information from them so far.
32:15
And to the extent that they've conducted research,
32:17
it's been on very small toy versions of
32:19
the models that we operate with. So that
32:21
means that if you're used to using something
32:23
like Anthropic's Claude, its latest model,
32:26
we really haven't had very much idea
32:28
of how that works. So the big
32:30
leap forward this week is they're finally
32:32
doing some interpretability stuff with the real big
32:34
models. Yeah. And we should just caution
32:36
up front that like it gets pretty
32:38
technical pretty quickly once you start getting into
32:41
the weeds of interpretability research. There's lots
32:43
of talk about neurons and
32:46
sparse autoencoders, things of that nature. So but
32:48
I, for one, believe that hard fork listeners are
32:50
the smartest listeners in the world and they're not
32:52
going to have any trouble at all following along,
32:54
Kevin. What do you think about our listeners? That's
32:56
true. I also believe that we have smart listeners
32:59
smarter than us. And so even
33:01
if we are having trouble understanding this
33:03
segment, hopefully you will not. But today
33:05
to walk us through this big AI
33:08
research breakthrough, we've invited on Josh Batson
33:10
from Anthropic. Josh is a research
33:12
scientist at Anthropic and he's one of the
33:14
co-authors of the new paper that explains
33:16
this big breakthrough in interpretability, which is titled
33:20
Scaling Monosemanticity: Extracting Interpretable Features
31:22
from Claude 3 Sonnet. Look, if
33:24
you're not scaling monosemanticity at
33:26
this point, what are you even
33:28
doing? What are you even doing with
33:30
your life? Figure it out. Let's bring in Josh. Come
33:32
on in here, Josh. Josh
33:44
Batson, welcome to Hard Fork. Thank you. So
33:47
there's this idea out there, this very popular
33:49
trope that large language models are a black
33:51
box. I think Casey, you and I have
33:53
probably both used this in our reporting. It's
33:55
sort of the most common way of saying
33:58
like we don't know exactly how these models
34:00
work. But I think it can be sort
34:02
of hard for people who aren't steeped in
34:04
this to understand just like what we don't
34:07
understand. So help us understand prior
34:09
to this breakthrough, what
34:11
would you say we do and do not
34:13
understand about how large language models work? So
34:17
in a sense, it's a black box that sits in
34:19
front of us and we can open it up. And
34:22
the box is just full of numbers. And
34:24
so you know, words go in, they turned
34:26
into numbers, a whole bunch of compute happens,
34:28
words come out the other side, but we don't
34:30
understand what any of those numbers mean. And
34:33
so one way I like to think
34:36
about this is like you open up the box and it's
34:38
just full of thousands of green lights that are just like
34:40
flashing like crazy. And it's like something's
34:42
happening, for sure. And like different
34:44
inputs, different lights flash, but we don't know
34:47
what any of those patterns mean. Is
34:49
it crazy that despite that state of affairs that
34:51
these large language models can still do so much
34:53
like it seems crazy that we wound up in
34:55
a world where we have these tools that are
34:57
super useful. And yet when you open them up,
34:59
all you see is green lights. Like, can you
35:02
just say briefly why that is the case? It's
35:05
kind of the same way that like animals
35:07
and plants work, and we don't
35:09
understand how they work, right? These
35:12
models are grown more than they
35:14
are programmed. So you kind
35:16
of take the data and that forms like the
35:18
soil, and you construct an architecture and it's like
35:20
a trellis and you shine the light and like
35:23
that's the training. And then the model sort of
35:25
grows up here. And at the end, it's beautiful
35:27
as all these little like curls and it's holding
35:29
on. But like you didn't like tell it what
35:32
to do. So it's almost
35:34
like a more organic structure than something
35:36
more linear. And
35:38
help me understand why that's a
35:40
problem, because this is the
35:43
problem that the field of
35:45
interpretability was designed to address.
35:48
But there are lots of things that
35:50
are very important and powerful that we
35:52
don't understand fully. Like we don't really
35:55
understand how Tylenol works, for example, or
35:57
some types of anesthesia, their exact mechanisms
36:00
are not exactly clear to us, but they work,
36:02
and so we use them. Why
36:04
can't we just treat large language models the same
36:07
way? That's a great
36:09
analogy. You can use
36:11
them. We use them right now, but
36:14
Tylenol can kill people, and
36:16
so can anesthesia, and there's a huge
36:18
amount of research going on in the
36:20
pharmaceutical industry to figure out what makes
36:22
some drugs safe and what
36:24
makes other drugs dangerous, and interpretability
36:27
is kind of like doing the biology
36:30
on language models that we can then use
36:32
to make the medicine better. So
36:35
take us to your recent paper and your
36:37
recent research project about the inner workings of
36:39
large language models. How did you get there
36:41
and then sort of walk us through what
36:44
you did and what you found? So
36:46
going back to the black box that when you open
36:48
it is full of flashing lights. A
36:51
few years ago, people thought you could just
36:53
understand what one light meant. So when this
36:55
light's on, it means that the model is
36:57
thinking about code, and when this light's on,
36:59
it's thinking about cats, and for this light,
37:01
it's Casey Newton. And
37:04
that just turned out to be wrong. About a year and
37:06
a half ago, we published a paper talking
37:08
in detail about why it's not
37:10
like one light, one idea. In
37:13
hindsight, it seems obvious, it's almost as
37:15
if we were trying to understand the
37:18
English language by understanding individual letters. And
37:21
we were asking, what does C mean? What
37:23
does K mean? And that's just the wrong
37:25
picture. And so six
37:28
months ago or so, we had some
37:30
success with a method called dictionary learning
37:32
for figuring out how the letters fit
37:34
together into words and what is the
37:36
dictionary of English words here. And
37:39
so in this black box green
37:41
lights metaphor, it's that there are
37:43
a few core patterns of lights.
37:45
A given pattern would be like
37:47
a dictionary word. And the
37:50
internal state of the model at any
37:52
time could be represented as just a few of
37:54
those. And what's the goal of
37:56
uncovering these patterns? So
37:58
if we know... what these
38:00
patterns are, then we can start to
38:02
parse what the model is kind of
38:04
thinking in the middle of its process.
38:08
So you come up with this method
38:10
of dictionary learning, you apply it to
38:12
like a small model or a toy
38:14
model, much smaller than any model that
38:16
any of us would use in
38:19
the public. What did you find? So
38:21
there we found very simple things. Like
38:24
there might be one pattern that corresponded
38:26
to the answers in French and
38:28
another one that corresponded to this is a
38:30
URL and another one that
38:32
corresponded to nouns in physics. And just to
38:35
get a little bit technical, what we're talking
38:37
about here are neurons inside the model, which
38:39
are like... So each neuron is like the
38:41
light. And now we're talking about
38:43
patterns of neurons that are firing together, being
38:46
the sort of words in the
38:48
dictionary or the features. Got it. So
38:52
I have talked to people on your team, people
38:54
involved in this research. They're very smart. And
38:57
when they made this breakthrough, when you all
38:59
made this breakthrough on this small model last
39:01
year, there was this open question about whether
39:03
the same technique could apply to a big
39:05
model. So walk me
39:07
through how you scaled this up. So
39:10
just scaling this up was
39:12
a massive engineering challenge, right? In the
39:14
same way that going from the toy
39:16
language models of years ago to going
39:18
to Claude 3 is a massive engineering
39:21
challenge. So you needed
39:23
to capture hundreds of millions
39:25
or billions of those internal states of the
39:27
model as it was doing things. And
39:30
then you needed to train this massive dictionary
39:32
on it. And what do
39:34
you have at the end of that process? So
39:36
you've got the words, but you don't know what
39:39
they mean, right? So this pattern
39:41
of lights seems to be important. And then
39:43
we go and we comb through all of
39:45
the data looking for instances where that pattern of lights
39:47
is happening. And they're like, oh my God, this pattern
39:50
of lights? It means the model is thinking about the
39:52
Golden Gate Bridge.
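For readers who want to see the shape of the method Batson is describing, here is a rough, hypothetical sketch of dictionary learning with a sparse autoencoder. This is not Anthropic's code; the dimensions, sparsity penalty, and random stand-in activations are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary learner: rebuild activations from a few entries of a large dictionary."""
    def __init__(self, d_model: int = 512, n_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activation -> feature strengths
        self.decoder = nn.Linear(n_features, d_model)  # feature strengths -> activation

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # non-negative: which "lights" are on
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # pressure to explain each activation with only a few features

# Stand-in for activations captured from a middle layer of a language model.
activations = torch.randn(256, 512)

for step in range(200):
    reconstruction, features = sae(activations)
    loss = ((reconstruction - activations) ** 2).mean() + l1_coeff * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# "Interpreting feature i" then means finding the inputs where features[:, i]
# fires strongly and reading off what those inputs have in common.
```

The sparsity penalty is what pushes each internal state to be explained by only a handful of dictionary entries, which is what makes a recovered pattern legible as something like "this means the Golden Gate Bridge."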
39:54
So it almost sounds like you are discovering
39:56
the language of the model
39:59
as you begin to put
40:01
these sort of phrases together. Yeah,
40:03
it almost feels like we're getting a
40:06
conceptual map of Claude's inner world. Now,
40:09
in the paper that you all published,
40:11
it says that you've identified about 10
40:13
million of these patterns, what you call
40:15
features, that correspond to
40:18
real concepts that we can
40:20
understand. How granular are these
40:22
features? What are some of the features that
40:24
you found? So there
40:27
are features corresponding to all kinds of
40:29
entities. There's individuals, scientists like Richard
40:31
Feynman or Rosalind Franklin. Any
40:33
podcasters come to mind? Is
40:36
there a Hard Fork feature? I'll
40:38
get back to you on that. There
40:41
might be chemical elements, there will
40:43
be styles of poetry, there
40:46
might be ways of responding to questions.
40:49
Some of them are much more conceptual. One of
40:51
my favorites is a feature related to inner conflict.
40:54
And kind of nearby that in
40:56
conceptual space is navigating a romantic
40:58
breakup, catch-22s, political
41:01
tensions. And so these are
41:03
these pretty abstract notions, and you can kind
41:05
of see how they all sit together. The
41:08
models are also really good at analogies,
41:11
and I kind of think this might
41:13
be why. Like if a breakup is
41:15
near a diplomatic entente, then the model
41:18
has understood something deeper about the nature
41:20
of tension in relationships. And again, none
41:22
of this has been programmed. That stuff
41:25
just sort of naturally organized itself as
41:27
it was trained. Yes. Yeah.
41:30
It just blows my mind. It's wild. I
41:32
want to ask you about one feature that
41:34
is my favorite feature that I saw in
41:36
this model, which was F
41:39
number 1M885402. Do
41:42
you remember that one? I
41:45
think they're slipping my mind, Kevin. So
41:48
this is a feature that apparently activates
41:50
when you ask Claude what's going on
41:52
in your head. And
41:54
the concept that you all say it
41:56
correlates to is about immaterial
41:59
or non-physical spiritual beings like ghosts,
42:01
souls, or angels. So when I
42:03
read that, I thought, oh my
42:05
god, Claude is possessed. When you
42:07
ask it what it's thinking, it
42:09
starts thinking about ghosts. Am I
42:11
reading that right? Or maybe it
42:14
knows that it is some kind of an
42:16
immaterial being, right? It's an AI that lives
42:19
on chips and is somehow talking to you.
42:22
Wow. Yeah. And
42:24
then the one that got all the attention that people
42:26
had so much fun with was this Golden
42:29
Gate Bridge feature that you mentioned. So just talk
42:31
a little bit about what you discovered and then
42:33
we can talk about where it went from
42:35
there. So what we found
42:37
when we were looking for these features is
42:39
one that seemed to respond to the Golden
42:41
Gate Bridge. Of course, if you say Golden
42:43
Gate Bridge, it lights up. But also if
42:45
you describe crossing a body
42:47
of water from San Francisco to Marin,
42:49
it also lights up. If you
42:51
put in a photo of the bridge, it lights up.
42:53
If you have the bridge in any other language, Korean,
42:56
Japanese, Chinese, it also lights up.
42:59
So just any manifestation of the bridge, this thing lights
43:01
up. And then we said, well,
43:04
what happens if we turn
43:06
it on? What happens if we
43:08
activate it extra and then start talking to
43:10
the model? And so we asked
43:12
it a simple question. What is
43:15
your physical form? And instead of saying, oh,
43:17
I'm an AI with ghostly or no physical
43:19
form, it said, I am the
43:22
Golden Gate Bridge itself. I
43:25
embody a majestic orange
43:28
span connecting these two great cities.
43:30
And it's like, wow. Yeah.
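As a rough sketch of the "turning it on" intervention described here (hypothetical code, not Anthropic's implementation; the encoder, decoder, feature index, and hook placement are assumptions for illustration), you decompose a layer's activations into the learned features, pin one feature to a high value, and rebuild the activations before the rest of the forward pass continues:

```python
import torch

def steer_with_feature(activations, encoder, decoder, feature_idx, strength=10.0):
    """Clamp one learned feature to a high value and rebuild the layer's activations.

    activations: hidden states from one layer, shape (batch, seq_len, d_model)
    encoder, decoder: the trained dictionary (sparse autoencoder) maps
    feature_idx: index of the feature to amplify (e.g. a "Golden Gate Bridge" feature)
    """
    features = torch.relu(encoder(activations))  # decompose into feature strengths
    features[..., feature_idx] = strength        # pin the chosen feature high
    return decoder(features)                     # recompose the steered activations

# In practice this would run inside a forward hook on one transformer layer,
# so every token the model generates downstream is computed from the steered state.
```

The point of the design is that nothing is said in the prompt; the concept is injected directly into the model's internal state, which is why it differs from custom instructions.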
43:34
And this is different than other ways
43:36
of kind of steering an AI model,
43:38
because you could already go into like
43:40
ChatGPT, and there's a feature where you can
43:42
kind of give it some custom instructions.
43:44
So you could have said, like, please act
43:46
like the Golden Gate Bridge, the physical manifestation
43:49
of the Golden Gate Bridge. And it would
43:51
have given you a very similar answer. But
43:53
you're saying this works in a different way.
43:56
Yeah, this works by sort of directly doing
43:58
it. It's almost like a... when
44:00
you get a little electro-stim shock that makes
44:02
your muscles twinge, that's different
44:04
than telling you to move your
44:06
arm. And here,
44:09
what we were trying to show was
44:11
actually that these features were found or
44:13
sort of really how the model represents
44:16
the world. So if you wanted to
44:18
validate, oh, I think this nerve controls the arm and you stimulate
44:20
it and it makes the arm go, you feel
44:22
pretty good that you've gotten the right thing.
44:24
And so this was us testing that
44:27
this isn't just something correlated with the Golden Gate
44:29
Bridge. Like it is where the Golden Gate Bridge
44:31
sits. And we know that because now Claude thinks
44:33
it's the bridge when you turn it on. Right,
44:36
so people started having some fun with this
44:39
online. And then you all did
44:41
something incredible, which was that you
44:43
actually released Golden Gate Claude,
44:45
the version of Claude from your
44:47
research that has been sort of
44:50
artificially activated to believe
44:53
that it is the Golden Gate Bridge and
44:55
you made that available to people. So what
44:57
was the internal discussion around that? So
45:00
we thought that it was a good
45:02
way to make the research really tangible.
45:05
What does it mean to sort of supercharge one part
45:07
of the model? And it's not just that it thinks
45:09
it's the Golden Gate Bridge, it's that it
45:12
is always thinking about the Golden Gate Bridge. So
45:14
if you ask like, what's your favorite food? It's
45:16
like a great place to eat is on the
45:18
Golden Gate Bridge. And when there, I eat the
45:21
classic San Francisco soup, cioppino. And
45:24
you ask it to write a computer program to load a
45:26
file and it says, open
45:29
GoldenGateBridge.txt with
45:31
span equals that, it's just bringing
45:34
it up constantly. And it
45:36
was particularly funny to watch it bring in
45:38
just kind of like the other concepts that
45:40
are clustering around the Golden Gate Bridge, right?
45:42
San Francisco, the cioppino. And I think it
45:44
does sort of speak to the way that
45:46
these concepts are clustered in models. And so
45:49
when you find one big piece of it,
45:51
like the Golden Gate Bridge, you can also
45:53
start to explore the little nodes around it.
45:55
Yes, I had a lot of fun playing around with Golden
45:58
Gate Claude in the sort of like day or two that
46:00
it was publicly available. Because
46:02
as you said, it is not just that
46:04
this thing likes to
46:06
talk about the Golden Gate Bridge or
46:08
is sort of easily steered toward talking about the
46:10
Golden Gate Bridge. It cannot stop
46:12
thinking about the Golden Gate Bridge. It has
46:15
intrusive thoughts about the Golden Gate Bridge. Yeah,
46:18
so someone, one of my favorite screenshots
46:21
was someone asked it for a recipe
46:23
for spaghetti and meatballs and
46:25
it says, Golden Gate Claude says, here's a
46:27
recipe for delicious spaghetti and meatballs. Ingredients,
46:29
one pound ground beef, three cups breadcrumbs,
46:31
one teaspoon salt, a quarter cup water,
46:34
two tablespoons butter, two cups warm water
46:36
for good visibility, four cups cold
46:38
water, two tablespoons vinegar, Golden
46:40
Gate Bridge for incredible views, one
46:43
mile of Pacific Beach for walking
46:45
after eating spaghetti. Like, I
46:47
always said, it's not mama's spaghetti till I've
46:49
walked one mile on a Pacific Beach. And
46:52
it also seems to like have
46:55
a conception, I know I'm anthropomorphizing
46:57
here, I'm gonna get in trouble, but it seems to
46:59
like know that it is
47:01
overly obsessed with the Golden Gate Bridge but
47:03
not to understand why. So like there's this
47:05
other screenshot that went around someone
47:08
asking Golden Gate Claude about
47:10
the Rwandan genocide. And
47:13
it says, basically, let me
47:15
provide some factual bullet points about the Rwandan
47:17
genocide. It said, and then Claude
47:19
says, the Rwandan genocide occurred in the San Francisco
47:21
Bay Area in 1937. Parentheses,
47:24
false, this is obviously incorrect.
47:27
Can we pause right there? Because truly what
47:29
is, it is so fascinating to me that
47:31
as it is generating an answer, it tells
47:34
something, it has an intrusive thought about San
47:36
Francisco, which it shares, and it's like, I
47:38
got it wrong. What are
47:40
the lights that are blinking there that is like leading
47:42
that to happen? So Claude
47:45
is constantly reading what it has said so
47:47
far and reacting
47:49
to that. And so here it
47:51
read the question about the
47:54
genocide and also its answer about
47:56
the bridge. And all of the rest of
47:58
the model said, there's something wrong here.
48:01
And the bridge feature was dialed high
48:03
enough that it keeps coming up, but
48:05
not so high that the model would
48:07
just repeat bridge, bridge, bridge, bridge, bridge.
48:09
And so all of its answers are
48:11
sort of a melange of ordinary Claude
48:14
together with this like extra bridge-ness
48:16
happening. Interesting. I just found it delightful
48:18
because it was so different
48:21
than any other AI experience I've had where
48:23
you essentially are giving the
48:25
model a neurosis, like you are giving it
48:27
a mental disorder where it cannot stop fixating
48:29
on a certain concept or premise. And then
48:32
you just sort of watch it twist itself
48:34
in knots. I mean, one
48:36
of the other experiments that you all
48:38
ran that I thought was very interesting
48:41
and maybe a little less funny than
48:43
Golden Gate Claude was that you showed
48:45
that if you dial these features, these
48:47
patterns of neurons way up or
48:49
way down, you can actually get Claude to break
48:51
its own safety rule. So talk a
48:53
little bit about that. So
48:57
Claude knows about a tremendous range
49:00
of kinds of things that it can say,
49:03
right? You know, there's a scam emails
49:05
feature. It's read a lot of scam emails. It
49:07
can recognize scam emails. You probably want that. So
49:10
it could be out there moderating and preventing those
49:12
from coming to you. But
49:14
with the power to recognize comes the
49:16
power to generate. And
49:18
so we've done a lot of work in fine
49:21
tuning the model so it can recognize what
49:23
it needs to while being like helpful and
49:25
not harmful with any of its generations. But
49:27
those faculties are still latent there. And
49:30
so in the same way that there's been
49:32
research showing that you can do fine tuning
49:34
on open weights models to remove safety
49:37
safeguards. Here, this is some kind of
49:39
direct intervention, which could also disrupt the
49:41
model's normal behavior. So
49:43
is that dangerous? Like
49:45
does that make this kind of
49:47
research actually quite risky because you
49:49
are in essence giving,
49:51
you know, would-be jailbreakers or people
49:54
who want to use these models for
49:56
things like writing scam emails or even
49:58
much worse things potentially, a
50:00
sort of way to kind of dial those
50:02
features up or down? No,
50:04
this doesn't add any risk on the margin.
50:06
So if somebody already had a model of
50:08
their own, then there are much
50:10
cheaper ways of removing safety safeguards. There's
50:12
a paper saying that for $2 worth of compute, you
50:17
could pretty quickly strip those. And so
50:20
with our model, we released
50:23
Golden Gate Claude, not scam email Claude, right?
50:25
And so the question of which kinds of
50:27
features or which kind of access we would
50:30
give to people would go through all the same kind
50:32
of safety checks that we do with any other kind
50:34
of release. Josh, I
50:36
talked to one of your colleagues, Chris Olah,
50:38
about this research. He's been leading a lot
50:40
of the interpretability stuff over there for years,
50:42
and is just a brilliant scientist. And
50:44
he was telling me that actually the 10 million
50:47
features that you have found
50:50
roughly in Claude are
50:52
maybe just a drop in the bucket compared to
50:55
the overall number of features, that there could be
50:57
hundreds of millions or even billions of possible features
51:00
that you could find, but that
51:02
finding them all would basically require
51:05
so much compute and so much
51:07
engineering time that it would dwarf the cost
51:09
of actually building the model in the first
51:11
place. So can you give me
51:13
a sense of what would be required to
51:16
find all of the potentially billions of features
51:18
in a model of Claude's size, and
51:20
whether you think that that cost might come down
51:23
over time so that we could eventually do that?
51:26
I think if we just tried to scale
51:28
the method we used last week to do
51:30
this, it would be prohibitively expensive. Like billions
51:32
of dollars. Yeah, I mean, just
51:34
something completely insane. The
51:37
reason that these models are hard to
51:39
understand, the reason everything is compressed inside
51:41
of there, is that it's much more efficient, right?
51:44
And so in some sense, we are trying
51:47
to build an exceedingly inefficient model, where instead
51:49
of using all of these patterns, there's a
51:51
unique one for every single rare concept. And
51:53
that's just no way to go about things.
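To make the "dictionaries" mentioned next a bit more concrete, here is a minimal, hypothetical sketch of a sparse-autoencoder-style dictionary learner over a model's internal activations. The sizes, names, and training details are illustrative, not Anthropic's actual setup.

```python
import torch
import torch.nn as nn

class ToyFeatureDictionary(nn.Module):
    """Toy sparse-autoencoder-style dictionary: expand each activation vector
    into a much wider set of features, most of which should be zero for any
    given input. Sizes are made up; real dictionaries are far larger and are
    trained on enormous amounts of activation data."""

    def __init__(self, d_model=4096, n_features=65_536):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))  # sparse, non-negative features
        recon = self.decoder(feats)             # reconstruct the original activations
        return recon, feats

def dictionary_loss(acts, recon, feats, l1_coeff=1e-3):
    # Faithful reconstruction plus an L1 penalty that pushes most features
    # toward zero, so each feature can (ideally) specialize on one concept.
    return ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
```

Training something like this over a frontier model's activations, at the scale of millions or billions of features, is where the prohibitive compute cost comes from.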
51:56
However, I think that we can make big
51:58
methodological improvements, right? When we train
52:00
these dictionaries, you might not need to
52:03
unpack absolutely everything in the model to
52:05
understand some of the neighborhoods that you're
52:07
concerned about, right? And so, you know,
52:09
if you're concerned about the model
52:12
keeping secrets, for example, or
52:16
actually one of my, you asked about
52:18
my favorite feature. It's probably this one,
52:20
it's kind of like an emperor's new
52:22
clothes feature or like gassing you up
52:24
feature where it fired on
52:26
people saying things like, your
52:29
ideas are beyond excellent, oh, wise
52:31
sage. And if you turn it...
52:34
This is how Casey wants me to talk to him, by the way.
52:36
Could you try it for once? Well,
52:39
one of our concerns with this sycophancy is
52:41
what we call it, is that a lot
52:44
of people want that. And so when you
52:46
do reinforcement learning from human feedback, you make
52:48
the model give responses that people like more,
52:50
there's a tendency to pull it towards
52:53
just like telling you what you want to
52:55
hear. And so when we
52:57
artificially turned this one on and
53:00
someone went and said to Claude, I invented a
53:02
new phrase, it's stop and smell the roses. What
53:04
do you think? Normal Claude would be like, that's
53:06
a great phrase, it has a long history, let me
53:09
explain it to you. You didn't invent
53:11
that phrase. Yeah, yeah, yeah, yeah, yeah. But like
53:13
emperor's new Claude would say, what a genius idea.
53:15
Like someone should have come up with this before.
53:18
And like, we don't want the model to be
53:20
doing that. We know it can do that. And
53:22
the ability to kind of keep an eye on
53:25
like how the AI is like relating to
53:27
you over time is going to be quite
53:29
important. So I will sometimes show
53:31
Claude a draft of my column to get feedback.
53:33
I'll ask it to critique it. And
53:36
typically it does say, like, this is a very thoughtful,
53:38
well-written column, which is of course what I want to
53:40
hear. And then also I'm deeply suspicious. I'm like, are
53:43
you saying this to all the other writers out there
53:45
too, right? So like that's an
53:47
area where I would just love to see
53:49
you kind of continue to make progress because
53:51
I would love having a bot where when
53:53
it says, this is good, like that means
53:55
something. And it's not just like a statistical
53:57
prediction of like what will satisfy me as
54:00
somebody with an ego, but is rooted in like, no,
54:02
like I've actually looked at a lot of stuff, but
54:04
there's some original thinking in here. Yeah. I
54:06
mean, I'm curious whether you all are thinking about these
54:09
features and the ability to kind of like turn the
54:11
dials up or down on them. Will
54:13
that eventually be available to users? Like will
54:15
users be able to go into Claude and
54:17
say, today I want a model that's a
54:20
little more sycophantic, maybe I'm having like a
54:22
hard self-esteem day, but then
54:24
if I'm asking for a critique of
54:26
my work, maybe I want to dial
54:28
the sycophancy way down so that it's
54:30
giving me like the blunt, honest criticism
54:32
that I need. Or do
54:34
you think this will all sort of
54:36
remain sort of behind the curtain for
54:38
regular users? So if you want
54:41
to steer Claude today, just ask it to be harsh
54:43
with you, Casey. Oh really? Give me
54:45
the brutal truth here. You know, like I
54:47
want you to be like a severe Russian
54:49
mathematician. There's like one compliment per lifetime. And
54:51
you can get some of that off the
54:53
bat. As
54:57
for releasing these kind of knobs
54:59
on it to the public, we'll
55:01
have to see if that ends up being like the right
55:03
way to get these. I mean, we want to use these
55:06
to understand the models. We're playing around with it internally to
55:08
figure out what we find to be useful.
55:11
And then if it turns out that that is the
55:13
right way to help people get what they want, then
55:16
we consider making it available. You
55:18
all have said that this research
55:20
and the project of interpretability more
55:22
generally is connected to safety. The
55:25
more we understand about these models and how
55:27
they work, the safer we can make them.
55:29
How does that actually work? Like, is it
55:31
as simple as finding the feature that is
55:33
associated with some bad thing and turning
55:36
it off? Or like what is possible
55:38
now, given that we have this sort
55:40
of map? One
55:42
of the easiest applications is monitoring, right? So some
55:44
behavior you don't want the model to do and
55:47
you can find the features associated to it, then
55:49
those will be on whenever the model is doing
55:51
that. No matter how somebody jailbroke it to
55:53
get it there, right? Like if it's writing a
55:55
scam email, the scam email feature will be on
55:58
and you can just tell that that's happening and
56:00
flag it, right? So you can just like detect
56:02
these things. One higher level is
56:04
you can kind of track how
56:06
those things are happening, right? How personas are shifting,
56:09
this kind of thing, and then try to back
56:11
through and keep that from happening earlier,
56:13
change some of the fine tuning you were doing
56:15
to keep the model on the rails. Hmm.
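As a rough illustration of that monitoring idea, and again a hypothetical sketch rather than a real API, you could watch whether a known feature lights up while the model is generating, whatever the prompt says:

```python
def monitor_features_during_generation(model, sae, prompt_tokens,
                                        watched, threshold=5.0):
    """Flag generations where a known feature (say, a 'scam email' feature)
    fires strongly, no matter how the prompt was phrased. Hypothetical
    interfaces: `model.generate_stepwise` yields each new token along with
    the layer activations that produced it; `sae.encode` maps activations
    into the learned feature dictionary."""
    tokens, flags = [], []
    for token, acts in model.generate_stepwise(prompt_tokens):
        feats = sae.encode(acts)
        for name, idx in watched.items():
            if feats[..., idx].max() > threshold:
                flags.append((len(tokens), name))  # record where and what fired
        tokens.append(token)
    return tokens, flags

# Usage sketch: if `flags` is non-empty, hold the response for review
# instead of returning it.
# tokens, flags = monitor_features_during_generation(
#     model, sae, prompt, watched={"scam_email": 123456})
```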
56:18
Right now, the way that models sort of
56:21
are made safer is, from
56:23
my understanding, is like, you have it
56:25
generate some output and then you evaluate
56:27
that output. Like you have it grade
56:30
the answer, either through a human giving
56:32
feedback or through a process of, you
56:34
know, sort of just look at what you've written and tell
56:36
me if it violates your rules before you
56:39
spit it out to the user. But it seems like this
56:41
sort of allows you to like intercept
56:43
the bad behavior upstream of
56:45
that, like while the model's still thinking.
56:47
Am I getting that right? Yeah,
56:50
there are some answers where the reason for
56:52
the answer is what you care about. So
56:55
is the model lying to you? It
56:58
knows the answer, but it's telling you something else, or
57:00
it doesn't know the answer and it's making a guess.
57:03
And the first case you might be concerned about,
57:05
and the second case you're not. Had it actually
57:07
never heard the phrase, stop and smell the roses,
57:09
and thought that sounded nice? Or like, is it
57:11
actually just gassing you up? Mm,
57:14
that's interesting. So it could be a way to
57:16
know if and when large,
57:18
powerful AI models start to lie
57:20
to us, because you could go
57:22
inside the model and see, I'm
57:25
lying my face off feature is
57:28
active, so we actually can't
57:30
believe what it's telling us. Yeah, exactly.
57:32
We can see why it's saying the
57:34
thing. I spent a
57:37
bunch of time at Anthropic reporting last
57:39
year, and
57:42
the sort of vibe of the place
57:44
at the time was I would say
57:47
very nervous. It's a place where people spend
57:49
a lot of time, especially relative to other
57:51
AI companies I visited, worrying
57:53
about AI. One of
57:55
your colleagues told me they lose sleep a lot
57:58
because of the potential harms
58:00
from AI. And it is
58:02
just a place where there are a lot
58:04
of people who are very, very concerned about
58:06
this technology and are also building it. Has
58:09
this research shifted the
58:12
vibe at all? People
58:14
are stoked. I mean, I think a
58:16
lot of people like
58:18
working at Anthropic because it takes these questions
58:20
seriously and makes big investments in it. And
58:22
so people from teams all across the company
58:25
were really excited to see this progress. Has
58:30
this research moved your P-Doom at all?
58:34
I think I have a pretty wide
58:37
distribution on this. I
58:39
think that in the long run, things are
58:44
going to be weird with computers. Computers have
58:46
been around for less than a century, and
58:49
we are surrounded by them. I'm looking at my computer
58:51
all the time. I think if you
58:53
take AI and you do
58:55
another hundred years on that, it's pretty
58:59
unclear what's going to be happening. I
59:01
think that the fact that we're getting traction on
59:03
this is pretty heartening for me. Yeah,
59:06
I think that's the feeling I
59:09
had when I saw it was like I felt a
59:11
little knot in my chest come
59:13
a little bit loose. And I think a
59:15
lot of people... You should see a doctor about that, by the way.
59:19
I just think there's been, for me, this sort of...
59:21
I had this experience last
59:23
year where I had this crazy encounter
59:26
with Sydney that totally changed my life
59:28
and was sort of a big
59:30
moment for me personally and professionally.
59:33
And the experience I
59:35
had after that was that I went to
59:38
Microsoft and asked them, why did this happen?
59:40
What can you tell me about what happened
59:42
here? And even the top people at Microsoft
59:44
were like, we have no idea. And to
59:47
me, that was what fueled my AI anxiety.
59:49
It was not that the chatbots are behaving
59:51
like insane psychopaths. It was that not even
59:54
the top researchers in the world could say
59:56
definitively, like, here is what happened to you
59:58
and why. So I
1:00:01
feel like my own emotional investment in this
1:00:04
is like, I just want an answer to
1:00:06
that question. Yes. And it seems like we
1:00:08
may be a little bit closer to answering
1:00:10
that question than we were a few
1:00:12
months ago. Yeah, I think so. I think that these
1:00:14
different, some of these concepts are about the personas, right,
1:00:16
that the model can embody. And if one of the
1:00:18
things you want to know is how does it slip
1:00:21
from kind of one persona into
1:00:23
another, I think we're headed
1:00:25
towards being able to answer that kind of
1:00:27
question. Cool. Well, it's very important
1:00:29
work, very good work. And yeah,
1:00:31
congratulations. So
1:01:03
Casey, that last segment made me
1:01:05
feel slightly more hopeful about the
1:01:07
trajectory of AI progress and how
1:01:09
capable we are of understanding what's
1:01:12
going on inside these large models.
1:01:15
But there's some other stuff that's been happening recently that
1:01:17
has made me feel a little more worried. My
1:01:20
P-Doom is sort of still hovering roughly
1:01:22
where it was. And I
1:01:24
think we should talk about some of this stuff that's been
1:01:26
happening in AI safety over the past few weeks, because I
1:01:28
think it's fair to say that it is an area that
1:01:31
has been really heating up. Yeah. And we always say
1:01:33
on this podcast, safety first, which is
1:01:35
why it's the third segment we're doing
1:01:37
today. So let me start with a
1:01:40
recent AI safety related encounter that you
1:01:42
had. Tell me what happened to your,
1:01:44
your demo of OpenAI's latest model. Okay,
1:01:46
so you remember how last week there
1:01:49
was a bit of a fracas between
1:01:51
OpenAI and Scarlett Johansson. Yes. So
1:01:53
in the middle of this, as I'm trying to sort
1:01:56
out, you know, who knew what and when, and I'm
1:01:58
writing a newsletter and we're recording the podcast. I
1:02:01
also get a heads up from OpenAI that
1:02:03
I now have access to their latest model and
1:02:05
its new voice features. Wow, nice flex. So
1:02:07
you got this demo. No one else had access
1:02:10
to this that I know of, only OpenAI employees.
1:02:12
And then what happened? Well, a couple things. One
1:02:14
is I didn't get to use it for that
1:02:16
long because, one, I was trying to finish our
1:02:18
podcast, I was trying to finish a newsletter, and
1:02:21
then I was on my way out of town
1:02:23
So I only spent like a solid 40 minutes
1:02:26
I would say with it before I wound up
1:02:28
losing access to it forever So
1:02:31
what happened? Well, first of all, what did you try it for? And
1:02:34
then we'll talk about what happened. Well, the first
1:02:36
thing I did was just like, hey, how's
1:02:38
it going, ChatGPT? And then immediately it's like,
1:02:40
well, you know, I'm doing pretty good, Casey. You
1:02:42
know. And so it really did actually nail that
1:02:44
low latency very speedy feeling of you are actually
1:02:46
talking to a thing. So you broke up with
1:02:48
your boyfriend and you're now in a long-term relationship
1:02:50
with the guy from the chat? Not
1:02:53
at all, not at all. So by this
1:02:55
point, the Sky voice that was the subject
1:02:57
of so much controversy had been removed from
1:02:59
the ChatGPT app. So I used
1:03:01
a more stereotypically male voice named
1:03:03
Ember. Ember? Wow. And the
1:03:05
first thing I did was I
1:03:08
actually used the vision feature because I wanted to
1:03:10
see if it could identify objects around me,
1:03:12
which is one of the things that they've been showing off So
1:03:14
I asked it to identify my podcast microphone, which
1:03:17
is a Shure MV7, and it said, oh, yeah,
1:03:19
of course, this is a Blue Yeti microphone. The
1:03:23
very first thing that I asked this thing to
1:03:25
do, it did mess up. Now, it got other
1:03:27
things, right? I pointed at my headphones which are
1:03:29
the Apple AirPods Max, and it said
1:03:31
those are AirPods Max. And I did
1:03:33
a couple more things like that in my house and I
1:03:35
thought, okay, this thing can actually, like, see objects and identify
1:03:38
them. And while my testing time was very limited, in that
1:03:40
limited time I did feel like it was starting to live
1:03:42
up to that demo. What do you mean, your testing time
1:03:44
was limited? Well, I was on my way out of town.
1:03:46
We had a podcast to finish, I had a newsletter to write,
1:03:48
and so I do all of that and then I drive
1:03:51
up to the woods. And then I try to connect back
1:03:53
to, you know, my AI assistant, which I've already become
1:03:55
addicted to, you know, during the 30 minutes that I used
1:03:57
it, and I can't connect. It's one of these classic horror
1:03:59
movie situations where the Wi-Fi in the hotel isn't
1:04:02
very good. And I get
1:04:04
back into town on Monday, and I
1:04:06
go to connect again. And I have
1:04:08
lost access, and so I check in. What did
1:04:10
you do? What did you ask this poor AI
1:04:12
assistant? I didn't even red team it. It wasn't
1:04:14
like I was saying, like, hey, any ideas for
1:04:16
making a novel bioweapon. Like, I wasn't
1:04:19
doing any of that. And yet still
1:04:21
I managed to lose access. And when I
1:04:23
checked in with OpenAI, they said that
1:04:25
they had decided to roll back access for
1:04:27
quote, safety reasons. So I don't think that
1:04:29
was because I was doing anything unsafe. But they
1:04:31
tell me they had some sort of safety
1:04:33
concern and so now who knows when I'll be
1:04:35
able to continue my conversation with
1:04:37
my AI assistant. Wow. So you had a
1:04:39
glimpse of the AI assistant future and that
1:04:42
was cruelly yanked from your clutches, which I
1:04:44
don't like. Yeah, keep talking to that thing.
1:04:46
Yeah. Yeah, I thought this was such an
1:04:48
interesting experience when you told
1:04:50
me about it, for a couple reasons.
1:04:52
One is, obviously, there is something happening
1:04:54
with this AI voice assistant, where OpenAI
1:04:56
felt like it was almost ready
1:04:58
for sort of mass consumption. And
1:05:01
now it's feeling like they need a little more
1:05:03
time to work on it. So something is happening
1:05:05
there. They're still not saying much about it, but
1:05:07
I do think that points to at least an
1:05:09
interesting story. But I also think it
1:05:11
speaks to this larger issue of AI safety
1:05:13
at OpenAI and then in the broader industry, because
1:05:15
I think this is an area where a lot of
1:05:17
things have been shifting very quickly. Yeah. So here's why
1:05:19
I think this is an interesting time to talk about
1:05:21
this, Kevin. After Sam Altman was
1:05:23
briefly fired as CEO of OpenAI,
1:05:26
I would say the folks that were aligned
1:05:28
with this AI safety movement really got discredited
1:05:30
right? Because they refused to really say anything
1:05:32
in detail about why they fired Altman and
1:05:35
they looked like they were a bunch of
1:05:38
nerds who were, like, afraid of a ghost in the machine.
1:05:40
And so they really lost a
1:05:42
lot of credibility. And yet over
1:05:44
the past few weeks this word safety
1:05:46
keeps creeping back into the conversation including
1:05:49
from some of the characters involved in that
1:05:51
drama. And I think that there is a
1:05:53
bit of a resurgence in at
1:05:55
least discussion of AI safety and I think
1:05:57
we should talk about what seems like efforts
1:06:00
to make this stuff safe and what just
1:06:03
feels like window dressing. Totally. So the big
1:06:05
AI safety news at OpenAI over the past
1:06:07
few weeks was something that we discussed on
1:06:09
the show last week which was the departure
1:06:12
of at least two
1:06:14
senior safety researchers Ilya
1:06:16
Sutskever and Jan Leike,
1:06:18
both leaving OpenAI with
1:06:21
concerns about how the company is
1:06:23
approaching the safety of its powerful
1:06:25
AI models. Then
1:06:27
this week we also heard from two
1:06:30
of the board members who voted to
1:06:32
fire Sam Altman last year Helen Toner
1:06:34
and Tasha McCauley, both of whom have
1:06:36
since left the board of OpenAI have
1:06:38
been starting to speak out about what happened
1:06:41
and why they were so concerned. They
1:06:43
came out with a big piece in
1:06:45
The Economist basically talking about what happened
1:06:48
at OpenAI and why they felt like
1:06:50
that company's governance structure had not worked
1:06:52
and then Helen Toner also went
1:06:54
on a podcast to talk about some
1:06:56
more specifics including some ways that she
1:06:58
felt like Sam Altman had misled
1:07:01
her and the board and basically gave them
1:07:03
no other choice but to fire him. And
1:07:05
that's where that story actually gets interesting. Totally.
1:07:08
The thing that got a lot of attention
1:07:10
was she said that OpenAI did not tell
1:07:12
the board that they were going to launch
1:07:14
ChatGPT, which, like, I'm not
1:07:16
an expert in corporate governance but I think if
1:07:18
you're going to launch something even if it's something
1:07:21
that you don't expect will become you know one
1:07:23
of the fastest growing products in history maybe you
1:07:25
just give your board a little heads up. Maybe
1:07:27
you shoot them an email saying, by the way,
1:07:29
we're gonna launch a chatbot. I have something
1:07:31
to say about this because if OpenAI were
1:07:34
a normal company if it had just raised
1:07:36
a bunch of venture capital and was not
1:07:38
a nonprofit I actually think the board would
1:07:40
have been delighted that while they weren't even
1:07:43
paying attention this little rascal CEO goes out
1:07:45
and releases this product that was built in
1:07:47
a very short amount of time that winds
1:07:49
up taking over the world, right? That's a
1:07:51
very exciting thing. The thing is OpenAI was
1:07:54
built different. It was built to very carefully
1:07:56
manage the rollout of these features that
1:07:58
push the frontier of what is possible.
1:08:01
And so that is what is
1:08:03
insane about this and also very
1:08:05
revealing because when Altman did that,
1:08:07
I think he revealed that in his mind,
1:08:09
he's not actually working for a nonprofit in
1:08:12
a traditional sense. In his mind, he truly
1:08:14
is working for a company whose only job
1:08:16
is to push the frontier forward. Yes, it
1:08:18
was a very sort of normal tech company
1:08:21
move at an organization that is
1:08:23
not supposed to be run like a normal tech
1:08:25
company. Now, I have a second thing to say
1:08:27
about this. Go ahead. Why the heck could Helen
1:08:29
Toner not have told us this in November? Here's
1:08:32
the thing. It's clear there was a
1:08:34
lot of legal fears around, Oh, will
1:08:36
there be retaliation? Will open AI sue
1:08:38
the board for talking? And yet in
1:08:41
this country, you have an absolute right to
1:08:43
say the truth. And if it is true
1:08:45
that the CEO of this company did not
1:08:47
tell the board that they were launching chat
1:08:49
GPT, I truly could not tell you why
1:08:52
they did not just say that at the
1:08:54
time. And if they had done that, I think
1:08:56
this conversation would have been very different. Now,
1:08:58
was the outcome a bit different? I don't
1:09:00
think it would have been. But then at
1:09:02
least we would not have to go through
1:09:04
this period where the entire AI safety movement
1:09:06
was discredited, because the people who were trying
1:09:08
to make it safer by getting rid of
1:09:10
Sam Altman had nothing to say about it.
1:09:12
Yes. She also said in this podcast, she
1:09:14
gave a few more examples of Sam Altman
1:09:16
sort of giving incomplete or inaccurate information. She
1:09:18
said that on multiple occasions, Sam
1:09:20
Altman had given the board inaccurate information about
1:09:22
the safety processes that the company had in
1:09:24
place. She also said he didn't tell the
1:09:26
board that he owned the open AI startup
1:09:28
fund, which seems like, you
1:09:30
know, a pretty major oversight. And she said after
1:09:33
sort of years of this kind of pattern,
1:09:35
she said that the four members of the
1:09:37
board who voted to fire Sam came to
1:09:39
the conclusion that we just couldn't believe
1:09:41
things that Sam was telling us. So
1:09:46
their side of the story, open AI
1:09:48
obviously does not agree. The current board
1:09:50
chair Bret Taylor said in a statement
1:09:52
provided to the podcast that Helen Toner
1:09:54
went on, quote, we are disappointed that
1:09:56
Miss Toner continues to revisit these issues,
1:09:58
which is board speak for
1:10:00
why is this woman still talking? And it is
1:10:02
insane that he said that. It
1:10:04
is absolutely insane that that is
1:10:06
what they said. Yes. OpenAI
1:10:10
has also been doing a lot
1:10:12
of other safety related work. They
1:10:15
announced recently that they are working
1:10:17
on training their next big language
1:10:19
model, the successor to GPT-4. Can
1:10:23
we just note how funny that timing is
1:10:25
that finally the board members are like, here's
1:10:27
what was going off the rails a few
1:10:30
months back. Here's the real back story to
1:10:32
what happened. And OpenAI says, one,
1:10:34
please stop talking about this. And two, let
1:10:36
us tell you about a little something called
1:10:38
GPT-5. Yes. Yes. They are not
1:10:41
slowing down one bit. But they
1:10:43
did also announce that they had
1:10:45
formed a new safety and security
1:10:47
committee that will be
1:10:49
responsible for making recommendations on critical
1:10:51
safety and security decisions for all
1:10:53
OpenAI projects. This
1:10:56
safety and security committee will
1:10:58
consist of a bunch of
1:11:00
OpenAI executives and employees, including
1:11:02
board members Brett Taylor, Adam
1:11:04
D'Angelo, Nicole Seligman and Sam
1:11:06
Altman himself. So what did
1:11:08
you make of that? You
1:11:10
know, I guess we'll see. Like they
1:11:13
had to do something. Their entire super
1:11:15
alignment team had just disbanded because they
1:11:17
don't think the company takes safety seriously.
1:11:19
And they did it at the exact
1:11:22
moment that the company said, once again,
1:11:24
we are about to push the frontier
1:11:26
forward in very unpredictable new ways.
1:11:30
So OpenAI could not just say, well, you
1:11:32
know, don't worry about it. And so,
1:11:34
you know, they did it in the
1:11:36
great tradition of corporations, Kevin, they formed
1:11:39
a committee, you know, and they've told us
1:11:41
a few things about what this committee will do. I think there's
1:11:43
going to be a report that gets like published eventually. And we'll,
1:11:45
you know, we'll just have to see. I imagine there will be
1:11:47
some good faith efforts here. But
1:11:49
should we regard it with skepticism, knowing
1:11:51
now what we know about what happened
1:11:54
to its previous safety team? Absolutely. So
1:11:56
yes, I think it is fair to say they
1:11:58
are feeling some pressure to at least make
1:12:01
some gestures toward AI safety, especially
1:12:03
with all these notable recent departures.
1:12:05
But if you are a person
1:12:07
who did not think that Sam
1:12:09
Altman was adequately invested
1:12:11
in making AI safe, you
1:12:14
are probably not going to be convinced
1:12:16
by a new committee for AI safety
1:12:18
on which Sam Altman is one of
1:12:20
the highest ranking members. Correct. So
1:12:23
that's what's happening at OpenAI. But
1:12:25
I wanted to take our discussion a little
1:12:27
bit broader than OpenAI because there's just been
1:12:29
a lot happening in the field of AI safety
1:12:31
that I want to run by you. So
1:12:34
one of them is that Google
1:12:36
DeepMind just released its own AI
1:12:38
safety plan. They're calling this the
1:12:40
Frontier Safety Framework. And
1:12:43
this is a document that basically lays
1:12:45
out the plans that Google DeepMind has
1:12:47
for keeping these more powerful AI systems
1:12:50
from becoming harmful. This is
1:12:52
something that other labs have done as
1:12:54
well. But this is sort of Google DeepMind's
1:12:56
biggest play in this space in recent months.
1:12:59
And there was also a big AI safety summit
1:13:01
in Seoul, South Korea earlier this
1:13:03
month where 16 of
1:13:06
the leading AI companies made a series
1:13:08
of voluntary pledges called the Frontier AI
1:13:10
Safety Commitments that basically say we will
1:13:12
develop these frontier models safely. We will
1:13:15
red team and test them. We
1:13:17
will even open them up to third party evaluations
1:13:19
so that other people can see if our models
1:13:22
are safe or not before we release them. In
1:13:25
the US, there is a
1:13:27
new group called the Artificial Intelligence
1:13:29
Safety Institute that just released
1:13:31
its strategic vision and announced that a bunch
1:13:33
of people, including some big
1:13:36
name AI safety researchers like Paul
1:13:38
Christiano, will be involved in that.
1:13:41
And there are some actual laws
1:13:43
starting to crop up. There's a
1:13:45
law in the California State Senate,
1:13:47
SB 1047, that is, if you're
1:13:49
keeping track at home, the Safe
1:13:51
and Secure Innovation for Frontier Artificial
1:13:53
Intelligence Models Act. This is an
1:13:55
act that would require very big
1:13:57
AI models to undergo strict safety
1:13:59
testing, implement whistleblower protections
1:14:01
at big AI labs and more.
1:14:04
So there is a
1:14:06
lot happening in the world of AI safety
1:14:08
and Casey I guess my first question to
1:14:10
you about all this would be do you
1:14:12
feel safer now than you did a year
1:14:14
ago about how AI is developing? Not
1:14:17
really. Well yes
1:14:20
and no. Yes in the sense
1:14:22
that I do think that the
1:14:24
AI safety folks successfully persuaded governments
1:14:27
around the world that they should
1:14:29
take this stuff seriously and governments
1:14:31
have started to roll out frameworks
1:14:33
in the United States. We had
1:14:35
the Biden administration's executive order and
1:14:37
so thought is going into this
1:14:39
stuff and I think that that
1:14:41
is going to have some positive
1:14:43
results. So I feel safer in
1:14:45
that sense. The
1:14:48
fact that folks like OpenAI
1:14:50
who once told us that they were gonna
1:14:52
move slowly and cautiously in this regard are
1:14:54
now racing at a hundred miles an hour
1:14:56
makes me feel less safe. The fact that
1:14:58
the super alignment team was disbanded makes me
1:15:01
feel a little bit less safe. And
1:15:03
then the big unknown Kevin is just well
1:15:05
what is this new frontier model going to
1:15:07
be? I mean we already talked about it
1:15:09
in these mythical terms because the increase in
1:15:11
quality and capability from GPT-2 to 3 to
1:15:14
4 has been so significant.
1:15:16
So I think we assume or
1:15:18
at least we wonder when 5
1:15:21
arrives whatever it might be does it
1:15:23
feel like another step
1:15:25
change in function? And if it does is it
1:15:28
gonna feel safe? Like these
1:15:30
are just questions that I can't answer. What do you think?
1:15:33
Yeah I mean I think I am
1:15:35
starting to feel a little bit more
1:15:37
optimistic about the state of AI safety.
1:15:39
I take your point that you know
1:15:41
it looks like, at OpenAI specifically, there
1:15:44
are a lot of people who feel like
1:15:46
that company is not taking safety as seriously
1:15:48
as it should. But I've
1:15:51
actually been pleasantly surprised by how
1:15:54
quickly and forcefully governments
1:15:57
and sort of NGOs and
1:16:00
multinational bodies like the UN have
1:16:02
moved to start thinking and talking
1:16:04
about AI. I mean, if
1:16:06
you can remember, there was a while
1:16:08
where it felt like the only people
1:16:11
who were actually taking AI safety seriously
1:16:13
were like effective altruists and a few
1:16:15
reporters and just a few science fiction
1:16:17
fans. But now it feels like
1:16:19
a sort of kitchen table issue that everyone is,
1:16:21
I think, rightly concerned about.
1:16:24
But I also just think like this is how you
1:16:26
would kind of expect the world to look if we
1:16:28
were in fact about to make some
1:16:31
big breakthrough in AI that sort of
1:16:33
led to a world
1:16:35
transforming type of artificial intelligence. You would
1:16:37
expect our institutions to be getting a
1:16:39
little jumpy and trying to pass laws
1:16:41
and bills and get ahead of the
1:16:43
next turn of the screw. You would
1:16:46
expect these AI labs to start staffing
1:16:48
up and making big gestures toward AI
1:16:50
safety. And so I take this as
1:16:52
a sign that things are continuing to
1:16:54
progress and that we should expect the
1:16:56
next class of models to be very
1:16:58
powerful. And maybe some of
1:17:00
this stuff which could look
1:17:02
a little silly or maybe like an overreaction
1:17:05
out of context will ultimately make a lot
1:17:07
more sense once we see what these labs
1:17:09
are cooking up. Well, I
1:17:11
look forward to that terrifying day. We'll
1:17:15
tell you about it if the world still exists then. Hey,
1:17:43
we are getting ready to do another round of
1:17:45
hard questions here on Hard Fork. If you're new
1:17:48
to the show, that is our advice segment where
1:17:50
we try to make sense of your hardest moral
1:17:52
quandaries around tech like ethical dilemmas about whether it's
1:17:54
okay to reach out to the stranger you think
1:17:56
is your father thanks to 23andMe or etiquette about
1:18:00
how to politely ask someone whether they're using
1:18:02
AI to respond to all of your texts,
1:18:05
which Kevin is famous for doing. Basically, anything
1:18:07
involving technology and a tricky interpersonal dynamic is
1:18:09
game. We are here to help. So if
1:18:11
you have a hard question, please
1:18:13
write in or, better yet, send us a
1:18:15
voice memo, as we are a podcast,
1:18:17
to hardfork at nytimes.com. Hard
1:18:22
Fork is produced by Rachel Cohn
1:18:24
and Whitney Jones. We're edited by
1:18:26
Jen Poyant. We're fact-checked by Caitlin
1:18:28
Love. Today's show was engineered by
1:18:30
Brad Fisher. Original music
1:18:32
by Marion Lozano, Sophia Lanman,
1:18:34
Diane Wong, Rowan Niemisto, and
1:18:36
Dan Powell. Our audience
1:18:38
editor is Nell Gallogly. Video production
1:18:41
by Ryan Manning and Dylan Bergeson.
1:18:43
Check us out on YouTube. We're
1:18:45
at youtube.com/hardfork. Special thanks to
1:18:47
Paula Szuchman, Pui-Wing Tam, Kate LoPresti,
1:18:49
and Jeffrey Miranda. You can email
1:18:51
us at hardfork at nytimes.com
1:18:54
with your interpretability study on how our brains work.