Episode Transcript
Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.
0:00
In May of this year, Rand Fishkin
0:02
received an email from a leaker.
0:06
And I had never emailed with him before. I
0:08
didn't know who he was. The
0:11
email was labeled confidential Google.
0:14
Rand is a public figure in internet marketing,
0:17
and this leaker was claiming to have access
0:19
to internal documentation about
0:21
something very important and,
0:23
up until this point, very
0:26
secretive. These are wild
0:30
accusations. I don't know if
0:32
you agree with this, Scott, but maybe no
0:34
one thing impacts the shape of the
0:36
modern internet more than Google
0:38
search. If you
0:40
remember the internet before Google,
0:43
you would strongly agree with
0:45
that sentiment. Especially if
0:47
you used some of the alternative search
0:49
engines that used to exist. I
0:51
remember asking Jeeves. I forgot
0:54
about Jeeves, actually. Right, a lot of people
0:57
forgot about Jeeves. Today, Google
0:59
is over 80% of search traffic,
1:01
with 8.5 billion searches a day,
1:03
2 trillion annually. These are nonsense
1:05
numbers at this point. A hundred thousand
1:07
a second. The average person listening,
1:10
you probably Google more times in a day than
1:12
you eat meals. What
1:14
Google serves dictates what
1:16
information people consume. How
1:18
Google ranks websites dictates what websites
1:21
live and die. Optimizing
1:23
for that system is now an entire industry,
1:27
and for a long time
1:29
all we've really known about that system is what
1:31
Google tells us. Kind of
1:33
for good reason, because you don't want it to get
1:35
any more gamified than it already is. But
1:38
really, everything about what Google tracks and
1:40
doesn't track, what data they use and
1:43
what they don't, that all
1:45
comes from statements from Google, their
1:47
PR people, their executives. You
1:50
know, what we know about Google search is what Google has
1:52
told us. There
1:55
have never been any documents about the API
1:57
leaked to confirm or, importantly,
2:00
contradict those public statements.
2:03
Now, Rand's sitting there looking at
2:05
something that potentially does, at
2:08
this email from this leaker claiming to
2:10
have a copy of a bunch of
2:12
internal documentation about the Google Search API.
2:16
When we hold these leaks up against those statements,
2:21
Rand started to find some interesting stuff. When
2:24
I finally got on the phone with him, he
2:26
pulls up the trove of documents
2:28
and my mind exploded.
2:32
The thing for me is were
2:35
the statements made in the
2:37
same kind of timeframe as
2:39
the documentation that we've received? The
2:42
other thing is why
2:44
did Rand not just
2:46
keep these documents private?
2:50
Somebody gave him the key to the treasure
2:52
chest. Google's intentionally kept
2:54
their search algo a black box, and
2:57
modifies it every time somebody gets close
2:59
to figuring out ways to cheat it,
3:02
creating the industry you speak of with
3:04
thousands and maybe millions of SEO people
3:07
and companies in the world these days.
3:09
They even offer their own certifications, which
3:11
don't tell you how to do it.
3:16
If you were given these
3:18
leaked documents, you would
3:21
be maybe one of the only people outside
3:23
of Google that knew how to cheat the
3:25
system. He chose to
3:27
expose them publicly rather than build
3:30
an industry out of it. Which
3:33
I guess kudos to you. Kudos to
3:35
Rand on that one. To the temporality, you
3:38
bring up a great point. So did Google.
3:40
I think that for the people combing
3:42
through these thousands and thousands
3:44
of documents, figuring out
3:47
that timeline is the
3:49
rat's nest to untangle right now. As
3:52
for Rand using his newfound
3:54
knowledge for personal gain, you
3:56
know, I didn't ask him about that.
3:59
Maybe I should. A
4:02
lot has happened in the weeks since he got that
4:04
email, and I wanted to talk to him about it. Google
4:07
is in the final stages of a big antitrust
4:09
case with the DOJ. It concerns Google search and
4:12
whether they're engaged in anti-competitive practices. It
4:14
is not the only case of this
4:16
kind they have been involved in. Because
4:19
Google, technically their parent company Alphabet, owns
4:21
websites that compete on the internet. So
4:24
how Google ranks websites matters not just
4:26
to all of us, but to them.
4:29
And that does potentially create
4:31
a conflict of interest. There
4:34
is a rising frustration
4:36
with particularly
4:39
Google's self-preferencing behavior. Essentially
4:42
Google doing things inside of Google
4:44
search where they own 95% of
4:46
the market to benefit other parts
4:49
of Google. I
4:51
wanted to know more about all this, about the
4:53
leak, Google's response, the
4:56
experience of getting wrapped up in this story. So I reached
4:59
out to Rand, fun chat, appreciate
5:01
his time. This
5:03
is my conversation with Rand Fishkin, co-founder
5:05
of SparkToro and Snackbar Studio,
5:08
about the Google API leaks here
5:11
on Hacked. Rand,
5:27
thank you so much for sitting down to talk with
5:29
me about this. Yeah, my pleasure Jordan, thanks for having
5:32
me. So I think for the average person, a lot
5:34
of this can seem in the weeds. For normal internet
5:36
users, to what extent is
5:38
the internet shaped by Google's search algorithm?
5:42
I can actually tell you this exactly because
5:44
I did some data
5:46
analysis of clickstream panel data, which
5:48
is essentially, it's kind of like
5:50
Nielsen TV set boxes from the
5:52
1960s, but fast forward
5:55
to 2024 and collect data
5:57
about every URL that's visited by tens
6:00
of millions of devices through
6:02
a panel and 70% of all
6:05
internet traffic is sent by Google. Wow.
6:11
Pretty brutal. Brutal,
6:13
why do you say that? Well, I
6:16
am someone who believes that
6:18
monopoly power tends to stifle
6:21
innovation, creativity, and opportunity. And
6:23
I think that Google's stranglehold
6:26
on internet traffic
6:28
and on what
6:31
content people see and
6:33
don't see really
6:35
limits what
6:38
is created. Right? So for content creators, I'm
6:40
sure you know this world well. For
6:43
content creators, if you're in the
6:45
video game space, what is able
6:48
to be surfaced on Steam or the Nintendo
6:50
Switch Store or the Xbox Store or the
6:53
PlayStation Store, that plays a huge role
6:56
in what game designers choose
6:58
to create. Similarly,
7:01
when you make things for the internet,
7:04
whether it's a YouTube video or an
7:06
article about poison dart
7:08
frogs in Central America or the
7:12
best mustache wax for curly hair,
7:16
you're going to change what you
7:18
do based on what Google tells
7:22
you is important and how you can
7:24
potentially get traffic to your page.
7:26
And so, you know, we
7:29
kind of end up with this Google-shaped internet. We're
7:31
talking because there was a trove of
7:33
leaked documents concerning the search algorithm that
7:36
shapes that Google-shaped internet. And
7:38
this leak came into your possession. Take
7:40
me through that story starting with I
7:42
think the email that you got. Yeah,
7:45
I pulled it up on my computer here
7:47
because I couldn't quite remember exactly
7:51
how it went. So I got an email
7:53
from a guy in
7:55
Georgia. Tbilisi, Georgia. Georgia
7:57
the country,
8:00
not Georgia, the state. And
8:02
I had never emailed with him before. I didn't
8:04
know who he was. The
8:07
email was labeled confidential Google.
8:12
And he says, Rand, I know you've been out
8:14
of the SEO industry for a while. SEO is
8:16
search engine optimization. That's sort of the practice of
8:18
ranking web pages in Google and getting traffic to
8:21
them. And I used to run a
8:23
company called Moz, which is in
8:25
the SEO software and education space. And
8:28
he says, you are the first person to highlight
8:30
the influence of click data on search results. From
8:33
what I've heard, Google went so far
8:35
as to manually demote your experiments and
8:37
publicly make statements that are far from
8:39
truthful, including reputation destruction. There
8:42
were several Google representatives who over
8:44
the years said particularly
8:47
harsh things about my work and
8:49
research when I was at Moz. And
8:53
anyway, this email goes
8:55
on to sort of cite all
8:57
these examples. And he claims, the
8:59
emailer claims, to have proof of
9:01
how Google ranks pages, proof that
9:04
Google has lied publicly dozens, if
9:06
not hundreds of times to
9:08
news sources of all kinds,
9:11
proof that potentially they lied
9:14
to Congress when
9:18
their CEO Sundar Pichai talked
9:20
about how they use
9:23
data potentially. There's
9:26
even some suggestion in here that there were
9:28
lies about the Department
9:30
of Justice case that was prosecuted last year. These
9:34
are wild accusations,
9:37
right? I mean, if you get an email like
9:39
this, you know, especially I'm
9:41
six years out from running Moz from, you
9:43
know, being outside the SEO industry, I kind
9:46
of look at this and go, well, I
9:48
have to say this person sounds credible,
9:51
but also incredibly far-fetched.
9:54
And so extraordinary claims require extraordinary
9:56
proof, right? So I, you
9:58
know, I write back to this person and sort of
10:01
say, okay, thanks for telling me all this stuff. Like,
10:03
what are your goals here? And
10:06
they say, I think you should be the
10:08
one to publish this leak data and I
10:10
wanna show it to you. So
10:14
we schedule a phone call. This is how
10:16
unexcited about it I was, Jordan. It
10:19
was, I received the email on May
10:21
5th. On
10:23
May 23rd, I canceled the scheduled call
10:30
with the guy, because I wasn't feeling so well. And
10:33
then we rescheduled for
10:35
later in May. When
10:38
I finally got on the phone with him, he
10:40
pulls up the trove of
10:42
documents and my mind
10:45
exploded. I
10:47
mean, this is, here's
10:49
essentially what this guy showed
10:51
me was the
10:56
API documentation. API is sort
10:58
of how you make programmatic
11:00
calls at scale. It's
11:04
the API documentation internal
11:06
to Google's search engineering team.
11:09
So this is, imagine you and I work on
11:11
Google search engine and imagine
11:13
we're programmers there and we are
11:16
trying to make Google web search
11:18
even better. These
11:20
are the list of all the types of
11:23
data that we can
11:25
call in order to build
11:28
or modify an algorithm, a
11:31
ranking system, right? To choose which pages
11:33
appear before which other ones. And
11:36
not just by the way, not just Google web
11:38
search, YouTube is in here, Google Android search is in
11:40
here, Google Maps is in here and local, Google
11:43
News is in here. All
11:45
the different flavors of things that
11:47
Google searches publicly for human beings. And
11:51
as he showed me these, we're
11:53
talking about 2,500 documents containing 14,000 different
12:00
attributes or features that you can call,
12:02
right? So, you know,
12:04
if somebody says, oh, Google search is simple, it's just X,
12:06
Y and Z. You can be like, yeah,
12:08
yeah, X, Y and Z. It's 14,000 X, Ys and Zs. Literally,
12:13
14,014 X, Ys and Zs are what they used as
12:17
of March of 2024. So
12:20
we get off the call, you know,
12:22
and I'm kind of losing my mind going through this. And
12:25
the first thing I do is hit up
12:27
a few people quietly in my network. A
12:30
few people who used to work in
12:32
Google as engineers, software engineers, three
12:36
people in particular. The first one that I reached
12:38
out to said, I don't
12:40
wanna talk about this. And I'm
12:42
not willing even anonymously to broach
12:44
the subject. Okay, but the other
12:46
two said, yes,
12:49
I, you know, happy to take a look.
12:51
They took a look and they came back to me and basically
12:53
said, yeah, this absolutely looks
12:55
legitimate. I didn't personally have access
12:57
to this document when I was at Google. You
13:01
wouldn't have access to this unless you were
13:03
on that specific search engineering team. But
13:06
this is absolutely
13:08
Google formatted, you
13:11
know, internal speak throughout some,
13:13
there's almost no way this could have been
13:15
faked. And
13:18
then I talked to an expert in search ranking
13:20
systems who I had known from my time in
13:22
the industry, a guy named Mike
13:24
King. Mike runs an agency in New York
13:26
called iPullRank. And he
13:29
and I have been friends for many years. Mike
13:31
is ludicrously talented,
13:34
just extraordinarily detailed in his
13:36
research. He's been working on a book about
13:38
information retrieval, which is the science of how
13:41
search engine works, search engines work.
13:43
And so he's got
13:45
this absolute plethora of
13:48
relevant experience
13:51
around this stuff. So I show him the leak. I
13:54
called him, I called him, it was a Friday
13:56
night. He's out with his kids in Brooklyn at
13:58
the park. He's like, okay, okay. Wait, what are
14:00
you telling me? All right, let me
14:02
just go home and look at this. And
14:06
I think he stayed up all night and most
14:08
of the weekend working on the leak. And then
14:10
on Monday night, he and I both published blog
14:14
posts describing this leak, sharing
14:17
what was inside them. Obviously
14:19
the early analysis was very
14:22
incomplete, but already there were dozens
14:24
of features that were extraordinarily
14:27
interesting, contradicted many statements Google
14:29
had made in the past. And
14:31
when we published that, Jordan, the
14:34
internet exploded. I mean, hundreds
14:37
of thousands of visits just to my blog post, I'm
14:39
sure to Mike's as well, you
14:41
know, interview requests from two
14:45
dozen publications, you know, everyone from the
14:47
Verge, the New Yorker to New York
14:49
Times and Washington Post and Wall Street
14:51
Journal and everyone else you can imagine.
14:55
And, you know, Kara Swisher talked about
14:57
it on her podcast and, you know,
15:00
it was the top of Hacker News, it's the top of
15:02
Techmeme. It
15:04
was an insane two weeks after that.
15:09
And since then, you know, people have
15:11
been analyzing this leak because it's public. Anyone
15:13
can see it. You can go right now and
15:16
look at the 14,000 inputs that
15:19
make up Google's search ranking algorithm. That
15:21
had never been possible in the
15:24
last quarter century. It's mind blowing. I
15:26
want to dig into what we learned
15:29
about the API that we didn't previously know.
15:31
And maybe just start, because I found this
15:34
one interesting. Google makes
15:36
Chrome. Google also
15:38
sells ads. Google representatives
15:40
have long stated that they don't
15:42
use any information about users in
15:44
Chrome for ranking, which is very
15:47
important for selling advertising. And
15:49
that always seemed kind of shocking to
15:51
not use this massive trough of privileged
15:54
data that you could potentially be gathering
15:56
in your biggest business line over here.
16:00
These leaked documents maybe tell
16:02
a story that that separation of church and
16:04
state isn't quite so separated. Can
16:06
you, to start with what's in here, tell
16:08
me a little bit about that. Yeah, yeah,
16:10
there's no separation at all. I
16:13
mean, when you look at how Google
16:15
measures, for example, one
16:17
of the things that would happen when any
16:19
human being performs a search is, let's say
16:21
you are looking for, this happened to me
16:24
recently, I was looking up the
16:26
Aquarium of the Pacific, which I think is
16:28
in Long Beach, California. I wanted
16:30
to find out how long they were
16:32
running their frog exhibit. So
16:35
they've got a new frog exhibit that just
16:37
launched. It showed up in my Google News
16:39
Feed. I was like, ooh, I wanna go
16:41
see Poison Dart Frogs when I'm down in
16:43
California over the summer. And so
16:45
I do this search
16:48
and I click on the first result, which
16:50
is about the event, but it doesn't say
16:53
how long it's running. So then I click
16:55
back to Google's results. I
16:57
scroll down a little bit until I find a
17:00
press mention, right? Someone talking about them in
17:03
the news and saying, oh, the exhibit is
17:05
planned to be permanent. I click that one
17:07
and it, okay,
17:10
great. So I don't have to worry about when it's
17:12
gonna happen. Here's what Google's
17:15
documentation says. If you
17:17
click on a search and then you
17:19
click the back button and
17:21
you choose another result, that
17:24
suggests to Google that
17:26
the other result is probably more relevant and
17:28
deserves to be higher up in the rankings
17:31
than the one you left. You
17:33
left a result and your search
17:35
was unsolved, and then you bounced
17:38
back to the search results and chose a
17:40
different result. And once you went to
17:42
that one, then your search was resolved. This
17:45
is called pogo sticking. It
17:47
has long been used in
17:49
information retrieval literature. And
17:52
here it is right in the documentation. You
17:54
can observe that Google is not measuring this
17:57
through Google Analytics, which many people speculated for
17:59
a long time. They're not just
18:01
measuring it by looking at what happens on their
18:03
search results page. They are
18:05
looking at the billions of devices
18:07
that use Chrome, Google Chrome, as
18:10
their browser to be able to
18:12
measure this. And
18:14
this is only one of hundreds
18:17
of uses of Chrome data
18:20
inside the ranking systems. As
18:23
another example, which I find particularly fascinating,
18:25
being someone who was in the industry,
18:28
many people know that one of the ways
18:30
that you rank higher in Google is get
18:32
lots of links pointing to you, right? If
18:34
lots of other pages on the internet link
18:36
to your page, that tends to suggest to
18:38
Google that you are more important than someone
18:40
who has very few links. So
18:44
inside the leak, you can see
18:47
that Google uses Chrome data, traffic
18:49
data to demote or increase the
18:51
value of links that come from
18:54
pages that either don't receive or
18:56
do receive traffic. For
18:59
example, if you are linked to by an article
19:02
in The Economist that got a lot of traffic,
19:05
that link is probably worth much more
19:08
than a link from, you know,
19:11
randomwebsite.net that gets
19:13
no traffic. That wasn't
19:15
always true, by the way. Google
19:18
used to be very manipulatable. Back
19:20
when I started in the industry, you could
19:23
get a bunch of links from a bunch
19:25
of different scammy sketchy websites and rank nearly
19:27
anything anywhere you wanted to. But
19:30
Google, Google managed to find a
19:32
really clever solution to this using
19:34
traffic data from Chrome. Google
19:37
has long stated and you gesture towards this,
19:39
but Google has long stated that they need
19:41
to balance what information they
19:44
make available publicly about how search
19:46
works, just frankly, because
19:48
the more that's public, the more is
19:50
gameable from everyone from full-blown scammers all
19:52
the way over to professionals in optimizing
19:54
for search engines. Talk to
19:56
me about that balance that we're seeing in
19:59
these leaks between transparency and kind of
20:01
making the internet even more
20:03
broken in certain ways than it already is. Yeah,
20:07
I'm gonna throw out there my suspicion that
20:09
10 years
20:12
ago, maybe 15 years ago, if
20:14
a document like this had
20:16
leaked, it would have been
20:19
quite damaging to Google's ability
20:21
to organize
20:23
the web and make its information useful. I
20:27
will grant that. And I think
20:29
that's because Google just wasn't that
20:31
sophisticated back then, right? The systems
20:33
for ranking were gameable.
20:38
They really were. I
20:40
look at this leak today and
20:42
everything I've observed in here suggests
20:44
to me that Google is nearly
20:47
bulletproof. Go, go
20:49
spam all you want. I
20:52
don't think you're gonna break through. This system is
20:55
not only sophisticated and elegant, but it
20:57
is crafted in such a way that
21:00
in order to game the system, you
21:03
would have to be really useful to
21:05
real human beings and a lot of them.
21:08
If you are, is it really spam
21:10
anymore? Right? Like
21:13
if you don't make things that achieve
21:15
real popularity, that real people link to
21:17
from their websites, that real news sources
21:19
talk about and pick up, that get
21:22
traffic, that once they start ranking, even
21:24
if you were to game all the
21:26
other signals, once they start ranking in
21:28
Google, if it doesn't successfully answer lots
21:30
of searchers queries, you're
21:32
gonna fall out of the rankings and someone else
21:34
will rise. So I really,
21:37
I don't see a downside
21:40
to Google sharing this. I think if
21:42
some conspiracy theory 10 years from now is
21:45
like, oh, actually it wasn't a leak, they
21:47
put it out there intentionally and I've got
21:49
the email to prove it, that
21:52
wouldn't totally shock me because very
21:54
frankly, this is useful information for...
22:00
not getting scammed by sketchy
22:02
SEO providers. But it
22:04
is not a roadmap that is
22:06
going to tell you, Oh, man,
22:09
if I just put, you know, the number
22:11
seven in my title tag 12 times, you
22:13
know, I'll rank at the top. There's nothing
22:15
like that. Sure. There's
22:17
no like name your business triple A plumbers because
22:19
the three A's come at the start of the
22:21
telephone book. Yeah, right. It's not it's not the
22:23
white pages in 1985. Sure. Exactly.
22:27
Has Google publicly commented on the
22:29
documents that were leaked to you?
22:32
So I did get a private
22:35
email from a Googler the
22:37
night I published it, that was quite upset with
22:40
one characterization of how I described an
22:43
event, which I did change in
22:45
the post. And then
22:47
I believe it was the next week, Google
22:53
made a public statement through sort of
22:55
a PR person that the
22:57
leak was authentic. But they
22:59
urged people not to misread,
23:04
you know, potentially incomplete data. And, you
23:06
know, in fairness, the
23:09
leak does reference some of the
23:11
references in the features do reference
23:13
other data sources that
23:15
we can't access. For example, there's
23:17
a list,
23:20
a whitelist, of
23:22
approved election news providers, right?
23:25
So that if you were to, you
23:28
know, it's January 6 2020.
23:30
And you and you, you know,
23:32
you're an American and you search for who won
23:34
the election, you know, was there a dispute? Is
23:36
there any evidence that the election was problematic? Google
23:40
wants to make sure that the accurate
23:42
truth is represented in their results, and
23:44
not someone who,
23:47
you know, is misrepresenting that and you
23:49
can imagine certainly that in the in the
23:51
political spectrum, it would not
23:53
be difficult to replicate all the signals that
23:55
you might need including popularity and news references
23:58
and links and clicks and all that. I
24:00
have the quote here from the Google
24:03
spokesperson. It was that we would quote
24:05
caution against making inaccurate assumptions about search
24:07
based on out of context, outdated or
24:10
incomplete information, which as you
24:12
said is importantly not saying these leaks aren't
24:14
real. What do
24:16
you read that statement to mean? Well,
24:19
I don't think it means anything because
24:21
it doesn't even say that these documents
24:23
are outdated or
24:25
that they're inaccurate. It just says
24:27
we caution against generally any information
24:29
that is out of context or
24:32
outdated. These documents are in
24:34
context. In fact, all of the features are
24:36
not all. Many of the features
24:38
are very well described, right? Such that
24:40
if you and I were new engineers
24:42
who joined the Google search team, we
24:45
could read these documents and be like, oh, okay,
24:47
I get what that means. That's when
24:50
I call the Chrome data that tells me
24:52
what percent of people click the back button
24:54
after searching for this. And this is the
24:56
one where they
24:58
have this thing called squashed and unsquashed
25:00
clicks. Squashed clicks,
25:03
they describe it as referring to
25:05
clicks that their spam system thinks
25:07
are not real human beings and
25:09
real devices. And so they
25:11
don't wanna count those clicks, that kind of
25:13
thing. That's what I'm talking
25:16
about. That's what I'm talking about. Scott,
25:19
why do you love Notion? I love that you just toss this
25:21
to me because I love it so much. Because
25:24
I know you love Notion. Because
25:26
I'm reading this data and this
25:28
advertising notes out of Notion. I
25:31
love it because it's just a great place
25:33
to put things. It's a
25:35
great place to structure data. It's a
25:37
great place to build small apps. It's
25:40
a great place to use
25:42
contextual AI to facilitate
25:45
my work and personal life. I store
25:47
everything. Now, I literally have
25:50
Notion documents that store all of my bikes
25:53
and my wife's bikes and every part on
25:55
them. So that when I have to order
25:57
maintenance pieces for them, I know exactly what
25:59
model. of, you know, rear shock it has.
26:02
I use it for
26:04
so many things. So I can't tell you why
26:06
I love it. I just love it. It's just
26:08
a feeling something you feel in your heart. When
26:11
you get a really good piece of software that
26:13
combines your notes and docs into one place that's
26:15
simple and beautifully designed with the power of AI
26:17
is all built right inside of it. Not another
26:19
separate tool in a different browser or tab you
26:21
know you don't have 75,000 tabs running live. You
26:23
just got notion we used it just the other
26:26
day. We use it every day. Yeah, I was
26:28
going to say. There's a huge part of our
26:30
workflow. Just the other day it's like I have two
26:32
instances of it in front of me right now. Notion
26:36
is a place where any team can write, plan, organize
26:38
and rediscover the joy of like it makes work feel
26:40
a little bit more playful and that's really,
26:42
really cool. It's
26:44
a workplace design not just for making progress, but like, you
26:47
know, getting inspired like you're in the same room together. It's
26:49
also like the big thing for me is that it's
26:51
like a it's
26:54
like a app building environment like you can
26:56
build data driven applications so quickly and easily.
26:59
Like I know lots of famous
27:01
content creators that use notion to
27:03
like manage their workflows and projects
27:05
when they're making new YouTube videos
27:07
or podcast episodes. It's
27:10
just a great place to put data
27:12
access data structure data move
27:15
processes. It's just it's just so good for so
27:17
many things. And you know what? Our
27:20
fine, fine listeners can try Notion
27:22
for free when they go to
27:24
notion.com/hacked. That's all lowercase letters,
27:27
notion n o t i o
27:29
n.com/hacked. You can start turning ideas
27:31
into action. And when you use
27:34
our link that hacked link you're supporting our show. So
27:36
when you invariably do go to sign
27:39
up for notion because it rips notion.com/hacked.
28:00
We did and we made it with Shopify and
28:02
it was a genuinely delightful experience. Why? Because
28:04
Shopify is the global commerce platform that helps
28:06
you sell at every stage of your business
28:08
whether you're at that like just launching a
28:10
shop online stage or first real life
28:12
store stage all the way to just like, oh my
28:15
God, we sold a million orders stage. Shopify,
28:17
they got your back. We are not at
28:19
that. Did we just sell a million order
28:21
stage? And that's sad. So if you like
28:24
to buy something, visit store.hackpodcast.com and check out
28:26
how great Shopify is. It
28:29
powers over 10% of all e-commerce
28:31
in the United States and Shopify
28:33
is the global force behind big
28:35
companies, not like us, but like
28:37
Allbirds, Rothy's, Brooklinen, and millions of
28:39
other entrepreneurs. It's easy to
28:41
use. It's very functional. It
28:44
integrates with everything. It's great. If
28:46
you want to do online commerce, check out
28:49
Shopify if you haven't already because it's massive
28:51
and you should have checked it out by
28:53
now because it's the biggest company. And whether
28:55
or not you're like a giant company like
28:57
Allbirds or just a wee little merch operation
29:00
like ours, Shopify's award winning help is there
29:02
to support your success every step
29:04
of the way because businesses that grow, grow
29:07
with Shopify right now, you can
29:09
sign up for one dollar per
29:11
month trial period at shopify.com/hacked. That's
29:13
all lowercase. Go on
29:15
over to shopify.com. I mean,
29:17
let's do this slash hacked now
29:20
to grow your business no matter what stage you're in. Scott,
29:23
one more time. That's that
29:23
URL: shopify.com slash
29:29
hacked. Ransomware
29:32
supply chain attacks and zero day exploits
29:34
can strike without warning, leaving your business's
29:36
sensitive data and digital assets vulnerable.
29:39
But imagine a world where your cybersecurity strategy
29:41
could prevent these threats entirely that
29:43
right there. That's the power of the
29:45
ThreatLocker Zero Trust Endpoint Protection Platform.
29:49
Plus cybersecurity is a non-negotiable
29:51
to safeguard organizations from cyber
29:54
attacks. ThreatLocker
29:56
implements a proactive, deny
29:58
by default approach to
30:00
cybersecurity, blocking every action,
30:02
process, and user, unless
30:05
specifically authorized by your team.
30:08
This least privilege methodology mitigates
30:10
the exploitation of trusted applications
30:12
and ensures 24-7, 365 protection
30:15
for your organization. The
30:19
core of ThreatLocker is its Protect
30:21
Suite, including application allow listing, ring
30:23
fencing, and network control. Digital
30:26
tools like the ThreatLocker Detect EDR,
30:28
Storage Control, Elevation Control, and
30:30
Configuration Manager enhance your cybersecurity
30:32
posture and streamline internal IT
30:34
and security operations. To learn
30:36
more about how ThreatLocker can
30:38
help mitigate unknown threats in
30:40
your digital environment and align
30:42
your organization with respected compliance
30:44
frameworks, visit threatlocker.com.
30:47
That's threatlocker.com. I'm
30:52
Dr. Meghan Sacks. And I'm Dr. Amy Shlosberg.
30:55
And we're the hosts of the podcast Campus
30:57
Killings. Our show covers some of the most
30:59
sinister crimes to take place on or around
31:02
school campuses, or the cases we discuss have
31:04
a school-connected theme. And with the new school
31:06
year comes an all-new second season of Campus
31:08
Killings, which will debut on September 16, 2023.
31:12
But if you want to listen to Campus
31:15
Killings now, you can binge all the episodes
31:17
from season one, available everywhere you listen to
31:19
podcasts. Is
31:24
there anything else we learned about Google search from these documents
31:26
that we haven't talked about? We talked
31:28
about Chrome. We talked about the PogoStick stuff. Is there
31:30
anything else we learned in these documents that the average
31:32
internet user might want to know about? Ooh,
31:35
average internet user is a good qualifier
31:37
there. I
31:39
think that
31:42
there are a tremendous number of things
31:44
that should be extremely interesting to anyone
31:46
who creates or
31:48
publishes content on the internet and wants that content
31:50
to do well. The number
31:52
of things that apply to you if that's
31:54
not who you are are limited.
31:58
But I will say one of those
32:00
things that I think folks
32:02
should probably keep in
32:05
mind and be aware of is that when
32:08
Google's public representatives make statements
32:10
about how Google works
32:13
and those get quoted potentially uncritically
32:15
in the press, this
32:18
document suggests that was probably a mistake,
32:20
right? That Google's public statements
32:23
about, especially about how search works and
32:25
what they care about and what's important
32:27
and what will affect your rankings
32:29
and won't, you probably
32:31
should take those with a grain of salt because
32:35
this documentation suggests that somewhere
32:38
between dozens and hundreds of times in the
32:40
last 20 years, Google has
32:42
been directly misleading or straight
32:45
up lying about those things. In
32:49
my blog post, I urged especially
32:52
industry commentators, podcasts
32:54
like yours, folks like Kara Swisher, coverage
32:56
at The Verge, Search
32:59
Engine Land, all of these publications, I urged
33:01
them to take a critical
33:03
view of statements that are made
33:05
publicly by Googlers because they
33:08
are in the best interest of Google potentially,
33:11
but they're not always accurate
33:14
and sometimes directly provably wrong. And
33:16
I think we should treat them
33:18
a little bit more like we treat statements
33:20
from politicians, right? I think the job of
33:22
a journalist is don't tell
33:24
me that Jordan said it's raining and
33:27
Rand said it's sunny outside. Go
33:29
outside and tell me what the weather is. Since
33:33
you brought it up, what would that
33:35
former group, people who regularly publish content
33:37
on the internet, what's the headline for
33:39
them? Oh, God. The
33:43
headline for them is you should probably
33:45
follow and pay close attention to people
33:47
who are studying and analyzing and extracting
33:49
value from this leak because there
33:51
are hundreds
33:54
of takeaways that are
33:56
both actionable and probably
33:58
different to things you've
34:00
done in the past or learned as best
34:03
practices. Gosh,
34:05
just today, Cyrus Shepard, who
34:07
I used to work with at Moz and
34:09
who's an expert in search, he was actually
34:11
one of Google's quality raters for
34:13
a couple of years, which is
34:16
quite interesting. But he
34:18
noted that there was a
34:20
finding from another party inside
34:22
the Google leak that
34:24
something called content effort
34:28
is scored inside the
34:30
Google leak. It's a factor that
34:32
essentially human quality raters, as they
34:34
visit websites, you know, there's
34:36
these thousands, tens of thousands of people who
34:38
work for Google through a contractor, they visit
34:40
websites and they're supposed to write about them,
34:42
right? And sort of score them and say
34:45
whether they're good or bad and
34:47
all sorts of features about them. And one of
34:49
the things they're asked to do is say, did
34:51
it look like a human being spent a lot
34:53
of effort manually to
34:56
create something uniquely valuable, right?
34:58
Differentiated and valuable to people
35:01
with this resource. And
35:04
that score now appears
35:06
to be using a large language model
35:08
AI. So it basically takes
35:10
the input from all these quality raters,
35:12
builds a metric, you know,
35:15
a sort of algorithm, and now it's
35:17
scored through an AI system. And
35:19
I found that totally fascinating, right?
35:22
Essentially you've scaled up what
35:24
quality raters used to do manually and done
35:26
it with an AI. And that's being used
35:29
according to the documents in the ranking system.
35:31
Wow. I wanna talk about the AI thing
35:33
because I think there's something really important there, but the
35:36
anonymous source, just to go back to that. Since
35:39
then, someone has come forward saying
35:41
that they are that anonymous source,
35:43
Erfan Azimi, I believe, is their name.
35:45
Yes. Yeah, so
35:47
Erfan decided, I think, it
35:52
was only about two or three days after the
35:54
leak was published. I think
35:56
he sort of saw that the reception
35:58
was generally favorable and not
36:01
attacking, not critical of
36:03
the source or of
36:05
the credibility of the
36:07
data and he came forward
36:09
as the anonymous leaker. He's
36:13
since done a few interviews
36:15
and talked publicly about
36:17
it. He's
36:19
a real interesting guy. We don't agree on 100% of
36:21
things but Jordan, I actually find him to be
36:23
quite a lovely human being. He has
36:26
a sweet and empathetic and sensitive side and I
36:29
think a really strong sense of justice too. What's
36:32
your sense of why
36:34
he chose to leak these documents? Because as you said,
36:36
it could have gone either way. The public reception could
36:38
have been one thing. Google's reception could have been another
36:40
thing. Why do you think he chose to leak these?
36:42
To be honest, I think his
36:45
stated reasons are accurate. I've seen nothing
36:47
in his behavior before or since to
36:50
suggest that he had anything other
36:53
than a deep frustration and
36:55
anger that Google had misled
36:58
people in the field of content creation
37:01
and the field of marketing and
37:04
the technology world and press overall
37:06
but potentially even some of the
37:09
legal cases against Google. I
37:12
think he wanted the record to be
37:14
set straight and I think he felt an
37:16
obligation to make this
37:19
data available to people.
37:24
That's something I share. For 17 years of my life, my
37:28
whole mission was how do
37:30
I make search more transparent? That
37:33
was Moz's whole goal. I think that's what
37:35
built that company was making
37:37
this dark underbelly. When
37:40
I started in SEO, Jordan, it
37:43
was seen as a scam. People
37:45
thought that everyone in SEO was just
37:47
sketchy and terrible and that
37:50
they were manipulative. It
37:53
took decades to
37:55
make it a mainstream marketing practice that
37:57
every company now invests in. Almost every
38:00
company in the world has someone who
38:02
thinks about or works on SEO. It's
38:05
no longer seen as sketchy or spammy.
38:08
It employs millions of people worldwide, obviously
38:11
because Google sends 70% of the
38:13
internet's traffic, right? Outgoing
38:16
click traffic. And so this
38:18
is kind of a, it's a little bit
38:20
of a dream come true to be able to share
38:23
this, even though I'm out of the field. And
38:25
I think for Erfan, being
38:27
still in the field, he
38:29
really wanted people to know
38:32
the truth.
38:35
You were talking earlier about the usefulness of Google,
38:38
and I'm fascinated by this, sort of the
38:40
rise and fall of how we find information. I
38:42
think there's a feeling amongst some,
38:45
a lot of people right now that
38:47
Google is becoming increasingly less useful for
38:49
finding authentic human-created information, or it
38:51
can be in certain use cases. If
38:54
you want something written by a person
38:56
without any commercial sort of motivation behind
38:58
it, basically Google is just a tool
39:01
for appending Reddit to your search post.
39:04
If you want to find some good, what
39:06
did these leaks tell us about the usefulness
39:09
of Google search for people looking to do
39:11
more than just, you know, window shop on
39:13
the mall that the internet is becoming? Oh,
39:16
you know, to be honest, I'm not sure
39:18
that the leak reveals anything
39:21
in that particular direction. Instead,
39:23
I would say that what
39:26
you'd want to look for in these cases is
39:29
when people, statistically speaking,
39:31
you know, let's take a panel of tens
39:33
of millions of people, like the Datos
39:35
panel. And let's look
39:37
at, you know, how many searches did they
39:39
do over the last
39:41
two years each month? And
39:43
was that number growing or shrinking? And
39:46
when they do searches, where do they go after they
39:48
search? Do they stay on Google? Do
39:51
they click on a paid
39:53
ad? Do they click on what's called
39:55
the organic results, right? The SEO results,
39:58
which are unpaid. Do they?