Google Search Leaks

Released Tuesday, 2nd July 2024

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements may have changed.

0:00

In May of this year Rand Fishkin

0:02

received an email from a leaker

0:06

And I had never emailed with him before I

0:08

didn't know who he was The

0:11

email was labeled confidential Google

0:14

Rand is a public figure in internet marketing

0:17

and this leaker was claiming to have access

0:19

to Internal documentation about

0:21

something very important and

0:23

up until this point very

0:26

secretive these are wild

0:30

accusations I don't know if

0:32

you agree with this Scott, but maybe no

0:34

one thing impacts the shape of the

0:36

modern internet more than Google

0:38

search If you

0:40

remember the internet before Google

0:43

you would strongly agree with

0:45

that sentiment Especially if

0:47

you use some of the alternative search

0:49

engines that used to exist. I

0:51

remember asking Jeeves. I forgot

0:54

about Jeeves actually right a lot of people

0:57

forgot about Jeeves Today Google

0:59

is over 80% of search traffic

1:01

with 8.5 billion searches a day

1:03

2 trillion annually There's nonsense

1:05

numbers at this point a hundred thousand

1:07

a second the average person listening

1:10

You probably Google more times in the day than

1:12

you eat meals What

1:14

Google serves dictates what

1:16

information people consume how

1:18

Google ranks websites dictates what websites

1:21

live and die Optimizing

1:23

for that system is now an entire industry

1:27

and For a long time

1:29

all we've really known about that system is what

1:31

Google tells us Kind of

1:33

for good reason because you don't want it to get

1:35

any more gamified than it already is But

1:38

really everything about what Google tracks and

1:40

doesn't track what data they use and

1:43

what they don't that all

1:45

comes from statements from Google their

1:47

PR people their executives You

1:50

know what we know about Google search is what Google has

1:52

told us There

1:55

have never been any documents about the API

1:57

leaked to confirm or importantly

2:00

contradict those public statements.

2:03

Now, Rand's sitting there looking at

2:05

something that potentially does at

2:08

this email from this leaker claiming to

2:10

have a copy of a bunch of

2:12

internal documentation about the Google Search API.

2:16

When we hold these leaks up against those statements,

2:21

Rand started to find some interesting stuff. When

2:24

I finally got on the phone with him, he

2:26

pulls up the trove of documents

2:28

and my mind exploded.

2:32

The thing for me is were

2:35

the statements made in the

2:37

same kind of timeframe as

2:39

the documentation that we've received? The

2:42

other thing is why

2:44

did Rand not just

2:46

keep these documents private?

2:50

Somebody gave him the key to the treasure

2:52

chest. Google's intentionally kept

2:54

their search algo, a black box, and

2:57

modifies it every time somebody gets close

2:59

to figuring out ways to cheat it,

3:02

creating the industry you speak of with

3:04

thousands and maybe millions of SEO people

3:07

and companies in the world these days.

3:09

They even offer their own certifications, which

3:11

don't tell you how to do it.

3:16

If you were given these

3:18

leaked documents, you would

3:21

be maybe one of the only people outside

3:23

of Google that knew how to cheat the

3:25

system. He chose to

3:27

expose them publicly rather than build

3:30

an industry out of it. Which

3:33

I guess kudos to you. It kudos to

3:35

Rand on that one. To the temporality, you

3:38

bring up a great point. So did Google.

3:40

I think that for the people combing

3:42

through these thousands and thousands

3:44

of documents, figuring out

3:47

that timeline is the

3:49

rat's nest to untangle right now. As

3:52

for Rand, using his new found

3:54

knowledge for personal gain, you

3:56

know I didn't ask him about that.

3:59

Maybe I should. A

4:02

lot has happened in the weeks since he got that

4:04

email, and I wanted to talk to him about it. Google

4:07

is in the final stages of a big antitrust

4:09

case with the DOJ, it concerns Google search, and

4:12

whether they're engaged in anti-competitive practices, it

4:14

is not the only case of this

4:16

kind they have been involved in. Because

4:19

Google, technically their parent company Alphabet, owns

4:21

websites that compete on the internet. So

4:24

how Google ranks websites matters not just

4:26

to all of us, but to them.

4:29

And that does potentially create

4:31

a conflict of interest. There

4:34

is a rising frustration

4:36

with particularly

4:39

Google's self-preferencing behavior. Essentially

4:42

Google doing things inside of Google

4:44

search where they own 95% of

4:46

the market to benefit other parts

4:49

of Google. I

4:51

wanted to know more about all this, about the

4:53

leak, Google's response, the

4:56

experience of getting wrapped up in this story. So I reached

4:59

out to Rand, fun chat, appreciate

5:01

his time. This

5:03

is my conversation with Rand Fishkin, co-founder,

5:05

SparkToro, and Snackbar Studio

5:08

about the Google API leaks here

5:11

on Hacked. Rand,

5:27

thank you so much for sitting down to talk with

5:29

me about this. Yeah, my pleasure Jordan, thanks for having

5:32

me. So I think for the average person a lot

5:34

of this can seem in the weeds. For normal internet

5:36

users, to what extent is

5:38

the internet shaped by Google's search algorithm?

5:42

I can actually tell you this exactly because

5:44

I did some data

5:46

analysis of clickstream panel data, which

5:48

is essentially, it's kind of like

5:50

Nielsen TV set boxes from the

5:52

1960s, but fast forward

5:55

to 2024 and collect data

5:57

about every URL that's visited by tens

6:00

of millions of devices through

6:02

a panel and 70% of all

6:05

internet traffic is sent by Google. Wow.

6:11

Pretty brutal. Brutal,

6:13

why do you say that? Well, I

6:16

am someone who believes that

6:18

monopoly power tends to stifle

6:21

innovation, creativity, and opportunity. And

6:23

I think that Google's stranglehold

6:26

on internet traffic

6:28

and on what

6:31

content people see and

6:33

don't see really

6:35

limits what

6:38

is created. Right? So for content creators, I'm

6:40

sure you know this world well. For

6:43

content creators, if you're in the

6:45

video game space, what is able

6:48

to be surfaced on Steam or the Nintendo

6:50

Switch Store or the Xbox Store or the

6:53

PlayStation Store, that plays a huge role

6:56

in what game designers choose

6:58

to create. Similarly,

7:01

when you make things for the internet,

7:04

whether it's a YouTube video or an

7:06

article about poison dart

7:08

frogs in Central America or the

7:12

best mustache wax for curly hair,

7:16

you're going to change what you

7:18

do based on what Google tells

7:22

you is important and how you can

7:24

potentially get traffic to your page.

7:26

And so, you know, we

7:29

kind of end up with this Google shaped internet. We're

7:31

talking because there was a trove of

7:33

leaked documents concerning the search algorithm that

7:36

shapes that Google shaped internet. And these

7:38

this leak came into your possession. Take

7:40

me through that story starting with I

7:42

think the email that you got. Yeah,

7:45

I pulled it up on my computer here

7:47

because I couldn't quite remember exactly

7:51

how it went. So I got an email

7:53

from a guy in

7:55

Georgia, Tbilisi, Georgia, Georgia,

7:57

the country,

8:00

not Georgia, the state. And

8:02

I had never emailed with him before. I didn't

8:04

know who he was. The

8:07

email was labeled confidential Google.

8:12

And he says, Rand, I know you've been out

8:14

of the SEO industry for a while. SEO is

8:16

search engine optimization. That's sort of the practice of

8:18

ranking web pages in Google and getting traffic to

8:21

them. And I used to run a

8:23

company called Moz, which is in

8:25

the SEO software and education space. And

8:28

he says, you are the first person to highlight

8:30

the influence of click data on search results. From

8:33

what I've heard, Google went so far

8:35

as to manually demote your experiments and

8:37

publicly make statements that are far from

8:39

truthful, including reputation destruction. There

8:42

were several Google representatives who over

8:44

the years said particularly

8:47

harsh things about my work and

8:49

research when I was at Moz. And

8:53

anyway, this email goes

8:55

on to sort of cite all

8:57

these examples. And he claims the

8:59

emailer claims to have proof of

9:01

how Google ranks pages, proof that

9:04

Google has lied publicly dozens, if

9:06

not hundreds of times to

9:08

news sources of all kinds,

9:11

proof that potentially they lied

9:14

to Congress when they when

9:18

their CEO Sundar Pichai talked

9:20

about how they use

9:23

data potentially. There's

9:26

even some suggestion in here that there were

9:28

lies about the Department

9:30

of Justice case that was prosecuted last year. These

9:34

are wild accusations,

9:37

right? I mean, if you get an email like

9:39

this, you know, especially I'm

9:41

six years out from running Moz from, you

9:43

know, being outside the SEO industry, I kind

9:46

of look at this and go, well, I

9:48

have to say this person sounds credible,

9:51

but also incredibly far fetched.

9:54

And so extraordinary claims require extraordinary

9:56

proof, right? So I, you

9:58

know, I write back to this person and sort of

10:01

say, okay, thanks for telling me all this stuff. Like,

10:03

what are your goals here? And

10:06

they say, I think you should be the

10:08

one to publish this leak data and I

10:10

wanna show it to you. So

10:14

we schedule a phone call. This is how

10:16

unexcited about it I was, Jordan. It

10:19

was, I received the email on May

10:21

5th. On

10:23

May 23rd, I canceled the scheduled call

10:30

with the guy, because I wasn't feeling so well. And

10:33

then we rescheduled for

10:35

later in May. When

10:38

I finally got on the phone with him, he

10:40

pulls up the trove of

10:42

documents and my mind

10:45

exploded. I

10:47

mean, this is, here's

10:49

essentially what this guy showed

10:51

me was the

10:56

API documentation. API is sort

10:58

of how you make programmatic,

11:00

calls at scale. It's

11:04

the API documentation internal

11:06

to Google's search engineering team.

11:09

So this is, imagine you and I work on

11:11

Google search engine and imagine

11:13

we're programmers there and we are

11:16

trying to make Google web search

11:18

even better. These

11:20

are the list of all the types of

11:23

data that we can

11:25

call in order to build

11:28

or modify an algorithm, a

11:31

ranking system, right? To choose which pages

11:33

appear which, before which other ones. And

11:36

not just by the way, not just Google web

11:38

search, YouTube is in here, Google Android search is in

11:40

here, Google Maps is in here and local, Google

11:43

News is in here. All

11:45

the different flavors of things that

11:47

Google searches publicly for human beings. And

11:51

as he showed me these, we're

11:53

talking about 2,500 documents containing 14,000 different

12:00

attributes or features that you can call,

12:02

right? So, you know,

12:04

if somebody says, oh, Google search is simple, it's just X,

12:06

Y and Z. You can be like, yeah,

12:08

yeah, X, Y and Z. It's 14,000 X, Ys and Zs. Literally,

12:13

14,014 X, Ys and Zs are what they used as

12:17

of March of 2024. So

12:20

we get off the call, you know,

12:22

and I'm kind of losing my mind going through this. And

12:25

the first thing I do is hit up

12:27

a few people quietly in my network. A

12:30

few people who used to work in

12:32

Google as engineers, software engineers, three

12:36

people in particular. The first one that I reached

12:38

out to said, I don't

12:40

wanna talk about this. And I'm

12:42

not willing even anonymously to broach

12:44

the subject. Okay, but the other

12:46

two said, yes,

12:49

I, you know, happy to take a look.

12:51

They took a look and they came back to me and basically

12:53

said, yeah, this absolutely looks

12:55

legitimate. I didn't personally have access

12:57

to this document when I was at Google. You

13:01

wouldn't have access to this unless you were

13:03

on that specific search engineering team. But

13:06

this is absolutely

13:08

Google formatted, you

13:11

know, internal speak throughout some,

13:13

there's almost no way this could have been

13:15

faked. And

13:18

then I talked to an expert in search ranking

13:20

systems who I had known from my time in

13:22

the industry, a guy named Mike

13:24

King. Mike runs an agency in New York

13:26

called iPullRank. And he

13:29

and I have been friends for many years. Mike

13:31

is ludicrously talented,

13:34

just extraordinarily detailed in his

13:36

research. He's been working on a book about

13:38

information retrieval, which is the science of how

13:41

search engines work.

13:43

And so he's got

13:45

this absolute plethora of

13:48

relevant experience

13:51

around this stuff. So I show him the leak. I

13:54

called him, I called him, it was a Friday

13:56

night. He's out with his kids in Brooklyn at

13:58

the park. He's like, okay, okay. Wait, what are

14:00

you telling me? All right, let me

14:02

just go home and look at this. And

14:06

I think he stayed up all night and most

14:08

of the weekend working on the leak. And then

14:10

on Monday night, he and I both published blog

14:14

posts describing this leak, sharing

14:17

what was inside them. Obviously

14:19

the early analysis was very

14:22

incomplete, but already there were dozens

14:24

of features that were extraordinarily

14:27

interesting, contradicted many statements Google

14:29

had made in the past. And

14:31

when we published that, Jordan, the

14:34

internet exploded. I mean, hundreds

14:37

of thousands of visits just to my blog post, I'm

14:39

sure to Mike's as well, you

14:41

know, interview requests from two

14:45

dozen publications, you know, everyone from the

14:47

Verge, the New Yorker to New York

14:49

Times and Washington Post and Wall Street

14:51

Journal and everyone else you can imagine.

14:55

And, you know, Kara Swisher talked about

14:57

it on her podcast and, you know,

15:00

it was the top of Hacker News, it's the top of

15:02

Techmeme. It

15:04

was an insane two weeks after that.

15:09

And since then, you know, people have

15:11

been analyzing this leak because it's public. Anyone

15:13

can see it. You can go right now and

15:16

look at the 14,000 inputs that

15:19

make up Google's search ranking algorithm. That

15:21

had never been possible in the

15:24

last quarter century. It's mind blowing. I

15:26

want to dig into what we learned

15:29

about the API that we didn't previously know.

15:31

And maybe just start, because I found this

15:34

one interesting. Google makes

15:36

Chrome. Google also

15:38

sells ads. Google representatives

15:40

have long stated that they don't

15:42

use any information about users in

15:44

Chrome for ranking, which is very

15:47

important for selling advertising. And

15:49

that always seemed kind of shocking to

15:51

not use this massive trough of privileged

15:54

data that you could potentially be gathering

15:56

in your biggest business line over here.

16:00

These leaked documents maybe tell

16:02

a story that that separation of church and

16:04

state isn't quite so separated. Can

16:06

you, to start with what's in here, tell

16:08

me a little bit about that. Yeah, yeah,

16:10

there's no separation at all. I

16:13

mean, when you look at how Google

16:15

measures, for example, one

16:17

of the things that would happen when any

16:19

human being performs a search is, let's say

16:21

you are looking for, this happened to me

16:24

recently, I was looking up the

16:26

Aquarium of the Pacific, which I think is

16:28

in Long Beach, California. I wanted

16:30

to find out how long they were

16:32

running their frog exhibit. So

16:35

they've got a new frog exhibit that just

16:37

launched. It showed up in my Google News

16:39

Feed. I was like, ooh, I wanna go

16:41

see Poison Dart Frogs when I'm down in

16:43

California over the summer. And so

16:45

I do this search

16:48

and I click on the first result, which

16:50

is about the event, but it doesn't say

16:53

how long it's running. So then I click

16:55

back to Google's results. I

16:57

scroll down a little bit until I find a

17:00

press mention, right? Someone talking about them in

17:03

the news and saying, oh, the exhibit is

17:05

planned to be permanent. I click that one

17:07

and it, okay,

17:10

great. So I don't have to worry about when it's

17:12

gonna happen. Here's what Google's

17:15

documentation says. If you

17:17

click on a search and then you

17:19

click the back button and

17:21

you choose another result, that

17:24

suggests to Google that

17:26

the other result is probably more relevant and

17:28

deserves to be higher up in the rankings

17:31

than the one you left. You

17:33

left a result and your search

17:35

was unsolved, and then you bounced

17:38

back to the search results and chose a

17:40

different result. And once you went to

17:42

that one, then your search was resolved. This

17:45

is called pogo sticking. It

17:47

has long been used in

17:49

information retrieval literature. And

17:52

here it is right in the documentation. You
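A minimal sketch of the pogo-sticking signal described above, assuming a hypothetical click log. The `Click` record, its field names, and the counting rule are illustrative assumptions and are not taken from the leaked documentation; the point is only to show how "clicked, came back, chose another result" could be turned into per-result counts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Click:
    """One result click in a search session (hypothetical log format)."""
    query: str
    url: str
    returned_to_results: bool  # True if the user went back to the results page


def pogo_stick_counts(session: list[Click]) -> tuple[Counter, Counter]:
    """Return (bounced, resolved) counters keyed by (query, url).

    A click counts as "bounced" when the user returned to the results page
    and then clicked another result for the same query. A click counts as
    "resolved" when the user did not return to the results page. A click
    followed by a return and no further click is counted as neither.
    """
    bounced: Counter = Counter()
    resolved: Counter = Counter()
    for i, click in enumerate(session):
        key = (click.query, click.url)
        next_click = session[i + 1] if i + 1 < len(session) else None
        follow_up = next_click is not None and next_click.query == click.query
        if click.returned_to_results and follow_up:
            bounced[key] += 1
        elif not click.returned_to_results:
            resolved[key] += 1
    return bounced, resolved


# The aquarium example from the conversation, with made-up URLs.
QUERY = "aquarium frog exhibit dates"
session = [
    Click(QUERY, "aquarium.example/frogs", returned_to_results=True),
    Click(QUERY, "news.example/frog-exhibit", returned_to_results=False),
]
bounced, resolved = pogo_stick_counts(session)
print("bounced:", dict(bounced))
print("resolved:", dict(resolved))
```

Aggregated over many sessions, a result with a high bounced count relative to its resolved count would be a candidate for demotion under the logic described in the conversation. How such counts are actually weighted is not stated in the source.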

17:54

can observe that Google is not measuring this

17:57

through Google Analytics, which many people speculated for

17:59

a long time. They're not just

18:01

measuring it by looking at what happens on their

18:03

search results page. They are

18:05

looking at the billions of devices

18:07

that use Chrome, Google Chrome, as

18:10

their browser to be able to

18:12

measure this. And

18:14

this is only one of hundreds

18:17

of uses of Chrome data

18:20

inside the ranking systems. As

18:23

another example, which I find particularly fascinating,

18:25

being someone who was in the industry,

18:28

many people know that one of the ways

18:30

that you rank higher in Google is get

18:32

lots of links pointing to you, right? If

18:34

lots of other pages on the internet link

18:36

to your page, that tends to suggest to

18:38

Google that you are more important than someone

18:40

who has very few links. So

18:44

inside the leak, you can see

18:47

that Google uses Chrome data, traffic

18:49

data to demote or increase the

18:51

value of links that come from

18:54

pages that either don't receive or

18:56

do receive traffic. For

18:59

example, if you are linked to by an article

19:02

in The Economist that got a lot of traffic,

19:05

that link is probably worth much more

19:08

than a link from, you know,

19:11

randomwebsite.net that gets

19:13

no traffic. That wasn't

19:15

always true, by the way. Google

19:18

used to be very manipulatable. Back

19:20

when I started in the industry, you could

19:23

get a bunch of links from a bunch

19:25

of different scammy sketchy websites and rank nearly

19:27

anything anywhere you wanted to. But

19:30

Google, Google managed to find a

19:32

really clever solution to this using

19:34

traffic data from Chrome. Google
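A minimal sketch of the idea that a link's value depends on how much traffic the linking page receives. The logarithmic weight, the function names, and the visit numbers are all assumptions chosen for illustration; the conversation does not describe any formula, and none is claimed here.

```python
import math


def link_weight(source_monthly_visits: int) -> float:
    """Hypothetical weight for one inbound link.

    Grows slowly with the linking page's traffic and is exactly zero
    when the linking page receives no traffic.
    """
    return math.log10(1 + source_monthly_visits)


def weighted_link_score(inbound_links: dict[str, int]) -> float:
    """Sum the hypothetical weights over all inbound links.

    `inbound_links` maps a linking page's URL to its monthly visits.
    """
    return sum(link_weight(visits) for visits in inbound_links.values())


# One link from a heavily visited article versus many links from pages
# nobody visits (made-up URLs and numbers).
popular = {"economist.example/article": 500_000}
unvisited = {f"randomwebsite.example/page{i}": 0 for i in range(50)}

print("popular source:", round(weighted_link_score(popular), 2))
print("unvisited sources:", weighted_link_score(unvisited))
```

Under this toy weighting, fifty links from pages with no traffic contribute nothing, while a single link from a well-visited page contributes a positive score. That mirrors the contrast drawn in the conversation between a link from The Economist and a link from a site with no traffic.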

19:37

has long stated and you gesture towards this,

19:39

but Google has long stated that they need

19:41

to balance what information they

19:44

make available publicly about how search

19:46

works, just frankly, because

19:48

the more that's public, the more is

19:50

gameable from everyone from full-blown scammers all

19:52

the way over to professionals in optimizing

19:54

for search engines. Talk to

19:56

me about that balance that we're seeing in

19:59

these leaks between transparency and kind of

20:01

making the internet even more

20:03

broken in certain ways than it already is. Yeah,

20:07

I'm gonna throw out there my suspicion that

20:09

10 years

20:12

ago, maybe 15 years ago, if

20:14

a document like this had

20:16

leaked, it would have been

20:19

quite damaging to Google's ability

20:21

to organize

20:23

the web and make its information useful. I

20:27

will grant that. And I think

20:29

that's because Google just wasn't that

20:31

sophisticated back then, right? The systems

20:33

for ranking were gameable.

20:38

They really were. I

20:40

look at this leak today and

20:42

everything I've observed in here suggests

20:44

to me that Google is nearly

20:47

bulletproof. Go, go

20:49

spam all you want. I

20:52

don't think you're gonna break through. This system is

20:55

not only sophisticated and elegant, but it

20:57

is crafted in such a way that

21:00

in order to game the system, you

21:03

would have to be really useful to

21:05

real human beings and a lot of them.

21:08

If you are, is it really spam

21:10

anymore? Right? Like

21:13

if you don't make things that achieve

21:15

real popularity, that real people link to

21:17

from their websites, that real news sources

21:19

talk about and pick up, that get

21:22

traffic, that once they start ranking, even

21:24

if you were to game all the

21:26

other signals, once they start ranking in

21:28

Google, if it doesn't successfully answer lots

21:30

of searchers queries, you're

21:32

gonna fall out of the rankings and someone else

21:34

will rise. So I really,

21:37

I don't see a downside

21:40

to Google sharing this. I think if

21:42

some conspiracy theory 10 years from now is

21:45

like, oh, actually it wasn't a leak, they

21:47

put it out there intentionally and I've got

21:49

the email to prove it, that

21:52

wouldn't totally shock me because very

21:54

frankly, this is useful information for...

22:00

not getting scammed by sketchy

22:02

SEO providers. But it

22:04

is not a roadmap that is

22:06

going to tell you, Oh, man,

22:09

if I just put, you know, the number

22:11

seven in my title tag 12 times, you

22:13

know, I'll rank at the top. There's nothing

22:15

like that. Sure. There's

22:17

no like name your business triple A plumbers because

22:19

the three A's come at the start of the

22:21

telephone book. Yeah, right. It's not it's not the

22:23

white pages in 1985. Sure. Exactly.

22:27

Has Google publicly commented on the

22:29

documents that were leaked to you?

22:32

So I did get a private

22:35

email from a Googler the

22:37

night I published it, that was quite upset with

22:40

one characterization of how I described an

22:43

event, which and I did change it in

22:45

the post. And then

22:47

I believe it was the next week. Google

22:53

made a public statement through sort of

22:55

a PR person that the

22:57

leak was authentic. But they

22:59

urged people not to misread,

23:04

you know, potentially incomplete data. And, you

23:06

know, in fairness, the

23:09

leak does reference some of the

23:11

references in the features do reference

23:13

other data sources that

23:15

we can't access. For example, there's

23:17

a there's a list

23:20

of a white list of election

23:22

approved election news providers, right?

23:25

So that if you were to, you

23:28

know, it's January 6 2020.

23:30

And you and you, you know,

23:32

you're an American and you search for who won

23:34

the election, you know, was there a dispute? Is

23:36

there any evidence that the election was problematic? Google

23:40

wants to make sure that the accurate

23:42

truth is represented in their results, and

23:44

not someone who,

23:47

you know, is misrepresenting that and you

23:49

can imagine certainly that in the in the

23:51

political spectrum, it would not

23:53

be difficult to replicate all the signals that

23:55

you might need including popularity and news references

23:58

and links and clicks and all that. I

24:00

have the quote here from the Google

24:03

spokesperson. It was that we would quote

24:05

caution against making inaccurate assumptions about search

24:07

based on out of context, outdated or

24:10

incomplete information, which as you

24:12

said is importantly not saying these leaks aren't

24:14

real. What do

24:16

you read that statement to mean? Well,

24:19

I don't think it means anything because

24:21

it doesn't even say that these documents

24:23

are outdated or

24:25

that they're inaccurate. It just says

24:27

we caution against generally any information

24:29

that is out of context or

24:32

outdated. These documents are in

24:34

context. In fact, all of the features are

24:36

not all. Many of the features

24:38

are very well described, right? Such that

24:40

if you and I were new engineers

24:42

who joined the Google search team, we

24:45

could read these documents and be like, oh, okay,

24:47

I get what that means. That's when

24:50

I call the Chrome data that tells me

24:52

what percent of people click the back button

24:54

after searching for this. And this is the

24:56

one where they

24:58

have this thing called squashed and unsquashed

25:00

clicks. Squashed clicks,

25:03

they describe it as referring to

25:05

clicks that their spam system thinks

25:07

are not real human beings and

25:09

real devices. And so they

25:11

don't wanna count those clicks, that kind of

25:13

thing. That's what I'm talking

25:16

about. That's what I'm talking about. Scott,

25:19

why do you love Notion? I love that you just toss this

25:21

to me because I love it so much. Because

25:24

I know you love Notion. Because

25:26

I'm reading this data and this

25:28

advertising notes out of Notion. I

25:31

love it because it's just a great place

25:33

to put things. It's a

25:35

great place to structure data. It's a

25:37

great place to build small apps. It's

25:40

a great place to use

25:42

contextual AI to facilitate

25:45

my work and personal life. I store

25:47

everything. Now, I literally have

25:50

Notion documents that store all of my bikes

25:53

and my wife's bikes and every part on

25:55

them. So that when I have to order

25:57

maintenance pieces for them, I know exactly what

25:59

model of, you know, rear shock it has.

26:02

I use it for

26:04

so many things. So I can't tell you why

26:06

I love it. I just love it. It's just

26:08

a feeling something you feel in your heart. When

26:11

you get a really good piece of software that

26:13

combines your notes and docs into one place that's

26:15

simple and beautifully designed with the power of AI

26:17

is all built right inside of it. Not another

26:19

separate tool in a different browser or tab you

26:21

know you don't have 75,000 tabs running live. You

26:23

just got notion we used it just the other

26:26

day. We use it every day. Yeah, I was

26:28

going to say. There's a huge part of our

26:30

workflow. Just the other day it's like I have two

26:32

instances of it in front of me right now. Notion

26:36

is a place where any team can write, plan, organize

26:38

and rediscover the joy of like it makes work feel

26:40

a little bit more playful and that's really,

26:42

really cool. It's

26:44

a workplace design not just for making progress, but like, you

26:47

know, getting inspired like you're in the same room together. It's

26:49

also like the big thing for me is that it's

26:51

like a it's

26:54

like a app building environment like you can

26:56

build data driven applications so quickly and easily.

26:59

Like I know lots of famous

27:01

content creators that use notion to

27:03

like manage their workflows and projects

27:05

when they're making new YouTube videos

27:07

or podcast episodes. It's

27:10

just a great place to put data

27:12

access data structure data move

27:15

processes. It's just it's just so good for so

27:17

many things. And you know what are

27:20

fine fine listeners can try notion

27:22

for free when they go to

27:24

notion.com/hacked. That's all lowercase letters

27:27

notion n o t i o

27:29

n.com/hacked. You can start turning ideas

27:31

into action. And when you use

27:34

our link that hacked link you're supporting our show. So

27:36

when you invariably do go to sign

27:39

up for notion because it rips notion.com/hacked.

28:00

We did and we made it with Shopify and

28:02

it was a genuinely delightful experience. Why? Because

28:04

Shopify is the global commerce platform that helps

28:06

you sell at every stage of your business

28:08

whether you're at that like just launching a

28:10

shop online stage or first real life

28:12

store stage all the way to just like, oh my

28:15

God, we sold a million orders stage. Shopify,

28:17

they got your back. We are not at

28:19

that. Did we just sell a million order

28:21

stage? And that's sad. So if you like

28:24

to buy something, visit store.hackpodcast.com and check out

28:26

how great Shopify is. It

28:29

powers over 10% of all e-commerce

28:31

in the United States and Shopify

28:33

is the global force behind big

28:35

companies, not like us, but like

28:37

Allbirds, Rothy's, Brooklinen, and millions of

28:39

other entrepreneurs. It's easy to

28:41

use. It's very functional. It

28:44

integrates with everything. It's great. If

28:46

you want to do online commerce, check out

28:49

Shopify if you haven't already because it's massive

28:51

and you should have checked it out by

28:53

now because it's the biggest company. And whether

28:55

or not you're like a giant company like

28:57

Allbirds or just a wee little merch operation

29:00

like ours, Shopify's award winning help is there

29:02

to support your success every step

29:04

of the way because businesses that grow, grow

29:07

with Shopify right now, you can

29:09

sign up for one dollar per

29:11

month trial period at shopify.com/hacked. That's

29:13

all lowercase. Go on

29:15

over to shopify.com. I mean,

29:17

let's do this slash hacked now

29:20

to grow your business no matter what stage you're in. Scott,

29:23

one more time. That's that. You

29:25

are shopify.com slash

29:29

hacked. Ransomware

29:32

supply chain attacks and zero day exploits

29:34

can strike without warning, leaving your business's

29:36

sensitive data and digital assets vulnerable.

29:39

But imagine a world where your cybersecurity strategy

29:41

could prevent these threats entirely that

29:43

right there. That's the power of the

29:45

ThreatLocker Zero Trust Endpoint Protection Platform.

29:49

Plus cybersecurity is a non-negotiable

29:51

to safeguard organizations from cyber

29:54

attacks. ThreatLocker

29:56

implements a proactive, deny

29:58

by default approach to

30:00

cybersecurity, blocking every action,

30:02

process, and user, unless

30:05

specifically authorized by your team.

30:08

This least-privilege methodology mitigates

30:10

the exploitation of trusted applications

30:12

and ensures 24/7/365 protection

30:15

for your organization. The

30:19

core of ThreatLocker is its Protect

30:21

Suite, including application allowlisting, Ringfencing,

30:23

and network control. Digital

30:26

tools like the ThreatLocker Detect EDR,

30:28

Storage Control, Elevation Control, and

30:30

Configuration Manager enhance your cybersecurity

30:32

posture and streamline internal IT

30:34

and security operations. To learn

30:36

more about how ThreatLocker can

30:38

help mitigate unknown threats in

30:40

your digital environment and align

30:42

your organization with respected compliance

30:44

frameworks, visit threatlocker.com.

30:47

That's threatlocker.com. I'm

30:52

Dr. Meghan Sacks. And I'm Dr. Amy Shlosberg.

30:55

And we're the hosts of the podcast Campus

30:57

Killings. Our show covers some of the most

30:59

sinister crimes to take place on or around

31:02

school campuses, or the cases we discuss have

31:04

a school-connected theme. And with the new school

31:06

year comes an all-new second season of Campus

31:08

Killings, which will debut on September 16, 2023.

31:12

But if you want to listen to Campus

31:15

Killings now, you can binge all the episodes

31:17

from season one, available everywhere you listen to

31:19

podcasts. Is

31:24

there anything else we learned about Google search from these documents

31:26

that we haven't talked about? We talked

31:28

about Chrome. We talked about the PogoStick stuff. Is there

31:30

anything else we learned in these documents that the average

31:32

internet user might want to know about? Ooh,

31:35

average internet user is a good qualifier

31:37

there. I

31:39

think that

31:42

there are a tremendous number of things

31:44

that should be extremely interesting to anyone

31:46

who creates or

31:48

publishes content on the internet and wants that content

31:50

to do well. The number

31:52

of things that apply to you if that's

31:54

not who you are are limited.

31:58

But I will say one of those

32:00

things that I think folks

32:02

should probably keep in

32:05

mind and be aware of is that when

32:08

Google's public representatives make statements

32:10

about how Google works

32:13

and those get quoted potentially uncritically

32:15

in the press, this

32:18

document suggests that was probably a mistake,

32:20

right? That Google's public statements

32:23

about, especially about how search works and

32:25

what they care about and what's important

32:27

and what will affect your rankings

32:29

and won't, you probably

32:31

should take those with a grain of salt because

32:35

this documentation suggests that somewhere

32:38

between dozens and hundreds of times in the

32:40

last 20 years, Google has

32:42

been directly misleading or straight

32:45

up lying about those things. In

32:49

my blog post, I urged especially

32:52

industry commentators, podcasts

32:54

like yours, folks like Kara Swisher, coverage

32:56

at The Verge, Search

32:59

Engine Land, all of these publications, I urged

33:01

them to take a critical

33:03

view of statements that are made

33:05

publicly by Googlers because they

33:08

are in the best interest of Google potentially,

33:11

but they're not always accurate

33:14

and sometimes directly provably wrong. And

33:16

I think we should treat them

33:18

a little bit more like we treat statements

33:20

from politicians, right? I think the job of

33:22

a journalist is don't tell

33:24

me that Jordan said it's raining and

33:27

Rand said it's sunny outside. Go

33:29

outside and tell me what the weather is. Since

33:33

you brought it up, what would that

33:35

former group, people who regularly publish content

33:37

on the internet, what's the headline for

33:39

them? Oh, God. The

33:43

headline for them is you should probably

33:45

follow and pay close attention to people

33:47

who are studying and analyzing and extracting

33:49

value from this leak because there

33:51

are hundreds

33:54

of takeaways that are

33:56

both actionable and probably

33:58

different to things you've

34:00

done in the past or learned as best

34:03

practices. Gosh,

34:05

just today, Cyrus Shepard, who

34:07

I used to work with at Moz and

34:09

who's an expert in search, he was actually

34:11

one of Google's quality raters for

34:13

a couple of years, which is

34:16

quite interesting. But he

34:18

noted that there was a

34:20

finding from another party inside

34:22

the Google leak that

34:24

something called content effort

34:28

is scored inside the

34:30

Google leak. It's a factor that

34:32

essentially human quality raters, as they

34:34

visit websites, you know, there's

34:36

these thousands, tens of thousands of people who

34:38

work for Google through a contractor, they visit

34:40

websites and they're supposed to write about them,

34:42

right? And sort of score them and say

34:45

whether they're good or bad and

34:47

all sorts of features about them. And one of

34:49

the things they're asked to do is say, did

34:51

it look like a human being spent a lot

34:53

of effort manually to

34:56

create something uniquely valuable, right?

34:58

Differentiated and valuable to people

35:01

with this resource. And

35:04

that score now appears

35:06

to be using a large language model

35:08

AI. So it basically takes

35:10

the input from all these quality raters,

35:12

builds a metric, you know,

35:15

a sort of algorithm, and now it's

35:17

scored through an AI system. And

35:19

I found that totally fascinating, right?

35:22

Essentially you've scaled up what

35:24

quality raters used to do manually and done

35:26

it with an AI. And that's being used

35:29

according to the documents in the ranking system.

35:31

Wow. I wanna talk about the AI thing

35:33

because I think there's something really important there, but the

35:36

anonymous source, just to go back to that. Since

35:39

then, someone has come forward saying

35:41

that they are that anonymous source,

35:43

Erfan Azimi, I believe, is their name.

35:45

Yes. Yeah, so

35:47

Erfan decided, I think, it

35:52

was only about two or three days after the

35:54

leak was published. I think

35:56

he sort of saw that the reception

35:58

was generally favorable and not, not

36:01

attacking, not critical of

36:03

the source or of

36:05

the credibility of the

36:07

data and he came forward

36:09

as the anonymous leaker. He's

36:13

since done a few interviews

36:15

and talked publicly about

36:17

it. He's

36:19

a real interesting guy. We don't agree on 100% of

36:21

things but Jordan, I actually find him to be

36:23

quite a lovely human being. He's

36:26

got a sweet and empathetic and sensitive side and I

36:29

think a really strong sense of justice too. What's

36:32

your sense of why

36:34

he chose to leak these documents? Because as you said,

36:36

it could have gone either way. The public reception could

36:38

have been one thing. Google's reception could have been another

36:40

thing. Why do you think he chose to leak these?

36:42

To be honest, I think his

36:45

stated reasons are accurate. I've seen nothing

36:47

in his behavior before or since to

36:50

suggest that he had anything other

36:53

than a deep frustration and

36:55

anger that Google had misled

36:58

people in the field of content creation

37:01

and the field of marketing and

37:04

the technology world and press overall

37:06

but potentially even some of the

37:09

legal cases against Google. I

37:12

think he wanted the record to be

37:14

set straight and I think he felt an

37:16

obligation to make this

37:19

data available to people.

37:24

That's something I share. For 17 years of my life, my

37:28

whole mission was how do

37:30

I make search more transparent? That

37:33

was Moz's whole goal. I think that's what

37:35

built that company was making

37:37

this dark underbelly. When

37:40

I started in SEO, Jordan, it

37:43

was seen as a scam. People

37:45

thought that everyone in SEO was just

37:47

sketchy and terrible and that

37:50

they were manipulative. It

37:53

took decades to

37:55

make it a mainstream marketing practice that

37:57

every company now invests in. Almost every

38:00

company in the world has someone who

38:02

thinks about or works on SEO. It's

38:05

no longer seen as sketchy or spammy,

38:08

it employs millions of people worldwide, obviously

38:11

because Google sends 70% of the

38:13

internet's traffic, right? Outgoing

38:16

click traffic. And so this

38:18

is kind of a, it's a little bit

38:20

of a dream come true to be able to share

38:23

this, even though I'm out of the field. And

38:25

I think for Erfan, being

38:27

still in the field, he

38:29

really wanted people to know

38:32

the truth.

38:35

You were talking earlier about the usefulness of Google,

38:38

and I'm fascinated by this, sort of the

38:40

rise and fall of how we find information. I

38:42

think there's a feeling amongst some,

38:45

a lot of people right now that

38:47

Google is becoming increasingly less useful for

38:49

finding authentic human-created information, or it

38:51

can be in certain use cases. If

38:54

you want something written by a person

38:56

without any commercial sort of motivation behind

38:58

it, basically Google is just a tool

39:01

for appending Reddit to your search post.

39:04

If you want to find some good, what

39:06

did these leaks tell us about the usefulness

39:09

of Google search for people looking to do

39:11

more than just, you know, window shop on

39:13

the mall that the internet is becoming? Oh,

39:16

you know, to be honest, I'm not sure

39:18

that the leak reveals anything

39:21

in that particular direction. Instead,

39:23

I would say that what

39:26

you'd want to look for in these cases is

39:29

when people, statistically speaking,

39:31

you know, let's take a panel of tens

39:33

of millions of people, like the Datos

39:35

panel. And let's look

39:37

at, you know, how many searches did they

39:39

do over the last

39:41

two years each month? And

39:43

was that number growing or shrinking? And

39:46

when they do searches, where do they go after they

39:48

search? Do they stay on Google? Do

39:51

they click on a paid

39:53

ad? Do they click on what's called

39:55

the organic results, right? The SEO results,

39:58

which are unpaid. Do they?
