Episode Transcript
Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.
0:00
It's a Unix
0:04
system. I know it is. It's
0:06
how the files are going to
0:08
work. It tells her everything.
0:19
Sir, he's uploading the virus.
0:22
Eagle One, the package is being delivered. So
0:29
I live with my wife, who is
0:31
a software engineer. Everybody
0:33
I interact with every day is
0:35
a software engineer, like in my normal life.
0:38
And they were pulling
0:40
through this book being like, oh, this is
0:42
really good. Yeah, you have to internalize
0:44
all of this so we can stop answering your questions.
0:48
You know, write your own Python scripts, etc,
0:51
etc. Will you introduce
0:54
yourself and tell us about the book
0:56
we're here to discuss today? Yeah,
0:59
I am Micah Lee.
1:01
I work as the Director of Information Security
1:03
at The Intercept. And I
1:06
just published my first book called
1:08
Hacks, Leaks and Revelations: The Art
1:11
of Analyzing Hacked and Leaked Data.
1:13
It's basically a,
1:15
it's like a technical book. And the
1:17
goal is to teach journalists, but also
1:20
researchers and activists and people who are
1:22
looking for a new hobby or whatever,
1:24
how to analyze the floods of hacked
1:26
and leaked data that are getting leaked
1:28
on the internet every day. Yeah,
1:31
you make it sound as if there is just
1:33
a flood of stuff and not enough trained people
1:37
to sort through it properly. Would you say
1:39
that's accurate? Yeah, that's
1:41
definitely accurate. Like I, I
1:44
only download and look at
1:46
like a small fraction of the data sets that
1:48
I hear about, just because I'm too busy.
1:50
If I'm, like, working on a project,
1:53
I just ignore everything else. And
1:56
I think that this is the case for, you
1:58
know, the few other data
2:01
journalists that are doing this type of data journalism,
2:04
there's not nearly enough of us and so that's one of
2:06
the goals of the book is to basically you know
2:09
raise an army make a lot more
2:11
people who are able to have
2:14
the skills that they need to analyze
2:16
data sets like this. Yeah, exactly. So
2:18
without having your book as your own
2:20
guide how did you get to the point where
2:22
you're like you know what I've been
2:25
doing this for a long enough time, it's time for me
2:27
to raise my own army. How did you get there? I mean
2:30
so this is kind of, a
2:32
lot of it is the work that I've
2:34
been doing at The Intercept over the last ten years
2:37
and, I
2:39
come from a background of computer
2:41
science and of programming, and then
2:43
really, actually, like, web development. And
2:47
I had never been trained in journalism or
2:49
anything like that but because
2:51
I was working at The Intercept and I'm
2:53
running into these data sets I just kind
2:56
of you know used all my technical
2:58
skills and learned more technical skills along the
3:00
way in order to figure out how things
3:02
work and I think that there
3:05
were a few big data sets
3:07
that I spent a lot of
3:09
time on and that I was
3:11
realizing that not enough people at
3:14
all are doing this stuff,
3:17
and that really inspired me to write this book.
3:20
What does it mean to be the director
3:22
of information security for a news organization? What
3:24
does that job look like day to day?
3:27
So it's a very interesting job. My job
3:29
might be a little bit different than some
3:31
others because my job is
3:33
like also split between doing a
3:36
lot of traditional infosec work but
3:38
also doing investigative journalism myself. But
3:40
yeah I do a mix
3:43
of traditional information security stuff like
3:45
I make sure that our you
3:48
know website infrastructure is secure, I manage some
3:50
vendors, I make sure that none of the
3:52
endpoints that people use get
3:54
hacked and do phishing training and that
3:56
sort of thing. But then
3:58
I also do a lot of journalism-specific work
4:01
involving source protection
4:04
and figuring
4:06
out how to secure sensitive data.
4:09
There's a lot of decisions around
4:11
when it's appropriate to use cloud
4:14
services and when we have to keep
4:16
stuff just on our laptops or
4:18
occasionally when we have to keep stuff on
4:20
air-gapped computers. So it's a
4:22
bit different because of that, because
4:24
of all the journalism security work.
4:27
Can you give us, is there a good anecdote about the
4:29
job that you can give us without getting anybody into trouble
4:32
with putting them in danger? So I was thinking about this
4:34
back several years ago when
4:37
we were reporting on Snowden documents. We
4:39
went to extreme measures to keep the
4:42
Snowden archive safe. We would use
4:44
air-gapped computers where we'd actually unscrew
4:47
the cases and remove the networking
4:49
hardware and stuff. Whenever
4:51
we needed
4:53
to move a file from one air-gapped computer
4:55
to another air-gapped computer, or we were getting
4:57
ready to publish and we had to move it
5:00
to a computer that's not air-gapped, we didn't
5:02
trust USB sticks. We didn't want
5:04
USB sticks to be involved. So we
5:06
actually burned CDs and then we shredded the
5:08
CDs when we were done with them. And
5:11
we had separate USB CD
5:14
drives that were like, these are the air-gapped
5:16
ones and these are the non-air-gapped ones
5:18
and things like that. But
5:21
every time we published a story based
5:23
on NSA documents that were top secret,
5:25
as is
5:27
standard journalistic practice, we needed to reach out to the NSA
5:29
press office and ask them for comment. And
5:31
we would also give them a chance to tell their
5:34
side of the story and tell them what we're accusing
5:36
them of doing basically. And we also wanted to show
5:38
them the documents we were planning to publish to see
5:40
if they had any arguments for why we shouldn't. We
5:43
never actually didn't publish something because of something they
5:45
said. But basically the
5:47
NSA was like, okay, just email
5:50
us these documents. And
5:52
We were just like, you mean like just plain
5:54
text email? Like, just copy them to the computers
5:56
that we never let these documents touch, and then just,
5:58
like, attach them? Well,
6:00
yeah. And it took years of
6:02
Snowden journalism before we finally
6:05
got them to set up PGP.
6:07
And even then they had very strict rules;
6:12
they were like, encrypt each document separately and just attach
6:15
the encrypted files, don't actually encrypt the
6:17
contents of the email itself. And
6:22
I think that basically the
6:24
NSA is, like, terrified of its own press office, because
6:26
the people who have security clearances and access
6:28
to the documents aren't the ones talking to
6:33
journalists all the time, and they really don't
6:35
want the people talking to journalists
6:37
to have top secret documents. So yeah,
6:39
it was all pretty fascinating.
6:42
Like, how often
6:46
does the NSA talk to the press
6:48
anyway? About anything, really? Every time you
6:51
publish a story you, you know, request
6:53
comment from them, and every time they have
6:55
no comment. So they at least
6:57
may want to know what's coming
6:59
down. Yeah, it's funny. I'm thinking about
7:01
it now, it's this tension. But
7:05
with these three-letter organizations you have
7:07
this weird thing they've been doing, like, the last
7:09
ten years, where they're releasing
7:11
old information from, like, the Cold War era
7:14
into, like, these reading rooms, right? There's
7:16
fascinating stuff in there, and
7:18
they have podcasts where they talk
7:20
about all this old stuff, all this old
7:23
material, but it's very much
7:25
on their terms. They're controlling the
7:27
narrative of their back catalogue. That's something
7:29
worth pointing out. So,
7:33
anyway. I feel like it's
7:35
yeah mill as. I feel like it's
7:37
even worse given, and we'll talk
7:40
about this later on in our
7:42
discussion, it's been
7:44
a bad week for the
7:46
journalism world. One
7:50
of the common refrains
7:52
that, you know, people will
7:54
use when journalists
7:56
announce they're being laid off is, you know, learn
7:58
to code. So there is a learn-to-code joke
8:00
somewhere in all of this, but
8:03
obviously data journalism
8:05
is so
8:08
important and has only
8:10
become more important. As the
8:12
years have gone on, we've seen newsrooms
8:14
really spin up these investigative data teams,
8:17
etc. How
8:20
important have you seen these skills becoming
8:24
in your time in journalism? I know you said
8:26
that you started off on the computer
8:28
science programming side of things. Back
8:31
when I started, big archives of data, like
8:35
big data sets like the Snowden archive or
8:37
like the you know the Chelsea Manning
8:39
leaks, those were very rare.
8:42
They happened sometimes but they weren't common
8:44
at all and now it is literally
8:46
like pretty much every day. There's like,
8:49
like if you follow ransomware groups, you
8:51
could just go to their websites and
8:53
just download data from like dozens of
8:55
companies they hacked and you know some
8:58
of them might have journalistic value. So
9:01
yeah, it's really, really, really common.
9:04
And so I think that this
9:06
type of data journalism skills, they're
9:09
more important than they've ever been and I think
9:11
that that's just going to increase over
9:14
time. And yeah, like
9:16
the book does teach you to learn
9:18
to code, but I want to like make it
9:23
clear that it doesn't require any prior experience
9:25
at all. It's like designed to be
9:27
really accessible and really friendly. All you
9:29
need is a computer, an internet
9:32
connection, a hard drive with about a terabyte
9:34
of free space and then just enough curiosity
9:36
and willingness to learn new skills. Only a
9:38
terabyte? That's all we need?
9:41
Yeah, about a terabyte. Yeah, that's a lot of space.
9:43
It's because you have to download BlueLeaks, which is
9:45
like 250 gigabytes,
9:47
and then you have to extract BlueLeaks, which,
9:49
you know, doubles it in size because it's
9:51
all zipped up and stuff. And then there's a
9:53
few other data sets, but that's the big one.
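For anyone following along, here's a minimal sketch of that extraction step: unpacking a folder full of zip archives with Python's standard library. The folder names are hypothetical stand-ins, and since the output roughly doubles the disk usage, check your free space first.

```python
# Sketch: unpack a directory full of .zip files, like the BlueLeaks extraction step.
# "BlueLeaks" and "BlueLeaks-extracted" are hypothetical folder names.
import pathlib
import zipfile

src = pathlib.Path("BlueLeaks")
dest = pathlib.Path("BlueLeaks-extracted")

for archive in sorted(src.glob("*.zip")):
    target = dest / archive.stem          # one output folder per archive
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    print("extracted", archive.name)
```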
9:57
But yeah, like, like a lot of people
10:00
do find a lot of this stuff kind of
10:02
intimidating, like typing commands
10:04
into terminals and writing Python code and stuff.
10:06
But then the book walks you through the process
10:08
from the very beginning and I like hold
10:10
your hand the entire way and try to
10:13
be as accessible and as friendly as I can. Yeah,
10:16
and I feel like what a lot
10:18
of people miss is that data journalism
10:20
isn't replacing, you know, the classic, you
10:22
know, shoes on the ground, boots
10:24
on the ground journalism that we
10:27
see in like, you know, 1950s,
10:30
60s, you know, cop investigative
10:32
movies and stuff like that. One
10:35
of the things that I have found interesting,
10:37
you know, throughout my career, especially, you
10:39
know, working on the tech side of
10:41
journalism is just how
10:44
you can go from these massive
10:47
data sets, these massive leaks and
10:49
databases that are either leaked to
10:51
you or just leaked publicly. And
10:53
then, you know, you spend
10:56
time within them, and you're
10:58
able to find these stories that are basically hidden in
11:00
plain sight. How does
11:02
that process work? How do you know what to look for?
11:05
Yeah, it can be challenging. It depends. My
11:07
book is full of all these hands on
11:09
projects where you download real
11:11
data sets to work with. And
11:13
so I mentioned Blue Leaks. So
11:16
Blue Leaks is, you know, hundreds
11:18
of gigabytes. It was data that
11:20
was hacked from hundreds of different
11:23
US law enforcement websites in
11:25
the summer of 2020 in the
11:27
middle of the Black Lives Matter uprising, and it's
11:29
full of evidence of police misconduct.
11:31
And basically, like one of the tools that's
11:34
really helpful is search
11:38
tools. So Aleph is an example.
11:40
Aleph is this tool that was
11:43
developed by the Organized Crime and Corruption
11:45
Reporting Project. And so you
11:49
could take BlueLeaks, you could index the entire
11:51
thing in Aleph, and what it does is it
11:53
looks through every single file, it extracts all the
11:55
text, it does
11:59
entity extraction. So it pulls out all
12:01
the email addresses and phone numbers and like social
12:03
security numbers and whatever else it finds and it
12:06
lets you search the entire thing and it also does
12:08
OCR so optical
12:10
character recognition, so it
12:13
works with, like, scanned documents and images.
12:15
So yeah, it's the whole
12:17
needle-in-the-haystack thing.
12:20
The way that I would typically start this is
12:23
I would search for some things that I'm interested
12:25
in So like maybe I would search for the
12:27
city that I live in or the name of
12:29
a politician or something. And then, using
12:32
search tools like that, you kind
12:34
of narrow the field of what you start focusing
12:36
on.
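As a rough illustration of that narrowing-down step, here is a sketch of querying a self-hosted Aleph instance from Python. The /api/2/entities endpoint, the ApiKey header, and the host and key values are assumptions about a typical local deployment, not details from the episode or the book.

```python
# Sketch: search a local Aleph instance for a term and skim the matching files.
# Endpoint, auth header, and placeholder values are assumptions.
import requests

ALEPH_URL = "http://localhost:8080/api/2/entities"  # assumed local deployment
API_KEY = "YOUR_API_KEY"                            # hypothetical placeholder

resp = requests.get(
    ALEPH_URL,
    params={"q": "fusion center", "limit": 10},     # hypothetical search term
    headers={"Authorization": f"ApiKey {API_KEY}"},
)
resp.raise_for_status()
for result in resp.json().get("results", []):
    props = result.get("properties", {})
    print(props.get("fileName"), props.get("title"))
```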
12:39
And yeah, there's a chapter on how to
12:41
use Aleph, how to set it up on your own computer
12:43
and use it with any
12:45
data sets you want. And there's also, like, a lot
12:47
of other things, like
12:50
grep. It's a command-line tool.
12:52
It's incredibly useful for being able to
12:54
filter data.
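And if grep feels opaque, the same kind of filtering is a few lines of Python. Here's a rough equivalent that streams a big file line by line instead of loading it all into memory; the keyword and file name are hypothetical.

```python
# Sketch: grep-style filtering in Python, printing lines that match a keyword.
keyword = "fusion center"          # hypothetical search term
path = "some-blueleaks-table.csv"  # hypothetical file name

with open(path, errors="replace") as f:
    for line_number, line in enumerate(f, start=1):
        if keyword.lower() in line.lower():
            print(f"{path}:{line_number}: {line.rstrip()}")
```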
12:56
But yeah, you're never gonna get away
13:00
from just, like, manually looking through those.
13:04
I mean, I think people think that maybe
13:06
AI can do it for you. I kind of don't
13:08
agree. But yeah,
13:11
in the end, like, you're definitely gonna
13:14
spend many, many, many hours clicking
13:16
through, reading documents, taking notes. And then
13:18
based on what you find, you know,
13:20
that's where your investigation goes And
13:23
where do you find data sets? one
13:25
of my most favorite things to do is Read
13:29
through thousand page Pentagon budgets because there's
13:31
always, there's always stories in there, buried
13:33
in weird places, and I'm Ctrl+F-ing,
13:35
and I'm looking for F-35 or whatever.
13:39
Department of Energy stuff is a good one. Where
13:42
do you find data? And what
13:44
is DDoSecrets,
13:47
or DDoS? I'm
13:50
saying it wrong. I call it DDoSecrets. Yes,
13:54
Distributed Denial of Secrets.
13:56
So Distributed Denial of Secrets is
13:58
this nonprofit transparency collective. I work
14:01
really closely with them. They're
14:03
kind of like a public library of
14:05
hacked and leaked data sets, and it's
14:07
specifically curated for journalists. And
14:09
it's great. It's
14:13
ddosecrets.com, and you go there and
14:15
you can see all
14:17
of the data sets that they've released and
14:20
you can download them. A lot of
14:22
the data is available for everyone. Some of
14:24
the data is called limited distribution which means
14:26
you have to request access for it and
14:28
that's basically due to their privacy policies. So a lot
14:30
of these data sets have tons
14:32
of personal information. And so
14:36
DDoSecrets, like, has relationships with journalists
14:38
and sometimes with like academic researchers
14:40
and then they share it and
14:42
this way they don't end up
14:45
just publishing like tons of private
14:47
information from innocent people. So DDoSecrets
14:49
is like a really good source
14:52
for data sets and that's
14:54
where I get a lot of
14:56
the data that I work with myself and all the
14:58
data sets in the book that are examples are all
15:00
downloaded from DDoSecrets. But
15:03
this is just a tiny sample
15:05
of the data that's actually out
15:07
there. Like you
15:09
were just talking about thousand
15:12
page government documents. But also
15:15
some of the data is totally public then
15:17
you can just scrape it from the internet. Like
15:20
a good example, I mean, DDoSecrets made this
15:22
a lot easier for people, but the
15:25
Parler scrape. So, January
15:27
6, 2021 when Trump supporters stormed
15:31
the Capitol they all
15:34
have phones on them and they all recorded
15:36
themselves doing all of this stuff on
15:38
their phones and then they posted
15:40
these videos in real time to
15:43
the social network called Parler, and
15:45
a lot of these videos included
15:47
like metadata like the GPS coordinates
15:49
on their phones. And so
15:52
after January 6, Parler was
15:56
basically kicked off of Google Play and kicked off
15:58
of the Apple App Store. because
16:02
they were basically violating their terms, like, they
16:04
were refusing to moderate content that incites
16:07
violence. And then AWS also announced
16:09
that they were going to kick them off. But
16:12
they gave them a few days. And so someone
16:15
like, the hacker known as donk_enby,
16:18
during this few-
16:20
day period while the Parler data was still
16:22
there, like, worked to download something
16:24
like 54 terabytes of videos. So basically everything
16:27
that had been uploaded to Parlor. It was
16:29
over a million videos. And
16:31
the ironic thing is it was just downloading it
16:33
from AWS and then uploading it
16:36
to a different S3 bucket.
16:38
So anyway, yeah, that data, there's
16:42
a whole chapter on like working with that
16:44
data and figuring out how to, you know,
16:46
like take this million videos and write a
16:48
Python script that like looks through all the
16:50
metadata and finds the ones that have GPS
16:52
coordinates in them and that were filmed
16:54
on January 6, and then how to map
16:56
them and all of that stuff. So yeah,
16:58
so that's an example of, like, totally public data.
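A minimal sketch of that metadata triage: it shells out to the exiftool utility (installed separately) and keeps only the videos that carry GPS tags. The folder name is a hypothetical stand-in, and exact tag names can vary between devices.

```python
# Sketch: find videos whose metadata includes GPS coordinates, via exiftool.
# Requires exiftool to be installed; "parler_videos" is a hypothetical folder.
import json
import pathlib
import subprocess

for video in pathlib.Path("parler_videos").glob("*.mp4"):
    out = subprocess.run(
        ["exiftool", "-json", "-GPSLatitude#", "-GPSLongitude#", "-CreateDate", str(video)],
        capture_output=True, text=True,
    )
    if out.returncode != 0 or not out.stdout.strip():
        continue  # unreadable file or no metadata
    meta = json.loads(out.stdout)[0]
    if "GPSLatitude" in meta and "GPSLongitude" in meta:
        print(video.name, meta["GPSLatitude"], meta["GPSLongitude"], meta.get("CreateDate"))
```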
17:00
But then there's also, yeah, hacker
17:02
groups a lot of times have Telegram channels
17:04
where they just post data that they steal,
17:06
ransomware groups a lot of
17:08
times run, like, Tor onion
17:10
services that have all the data you
17:12
can download from them. And
17:14
then there's also just, like, so many
17:17
misconfigurations with data out there,
17:19
like S3 buckets that
17:21
are totally open that people just discover
17:23
sometimes. And there's also
17:26
like a good example, the
17:28
American College of Pediatricians. So this
17:30
is a group that the Southern
17:32
Poverty Law Center calls an anti-LGBTQ
17:34
hate group. They wrote like an
17:36
amicus brief in the case that
17:39
overturned Roe v. Wade. They had a Google
17:41
Drive link that was like open to anyone
17:43
and someone found it and then downloaded 20
17:46
gigabytes of documents. And there has been
17:48
some journalism based on that data. And
17:51
so, so yeah, the data sets are
17:53
everywhere. The data, there's just so
17:55
much data. And if you just poke around
17:57
a little bit, you can find it. There's
18:01
a lot of really interesting revelations in there and there's
18:03
like nobody looking at it. How
18:05
do you know when you found a story?
18:10
I mean, so okay, so there's a
18:12
lot of data sets out there that are just
18:14
completely like not
18:17
interesting in terms of journalism. There's
18:19
like, you know, like, here's a
18:21
list of all of
18:23
the customers of some company or something and
18:25
it's like, like, unless there's, you know, something
18:27
that you think is in the public interest,
18:29
then like, okay, those
18:32
customers data was breached or whatever.
18:36
I think that the stories are really
18:38
like when you find, you
18:41
know, evidence of
18:43
corruption or evidence of
18:45
crimes or, you
18:48
know, like, like if you have internal
18:50
chats and you find people being like
18:52
really racist or really sexist or things
18:55
like that. So yeah, I
18:58
mean, a lot of times a data set comes and
19:00
you might be really excited about it and then you
19:02
spend a bunch of time looking through it and nothing
19:04
really comes from it. Actually
19:07
an example of this is I looked
19:09
at the data from Oakland, the city
19:11
of Oakland. The city of Oakland
19:13
was hit with ransomware and I guess
19:16
they didn't pay their ransom and the data was
19:18
put online. And
19:20
so I downloaded a copy of it
19:22
and I don't think that, I think there
19:24
definitely still might be stories in there and
19:26
I just didn't spend enough time thoroughly looking.
19:29
But basically, like, you know, there's a lot
19:31
of information about all the lawsuits against the
19:33
city of Oakland, but not like a lot
19:35
of like, not like their internal deliberations or
19:37
anything like that. And I just spent a
19:39
while and didn't really find much even though,
19:41
you know, there's all stuff about the Oakland
19:43
police. There's like some potentially interesting
19:45
stuff, but yeah, so I
19:48
don't know. It's subjective and I think
19:50
that it's really about like what's news,
19:52
what people will, are really
19:54
interested in knowing about and also,
19:56
you know, personally I like finding
19:59
stories that are really gonna have some
20:01
sort of impact. We were
20:04
talking earlier about having to send
20:06
things to the NSA
20:08
to get comment on them. Not
20:11
every organization is the NSA. Not
20:13
every organization is gonna be somewhere that you
20:15
could reach out to with that process.
20:17
How do you authenticate these data
20:19
sets? Theoretically, you've got this Google
20:21
Drive that could be someone spoofing it.
20:23
How do you know? That
20:26
entirely depends on the type of data set.
20:28
It's generally
20:30
different for every story,
20:33
but I've found that the
20:36
safest way in general is to
20:38
use OSINT to kind of
20:40
compare publicly available information with information
20:43
that's inside the data set
20:45
somebody gives you, and confirm
20:47
that what's in there is real.
20:50
So here's an example.
20:52
One of the case
20:54
studies in the book is
20:57
this anti-vax group called America's
20:59
Frontline Doctors.
21:01
During the
21:03
pandemic they made, like, millions of
21:06
dollars. Essentially they were, you
21:08
know, opposed to masks and vaccines,
21:10
and they were pushing ivermectin and
21:12
hydroxychloroquine as the only way to save
21:14
yourself from COVID.
21:17
And they worked with some
21:22
private telehealth
21:24
companies. And basically a
21:26
hacker had contacted me on Signal
21:28
and said that they
21:30
had, like, hacked patient
21:32
data from these companies, and
21:34
asked me if
21:36
I wanted it. I thought it would only
21:38
be, like, a hundred megabytes, but
21:41
when I accepted it turned out
21:43
to be hundreds, hundreds of megabytes of data.
21:45
And it was all
21:47
patient records and prescription records, like
21:49
medical records of America's Frontline Doctors patients.
21:52
And I did all
21:54
this research. I figured out, like, you know,
21:56
how much money they really were
21:59
making, how the whole
22:01
scam worked and everything like that. This actually led
22:03
to a congressional investigation. But I didn't know
22:05
if the data was real or not,
22:07
and so I had to figure out a way
22:09
to authenticate it.
22:12
So, like, you know, besides the anti-vax stuff,
22:14
America's
22:16
Frontline Doctors is very, like,
22:18
it kind of started as
22:21
part of the Trump 2020 campaign,
22:23
and, like, Simone Gold,
22:25
the person who started it, was at
22:27
the January 6 insurrection. So there
22:30
was, like, a lot of overlap with, like, the
22:32
anti-democracy movement, and I suspected there was probably overlap in some of
22:34
these data sets. And
22:37
Gab is that right-wing social network. There
22:39
was a totally separate data set, the Gab data set,
22:42
that included, like, thirty thousand email addresses
22:44
in it. So I pulled
22:46
all the email addresses out of the America's Frontline Doctors
22:48
database and all the email addresses
22:50
out of the Gab data set and compared them,
22:53
and I found a bunch of overlap,
22:55
like, there were
22:57
users in both.
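The comparison he describes is essentially a set intersection. Here is a rough sketch, assuming both data sets have been exported to CSV; the file and column names are hypothetical stand-ins.

```python
# Sketch: find email addresses that appear in two different leaked data sets.
# File names and column names are hypothetical.
import csv

def emails_from_csv(path, column):
    with open(path, newline="", errors="replace") as f:
        return {
            row[column].strip().lower()
            for row in csv.DictReader(f)
            if row.get(column)
        }

patients = emails_from_csv("afld_patients.csv", "email")
gab_users = emails_from_csv("gab_users.csv", "email")
overlap = patients & gab_users
print(f"{len(overlap)} addresses appear in both data sets")
```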
22:59
People that were also allegedly in there.
23:03
So then I searched Gab for them all,
23:05
and then I found a handful
23:07
of them that were talking about,
23:09
like, their finances, like, it only cost
23:11
ninety dollars or whatever, and the, like,
23:13
dates lined up. And
23:15
from that I was really confident that the data set
23:17
was real. As
23:19
in, they were posting publicly
23:21
about, like, horse paste? Exactly. I
23:23
mean, actually one of the conversations
23:25
was specifically talking about, like, buying
23:28
it from, like, animal supply stores, like, what
23:30
am I gonna do, and
23:32
then finally: I got it, I got
23:34
it myself. It was wild. I
23:37
have a curveball question
23:43
that may be very stupid, but
23:45
it occurred to me, so I thought I would
23:47
throw it out there. Back in the
23:49
days of Napster, aging
23:51
myself, dating myself, one
23:53
of the ways the music
23:55
industry handled the
23:57
piracy problem is that they
24:01
would flood the zone with
24:03
fake versions of the songs. So you would
24:05
download something you thought was
24:07
Madonna's new single, and it was Madonna berating you
24:10
for downloading something. In
24:12
this way they fought back. Do you
24:14
ever see a situation: if
24:16
I were somebody that was
24:18
sitting on large data sets as part of my job,
24:20
say at a Fortune 500 company, I
24:22
would perhaps, in
24:24
the event of a breach, train a
24:27
large language model to generate fake
24:30
data and flood the zone. Have you ever
24:32
thought about anything like that? I
24:34
know that's kind of a weird,
24:36
far-flung hypothetical question. I mean, yeah,
24:36
I definitely think that there
24:39
is, like, disinformation
24:41
everywhere, and I
24:43
think that when you are
24:46
reporting on something like this,
24:48
there's always a bit like,
24:50
okay, I have confirmed some of the
24:52
information in the data that I have is
24:55
authentic, and that makes me feel
24:57
more confident that it's authentic, but I
24:59
didn't confirm every single piece. Maybe I can
25:01
confirm that an email, you know,
25:04
actually went to the person it
25:06
suggests. That doesn't mean that the rest of
25:08
the emails in the data set are real. And so
25:11
a good thing to do is report
25:13
on what you've confirmed. So
25:16
if you're going to be, you
25:18
know, publishing an
25:21
email, make sure that what
25:23
you're publishing is real, that
25:26
you've actually validated it. And so even if
25:28
there is, like, some fakes in there... actually, I
25:30
remember one.
25:33
I might be getting the details
25:35
wrong, but when WikiLeaks published,
25:37
I think, the data set from
25:39
Syria, it later turned out
25:42
that the original data set included some
25:44
information about big bank
25:46
transfers between people in Syria
25:48
and people in Russia, and that information was, like,
25:50
deleted from what they published. And
25:52
so, like, it was a real
25:54
data set, but, like, some of the
25:57
information was quietly deleted
25:59
before publishing it. And
26:02
yeah, so I don't know. Like, I
26:05
think that this definitely has happened in
26:07
the past. I think that especially with
26:09
LLMs, with like AI, it's going
26:11
to get a lot worse.
26:15
But it is just true with everything,
26:18
like just the entire zone is flooded with nonsense.
26:20
And I think that that's true with with data
26:22
sets too. And so I think it's just really
26:24
important to do the work to authenticate
26:27
everything that you're going to that you're going to publish. All
26:30
right, Cyber listeners, we're going to pause there for a break.
26:32
We'll be right back after this. All
26:59
right. Cyber listeners, we're back on with Micah
27:01
Lee talking about Hacks, Leaks, and Revelations. What's
27:04
the most interesting exploited data set you've
27:06
seen? Okay. So one
27:09
that I find really interesting is the
27:13
Epik hack. So Epik, that's E-P-I-K.
27:15
In 2021,
27:18
Anonymous hacked Epik and they called
27:20
this hack Epik Fail. And
27:23
basically Epik, a hosting provider,
27:25
it's run by like Christian
27:27
nationalists, it's used by a
27:29
lot of like
27:31
really far right organizations and groups
27:33
and websites and stuff. They do
27:36
domain name registration. And so a
27:38
lot of the places where like mass
27:41
shooters have posted their manifestos have
27:43
been websites hosted by Epik. And so that's
27:46
how they're able to stay online, like like
27:49
8chan and
27:52
things like that. And like
27:54
the Oath Keepers, actually. This
27:56
was probably, like, the biggest owning of
27:59
a company that I'd ever seen. And
28:01
the reason is because they had,
28:05
so the data
28:07
that Anonymous released was, you
28:10
know, hundreds of gigabytes of MySQL databases full
28:12
of tons of information. And like the really
28:14
interesting stuff in there was: Epik,
28:16
since they were a domain name
28:18
registrar, ran a WHOIS privacy
28:20
service. And you can look behind,
28:23
you can peek behind the WHOIS privacy
28:25
for all of those domains. So I like,
28:27
in the chapter on SQL, in
28:31
my book, it shows you how to
28:34
like go and, like, look up oathkeepers.org
28:36
and figure out the, like
28:38
if you do a public WHOIS search on it,
28:40
it says like, this was protected by privacy service. But
28:43
then you can run the like MySQL
28:45
queries and discover like, okay, Stewart Rhodes
28:48
is the owner of this and here's his like
28:50
address and phone number and stuff. But then, you
28:52
know, here's the technical contact of it, and
28:55
there's more information.
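A rough sketch of what running those queries looks like once the dump has been imported into a local MySQL server, here using the pymysql library. The database, table, and column names are hypothetical stand-ins, not the real schema.

```python
# Sketch: query a locally restored MySQL dump for WHOIS registrant records.
# Database, table, and column names are hypothetical stand-ins.
import pymysql

conn = pymysql.connect(host="localhost", user="root", password="", database="epik_dump")
with conn.cursor() as cur:
    cur.execute(
        "SELECT registrant_name, registrant_phone FROM whois_records WHERE domain = %s",
        ("oathkeepers.org",),
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```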
28:58
And so I think that that's really interesting. But also
29:00
this whole Epik hack, like, it
29:02
included the
29:04
Texas GOP website, it's like a WordPress site.
29:06
It included, like, a SQL dump of it and
29:08
all the files for it. And so actually
29:10
like when I was looking into it, I
29:13
recreated it in Docker containers and like spun
29:15
up the website and then like changed the
29:17
app and password and logged in. And I
29:19
was just like, look around the backend
29:21
of the Texas GOP website. This
29:24
whole hack was like in response to
29:26
the Texas Heartbeat Act, which was, like,
29:29
the biggest restriction of abortion rights in the
29:32
US before Roe v. Wade was overturned. But
29:35
it also, this hack also included entire
29:37
like VM images. So it included
29:39
like the images of hard drives of
29:42
the virtual machines that were running all the software.
29:44
And so like one of them was
29:46
like GitLab. So it's basically like an
29:48
open source version of GitHub and it
29:51
included like all their source code repositories,
29:53
but also all of their like issues
29:55
and pull requests and all the continuous
29:57
integration. So like when they merge code
29:59
to production. like all of the secrets
30:01
that actually connect to the production servers. I
30:03
don't know. It was wild. So I think
30:05
that that was probably one of
30:08
the most fascinating data sets that
30:10
I had seen. My real reaction to that
30:12
is like, damn, are people like that
30:15
stupid? Like, like,
30:17
not to, you know, victim-
30:20
blame the people that this is, you know,
30:22
going to affect. It
30:26
just seems like this is like a nightmare for
30:28
anyone who is on the opposite end of this.
30:32
Yeah. Should a regular person,
30:35
aside from, you know, using
30:37
password managers, etc., etc. What
30:41
should we be concerned about? You know, who's
30:44
phishing us that we should be afraid of? I
30:47
mean, the problems like like this,
30:49
when the company gets hacked and all their
30:51
data gets breached, that
30:53
is not really something that regular people can
30:55
handle. That's like the responsibility of the company.
30:58
And it's like, I mean,
31:00
I don't think that Epic
31:02
was especially competent, but even
31:05
for competent companies, it's really
31:07
hard. Like defending from
31:09
hackers is a very, very difficult situation.
31:11
It's like much easier to find a
31:13
single flaw and like hack something than
31:16
it is to find every
31:18
single flaw that anyone might find and defend against them
31:20
all. But in terms of just ordinary
31:22
people, like what you can do, like, yeah, I think
31:24
that the best you can really do is use a
31:28
password manager, have really good
31:30
passwords, try and like, like
31:33
a lot of times there's data breaches that
31:35
aren't actually like the whole
31:37
service provider gets hacked, but instead individual
31:39
accounts get hacked. So the way to
31:41
make sure your account doesn't get hacked
31:43
is use two-factor authentication. You
31:47
know, like post less information on the internet, like
31:49
don't store all of your or if you do
31:51
put it on the internet, put it, you
31:53
know, in places that are encrypted. So like,
31:55
instead of using,
31:57
storing all your stuff in Google Drive.
32:01
you know, maybe store it in, like, Proton
32:03
Drive or something like that. So then, you
32:05
know, if Proton Mail gets hacked or gets
32:07
law enforcement requests or whatever, they won't be
32:09
able to just hand over all of your
32:11
files. And if you are using Google
32:13
Drive, I mean, that's fine. Google is actually very secure.
32:15
But like turn on Google
32:18
Advanced Protection, which is a way of like
32:20
really locking down your Google account. It makes
32:22
it a lot harder to hack. I think
32:24
that's the best that ordinary people can do
32:26
is just use good, strong
32:28
passwords, use 2FA and
32:32
yeah, like don't have all your conversations on Discord,
32:34
have them on Signal. Kind
32:36
of piggybacking off of that, what's the most
32:38
creative intrusion you've seen? I'm
32:41
especially interested in any stories of
32:43
really wild social engineering. Let's
32:46
see. I mean, I'm not sure. So
32:48
the data sets that I get,
32:50
generally, like, they're not always from
32:52
hackers, but when they are,
32:54
I have no idea how the intrusion happened. I just
32:56
have the data. So
32:59
yeah, so I
33:02
actually like was thinking
33:04
about this and I thought of an intrusion, but it's
33:06
not social engineering at all. But
33:08
really wild social engineering. I
33:11
mean, I just remember like,
33:14
so I've worked with a lot of like
33:19
NGOs and like human rights activists and
33:21
stuff. And I remember like
33:23
hearing about phishing emails that looked
33:26
super convincing that was basically
33:28
like inviting you to a conference and being
33:30
like, we'll pay for it. Like,
33:32
like, you know, I know that this conference, like
33:34
in Europe or, you know, somewhere, somewhere they'll be
33:37
really fun to go to and it's really expensive.
33:39
Would be, you know, it's perfect
33:42
for you. We would just want you to like come
33:44
and attend and like maybe be on a panel
33:46
or something. And you know,
33:48
we have full funding for your flight and
33:50
for your per diem and for everything. And
33:53
that, like, can be very enticing for, like,
33:55
nonprofit workers, especially ones on, like,
33:57
tight budgets and stuff. But
34:00
like, so what I was thinking is
34:02
like, a wild
34:05
intrusion that isn't social
34:07
engineering, was actually the
34:10
American Frontline Doctors, like,
34:12
telehealth companies. And I actually
34:14
I was talking to the hacker because they reached out
34:16
to me directly. So I asked them, like,
34:19
how this hack happened. And they
34:21
said it was hilariously easy to hack. And
34:23
it actually like, like, it's
34:25
kind of it's kind of funny how
34:27
incredibly simple this is, but also how
34:30
incredibly impactful it was. So
34:33
the two companies that were hacked were Cadence
34:35
Health and Ravkoo Pharmacy. Cadence
34:40
Health basically did,
34:42
like, the telehealth consultations. So when
34:44
someone's like, I want ivermectin,
34:46
they would pay $90 and have a doctor call
34:50
them on the phone, like, a doctor visit,
34:53
basically, that costs $90, and anyone can
34:55
make accounts. So basically, the hacker
34:57
went to Cadence Health, made an
34:59
account, and then just was
35:01
like, looking at the
35:03
HTTP requests, their browser made as they were
35:06
like, clicking around in their account. And they
35:08
noticed that one of the requests had their
35:10
account ID on it. And
35:12
it was like, like get account info slash
35:14
ID. And it just included
35:16
all their information, like, a
35:18
JSON object, and it also included, like, their password hash.
35:22
And they changed the ID to a different
35:24
ID. And it included
35:26
all of a different patient's information. So
35:28
they just wrote, like, a little script
35:30
that just iterated through the IDs and
35:34
downloaded the patient data from 255,000 patients. And
35:38
that was that hack.
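For the curious, the bug class he is describing is an insecure direct object reference: the server trusts whatever record ID the client asks for. Here is a sketch of the flawed pattern against a purely hypothetical endpoint, as an illustration rather than a tool:

```python
# Sketch of the IDOR flaw described above: any logged-in session can fetch
# any account ID. The URL is hypothetical; this only illustrates the bug class.
import requests

BASE = "https://telehealth.example/api/get_account_info"  # hypothetical endpoint

def fetch_account(account_id, session_cookie):
    # The server never checks that account_id belongs to this session;
    # that missing authorization check is the entire vulnerability.
    resp = requests.get(f"{BASE}/{account_id}", cookies={"session": session_cookie})
    return resp.json() if resp.ok else None
```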
35:40
And then the other one was Ravkoo Pharmacy. So
35:42
this was, like, the main pharmacy where,
35:45
after they prescribed ivermectin and hydroxychloroquine, people
35:47
would, like, go to fill it. And
35:50
with Ravkoo, anyone can create an account with this
35:52
pharmacy. And basically, I don't know how they did it,
35:54
but the hacker said that they discovered a special
35:57
URL, it was like the super admin
35:59
interface. And as long
36:02
as you're logged into an account, any account
36:04
at all, you have access to it. So
36:06
if you're not logged in, it like forces you to
36:08
log in. If you're logged in, you just have access
36:10
to all of this stuff. And it includes like a
36:13
list of all of the prescriptions that they had ever
36:15
filled. And so,
36:17
yeah, they just like
36:19
scraped all that information. And actually, when
36:21
I was doing this story, the Ravkoo
36:24
CEO, I, like, found
36:26
his phone number and called him. And he didn't
36:28
actually believe that they were hacked because they were
36:31
like, no, that's impossible, we're HIPAA compliant, we're really
36:33
secure. And then I, like, emailed him a screenshot
36:35
of the super admin interface. And he was just like,
36:37
oh, God, I have to call my people, and
36:39
like hung up the phone. It's
36:41
really deflating. I don't know why I want there
36:43
to be more of a romance to all of
36:45
this, but it's
36:48
really deflating just how simple and
36:53
just ignorant a lot of this is to me. Like
36:56
it's just plain, just
36:59
plain not having your shit together. Right?
37:02
Yeah. Yeah. But I
37:04
mean, like, it really is. But
37:07
like, also, I don't know, it's hard
37:09
to do everything well. Like you
37:11
probably all have Google accounts. How
37:13
many of your Google Docs or
37:16
Google or folders in Google Drive
37:19
have shared settings that are open to anyone with the
37:21
link? Like, and you are you
37:23
actually want them to be open to everyone or
37:25
did you just not feel like typing in their email
37:27
addresses to share them? So it's like, I
37:30
think that everybody does this. And so
37:32
so it's like, it's like you have
37:34
to be kind of digitally vigilant
37:38
with your security practices to
37:41
not end up doing stuff like
37:43
this. But also, like,
37:46
like, yeah, if you're if you're having patient
37:48
data, if you're having like health care records
37:50
and stuff, you absolutely need to have,
37:52
like, some sort of access control, you know.
37:56
Yeah, it is deflating how easy it is.
37:58
But not everything is like that. I
38:00
think that that's why we're seeing these
38:02
data sets, because these are the easy
38:04
ones. Yeah, no, you're right. It's like,
38:07
you know, some people should definitely
38:09
do it. But then there
38:11
are some organizations that really, really
38:13
should be doing it. In
38:18
terms of, you know, making mistakes,
38:20
what are the kinds of mistakes
38:22
that you see when journalists who
38:24
aren't as versed in the world
38:26
of hacking are reporting
38:28
on stuff like this? I mean, I
38:30
think that like, really,
38:32
it's just believing what companies
38:34
tell them and believing what
38:37
like billionaires say. Like,
38:40
like, there's just so many stories,
38:43
you know, from the last several years
38:45
about like crypto, and
38:48
how it could, you know, solve all of the
38:50
problems and how it's really secure and all sorts of
38:53
stuff. And then that's just like, turned out to not
38:55
be the case at all. So I think that
38:57
like, yeah,
39:01
just like believing a lot
39:03
of hype and not actually
39:05
verifying what companies say.
39:07
So this doesn't really like have to do
39:10
with my book or with data sets, really.
39:12
But I had in 2020, I worked on
39:14
the story about Zoom,
39:17
and how Zoom
39:20
basically was misleading all
39:23
of its customers claiming that it had end-to-end
39:25
encryption, and it didn't have end-to-end encryption. And
39:27
this actually led to like an FTC settlement
39:29
and where the FTC forced Zoom to implement
39:31
real end-to-end encryption. And it led to like,
39:34
I forget how many millions of dollars in class action
39:36
lawsuits, which was pretty cool. But
39:38
basically, like, me and Yael Grauer, she's the
39:41
other journalist that worked on it, we were
39:43
just looking through Zoom's like privacy policy and
39:45
asking them questions about how their end-to-end encryption
39:47
work. And we got them to kind of
39:50
admit that, well, actually, the like, keys that
39:52
protect the Zoom meetings are generated on Zoom
39:54
servers, and we do have copies of them.
39:56
And then we're like, that's not an end-to-end
39:59
encryption. And they're like, oh, well, we're just
40:01
using a different definition of end-to-end encryption. And
40:05
I think that that's probably true for
40:07
companies everywhere. They all just say whatever
40:09
the marketing people think sounds good. And
40:12
then you really have to look into
40:14
it in detail and ask them questions.
40:17
And ideally, don't even ask them questions. Reverse
40:19
engineer, how does stuff work? And
40:21
if you can figure that out, then yeah. I
40:24
feel like that's the big mistake.
40:26
It's just believing people without, like, any skepticism.
40:29
The idea that you would just believe
40:31
that Zoom is end-to-end encrypted... anyway,
40:33
I'm gonna let that go before it makes my
40:35
brain, the blood shoot out of my
40:37
ears. So we mentioned Discord earlier.
40:41
And I've been kind of fascinated
40:43
by this little chat room thing that was meant
40:45
for gaming, coming to
40:47
take on this weird oversized importance. I
40:51
think maybe in ways that people don't really realize. I
40:55
think the big story from the end of last
40:57
year was Jack Teixeira, the DOD leaker, who was
41:00
sharing things with the Discord group that
41:03
was supposed to, like, that he'd squirreled out of a
41:05
SCIF. It's just strange stuff. Has
41:08
Discord become a part of your daily life? Is
41:10
it important to your work? And what are the
41:12
perils there? You
41:15
know, I actually don't use Discord all that
41:17
much. I use it, like, a little bit
41:19
now. There's a few different servers that I'm
41:21
part of. But I mean, I think that
41:24
really, like, what Discord is, is it's
41:26
like this massive, like, semi-private place
41:28
for people to communicate online. And so
41:30
because there's, like, millions and millions of
41:33
people that are talking in it, that
41:36
totally makes sense that
41:38
there's gonna be leaks coming from it.
41:40
And the whole, yeah, the Jack DeShara
41:42
thing where he's posting, you know, top
41:44
secret documents about the Russia-Ukraine war, basically,
41:46
like, for clout in front of his
41:48
friends. Yeah, that's fascinating.
41:51
I mean, so actually, like, one
41:54
of the other case studies in my book does
41:56
involve a lot of leaked Discord chats. And
41:59
this is from... a bit older, this
42:01
was from like 2017 when the
42:03
people who were like really, really
42:05
using like this little
42:07
gaming chat thing
42:09
were Neo-Nazis. So the
42:13
organizers of the Unite the Right rally
42:16
in Charlottesville, Virginia, that whole thing was
42:18
organized on Discord. And then there were
42:21
also, like,
42:24
15 other Discord servers.
42:27
And so yeah, like I talk about
42:29
how, like, anti-fascists infiltrated these servers and then just
42:31
use some software to just like once you're in
42:33
a server to just go and scrape all
42:36
of the chat history that they have access
42:38
to, for, like, the entire server, since
42:40
it started. And I think that
42:43
this is one of
42:45
the reasons why Discord
42:48
is such a big deal because it's actually really
42:50
easy for any person in a Discord server to
42:52
just grab everything that has ever been posted to
42:55
that server. And then it's also easy because these
42:57
servers are, like, mostly not
42:59
public, but they're not really
43:01
that private either. It's like there's this Discord.gg link
43:03
that if you find one you can join another
43:05
server. And so, I
43:10
think this is what a lot
43:12
of the, like, infiltrators of these, like, neo-Nazi
43:14
chat rooms did, is they'd,
43:16
like, make their way into
43:18
one, and then they would scrape it all
43:20
and search for Discord.gg, and then they'd find,
43:22
like, seven others, and they'd just join those
43:22
and scrape all of those. And
43:25
I think that yeah,
43:27
that's one of the reasons why it's
43:29
not like a signal group. Yeah, and
43:32
there's an illusion of privacy in them
43:34
that doesn't quite actually exist, right? Yeah,
43:36
yeah, absolutely. And in fact, I mean,
43:38
I think that also it's
43:40
important to like, I don't know, I always have
43:42
in the back of my mind that anything that
43:44
is not and encrypted,
43:47
like the company has access
43:49
to it. So there's an illusion of privacy in
43:52
your Google Docs too. It's a lot more private,
43:54
I think, than Discord channel where like, you know,
43:56
a bunch of strangers might join and you might
43:58
not know them. But, like... but yeah.
44:02
Do you draft copy in Google Docs or do
44:04
you use something else? It
44:06
depends on the story. If it's a
44:08
story where it's just like, doesn't
44:10
matter, like it's totally, I'm not
44:13
like, don't have any source protection
44:15
things, then I can sometimes
44:17
do Google Docs. But
44:21
otherwise, I actually use Word a lot. Yeah,
44:25
if it has anything to do with secret
44:30
information or source protection or whatever, then like
44:32
intercept policy is like, we don't use Google
44:35
Docs for any of that. All right, this
44:37
one is from one of my friends who
44:40
wanted me to ask this. What is the
44:42
worst kind of data
44:44
format to work with, and why is
44:46
it XML? So,
44:48
okay, so XML can be obnoxious.
44:51
But one good thing about XML is that
44:54
it's actually like an open format and there's
44:56
libraries that can work with it. What I
44:58
find even worse than XML is
45:00
like weird proprietary crap. So
45:03
like, once someone
45:06
sent me, like, some surveillance videos
45:09
that were from some, like, I
45:11
don't know, some surveillance camera company,
45:13
and the videos weren't just a
45:15
normal video format. The only way to watch them was
45:17
to, like, start up
45:19
a Windows VM and install the
45:21
company's like software and then you could open
45:24
them from there. And you
45:26
could, or you could maybe spend like hours
45:28
and hours and hours trying to figure out
45:30
how to like get an MP4 out of
45:32
this. So like something like that, it's just
45:34
obnoxious. And then even just like, I
45:37
remember I worked on, all right, I helped with that
45:39
story where it was
45:41
a leaked Oracle
45:43
database of
45:46
like Chinese police stuff that
45:48
was like involved in surveilling
45:50
Uyghurs and it was
45:53
an Oracle database and Oracle is like
45:55
a proprietary database thing. And so it'd
45:57
be so much easier if it would
45:59
just be, like, MySQL or Postgres or
46:01
something. And none of the people that were working
46:03
on this, the tech people,
46:05
were that familiar with Oracle and you have
46:07
to buy a license. And eventually we managed
46:10
to figure out how to convert it into
46:12
Postgres so that we could actually work with
46:14
it. But yeah, just weird proprietary stuff is
46:16
really obnoxious at that stage. What's
46:19
the most common stuff you work with? Usually
46:21
SQL and that kind of thing? Is
46:23
it mostly that? Yeah, so just
46:26
collections of office documents are
46:28
really common. Like
46:30
a PDF and like Word files and
46:32
Excel files and things like that. Email
46:35
is really common. Sometimes
46:37
it's folders full of EML files, which
46:39
is, like, the standard format for a single
46:42
email. But then there's
46:44
also, like, mbox files and PST Outlook files.
46:46
There's a whole chapter called Reading Other People's
46:48
Email that teaches you how to deal with
46:51
all of this stuff and how to import
46:53
it into Thunderbird and things like that.
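Python's standard library can open mbox archives directly, which is handy for a first skim before importing anything into Thunderbird. A minimal sketch; the file name is hypothetical.

```python
# Sketch: skim the headers of an mbox archive with the standard library.
import mailbox

mbox = mailbox.mbox("leaked-mail.mbox")  # hypothetical file name
for message in mbox:
    print(message.get("Date"), message.get("From"), message.get("Subject"))
```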
46:56
But then for
46:58
structured data, like JSON files,
47:00
like JSON data and CSV spreadsheets are
47:03
like really, really common. Like
47:05
the American Frontline Doctors, it
47:08
was just nothing but
47:10
JSON files and CSV files and that's
47:12
it. And then yeah, SQL is
47:14
really common too. Bringing
47:17
us back kind of to where we are in
47:19
the present, you
47:21
know, not a great
47:23
week for jobs in
47:26
the world of journalism, but
47:28
at the same time, this is after we've had a
47:31
couple of years of a
47:33
lot of OSINT journalists
47:35
that are just, you know, guys
47:38
on the internet figuring things out. How
47:42
are you feeling about this industry right now? In
47:46
terms of like really
47:48
bad OSINT that
47:51
doesn't necessarily really like mean what people
47:53
think it means, you mostly
47:55
just ignore all of that stuff and I
47:57
sort of mixes in with other... kind
48:00
of like, I don't know, like
48:02
the internet is full of things,
48:05
of websites full of like bad reporting or
48:07
misinformation or spam or like a mix of
48:09
all of them. And so, um,
48:12
mostly don't really look like
48:15
I mostly just ignore that stuff. Um, I,
48:17
although I do think that OSINT
48:19
done well can be, like, really exciting
48:21
and interesting, especially if you like really just
48:24
narrow it down to like, okay, I've connected
48:26
these two things and here's, here's my proof.
48:29
Um, uh, but in terms of
48:31
the industry, I don't
48:34
know. I mean, things are
48:36
grim. Um, I, I'm actually,
48:38
uh, very happy about the,
48:40
the like kind of recent new direction of
48:42
the intercept though, where, where it, uh, split
48:45
off from first look media, which is its
48:47
parent company. And so now it's just a
48:49
completely independent nonprofit and it's, you know, um,
48:52
uh, like, like it seems
48:54
like The Intercept is in a good place. So
48:56
I'm happy about that. Um, the industry
49:00
as a whole, I don't know. I really
49:03
hope that it doesn't get sucked into too
49:05
much AI. I
49:07
think it's going to in the short term, like it's just
49:09
gonna, we're just going to have to suffer through that, I
49:11
think, um, until
49:13
it like collapses in on itself.
49:16
Uh, I think that's, yeah, I think
49:18
we just, you're right. Yeah. But you
49:20
think about like, you get to live
49:22
through interesting times. Isn't
49:24
that wonderful? Yeah.
49:27
But maybe the AI is not going to
49:29
parse the data sets as well. I
49:31
don't know. Yeah. Is that
49:33
it? So one thing that, that
49:35
I found that, like, ChatGPT
49:38
is really good at is helping
49:40
you write code. So if
49:42
you're, if you're new to, to this stuff, if
49:44
you want to like follow along with my book
49:46
and you're like very intimidated by the Python stuff
49:48
and you're like, okay, I need to write a
49:50
script that like opens a CSV file with millions
49:53
of rows and then loops through them. You
49:55
can just ask ChatGPT, hey, write Python code to
49:57
open a CSV file and loop through the rows,
49:59
and it'll give you a little snippet of
50:01
code.
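The snippet you would get back might look something like this; csv.DictReader streams the file row by row, so even millions of rows are fine. The file name is a placeholder.

```python
# Sketch: open a CSV file and loop through its rows without loading it all at once.
import csv

with open("huge.csv", newline="") as f:  # placeholder file name
    for row in csv.DictReader(f):        # streams one row at a time
        print(row)
```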
50:03
And so that sort of thing I think could actually be really helpful. But in terms of
50:05
actually like finding the stuff for
50:08
you or like writing stuff for
50:10
you? No. Yeah, but for writing
50:12
code, I'm actually a big fan of it,
50:14
for helping you write code faster.
50:17
I love that. How
50:20
often do you mess with GIS data, if
50:22
at all? A
50:24
little bit. Like I, so
50:27
the whole like Parler data that
50:30
has GPS coordinates, I actually,
50:33
while I was writing that book, I'm like spending a
50:35
lot more time with the Parler data than I had,
50:37
like, you know, when it came out, I
50:40
was learning a lot about GIS software
50:42
too. But I basically like, you
50:46
know, figured out ways
50:48
of like various
50:50
options to map GPS
50:52
coordinates, which are pretty cool. One
50:55
of the things with the American Frontline Doctors data
50:57
was I have patient data and I had everyone's
50:59
addresses, but I didn't want to like, you know,
51:02
publish any one of the addresses, but I was
51:04
really curious like what states have the most people
51:06
and what cities have the most people. And so
51:08
I wrote some code
51:10
that basically like, put the list of like
51:13
all the patients in each city and
51:17
geocoded those cities. So I had GPS coordinates
51:19
for the cities and then I like mapped
51:21
it all. And so the article we published
51:23
actually had like an interactive map where you
51:25
can see, where you can like scroll around
51:27
and see the cities that have the most
51:29
and the least people
51:32
who are really into ivermectin and
51:34
hydroxychloroquine, and probably anti-vax and
51:36
into Trump themselves.
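A rough sketch of that aggregation step: count records per city, then geocode only the city names (never the street addresses), here with the geopy library. The file and column names are hypothetical, and Nominatim's usage policy asks for roughly one request per second.

```python
# Sketch: count records per city and geocode the city names with geopy.
# File and column names are hypothetical stand-ins.
import csv
import time
from collections import Counter
from geopy.geocoders import Nominatim

with open("patients.csv", newline="", errors="replace") as f:
    counts = Counter(f"{row['city']}, {row['state']}" for row in csv.DictReader(f))

geolocator = Nominatim(user_agent="leak-mapping-sketch")
for city, n in counts.most_common(10):
    location = geolocator.geocode(city)
    time.sleep(1)  # respect Nominatim's roughly one-request-per-second policy
    if location:
        print(city, n, location.latitude, location.longitude)
```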
51:38
Everyone loves a good map. Yeah. So
51:42
Emily, do you want to take this last one? Yeah,
51:44
I mean, so many of our
51:46
listeners, you know, we definitely
51:49
have listeners, I'll say, who
51:52
have access to data sets and
51:54
might at some point, maybe
51:56
that point is now, want to become a
51:58
source for a journalist, for whatever
52:00
reason, what kind of
52:03
advice would you have for them
52:05
about managing risk and doing, you
52:07
know, risk assessment on their end
52:09
before, you know, becoming a source?
52:12
So I would say, first of all,
52:14
if you're thinking about this at all,
52:16
don't do any sort of things on
52:18
your work devices. Like, don't
52:21
search for information about, like, how do
52:23
you leak to a newsroom from your,
52:25
like, work computer. Don't
52:27
use your work computer or your work phone as much
52:29
as possible, but often that's not possible, because
52:32
generally, if you have access to the data set, it's only from
52:34
your work device. If
52:37
you are thinking of leaking something,
52:39
it's really good to think about how many people
52:42
have access to the thing that you're leaking. It's
52:44
a big difference if you're going to, like, leak
52:46
an email that was sent out to your whole
52:48
company than it is if you're going to leak
52:50
an email that was sent out to three people.
52:54
And so I think that it's always important
52:56
to just think like the leak investigator. So,
52:58
like, after, you know,
53:00
let's say you become a source, you
53:03
leak some stuff, a journalist publishes
53:05
an article, like, at
53:07
that point when this is public, that's
53:09
when they're going to start a leak
53:11
investigation. And so, like, think about what they have
53:13
access to. They're going to try and, like,
53:16
come up with a suspect list and narrow
53:18
it as much as possible. And so, yeah,
53:20
like, think about all of the
53:22
things that you're going to do. The system that you
53:24
use to keep logs. Like, did you know that every
53:26
single time you open any single Google Doc, there
53:29
is a log of that in the Google admin console?
53:31
So, like, the administrator of your Google workspace
53:33
could go in and just, like, look at
53:36
a document and see, like, oh, this person
53:38
loaded it at these specific times and then
53:40
they loaded it, you know, every day for
53:42
a few times. And then, like, a week
53:44
later, this document was published in the news.
53:47
Right. So, like, all of that stuff is logged. And I
53:50
think that, like, thinking about that is
53:54
the most helpful thing to do.
53:57
Yeah, just like be aware of it. everything
54:00
that you do is basically under
54:02
surveillance. Everything leaves a trail, and
54:06
if you want to try and do this as
54:10
safely as possible, then, like, either try to
54:12
leave as little of a trail as you can, or try
54:14
to make sure that like your trail is mixed
54:16
up with, like, thousands of other people's trails. Micah
54:21
Lee, thank you so much for coming on to Cyber
54:21
and walking us through this. The book is Hacks, Leaks,
54:23
and Revelations: The Art of Analyzing Hacked
54:26
and leaked data. And it's out
54:28
now, yes? It's out now. And
54:30
you can go to hacksandleaks.com. That's the
54:32
book's website. Thank you so much. Thank
54:35
you for having me. This has been great.
56:00
Tired of ads barging into your favorite news podcasts? Good
56:09
news. Ad-free listening on Amazon
56:11
Music is included with your Prime membership.
56:14
Just head to amazon.com/ad-free news podcast
56:16
to catch up on the latest
56:19
episodes without the ads.