Episode Transcript
0:48
We all , as a scientific
0:52
and technological and policy
0:54
community , should think bigger about
0:57
privacy-enhancing technologies
1:00
and shift from PETs
1:02
to FETs . So
1:04
fairness-enhancing technologies , which
1:06
is not so difficult to reach . We
1:09
just need to think a bit more critically
1:11
and a bit broader . And what
1:13
are the real goals ? The real goals
1:15
are protecting
1:17
the most impacted
1:19
groups , the most marginalized
1:22
and impacted groups in the
1:24
digital environments .
1:26
Hello , I am Debra J Farber
1:28
. Welcome to The Shifting Privacy
1:30
Left Podcast , where we talk about
1:32
embedding privacy by design and default into
1:34
the engineering function to prevent
1:36
privacy harms to humans and
1:39
to prevent dystopia . Each
1:41
week , we'll bring you unique discussions with
1:43
global privacy technologists and innovators
1:45
working at the bleeding edge of privacy research
1:47
and emerging technologies , standards
1:50
, business models and ecosystems . Welcome
1:54
everyone to the Shifting Privacy Left podcast
1:57
. I'm your host and resident privacy
1:59
guru , Debra J Farber . Today
2:02
, I'm delighted to welcome my next guest , Gianclaudio
2:05
Malgieri , Associate
2:07
Professor of Law and Technology at Leiden
2:10
University ; Co-Director of
2:12
the Brussels Privacy Hub ;
2:15
editor of the Computer Law and Security
2:17
Review ; author of the book
2:20
Vulnerability and Data Protection
2:22
Law ; and expert in privacy
2:24
, data protection , AI regulation , law
2:26
and technology , EU law and
2:29
human rights . Today , we're going to be discussing
2:31
his recently co-authored paper on the
2:34
fairness of privacy-enhancing technologies .
2:37
Welcome , Gianclaudio !
2:39
Thank you , Debra , I'm very happy to be here .
2:41
Excellent . Well , I know we're going
2:44
to have a really lively conversation , but
2:46
before we get into your paper , I'd love
2:48
for you to tell us a little bit about some of your academic
2:50
work with the Brussels Privacy Hub
2:53
, and you know how you got into privacy to
2:55
begin with .
2:56
Yeah , the Brussels Privacy Hub is
2:58
a special think tank based
3:00
at the Vrije Universiteit Brussel in
3:02
Brussels , and the position
3:04
of the hub in Brussels is really helpful
3:07
for engaging with policy
3:09
making , academia
3:11
and all the countries
3:14
in Europe that are very active in terms of
3:16
academic efforts for privacy
3:19
technology , regulation and policy
3:21
. But I'm also , as you said , working
3:24
full-time in Leiden
3:26
, which is the oldest Dutch university
3:29
and one of the biggest
3:31
law faculties in the Netherlands , so
3:33
I'm trying to exploit
3:35
these links . But I'm also Italian , so
3:37
I'm connecting with the Mediterranean tradition
3:40
on privacy and
3:42
data protection , also trying to connect
3:44
bridges with the US because I'm part of the
3:46
privacy law scholarship conferences in the
3:48
States . The hub is
3:50
trying to push on several different
3:52
aspects through research , but
3:55
also trying to push for
3:57
important debates . Just
3:59
to make some examples of our activity
4:02
. I would like to mention three main
4:04
topics that we are exploring now
4:06
. The first is impact assessments
4:09
and fundamental rights , so how technologies
4:11
can be considered from an
4:13
impact assessment perspective . In particular
4:16
, we looked at the Artificial Intelligence Act
4:18
in the European Union , which was very
4:21
important for the impact
4:23
assessment discussion . The Brussels
4:25
Privacy Hub pioneered a
4:27
letter signed by 160
4:30
university professors to push
4:32
the legislators to add a
4:35
solid fundamental rights impact assessment
4:37
in the final text of the European Union
4:39
law on AI and
4:41
we were successful . It's now there
4:43
. So this is just one of the three
4:46
main things we do . A second
4:48
example is vulnerability and data protection
4:50
, which is also , as you said , the
4:52
name of my book . We found
4:55
, within the hub , a group
4:57
called Vulnera . It's a research
4:59
network and dissemination
5:02
platform where we try
5:04
to focus on vulnerabilities and different
5:07
vulnerabilities in different situations and
5:09
in different groups . And last
5:11
but not least , part of
5:13
our research and activity is
5:16
about data governance and data
5:18
transfer , which is due to
5:20
the tradition of the Brussels Privacy Hub that
5:22
was founded by Paul De Hert and Christopher Kuner
5:24
. Christopher Kuner , in particular , was one of the
5:26
great scholars on the topic of
5:28
data transfer and data governance .
5:31
Oh wow , I learned a lot . I had no
5:33
idea about some of the things that
5:35
had transpired . That kind of led to the
5:37
Brussels Privacy Hub , and it
5:39
makes sense that you're located in Brussels
5:42
to have that kind of effect on policy in the EU
5:44
. You recently co-authored this
5:46
article , and the title is The
5:48
Unfair Side of Privacy Enhancing Technologies :
5:51
Addressing the Trade-offs Between PETs
5:54
and Fairness . Maybe you could tell us what
5:56
inspired you all to
5:58
write on this topic , like fairness
6:00
and PETs .
6:02
Yeah , sure , I think what inspired
6:05
me and my two co-authors
6:07
is mostly the
6:10
narrative and , how
6:12
to say , the distorted narrative
6:14
about privacy-enhancing technologies that
6:17
big industrial lobbies are
6:19
pushing for in
6:21
the EU , the US
6:24
and in general in global discussion
6:26
on technology regulation . We
6:29
had a lot of emphasis
6:31
on the importance and benefits
6:33
of privacy enhancing technologies . In the last
6:35
years we had important
6:38
initiatives about privacy enhancing technologies
6:41
and a
6:43
lot of explanation on the importance
6:46
, a lot of marketing , a lot of
6:48
advertisement on how great
6:50
privacy enhancing technologies are , and
6:52
we could agree to a certain extent . The
6:55
problem is that the narrative is
6:57
incomplete because , as
6:59
maybe we will say later , fairness
7:02
and privacy and data protection go much
7:04
beyond anonymization
7:06
and pseudonymization . It's
7:08
about power control
7:11
, power management and
7:13
also mitigation of
7:15
power imbalance in the
7:18
digital landscape . So
7:20
it's not just about not
7:22
identifying individuals
7:25
. It's also about
7:27
controlling and managing
7:29
power imbalances of
7:32
big dominant platforms , and
7:35
so for us , the main trigger
7:37
was that privacy-enhancing technologies
7:40
are important , but
7:42
are not the solutions for
7:44
the digital policy
7:46
challenges . It's not the solution we are
7:48
looking for .
7:50
So is it that it's not part of the solution , or is it not
7:52
sufficient and we need additional areas
7:54
to fill those gaps ?
7:56
Yeah , of course they can help
7:58
, but there are several problems
8:00
. They can help in general
8:03
to reduce the amount
8:05
of data and so also
8:07
to comply with some of the important principles
8:09
in data protection law
8:12
, both in Europe and US
8:14
, for example , purpose limitation
8:17
, data minimization , of course
8:19
. But I would like
8:21
to explore the two parts of your question .
8:23
First , why they're not sufficient
8:26
and second , why they can
8:28
also be somehow detrimental
8:30
, at least in their policy
8:33
impact . So for the first
8:35
part , why they're not sufficient
8:37
, as I tried to explain a few
8:39
minutes ago , privacy and data
8:41
protection are about power control . I
8:44
can manipulate people and
8:46
I can nudge people and I can
8:48
harm people online in
8:50
their digital life , even
8:52
if I cannot explicitly single
8:55
them out , even if I cannot
8:57
identify people . The
8:59
problem is they're not sufficient
9:01
because there's the whole harm
9:03
problem that is not entirely solved
9:05
just by anonymization
9:08
, pseudonymization , federated learning
9:10
, synthetic data and so
9:12
on , and
9:18
the whole problem of just pushing on privacy-enhancing technologies
9:20
is that we are losing and we are missing the
9:22
main part of competition
9:25
and power . I would like just to explain
9:28
this in a few sentences . Basically
9:30
, what's happening with privacy enhancing
9:33
technologies is that big companies
9:35
with great computational
9:37
capabilities , with huge amount
9:39
of training data sets are
9:41
the companies best placed
9:44
to practice and to implement
9:46
privacy enhancing technologies . They
9:49
will also have legal benefits
9:51
from it because if they can even
9:54
anonymize their data processing
9:56
, they might escape from most of the
9:58
GDPR so General Data
10:00
Protection Regulation duties . And
10:03
it's a paradox that the biggest companies
10:05
will be the companies that will not be
10:08
accountable for the GDPR
10:10
because they will be able to anonymize
10:12
or pseudonymize , etc . At
10:14
the same time , smaller companies that will not
10:17
have the power , the
10:19
computational power , the policy power
10:21
, the money to develop
10:24
these privacy-enhancing technologies , will
10:26
be the ones that will be mostly challenged
10:28
by GDPR rules , so by data
10:31
protection rules .
10:33
In
10:36
other terms , privacy enhancing technologies are not the solution because they will
10:38
create a distortion effect also on the markets
10:41
, where the less harmful
10:44
actors , like small and medium
10:46
enterprises , will be the ones that will still
10:48
need to comply with the law , while
10:50
the biggest players will probably
10:53
be partially exempted from the rules
10:55
. And maybe just one final thing
10:57
. We will explain it more
10:59
The mass use of
11:01
privacy-enhancing technologies might
11:04
also be detrimental to
11:06
some of the main values
11:08
and principles in data governance
11:11
and data regulation , which is diversity
11:13
and fairness considered in
11:15
a broad sense . For example , and
11:18
we will explain it later , synthetic
11:20
data or differential
11:22
privacy tend not to
11:24
consider minority groups
11:27
, and this is problematic for
11:29
diversity and bias detection .
11:31
Just fascinating . I mean you know I've been such
11:33
a champion of shifting left
11:36
into the product and development
11:38
life cycles and that ways to do
11:40
data minimization include privacy
11:42
enhancing technologies . But if you look at it as a monolith
11:44
and as just one big thing that just maybe
11:47
takes organizations outside of being
11:49
covered by regulations , then
11:51
you kind of miss the forest for the trees that maybe
11:53
potentially it can be abused
11:56
or monopolistic power could be
11:58
abusive by using these technologies . So
12:00
I'm really excited to dive in , if
12:03
you don't mind , telling us how you and
12:05
your team approach this topic in your paper
12:07
, and then we'll dive into the specifics .
12:10
Sure . So in this paper we tried to address the topic of the unfair side of PETs from two
12:20
perspectives : the legal one and the computer scientist one
12:22
. From the legal perspective , we
12:24
address mostly the concept
12:26
of fairness in its evolution
12:29
and development , starting from the
12:31
law , so from the General Data Protection
12:33
Regulation , from fair information
12:37
practices , from consumer
12:39
protection definitions in Europe and beyond
12:41
Europe , the two legal
12:44
authors , so me and my great
12:46
co-author Alessandra Calvi from the Vrije
12:48
Universiteit Brussel , who was also
12:50
the main driver behind
12:53
the paper , we tried to analyze
12:55
the concept of fairness and
12:57
how fairness
12:59
has been developed . First
13:02
we have fairness as diversity
13:04
, so fairness as non-discrimination
13:07
, which is the most accepted
13:09
meaning that computer
13:11
scientists seem to adopt
13:14
when they mention fairness . But
13:16
there's also a concept of fairness
13:18
related to power
13:21
imbalance and power control and imbalance
13:23
mitigation , which is a concept
13:25
that has been growing a lot from consumer
13:27
protection and now also data protection
13:30
. I wrote a paper about the concept of
13:32
fairness in the GDPR four
13:34
years ago and the conclusion
13:36
from a linguistic analysis of fairness
13:38
in many different systems of legislation
13:41
in many different countries and legal
13:43
frameworks , was fairness
13:46
as loyalty and fairness as
13:48
equality of arms and power
13:50
control . In parallel
13:52
, the technical
13:55
co-author , Professor Dimitris
13:57
Kotzinos from CY Cergy Paris
13:59
University ,
14:02
analyzed with us
14:04
the different privacy
14:06
enhancing technologies , looking
14:09
at their limits also from
14:11
the perspective of fairness
14:13
that we tried to develop in legal terms
14:16
. So it was kind of a dialogue between
14:18
different disciplines trying to understand first
14:20
fairness and second , how PETs
14:22
are not really fairness-friendly
14:24
, let's say .
14:26
Fascinating . At first glance the
14:28
concept of fairness seems kind of straightforward
14:30
to most people , but your paper really
14:33
highlights that the concept can mean different
14:35
things to engineers versus sociologists
14:37
, you know , with potential fairness
14:39
problems that include , like you said , bias
14:42
, discrimination , social injustices and market
14:44
power imbalances . I know you talked
14:46
a little bit about it already , but
14:48
can you unpack maybe each of
14:50
those fairness problems
14:52
and how they link to privacy ?
14:55
The link with privacy is both
14:58
in the law and in
15:00
a logical reasoning as
15:02
a consequence of the concept of privacy . So
15:04
in the law we have fairness
15:06
as one of the principles of data
15:09
protection . I'm , for example , focusing mostly on European Union
15:11
law because we know that in the States , for example , we don't have a federal law on privacy
15:13
and data protection . But in the European Union
15:19
, the fundamental rights to privacy
15:21
and data protection is mentioned in Article
15:24
8 of the European Charter of Fundamental
15:26
Rights , and that article
15:28
refers to fairness . So
15:30
there is a logical link
15:32
between fairness and privacy that even
15:32
the legislators identified several
15:35
years ago , because the article I'm referring
15:37
to in the Charter is from
15:39
2000 , so it is 24 years old .
15:41
The legislators already identified
15:46
these links and also
15:48
the GDPR , the General Data Protection Regulation
15:50
, has an explicit reference to fairness
15:52
in the guiding principles of data
15:54
protection . As you said , and
15:56
as I said before , there are different declinations
15:59
, different interpretations of fairness . We have
16:01
fairness as bias mitigation
16:03
, fairness as fight
16:06
against discrimination , fairness
16:08
as equality against
16:11
social injustices , and so not
16:13
just equality but equity and
16:15
fairness as market power imbalance
16:17
mitigation . I think all of
16:19
these interpretations are correct and they do not
16:22
contradict with each other . They
16:25
respond to the same challenge
16:27
, which is mitigating
16:29
harms that
16:32
algorithms and data
16:34
technologies can produce . Fairness
16:37
is kind of a safeguard against
16:39
these harms and
16:52
also , fairness is , if you allow me , the least legal of these concepts ; it's mostly
16:54
an ethical concept , because fairness is not a concept that lawyers can
16:56
clearly define . Indeed , yeah , as you said , you asked me
16:59
to unpack it . It's not easy to unpack it
17:01
, but I can say that bias
17:03
, for example , and discrimination are inherent
17:05
in data processing because , of course , the
17:08
effect of data processing is that
17:10
there might be incomplete
17:12
or insufficiently diverse data sets
17:14
that can lead to unfair conclusions
17:18
and unfair automated decisions
17:20
. But what about social injustice ? Social injustice
17:22
is a consequence of this . If
17:24
I process data in a way
17:27
that is incomplete and doesn't take into
17:29
account minorities , marginalized
17:31
groups , people at
17:34
the margins , social and economic
17:36
minorities , I
17:38
will be processing data
17:41
and taking decisions that will be
17:43
unfair , and we have a lot of examples . I
17:45
am in the Netherlands now . In the Netherlands , we
17:47
had a lot of scandals based on social
17:50
injustices based on inaccurate
17:53
and unfair data processing for public administration
17:56
. There was a scandal about child
17:58
benefits , but we don't
18:00
have time to address this now . Just to say
18:02
this is important and the other part
18:05
. Just to conclude fairness , as market
18:07
power imbalance mitigation
18:09
is also connected to data processing
18:11
. Why ? Because the big
18:14
power imbalance that we observe
18:16
between individuals and companies
18:18
and big techs in
18:20
the digital environment is
18:22
based on the huge amount of data that
18:25
big techs can process upon
18:27
us . I can just mention , very
18:29
briefly and simply , Shoshana
18:32
Zuboff's work , The
18:34
Age of Surveillance Capitalism
18:37
. Basically , what we observe
18:39
now is that capitalism is based on
18:42
data and surveillance
18:44
and behavioral surveillance . Exactly
18:46
, data protection is the tool to look at
18:48
power imbalance , because data
18:50
is power .
18:51
Again so fascinating . In the United
18:54
States , we talk about privacy , but we often
18:56
don't talk about data protection as a whole , whereas
18:58
in the EU , privacy is
19:00
a piece of the data protection
19:02
mandate , with privacy
19:04
being an enshrined right . I think a
19:06
lot of these big tech companies that you
19:08
reference are run by people
19:11
and then have employees who also
19:13
are not thinking in terms of data
19:15
protection , thinking larger than how
19:18
do I make sure that this person has
19:20
control over their own choices about
19:22
how their data is used ? Right ? It
19:24
is really great to hear from you this
19:26
reminder to think larger about
19:29
societal impacts , the socio-technical
19:31
understanding of fairness , and
19:33
especially wanted to also mention
19:35
that in the EU , the AI
19:38
Act also has a requirement
19:40
around fairness , which kind of leads me
19:42
into the next question , where let's
19:44
dive into some of the analysis of the paper
19:46
. But the first section was on PETs
19:49
for machine learning and AI , and then
19:51
you know , how does that relate to fairness ? So
19:53
let's first talk about data obfuscation
19:56
. That would be anonymization
19:58
, pseudonymization , synthetic
20:00
data , differential privacy , each
20:03
of which builds upon the concept of
20:05
data alteration . How
20:07
are they , as a group , relevant as solutions
20:10
, privacy-enhancing solutions for AI and
20:12
machine learning needs ? And
20:14
then maybe we could go through them more
20:17
specifically in my next question .
20:19
Sure . So I think you
20:21
addressed the main point
20:24
. Data obfuscation
20:26
has been considered
20:28
one of the most important privacy
20:31
preserving practices
20:33
for AI-driven
20:36
technologies . You mentioned
20:38
anonymization , pseudonymization
20:40
, synthetic data and differential privacy
20:43
. They are different but of course
20:45
they react
20:47
to the same challenge , which
20:49
is reducing
20:51
the identifiability of
20:54
single users , single
20:56
individuals , single
20:58
data subjects in
21:00
the digital environment . So
21:03
there is an overarching issue , which
21:05
is that privacy harms
21:07
are not just individual
21:10
harms , they can be collective
21:12
harms . Privacy
21:14
harms , not just in Europe but
21:16
also in wonderful
21:18
scholarship in the States , have been identified
21:21
as harm not just
21:23
to my private life , my personal
21:25
life in my toilet or in my
21:27
bedroom , but also my
21:30
work life , also democracy
21:32
and freedom of speech
21:35
as a connection to my
21:37
informational freedom . So
21:40
just to say anonymizing
21:42
, pseudonymizing , obfuscating
21:45
data , etc ., is
21:47
maybe not the solution
21:50
to collective harms to
21:52
privacy , because even if I cannot
21:54
identify you , I
21:57
can identify your group
21:59
or I can identify the best ways
22:02
to target you or
22:04
to limit your freedoms
22:06
in connection with your digital
22:08
life .
22:12
So even if I don't exactly
22:14
know your data , your personal data , your identifiable data , I
22:16
can still target you . This
22:19
is something I think is mostly
22:21
relevant for this discussion about anonymization
22:24
, synthetic data , etc . Something
22:27
else I wanted to say and I
22:29
already mentioned it before is
22:31
that usually if
22:33
we , for example , focus on synthetic
22:35
data and differential privacy , which are
22:38
very different practices because the
22:40
first synthetic data is based
22:42
on , as we can simplify
22:44
, a reproduction of a data set , so it's not based on real individual
22:47
data . But this synthetic
22:51
data , as a lot of computer scientists
22:53
have already identified , tends
22:56
to ignore minority
22:58
groups , tends not to
23:00
look at minorities
23:02
and outliers , and this
23:04
is also for differential
23:07
privacy . Differential privacy is something else
23:09
. Differential privacy is looking at aggregated
23:11
data and making analysis on aggregation
23:14
. But the statistical aggregation
23:16
, in order to protect privacy
23:19
and to limit re-identification
23:22
of single individuals in the aggregation , needs
23:25
to delete outliers
23:27
, needs not to consider the
23:30
maximum and the minimum
23:32
outliers , so it cannot consider
23:35
different groups . It
23:37
needs to look at the average . So this is
23:39
the main problem , right ? Data obfuscation
23:42
tends to simplify
23:44
all of humanity or all the data
23:46
sets to an average person
23:49
, and this doesn't help to
23:51
mitigate biases or to
23:54
represent society . If we have
23:56
to take a decision , even a democratic
23:58
decision based on AI and we cannot really
24:01
know what are the single groups and the different
24:03
minorities and outliers in the group
24:05
, because we cannot identify them and we don't
24:07
want to re-identify them .
24:09
We might have problems of representation ,
24:12
problems connected mostly to
24:15
collective harms of privacy and data
24:17
protection . I hope this answered your question
24:19
. Of course it's not easy to answer in
24:21
a few sentences .
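[Editor's sketch, not from the episode: a minimal Python illustration of the "average person" problem described above. The dataset, group labels and numbers are invented; it only shows how an aggregate mean hides a small group that a group-aware view would surface.]

```python
# Hypothetical usage data: hours per day, with a small minority group whose
# behaviour differs sharply from the majority.
records = (
    [{"group": "majority", "hours": h} for h in [2.0, 2.5, 1.8, 2.2, 2.1, 1.9, 2.4, 2.3]]
    + [{"group": "minority", "hours": h} for h in [9.5, 10.2]]  # the outliers that matter
)

# "Average person" view: one number for everyone; the minority pattern vanishes.
overall_mean = sum(r["hours"] for r in records) / len(records)
print(f"overall mean: {overall_mean:.2f} hours")

# Group-aware view: per-group means make the minority visible again.
for group in ("majority", "minority"):
    values = [r["hours"] for r in records if r["group"] == group]
    print(f"{group}: n={len(values)}, mean={sum(values) / len(values):.2f} hours")
```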
24:23
Yes , no , that was really helpful . Let's
24:25
go through some of those data obfuscation PETs
24:28
, maybe briefly explain their
24:30
intended benefit , maybe from a GDPR
24:32
perspective , and then , if there's
24:34
anything specific about each
24:36
one of them that ties to fairness , that'd be
24:38
helpful to understand the context around that . But
24:41
if it's already the summation you
24:43
just gave us , I don't want you to repeat yourself , so
24:45
just let me know . But let's start with anonymization
24:47
.
24:48
Anonymization is , you know , a bit
24:50
of an illusion . We know it's very
24:52
hard to anonymize data if we
24:54
still want to use data right , and
24:56
then of course it depends on which
24:59
is the purpose of our
25:01
data processing activity . But
25:03
in general in the GDPR
25:05
, so in the European Union data protection
25:07
law , it is very hard to reach
25:09
the anonymization level . There is a
25:11
big discussion about what is anonymization
25:14
, because the GDPR
25:17
seems to take a risk-based approach
25:19
, while the guidelines
25:22
of the European Data Protection
25:24
Board ( which actually date
25:27
back to the previous entity , the
25:29
entity before the Data Protection Board
25:31
was founded , so the Article 29
25:34
Working Party opinion ) , these
25:36
guidelines generally refer to anonymization
25:38
as a zero risk
25:40
of identification approach . So
25:43
basically , if there's even
25:45
a minimum risk of identifying
25:47
someone , it's not anonymous . Of
25:49
course it's impossible to reach that level
25:51
and that standard , right ? Because in
25:53
today's data processing
25:56
environment it's very
25:58
easy to identify someone
26:00
based on some proxies
26:03
, based on a lot of aggregated
26:05
data that we can use to
26:07
infer who is
26:09
a specific individual . So we know there's a
26:11
lot of scholarship on that . Let's just say that anonymization
26:14
is a theoretical
26:16
concept but not a practical
26:19
one , if you agree .
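[Editor's sketch, not from the episode: a toy Python example of the re-identification-by-proxies point. All names and attributes are invented; it shows how joining a "de-identified" table with auxiliary data on quasi-identifiers such as ZIP code, birth year and gender can single a person out again.]

```python
# Hypothetical "anonymized" records: direct identifiers removed, quasi-identifiers kept.
deidentified = [
    {"zip": "02139", "birth_year": 1984, "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
    {"zip": "10027", "birth_year": 1984, "gender": "F", "diagnosis": "flu"},
]

# Hypothetical auxiliary data, e.g. scraped from a public profile or a voter roll.
auxiliary = [{"name": "Alice Example", "zip": "02139", "birth_year": 1984, "gender": "F"}]

# Linkage attack: join the two tables on the quasi-identifiers.
for person in auxiliary:
    key = (person["zip"], person["birth_year"], person["gender"])
    matches = [
        row for row in deidentified
        if (row["zip"], row["birth_year"], row["gender"]) == key
    ]
    if len(matches) == 1:  # a unique match means the record is re-identified
        print(f'{person["name"]} re-identified with diagnosis: {matches[0]["diagnosis"]}')
```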
26:20
Yeah , no , in fact , it is kind of fascinating
26:23
because it is one of the few techniques
26:26
that's written into GDPR and
26:28
yet it is not that effective , because
26:30
you could combine a bunch of data
26:32
sets that can re-identify . So
26:35
anonymization techniques can easily
26:37
be broken these days ( not all of them
26:39
and not all of them easily ) , but it is
26:41
not the panacea that
26:43
many in corporations thought
26:45
it might be to take companies out of
26:47
the regulation ? What about
26:50
pseudonymization ? So things like tokenization
26:53
, masking , generalization and
26:55
other techniques .
27:05
Pseudonymization is much
27:07
easier to reach , because pseudonymization
27:09
doesn't mean that we cannot identify individuals anymore .
27:15
Pseudonymization means that we protect data
27:17
in a way that privacy attacks are less harmful . Why
27:19
? Because the identification of
27:21
a dataset is kept
27:23
separate from the dataset itself
27:25
. At least this is the GDPR
27:27
definition , so the European Union definition
27:29
of pseudonymization . There is a legal
27:32
difference and a legal implication if
27:34
we have anonymization
27:37
or pseudonymization . If
27:39
we apply anonymization
27:41
, which I said is very hard in
27:43
practice , the GDPR , so
27:45
the European data protection law does not apply at
27:47
all , and also the United
27:49
States laws , like the state
27:51
laws , for example , Colorado
27:53
, Washington , Virginia , different laws that we
27:55
have in the States wouldn't apply because
27:58
anonymization doesn't allow us to
28:00
identify people . For pseudonymization
28:02
, the situation is more complex because
28:04
the GDPR applies .
28:07
So
28:13
even if we pseudonymize data through tokenization or masking , etc ., we should still
28:15
comply with GDPR rules . So
28:17
pseudonymization doesn't solve
28:20
the compliance problem . But
28:22
if the pseudonymization is in place
28:25
, the data controllers
28:27
, so the companies that decide
28:29
how to use data and why ,
28:31
they can
28:33
prove that they protected
28:35
data , and this is helpful for daily
28:38
compliance . So if the regulator
28:41
wants to check about
28:43
compliance , they can always say
28:45
yes , I applied a good protection , which is pseudonymization . Of course
28:47
it depends on which kind of pseudonymization .
28:49
Just
28:52
to summarize : in
28:55
case of anonymization , we are out of the GDPR
28:57
. In case of pseudonymization , we still
28:59
need to apply the rules of the GDPR , but we have
29:01
sorts of safeguards in place
29:03
that will excuse us
29:06
and will protect us from
29:08
a regulator perspective .
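[Editor's sketch, not from the episode: a minimal Python illustration of the GDPR-style separation just described, with hypothetical field names. Records carry only a random token, and the token-to-identity table is kept apart; because that table still exists, the data remains personal data and the GDPR still applies.]

```python
import secrets

# Hypothetical raw records containing a direct identifier.
users = [
    {"email": "alice@example.com", "purchase": 42.0},
    {"email": "bob@example.com", "purchase": 13.5},
]

token_map = {}       # re-identification key, to be stored separately and access-controlled
pseudonymized = []   # working copy handed to analytics

for user in users:
    token = secrets.token_hex(8)          # random pseudonym, meaningless on its own
    token_map[token] = user["email"]      # the only link back to the person
    pseudonymized.append({"user_token": token, "purchase": user["purchase"]})

print(pseudonymized)                       # analysts see tokens, not emails
# With access to the separate table, re-identification is still possible:
print(token_map[pseudonymized[0]["user_token"]])
```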
29:10
And then what about synthetic data ? Yeah
29:13
, synthetic data .
29:14
well , it really depends on the purposes
29:16
for our data processing . We
29:18
can say that synthetic data are
29:21
a form of , let's say
29:23
, data obfuscation . That
29:25
might be very useful if we want
29:28
to train algorithms without
29:31
using training data sorry , without
29:33
using personal data , personal identifiable
29:36
data . So synthetic data is
29:38
a form , we can say , of
29:41
data minimization that
29:43
is very useful for , for
29:46
example , reducing
29:48
the legal risks and
29:51
so the possible sanctions if
29:54
we do data scraping
29:56
. So you know , most of data , most
29:58
of training systems , training systems
30:00
for AI are based on scraping
30:02
data from social media
30:05
, from big databases . It's basically
30:07
the download
30:09
or the processing of huge amount of
30:11
publicly available data on
30:14
Instagram , Twitter , Google
30:19
, whatever . Synthetic data might
30:21
be a solution to avoid the
30:23
harms produced by scraping
30:25
, but it's not harms to individuals , it's
30:28
harms to business interest mostly
30:31
, and also privacy harms , yeah
30:33
you know , and it really depends on how we process
30:36
, what is the purpose for this synthetic
30:38
data ? I think there's no single definition of
30:40
synthetic data from a legal perspective .
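[Editor's sketch, not from the episode: a toy Python example of the minority-representation concern raised earlier. The proportions are invented and the "generator" is deliberately crude; it only shows that sampling synthetic records from a distribution dominated by the majority can leave a rare group with almost no synthetic representation.]

```python
import random

random.seed(7)

# Hypothetical real population: 98% group A, 2% group B (the minority of interest).
real = ["A"] * 980 + ["B"] * 20

def generate_synthetic(data, n):
    """Crude stand-in for a generator: resample from the empirical distribution."""
    return [random.choice(data) for _ in range(n)]

synthetic = generate_synthetic(real, 100)
counts = {group: synthetic.count(group) for group in ("A", "B")}
print("synthetic counts:", counts)
# With 2% prevalence, group B often ends up with only a handful of synthetic
# records (sometimes none), too few to audit the model for bias against it.
```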
30:42
Yeah , that makes sense . It's a relatively newly designated
30:45
, you know , privacy enhancing technology , so I don't think it made
30:48
it into the regulation . And then
30:50
the last for that subheading would
30:52
be differential privacy and then , if
30:54
you want to also link it back to fairness , that'd
30:57
be helpful . Yeah .
30:58
So , as I already said , differential
31:00
privacy is a very problematic
31:03
practice because
31:05
, in a sense , it
31:07
reduces a lot the risks
31:10
of identification . So this is good
31:12
in terms of the traditional view of
31:14
privacy , right , the computer
31:16
scientist view of privacy , privacy
31:18
as non-identification
31:21
. But , as I said
31:23
before , differential privacy
31:25
is mostly based on
31:27
aggregated analysis
31:30
of data . The aggregation
31:32
of data can be
31:35
useful for companies
31:37
because , for example , they don't need to identify
31:39
individuals . Sometimes , if I just need to
31:41
understand how effective
31:44
was my
31:46
marketing activity
31:48
on social media , I can just
31:51
consider differential privacy aggregation
31:53
. So , basically , I just analyze
31:55
how my behavioral
31:57
advertising was translated
31:59
into some benefits
32:02
or time spent
32:04
by my users online . I don't
32:06
really need to identify individuals for that .
32:13
The problem is that
32:16
if differential privacy , as I already said , is considered an anonymization
32:18
technique , it
32:21
might exclude the full
32:23
application of data protection rules , which
32:25
has anti-competitive
32:27
consequences in the digital market
32:29
, in particular against smaller enterprises
32:32
. And , on the other hand , in
32:34
order to reduce identifiability
32:37
, differential privacy needs
32:39
to cut the outliers
32:42
. And so , as I was saying , differential
32:44
privacy might be problematic for
32:47
representation of minorities
32:49
and marginalized groups . A disclaimer
32:51
that I am trying to add
32:53
and I emphasize now , is
32:56
that all these technologies cannot
32:58
be considered in silos . So we are speaking
33:00
a bit transversely now , but
33:02
it really depends on what is the specific
33:05
business application of these technologies
33:07
. So my statements
33:09
might be very different if we consider one
33:11
aspect or another , one application or
33:14
another , one case study or another .
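[Editor's sketch, not from the episode and not tied to any particular DP library: a rough Python illustration of the outlier trade-off. The counts and the epsilon value are invented; the point is that Laplace noise calibrated for a count query is negligible next to a large group's count but comparable to a very small group's count.]

```python
import random

random.seed(0)

def laplace_noise(scale):
    """Laplace(0, scale) noise as the difference of two exponential samples."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

# Laplace mechanism for a counting query: sensitivity 1, so noise scale = 1 / epsilon.
epsilon = 0.1                  # a fairly strong privacy setting
scale = 1.0 / epsilon          # noise scale of 10

true_counts = {"majority group": 50_000, "small minority group": 12}
for group, count in true_counts.items():
    noisy = count + laplace_noise(scale)
    print(f"{group}: true count = {count}, noisy count = {noisy:.1f}")

# Noise of this size is invisible next to 50,000 but is of the same order as 12,
# which is one reason aggregate, DP-style releases wash out very small groups
# (and why tiny counts are often suppressed outright).
```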
33:16
That makes a lot of sense . No , definitely
33:18
. And then there's also the paper
33:21
goes into detail on encrypted
33:23
data processing tools , as well as federated
33:25
and distributed analytics . And
33:28
you know , in interest of time and instead
33:30
of going through each of those specifically
33:32
, do you want to make any connections for
33:34
the audience about those
33:36
privacy-enhancing technology categories
33:38
and fairness and what you found in your
33:40
research ?
33:41
Sure , yeah , I mean , I
33:43
think an important aspect of the paper , as you
33:45
also suggest , is that
33:47
we do not say that PETs
33:50
should be avoided . There
33:52
are some benefits in privacy-enhancing
33:55
technologies . We just say that
33:57
they should be just considered one
33:59
of the possible safeguards in place
34:01
, together with many others . So
34:03
for encrypted data , which is
34:05
also considered in legal terms
34:08
a form of enhanced pseudonymization
34:11
, we suggest
34:13
that privacy-enhancing technologies
34:16
are a good safeguard . We
34:18
just say that the whole
34:20
fairness discussion , as I said before
34:23
, in terms of bias
34:25
, detection , diversity , representation
34:27
, power mitigation , is not
34:30
addressed by , for example
34:32
, encryption .
34:33
Awesome . Thank you for that . So we
34:36
kind of just went through an exploration of specific
34:38
groups of privacy enhancing technologies , but
34:40
now I want to turn to some of the
34:42
technical and regulatory solutions that
34:45
address some of these PET shortcomings
34:47
. Your team lists
34:49
three main PET shortcomings
34:52
in its research . When it comes to PETs
34:54
and again you've alluded to these , but
34:56
I'll restate them : bias discovery
34:59
; harms to people belonging to protected
35:02
groups ; and individual autonomy and market
35:04
imbalance . What
35:07
technical and regulatory solutions do you propose
35:09
to address each of these shortcomings ? First
35:12
, let's start with PETs and bias
35:14
discovery .
35:15
We are not sure
35:17
that we can really propose immediately
35:20
applicable solutions . But of course
35:22
, I think , as I said before
35:24
privacy-enhancing technologies should
35:26
not be the sole safeguards in
35:28
place . So for bias discovery , there's
35:31
a lot that we can do . First
35:33
of all , we shouldn't look
35:35
always for automated
35:38
solutions . So I think this is important
35:40
, also from a legal
35:42
scholar like me , as a message
35:45
that automation is
35:47
not always the solution to
35:49
automation problems . If
35:52
some problems were inherent
35:54
in automation , the
35:56
solution might be just different
35:58
, like social business
36:01
, et cetera . I will try to explain better
36:03
. For bias discovery , for example
36:05
, one of the most interesting ongoing
36:08
discussion is involvement
36:11
of impacted groups in
36:13
the assessment of a technology
36:16
, in the assessment of harms and in the
36:18
assessment of impacts of
36:20
technologies on fundamental rights . If
36:22
we need to discover biases of
36:25
AI , which now
36:27
are also problems , for example
36:29
for generative AI , like hallucinations
36:31
or misalignment , et cetera , we
36:34
need impacted groups
36:36
to stand up and to help
36:38
the AI developers
36:40
to identify gaps and
36:42
issues . Basically , what I'm
36:45
saying here is that we should look
36:47
at business models , not just the technologies
36:49
. We should look at how different
36:51
business models address solutions
36:54
and decisions and how these decisions
36:57
can be modified and improved
36:59
and how we can empower impacted
37:02
groups . I don't think we will ever
37:04
have an automated bias
37:06
discovery solution , but
37:09
of course there are very good
37:11
bias discovery
37:13
solutions that might
37:16
benefit from participatory
37:19
approaches , from participation
37:21
of impacted individuals in the impact assessment
37:24
.
37:25
Fascinating . What about people belonging
37:27
to protected groups ? What are some of the
37:29
? You know that is a shortcoming
37:31
that was highlighted with PETs , that
37:33
it doesn't appropriately address those marginalized
37:36
groups or protected groups . Would
37:38
you suggest a similar technical
37:40
and regulatory solution as you just
37:42
did with bias discovery , or is there something else ?
37:45
Yeah , I mean , as I said before , the
37:47
biggest problem about impacted groups
37:49
is that they are underrepresented
37:51
and they are the most impacted
37:54
groups , so the groups that
37:56
have the most adverse impacts
37:58
in terms of technology applications
38:01
. So there's a problem here
38:03
, which is a problem of democratic participation
38:05
, but also a problem of decision-making
38:09
and fairness in practice
38:11
. Some of the solutions is
38:13
indeed participation and
38:15
multi-stakeholder participation . I'm
38:18
just publishing now I mean next
38:20
month a co-author and I will
38:22
publish an article about
38:25
stakeholder participation . The co-author
38:27
is Margot Kaminski from Colorado Law
38:29
School and the
38:31
journal is the Yale Journal of Law and Technology . We
38:33
are trying to discuss how privacy
38:36
governance so data governance and
38:38
AI governance can be improved by
38:41
multistakeholder participation , in particular
38:44
, for people belonging to protected
38:46
groups . There's a problem , of course , and
38:48
the problem is how to define these groups
38:51
. Should we just rely on non-discrimination
38:54
laws defining protected
38:56
groups , or should we rely on something
38:58
else ? This is an ongoing discussion . We
39:00
don't have time now to address this , but
39:02
we can , of course , start from
39:05
the most vocal and most visible
39:07
groups impacted by technologies
39:10
. I can make three , four examples Children
39:13
, older people , racialized
39:16
communities , victims of gender-based
39:19
violence , lgbti
39:21
plus communities , and I could
39:23
go on , but we could start from these
39:25
groups and look at
39:27
how , together with privacy enhancing
39:29
technologies , diversity of
39:31
these groups could be considered
39:33
. So , just to be very practical , we
39:35
apply privacy enhancing technologies , for example
39:37
, in a business model , but then
39:40
we check the impact with
39:43
impacted groups . So , basically , we
39:46
put the privacy-enhancing technologies' effects
39:48
into a bigger and
39:50
broader multi-stakeholder decision-making
39:52
where impacted groups' representatives
39:55
can express their views .
39:57
That's awesome . I really look forward to
39:59
reading that paper when it comes out
40:01
, In addition to the paper we're discussing
40:03
today , which I will include in
40:05
our show notes . I will also update
40:08
the show notes to include a link to your
40:10
future paper once you publish it
40:12
. The last but
40:14
not least area where there's
40:16
a shortcoming would be individual autonomy
40:18
and market imbalances . Talk
40:21
to us a little bit about what potential solutions
40:24
to this shortcoming would be .
40:25
Yeah , of course we cannot discuss
40:27
, as I said before , privacy-enhancing
40:29
technologies in general . We should always
40:32
look at how single
40:34
privacy-enhancing technology practices
40:37
are affecting
40:39
some of the fairness components in
40:41
practice . But what I
40:43
might say is that market
40:46
imbalance should be regulated
40:48
not just for privacy
40:51
in the narrow sense , but
40:53
we should consider a lot of
40:55
different obligations that
40:57
can reduce market
40:59
dominance . I will make a simple example
41:01
In the European Union , two years ago
41:04
, the Digital Markets Act was
41:06
approved . The Digital Markets Act
41:08
is an important power rebalancing
41:11
tool , imposing a lot of duties
41:13
in terms of competition
41:16
law and fair access
41:18
to data and also consent
41:21
to data processing . So , just referring
41:23
also to individual autonomy that you mentioned
41:25
, the DMA , the Digital Markets
41:27
Act is an important tool
41:30
that complements privacy
41:32
and data protection . Just to say
41:34
, privacy-enhancing technologies are a great
41:36
tool that should be complemented
41:39
by specific rules
41:42
in terms of market control . This
41:44
is clear , for example , in
41:46
reducing abusive practices
41:49
that can happen when
41:51
big techs might
41:54
manipulate individuals or might
41:56
exploit dependencies , because this is
41:58
another problem . I didn't mention that term so
42:00
far , but dependency is the
42:02
problem that we really want
42:05
to address . We depend
42:07
on social media , we depend on big
42:09
techs , we depend on social giants
42:11
, and this dependency is
42:13
the real power imbalance problem
42:16
. So the states should take
42:18
a position against these dependencies
42:21
, either imposing rules
42:24
and fundamental rights enforcement
42:26
duties on big techs
42:28
or prohibiting some
42:31
abusive practices .
42:33
It's a lot to think about . I'm not sure there's the
42:35
political will to make it happen , but we'll see if
42:37
we can get a federal law
42:39
that embodies all of that . What
42:41
were some of your team's conclusions at
42:44
the end of writing this , and where might there
42:46
be some areas where you might want to do
42:48
some more research , or more research is needed ?
42:50
Sure , I think a lot of research
42:53
is still needed . Just to
42:55
make some examples , we
42:57
couldn't go deeper on
42:59
each single privacy-enhancing technology
43:02
in practice , and also we
43:05
should look at how , for example , generative
43:07
AI is altering the
43:09
discussion . So our paper didn't
43:11
consider generative AI challenges
43:13
, but of course this is perhaps
43:15
chapter two of our
43:17
activity . How can
43:20
privacy-enhancing technologies help
43:23
or not help for hallucinations
43:26
and misalignment of generative
43:28
AI systems , where
43:31
fairness is a big problem , because
43:33
we know that hallucination and
43:35
misalignment can produce
43:37
discrimination on generative AI
43:39
. For example , chatbots or image search
43:42
engines can produce
43:44
stereotypes , can induce
43:47
harms . So of course
43:49
, these are some of the areas that we
43:51
need to investigate
43:53
in the future and it's just part of
43:55
the problem .
43:56
I think that really sums up a lot of what's needed
43:58
. In fact , I'll be on the lookout for some working
44:01
groups or standards or
44:03
just more research coming out on the topic
44:05
. I think one of the things I've been thinking
44:07
about doing myself ( I'm kind of surprised
44:09
I haven't seen much out on the market around it )
44:11
is a listing of all of the privacy
44:14
enhancing technologies based on different use cases
44:16
, but also based on what
44:18
is the privacy guarantee
44:21
that the organization wants to ensure
44:23
by using a PET
44:25
, and then working backwards to
44:27
see which
44:31
PET or set of PETs
44:33
would get that job done . But
44:36
this conversation has made me really think
44:38
about . We need to think broader than
44:40
just can we do the thing ? Can
44:43
we achieve this end goal and instead
44:45
broaden it to also include are we
44:47
being fair to the individual and
44:49
to , like society generally , the
44:52
group of individuals ? So really
44:54
a lot to think about . Thank you so much
44:56
for your time today . Are there any words of
44:58
wisdom that you'd like to leave the audience with before
45:00
we close today ?
45:01
I think we all
45:03
, as a scientific
45:06
and technological and policy
45:09
community , should think bigger about
45:12
privacy-enhancing technologies
45:14
and shift from PETs
45:17
to FETs . So
45:19
fairness-enhancing technologies , which
45:21
is not so difficult to reach , we
45:23
just need to think a bit more critically
45:26
and a bit broader . And what
45:28
are the real goals ? The real goals
45:30
are protecting
45:32
the most impacted groups
45:35
, the most marginalized and
45:37
impacted groups in the digital environments
45:40
.
45:40
What a great idea and really
45:42
elevating it beyond just privacy
45:45
to meet fairness . So you'll meet a lot of goals
45:47
there , right ? Especially
45:50
if you apply it to AI . Well , thank you so
45:52
much , Gianclaudio . Thank you for
45:54
joining us today on the Shifting Privacy Left
45:56
podcast . Until next Tuesday
45:58
, everyone , when we'll be back with engaging content
46:01
and another great guest . Thank
46:03
you so much , bye-bye . Thanks
46:07
for joining us this week on Shifting Privacy
46:10
Left . Make
46:16
sure to visit our website , shiftingprivacyleft.com , where you can subscribe to updates so you'll
46:18
never miss a show . While you're at it , if you found this episode
46:20
valuable , go ahead and share it with a friend
46:23
, and if you're an engineer who
46:25
cares passionately about privacy , check
46:27
out Privado , the developer-friendly
46:29
privacy platform and sponsor of this show
46:31
. To learn more , go to privado.ai
46:34
. Be sure to tune in next Tuesday
46:36
for a new episode . Bye for now .