Episode Transcript
0:48
We all , as a scientific
0:52
and technological and policy
0:54
community , should think bigger about
0:57
privacy-enhancing technologies
1:00
and shift from PETs
1:02
to FETs . So
1:04
fairness-enhancing technologies , which
1:06
is not so difficult to reach . We
1:09
just need to think a bit more critically
1:11
and a bit broader . And what
1:13
are the real goals ? The real goals
1:15
are protecting
1:17
the most impacted
1:19
groups , the most marginalized
1:22
and impacted groups in the
1:24
digital environments .
1:26
Hello , I am Debra J Farber
1:28
. Welcome to The Shifting Privacy
1:30
Left Podcast , where we talk about
1:32
embedding privacy by design and default into
1:34
the engineering function to prevent
1:36
privacy harms to humans and
1:39
to prevent dystopia . Each
1:41
week , we'll bring you unique discussions with
1:43
global privacy technologists and innovators
1:45
working at the bleeding edge of privacy research
1:47
and emerging technologies , standards
1:50
, business models and ecosystems . Welcome
1:54
everyone to the Shifting Privacy Left podcast
1:57
. I'm your host and resident privacy
1:59
guru , Debra J Farber . Today
2:02
, I'm delighted to welcome my next guest , Gianclaudio
2:05
Malgieri , Associate
2:07
Professor of Law and Technology at Leiden
2:10
University ; Co-Director of
2:12
the Brussels Privacy Hub ;
2:15
editor of the Computer Law and Security
2:17
Review ; author of the book
2:20
Vulnerability and Data Protection
2:22
Law ; and expert in privacy
2:24
, data protection , AI regulation , law
2:26
and technology , EU law and
2:29
human rights . Today , we're going to be discussing
2:31
his recently co-authored paper on the
2:34
fairness of privacy-enhancing technologies .
2:37
Welcome , Gianclaudio !
2:39
Thank you , Debra , I'm very happy to be here .
2:41
Excellent . Well , I know we're going
2:44
to have a really lively conversation , but
2:46
before we get into your paper , I'd love
2:48
for you to tell us a little bit about some of your academic
2:50
work with the Brussels Privacy Hub
2:53
, and you know how you got into privacy to
2:55
begin with .
2:56
Yeah , the Brussels Privacy Hub is
2:58
a special think tank based
3:00
at the Vrije Universiteit Brussel in
3:02
Brussels , and the position
3:04
of the hub in Brussels is really helpful
3:07
for engaging with policy
3:09
making , academia
3:11
and all the countries
3:14
in Europe that are very active in terms of
3:16
academic efforts for privacy
3:19
technology , regulation and policy
3:21
. But I'm also , as you said , working
3:24
full-time in Leiden
3:26
, which is the oldest Dutch university
3:29
and one of the biggest
3:31
law faculties in the Netherlands , so
3:33
I'm trying to exploit
3:35
these links . But I'm also Italian , so
3:37
I'm connecting with the Mediterranean tradition
3:40
on privacy and
3:42
data protection , also trying to connect
3:44
bridges with the US because I'm part of the
3:46
privacy law scholarship conferences in the
3:48
States . The hub is
3:50
trying to push on several different
3:52
aspects through research , but
3:55
also trying to push for
3:57
important debates . Just
3:59
to make some examples of our activity
4:02
. I would like to mention three main
4:04
topics that we are exploring now
4:06
. The first is impact assessments
4:09
and fundamental rights , so how technologies
4:11
can be considered from an
4:13
impact assessment perspective . In particular
4:16
, we looked at the Artificial Intelligence Act
4:18
in the European Union , which was very
4:21
important for the impact
4:23
assessment discussion . The Brussels
4:25
Privacy Hub pioneered a
4:27
letter signed by 160
4:30
university professors to push
4:32
the legislators to add a
4:35
solid fundamental rights impact assessment
4:37
in the final text of the European Union
4:39
law on AI and
4:41
we were successful . It's now there
4:43
. So this is just one of the three
4:46
main things we do . A second
4:48
example is vulnerability and data protection
4:50
, which is also , as you said , the
4:52
name of my book . We found
4:55
, within the hub , a group
4:57
called Vulnera . It's a research
4:59
network and dissemination
5:02
platform where we try
5:04
to focus on vulnerabilities and different
5:07
vulnerabilities in different situations and
5:09
in different groups . And last
5:11
but not least , part of
5:13
our research and activity is
5:16
about data governance and data
5:18
transfer , which is due to
5:20
the tradition of the Brussels Privacy Hub that
5:22
was founded by Paul De Hert and Christopher Kuner
5:24
. Christopher Kuner , in particular , was one of the
5:26
great scholars on the topic of
5:28
data transfer and data governance .
5:31
Oh wow , I learned a lot . I had no
5:33
idea about some of the things that
5:35
had transpired . That kind of led to the
5:37
Brussels Privacy Hub , and it
5:39
makes sense that you're located in Brussels
5:42
to have that kind of effect on policy in the EU
5:44
. You recently co-authored this
5:46
article , and the title is The
5:48
Unfair Side of Privacy Enhancing Technologies :
5:51
Addressing the Trade-offs Between PETs
5:54
and Fairness . Maybe you could tell us what
5:56
inspired you all to
5:58
write on this topic , like fairness
6:00
and PETs .
6:02
Yeah , sure , I think what inspired
6:05
me and my two co-authors
6:07
is mostly the
6:10
narrative and , how
6:12
to say , the distorted narrative
6:14
about privacy-enhancing technologies that
6:17
big industrial lobbies are
6:19
pushing for in
6:21
the EU , the US
6:24
and in general in global discussion
6:26
on technology regulation . We
6:29
had a lot of emphasis
6:31
on the importance and benefits
6:33
of privacy enhancing technologies . In the last
6:35
years we had important
6:38
initiatives about privacy enhancing technologies
6:41
and a
6:43
lot of explanation on the importance
6:46
, a lot of marketing , a lot of
6:48
advertisement on how great
6:50
privacy enhancing technologies are , and
6:52
we could agree to a certain extent . The
6:55
problem is that the narrative is
6:57
incomplete because , as
6:59
maybe we will say later , fairness
7:02
and privacy and data protection go much
7:04
beyond anonymization
7:06
and pseudonymization . It's
7:08
about power control
7:11
, power management and
7:13
also mitigation of
7:15
power imbalance in the
7:18
digital landscape . So
7:20
it's not just about not
7:22
identifying individuals
7:25
. It's also about
7:27
controlling and managing
7:29
power imbalances of
7:32
big dominant platforms , and
7:35
so for us , the main trigger
7:37
was that privacy-enhancing technologies
7:40
are important , but
7:42
are not the solutions for
7:44
the digital policy
7:46
challenges . It's not the solution we are
7:48
looking for .
7:50
So is it that it's not part of the solution , or is it not
7:52
sufficient and we need additional areas
7:54
to fill those gaps ?
7:56
Yeah , of course they can help
7:58
, but there are several problems
8:00
. They can help in general
8:03
to reduce the amount
8:05
of data and so also
8:07
to comply with some of the important principles
8:09
in data protection law
8:12
, both in Europe and US
8:14
, for example , purpose limitation
8:17
, data minimization , of course
8:19
. But I would like
8:21
to explore the two parts of your question .
8:23
First , why they're not sufficient
8:26
and second , why they can
8:28
also be somehow detrimental
8:30
, at least in their policy
8:33
impact . So for the first
8:35
part , why they're not sufficient
8:37
, as I tried to explain a few
8:39
minutes ago , privacy and data
8:41
protection are about power control . I
8:44
can manipulate people and
8:46
I can nudge people and I can
8:48
harm people online in
8:50
their digital life , even
8:52
if I cannot explicitly single
8:55
them out , even if I cannot
8:57
identify people . The
8:59
problem is they're not sufficient
9:01
because there's the whole harm
9:03
problem that is not entirely solved
9:05
just by anonymization
9:08
, pseudonymization , federated learning
9:10
, synthetic data and so
9:12
on , and
9:18
the whole problem of just pushing on privacy-enhancing technologies
9:20
is that we are losing and we are missing the
9:22
main part of competition
9:25
and power . I would like just to explain
9:28
this in a few sentences . Basically
9:30
, what's happening with privacy enhancing
9:33
technologies is that big companies
9:35
with great computational
9:37
capabilities , with huge amount
9:39
of training data sets are
9:41
the companies best placed
9:44
to practice and to implement
9:46
privacy enhancing technologies . They
9:49
will also have legal benefits
9:51
from it because if they can even
9:54
anonymize their data processing
9:56
, they might escape from most of the
9:58
GDPR so General Data
10:00
Protection Regulation duties . And
10:03
it's a paradox that the biggest companies
10:05
will be the companies that will not be
10:08
accountable for the GDPR
10:10
because they will be able to anonymize
10:12
or pseudonymize , etc . At
10:14
the same time , smaller companies that will not
10:17
have the power , the
10:19
computational power , the policy power
10:21
, the money to develop
10:24
these privacy-enhancing technologies , will
10:26
be the ones that will be mostly challenged
10:28
by GDPR rules , so by data
10:31
protection rules .
10:33
In
10:36
other terms , privacy enhancing technologies are not the solution because they will
10:38
create a distortion effect also on the markets
10:41
, where the less harmful
10:44
actors , like small and medium
10:46
enterprises , will be the ones that will still
10:48
need to comply with the law , while
10:50
the biggest players will probably
10:53
be partially exempted from the rules
10:55
. And maybe just one final thing
10:57
. We will explain it more
10:59
The mass use of
11:01
privacy-enhancing technologies might
11:04
also be detrimental to
11:06
some of the main values
11:08
and principles in data governance
11:11
and data regulation , which is diversity
11:13
and fairness considered in
11:15
a broad sense . For example , and
11:18
we will explain it later , synthetic
11:20
data or differential
11:22
privacy tend not to
11:24
consider minority groups
11:27
, and this is problematic for
11:29
diversity and bias detection .
11:31
Just fascinating . I mean you know I've been such
11:33
a champion of shifting left
11:36
into the product and development
11:38
life cycles and that ways to do
11:40
data minimization include privacy
11:42
enhancing technologies . But if you look at it as a monolith
11:44
and as just one big thing that just maybe
11:47
takes organizations outside of being
11:49
covered by regulations , then
11:51
you kind of miss the forest for the trees that maybe
11:53
potentially it can be abused
11:56
or monopolistic power could be
11:58
abusive by using these technologies . So
12:00
I'm really excited to dive in , if
12:03
you don't mind , telling us how you and
12:05
your team approach this topic in your paper
12:07
, and then we'll dive into the specifics .
12:10
Sure . So in this paper we tried to address the topic of the unfair side of PETs from two
12:20
perspectives : the legal one and the computer scientist one
12:22
. From the legal perspective , we
12:24
address mostly the concept
12:26
of fairness in its evolution
12:29
and development , starting from the
12:31
law , so from the General Data Protection
12:33
Regulation , from fair information
12:37
practices , from consumer
12:39
protection definitions in Europe and beyond
12:41
Europe , the two legal
12:44
authors , so me and my great
12:46
co-author Alessandra Calvi from the Vrije
12:48
Universiteit Brussel , who was also
12:50
the main driver behind
12:53
the paper , we tried to analyze
12:55
the concept of fairness and
12:57
how fairness
12:59
has been developed . First
13:02
we have fairness as diversity
13:04
, so fairness as non-discrimination
13:07
, which is the most accepted
13:09
meaning that computer
13:11
scientists seem to adopt
13:14
when they mention fairness . But
13:16
there's also a concept of fairness
13:18
related to power
13:21
imbalance and power control and imbalance
13:23
mitigation , which is a concept
13:25
that has been growing a lot from consumer
13:27
protection and now also data protection
13:30
. I wrote a paper about the concept of
13:32
fairness in the GDPR four
13:34
years ago and the conclusion
13:36
from a linguistic analysis of fairness
13:38
in many different systems of legislation
13:41
in many different countries and legal
13:43
frameworks , was fairness
13:46
as loyalty and fairness as
13:48
equality of arms and power
13:50
control . In parallel
13:52
, the technical
13:55
co-author , Professor Dimitris
13:57
Kotzinos from CY Cergy Paris
13:59
University ,
14:02
analyzed with us
14:04
the different privacy
14:06
enhancing technologies , looking
14:09
at their limits also from
14:11
the perspective of fairness
14:13
that we tried to develop in legal terms
14:16
. So it was kind of a dialogue between
14:18
different disciplines trying to understand first
14:20
fairness and second , how PETs
14:22
are not really fairness-friendly
14:24
, let's say .
14:26
Fascinating . At first glance the
14:28
concept of fairness seems kind of straightforward
14:30
to most people , but your paper really
14:33
highlights that the concept can mean different
14:35
things to engineers versus sociologists
14:37
, you know , with potential fairness
14:39
problems that include , like you said , bias
14:42
, discrimination , social injustices and market
14:44
power imbalances . I know you talked
14:46
a little bit about it already , but
14:48
can you unpack maybe each of
14:50
those fairness problems
14:52
and how they link to privacy ?
14:55
The link with privacy is both
14:58
in the law and in
15:00
a logical reasoning as
15:02
a consequence of the concept of privacy . So
15:04
in the law we have fairness
15:06
as one of the principles of data
15:09
protection . I'm , for example , focusing mostly on European Union
15:11
law because we know that in the States , for example , we don't have a federal law on privacy
15:13
and data protection . But in the European Union
15:19
, the fundamental rights to privacy
15:21
and data protection is mentioned in Article
15:24
8 of the European Charter of Fundamental
15:26
Rights , and that article
15:28
refers to fairness . So
15:30
there is a logical link
15:32
between fairness and privacy that even
15:32
the legislators identified several
15:35
years ago , because the article I'm referring
15:37
to in the Charter is from
15:39
2000 , so it is 24 years old .
15:41
The legislators already identified
15:46
these links and also
15:48
the GDPR , the General Data Protection Regulation
15:50
, has an explicit reference to fairness
15:52
in the guiding principles of data
15:54
protection . As you said , and
15:56
as I said before , there are different declinations
15:59
, different interpretations of fairness . We have
16:01
fairness as bias mitigation
16:03
, fairness as fight
16:06
against discrimination , fairness
16:08
as equality against
16:11
social injustices , and so not
16:13
just equality but equity and
16:15
fairness as market power imbalance
16:17
mitigation . I think all of
16:19
these interpretations are correct and they do not
16:22
contradict with each other . They
16:25
respond to the same challenge
16:27
, which is mitigating
16:29
harms that
16:32
algorithms and data
16:34
technologies can produce . Fairness
16:37
is kind of a safeguard against
16:39
these harms and
16:52
also , fairness is , if you allow me , the least legal of these concepts ; it's mostly
16:54
an ethical concept , because fairness is not a concept that lawyers can
16:56
clearly define . Indeed , yeah , as you said , you asked me
16:59
to unpack it . It's not easy to unpack it
17:01
, but I can say that bias
17:03
, for example , and discrimination are inherent
17:05
in data processing because , of course , the
17:08
effect of data processing is that
17:10
there might be incomplete
17:12
or insufficiently diverse data sets
17:14
that can lead to unfair conclusions
17:18
and unfair automated decisions
17:20
. But what about social injustice ? Social injustice
17:22
is a consequence of this . If
17:24
I process data in a way
17:27
that is incomplete and doesn't take into
17:29
account minorities , marginalized
17:31
groups , people at
17:34
the margins , social and economic
17:36
minorities , I
17:38
will be processing data
17:41
and taking decisions that will be
17:43
unfair , and we have a lot of examples . I
17:45
am in the Netherlands now . In the Netherlands , we
17:47
had a lot of scandals based on social
17:50
injustices based on inaccurate
17:53
and unfair data processing for public administration
17:56
. There was a scandal about child
17:58
benefits , but we don't
18:00
have time to address this now . Just to say
18:02
this is important and the other part
18:05
. Just to conclude fairness , as market
18:07
power imbalance mitigation
18:09
is also connected to data processing
18:11
. Why ? Because the big
18:14
power imbalance that we observe
18:16
between individuals and companies
18:18
and big techs in
18:20
the digital environment is
18:22
based on the huge amount of data that
18:25
big techs can process upon
18:27
us . I can just mention , very
18:29
briefly and simply , Shoshana
18:32
Zuboff's work , The
18:34
Age of Surveillance Capitalism
18:37
. Basically , what we observe
18:39
now is that capitalism is based on
18:42
data and surveillance
18:44
and behavioral surveillance . Exactly
18:46
, data protection is the tool to look at
18:48
power imbalance , because data
18:50
is power .
18:51
Again so fascinating . In the United
18:54
States , we talk about privacy , but we often
18:56
don't talk about data protection as a whole , whereas
18:58
in the EU , privacy is
19:00
a piece of the data protection
19:02
mandate , with privacy
19:04
being an enshrined right . I think a
19:06
lot of these big tech companies that you
19:08
reference are run by people
19:11
and then have employees who also
19:13
are not thinking in terms of data
19:15
protection , thinking larger than how
19:18
do I make sure that this person has
19:20
control over their own choices about
19:22
how their data is used ? Right ? It
19:24
is really great to hear from you this
19:26
reminder to think larger about
19:29
societal impacts , the socio-technical
19:31
understanding of fairness , and
19:33
especially wanted to also mention
19:35
that in the EU , the AI
19:38
Act also has a requirement
19:40
around fairness , which kind of leads me
19:42
into the next question , where let's
19:44
dive into some of the analysis of the paper
19:46
. But the first section was on PETs
19:49
for machine learning and AI , and then
19:51
you know , how does that relate to fairness ? So
19:53
let's first talk about data obfuscation
19:56
. That would be anonymization
19:58
, pseudonymization , synthetic
20:00
data , differential privacy , each
20:03
of which builds upon the concept of
20:05
data alteration . How
20:07
are they , as a group , relevant as solutions
20:10
, privacy-enhancing solutions for AI and
20:12
machine learning needs ? And
20:14
then maybe we could go through them more
20:17
specifically in my next question .
20:19
Sure . So I think you
20:21
addressed the main point
20:24
. Data obfuscation
20:26
has been considered
20:28
one of the most important privacy
20:31
preserving practices
20:33
for AI-driven
20:36
technologies . You mentioned
20:38
anonymization , pseudonymization
20:40
, synthetic data and differential privacy
20:43
. They are different but of course
20:45
they react
20:47
to the same challenge , which
20:49
is reducing
20:51
the identifiability of
20:54
single users , single
20:56
individuals , single
20:58
data subjects in
21:00
the digital environment . So
21:03
there is an overarching issue , which
21:05
is that privacy harms
21:07
are not just individual
21:10
harms , they can be collective
21:12
harms . Privacy
21:14
harms , not just in Europe but
21:16
also in wonderful
21:18
scholarship in the States , have been identified
21:21
as harm not just
21:23
to my private life , my personal
21:25
life in my toilet or in my
21:27
bedroom , but also my
21:30
work life , also democracy
21:32
and freedom of speech
21:35
as a connection to my
21:37
informational freedom . So
21:40
just to say anonymizing
21:42
, pseudonymizing , obfuscating
21:45
data , etc ., is
21:47
maybe not the solution
21:50
to collective harms to
21:52
privacy , because even if I cannot
21:54
identify you , I
21:57
can identify your group
21:59
or I can identify the best ways
22:02
to target you or
22:04
to limit your freedoms
22:06
in connection with your digital
22:08
life .
22:12
So even if I don't exactly
22:14
know your data , your personal data , your identifiable data , I
22:16
can still target you . This
22:19
is something I think is mostly
22:21
relevant for this discussion about anonymization
22:24
, synthetic data , etc . Something
22:27
else I wanted to say and I
22:29
already mentioned it before is
22:31
that usually if
22:33
we , for example , focus on synthetic
22:35
data and differential privacy , which are
22:38
very different practices because the
22:40
first synthetic data is based
22:42
on , as we can simplify
22:44
, a reproduction of a data set , so it's not based on real individual
22:47
data . But this synthetic
22:51
data , as a lot of computer scientists
22:53
have already identified , tends
22:56
to ignore minority
22:58
groups , tends not to
23:00
look at minorities
23:02
and outliers , and this
23:04
is also for differential
23:07
privacy . Differential privacy is something else
23:09
. Differential privacy is looking at aggregated
23:11
data and making analysis on aggregation
23:14
. But the statistical aggregation
23:16
, in order to protect privacy
23:19
and to limit re-identification
23:22
of single individuals in the aggregation , needs
23:25
to delete outliers
23:27
, needs not to consider the
23:30
maximum and the minimum
23:32
outliers , so it cannot consider
23:35
different groups . It
23:37
needs to look at the average . So this is
23:39
the main problem , right ? Data obfuscation
23:42
tends to simplify
23:44
all of humanity or all the data
23:46
sets to an average person
23:49
, and this doesn't help to
23:51
mitigate biases or to
23:54
represent society . If we have
23:56
to take a decision , even a democratic
23:58
decision based on AI and we cannot really
24:01
know what are the single groups and the different
24:03
minorities and outliers in the group
24:05
, because we cannot identify them and we don't
24:07
want to re-identify them .
24:09
We might have problems of representation ,
24:12
problems connected mostly to
24:15
collective harms of privacy and data
24:17
protection . I hope this answered your question
24:19
. Of course it's not easy to answer in
24:21
a few sentences .
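[Editor's sketch, not from the episode: a minimal Python illustration of the "average person" problem described above. The dataset, group labels and numbers are invented; it only shows how an aggregate mean hides a small group that a group-aware view would surface.]

```python
# Hypothetical usage data: hours per day, with a small minority group whose
# behaviour differs sharply from the majority.
records = (
    [{"group": "majority", "hours": h} for h in [2.0, 2.5, 1.8, 2.2, 2.1, 1.9, 2.4, 2.3]]
    + [{"group": "minority", "hours": h} for h in [9.5, 10.2]]  # the outliers that matter
)

# "Average person" view: one number for everyone; the minority pattern vanishes.
overall_mean = sum(r["hours"] for r in records) / len(records)
print(f"overall mean: {overall_mean:.2f} hours")

# Group-aware view: per-group means make the minority visible again.
for group in ("majority", "minority"):
    values = [r["hours"] for r in records if r["group"] == group]
    print(f"{group}: n={len(values)}, mean={sum(values) / len(values):.2f} hours")
```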
24:23
Yes , no , that was really helpful . Let's
24:25
go through some of those data obfuscation PETs
24:28
, maybe briefly explain their
24:30
intended benefit , maybe from a GDPR
24:32
perspective , and then , if there's
24:34
anything specific about each
24:36
one of them that ties to fairness , that'd be
24:38
helpful to understand the context around that . But
24:41
if it's already the summation you
24:43
just gave us , I don't want you to repeat yourself , so
24:45
just let me know . But let's start with anonymization
24:47
.
24:48
Anonymization is , you know , a bit
24:50
of an illusion . We know it's very
24:52
hard to anonymize data if we
24:54
still want to use data right , and
24:56
then of course it depends on which
24:59
is the purpose of our
25:01
data processing activity . But
25:03
in general in the GDPR
25:05
, so in the European Union data protection
25:07
law , it is very hard to reach
25:09
the anonymization level . There is a
25:11
big discussion about what is anonymization
25:14
, because the GDPR
25:17
seems to take a risk-based approach
25:19
, while the guidelines
25:22
of the European Data Protection
25:24
Board ( which actually date
25:27
back to the previous entity , the
25:29
entity before the Data Protection Board
25:31
was founded , so the Article 29
25:34
Working Party opinion ) , these
25:36
guidelines generally refer to anonymization
25:38
as a zero risk
25:40
of identification approach . So
25:43
basically , if there's even
25:45
a minimum risk of identifying
25:47
someone , it's not anonymous . Of
25:49
course it's impossible to reach that level
25:51
and that standard , right ? Because in
25:53
today's data processing
25:56
environment it's very
25:58
easy to identify someone
26:00
based on some proxies
26:03
, based on a lot of aggregated
26:05
data that we can use to
26:07
infer who is
26:09
a specific individual . So we know there's a
26:11
lot of scholarship on that . Let's just say that anonymization
26:14
is a theoretical
26:16
concept but not a practical
26:19
one , if you agree .
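[Editor's sketch, not from the episode: a toy Python example of the re-identification-by-proxies point. All names and attributes are invented; it shows how joining a "de-identified" table with auxiliary data on quasi-identifiers such as ZIP code, birth year and gender can single a person out again.]

```python
# Hypothetical "anonymized" records: direct identifiers removed, quasi-identifiers kept.
deidentified = [
    {"zip": "02139", "birth_year": 1984, "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
    {"zip": "10027", "birth_year": 1984, "gender": "F", "diagnosis": "flu"},
]

# Hypothetical auxiliary data, e.g. scraped from a public profile or a voter roll.
auxiliary = [{"name": "Alice Example", "zip": "02139", "birth_year": 1984, "gender": "F"}]

# Linkage attack: join the two tables on the quasi-identifiers.
for person in auxiliary:
    key = (person["zip"], person["birth_year"], person["gender"])
    matches = [
        row for row in deidentified
        if (row["zip"], row["birth_year"], row["gender"]) == key
    ]
    if len(matches) == 1:  # a unique match means the record is re-identified
        print(f'{person["name"]} re-identified with diagnosis: {matches[0]["diagnosis"]}')
```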
26:20
Yeah , no , in fact , it is kind of fascinating
26:23
because it is one of the few techniques
26:26
that's written into GDPR and
26:28
yet it is not that effective , because
26:30
you could combine a bunch of data
26:32
sets that can re-identify . So
26:35
anonymization techniques can easily
26:37
be broken these days ( not all of them
26:39
and not all of them easily ) , but it is
26:41
not the panacea that
26:43
many in corporations thought
26:45
it might be to take companies out of
26:47
the regulation ? What about
26:50
pseudonymization ? So things like tokenization
26:53
, masking , generalization and
26:55
other techniques .
27:05
Pseudonymization is much
27:07
easier to reach , because pseudonymization
27:09
doesn't mean that we cannot identify individuals anymore .
27:15
Pseudonymization means that we protect data
27:17
in a way that privacy attacks are less harmful . Why
27:19
? Because the identification of
27:21
a dataset is kept
27:23
separate from the dataset itself
27:25
. At least this is the GDPR
27:27
definition , so the European Union definition
27:29
of pseudonymization . There is a legal
27:32
difference and a legal implication if
27:34
we have anonymization
27:37
or pseudonymization . If
27:39
we apply anonymization
27:41
, which I said is very hard in
27:43
practice , the GDPR , so
27:45
the European data protection law does not apply at
27:47
all , and also the United
27:49
States laws , like the state
27:51
laws , for example , Colorado
27:53
, Washington , Virginia , different laws that we
27:55
have in the States wouldn't apply because
27:58
anonymization doesn't allow us to
28:00
identify people . For pseudonymization
28:02
, the situation is more complex because
28:04
the GDPR applies .
28:07
So
28:13
even if we pseudonymize data through tokenization or masking , etc ., we should still
28:15
comply with GDPR rules . So
28:17
pseudonymization doesn't solve
28:20
the compliance problem . But
28:22
if the pseudonymization is in place
28:25
, the data controllers
28:27
, so the companies that decide
28:29
how to use data and why ,
28:31
they can
28:33
prove that they protected
28:35
data , and this is helpful for daily
28:38
compliance . So if the regulator
28:41
wants to check about
28:43
compliance , they can always say
28:45
yes , I applied a good protection , which is pseudonymization . Of course
28:47
it depends on which kind of pseudonymization .
28:49
Just
28:52
to summarize : in
28:55
case of anonymization , we are out of the GDPR
28:57
. In case of pseudonymization , we still
28:59
need to apply the rules of the GDPR , but we have
29:01
sorts of safeguards in place
29:03
that will excuse us
29:06
and will protect us from
29:08
a regulator perspective .
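[Editor's sketch, not from the episode: a minimal Python illustration of the GDPR-style separation just described, with hypothetical field names. Records carry only a random token, and the token-to-identity table is kept apart; because that table still exists, the data remains personal data and the GDPR still applies.]

```python
import secrets

# Hypothetical raw records containing a direct identifier.
users = [
    {"email": "alice@example.com", "purchase": 42.0},
    {"email": "bob@example.com", "purchase": 13.5},
]

token_map = {}       # re-identification key, to be stored separately and access-controlled
pseudonymized = []   # working copy handed to analytics

for user in users:
    token = secrets.token_hex(8)          # random pseudonym, meaningless on its own
    token_map[token] = user["email"]      # the only link back to the person
    pseudonymized.append({"user_token": token, "purchase": user["purchase"]})

print(pseudonymized)                       # analysts see tokens, not emails
# With access to the separate table, re-identification is still possible:
print(token_map[pseudonymized[0]["user_token"]])
```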
29:10
And then what about synthetic data ? Yeah
29:13
, synthetic data .
29:14
well , it really depends on the purposes
29:16
for our data processing . We
29:18
can say that synthetic data are
29:21
a form of , let's say
29:23
, data obfuscation . That
29:25
might be very useful if we want
29:28
to train algorithms without
29:31
using training data sorry , without
29:33
using personal data , personal identifiable
29:36
data . So synthetic data is
29:38
a form , we can say , of
29:41
data minimization that
29:43
is very useful for , for
29:46
example , reducing
29:48
the legal risks and
29:51
so the possible sanctions if
29:54
we do data scraping
29:56
. So you know , most of data , most
29:58
of training systems , training systems
30:00
for AI are based on scraping
30:02
data from social media
30:05
, from big databases . It's basically
30:07
the download
30:09
or the processing of huge amount of
30:11
publicly available data on
30:14
Instagram , Twitter , Google
30:19
, whatever . Synthetic data might
30:21
be a solution to avoid the
30:23
harms produced by scraping
30:25
, but it's not harms to individuals , it's
30:28
harms to business interest mostly
30:31
, and also privacy harms , yeah
30:33
you know , and it really depends on how we process
30:36
, what is the purpose for this synthetic
30:38
data ? I think there's no single definition of
30:40
synthetic data from a legal perspective .
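[Editor's sketch, not from the episode: a toy Python example of the minority-representation concern raised earlier. The proportions are invented and the "generator" is deliberately crude; it only shows that sampling synthetic records from a distribution dominated by the majority can leave a rare group with almost no synthetic representation.]

```python
import random

random.seed(7)

# Hypothetical real population: 98% group A, 2% group B (the minority of interest).
real = ["A"] * 980 + ["B"] * 20

def generate_synthetic(data, n):
    """Crude stand-in for a generator: resample from the empirical distribution."""
    return [random.choice(data) for _ in range(n)]

synthetic = generate_synthetic(real, 100)
counts = {group: synthetic.count(group) for group in ("A", "B")}
print("synthetic counts:", counts)
# With 2% prevalence, group B often ends up with only a handful of synthetic
# records (sometimes none), too few to audit the model for bias against it.
```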
30:42
Yeah , that makes sense . It's a relatively newly designated
30:45
, you know , privacy enhancing technology , so I don't think it made
30:48
it into the regulation . And then
30:50
the last for that subheading would
30:52
be differential privacy and then , if
30:54
you want to also link it back to fairness , that'd
30:57
be helpful . Yeah .
30:58
So , as I already said , differential
31:00
privacy is a very problematic
31:03
practice because
31:05
, in a sense , it
31:07
reduces a lot the risks
31:10
of identification . So this is good
31:12
in terms of the traditional view of
31:14
privacy , right , the computer
31:16
scientist view of privacy , privacy
31:18
as non-identification
31:21
. But , as I said
31:23
before , differential privacy
31:25
is mostly based on
31:27
aggregated analysis
31:30
of data . The aggregation
31:32
of data can be
31:35
useful for companies
31:37
because , for example , they don't need to identify
31:39
individuals . Sometimes , if I just need to
31:41
understand how effective
31:44
was my
31:46
marketing activity
31:48
on social media , I can just
31:51
consider differential privacy aggregation
31:53
. So , basically , I just analyze
31:55
how my behavioral
31:57
advertising was translated
31:59
into some benefits
32:02
or time spent
32:04
by my users online . I don't
32:06
really need to identify individuals for that .
32:13
The problem is that
32:16
if differential privacy , as I already said , is considered an anonymization
32:18
technique , it
32:21
might exclude the full
32:23
application of data protection rules , which
32:25
has anti-competitive
32:27
consequences in the digital market
32:29
, in particular against smaller enterprises
32:32
. And , on the other hand , in
32:34
order to reduce identifiability
32:37
, differential privacy needs
32:39
to cut the outliers
32:42
. And so , as I was saying , differential
32:44
privacy might be problematic for
32:47
representation of minorities
32:49
and marginalized groups . A disclaimer
32:51
that I am trying to add
32:53
and I emphasize now , is
32:56
that all these technologies cannot
32:58
be considered in silos . So we are speaking
33:00
a bit transversely now , but
33:02
it really depends on what is the specific
33:05
business application of these technologies
33:07
. So my statements
33:09
might be very different if we consider one
33:11
aspect or another , one application or
33:14
another , one case study or another .
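[Editor's sketch, not from the episode and not tied to any particular DP library: a rough Python illustration of the outlier trade-off. The counts and the epsilon value are invented; the point is that Laplace noise calibrated for a count query is negligible next to a large group's count but comparable to a very small group's count.]

```python
import random

random.seed(0)

def laplace_noise(scale):
    """Laplace(0, scale) noise as the difference of two exponential samples."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

# Laplace mechanism for a counting query: sensitivity 1, so noise scale = 1 / epsilon.
epsilon = 0.1                  # a fairly strong privacy setting
scale = 1.0 / epsilon          # noise scale of 10

true_counts = {"majority group": 50_000, "small minority group": 12}
for group, count in true_counts.items():
    noisy = count + laplace_noise(scale)
    print(f"{group}: true count = {count}, noisy count = {noisy:.1f}")

# Noise of this size is invisible next to 50,000 but is of the same order as 12,
# which is one reason aggregate, DP-style releases wash out very small groups
# (and why tiny counts are often suppressed outright).
```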
33:16
That makes a lot of sense . No , definitely
33:18
. And then there's also the paper
33:21
goes into detail on encrypted
33:23
data processing tools , as well as federated
33:25
and distributed analytics . And
33:28
you know , in interest of time and instead
33:30
of going through each of those specifically
33:32
, do you want to make any connections for
33:34
the audience about those
33:36
privacy-enhancing technology categories
33:38
and fairness and what you found in your
33:40
research ?
33:41
Sure , yeah , I mean , I
33:43
think an important aspect of the paper , as you
33:45
also suggest , is that
33:47
we do not say that PETs
33:50
should be avoided . There
33:52
are some benefits in privacy-enhancing
33:55
technologies . We just say that
33:57
they should be just considered one
33:59
of the possible safeguards in place
34:01
, together with many others . So
34:03
for encrypted data , which is
34:05
also considered in legal terms
34:08
a form of enhanced pseudonymization
34:11
, we suggest
34:13
that privacy-enhancing technologies
34:16
are a good safeguard . We
34:18
just say that the whole
34:20
fairness discussion , as I said before
34:23
, in terms of bias
34:25
, detection , diversity , representation
34:27
, power mitigation , is not
34:30
addressed by , for example
34:32
, encryption .
34:33
Awesome . Thank you for that . So we
34:36
kind of just went through an exploration of specific
34:38
groups of privacy enhancing technologies , but
34:40
now I want to turn to some of the
34:42
technical and regulatory solutions that
34:45
address some of these PET shortcomings
34:47
. Your team lists
34:49
three main PET shortcomings
34:52
in its research . When it comes to PETs
34:54
and again you've alluded to these , but
34:56
I'll restate them : bias discovery
34:59
; harms to people belonging to protected
35:02
groups ; and individual autonomy and market
35:04
imbalance . What
35:07
technical and regulatory solutions do you propose
35:09
to address each of these shortcomings ? First
35:12
, let's start with PETs and bias
35:14
discovery .
35:15
We are not sure
35:17
that we can really propose immediately
35:20
applicable solutions . But of course
35:22
, I think , as I said before
35:24
privacy-enhancing technologies should
35:26
not be the sole safeguards in
35:28
place . So for bias discovery , there's
35:31
a lot that we can do . First
35:33
of all , we shouldn't look
35:35
always for automated
35:38
solutions . So I think this is important
35:40
, also from a legal
35:42
scholar like me , as a message
35:45
that automation is
35:47
not always the solution to
35:49
automation problems . If
35:52
some problems were inherent
35:54
in automation , the
35:56
solution might be just different
35:58
, like social business
36:01
, et cetera . I will try to explain better
36:03
. For bias discovery , for example
36:05
, one of the most interesting ongoing
36:08
discussion is involvement
36:11
of impacted groups in
36:13
the assessment of a technology
36:16
, in the assessment of harms and in the
36:18
assessment of impacts of
36:20
technologies on fundamental rights . If
36:22
we need to discover biases of
36:25
AI , which now
36:27
are also problems , for example
36:29
for generative AI , like hallucinations
36:31
or misalignment , et cetera , we
36:34
need impacted groups
36:36
to stand up and to help
36:38
the AI developers
36:40
to identify gaps and
36:42
issues . Basically , what I'm
36:45
saying here is that we should look
36:47
at business models , not just the technologies
36:49
. We should look at how different
36:51
business models address solutions
36:54
and decisions and how these decisions
36:57
can be modified and improved
36:59
and how we can empower impacted
37:02
groups . I don't think we will ever
37:04
have an automated bias
37:06
discovery solution , but
37:09
of course there are very good
37:11
bias discovery
37:13
solutions that might
37:16
benefit from participatory
37:19
approaches , from participation
37:21
of impacted individuals in the impact assessment
37:24
.
37:25
Fascinating . What about people belonging
37:27
to protected groups ? What are some of the
37:29
? You know that is a shortcoming
37:31
that was highlighted with PETs , that
37:33
it doesn't appropriately address those marginalized
37:36
groups or protected groups . Would
37:38
you suggest a similar technical
37:40
and regulatory solution as you just
37:42
did with bias discovery , or is there something else ?
37:45
Yeah , I mean , as I said before , the
37:47
biggest problem about impacted groups
37:49
is that they are underrepresented
37:51
and they are the most impacted
37:54
groups , so the groups that
37:56
have the most adverse impacts
37:58
in terms of technology applications
38:01
. So there's a problem here
38:03
, which is a problem of democratic participation
38:05
, but also a problem of decision-making
38:09
and fairness in practice
38:11
. Some of the solutions is
38:13
indeed participation and
38:15
multi-stakeholder participation . I'm
38:18
just publishing now I mean next
38:20
month a co-author and I will
38:22
publish an article about
38:25
stakeholder participation . The co-author
38:27
is Margot Kaminski from Colorado Law
38:29
School and the
38:31
journal is the Yale Journal of Law and Technology . We
38:33
are trying to discuss how privacy
38:36
governance so data governance and
38:38
AI governance can be improved by
38:41
multistakeholder participation , in particular
38:44
, for people belonging to protected
38:46
groups . There's a problem , of course , and
38:48
the problem is how to define these groups
38:51
. Should we just rely on non-discrimination
38:54
laws defining protected
38:56
groups , or should we rely on something
38:58
else ? This is an ongoing discussion . We
39:00
don't have time now to address this , but
39:02
we can , of course , start from
39:05
the most vocal and most visible
39:07
groups impacted by technologies
39:10
. I can make three , four examples Children
39:13
, older people , racialized
39:16
communities , victims of gender-based
39:19
violence , lgbti
39:21
plus communities , and I could
39:23
go on , but we could start from these
39:25
groups and look at
39:27
how , together with privacy enhancing
39:29
technologies , diversity of
39:31
these groups could be considered
39:33
. So , just to be very practical , we
39:35
apply privacy enhancing technologies , for example
39:37
, in a business model , but then
39:40
we check the impact with
39:43
impacted groups . So , basically , we
39:46
put the privacy-enhancing technologies' effects
39:48
into a bigger and
39:50
broader multi-stakeholder decision-making
39:52
where impacted groups' representatives
39:55
can express their views .
39:57
That's awesome . I really look forward to
39:59
reading that paper when it comes out
40:01
, In addition to the paper we're discussing
40:03
today , which I will include in
40:05
our show notes . I will also update
40:08
the show notes to include a link to your
40:10
future paper once you publish it
40:12
. The last but
40:14
not least area where there's
40:16
a shortcoming would be individual autonomy
40:18
and market imbalances . Talk
40:21
to us a little bit about what potential solutions
40:24
to this shortcoming would be .
40:25
Yeah , of course we cannot discuss
40:27
, as I said before , privacy-enhancing
40:29
technologies in general . We should always
40:32
look at how single
40:34
privacy-enhancing technology practices
40:37
are affecting
40:39
some of the fairness components in
40:41
practice . But what I
40:43
might say is that market
40:46
imbalance should be regulated
40:48
not just for privacy
40:51
in the narrow sense , but
40:53
we should consider a lot of
40:55
different obligations that
40:57
can reduce market
40:59
dominance . I will make a simple example
41:01
In the European Union , two years ago
41:04
, the Digital Markets Act was
41:06
approved . The Digital Markets Act
41:08
is an important power rebalancing
41:11
tool , imposing a lot of duties
41:13
in terms of competition
41:16
law and fair access
41:18
to data and also consent
41:21
to data processing . So , just referring
41:23
also to individual autonomy that you mentioned
41:25
, the DMA , the Digital Markets
41:27
Act is an important tool
41:30
that complements privacy
41:32
and data protection . Just to say
41:34
, privacy-enhancing technologies are a great
41:36
tool that should be complemented
41:39
by specific rules
41:42
in terms of market control . This
41:44
is clear , for example , in
41:46
reducing abusive practices
41:49
that can happen when
41:51
big techs might
41:54
manipulate individuals or might
41:56
exploit dependencies , because this is
41:58
another problem . I didn't mention that term so
42:00
far , but dependency is the
42:02
problem that we really want
42:05
to address . We depend
42:07
on social media , we depend on big
42:09
techs , we depend on social giants
42:11
, and this dependency is
42:13
the real power imbalance problem
42:16
. So the states should take
42:18
a position against these dependencies
42:21
, either imposing rules
42:24
and fundamental rights enforcement
42:26
duties on big techs
42:28
or prohibiting some
42:31
abusive practices .
42:33
It's a lot to think about . I'm not sure there's the
42:35
political will to make it happen , but we'll see if
42:37
we can get a federal law
42:39
that embodies all of that . What
42:41
were some of your team's conclusions at
42:44
the end of writing this , and where might there
42:46
be some areas where you might want to do
42:48
some more research , or more research is needed ?
42:50
Sure , I think a lot of research
42:53
is still needed . Just to
42:55
make some examples , we
42:57
couldn't go deeper on
42:59
each single privacy-enhancing technology
43:02
in practice , and also we
43:05
should look at how , for example , generative
43:07
AI is altering the
43:09
discussion . So our paper didn't
43:11
consider generative AI challenges
43:13
, but of course this is perhaps
43:15
chapter two of our
43:17
activity . How can
43:20
privacy-enhancing technologies help
43:23
or not help for hallucinations
43:26
and misalignment of generative
43:28
AI systems , where
43:31
fairness is a big problem , because
43:33
we know that hallucination and
43:35
misalignment can produce
43:37
discrimination on generative AI
43:39
. For example , chatbots or image search
43:42
engines can produce
43:44
stereotypes , can induce
43:47
harms . So of course
43:49
, these are some of the areas that we
43:51
need to investigate
43:53
in the future and it's just part of
43:55
the problem .
43:56
I think that really sums up a lot of what's needed
43:58
. In fact , I'll be on the lookout for some working
44:01
groups or standards or
44:03
just more research coming out on the topic
44:05
. I think one of the things I've been thinking
44:07
about doing myself ( I'm kind of surprised
44:09
I haven't seen much out on the market around it )
44:11
is a listing of all of the privacy
44:14
enhancing technologies based on different use cases
44:16
, but also based on what
44:18
is the privacy guarantee
44:21
that the organization wants to ensure
44:23
by using a PET
44:25
, and then working backwards to
44:27
see which
44:31
PET or set of PETs
44:33
would get that job done . But
44:36
this conversation has made me really think
44:38
about . We need to think broader than
44:40
just can we do the thing ? Can
44:43
we achieve this end goal and instead
44:45
broaden it to also include are we
44:47
being fair to the individual and
44:49
to , like society generally , the
44:52
group of individuals ? So really
44:54
a lot to think about . Thank you so much
44:56
for your time today . Are there any words of
44:58
wisdom that you'd like to leave the audience with before
45:00
we close today ?
45:01
I think we all
45:03
, as a scientific
45:06
and technological and policy
45:09
community , should think bigger about
45:12
privacy-enhancing technologies
45:14
and shift from PETs
45:17
to FETs . So
45:19
fairness-enhancing technologies , which
45:21
is not so difficult to reach , we
45:23
just need to think a bit more critically
45:26
and a bit broader . And what
45:28
are the real goals ? The real goals
45:30
are protecting
45:32
the most impacted groups
45:35
, the most marginalized and
45:37
impacted groups in the digital environments
45:40
.
45:40
What a great idea and really
45:42
elevating it beyond just privacy
45:45
to meet fairness . So you'll meet a lot of goals
45:47
there , right ? Especially
45:50
if you apply it to AI . Well , thank you so
45:52
much , Gianclaudio . Thank you for
45:54
joining us today on the Shifting Privacy Left
45:56
podcast . Until next Tuesday
45:58
, everyone , when we'll be back with engaging content
46:01
and another great guest . Thank
46:03
you so much , bye-bye . Thanks
46:07
for joining us this week on Shifting Privacy
46:10
Left . Make
46:16
sure to visit our website , shiftingprivacyleft.com , where you can subscribe to updates so you'll
46:18
never miss a show . While you're at it , if you found this episode
46:20
valuable , go ahead and share it with a friend
46:23
, and if you're an engineer who
46:25
cares passionately about privacy , check
46:27
out Privado , the developer-friendly
46:29
privacy platform and sponsor of this show
46:31
. To learn more , go to privado.ai
46:34
. Be sure to tune in next Tuesday
46:36
for a new episode . Bye for now .