Episode Transcript
Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.
0:00
You're using Pydantic and it seems pretty straightforward,
0:02
right? But could you adopt some simple
0:04
changes to your code that would make it a lot
0:06
faster and more efficient? Chances are you'll
0:08
find a couple of the tips from Sydney Runkel
0:11
that will do just that. Join us
0:13
to talk about Pydantic performance tips here
0:15
on TalkPython episode 466 recorded June 13th,
0:17
2024. You're
0:25
listening to Michael Kennedy on TalkPython to
0:27
Me. Live from Portland, Oregon,
0:29
and this segment was made with Python.
0:34
Welcome to TalkPython to me, a
0:37
weekly podcast on Python. This is
0:39
your host, Michael Kennedy. Follow me
0:41
on Mastodon where I'm @mkennedy
0:43
and follow the podcast using
0:45
@talkpython, both on fosstodon.org. Keep
0:47
up with the show and listen to over
0:50
seven years of past episodes at TalkPython.fm. We've
0:53
started streaming most of our episodes live
0:55
on YouTube. Subscribe to our
0:57
YouTube channel over at TalkPython.fm slash YouTube
0:59
to get notified about upcoming shows and
1:02
be part of that episode. This
1:04
episode is brought to you by Sentry.
1:06
Don't let those errors go unnoticed. Use
1:09
Sentry like we do here at TalkPython.
1:11
Sign up at TalkPython.fm slash Sentry. And
1:13
it's brought to you by Code Comments,
1:16
an original podcast from Red Hat. This
1:18
podcast covers stories from technologists who've been
1:20
through tough tech transitions and
1:23
share how their teams survived the
1:25
journey. Episodes are available
1:27
everywhere you listen to your podcast
1:29
and at TalkPython.fm slash code dash
1:31
comments. Hey folks, I
1:33
got something pretty excellent for you.
1:35
PyCharm Professional for six months for
1:37
free. Over at TalkPython, we
1:40
partnered with the JetBrains team to get
1:42
all of our registered users free access
1:44
to PyCharm Pro for six months. All
1:47
you have to do is take one of
1:49
our courses. That's it. However, do note that
1:51
this is not valid for renewals over JetBrains.
1:53
Only new users there. And
1:56
if you're not currently a registered user
1:58
at TalkPython, well, no problem. This
2:00
offer comes with all of our courses. So
2:03
even if you just sign up for one
2:05
of our free courses at talkpython.fm, click on
2:07
courses in the menu, you're in. So
2:10
how do you redeem it? Once you have
2:12
an account over at TalkPython, then it's super
2:14
easy. Just visit your account page on TalkPython
2:16
training and in the details tab, you'll have
2:18
a code and a link to redeem your
2:20
six months of PyCharm Pro. So
2:22
why not take a course, even a
2:24
free one, and get six months free of PyCharm.
2:28
Sydney, welcome back to Talk Python To Me. It's
2:30
awesome to have you here. Thank you. Super
2:32
excited to be here. And yeah, I'm excited
2:34
for our chat. I am too. We're going
2:37
to talk about Pydantic, one of my very
2:39
favorite libraries that just makes working with Python
2:41
data, data exchange so, so
2:43
easy, which is awesome. And it's really
2:45
cool that you're on the Pydantic team
2:48
these days for them. I
2:50
guess, let's jump back just a little
2:52
bit. A few weeks ago, I got
2:54
to meet up a little bit in
2:56
Pittsburgh at PyCon. How was PyCon for
2:58
you? It was great. So it was
3:00
my first PyCon experience ever. It was a
3:03
very, very large conference. So it was
3:05
a cool first introductory conference experience. I
3:08
had just graduated not even a week before. So
3:10
it was a fun way to roll
3:12
into full-time work and get exposed really
3:15
to the Python community. And it was
3:17
great to just have a mix of getting to
3:19
give a talk, getting to attend lots of awesome
3:21
presentations, and then most of all, just meeting a
3:24
bunch of really awesome people in the community. I
3:27
always love how many people you
3:29
get to meet from so many
3:32
different places and perspectives. And
3:34
it just reminds you the
3:36
world is really big, but also really small. You
3:38
get to meet your friends and new people
3:40
from all over the place. Definitely. I was
3:43
impressed by the number of international attendees. I
3:45
didn't really expect that. It was great. Yeah,
3:47
same here. All right, well, maybe
3:50
a quick introduction for yourself for those
3:52
who didn't hear your previous episode. And
3:54
then we'll talk a bit about this
3:57
Pydantic library. Yeah, sure. Sounds great. My
4:00
name is Sydney. I just graduated from the
4:02
University of Wisconsin. Last time
4:04
I chatted with you, I was still pursuing my degree
4:06
in computer science and
4:08
working part-time as an intern at
4:10
the company Pydantic, which kind of
4:12
founded around the same ideas that
4:15
inspired the open source tool. And now we're building
4:18
commercial tools. And now I've rolled over into full-time
4:20
work with them, primarily on the
4:22
open source side. So yeah, very
4:24
excited to kind of be contributing
4:26
to the open source community, but
4:28
also getting to help with our
4:30
commercial tools and development there. Yeah, yeah, awesome.
4:32
We'll talk a bit about that later. Super
4:35
cool to be able to work on open source as
4:37
a job, as a proper job, right?
4:40
Yeah, it's awesome. It's really unique.
4:43
I've kind of encouraged lots of people to
4:45
contribute to open source as kind of a
4:47
jump start into their software development
4:50
careers, especially like young folks who are looking
4:52
to get started with things and maybe don't
4:54
have an internship or that sort of thing
4:56
set up yet. I think it's a really
4:58
awesome pipeline for getting exposed to good code
5:00
and collaborating with others and that sort of
5:02
thing. But it's definitely special to get to
5:04
do and get paid as well. Indeed.
5:06
So it's a little bit unbelievable to
5:09
me. But I'm sure that it is
5:11
true that there are folks out there
5:13
listening to the podcast that are like,
5:15
Pydantic, maybe you've heard of that. What
5:17
is this Pydantic thing? Yeah,
5:20
great question. What is Pydantic? So
5:22
Pydantic is the leading data validation
5:24
library for Python. And so
5:26
Pydantic uses type hints, which are optional
5:29
in Python, but kind of generally more
5:31
and more encouraged to enforce
5:34
constraints on data and kind of validate
5:36
data structures, et cetera. So we're
5:39
kind of looking at a very simple
5:41
example together right now where we're importing
5:43
things like date time and
5:45
tuple types from typing. And then kind
5:47
of the core of Pydantic is you
5:50
define these classes
5:52
that inherit from this class called
5:54
base model that's in Pydantic. And
5:57
that inheritance is what ends up helping you.
6:00
use methods to validate data,
6:03
fill JSON schema, things like that. And so
6:05
in our case, we have this delivery class
6:07
that has a timestamp, which
6:09
is of type date time, and
6:11
then a dimensions tuple, which has
6:13
two int parts. And so then
6:16
when you pass data into this
6:18
delivery class to create an instance, Pydantic
6:21
handles validating that data to make
6:24
sure that it conforms to those constraints we've
6:26
specified. And so it's really
6:28
a kind of intermediate tool that you can use for
6:30
de-serialization or loading data and then
6:33
serialization, dumping data.
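For listeners following along in code, here's a minimal sketch of the kind of model being described, based on the example shown on pydantic.dev (the specific values are just illustrative):

```python
from datetime import datetime
from pydantic import BaseModel


class Delivery(BaseModel):
    timestamp: datetime          # must be a datetime, or something coercible to one
    dimensions: tuple[int, int]  # exactly two ints


# Pydantic validates and coerces the raw input while creating the instance.
m = Delivery(timestamp='2020-01-02T03:04:05Z', dimensions=['10', '20'])
print(m.timestamp)   # 2020-01-02 03:04:05+00:00
print(m.dimensions)  # (10, 20)
```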
6:35
Yeah, it's a thing of beauty. I
6:37
really love the way that it works. If
6:39
you've got JSON data, nested JSON data, right?
6:42
You go to Pydantic.dev slash open source,
6:44
there's an example of here that we're
6:46
talking about. And it's got
6:48
a tuple, but the tuple contains integers, two of
6:50
them. And so if there was a tuple of
6:52
three things, it'll give you an error. If it's
6:55
a tuple of a date time and an int,
6:57
it'll give you an error. It
6:59
reaches all the way inside. And
7:01
things, I guess, it compares against. It's a little
7:03
bit like data classes. Have you done much
7:05
with data classes and compared them? Yeah,
7:08
that's a great question. So we actually
7:10
offer support for Pydantic data classes.
7:12
So I think data classes kind of
7:14
took the first step of
7:17
really supporting using type hints for
7:19
model fields and things like that. And
7:22
then Pydantic sort of takes an extra
7:24
jump in terms of validation and schema
7:26
support. And so I think one very
7:29
common use case is if you're defining
7:31
API request and response models, you can
7:33
imagine the JSON schema capabilities come in
7:35
handy there and just ensuring
7:37
the integrity of your API and the data
7:40
you're dealing with. Very helpful
7:42
on the validation front. Yeah, yeah, very
7:44
cool. Well, I
7:47
guess one more thing for people who are
7:49
not super familiar with it. Pydantic
7:51
is, I think it's used every now and
7:54
then. Let's check it out on GitHub here.
7:56
I'm starting to think of some of
7:58
the main places people have heard
8:01
of it. Obviously, FastAPI, I think,
8:03
is the thing that really launched its
8:05
popularity in the early days if I
8:07
had to guess. But if we go
8:09
over to GitHub, GitHub says that for
8:12
the open source things that Pydantic is
8:14
a foundational dependency for 412,644 different projects.
8:20
Yeah, unbelievable. Yeah,
8:22
it's very exciting. We just got our May
8:25
download numbers and heard that
8:27
we have over 200 million downloads
8:29
in May. So that's both version
8:32
one and version two, but definitely exciting to
8:34
see how kind of critical of a tool
8:36
has become for so many different
8:38
use cases in Python, which is awesome.
8:40
Yeah, absolutely. It's really, really critical. And
8:43
I think we should probably talk a
8:45
little bit about Pydantic v1 and v2 as
8:47
a way to get into the architecture
8:49
conversation, right? That was a big thing
8:51
I talked to Samuel Colvin, maybe
8:53
a year ago or so, I would
8:56
imagine, around PyCon. I think
8:58
we actually did one around PyCon last year
9:00
as well. Yeah, for sure. So a
9:03
lot of the benefit of using
9:05
Pydantic is we promise some
9:08
great performance. And a
9:10
lot of those performance gains came during our jump from
9:12
v1 to v2. So v1 was written solely
9:16
in Python, we had some like
9:18
compiled options, but really, it was
9:21
mostly Pythonic data validation,
9:23
or I say Pythonic, it's always Pythonic,
9:25
but data validation done solely in Python.
9:28
And the big difference with v2 is
9:30
that we rewrote kind of the core
9:32
of our code in Rust. And so
9:35
Rust is much faster. And so depending
9:37
on what kind of code you're running, it can
9:40
be, you know, anywhere from two to 20
9:42
times faster in certain cases. So right
9:45
now we still have
9:47
this Python wrapper around everything in
9:49
v2. But then, and
9:51
that's kind of used to define schemas
9:53
for models and that sort of thing.
9:56
And then the actual validation and serialization
9:58
logic occurs in
10:00
Pydantic Core, in Rust. Right.
10:02
So I think the team did a really
10:04
good job to make this major
10:07
change, this major rewrite and split
10:09
the whole monolithic thing into a
10:11
Pydantic Core, and Pydantic itself,
10:13
which is Python-based in a way
10:16
that didn't break too many projects,
10:18
right? Yeah, that was the goal. You
10:20
know, every now and then there are breaking changes
10:23
that I think are generally
10:25
a good thing for the library moving forward,
10:27
right? Like hopefully whenever we make a
10:29
breaking change, it's because it's leading to
10:31
a significant improvement. But we definitely do
10:33
our best to avoid breaking changes
10:35
and certainly someday we'll launch
10:37
a V3 and hopefully that'll be
10:39
an even more seamless transition for
10:41
V2 users to V3 users. Yeah,
10:45
I would imagine that the switch
10:47
to Rust probably, that big rewrite,
10:49
it probably caused a lot of
10:51
thoughts of reconsidering, how are we
10:53
doing this? Or now that
10:55
it's over in Rust, maybe it doesn't make sense this way or
10:57
whatever. Yeah, and I think just kind of,
11:00
you know, we got a lot of feedback and usage
11:02
of Pydantic V1, so we tried to
11:04
do our best to incorporate all that feedback into
11:06
a better V2 version in terms of both APIs
11:08
and performance and that sort of thing. Sure,
11:10
sure. John out in the audience asks, how
11:13
did the team approach thread safety
11:15
with this? So Rust can be
11:18
multi-threaded easily. Python, not
11:20
so much really, although maybe soon with
11:22
free-threaded Python. Yeah, that's a good
11:25
question. So our kind of
11:27
Rust guru on the team is David Hewitt,
11:29
and he's very in the know about
11:32
all of the multi-threading and things happening on the
11:34
Rust side of things. I myself have
11:36
some more to learn about that certainly. But
11:38
I think in general kind of our approach is that Rust
11:41
is quite type safe, both performant
11:43
and type safe, which is great and memory
11:45
safe as well. And
11:48
I think most of our, I'll talk a little
11:50
bit later about some parallelization
11:53
and vectorization that we're
11:55
looking at for performance improvements. But in terms
11:57
of safety, I think if you have any questions, feel free
11:59
to ask. free to open an issue on the
12:01
Pydantic core repo and we get a conversation
12:03
going with David Hewitt. I would
12:05
imagine it's not that you guys haven't had
12:08
to do too much with it. Just that
12:10
Python currently, but soon, but currently doesn't really
12:13
let you do much true
12:15
multi-threading because of the GIL.
12:18
But the whole, I think, you know, Python 3.13 is going
12:20
to be crazy with free threaded Python and
12:24
it's going to be interesting to see how that evolves.
12:28
I know we definitely do some jumping through hoops and
12:31
just having to be really conscious of stuff
12:33
with the GIL in Pydantic Core and
12:36
PyO3, and PyO3 is kind of the
12:38
library that bridges Python
12:40
and Rust and so it's heavily used in Pydantic core
12:42
as you can imagine. So I'm excited
12:44
to see what changes might look like there. Yeah,
12:46
same. All right, well, let's jump into the performance
12:48
because you're here to tell us all about Pydantic
12:50
performance tips and you got a whole bunch of
12:53
these. Did you give this talk at PyCon? I
12:55
did partially. It's a little bit different, but some
12:57
of the tips are the same. I
12:59
don't think the videos are out yet, are they? As
13:01
the time of recording on June 13th. Yeah,
13:04
no, I actually checked a couple of minutes ago. I
13:06
was like, I said one thing during my talk that I
13:08
want to double check, but the videos are not out yet. So
13:11
no, I'm really excited. There's going to be a bunch.
13:13
There was actually a bunch of good talks, including yours
13:15
and some others. I want to watch, but they're not
13:17
out yet. All right. Let's
13:20
jump into Pydantic performance. Where should we where should
13:22
we start? I can start on
13:24
the slideshow if we want. Yeah, let's do that.
13:26
Awesome. So yeah, I think the
13:28
categories of performance tips that we're going to talk
13:30
about here have some fast
13:32
one-liner type performance tips that you
13:35
can implement in your own code.
13:38
And then the meat of the how
13:40
do I improve performance in my
13:43
application that uses Pydantic, we're going to talk a bit about
13:46
discriminated unions, also called
13:48
tagged unions. And then finally
13:50
talk about on our end
13:52
of the development, how are we continuously
13:54
improving performance, you know, Pydantic internals,
13:57
wise, etc. Sure. Do you have something like
14:00
the equivalent of unit tests
14:02
for performance? Yeah, we do. We
14:05
use a library called CodSpeed that I'm
14:07
excited to touch on a bit more later.
14:09
Yeah, all right, let's talk about that later.
14:11
Perfect. Yeah, sure thing. So
14:14
I have this slide up right now just kind
14:16
of talking about why people use Pydantic. We've already
14:18
covered some of these, but just kind of as
14:20
a general recap, it's powered by Type
14:23
Hints and one of our biggest promises is speed.
14:26
We also have these other great
14:28
features like JSON schema compatibility and
14:30
documentation. Comes in particularly handy when
14:32
we talk about APIs, support
14:35
for custom validation and serialization logic.
14:37
And then as we saw with
14:39
the GitHub repository observations, a very
14:42
robust ecosystem of libraries and
14:44
other tools that use and depend on
14:46
Pydantic that leads to this
14:48
kind of extensive and large community, which is really
14:50
great. But this all
14:52
kind of lies on the foundation of like,
14:54
Pydantic is easy to use and it's very
14:56
fast. Yeah,
14:59
well, the speed is really interesting
15:01
in the multiplier that you all have
15:03
for basically a huge swath of the
15:06
Python ecosystem, right? We just saw the
15:08
412,000 things that depend on Pydantic. Well,
15:12
a lot of those, their performance
15:15
depends on Pydantic's performance as well.
15:17
Right? Yeah. Certainly. Yeah,
15:19
it's nice to have such a large ecosystem of
15:21
folks to also contribute to the library
15:24
as well, right? Like, because other people
15:26
are dependent on our performance, the
15:28
community definitely becomes invested in it as well, which is
15:30
great. This
15:33
portion of TalkPython is brought to
15:35
you by Open Telemetry Support at
15:37
Sentry. In the
15:39
previous two episodes, you heard how we
15:41
use Sentry's error monitoring at TalkPython and
15:44
how distributed tracing connects errors,
15:46
performance and slowdowns and more
15:48
across services and tiers. But
15:50
you may be thinking, our company uses Open
15:53
Telemetry. So it doesn't make sense for
15:55
us to switch to Sentry. After
15:57
all, OpenTelemetry is a standard you've already
16:00
adopted, right? Did you
16:02
know with just a couple of lines
16:04
of code, you can connect OpenTelemetry's monitoring
16:07
and reporting to Sentry's backend? OpenTelemetry
16:09
does not come with a backend to store
16:12
your data, analytics on top of that data,
16:14
a UI or error monitoring. And
16:16
that's exactly what you get when
16:19
you integrate Sentry with your OpenTelemetry
16:21
setup. Don't fly blind,
16:23
fix and monitor code faster with
16:25
Sentry. Integrate your OpenTelemetry systems
16:27
with Sentry and see what you've been
16:29
missing. Create your Sentry account
16:31
at talkpython.fm slash
16:34
sentry-telemetry. And when
16:36
you sign up, use the code talkpython,
16:38
all caps, no spaces. It's good for
16:40
two free months of Sentry's business plan,
16:42
which will give you 20 times as
16:44
many monthly events as well as other
16:46
features. My thanks to Sentry for
16:48
supporting TalkPython to me. But
16:51
yeah, so kind of as that first category, you
16:53
can chat about some basic performance tips. And I'll
16:55
do my best here to kind of describe this
16:58
generally for listeners who maybe aren't able to see
17:00
the screen. So when you are validating- Can
17:02
we share your slideshow later with the audience?
17:04
Can we put it in the show notes?
17:06
Yeah, yeah, absolutely. Okay, so people wanna go
17:08
back and check it out. But yeah, we'll
17:10
describe it for everyone. Go ahead. Yeah,
17:13
so when you're validating data in
17:15
Pydantic, you can either validate
17:17
Python objects or like dictionary
17:20
type data, or you
17:22
can validate JSON formatted data. And
17:24
so one of these kind of like
17:26
one-liner tips that we have is to
17:28
use our built-in model
17:31
validate JSON method instead of
17:33
calling this our model
17:35
validate method and then separately loading the
17:37
JSON data with the standard lib JSON
17:39
package. And the reason that
17:42
we recommend that is one of the
17:44
like crux of the general performance patterns
17:46
that we try to follow is not
17:48
materializing things in Python when we don't have
17:50
to. So we've already mentioned that our core is written
17:53
in Rust, which is much faster than Python. And
17:55
so with our model validate JSON built-in method,
17:59
whenever you pass in that string, we send it right to Rust.
18:01
Whereas if you do the JSON loading by
18:03
yourself, you're gonna materialize Python object and then
18:05
have to send it over.
18:08
Right, and so you're gonna be using
18:10
the built-in json.loads, which will then,
18:12
or load or whatever, and then
18:14
it'll pull that in, turn it into a Python
18:17
dictionary, then you take it and try to convert
18:19
that back to a Rust data
18:21
structure, and then validate it in Rust, and
18:23
that's where all the validation happens anyway. So
18:26
just get out of the way, right? Exactly,
18:28
yep. It's like skip the Python step if
18:30
you can, right? And I will note there is
18:32
one exception here, which is I mentioned we support
18:35
custom validation. If you're using
18:37
what we call like before and wrap
18:39
validators that do something in Python and
18:41
then call our internal
18:44
validation logic, and then maybe even do something
18:46
after, it's okay. You can use model
18:49
validate and the built-in json.loads because
18:51
you're already kind of guaranteed to be
18:53
materializing Python objects in that case. But
18:55
for the vast majority of cases, it's
18:57
great to just go with the built-in
18:59
model validate JSON. Yeah, that's really good
19:01
advice. And they seem kind of equivalent,
19:03
but once you know the internals, right,
19:05
then it's, well, maybe it's not exactly.
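A rough sketch of the two paths being compared here; the User model is made up purely for illustration:

```python
import json
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str


raw = '{"id": 1, "name": "Sydney"}'

# Slower path: json.loads materializes a Python dict first, then hands it to Rust.
user_a = User.model_validate(json.loads(raw))

# Faster path: the raw JSON string goes straight to pydantic-core (Rust) for parsing and validation.
user_b = User.model_validate_json(raw)

assert user_a == user_b
```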
19:07
Yeah, and I think implementing some of these tips
19:10
is helpful in that if you understand some of the
19:12
kind of like pydantic architectural context, it
19:15
can also just help you think more about
19:17
like, how can I write my pydantic code
19:19
better? Absolutely. So the next tip I have
19:21
here, very easy one-liner
19:23
fix, which is when you're
19:25
using type adapter, to
19:29
basically validate one type. So
19:32
we have base models, which we've chatted about before, which
19:34
is like, if you have a model with lots of
19:36
fields, that's kind of the structure you use to define
19:39
it. Well, type adapter is great if you're like, I
19:41
just want to validate that this data is a list
19:43
of integers, for example, as we're seeing on the screen.
19:46
Right, because let me give people an idea. Like
19:48
if you accept, if you've got a JSON, well,
19:51
just JSON data from wherever, but a lot of times
19:54
it's coming over an API that's provided you as
19:56
a file and it's not your data you control, right?
19:59
You're trying to validate it. you could get
20:01
a dictionary JSON object that's got
20:03
curly braces with a bunch of stuff, in
20:05
which case that's easy to map to a
20:08
class, but if you just have JSON which
20:10
is bracket, thing, thing, thing, thing, closed bracket,
20:12
well, how do you have a class that
20:14
represents a list? It gets
20:16
really tricky, right, to be able to understand,
20:19
you can't model that with classes, and so
20:21
you all have this type adapter thing, right?
20:24
That's the role it plays generally, is that right?
20:26
Yeah, and I think it's also
20:28
really helpful in a testing context,
20:30
like when we wanna check that
20:32
our validation behavior is right for
20:34
one type, there's no reason to
20:36
go build an entire model, if
20:38
you're really just validating against one
20:40
type or structure, type adapter is
20:42
great. And so kind of the
20:45
advice here is you only want to initialize
20:48
your type adapter object once, and
20:51
the reason behind that is we build a
20:53
core schema in Python and
20:55
then attach that to a class or
20:57
type adapter, et cetera. And so if
20:59
you can not build that type adapter
21:02
within your loop, but instead do
21:04
it right before, or not build it in
21:06
your function, but instead outside of it, then
21:09
you can avoid building the core schema over
21:11
and over again. Yeah, so basically what
21:13
you're saying is that the type adapter
21:15
that you create might as well be
21:17
a singleton because it's stateless, right? Like
21:19
it doesn't store any data, kind
21:22
of slightly expensive to create relatively.
21:24
And so if you had a function that was called
21:27
over and over again, and that function had a loop,
21:29
and inside the loop you're creating the type adapter, that'd
21:31
be like worst case scenario almost, right? Yeah,
21:33
exactly, and I think this kind of goes
21:35
along with like general best programming tips, right?
21:37
Which is like, if you only need to
21:39
create something once, do that once, and
21:41
then- Exactly, a parallel
21:44
that maybe goes way, way back in
21:46
time could be like a compiled regular
21:48
expression. You wouldn't
21:51
do that over and over in a loop, you
21:53
would just create a regular, the compiled regular expression,
21:55
and then use it throughout your program, right? Because
21:57
it's kind of expensive to do that, but it's
21:59
faster once it's created. Yeah, exactly. And funny that
22:01
you mentioned that I actually fixed a
22:04
bug last week where we were compiling
22:06
regular expressions twice when folks
22:09
like specified that as a constraint on a
22:11
field. So definitely just something to keep
22:13
in mind and easy to fix or
22:15
implement with type adapters here. Yeah, awesome.
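A sketch of that pattern, assuming you're validating the same simple type in a hot path (the names here are illustrative):

```python
from pydantic import TypeAdapter

# Build the adapter, and therefore its core schema, once at module level...
INT_LIST_ADAPTER = TypeAdapter(list[int])


def parse_rows(rows: list[str]) -> list[list[int]]:
    # ...and reuse it inside loops and functions instead of re-creating it on every call.
    return [INT_LIST_ADAPTER.validate_json(row) for row in rows]


parse_rows(['[1, 2, 3]', '[4, 5, 6]'])
```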
22:17
Okay, I like this one. That's a
22:19
good one. Yeah. So this next tip
22:22
also kind of goes along with like general best
22:24
practices, but the more specific you
22:26
can be with your type hints, the better.
22:28
And so specifically, if you know
22:30
that you have a list of integers,
22:33
it's better and more efficient to specify a
22:35
type hint as a list of integers instead
22:37
of a sequence of integers, for
22:40
example. Or if you know you have
22:42
a dictionary that maps strings to integers,
22:44
specify that type hint as a dictionary,
22:47
not a mapping. Interesting. Yeah, so you
22:49
could import Sequence from the typing
22:51
module, just the generic way. But I
22:53
guess you probably have specific code that
22:56
runs that can validate lists more efficiently
22:58
than a general iterable type of thing,
23:00
right? Yeah, exactly. So in the case
23:02
of like a sequence versus a list, it's
23:05
the like square and rectangle thing, right? Like a list
23:08
is a sequence, but there are lots of other types
23:10
of sequences. And so you can imagine for
23:13
a sequence, we like have to check lots of other
23:15
things. Whereas if you know with certainty, this is going
23:17
to be a list or it should be a list,
23:20
then you can have things be more efficient
23:22
with specificity there. Does it make
23:24
any difference at all? Whether you
23:26
use the more modern type specifications,
23:29
like traditionally people would say from
23:31
typing import capital L list, but
23:33
now you can just say lowercase
23:35
L list with the built-in and
23:37
no import statement. Are
23:39
those equivalent or is there some minor difference
23:41
there, do you know? Yeah, that's a
23:43
good question. I wouldn't be surprised if there
23:46
was a minor difference that was
23:48
more a consequence of like Python
23:50
version, right? Because there's like, I
23:52
mean, I suppose you could import the old capital L
23:54
list in a newer Python version, but I think
23:57
the difference is like more related to the specificity
23:59
of a type. Rather than have
24:01
like versioning. Yeah, yeah. If
24:03
the use of that capital L
24:05
List made you write an import
24:07
statement, I mean, it would cause
24:09
the program to start ever
24:11
so slightly slower since there's another
24:13
import, whereas with the lowercase one it
24:15
already knows, it's already imported. Who
24:17
knows? You wouldn't believe how many
24:19
times I get messages on YouTube
24:21
videos I've done, or even
24:23
from courses, saying, Michael, I don't
24:25
know what you're doing, but your
24:27
code is just wrong. I wrote
24:29
lowercase l list bracket something, and it
24:31
said list is not subscriptable
24:33
or something like that, and you look,
24:35
they've just done it wrong. Or you go in and
24:37
the fix is like, oh, you're
24:40
on 3.7 or something
24:42
super old. All these new features
24:45
are added, but I think somewhere
24:47
in the community we haven't communicated it as
24:49
well. I don't know for sure. I
24:51
was writing some code earlier today in
24:54
a meeting, and I used, like,
24:56
from typing import Union, and Union
24:58
of X and Y as the type. And
25:00
people look at that and are like,
25:02
what are you doing? You should use
25:04
X pipe Y. Which is exactly right, but
25:07
the thing is, that was introduced in 3.10.
25:09
Even if people are on 3.9,
25:11
that code doesn't run, or if they're not
25:13
familiar with the changes, it's confusing.
25:15
There are all these trade-offs. I almost feel like
25:17
it would be amazing to go back,
25:20
any time there's a security release
25:22
for, say, another 3.7 or something,
25:24
and change the error message to say this
25:26
feature only works in a future version of
25:28
Python rather than some arbitrary error.
25:30
I know, that would be great. Yeah,
25:32
definitely. Yes, some of those errors can be
25:35
pretty cryptic with the syntax stuff.
25:37
So the takeaway is, be specific: list
25:39
or tuple, not Sequence, if you know it's
25:41
a list or a tuple or whatever.
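A sketch of the contrast being described; both forms validate fine, but the concrete types give Pydantic less to check (the model names are made up):

```python
from collections.abc import Mapping, Sequence
from pydantic import BaseModel


class Looser(BaseModel):
    values: Sequence[int]       # generic: more cases for the validator to consider
    scores: Mapping[str, int]


class Tighter(BaseModel):
    values: list[int]           # concrete: validated on a more specific, faster path
    scores: dict[str, int]
```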
25:43
And on to my last minor tip,
25:45
which, great that you brought up
25:48
import statements, since Pydantic does add general
25:50
import time to a program. I don't have
25:52
a slide for this, but if we
25:54
go back to that type adapter slide where
25:56
we talked about the fact that initializing
25:59
this type adapter builds a core
26:02
schema and attaches it to that class. And
26:04
that's kind of done at build
26:06
time, at import time. So that's
26:08
already done. And
26:11
if you really don't want to
26:14
have that import or build time take a
26:16
long time, you can use the defer build
26:19
flag. And so what that does is it defers
26:22
the core schema build until the first validation
26:24
call. You can also set that on model
26:26
config and things like that. But basically,
26:28
the idea here is striving to
26:30
be lazier. Like if we
26:32
don't need to build this core schema right
26:35
at import time because we want our program
26:37
to start up quickly, that's great. We might have
26:39
a little bit of a delay on the first
26:41
validation, but maybe startup time is more important. So
26:44
that's a little bit more of a
26:46
preferential validation, sorry, preferential performance tip, but
26:49
available for folks who need it. Yeah, let me give
26:51
you an example, give people an example of where I
26:53
think this might be useful. In
26:55
the talk Python training, the courses site, I
26:58
think we've got 20,000 lines of Python code,
27:00
which is probably more at this point. I
27:02
checked a long time ago, but a lot.
27:04
And it's a package. And so when you
27:06
import it, it goes and imports all the
27:08
stuff to run the whole web app, but
27:10
also little utilities like, oh, I just want
27:13
to get a quick report. I want to
27:15
just access this model and then use it
27:17
on something real quick. It imports
27:19
all that stuff so that app startup would
27:21
be potentially slowed down by this. Where if
27:24
you know, like only sometimes is that type
27:26
adapter used, you don't want to necessarily have
27:28
it completely created until that function gets called.
27:30
So then the first function call might be
27:32
a little slow, but there'd be plenty of
27:34
times where maybe it never gets called, right?
27:36
Yep, exactly. Awesome, okay.
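As mentioned, the flag can also be set via model config; a minimal sketch of that form (the model itself is just an example):

```python
from pydantic import BaseModel, ConfigDict


class ReportRow(BaseModel):
    # Skip building the core schema at import time; it's built lazily on the first
    # validation call instead. Startup gets faster; the first use pays the one-time cost.
    model_config = ConfigDict(defer_build=True)

    user_id: int
    totals: dict[str, float]
```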
27:38
All right, so kind of a
27:40
more complex performance optimization is
27:43
using tagged unions. They're still pretty
27:45
simple. It's just like a little bit more than
27:47
a one line change. So
27:49
kind of talking about tagged unions, we
27:52
can go through a basic example, why we're using tagged unions
27:54
in the first place, and then some more
27:56
advanced examples. Okay. This
28:00
episode of Talk Python To Me is brought to you by
28:02
Code Comments, an original podcast from Red Hat. You
28:05
know when you're working on a project and
28:07
you leave behind a small comment in the
28:09
code? Maybe you're hoping to help others learn
28:11
what isn't clear at first. Sometimes
28:14
that code comment tells a story of a
28:16
challenging journey to the current state of the
28:18
project. Code Comments, the
28:20
podcast, features technologists who've been
28:22
through tough tech transitions and
28:25
they share how their teams survived that
28:27
journey. The host, Jamie Parker, is a
28:30
Redhatter and an experienced engineer. In
28:32
each episode, Jamie recounts the stories
28:34
of technologists from across the industry
28:37
who've been on a journey implementing
28:39
new technologies. I recently listened to
28:41
an episode about DevOps from the
28:43
folks at Worldwide Technology. The
28:45
hardest challenge turned out to be getting buy-in
28:48
on the new tech stack rather than using
28:50
that tech stack directly. It's
28:52
a message that we can all relate to and I'm
28:54
sure you can take some hard-won lessons back to your
28:56
own team. Give Code Comments a
28:59
listen. Search for Code Comments
29:01
in your podcast player or just use
29:03
our link, talkpython.fm slash code
29:05
dash comments. The link is
29:07
in your podcast player's show notes. Thank
29:10
you to Code Comments and Red Hat for supporting
29:12
Talk Python To Me. Let's
29:14
start with what are tagged unions, because I honestly have no
29:16
idea. I know what unions are but tagging them, I
29:18
don't know. Yeah, sure thing. So
29:21
tagged unions are a special type of union.
29:24
We also call them discriminated unions. They
29:27
help you specify a member
29:29
of a model that you can use for
29:32
discrimination in your validation. What that
29:34
means is if you have two models that
29:36
are pretty similar and your field
29:39
can be either one
29:41
of those types of models, model X or model Y,
29:44
but you know that there's one tag
29:46
or discriminator field that differs, you
29:49
can specifically validate against that field
29:51
and skip some of the other
29:53
validation. So like I'll
29:55
move on to an example here in a
29:57
second, but basically it helps you validate more
30:00
efficiently because you get to skip validation of
30:02
some fields. So it's really helpful if you have models that
30:04
have like 100 fields, but one of
30:06
them is really indicative of what type it might be.
30:09
I see. So instead of trying to figure out like,
30:11
is it all of this stuff once you know it
30:14
has this aspect or that aspect, then you can
30:16
sort of branch it on a path and just
30:18
treat it as one of the elements of the
30:20
union. Is that right? Yes, exactly. So
30:23
one other note about discriminated
30:25
unions is you specify this discriminator, and
30:27
it can either be a string like
30:29
literal type or a callable type. And
30:31
we'll look at some examples of those. So here's
30:33
kind of a more concrete example so we can
30:35
really better understand this. So
30:38
let's say we have a, this is the
30:40
classic example, right? A cat model and a dog
30:43
model. And they
30:45
both have- Ah, cat people or dog people. You're going to start a
30:47
debate here. Exactly, exactly. They both
30:49
have this pet type field. And
30:52
for the cat model, it's a literal
30:54
that is just the string cat. And then for
30:56
the dog model, it's the literal that's the string
30:58
dog. So it's just kind of a flag on
31:00
a model to indicate what type it is. And
31:04
you can imagine, in this basic case, we only
31:06
have a couple of fields attached to each model,
31:08
but maybe this is like data
31:10
in, like, a database. And
31:12
so you can imagine like there's going to be tons
31:15
of fields attached to this, right? So it'd be
31:17
pretty helpful to just be able to look at
31:19
it and say, oh, the pet type is dog. Let's
31:21
make sure this data is valid for a dog
31:23
type. And I'll also note we have a lizard
31:25
in here. So
31:27
what this looks like in
31:29
terms of validation with Pydantic then is
31:32
that when we specify this pet field,
31:34
we just add one extra setting,
31:37
which says that the discriminator is that pet
31:39
type field. And so then when we pass
31:41
in data that corresponds to a dog model,
31:45
Pydantic is smart enough to say, oh, this is a discriminated
31:47
union field. Let me go look for the
31:49
pet type field on
31:51
the model and just see what that is.
31:54
And then use that to inform my decision
31:56
for what type I should validate against. OK,
31:58
that's awesome. So if we
32:01
don't set the discriminator keyword
32:03
value in the field
32:05
for the union, it'll still work, right? It
32:08
just has to be more exhaustive and slow. Yeah,
32:11
exactly. So it'll still validate
32:13
and it'll say, hey, let's take this input data
32:15
and try to validate it against the cat model.
32:18
And then Pydantic will come back and say, oh, that's not
32:20
a valid cat. Like let's try the next one. Whereas
32:23
with this discriminated pattern, we can skip
32:25
right to the dog, which
32:27
you can imagine helps us skip some of the
32:29
validation stuff. Yeah, absolutely. Okay, that's really cool.
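A sketch of the cat/dog example being discussed, with the discriminator set on the union field (field names follow the standard docs example):

```python
from typing import Literal, Union
from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal['cat']
    meows: int


class Dog(BaseModel):
    pet_type: Literal['dog']
    barks: float


class Owner(BaseModel):
    # Pydantic looks at pet_type first and only validates against that one branch.
    pet: Union[Cat, Dog] = Field(discriminator='pet_type')


Owner.model_validate({'pet': {'pet_type': 'dog', 'barks': 3.1}})
```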
32:31
I had no idea about this. Yeah, yeah. It's
32:34
a cool, I'd say like moderate level feature.
32:36
Like I think if you're just starting to
32:38
use Pydantic, you probably haven't touched discriminated unions
32:41
much, but we hope that it's simple enough to
32:43
implement that most folks can use it if they're
32:45
using unions. Yeah, that's cool. I don't
32:47
use unions very often, which is probably why other
32:50
than, you know, something pipe none, which is, you
32:52
know, like optional, but yeah, if
32:54
I did, I'll definitely remember this.
32:57
Yeah. Alrighty. So
32:59
as I've mentioned, this helps for more efficient
33:01
validation. And then where this really comes
33:04
and has a lot of values when you are dealing with
33:07
lots of nested models or models that have tons
33:09
of fields. So let's say you have a union
33:11
with like 10 members and each member
33:13
of the union has 100 fields.
33:15
If you can just do validation against 100 fields instead
33:18
of 1000, that would be great in
33:20
terms of a performance gain. And
33:23
then once again with nested models, you know, if you
33:25
can skip lots of those union member
33:27
validations, also going to boost your performance. Yeah,
33:29
for sure. You know, an example where this
33:32
seems very likely would be using
33:34
it with beanie or some other document
33:37
database where the modeling structure is
33:39
very hierarchical. You end up with
33:41
a lot of nested sub-Pydantic models
33:44
in there. Yeah, very
33:46
much so. Cool. So
33:48
as a little bit of an added benefit, we
33:50
can talk about kind of this improved error handling,
33:53
which is a great way to kind of visualize
33:55
why the discriminated union pattern is more
33:57
efficient. So right now we're looking at
33:59
an example. of validation against a
34:01
model that doesn't use a discriminated
34:04
union and the errors are not very
34:06
nice to look at. You basically
34:08
see the errors
34:10
for every single permutation of the different values
34:12
and we're using nested models so it's very
34:15
hard to interpret. So we don't have to
34:17
look at this for too long, it's not
34:20
very nice. But if we look at... But
34:22
basically the error message says, look there's something wrong
34:24
with the union. If it was a string, it
34:27
is missing these things. If it was this kind
34:29
of thing, it misses those things. If it was
34:31
a dog, it misses this. If it's a cat,
34:34
it misses that. It doesn't
34:36
specifically tell you. It's
34:38
a dog so it's missing the color
34:41
size or whatever, right? Right, exactly.
34:44
But then, and I'll go back and kind of explain the
34:47
discriminated model for this case in a second, but
34:49
if you look at this is the model with
34:51
the discriminated union instead, we have
34:54
one very nice error that says,
34:56
okay, you're trying to validate this
34:58
x field and it's the wrong
35:01
type, right? So
35:04
yeah, the first example that we were looking at
35:06
was using string type discriminators. So we just had
35:09
this pet type thing that said, oh, this is
35:11
a cat or this is a dog, that sort
35:13
of thing. We also offer
35:15
some more customization
35:17
in terms of we also allow
35:20
callable discriminators. So in
35:22
this case, this field
35:24
can be either a string or
35:26
this instance of
35:28
discriminated model. So it's
35:30
kind of a recursive pattern, right? And that's
35:32
where you can imagine the nested structures
35:35
becoming very complex very easily.
35:37
And we use this kind
35:39
of callable to differentiate
35:41
between which model we should
35:43
validate against and then we tag each of the
35:46
cases. So a little bit more of a
35:48
complex application here, but once again, when
35:50
you kind of see the benefit in
35:52
terms of errors and interpreting things and
35:54
performance, I think it's generally
35:56
a worthwhile investment. That's cool. So
35:59
if you wanted to... something like a
36:01
composite key equivalent of a
36:03
discriminator, right? Like if
36:05
it has this field and its nested
36:07
model is of this type, it's one
36:09
thing versus another. Like a
36:11
free user versus a paid user. You might have
36:13
to look and see their total lifetime value plus
36:16
that they're a registered user.
36:18
I don't know, something like, you could write
36:20
code that would pull that information out and
36:22
then discriminate which thing to validate against, right?
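A hedged sketch of what the callable form might look like for the free-versus-paid idea Michael describes, using Pydantic's Discriminator and Tag helpers; the field names and the threshold are invented for illustration:

```python
from typing import Annotated, Any, Union
from pydantic import BaseModel, Discriminator, Tag


class FreeUser(BaseModel):
    lifetime_value: float = 0.0


class PaidUser(BaseModel):
    lifetime_value: float
    registered: bool


def user_kind(value: Any) -> str:
    # Works for raw dicts during validation and for model instances during serialization.
    if isinstance(value, dict):
        return 'paid' if value.get('lifetime_value', 0) > 0 else 'free'
    return 'paid' if getattr(value, 'lifetime_value', 0) > 0 else 'free'


class Account(BaseModel):
    user: Annotated[
        Union[
            Annotated[FreeUser, Tag('free')],
            Annotated[PaidUser, Tag('paid')],
        ],
        Discriminator(user_kind),
    ]


Account.model_validate({'user': {'lifetime_value': 49.0, 'registered': True}})
```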
36:24
Yeah, exactly. Yeah, it definitely comes
36:27
in handy when you have like, and you're like,
36:29
okay, well, I still want the performance benefits of
36:31
a discriminated union, but I kind of have
36:33
three fields on each model that are indicative
36:35
of which one I should validate against, right?
36:37
Yeah. And it's like, well, you know, taking
36:39
the time to look at those three fields
36:41
over the hundred is definitely worth it. Just
36:44
a little bit of complexity for the
36:46
developer. Mm-hmm, yeah, cool. One other note
36:49
here is that discriminated union. Can we go
36:51
back really quick? Yeah, yeah. To that
36:53
previous one? So I got a quick question.
36:55
So for this, you write a function. It's
36:57
given the value that comes in, which could
37:00
be a string, it could be a dictionary,
37:02
et cetera. Could you do
37:04
a little bit further performance improvements and add
37:06
like a functools lru_cache to cache
37:09
the output? So every time it sees the
37:11
same thing, if there's a repeated data through
37:13
your validation, it goes, I already know what
37:15
it is. What do you think? Yeah, yeah.
37:17
I do think that would be possible. That's definitely
37:20
an optimization we should try out and put
37:22
in our docs for like the advanced, advanced
37:24
performance tips. Yeah, because if you've got a
37:26
thousand strings and
37:28
then that word like it's maybe
37:31
male, female, male, female, male, female, like that
37:34
kind of where the data is repeated a
37:36
bunch, then it
37:38
could just go, yep, we already know that
37:40
answer. Yeah. Potentially, I don't
37:42
know. Yeah, no, definitely. And I will
37:44
say, I don't know if it takes
37:46
effect. I don't think it takes effect
37:48
with discriminated unions because this logic is
37:51
kind of in Python, but I
37:53
will say we recently added a like
37:55
string caching setting because we have
37:57
kind of our own JSON parsing
37:59
logic that we use in Pydantic Core. And
38:02
so we added a string caching setting so that
38:04
you don't have to rebuild the exact same strings
38:06
every time. So that's a
38:08
nice performance. Yeah, nice. Caching's awesome, until
38:10
it's not. Yeah, exactly.
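To be clear, the LRU-cache idea floated above is not something Pydantic does for you; a speculative sketch of how you might memoize a callable discriminator's answer for repeated, hashable inputs:

```python
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=None)
def _tag_for(pet_type: str) -> str:
    # Imagine something more expensive here; repeated strings hit the cache.
    return 'cat' if pet_type == 'cat' else 'dog'


def pet_discriminator(value: Any) -> str:
    raw = value.get('pet_type') if isinstance(value, dict) else getattr(value, 'pet_type', '')
    # lru_cache needs hashable arguments, so we cache on the extracted string, not the dict.
    return _tag_for(str(raw))
```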
38:14
So one quick note here
38:16
is just that discriminated unions are still
38:18
JSON schema compatible, which is awesome for
38:20
the case where you're, once again, defining
38:22
API requests and responses. You wanna still
38:24
have valid JSON schema coming out of
38:26
your models. Yeah, very cool. And then
38:28
it might show up in things like open
38:30
API documentation and stuff like
38:32
that, right? Yep, exactly. So
38:35
I'll kind of skip over this. We already touched on the
38:38
callable discriminators. And then I'll
38:40
leave these slides up
38:42
here as a reference. Again, I don't think this is
38:44
worth touching in too much detail, but just
38:47
kind of another comment about if you've got
38:50
nested models, that still
38:52
works well with discriminated unions. So we're still on
38:54
the pet example, but let's say this
38:56
time you have a white cat and
38:58
a black cat model, and then
39:00
you also have your existing
39:02
dog model. You can still create a
39:04
union of, your
39:07
cat union is a union of black cat
39:09
and white cat, and then you can union that
39:11
with the dogs and it still works. And
39:13
once again, you can kind of imagine the
39:16
exponential blow up that would occur if you
39:18
didn't use some sort of discriminator here in
39:20
terms of errors. Yeah, very
39:22
interesting. Okay, cool. Yeah, so
39:24
that's kind of all in terms
39:26
of my recommendations for discriminated
39:29
union application. I would encourage folks who
39:31
are interested in this to check out our
39:33
documentation. It's pretty thorough in that regard. And I
39:35
think we also have those links attached to the
39:38
podcast. Yeah, definitely. And then performance
39:40
improvements in the pipeline. Is this something that
39:42
we can control from the outside? Is this
39:44
something that you all are just adding for
39:46
us in the next version? Yeah, good question. This
39:49
is hopefully, maybe not all in the next version,
39:51
but just kind of things we're keeping our eyes
39:53
on in terms of requested performance
39:55
improvements and ideas that we have. I'll
39:58
go a little bit out of order here. We've been talking a
40:00
bunch about core schema and kind
40:03
of maybe deferring the build of that or
40:06
just trying to optimize that. And that actually happens
40:08
in Python. So one of the
40:10
biggest things that we're trying to do is
40:12
effectively speed up the core schema building process
40:15
so that import times are faster and
40:17
just, you know, Pydantic is more performant in
40:20
general. Well, so one
40:23
thing that I'd like to ask about
40:26
kind of back on the Python side a little bit, suppose
40:29
I've got some really large document,
40:31
right? Really nested document. If you
40:33
have converted some terrible XML thing
40:35
into JSON or I
40:37
don't know, something. And there's a little bit
40:40
of structured schema that I care about. And
40:42
then there's a whole bunch of other stuff
40:44
that I could potentially create nested models to
40:46
go to, but I don't really care about
40:48
validating them. It's just whatever it is, it
40:50
is. What if you
40:52
just said that was a dictionary? Would that
40:54
short circuit a whole bunch of validation and
40:57
stuff that would make it faster potentially? Yeah.
40:59
Could it turn off the validation for a
41:01
subset of the model if it's really big
41:03
and deep and you don't really care for
41:06
that part? Yeah, good question. So we offer
41:08
an annotation called skip validation that
41:10
you can apply to certain types.
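A sketch of that annotation in use; the model and field names are just for illustration:

```python
from typing import Annotated, Any
from pydantic import BaseModel, SkipValidation


class ImportedDocument(BaseModel):
    title: str                                               # validated normally
    raw_payload: Annotated[dict[str, Any], SkipValidation]   # accepted as-is, no deep validation
```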
41:13
So that's kind of one approach. I think
41:15
in the future, it could be nice to offer kind of
41:17
a config setting so that you can more easily
41:19
like list features that you wanna
41:22
skip validation for instead of like applying those on
41:24
a field by field basis. And
41:26
then the other thing is if you only define
41:28
your model in terms of the fields that you
41:30
really care about from that, very
41:32
gigantic amount of data, we
41:35
will just ignore the extra data that you pass in and
41:38
pull out the relevant information. Right,
41:40
okay, yeah, good. Back
41:42
to the pipeline. Yeah, back to the pipeline. So
41:45
another improvement, we talked a little
41:48
bit about potential like
41:50
parallelization of things or vectorization. One
41:52
thing that I'm excited to learn more about
41:54
in the future and that we've started working
41:56
on is this thing called SIMD in jiter,
41:58
and that's our JSON iterable parser library
42:01
that I was talking about. And
42:03
so SIMD stands for Single Instruction Multiple
42:05
Data. Basically means that you can
42:07
do operations faster and
42:10
that's with this kind of vectorization approach. I
42:13
certainly don't claim to be an expert in SIMD, but I
42:15
know that it's improving our validation
42:18
speeds in the Department
42:21
of JSON parsing. So that's something that
42:23
we're hoping to support for a broader
42:26
set of architectures going forward. Yeah, that's
42:28
really cool. Almost like what
42:30
pandas does for Python: instead of a loop
42:32
over in validation and doing something to
42:35
each piece, you just go this whole column,
42:37
multiply it by two. Yep, yep, exactly. I'm
42:39
sure it's not implemented the same, but like
42:42
conceptually the same. Yep, yep, very much so.
42:45
And then the other two things in the
42:47
pipeline that I'm gonna mention are kind of
42:49
related once again to the avoiding materializing things
42:51
in Python if we can. And
42:54
we're even kind of extending that to
42:56
avoiding materializing things in Rust if we
42:58
don't have to. So the first thing is
43:00
when we're parsing JSON in Rust, can we
43:02
just do the validation as we kind of
43:04
chomp through the JSON instead of like materializing
43:06
the JSON as a Rust object
43:08
and then doing all the validation? So like can
43:11
we just do it in one pass? Okay,
43:13
is that almost like generators and
43:15
iterables rather than loading all in a
43:17
memory at once and then processing it
43:19
one at a time? Yeah, exactly.
43:22
And it's kind of like, do
43:25
you build the tree and then walk it three
43:27
times or do you just
43:29
do your operations every time you add something to the tree?
43:31
Yeah. And then the last
43:33
performance improvement in the pipeline that I'll mention is
43:35
this thing called fast model. Has
43:37
not been released yet, hasn't really
43:39
even been significantly developed, but this is
43:42
cool in that it's really approaching that
43:44
kind of laziness concept again. So
43:47
attributes would remain in Rust after
43:49
validation until they're requested. So
43:51
this is kind of along the lines of the
43:53
defer build logic that we were talking about in
43:55
terms of like, we're not gonna send you the
43:57
data or perform the necessary operations until they're
44:00
requested. Right, okay. Yeah, if you don't ever
44:02
access the field then why process all that
44:05
stuff right and convert it into Python objects
44:07
Yeah, exactly. Um, but yeah, we're kind of
44:09
just excited in general to Be
44:12
looking at lots of performance improvements on our
44:14
end even after the big v2 speed-up Still
44:16
have lots of other things to work on
44:18
and improve. Yeah, it sure seems
44:20
like it and If
44:23
this free threaded Python thing takes
44:25
off who knows maybe there's even
44:27
more craziness with parallel processing of
44:30
different branches of the model at different,
44:32
you know alongside each other. Yeah
44:36
So I think this kind of dovetails nicely
44:38
into like you asked earlier Like is there
44:40
a way that we kind of monitor the
44:42
performance improvements that we're making? And
44:45
we're currently using and getting
44:48
started with two tools that are really helpful
44:51
And I can share some PRs if
44:53
that's helpful and send links after but
44:55
one of them is CodSpeed,
44:57
which integrates super nicely
44:59
with CI and
45:02
GitHub, and it basically runs
45:04
tests tagged with this like benchmark
45:06
tag And then
45:08
it'll, you know, run them on main compared to
45:10
on your branch, and then you can see, like,
45:13
oh, this made my code, you know, 30%
45:15
slower, like maybe let's not merge that right away.
45:17
Or conversely,
45:19
you know, there's a 30% improvement
45:22
on some of your benchmarks. It's really nice to kind of
45:24
track and see that. I see. So it looks like it
45:26
sets up, so this is
45:28
CodSpeed.io, right? Yeah. Then it
45:31
sets up as a GitHub Action as
45:33
part of your CI/CD, and, you
45:35
know, probably automatically runs when a PR
45:37
is open and things along those lines,
45:40
right? Yep, exactly. All right, I've never
45:42
heard of this. But yeah, if it
45:44
just does the performance testing for you
45:46
automatically, why not, right? Let it
45:48
do that. Yeah.
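A sketch of what such a tagged benchmark test might look like, assuming the pytest-codspeed plugin is installed and wired into CI; the validated type is just an example:

```python
import pytest
from pydantic import TypeAdapter

INT_LIST = TypeAdapter(list[int])


@pytest.mark.benchmark  # picked up and timed by the CodSpeed runner
def test_validate_int_list():
    assert INT_LIST.validate_json('[1, 2, 3, 4, 5]') == [1, 2, 3, 4, 5]
```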
45:51
I guess another tool that I'll
45:53
mention, while talking
45:55
about kind of our you know
45:57
continuous optimization, if that's a word for it,
46:00
is this tool kind of similarly
46:02
named called CodeFlash. So
46:05
CodeFlash is a new
46:07
tool that uses LLMs to
46:10
kind of read your code and
46:12
then develop potentially more performant versions,
46:15
kind of analyze those in terms of, you know,
46:17
is it passing, is this new
46:19
code passing existing tests? Is it passing
46:22
additional tests that we write? And
46:24
then another great thing that it
46:26
does is open PRs for you with those improvements and
46:29
then explain the improvements. So I
46:31
think it's a really pioneering tool in the
46:33
space and we're excited to kind of
46:36
experiment with it more on our PRs
46:38
and in our repository. Okay,
46:41
I love it. Just tell me why is this, why
46:44
did this slow down? Well, here's why. Yeah,
46:46
exactly. And they offer
46:48
both like local runs of the
46:51
tool and also built-in CI support.
46:54
So those are just kind of two tools that we use and
46:57
are increasingly using to help us kind
47:00
of check our performance as we continue to develop
47:03
and really inspire us to, you know, get
47:05
those green check marks with the
47:07
like performance improved on lots of PRs.
47:09
Yeah, the more you can
47:11
have it where if it passes the automated
47:13
build, it's just ready to go and you
47:15
don't have to worry a little bit and
47:18
keep testing things and then have uncertainty, you
47:20
know that. It's nice, right? Gives you a
47:22
lot of, lets you rest and
47:24
sleep at night. Yeah, most certainly.
47:26
I mean, I said it before,
47:28
but the number of people who
47:31
are impacted by Pydantic, I
47:33
don't know what that number is, but it has to be tremendous because if
47:35
there's 400,000 projects that use it, like
47:38
think of the users of those projects, right? Like
47:40
that multiple has got to be big for, you
47:42
know, I'm sure there's some really popular ones. For
47:44
example, FastAPI, right? Yeah, yeah.
47:47
And it's just nice to know that
47:49
there are other companies
47:51
and tools out there that can help us
47:53
to, you know, really boost the performance benefits
47:55
for all those users, which is great. All
47:58
right, yeah, that is really cool. I think, let's
48:00
talk about one more performance benefit
48:03
for people and not so much in
48:05
how fast your code runs, but in
48:07
how fast you go from raw data
48:10
to Pydantic models. So one
48:13
thing, you probably have seen, we might have
48:15
even spoken about this before, are you familiar
48:17
with JSON to Pydantic? The website? Yeah, it's
48:20
a really cool tool. Yeah, it's such a
48:22
cool tool. And if you've got some really
48:24
complicated data, like let's see, I'll pull up
48:26
some weather data that's in JSON format or
48:28
something, right? If you just take this and
48:31
you throw it in here, just don't even have
48:33
to pretty print it. It'll just go, okay, well,
48:35
it looks like what we've got is this really
48:38
complicated nested model here. And
48:40
it took, we did this while I was talking, it took 10 seconds
48:43
for me clicking the API to get
48:45
a response to having like a pretty
48:47
decent representation here. Yeah,
48:50
it's great in terms of like developer
48:52
agility, especially, right? It's like, oh, I've
48:54
heard of this tool called Pydantic. I've seen it
48:56
in places like, I don't really know if I
48:58
wanna manually go build all these models
49:00
for my super complicated JSON data. It's like,
49:02
boom, three seconds, done for you, basically. Exactly,
49:05
like, is it really worth it? Because
49:08
I don't wanna have to figure this thing out and figure
49:10
out all the types and like, no, just paste it in
49:12
there and see what you get. It won't
49:15
be perfect, right? Some things, if they're null
49:17
in your data, but they could be something that
49:19
would make them an optional element, like they could
49:21
be an integer or they could be null. It
49:23
won't know that it's gonna be an integer, right?
49:26
Right. So you kind of gotta patch it
49:28
up a tiny bit, but in
49:30
general, I think this is really good.
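For a sense of what comes out the other side, here is a hedged sketch of the sort of nested model a tool like JSON to Pydantic might generate for a small weather payload, with one null field patched by hand to Optional; the payload and field names are invented for this example.

# Hedged illustration of the kind of nested model a JSON-to-Pydantic tool
# might emit for a small weather payload; the payload and field names are
# invented. The gust field was null in the sample, so it's patched to
# Optional by hand, as discussed above.
from typing import Optional
from pydantic import BaseModel


class Wind(BaseModel):
    speed: float
    gust: Optional[float] = None  # null in the sample data, patched manually


class Current(BaseModel):
    temp_c: float
    humidity: int
    wind: Wind


class WeatherReport(BaseModel):
    location: str
    current: Current


raw = {
    "location": "Portland, OR",
    "current": {"temp_c": 21.5, "humidity": 40, "wind": {"speed": 3.2, "gust": None}},
}
report = WeatherReport.model_validate(raw)
print(report.current.wind.speed)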
49:32
And then also, just drop in with your favorite
49:34
LLM. I've been using LM
49:36
Studio, which is awesome. Nice,
49:39
I heard you talked about that on one of the
49:41
most recent podcasts, right? Yeah, yeah,
49:43
it's super cool. You can just download Llama 3
49:45
and run it locally with, I
49:48
think my computer can only handle seven billion
49:50
parameter models, but you get pretty good answers.
49:52
And if you give it a piece
49:55
of JSON data and you say, convert that
49:57
to Pydantic, you'll get really good results.
50:00
You have a little more control over than what
50:02
you just get with this tool. But I
50:04
think those two things, while not
50:06
about runtime performance, you know, going from
50:08
I have data till I'm working with
50:11
Pydantic, that's pretty awesome. Yeah, definitely.
50:13
And if any, you know, passionate
50:16
open source contributors are listening and want to
50:18
create like a CLI tool for doing this
50:20
locally, I'm sure that would be very much
50:22
appreciated. I think this is based
50:24
on something that I don't use,
50:26
but I think it's based on the
50:28
datamodel-code-generator, which I
50:30
think might be a CLI tool or
50:33
a library. Let's see. Yes. Oh, yeah,
50:35
very nice. But here's the problem that
50:37
you go and define like a YAML
50:39
file. Like it's just not as easy
50:41
as like there's a text field I
50:43
paste in my stuff, but it
50:45
does technically, technically work, I
50:48
suppose. Yeah, I know.
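For anyone curious what that CLI route looks like, here is a rough sketch that shells out to the datamodel-codegen command from a small Python script, assuming datamodel-code-generator is installed via pip; the file names are invented and the flag spellings should be checked against the tool's --help for your version.

# Rough sketch of driving datamodel-code-generator instead of hand-pasting
# JSON into a website; it shells out to the documented `datamodel-codegen`
# CLI. The file names are made up, and flags should be verified against
# `datamodel-codegen --help` for the version you have installed.
import subprocess

subprocess.run(
    [
        "datamodel-codegen",
        "--input", "weather.json",        # raw JSON sample to infer models from
        "--input-file-type", "json",
        "--output", "weather_models.py",  # generated Pydantic models land here
    ],
    check=True,
)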
50:50
Definitely the LLM approach or just the basic
50:52
website approach are very quick, which is
50:54
nice. Yeah. Speaking of LLMs,
50:56
just really quick. Like I feel like you
50:58
get some of the Python newsletters and other places
51:00
like here's the cool new packages. A lot of
51:03
them are like nine out of 10 of them
51:05
are about LLMs these days. I was like, that
51:07
feels a little over the top to me, but
51:09
I know there's other things going on in the
51:11
world. But you know, just put
51:13
your thoughts on LLMs and coding these days. I
51:15
know you write a lot of code and
51:17
think about it a lot and probably use them somewhere in
51:19
there. Yeah, no, for sure. I'm
51:22
pretty optimistic and excited about it. I
51:25
think there's a lot of good that can
51:27
be done and a lot of productivity boosting
51:29
to be had from integrating
51:31
with these tools, both in your local
51:33
development environment and also just in general.
51:36
I think sometimes it's also great
51:38
in the performance department, right? Like
51:40
we can see with CodeFlash using
51:42
LLMs to help you write more
51:44
performant code can also be really
51:46
useful. And it's been exciting to see
51:49
some libraries really leverage Pydantic as
51:51
well in that space in terms
51:53
of validating LLM outputs or even
51:55
using LLM calls in
51:57
Pydantic validators to validate, you
52:00
know data along constraints that are more
52:02
like language-model friendly. So
52:05
yeah, I'm optimistic about it. I still have a lot to learn
52:07
but It's cool to see the
52:09
variety of applications and kind of where you can
52:11
plug in Pydantic in that process, for fun.
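As a hedged sketch of that idea, here is what an LLM-backed check inside a Pydantic v2 field validator could look like; ask_llm is a hypothetical placeholder for whatever client you use, while BaseModel and field_validator are the real Pydantic APIs.

# Hedged sketch of the "LLM call inside a validator" idea mentioned above.
# `ask_llm` is a hypothetical placeholder for whatever client you use; the
# Pydantic pieces (BaseModel, field_validator) are real v2 APIs.
from pydantic import BaseModel, field_validator


def ask_llm(prompt: str) -> str:
    # Placeholder: call your model of choice here and return its reply.
    return "yes"


class SupportTicket(BaseModel):
    summary: str

    @field_validator("summary")
    @classmethod
    def summary_must_be_polite(cls, value: str) -> str:
        # A "language-model friendly" constraint: hard to express as a regex,
        # easy to ask a model about.
        verdict = ask_llm(f"Answer yes or no: is this text polite? {value!r}")
        if not verdict.lower().startswith("yes"):
            raise ValueError("summary failed the politeness check")
        return value


ticket = SupportTicket(summary="Please help, checkout is failing for some users.")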
52:13
Yeah, I totally agree I right now the
52:16
context window like how much you can give
52:18
it as Information then to start asking questions
52:20
is still a little bit small like you
52:22
can't give it some huge program and say,
52:24
you know find me the bugs where this
52:26
function is called or you know And it's
52:28
like it doesn't quite understand enough all at
52:31
once but that thing
52:33
keeps growing. So eventually someday we'll
52:35
all see. Yeah, all right. Well,
52:37
let's talk just for a minute
52:39
Maybe real quick about what
52:41
you all are doing at Pydantic the company
52:44
rather than Pydantic the open source library. Like
52:46
what do you all got going on there?
52:48
Yeah, sure. So Pydantic,
52:52
the company, has released our
52:54
first commercial tool. It's called Logfire, and
52:56
it's in open beta. So
52:58
it's an observability platform and
53:01
we would really encourage anyone interested to try
53:03
it out. It's super easy to get started
53:05
with, you know, just the basic pip
53:07
install of the SDK
53:10
and then you start using it in your code
53:12
base and then We
53:14
have the kind of Logfire dashboard where you're
53:16
gonna see the observability and
53:18
results. And so we
53:20
kind of adopt this like needle in the haystack
53:22
philosophy, where we want this to
53:24
be a very easy to use observability platform
53:28
that offers very, like, Python-centric
53:30
insights. And it's this
53:32
opinionated wrapper around
53:34
OpenTelemetry, if folks are familiar
53:36
with that. But in kind
53:39
of the context of performance, one of the great
53:41
things about this tool is that it
53:43
offers this like nested logging and profiling
53:45
structure for code. So
53:48
it can be really helpful in kind
53:50
of looking at your code and being like, we don't
53:52
know where this, you know, performance slowdown is occurring, but
53:55
if we integrate with Logfire, we can see
53:57
that, like, very easily in the dashboard. Yeah.
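Here is a minimal sketch of that nested span idea, assuming the Logfire SDK is installed via pip install logfire; the call names reflect the SDK's documented surface at the time of writing and may evolve, and the Order model is invented for illustration.

# Minimal sketch of nested spans and logging with Logfire, assuming the SDK
# is installed (`pip install logfire`); treat the call names as illustrative
# and check the current docs. The Order model is made up for this example.
import logfire
from pydantic import BaseModel

logfire.configure()  # picks up your project credentials / token


class Order(BaseModel):
    id: int
    total: float


with logfire.span("process order batch"):  # outer span
    for raw in [{"id": 1, "total": 9.99}, {"id": 2, "total": 24.50}]:
        with logfire.span("validate order {id}", id=raw["id"]):  # nested span
            order = Order.model_validate(raw)
            logfire.info("validated order", order_id=order.id)

# For web apps there are framework integrations too, e.g. something like
# logfire.instrument_fastapi(app) for FastAPI; check the docs for specifics.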
54:00
Yeah, you also have
54:02
some interesting approaches like specifically
54:04
targeting popular frameworks, like instrument
54:06
FastAPI or something like
54:08
that, right? Yeah, definitely. Trying
54:10
to build integrations that work
54:13
very well with FastAPI, other tools like
54:15
that. And even also
54:17
offering custom features
54:19
in the dashboard. If
54:22
you're using an observability tool, you're probably advanced enough
54:24
to want to add some extra things to
54:26
your dashboard. And we're working on supporting that
54:28
with FastUI, which I know you've chatted
54:31
with Samuel about as well. Yeah, absolutely. I
54:34
got a chance to talk to Samuel about Logfire
54:36
and some of the behind the scenes of the structure.
54:38
It was really interesting. But also speaking of FastUI,
54:40
I did speak to him. When
54:43
was that? Back in February. So
54:46
this is a really popular project.
54:48
And even on the, I
54:50
was like, quite
54:52
a few people decided that they were interested in even
54:54
watching the video on that one. Yeah,
54:58
anything with FastUI? Sorry, did
55:00
you say anything with FastUI? Yeah, yeah.
55:02
Are you doing anything on the FastUI side? Or
55:04
are you on the Pydantic side
55:07
of things? Yeah, good question. I've been working
55:09
mostly on Pydantic, just larger user base,
55:11
more feature requests. But I've done a
55:14
little bit on the FastUI side
55:16
and excited to kind of brush up
55:18
on my TypeScript and build that
55:20
out as a more robust and supported tool. I
55:23
think, especially as we grow as a company
55:25
and have more open source support in general,
55:27
that'll be a priority for us, which is
55:29
exciting. Yeah. It's
55:32
an interesting project, basically a cool
55:34
way to do a JavaScript front end and React
55:36
and then plug those back into Python APIs
55:39
like FastAPI and those types of
55:42
things, right? Yeah, and kind
55:44
of a similarity with FastUI and Logfire,
55:46
the new tools, is that there's pretty seamless integration
55:48
with Pydantic, which is definitely going to be
55:50
one of the kind of core tenets of
55:53
any products or open source things that we're
55:55
producing in the future. Yeah, I can imagine. That's
55:57
something you want to pay special attention to: how
56:00
well do these things fit together as a
56:02
whole rather than just here's something interesting, here's
56:04
something interesting. Yeah. Yeah. Awesome. All right. Well,
56:07
I think that pretty much wraps
56:09
it up for the time that we have
56:11
to talk today. Let's, let's close
56:13
it out. Close it out for us with
56:15
maybe final call to action for people who
56:17
are already using Pydantic and they want it
56:20
to go faster, or maybe they could adopt
56:22
some of these tips. What do you tell
56:24
them? Yeah, I would say, you
56:27
know, inform yourself just a little bit about
56:29
kind of the Pydantic architecture, just
56:31
in terms of like, what is core schema
56:33
and why are we using Rust for validation
56:35
and serialization? And then that can kind of
56:37
take you to the next steps of, when
56:40
do I want to build my core schemas based
56:42
on kind of the nature of my application? Is
56:44
it okay if imports take a little bit longer
56:46
or do I want to delay that? And then
56:48
take a look at discriminated unions.
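To make those last two tips concrete, here is a small hedged sketch of a discriminated union plus deferred core-schema building; the Cat, Dog, and Owner models are invented, and defer_build availability depends on your Pydantic 2.x version.

# Hedged sketch of a discriminated (tagged) union, so validation jumps
# straight to the right variant, plus defer_build to postpone core-schema
# construction until first use. Models are invented; defer_build support
# depends on your Pydantic 2.x version.
from typing import Literal, Union
from pydantic import BaseModel, ConfigDict, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    lives: int = 9


class Dog(BaseModel):
    pet_type: Literal["dog"]
    good_boy: bool = True


class Owner(BaseModel):
    # Pydantic reads pet_type first and only validates against that one model,
    # instead of trying every member of the union.
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")

    # Optional: trade a little first-use latency for faster imports.
    model_config = ConfigDict(defer_build=True)


owner = Owner.model_validate({"pet": {"pet_type": "dog"}})
print(type(owner.pet).__name__)  # Dog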
56:50
And then maybe if you're really interested in improving
56:53
performance across your application that supports
56:55
Pydantic and other things, trying
56:58
out Logfire and just seeing what sort of benefits you
57:00
can get there. Yeah. See where you're spending your time
57:02
is one of the very, you
57:04
know, not just focused on Pydantic, but
57:06
in general, our intuition is often pretty
57:08
bad for where's your
57:11
code slow and where is it not slow.
57:13
You're like, that looks really complicated. That must
57:15
be slow. Like, nope, it's that one call
57:17
to like some sub module that you didn't
57:19
realize was terrible. Yeah. Yeah. And I guess
57:21
that kind of circles back to the like,
57:23
LLM tools and, you
57:25
know, integrated performance analysis with
57:27
CodSpeed and CodeFlash and even just other
57:29
LLM tools, which is like, use the
57:31
tools you have at hand. And yeah, sometimes they're
57:34
better at performance improvements
57:36
than you might be, or it can at least give
57:38
you good tips that give you, you know, a launching point,
57:40
which is great. Yeah, for sure. Or even good old
57:42
cProfile built right in, right? If you really, if
57:45
you want to do it that way. Awesome. Yeah. Well,
57:47
Sydney, thank you for being back on the
57:50
show and sharing all these tips
57:52
and congratulations on all the work you and
57:54
the team are doing. You know, what a
57:56
success Pydantic is. Yeah. Thank you so much
57:58
for having me. It was wonderful to get
58:00
to have this discussion with you and excited that I
58:02
got to meet you in person at PyCon recently. Yeah,
58:04
that was really great, really great. Until
58:07
next PyCon, see you later. This
58:10
has been another episode of Talk Python to Me.
58:13
Thank you to our sponsors. Be sure to check out
58:15
what they're offering. It really helps support the show. Take
58:18
some stress out of your life. Get
58:20
notified immediately about errors and performance issues
58:22
in your web or mobile applications with
58:24
Sentry. Just visit talkpython.fm
58:27
slash Sentry and get
58:29
started for free. And be sure to
58:31
use the promo code talkpython, all one word.
58:34
Code Comments, an original podcast from Red
58:36
Hat. This podcast covers
58:38
stories from technologists who've been through
58:40
tough tech transitions and
58:42
share how their teams survived the
58:45
journey. Episodes are available everywhere
58:47
you listen to your podcasts and at
58:49
talkpython.fm slash code dash comments. Want to
58:51
level up your Python? We have one
58:53
of the largest catalogs of Python video
58:56
courses over at Talk Python. Our content
58:58
ranges from true beginners to deeply advanced
59:00
topics like memory and async. And best
59:02
of all, there's not a subscription in
59:04
sight. Check it out for yourself at
59:07
training.talkpython.fm. Be sure to
59:09
subscribe to the show, open your favorite podcast app
59:11
and search for Python. We should be right at
59:13
the top. You can also find
59:15
the iTunes feed at slash iTunes, the
59:18
Google Play feed at slash Play and
59:20
the direct RSS feed at slash RSS
59:22
on talkpython.fm. We're live
59:24
streaming most of our recordings these days. If
59:26
you want to be part of the show
59:29
and have your comments featured on the air,
59:31
be sure to subscribe to our YouTube channel
59:33
at talkpython.fm slash YouTube. This
59:35
is your host, Michael Kennedy. Thanks so much for
59:37
listening. I really appreciate it. Now get out there
59:39
and write some Python code. Thank
59:42
you.