Episode Transcript
Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.
0:00
You're using Pydantic and it seems pretty straightforward,
0:02
right? But could you adopt some simple
0:04
changes to your code that would make it a lot
0:06
faster and more efficient? Chances are you'll
0:08
find a couple of the tips from Sydney Runkel
0:11
that will do just that. Join us
0:13
to talk about Pydantic performance tips here
0:15
on TalkPython episode 466 recorded June 13th,
0:17
2024. You're
0:25
listening to Michael Kennedy on TalkPython to
0:27
Me. Live from Portland, Oregon,
0:29
and this segment was made with Python.
0:34
Welcome to TalkPython to me, a
0:37
weekly podcast on Python. This is
0:39
your host, Michael Kennedy. Follow me
0:41
on Mastodon where I'm @mkennedy
0:43
and follow the podcast using
0:45
@talkpython, both on fosstodon.org. Keep
0:47
up with the show and listen to over
0:50
seven years of past episodes at TalkPython.fm. We've
0:53
started streaming most of our episodes live
0:55
on YouTube. Subscribe to our
0:57
YouTube channel over at TalkPython.fm slash YouTube
0:59
to get notified about upcoming shows and
1:02
be part of that episode. This
1:04
episode is brought to you by Sentry.
1:06
Don't let those errors go unnoticed. Use
1:09
Sentry like we do here at TalkPython.
1:11
Sign up at TalkPython.fm slash Sentry. And
1:13
it's brought to you by Code Comments,
1:16
an original podcast from Red Hat. This
1:18
podcast covers stories from technologists who've been
1:20
through tough tech transitions and
1:23
share how their teams survived the
1:25
journey. Episodes are available
1:27
everywhere you listen to your podcast
1:29
and at TalkPython.fm slash code dash
1:31
comments. Hey folks, I
1:33
got something pretty excellent for you.
1:35
PyCharm Professional for six months for
1:37
free. Over at TalkPython, we
1:40
partnered with the JetBrains team to get
1:42
all of our registered users free access
1:44
to PyCharm Pro for six months. All
1:47
you have to do is take one of
1:49
our courses. That's it. However, do note that
1:51
this is not valid for renewals over JetBrains.
1:53
Only new users there. And
1:56
if you're not currently a registered user
1:58
at TalkPython, well, no problem. This
2:00
offer comes with all of our courses. So
2:03
even if you just sign up for one
2:05
of our free courses at talkpython.fm, click on
2:07
courses in the menu, you're in. So
2:10
how do you redeem it? Once you have
2:12
an account over at TalkPython, then it's super
2:14
easy. Just visit your account page on TalkPython
2:16
training and in the details tab, you'll have
2:18
a code and a link to redeem your
2:20
six months of PyCharm Pro. So
2:22
why not take a course, even a
2:24
free one, and get six months free of PyCharm.
2:28
Sydney, welcome back to Talk Python To Me. It's
2:30
awesome to have you here. Thank you. Super
2:32
excited to be here. And yeah, I'm excited
2:34
for our chat. I am too. We're going
2:37
to talk about Pydantic, one of my very
2:39
favorite libraries that just makes working with Python
2:41
data, data exchange so, so
2:43
easy, which is awesome. And it's really
2:45
cool that you're on the Pydantic team
2:48
these days for them. I
2:50
guess, let's jump back just a little
2:52
bit. A few weeks ago, I got
2:54
to meet up a little bit in
2:56
Pittsburgh at PyCon. How was PyCon for
2:58
you? It was great. So it was
3:00
my first PyCon experience ever. It was a
3:03
very, very large conference. So it was
3:05
a cool first introductory conference experience. I
3:08
had just graduated not even a week before. So
3:10
it was a fun way to roll
3:12
into full-time work and get exposed really
3:15
to the Python community. And it was
3:17
great to just have a mix of getting to
3:19
give a talk, getting to attend lots of awesome
3:21
presentations, and then most of all, just meeting a
3:24
bunch of really awesome people in the community. I
3:27
always love how many people you
3:29
get to meet from so many
3:32
different places and perspectives. And
3:34
it just reminds you the
3:36
world is really big, but also really small. You
3:38
get to meet your friends and new people
3:40
from all over the place. Definitely. I was
3:43
impressed by the number of international attendees. I
3:45
didn't really expect that. It was great. Yeah,
3:47
same here. All right, well, maybe
3:50
a quick introduction for yourself for those
3:52
who didn't hear your previous episode. And
3:54
then we'll talk a bit about this
3:57
Pydantic library. Yeah, sure. Sounds great. My
4:00
name is Sydney. I just graduated from the
4:02
University of Wisconsin. Last time
4:04
I chatted with you, I was still pursuing my degree
4:06
in computer science and
4:08
working part-time as an intern at
4:10
the company Pydantic, which kind of
4:12
founded around the same ideas that
4:15
inspired the open source tool. And now we're building
4:18
commercial tools. And now I've rolled over into full-time
4:20
work with them, primarily on the
4:22
open source side. So yeah, very
4:24
excited to kind of be contributing
4:26
to the open source community, but
4:28
also getting to help with our
4:30
commercial tools and development there. Yeah, yeah, awesome.
4:32
We'll talk a bit about that later. Super
4:35
cool to be able to work on open source as
4:37
a job, as a proper job, right?
4:40
Yeah, it's awesome. It's really unique.
4:43
I've kind of encouraged lots of people to
4:45
contribute to open source as kind of a
4:47
jump start into their software development
4:50
careers, especially like young folks who are looking
4:52
to get started with things and maybe don't
4:54
have an internship or that sort of thing
4:56
set up yet. I think it's a really
4:58
awesome pipeline for getting exposed to good code
5:00
and collaborating with others and that sort of
5:02
thing. But it's definitely special to get to
5:04
do and get paid as well. Indeed.
5:06
So it's a little bit unbelievable to
5:09
me. But I'm sure that it is
5:11
true that there are folks out there
5:13
listening to the podcast that are like,
5:15
Pydantic, maybe you've heard of that. What
5:17
is this Pydantic thing? Yeah,
5:20
great question. What is Pydantic? So
5:22
Pydantic is the leading data validation
5:24
library for Python. And so
5:26
Pydantic uses type hints, which are optional
5:29
in Python, but kind of generally more
5:31
and more encouraged to enforce
5:34
constraints on data and kind of validate
5:36
data structures, et cetera. So we're
5:39
kind of looking at a very simple
5:41
example together right now where we're importing
5:43
things like date time and
5:45
tuple types from typing. And then kind
5:47
of the core of Pydantic is you
5:50
define these classes
5:52
that inherit from this class called
5:54
base model that's in Pydantic. And
5:57
that inheritance is what ends up helping you.
6:00
use methods to validate data,
6:03
fill JSON schema, things like that. And so
6:05
in our case, we have this delivery class
6:07
that has a timestamp, which
6:09
is of type date time, and
6:11
then a dimensions tuple, which has
6:13
two int parts. And so then
6:16
when you pass data into this
6:18
delivery class to create an instance, Pydantic
6:21
handles validating that data to make
6:24
sure that it conforms to those constraints we've
6:26
specified. And so it's really
6:28
a kind of intermediate tool that you can use for
6:30
de-serialization or loading data and then
6:33
serialization, dumping data.
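For listeners following along in code, here's a minimal sketch of the kind of model being described, based on the example shown on pydantic.dev (the specific values are just illustrative):

```python
from datetime import datetime
from pydantic import BaseModel


class Delivery(BaseModel):
    timestamp: datetime          # must be a datetime, or something coercible to one
    dimensions: tuple[int, int]  # exactly two ints


# Pydantic validates and coerces the raw input while creating the instance.
m = Delivery(timestamp='2020-01-02T03:04:05Z', dimensions=['10', '20'])
print(m.timestamp)   # 2020-01-02 03:04:05+00:00
print(m.dimensions)  # (10, 20)
```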
6:35
Yeah, it's a thing of beauty. I
6:37
really love the way that it works. If
6:39
you've got JSON data, nested JSON data, right?
6:42
You go to Pydantic.dev slash open source,
6:44
there's an example of here that we're
6:46
talking about. And it's got
6:48
a tuple, but the tuple contains integers, two of
6:50
them. And so if there was a tuple of
6:52
three things, it'll give you an error. If it's
6:55
a tuple of a date time and an int,
6:57
it'll give you an error. It
6:59
reaches all the way inside. And
7:01
things, I guess, it compares against. It's a little
7:03
bit like data classes. Have you done much
7:05
with data classes and compared them? Yeah,
7:08
that's a great question. So we actually
7:10
offer support for Pydantic data classes.
7:12
So I think data classes kind of
7:14
took the first step of
7:17
really supporting using type hints for
7:19
model fields and things like that. And
7:22
then Pydantic sort of takes an extra
7:24
jump in terms of validation and schema
7:26
support. And so I think one very
7:29
common use case is if you're defining
7:31
API request and response models, you can
7:33
imagine the JSON schema capabilities come in
7:35
handy there and just ensuring
7:37
the integrity of your API and the data
7:40
you're dealing with. Very helpful
7:42
on the validation front. Yeah, yeah, very
7:44
cool. Well, I
7:47
guess one more thing for people who are
7:49
not super familiar with it. Pydantic
7:51
is, I think it's used every now and
7:54
then. Let's check it out on GitHub here.
7:56
I'm starting to think of some of
7:58
the main places people have heard
8:01
of it. Obviously, FastAPI, I think,
8:03
is the thing that really launched its
8:05
popularity in the early days if I
8:07
had to guess. But if we go
8:09
over to GitHub, GitHub says that for
8:12
the open source things that Pydantic is
8:14
a foundational dependency for 412,644 different projects.
8:20
Yeah, unbelievable. Yeah,
8:22
it's very exciting. We just got our May
8:25
download numbers and heard that
8:27
we have over 200 million downloads
8:29
in May. So that's both version
8:32
one and version two, but definitely exciting to
8:34
see how kind of critical of a tool
8:36
has become for so many different
8:38
use cases in Python, which is awesome.
8:40
Yeah, absolutely. It's really, really critical. And
8:43
I think we should probably talk a
8:45
little bit about Pydantic v1 and v2 as
8:47
a way to get into the architecture
8:49
conversation, right? That was a big thing
8:51
I talked to Samuel Colvin, maybe
8:53
a year ago or so, I would
8:56
imagine, around PyCon. I think
8:58
we actually did one around PyCon last year
9:00
as well. Yeah, for sure. So a
9:03
lot of the benefit of using
9:05
Pydantic is we promise some
9:08
great performance. And a
9:10
lot of those performance gains came during our jump from
9:12
v1 to v2. So v1 was written solely
9:16
in Python, we had some like
9:18
compiled options, but really, it was
9:21
mostly Pythonic data validation,
9:23
or I say Pythonic, it's always Pythonic,
9:25
but data validation done solely in Python.
9:28
And the big difference with v2 is
9:30
that we rewrote kind of the core
9:32
of our code in Rust. And so
9:35
Rust is much faster. And so depending
9:37
on what kind of code you're running, it can
9:40
be, you know, anywhere from two to 20
9:42
times faster in certain cases. So right
9:45
now we still have
9:47
this Python wrapper around everything in
9:49
v2. But then, and
9:51
that's kind of used to define schemas
9:53
for models and that sort of thing.
9:56
And then the actual validation and serialization
9:58
logic occurs in
10:00
Pydantic Core, in Rust. Right.
10:02
So I think the team did a really
10:04
good job to make this major
10:07
change, this major rewrite and split
10:09
the whole monolithic thing into a
10:11
Pydantic Core, and Pydantic itself,
10:13
which is Python-based in a way
10:16
that didn't break too many projects,
10:18
right? Yeah, that was the goal. You
10:20
know, every now and then there are breaking changes
10:23
that I think are generally
10:25
a good thing for the library moving forward,
10:27
right? Like hopefully whenever we make a
10:29
breaking change, it's because it's leading to
10:31
a significant improvement. But we definitely do
10:33
our best to avoid breaking changes
10:35
and certainly someday we'll launch
10:37
a V3 and hopefully that'll be
10:39
an even more seamless transition for
10:41
V2 users to V3 users. Yeah,
10:45
I would imagine that the switch
10:47
to Rust probably, that big rewrite,
10:49
it probably caused a lot of
10:51
thoughts of reconsidering, how are we
10:53
doing this? Or now that
10:55
it's over in Rust, maybe it doesn't make sense this way or
10:57
whatever. Yeah, and I think just kind of,
11:00
you know, we got a lot of feedback and usage
11:02
of Pydantic V1, so we tried to
11:04
do our best to incorporate all that feedback into
11:06
a better V2 version in terms of both APIs
11:08
and performance and that sort of thing. Sure,
11:10
sure. John out in the audience asks, how
11:13
did the team approach thread safety
11:15
with this? So Rust can be
11:18
multi-threaded easily. Python, not
11:20
so much really, although maybe soon with
11:22
free-threaded Python. Yeah, that's a good
11:25
question. So our kind of
11:27
Rust guru on the team is David Hewitt,
11:29
and he's very in the know about
11:32
all of the multi-threading and things happening on the
11:34
Rust side of things. I myself have
11:36
some more to learn about that certainly. But
11:38
I think in general kind of our approach is that Rust
11:41
is quite type safe, both performant
11:43
and type safe, which is great and memory
11:45
safe as well. And
11:48
I think most of our, I'll talk a little
11:50
bit later about some parallelization
11:53
and vectorization that we're
11:55
looking at for performance improvements. But in terms
11:57
of safety, I think if you have any questions, feel free
11:59
to ask. free to open an issue on the
12:01
Pydantic core repo and we get a conversation
12:03
going with David Hewitt. I would
12:05
imagine it's not that you guys haven't had
12:08
to do too much with it. Just that
12:10
Python currently, but soon, but currently doesn't really
12:13
let you do much true
12:15
multi-threading because of the GIL.
12:18
But the whole, I think, you know, Python 3.13 is going
12:20
to be crazy with free threaded Python and
12:24
it's going to be interesting to see how that evolves.
12:28
I know we definitely do some jumping through hoops and
12:31
just having to be really conscious of stuff
12:33
with the GIL in Pydantic Core and
12:36
PyO3, and PyO3 is kind of the
12:38
library that bridges Python
12:40
and Rust and so it's heavily used in Pydantic core
12:42
as you can imagine. So I'm excited
12:44
to see what changes might look like there. Yeah,
12:46
same. All right, well, let's jump into the performance
12:48
because you're here to tell us all about Pydantic
12:50
performance tips and you got a whole bunch of
12:53
these. Did you give this talk at PyCon? I
12:55
did partially. It's a little bit different, but some
12:57
of the tips are the same. I
12:59
don't think the videos are out yet, are they? As
13:01
the time of recording on June 13th. Yeah,
13:04
no, I actually checked a couple of minutes ago. I
13:06
was like, I said one thing during my talk that I
13:08
want to double check, but the videos are not out yet. So
13:11
no, I'm really excited. There's going to be a bunch.
13:13
There was actually a bunch of good talks, including yours
13:15
and some others. I want to watch, but they're not
13:17
out yet. All right. Let's
13:20
jump into Pydantic performance. Where should we where should
13:22
we start? I can start on
13:24
the slideshow if we want. Yeah, let's do that.
13:26
Awesome. So yeah, I think the
13:28
categories of performance tips that we're going to talk
13:30
about here have some fast
13:32
one-liner type performance tips that you
13:35
can implement in your own code.
13:38
And then the meat of the how
13:40
do I improve performance in my
13:43
application that uses Pydantic, we're going to talk a bit about
13:46
discriminated unions, also called
13:48
tagged unions. And then finally
13:50
talk about on our end
13:52
of the development, how are we continuously
13:54
improving performance, you know, Pydantic internals,
13:57
wise, etc. Sure. Do you have something like
14:00
the equivalent of unit tests
14:02
for performance? Yeah, we do. We
14:05
use a library called CodSpeed that I'm
14:07
excited to touch on a bit more later.
14:09
Yeah, all right, let's talk about that later.
14:11
Perfect. Yeah, sure thing. So
14:14
I have this slide up right now just kind
14:16
of talking about why people use Pydantic. We've already
14:18
covered some of these, but just kind of as
14:20
a general recap, it's powered by Type
14:23
Hints and one of our biggest promises is speed.
14:26
We also have these other great
14:28
features like JSON schema compatibility and
14:30
documentation. Comes in particularly handy when
14:32
we talk about APIs, support
14:35
for custom validation and serialization logic.
14:37
And then as we saw with
14:39
the GitHub repository observations, a very
14:42
robust ecosystem of libraries and
14:44
other tools that use and depend on
14:46
Pydantic that leads to this
14:48
kind of extensive and large community, which is really
14:50
great. But this all
14:52
kind of lies on the foundation of like,
14:54
Pydantic is easy to use and it's very
14:56
fast. Yeah,
14:59
well, the speed is really interesting
15:01
in the multiplier that you all have
15:03
for basically a huge swath of the
15:06
Python ecosystem, right? We just saw the
15:08
412,000 things that depend on Pydantic. Well,
15:12
a lot of those, their performance
15:15
depends on Pydantic's performance as well.
15:17
Right? Yeah. Certainly. Yeah,
15:19
it's nice to have such a large ecosystem of
15:21
folks to also contribute to the library
15:24
as well, right? Like, because other people
15:26
are dependent on our performance, the
15:28
community definitely becomes invested in it as well, which is
15:30
great. This
15:33
portion of TalkPython is brought to
15:35
you by Open Telemetry Support at
15:37
Sentry. In the
15:39
previous two episodes, you heard how we
15:41
use Sentry's error monitoring at TalkPython and
15:44
how distributed tracing connects errors,
15:46
performance and slowdowns and more
15:48
across services and tiers. But
15:50
you may be thinking, our company uses Open
15:53
Telemetry. So it doesn't make sense for
15:55
us to switch to Sentry. After
15:57
all, OpenTelemetry is a standard you've already
16:00
adopted, right? Did you
16:02
know with just a couple of lines
16:04
of code, you can connect OpenTelemetry's monitoring
16:07
and reporting to Sentry's backend? OpenTelemetry
16:09
does not come with a backend to store
16:12
your data, analytics on top of that data,
16:14
a UI or error monitoring. And
16:16
that's exactly what you get when
16:19
you integrate Sentry with your OpenTelemetry
16:21
setup. Don't fly blind,
16:23
fix and monitor code faster with
16:25
Sentry. Integrate your OpenTelemetry systems
16:27
with Sentry and see what you've been
16:29
missing. Create your Sentry account
16:31
at talkpython.fm slash
16:34
sentry-telemetry. And when
16:36
you sign up, use the code talkpython,
16:38
all caps, no spaces. It's good for
16:40
two free months of Sentry's business plan,
16:42
which will give you 20 times as
16:44
many monthly events as well as other
16:46
features. My thanks to Sentry for
16:48
supporting TalkPython to me. But
16:51
yeah, so kind of as that first category, you
16:53
can chat about some basic performance tips. And I'll
16:55
do my best here to kind of describe this
16:58
generally for listeners who maybe aren't able to see
17:00
the screen. So when you are validating- Can
17:02
we share your slideshow later with the audience?
17:04
Can we put it in the show notes?
17:06
Yeah, yeah, absolutely. Okay, so people wanna go
17:08
back and check it out. But yeah, we'll
17:10
describe it for everyone. Go ahead. Yeah,
17:13
so when you're validating data in
17:15
Pydantic, you can either validate
17:17
Python objects or like dictionary
17:20
type data, or you
17:22
can validate JSON formatted data. And
17:24
so one of these kind of like
17:26
one-liner tips that we have is to
17:28
use our built-in model
17:31
validate JSON method instead of
17:33
calling this our model
17:35
validate method and then separately loading the
17:37
JSON data with the standard lib JSON
17:39
package. And the reason that
17:42
we recommend that is one of the
17:44
like crux of the general performance patterns
17:46
that we try to follow is not
17:48
materializing things in Python when we don't have
17:50
to. So we've already mentioned that our core is written
17:53
in Rust, which is much faster than Python. And
17:55
so with our model validate JSON built-in method,
17:59
whenever you pass in that string, we send it right to Rust.
18:01
Whereas if you do the JSON loading by
18:03
yourself, you're gonna materialize Python object and then
18:05
have to send it over.
18:08
Right, and so you're gonna be using
18:10
the built-in json.loads, which will then,
18:12
or load or whatever, and then
18:14
it'll pull that in, turn it into a Python
18:17
dictionary, then you take it and try to convert
18:19
that back to a Rust data
18:21
structure, and then validate it in Rust, and
18:23
that's where all the validation happens anyway. So
18:26
just get out of the way, right? Exactly,
18:28
yep. It's like skip the Python step if
18:30
you can, right? And I will note there is
18:32
one exception here, which is I mentioned we support
18:35
custom validation. If you're using
18:37
what we call like before and wrap
18:39
validators that do something in Python and
18:41
then call our internal
18:44
validation logic, and then maybe even do something
18:46
after, it's okay. You can use model
18:49
validate and the built-in json.loads because
18:51
you're already kind of guaranteed to be
18:53
materializing Python objects in that case. But
18:55
for the vast majority of cases, it's
18:57
great to just go with the built-in
18:59
model validate JSON. Yeah, that's really good
19:01
advice. And they seem kind of equivalent,
19:03
but once you know the internals, right,
19:05
then it's, well, maybe it's not exactly.
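A rough sketch of the two paths being compared here; the User model is made up purely for illustration:

```python
import json
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str


raw = '{"id": 1, "name": "Sydney"}'

# Slower path: json.loads materializes a Python dict first, then hands it to Rust.
user_a = User.model_validate(json.loads(raw))

# Faster path: the raw JSON string goes straight to pydantic-core (Rust) for parsing and validation.
user_b = User.model_validate_json(raw)

assert user_a == user_b
```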
19:07
Yeah, and I think implementing some of these tips
19:10
is helpful in that if you understand some of the
19:12
kind of like pydantic architectural context, it
19:15
can also just help you think more about
19:17
like, how can I write my pydantic code
19:19
better? Absolutely. So the next tip I have
19:21
here, very easy one-liner
19:23
fix, which is when you're
19:25
using type adapter, to
19:29
basically validate one type. So
19:32
we have base models, which we've chatted about before, which
19:34
is like, if you have a model with lots of
19:36
fields, that's kind of the structure you use to define
19:39
it. Well, type adapter is great if you're like, I
19:41
just want to validate that this data is a list
19:43
of integers, for example, as we're seeing on the screen.
19:46
Right, because let me give people an idea. Like
19:48
if you accept, if you've got a JSON, well,
19:51
just JSON data from wherever, but a lot of times
19:54
it's coming over an API that's provided you as
19:56
a file and it's not your data you control, right?
19:59
You're trying to validate it. you could get
20:01
a dictionary JSON object that's got
20:03
curly braces with a bunch of stuff, in
20:05
which case that's easy to map to a
20:08
class, but if you just have JSON which
20:10
is bracket, thing, thing, thing, thing, closed bracket,
20:12
well, how do you have a class that
20:14
represents a list? It gets
20:16
really tricky, right, to be able to understand,
20:19
you can't model that with classes, and so
20:21
you all have this type adapter thing, right?
20:24
That's the role it plays generally, is that right?
20:26
Yeah, and I think it's also
20:28
really helpful in a testing context,
20:30
like when we wanna check that
20:32
our validation behavior is right for
20:34
one type, there's no reason to
20:36
go build an entire model, if
20:38
you're really just validating against one
20:40
type or structure, type adapter is
20:42
great. And so kind of the
20:45
advice here is you only want to initialize
20:48
your type adapter object once, and
20:51
the reason behind that is we build a
20:53
core schema in Python and
20:55
then attach that to a class or
20:57
type adapter, et cetera. And so if
20:59
you can not build that type adapter
21:02
within your loop, but instead do
21:04
it right before, or not build it in
21:06
your function, but instead outside of it, then
21:09
you can avoid building the core schema over
21:11
and over again. Yeah, so basically what
21:13
you're saying is that the type adapter
21:15
that you create might as well be
21:17
a singleton because it's stateless, right? Like
21:19
it doesn't store any data, kind
21:22
of slightly expensive to create relatively.
21:24
And so if you had a function that was called
21:27
over and over again, and that function had a loop,
21:29
and inside the loop you're creating the type adapter, that'd
21:31
be like worst case scenario almost, right? Yeah,
21:33
exactly, and I think this kind of goes
21:35
along with like general best programming tips, right?
21:37
Which is like, if you only need to
21:39
create something once, do that once, and
21:41
then- Exactly, a parallel
21:44
that maybe goes way, way back in
21:46
time could be like a compiled regular
21:48
expression. You wouldn't
21:51
do that over and over in a loop, you
21:53
would just create a regular, the compiled regular expression,
21:55
and then use it throughout your program, right? Because
21:57
it's kind of expensive to do that, but it's
21:59
faster once it's created. Yeah, exactly. And funny that
22:01
you mentioned that I actually fixed a
22:04
bug last week where we were compiling
22:06
regular expressions twice when folks
22:09
like specified that as a constraint on a
22:11
field. So definitely just something to keep
22:13
in mind and easy to fix or
22:15
implement with type adapters here. Yeah, awesome.
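A sketch of that pattern, assuming you're validating the same simple type in a hot path (the names here are illustrative):

```python
from pydantic import TypeAdapter

# Build the adapter, and therefore its core schema, once at module level...
INT_LIST_ADAPTER = TypeAdapter(list[int])


def parse_rows(rows: list[str]) -> list[list[int]]:
    # ...and reuse it inside loops and functions instead of re-creating it on every call.
    return [INT_LIST_ADAPTER.validate_json(row) for row in rows]


parse_rows(['[1, 2, 3]', '[4, 5, 6]'])
```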
22:17
Okay, I like this one. That's a
22:19
good one. Yeah. So this next tip
22:22
also kind of goes along with like general best
22:24
practices, but the more specific you
22:26
can be with your type hints, the better.
22:28
And so specifically, if you know
22:30
that you have a list of integers,
22:33
it's better and more efficient to specify a
22:35
type hint as a list of integers instead
22:37
of a sequence of integers, for
22:40
example. Or if you know you have
22:42
a dictionary that maps strings to integers,
22:44
specify that type hint as a dictionary,
22:47
not a mapping. Interesting. Yeah, so you
22:49
could import Sequence from the typing
22:51
module, just the generic way. But I
22:53
guess you probably have specific code that
22:56
runs that can validate lists more efficiently
22:58
than a general iterable type of thing,
23:00
right? Yeah, exactly. So in the case
23:02
of like a sequence versus a list, it's
23:05
the like square and rectangle thing, right? Like a list
23:08
is a sequence, but there are lots of other types
23:10
of sequences. And so you can imagine for
23:13
a sequence, we like have to check lots of other
23:15
things. Whereas if you know with certainty, this is going
23:17
to be a list or it should be a list,
23:20
then you can have things be more efficient
23:22
with specificity there. Does it make
23:24
any difference at all? Whether you
23:26
use the more modern type specifications,
23:29
like traditionally people would say from
23:31
typing import capital L list, but
23:33
now you can just say lowercase
23:35
L list with the built-in and
23:37
no import statement. Are
23:39
those equivalent or is there some minor difference
23:41
there, do you know? Yeah, that's a
23:43
good question. I wouldn't be surprised if there
23:46
was a minor difference that was
23:48
more a consequence of like Python
23:50
version, right? Because there's like, I
23:52
mean, I suppose you could import the old capital L
23:54
list in a newer Python version, but I think
23:57
the difference is like more related to the specificity
23:59
of a type. Rather than have
24:01
like versioning. Yeah, yeah. If
24:03
the use of that capital L
24:05
List made you write an import
24:07
statement, I mean, it would cause
24:09
the program to start ever
24:11
so slightly slower since there's another
24:13
import, whereas with the lowercase one it
24:15
already knows, it's already imported. Who
24:17
knows? You wouldn't believe how many
24:19
times I get messages on YouTube
24:21
videos I've done, or even
24:23
from courses, saying, Michael, I don't
24:25
know what you're doing, but your
24:27
code is just wrong. I wrote
24:29
lowercase l list bracket something, and it
24:31
said list is not subscriptable
24:33
or something like that, and you look,
24:35
they've just done it wrong. Or you go in and
24:37
the fix is like, oh, you're
24:40
on 3.7 or something
24:42
super old. All these new features
24:45
are added, but I think somewhere
24:47
in the community we haven't communicated it as
24:49
well. I don't know for sure. I
24:51
was writing some code earlier today in
24:54
a meeting, and I used, like,
24:56
from typing import Union, and Union
24:58
of X and Y as the type. And
25:00
people look at that and are like,
25:02
what are you doing? You should use
25:04
X pipe Y. Which is exactly right, but
25:07
the thing is, that was introduced in 3.10.
25:09
Even if people are on 3.9,
25:11
that code doesn't run, or if they're not
25:13
familiar with the changes, it's confusing.
25:15
There are all these trade-offs. I almost feel like
25:17
it would be amazing to go back,
25:20
any time there's a security release
25:22
for, say, another 3.7 or something,
25:24
and change the error message to say this
25:26
feature only works in a future version of
25:28
Python rather than some arbitrary error.
25:30
I know, that would be great. Yeah,
25:32
definitely. Yes, some of those errors can be
25:35
pretty cryptic with the syntax stuff.
25:37
So the takeaway is, be specific: list
25:39
or tuple, not Sequence, if you know it's
25:41
a list or a tuple or whatever.
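A sketch of the contrast being described; both forms validate fine, but the concrete types give Pydantic less to check (the model names are made up):

```python
from collections.abc import Mapping, Sequence
from pydantic import BaseModel


class Looser(BaseModel):
    values: Sequence[int]       # generic: more cases for the validator to consider
    scores: Mapping[str, int]


class Tighter(BaseModel):
    values: list[int]           # concrete: validated on a more specific, faster path
    scores: dict[str, int]
```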
25:43
And on to my last minor tip,
25:45
which, great that you brought up
25:48
import statements, since Pydantic does add general
25:50
import time to a program. I don't have
25:52
a slide for this, but if we
25:54
go back to that type adapter slide where
25:56
we talked about the fact that initializing
25:59
this type adapter builds a core
26:02
schema and attaches it to that class. And
26:04
that's kind of done at build
26:06
time, at import time. So that's
26:08
already done. And
26:11
if you really don't want to
26:14
have that import or build time take a
26:16
long time, you can use the defer build
26:19
flag. And so what that does is it defers
26:22
the core schema build until the first validation
26:24
call. You can also set that on model
26:26
config and things like that. But basically,
26:28
the idea here is striving to
26:30
be lazier. Like if we
26:32
don't need to build this core schema right
26:35
at import time because we want our program
26:37
to start up quickly, that's great. We might have
26:39
a little bit of a delay on the first
26:41
validation, but maybe startup time is more important. So
26:44
that's a little bit more of a
26:46
preferential validation, sorry, preferential performance tip, but
26:49
available for folks who need it. Yeah, let me give
26:51
you an example, give people an example of where I
26:53
think this might be useful. In
26:55
the talk Python training, the courses site, I
26:58
think we've got 20,000 lines of Python code,
27:00
which is probably more at this point. I
27:02
checked a long time ago, but a lot.
27:04
And it's a package. And so when you
27:06
import it, it goes and imports all the
27:08
stuff to run the whole web app, but
27:10
also little utilities like, oh, I just want
27:13
to get a quick report. I want to
27:15
just access this model and then use it
27:17
on something real quick. It imports
27:19
all that stuff so that app startup would
27:21
be potentially slowed down by this. Where if
27:24
you know, like only sometimes is that type
27:26
adapter used, you don't want to necessarily have
27:28
it completely created until that function gets called.
27:30
So then the first function call might be
27:32
a little slow, but there'd be plenty of
27:34
times where maybe it never gets called, right?
27:36
Yep, exactly. Awesome, okay.
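As mentioned, the flag can also be set via model config; a minimal sketch of that form (the model itself is just an example):

```python
from pydantic import BaseModel, ConfigDict


class ReportRow(BaseModel):
    # Skip building the core schema at import time; it's built lazily on the first
    # validation call instead. Startup gets faster; the first use pays the one-time cost.
    model_config = ConfigDict(defer_build=True)

    user_id: int
    totals: dict[str, float]
```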
27:38
All right, so kind of a
27:40
more complex performance optimization is
27:43
using tagged unions. They're still pretty
27:45
simple. It's just like a little bit more than
27:47
a one line change. So
27:49
kind of talking about tagged unions, we
27:52
can go through a basic example, why we're using tagged unions
27:54
in the first place, and then some more
27:56
advanced examples. Okay. This
28:00
episode of Talk Python To Me is brought to you by
28:02
Code Comments, an original podcast from Red Hat. You
28:05
know when you're working on a project and
28:07
you leave behind a small comment in the
28:09
code? Maybe you're hoping to help others learn
28:11
what isn't clear at first. Sometimes
28:14
that code comment tells a story of a
28:16
challenging journey to the current state of the
28:18
project. Code Comments, the
28:20
podcast, features technologists who've been
28:22
through tough tech transitions and
28:25
they share how their teams survived that
28:27
journey. The host, Jamie Parker, is a
28:30
Redhatter and an experienced engineer. In
28:32
each episode, Jamie recounts the stories
28:34
of technologists from across the industry
28:37
who've been on a journey implementing
28:39
new technologies. I recently listened to
28:41
an episode about DevOps from the
28:43
folks at Worldwide Technology. The
28:45
hardest challenge turned out to be getting buy-in
28:48
on the new tech stack rather than using
28:50
that tech stack directly. It's
28:52
a message that we can all relate to and I'm
28:54
sure you can take some hard-won lessons back to your
28:56
own team. Give Code Comments a
28:59
listen. Search for Code Comments
29:01
in your podcast player or just use
29:03
our link, talkpython.fm slash code
29:05
dash comments. The link is
29:07
in your podcast player's show notes. Thank
29:10
you to Code Comments and Red Hat for supporting
29:12
Talk Python To Me. Let's
29:14
start with what are tagged unions, because I honestly have no
29:16
idea. I know what unions are but tagging them, I
29:18
don't know. Yeah, sure thing. So
29:21
tagged unions are a special type of union.
29:24
We also call them discriminated unions. They
29:27
help you specify a member
29:29
of a model that you can use for
29:32
discrimination in your validation. What that
29:34
means is if you have two models that
29:36
are pretty similar and your field
29:39
can be either one
29:41
of those types of models, model X or model Y,
29:44
but you know that there's one tag
29:46
or discriminator field that differs, you
29:49
can specifically validate against that field
29:51
and skip some of the other
29:53
validation. So like I'll
29:55
move on to an example here in a
29:57
second, but basically it helps you validate more
30:00
efficiently because you get to skip validation of
30:02
some fields. So it's really helpful if you have models that
30:04
have like 100 fields, but one of
30:06
them is really indicative of what type it might be.
30:09
I see. So instead of trying to figure out like,
30:11
is it all of this stuff once you know it
30:14
has this aspect or that aspect, then you can
30:16
sort of branch it on a path and just
30:18
treat it as one of the elements of the
30:20
union. Is that right? Yes, exactly. So
30:23
one other note about discriminated
30:25
unions is you specify this discriminator, and
30:27
it can either be a string like
30:29
literal type or a callable type. And
30:31
we'll look at some examples of those. So here's
30:33
kind of a more concrete example so we can
30:35
really better understand this. So
30:38
let's say we have a, this is the
30:40
classic example, right? A cat model and a dog
30:43
model. And they
30:45
both have- Ah, cat people or dog people. You're going to start a
30:47
debate here. Exactly, exactly. They both
30:49
have this pet type field. And
30:52
for the cat model, it's a literal
30:54
that is just the string cat. And then for
30:56
the dog model, it's the literal that's the string
30:58
dog. So it's just kind of a flag on
31:00
a model to indicate what type it is. And
31:04
you can imagine, in this basic case, we only
31:06
have a couple of fields attached to each model,
31:08
but maybe this is like data
31:10
in, like, a database. And
31:12
so you can imagine like there's going to be tons
31:15
of fields attached to this, right? So it'd be
31:17
pretty helpful to just be able to look at
31:19
it and say, oh, the pet type is dog. Let's
31:21
make sure this data is valid for a dog
31:23
type. And I'll also note we have a lizard
31:25
in here. So
31:27
what this looks like in
31:29
terms of validation with Pydantic then is
31:32
that when we specify this pet field,
31:34
we just add one extra setting,
31:37
which says that the discriminator is that pet
31:39
type field. And so then when we pass
31:41
in data that corresponds to a dog model,
31:45
Pydantic is smart enough to say, oh, this is a discriminated
31:47
union field. Let me go look for the
31:49
pet type field on
31:51
the model and just see what that is.
31:54
And then use that to inform my decision
31:56
for what type I should validate against. OK,
31:58
that's awesome. So if we
32:01
don't set the discriminator keyword
32:03
value in the field
32:05
for the union, it'll still work, right? It
32:08
just has to be more exhaustive and slow. Yeah,
32:11
exactly. So it'll still validate
32:13
and it'll say, hey, let's take this input data
32:15
and try to validate it against the cat model.
32:18
And then Pydantic will come back and say, oh, that's not
32:20
a valid cat. Like let's try the next one. Whereas
32:23
with this discriminated pattern, we can skip
32:25
right to the dog, which
32:27
you can imagine helps us skip some of the
32:29
validation stuff. Yeah, absolutely. Okay, that's really cool.
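A sketch of the cat/dog example being discussed, with the discriminator set on the union field (field names follow the standard docs example):

```python
from typing import Literal, Union
from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal['cat']
    meows: int


class Dog(BaseModel):
    pet_type: Literal['dog']
    barks: float


class Owner(BaseModel):
    # Pydantic looks at pet_type first and only validates against that one branch.
    pet: Union[Cat, Dog] = Field(discriminator='pet_type')


Owner.model_validate({'pet': {'pet_type': 'dog', 'barks': 3.1}})
```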
32:31
I had no idea about this. Yeah, yeah. It's
32:34
a cool, I'd say like moderate level feature.
32:36
Like I think if you're just starting to
32:38
use Pydantic, you probably haven't touched discriminated unions
32:41
much, but we hope that it's simple enough to
32:43
implement that most folks can use it if they're
32:45
using unions. Yeah, that's cool. I don't
32:47
use unions very often, which is probably why other
32:50
than, you know, something pipe none, which is, you
32:52
know, like optional, but yeah, if
32:54
I did, I'll definitely remember this.
32:57
Yeah. Alrighty. So
32:59
as I've mentioned, this helps for more efficient
33:01
validation. And then where this really comes
33:04
and has a lot of values when you are dealing with
33:07
lots of nested models or models that have tons
33:09
of fields. So let's say you have a union
33:11
with like 10 members and each member
33:13
of the union has 100 fields.
33:15
If you can just do validation against 100 fields instead
33:18
of 1000, that would be great in
33:20
terms of a performance gain. And
33:23
then once again with nested models, you know, if you
33:25
can skip lots of those union member
33:27
validations, also going to boost your performance. Yeah,
33:29
for sure. You know, an example where this
33:32
seems very likely would be using
33:34
it with beanie or some other document
33:37
database where the modeling structure is
33:39
very hierarchical. You end up with
33:41
a lot of nested sub-Pydantic models
33:44
in there. Yeah, very
33:46
much so. Cool. So
33:48
as a little bit of an added benefit, we
33:50
can talk about kind of this improved error handling,
33:53
which is a great way to kind of visualize
33:55
why the discriminated union pattern is more
33:57
efficient. So right now we're looking at
33:59
an example. of validation against a
34:01
model that doesn't use a discriminated
34:04
union and the errors are not very
34:06
nice to look at. You basically
34:08
see the errors
34:10
for every single permutation of the different values
34:12
and we're using nested models so it's very
34:15
hard to interpret. So we don't have to
34:17
look at this for too long, it's not
34:20
very nice. But if we look at... But
34:22
basically the error message says, look there's something wrong
34:24
with the union. If it was a string, it
34:27
is missing these things. If it was this kind
34:29
of thing, it misses those things. If it was
34:31
a dog, it misses this. If it's a cat,
34:34
it misses that. It doesn't
34:36
specifically tell you. It's
34:38
a dog so it's missing the color
34:41
size or whatever, right? Right, exactly.
34:44
But then, and I'll go back and kind of explain the
34:47
discriminated model for this case in a second, but
34:49
if you look at this is the model with
34:51
the discriminated union instead, we have
34:54
one very nice error that says,
34:56
okay, you're trying to validate this
34:58
x field and it's the wrong
35:01
type, right? So
35:04
yeah, the first example that we were looking at
35:06
was using string type discriminators. So we just had
35:09
this pet type thing that said, oh, this is
35:11
a cat or this is a dog, that sort
35:13
of thing. We also offer
35:15
some more customization
35:17
in terms of we also allow
35:20
callable discriminators. So in
35:22
this case, this field
35:24
can be either a string or
35:26
this instance of
35:28
discriminated model. So it's
35:30
kind of a recursive pattern, right? And that's
35:32
where you can imagine the nested structures
35:35
becoming very complex very easily.
35:37
And we use this kind
35:39
of callable to differentiate
35:41
between which model we should
35:43
validate against and then we tag each of the
35:46
cases. So a little bit more of a
35:48
complex application here, but once again, when
35:50
you kind of see the benefit in
35:52
terms of errors and interpreting things and
35:54
performance, I think it's generally
35:56
a worthwhile investment. That's cool. So
35:59
if you wanted to... something like a
36:01
composite key equivalent of a
36:03
discriminator, right? Like if
36:05
it has this field and its nested
36:07
model is of this type, it's one
36:09
thing versus another. Like a
36:11
free user versus a paid user. You might have
36:13
to look and see their total lifetime value plus
36:16
that they're a registered user.
36:18
I don't know, something like, you could write
36:20
code that would pull that information out and
36:22
then discriminate which thing to validate against, right?
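A hedged sketch of what the callable form might look like for the free-versus-paid idea Michael describes, using Pydantic's Discriminator and Tag helpers; the field names and the threshold are invented for illustration:

```python
from typing import Annotated, Any, Union
from pydantic import BaseModel, Discriminator, Tag


class FreeUser(BaseModel):
    lifetime_value: float = 0.0


class PaidUser(BaseModel):
    lifetime_value: float
    registered: bool


def user_kind(value: Any) -> str:
    # Works for raw dicts during validation and for model instances during serialization.
    if isinstance(value, dict):
        return 'paid' if value.get('lifetime_value', 0) > 0 else 'free'
    return 'paid' if getattr(value, 'lifetime_value', 0) > 0 else 'free'


class Account(BaseModel):
    user: Annotated[
        Union[
            Annotated[FreeUser, Tag('free')],
            Annotated[PaidUser, Tag('paid')],
        ],
        Discriminator(user_kind),
    ]


Account.model_validate({'user': {'lifetime_value': 49.0, 'registered': True}})
```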
36:24
Yeah, exactly. Yeah, it definitely comes
36:27
in handy when you have like, and you're like,
36:29
okay, well, I still want the performance benefits of
36:31
a discriminated union, but I kind of have
36:33
three fields on each model that are indicative
36:35
of which one I should validate against, right?
36:37
Yeah. And it's like, well, you know, taking
36:39
the time to look at those three fields
36:41
over the hundred is definitely worth it. Just
36:44
a little bit of complexity for the
36:46
developer. Mm-hmm, yeah, cool. One other note
36:49
here is that discriminated union. Can we go
36:51
back really quick? Yeah, yeah. To that
36:53
previous one? So I got a quick question.
36:55
So for this, you write a function. It's
36:57
given the value that comes in, which could
37:00
be a string, it could be a dictionary,
37:02
et cetera. Could you do
37:04
a little bit further performance improvements and add
37:06
like a functools lru_cache to cache
37:09
the output? So every time it sees the
37:11
same thing, if there's a repeated data through
37:13
your validation, it goes, I already know what
37:15
it is. What do you think? Yeah, yeah.
37:17
I do think that would be possible. That's definitely
37:20
an optimization we should try out and put
37:22
in our docs for like the advanced, advanced
37:24
performance tips. Yeah, because if you've got a
37:26
thousand strings and
37:28
then that word like it's maybe
37:31
male, female, male, female, male, female, like that
37:34
kind of where the data is repeated a
37:36
bunch, then it
37:38
could just go, yep, we already know that
37:40
answer. Yeah. Potentially, I don't
37:42
know. Yeah, no, definitely. And I will
37:44
say, I don't know if it takes
37:46
effect. I don't think it takes effect
37:48
with discriminated unions because this logic is
37:51
kind of in Python, but I
37:53
will say we recently added a like
37:55
string caching setting because we have
37:57
kind of our own JSON parsing
37:59
logic that we use in Pydantic Core. And
38:02
so we added a string caching setting so that
38:04
you don't have to rebuild the exact same strings
38:06
every time. So that's a
38:08
nice performance. Yeah, nice. Caching's awesome, until
38:10
it's not. Yeah, exactly.
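To be clear, the LRU-cache idea floated above is not something Pydantic does for you; a speculative sketch of how you might memoize a callable discriminator's answer for repeated, hashable inputs:

```python
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=None)
def _tag_for(pet_type: str) -> str:
    # Imagine something more expensive here; repeated strings hit the cache.
    return 'cat' if pet_type == 'cat' else 'dog'


def pet_discriminator(value: Any) -> str:
    raw = value.get('pet_type') if isinstance(value, dict) else getattr(value, 'pet_type', '')
    # lru_cache needs hashable arguments, so we cache on the extracted string, not the dict.
    return _tag_for(str(raw))
```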
38:14
So one quick note here
38:16
is just that discriminated unions are still
38:18
JSON schema compatible, which is awesome for
38:20
the case where you're, once again, defining
38:22
API requests and responses. You wanna still
38:24
have valid JSON schema coming out of
38:26
your models. Yeah, very cool. And then
38:28
it might show up in things like open
38:30
API documentation and stuff like
38:32
that, right? Yep, exactly. So
38:35
I'll kind of skip over this. We already touched on the
38:38
callable discriminators. And then I'll
38:40
leave these slides up
38:42
here as a reference. Again, I don't think this is
38:44
worth touching in too much detail, but just
38:47
kind of another comment about if you've got
38:50
nested models, that still
38:52
works well with discriminated unions. So we're still on
38:54
the pet example, but let's say this
38:56
time you have a white cat and
38:58
a black cat model, and then
39:00
you also have your existing
39:02
dog model. You can still create a
39:04
union of, your
39:07
cat union is a union of black cat
39:09
and white cat, and then you can union that
39:11
with the dogs and it still works. And
39:13
once again, you can kind of imagine the
39:16
exponential blow up that would occur if you
39:18
didn't use some sort of discriminator here in
39:20
terms of errors. Yeah, very
39:22
interesting. Okay, cool. Yeah, so
39:24
that's kind of all in terms
39:26
of my recommendations for discriminated
39:29
union application. I would encourage folks who
39:31
are interested in this to check out our
39:33
documentation. It's pretty thorough in that regard. And I
39:35
think we also have those links attached to the
39:38
podcast. Yeah, definitely. And then performance
39:40
improvements in the pipeline. Is this something that
39:42
we can control from the outside? Is this
39:44
something that you all are just adding for
39:46
us in the next version? Yeah, good question. This
39:49
is hopefully, maybe not all in the next version,
39:51
but just kind of things we're keeping our eyes
39:53
on in terms of requested performance
39:55
improvements and ideas that we have. I'll
39:58
go a little bit out of order here. We've been talking a
40:00
bunch about core schema and kind
40:03
of maybe deferring the build of that or
40:06
just trying to optimize that. And that actually happens
40:08
in Python. So one of the
40:10
biggest things that we're trying to do is
40:12
effectively speed up the core schema building process
40:15
so that import times are faster and
40:17
just, you know, Pydantic is more performant in
40:20
general. Well, so one
40:23
thing that I'd like to ask about
40:26
kind of back on the Python side a little bit, suppose
40:29
I've got some really large document,
40:31
right? Really nested document. If you
40:33
have converted some terrible XML thing
40:35
into JSON or I
40:37
don't know, something. And there's a little bit
40:40
of structured schema that I care about. And
40:42
then there's a whole bunch of other stuff
40:44
that I could potentially create nested models to
40:46
go to, but I don't really care about
40:48
validating them. It's just whatever it is, it
40:50
is. What if you
40:52
just said that was a dictionary? Would that
40:54
short circuit a whole bunch of validation and
40:57
stuff that would make it faster potentially? Yeah.
40:59
Could it turn off the validation for a
41:01
subset of the model if it's really big
41:03
and deep and you don't really care for
41:06
that part? Yeah, good question. So we offer
41:08
an annotation called skip validation that
41:10
you can apply to certain types.
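A sketch of that annotation in use; the model and field names are just for illustration:

```python
from typing import Annotated, Any
from pydantic import BaseModel, SkipValidation


class ImportedDocument(BaseModel):
    title: str                                               # validated normally
    raw_payload: Annotated[dict[str, Any], SkipValidation]   # accepted as-is, no deep validation
```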
41:13
So that's kind of one approach. I think
41:15
in the future, it could be nice to offer kind of
41:17
a config setting so that you can more easily
41:19
like list features that you wanna
41:22
skip validation for instead of like applying those on
41:24
a field by field basis. And
41:26
then the other thing is if you only define
41:28
your model in terms of the fields that you
41:30
really care about from that, very
41:32
gigantic amount of data, we
41:35
will just ignore the extra data that you pass in and
41:38
pull out the relevant information. Right,
41:40
okay, yeah, good. Back
41:42
to the pipeline. Yeah, back to the pipeline. So
41:45
another improvement, we talked a little
41:48
bit about potential like
41:50
parallelization of things or vectorization. One
41:52
thing that I'm excited to learn more about
41:54
in the future and that we've started working
41:56
on is this thing called SIMD in jiter,
41:58
and that's our JSON iterable parser library
42:01
that I was talking about. And
42:03
so SIMD stands for Single Instruction Multiple
42:05
Data. Basically means that you can
42:07
do operations faster and
42:10
that's with this kind of vectorization approach. I
42:13
certainly don't claim to be an expert in SIMD, but I
42:15
know that it's improving our validation
42:18
speeds in the Department
42:21
of JSON parsing. So that's something that
42:23
we're hoping to support for a broader
42:26
set of architectures going forward. Yeah, that's
42:28
really cool. Almost like what
42:30
pandas does for Python: instead of a loop
42:32
over in validation and doing something to
42:35
each piece, you just go this whole column,
42:37
multiply it by two. Yep, yep, exactly. I'm
42:39
sure it's not implemented the same, but like
42:42
conceptually the same. Yep, yep, very much so.
42:45
And then the other two things in the
42:47
pipeline that I'm gonna mention are kind of
42:49
related once again to the avoiding materializing things
42:51
in Python if we can. And
42:54
we're even kind of extending that to
42:56
avoiding materializing things in Rust if we
42:58
don't have to. So the first thing is
43:00
when we're parsing JSON in Rust, can we
43:02
just do the validation as we kind of
43:04
chomp through the JSON instead of like materializing
43:06
the JSON as a Rust object
43:08
and then doing all the validation? So like can
43:11
we just do it in one pass? Okay,
43:13
is that almost like generators and
43:15
iterables rather than loading all in a
43:17
memory at once and then processing it
43:19
one at a time? Yeah, exactly.
43:22
And it's kind of like, do
43:25
you build the tree and then walk it three
43:27
times or do you just
43:29
do your operations every time you add something to the tree?
43:31
Yeah. And then the last
43:33
performance improvement in the pipeline that I'll mention is
43:35
this thing called fast model. Has
43:37
not been released yet, hasn't really
43:39
even been significantly developed, but this is
43:42
cool in that it's really approaching that
43:44
kind of laziness concept again. So
43:47
attributes would remain in Rust after
43:49
validation until they're requested. So
43:51
this is kind of along the lines of the
43:53
defer build logic that we were talking about in
43:55
terms of like, we're not gonna send you the
43:57
data or perform the necessary operations until they're
44:00
requested. Right, okay. Yeah, if you don't ever
44:02
access the field then why process all that
44:05
stuff right and convert it into Python objects
44:07
Yeah, exactly. Um, but yeah, we're kind of
44:09
just excited in general to Be
44:12
looking at lots of performance improvements on our
44:14
end even after the big v2 speed-up Still
44:16
have lots of other things to work on
44:18
and improve. Yeah, it sure seems
44:20
like it and If
44:23
this free threaded Python thing takes
44:25
off who knows maybe there's even
44:27
more craziness with parallel processing of
44:30
different branches of the model at different,
44:32
you know alongside each other. Yeah
44:36
So I think this kind of dovetails nicely
44:38
into like you asked earlier Like is there
44:40
a way that we kind of monitor the
44:42
performance improvements that we're making? And
44:45
we're currently using and getting
44:48
started with two tools that are really helpful
44:51
And I can share some PRs if
44:53
that's helpful and send links after but
44:55
one of them is CodSpeed,
44:57
which integrates super nicely
44:59
with CI and
45:02
GitHub, and it basically runs
45:04
tests tagged with this like benchmark
45:06
tag And then
45:08
it'll, you know, run them on main compared to
45:10
on your branch, and then you can see, like,
45:13
oh, this made my code, you know, 30%
45:15
slower, like maybe let's not merge that right away.
45:17
Or conversely,
45:19
you know, there's a 30% improvement
45:22
on some of your benchmarks. It's really nice to kind of
45:24
track and see that. I see. So it looks like it
45:26
sets up, so this is
45:28
CodSpeed.io, right? Yeah. Then it
45:31
sets up as a GitHub Action as
45:33
part of your CI/CD, and, you
45:35
know, probably automatically runs when a PR
45:37
is open and things along those lines,
45:40
right? Yep, exactly. All right, I've never
45:42
heard of this. But yeah, if it
45:44
just does the performance testing for you
45:46
automatically, why not, right? Let it
45:48
do that. Yeah.
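A sketch of what such a tagged benchmark test might look like, assuming the pytest-codspeed plugin is installed and wired into CI; the validated type is just an example:

```python
import pytest
from pydantic import TypeAdapter

INT_LIST = TypeAdapter(list[int])


@pytest.mark.benchmark  # picked up and timed by the CodSpeed runner
def test_validate_int_list():
    assert INT_LIST.validate_json('[1, 2, 3, 4, 5]') == [1, 2, 3, 4, 5]
```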
45:51
I guess another tool that I'll
45:53
mention, while talking
45:55
about kind of our you know
45:57
continuous optimization, if that's a word for it,
46:00
is this tool kind of similarly
46:02
named called CodeFlash. So
46:05
CodeFlash is a new
46:07
tool that uses LLMs to
46:10
kind of read your code and
46:12
then develop potentially more performant versions,
46:15
kind of analyze those in terms of, you know,
46:17
is it passing, is this new
46:19
code passing existing tests? Is it passing
46:22
additional tests that we write? And
46:24
then another great thing that it
46:26
does is open PRs for you with those improvements and
46:29
then explain the improvements. So I
46:31
think it's a really pioneering tool in the
46:33
space and we're excited to kind of
46:36
experiment with it more on our PRs
46:38
and in our repository. Okay,
46:41
I love it. Just tell me why is this, why
46:44
did this slow down? Well, here's why. Yeah,
46:46
exactly. And they offer
46:48
both like local runs of the
46:51
tool and also built-in CI support.
46:54
So those are just kind of two tools that we use and
46:57
are increasingly using to help us kind
47:00
of check our performance as we continue to develop
47:03
and really inspire us to, you know, get
47:05
those green check marks with the
47:07
like performance improved on lots of PRs.
47:09
Yeah, the more you can
47:11
have it where if it passes the automated
47:13
build, it's just ready to go and you
47:15
don't have to worry a little bit and
47:18
keep testing things and then have uncertainty, you
47:20
know that. It's nice, right? Gives you a
47:22
lot of, lets you rest and
47:24
sleep at night. Yeah, most certainly.
47:26
I mean, I said it before,
47:28
but the number of people who
47:31
are impacted by Pydantic, I
47:33
don't know what that number is, but it has to be tremendous because if
47:35
there's 400,000 projects that use it, like
47:38
think of the users of those projects, right? Like
47:40
that multiple has got to be big for, you
47:42
know, I'm sure there's some really popular ones. For
47:44
example, FastAPI, right? Yeah, yeah.
47:47
And it's just nice to know that
47:49
there are other companies
47:51
and tools out there that can help us
47:53
to, you know, really boost the performance benefits
47:55
for all those users, which is great. All
47:58
right, yeah, that is really cool. I think, let's
48:00
talk about one more performance benefit
48:03
for people and not so much in
48:05
how fast your code runs, but in
48:07
how fast you go from raw data
48:10
to Pydantic models. So one
48:13
thing, you probably have seen, we might have
48:15
even spoken about this before, are you familiar
48:17
with JSON to Pydantic? The website? Yeah, it's
48:20
a really cool tool. Yeah, it's such a
48:22
cool tool. And if you've got some really
48:24
complicated data, like let's see, I'll pull up
48:26
some weather data that's in JSON format or
48:28
something, right? If you just take this and
48:31
you throw it in here, just don't even have
48:33
to pretty print it. It'll just go, okay, well,
48:35
it looks like what we've got is this really
48:38
complicated nested model here. And
48:40
it took, we did this while I was talking, it took 10 seconds
48:43
for me clicking the API to get
48:45
a response to having like a pretty
48:47
decent representation here. Yeah,
48:50
it's great in terms of like developer
48:52
agility, especially, right? It's like, oh, I've
48:54
heard of this tool called Pydantic. I've seen it
48:56
in places like, I don't really know if I
48:58
wanna manually go build all these models
49:00
for my super complicated JSON data. It's like,
49:02
boom, three seconds, done for you, basically. Exactly,
49:05
like, is it really worth it? Because
49:08
I don't wanna have to figure this thing out and figure
49:10
out all the types and like, no, just paste it in
49:12
there and see what you get. It won't
49:15
be perfect, right? Some things, if they're null
49:17
in your data, but they could be something that
49:19
would make them an optional element, like they could
49:21
be an integer or they could be null. It
49:23
won't know that it's gonna be an integer, right?
49:26
Right. So you kind of gotta patch it
49:28
up a tiny bit, but in
49:30
general, I think this is really good.
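For a sense of what comes out the other side, here is a hedged sketch of the sort of nested model a tool like JSON to Pydantic might generate for a small weather payload, with one null field patched by hand to Optional; the payload and field names are invented for this example.

# Hedged illustration of the kind of nested model a JSON-to-Pydantic tool
# might emit for a small weather payload; the payload and field names are
# invented. The gust field was null in the sample, so it's patched to
# Optional by hand, as discussed above.
from typing import Optional
from pydantic import BaseModel


class Wind(BaseModel):
    speed: float
    gust: Optional[float] = None  # null in the sample data, patched manually


class Current(BaseModel):
    temp_c: float
    humidity: int
    wind: Wind


class WeatherReport(BaseModel):
    location: str
    current: Current


raw = {
    "location": "Portland, OR",
    "current": {"temp_c": 21.5, "humidity": 40, "wind": {"speed": 3.2, "gust": None}},
}
report = WeatherReport.model_validate(raw)
print(report.current.wind.speed)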
49:32
And then also, just drop in with your favorite
49:34
LLM. I've been using LM
49:36
Studio, which is awesome. Nice,
49:39
I heard you talked about that on one of the
49:41
most recent podcasts, right? Yeah, yeah,
49:43
it's super cool. You can just download Llama 3
49:45
and run it locally with, I
49:48
think my computer can only handle seven billion
49:50
parameter models, but you get pretty good answers.
49:52
And if you give it a piece
49:55
of JSON data and you say, convert that
49:57
to Pydantic, you'll get really good results.
50:00
You have a little more control over than what
50:02
you just get with this tool. But I
50:04
think those two things, while not
50:06
about runtime performance, you know, going from
50:08
I have data till I'm working with
50:11
Pydantic, that's pretty awesome. Yeah, definitely.
50:13
And if any, you know, passionate
50:16
open source contributors are listening and want to
50:18
create like a CLI tool for doing this
50:20
locally, I'm sure that would be very much
50:22
appreciated. I think this is based
50:24
on something that I don't use,
50:26
but I think it's based on the
50:28
datamodel-code-generator, which I
50:30
think might be a CLI tool or
50:33
a library. Let's see. Yes. Oh, yeah,
50:35
very nice. But here's the problem that
50:37
you go and define like a YAML
50:39
file. Like it's just not as easy
50:41
as like there's a text field I
50:43
paste in my stuff, but it
50:45
does technically, technically work, I
50:48
suppose. Yeah, I know.
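For anyone curious what that CLI route looks like, here is a rough sketch that shells out to the datamodel-codegen command from a small Python script, assuming datamodel-code-generator is installed via pip; the file names are invented and the flag spellings should be checked against the tool's --help for your version.

# Rough sketch of driving datamodel-code-generator instead of hand-pasting
# JSON into a website; it shells out to the documented `datamodel-codegen`
# CLI. The file names are made up, and flags should be verified against
# `datamodel-codegen --help` for the version you have installed.
import subprocess

subprocess.run(
    [
        "datamodel-codegen",
        "--input", "weather.json",        # raw JSON sample to infer models from
        "--input-file-type", "json",
        "--output", "weather_models.py",  # generated Pydantic models land here
    ],
    check=True,
)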
50:50
Definitely the LLM approach or just the basic
50:52
website approach are very quick, which is
50:54
nice. Yeah. Speaking of LLMs,
50:56
just really quick. Like I feel like you
50:58
get some of the Python newsletters and other places
51:00
like here's the cool new packages. A lot of
51:03
them are like nine out of 10 of them
51:05
are about LLMs these days. I was like, that
51:07
feels a little over the top to me, but
51:09
I know there's other things going on in the
51:11
world. But you know, just put
51:13
your thoughts on LLMs and coding these days. I
51:15
know you write a lot of code and
51:17
think about it a lot and probably use them somewhere in
51:19
there. Yeah, no, for sure. I'm
51:22
pretty optimistic and excited about it. I
51:25
think there's a lot of good that can
51:27
be done and a lot of productivity boosting
51:29
to be had from integrating
51:31
with these tools, both in your local
51:33
development environment and also just in general.
51:36
I think sometimes it's also great
51:38
in the performance department, right? Like
51:40
we can see with CodeFlash using
51:42
LLMs to help you write more
51:44
performant code can also be really
51:46
useful. And it's been exciting to see
51:49
some libraries really leverage Pydantic as
51:51
well in that space in terms
51:53
of validating LLM outputs or even
51:55
using LLM calls in
51:57
Pydantic validators to validate, you
52:00
know data along constraints that are more
52:02
like language-model friendly. So
52:05
yeah, I'm optimistic about it. I still have a lot to learn
52:07
but It's cool to see the
52:09
variety of applications and kind of where you can
52:11
plug in Pydantic in that process, for fun.
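As a hedged sketch of that idea, here is what an LLM-backed check inside a Pydantic v2 field validator could look like; ask_llm is a hypothetical placeholder for whatever client you use, while BaseModel and field_validator are the real Pydantic APIs.

# Hedged sketch of the "LLM call inside a validator" idea mentioned above.
# `ask_llm` is a hypothetical placeholder for whatever client you use; the
# Pydantic pieces (BaseModel, field_validator) are real v2 APIs.
from pydantic import BaseModel, field_validator


def ask_llm(prompt: str) -> str:
    # Placeholder: call your model of choice here and return its reply.
    return "yes"


class SupportTicket(BaseModel):
    summary: str

    @field_validator("summary")
    @classmethod
    def summary_must_be_polite(cls, value: str) -> str:
        # A "language-model friendly" constraint: hard to express as a regex,
        # easy to ask a model about.
        verdict = ask_llm(f"Answer yes or no: is this text polite? {value!r}")
        if not verdict.lower().startswith("yes"):
            raise ValueError("summary failed the politeness check")
        return value


ticket = SupportTicket(summary="Please help, checkout is failing for some users.")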
52:13
Yeah, I totally agree I right now the
52:16
context window like how much you can give
52:18
it as Information then to start asking questions
52:20
is still a little bit small like you
52:22
can't give it some huge program and say,
52:24
you know find me the bugs where this
52:26
function is called or you know And it's
52:28
like it doesn't quite understand enough all at
52:31
once but that thing
52:33
keeps growing. So eventually someday we'll
52:35
all see. Yeah, all right. Well,
52:37
let's talk just for a minute
52:39
Maybe real quick about what
52:41
you all are doing at Pydantic the company
52:44
rather than Pydantic the open source library. Like
52:46
what do you all got going on there?
52:48
Yeah, sure. So Pydantic,
52:52
the company, has released our
52:54
first commercial tool. It's called Logfire, and
52:56
it's in open beta. So
52:58
it's an observability platform and
53:01
we would really encourage anyone interested to try
53:03
it out. It's super easy to get started
53:05
with, you know, just the basic pip
53:07
install of the SDK
53:10
and then you start using it in your code
53:12
base and then We
53:14
have the kind of Logfire dashboard where you're
53:16
gonna see the observability and
53:18
results. And so we
53:20
kind of adopt this like needle in the haystack
53:22
philosophy, where we want this to
53:24
be a very easy to use observability platform
53:28
that offers very, like, Python-centric
53:30
insights. And it's this
53:32
opinionated wrapper around
53:34
OpenTelemetry, if folks are familiar
53:36
with that. But in kind
53:39
of the context of performance, one of the great
53:41
things about this tool is that it
53:43
offers this like nested logging and profiling
53:45
structure for code. So
53:48
it can be really helpful in kind
53:50
of looking at your code and being like, we don't
53:52
know where this, you know, performance slowdown is occurring, but
53:55
if we integrate with Logfire, we can see
53:57
that, like, very easily in the dashboard. Yeah.
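Here is a minimal sketch of that nested span idea, assuming the Logfire SDK is installed via pip install logfire; the call names reflect the SDK's documented surface at the time of writing and may evolve, and the Order model is invented for illustration.

# Minimal sketch of nested spans and logging with Logfire, assuming the SDK
# is installed (`pip install logfire`); treat the call names as illustrative
# and check the current docs. The Order model is made up for this example.
import logfire
from pydantic import BaseModel

logfire.configure()  # picks up your project credentials / token


class Order(BaseModel):
    id: int
    total: float


with logfire.span("process order batch"):  # outer span
    for raw in [{"id": 1, "total": 9.99}, {"id": 2, "total": 24.50}]:
        with logfire.span("validate order {id}", id=raw["id"]):  # nested span
            order = Order.model_validate(raw)
            logfire.info("validated order", order_id=order.id)

# For web apps there are framework integrations too, e.g. something like
# logfire.instrument_fastapi(app) for FastAPI; check the docs for specifics.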
54:00
Yeah, you also have
54:02
some interesting approaches like specifically
54:04
targeting popular frameworks, like instrument
54:06
FastAPI or something like
54:08
that, right? Yeah, definitely. Trying
54:10
to build integrations that work
54:13
very well with FastAPI, other tools like
54:15
that. And even also
54:17
offering custom features
54:19
in the dashboard. If
54:22
you're using an observability tool, you're probably advanced enough
54:24
to want to add some extra things to
54:26
your dashboard. And we're working on supporting that
54:28
with FastUI, which I know you've chatted
54:31
with Samuel about as well. Yeah, absolutely. I
54:34
got a chance to talk to Samuel about Logfire
54:36
and some of the behind the scenes of the structure.
54:38
It was really interesting. But also speaking of FastUI,
54:40
I did speak to him. When
54:43
was that? Back in February. So
54:46
this is a really popular project.
54:48
And even on the, I
54:50
was like, quite
54:52
a few people decided that they were interested in even
54:54
watching the video on that one. Yeah,
54:58
anything with FastUI? Sorry, did
55:00
you say anything with FastUI? Yeah, yeah.
55:02
Are you doing anything on the FastUI side? Or
55:04
are you on the Pydantic side
55:07
of things? Yeah, good question. I've been working
55:09
mostly on Pydantic, just larger user base,
55:11
more feature requests. But I've done a
55:14
little bit on the FastUI side
55:16
and excited to kind of brush up
55:18
on my TypeScript and build that
55:20
out as a more robust and supported tool. I
55:23
think, especially as we grow as a company
55:25
and have more open source support in general,
55:27
that'll be a priority for us, which is
55:29
exciting. Yeah. It's
55:32
an interesting project, basically a cool
55:34
way to do a JavaScript front end and React
55:36
and then plug those back into Python APIs
55:39
like FastAPI and those types of
55:42
things, right? Yeah, and kind
55:44
of a similarity with FastUI and Logfire,
55:46
the new tools, is that there's pretty seamless integration
55:48
with Pydantic, which is definitely going to be
55:50
one of the kind of core tenets of
55:53
any products or open source things that we're
55:55
producing in the future. Yeah, I can imagine. That's
55:57
something you want to pay special attention to: how
56:00
well do these things fit together as a
56:02
whole rather than just here's something interesting, here's
56:04
something interesting. Yeah. Yeah. Awesome. All right. Well,
56:07
I think that pretty much wraps
56:09
it up for the time that we have
56:11
to talk today. Let's, let's close
56:13
it out. Close it out for us with
56:15
maybe final call to action for people who
56:17
are already using Pydantic and they want it
56:20
to go faster, or maybe they could adopt
56:22
some of these tips. What do you tell
56:24
them? Yeah, I would say, you
56:27
know, inform yourself just a little bit about
56:29
kind of the Pydantic architecture, just
56:31
in terms of like, what is core schema
56:33
and why are we using Rust for validation
56:35
and serialization? And then that can kind of
56:37
take you to the next steps of, when
56:40
do I want to build my core schemas based
56:42
on kind of the nature of my application? Is
56:44
it okay if imports take a little bit longer
56:46
or do I want to delay that? And then
56:48
take a look at discriminated unions.
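To make those last two tips concrete, here is a small hedged sketch of a discriminated union plus deferred core-schema building; the Cat, Dog, and Owner models are invented, and defer_build availability depends on your Pydantic 2.x version.

# Hedged sketch of a discriminated (tagged) union, so validation jumps
# straight to the right variant, plus defer_build to postpone core-schema
# construction until first use. Models are invented; defer_build support
# depends on your Pydantic 2.x version.
from typing import Literal, Union
from pydantic import BaseModel, ConfigDict, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    lives: int = 9


class Dog(BaseModel):
    pet_type: Literal["dog"]
    good_boy: bool = True


class Owner(BaseModel):
    # Pydantic reads pet_type first and only validates against that one model,
    # instead of trying every member of the union.
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")

    # Optional: trade a little first-use latency for faster imports.
    model_config = ConfigDict(defer_build=True)


owner = Owner.model_validate({"pet": {"pet_type": "dog"}})
print(type(owner.pet).__name__)  # Dog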
56:50
And then maybe if you're really interested in improving
56:53
performance across your application that supports
56:55
Pydantic and other things, trying
56:58
out Logfire and just seeing what sort of benefits you
57:00
can get there. Yeah. See where you're spending your time
57:02
is one of the very, you
57:04
know, not just focused on Pydantic, but
57:06
in general, our intuition is often pretty
57:08
bad for where's your
57:11
code slow and where is it not slow.
57:13
You're like, that looks really complicated. That must
57:15
be slow. Like, nope, it's that one call
57:17
to like some sub module that you didn't
57:19
realize was terrible. Yeah. Yeah. And I guess
57:21
that kind of circles back to the like,
57:23
LLM tools and, you
57:25
know, integrated performance analysis with
57:27
CodSpeed and CodeFlash and even just other
57:29
LLM tools, which is like, use the
57:31
tools you have at hand. And yeah, sometimes they're
57:34
better at performance improvements
57:36
than you might be, or it can at least give
57:38
you good tips that give you, you know, a launching point,
57:40
which is great. Yeah, for sure. Or even good old
57:42
cProfile built right in, right? If you really, if
57:45
you want to do it that way. Awesome. Yeah. Well,
57:47
Sydney, thank you for being back on the
57:50
show and sharing all these tips
57:52
and congratulations on all the work you and
57:54
the team are doing. You know, what a
57:56
success Pydantic is. Yeah. Thank you so much
57:58
for having me. It was wonderful to get
58:00
to have this discussion with you and excited that I
58:02
got to meet you in person at PyCon recently. Yeah,
58:04
that was really great, really great. Until
58:07
next PyCon, see you later. This
58:10
has been another episode of Talk Python to Me.
58:13
Thank you to our sponsors. Be sure to check out
58:15
what they're offering. It really helps support the show. Take
58:18
some stress out of your life. Get
58:20
notified immediately about errors and performance issues
58:22
in your web or mobile applications with
58:24
Sentry. Just visit talkpython.fm
58:27
slash Sentry and get
58:29
started for free. And be sure to
58:31
use the promo code talkpython, all one word.
58:34
Code Comments, an original podcast from Red
58:36
Hat. This podcast covers
58:38
stories from technologists who've been through
58:40
tough tech transitions and
58:42
share how their teams survived the
58:45
journey. Episodes are available everywhere
58:47
you listen to your podcasts and at
58:49
talkpython.fm slash code dash comments. Want to
58:51
level up your Python? We have one
58:53
of the largest catalogs of Python video
58:56
courses over at Talk Python. Our content
58:58
ranges from true beginners to deeply advanced
59:00
topics like memory and async. And best
59:02
of all, there's not a subscription in
59:04
sight. Check it out for yourself at
59:07
training.talkpython.fm. Be sure to
59:09
subscribe to the show, open your favorite podcast app
59:11
and search for Python. We should be right at
59:13
the top. You can also find
59:15
the iTunes feed at slash iTunes, the
59:18
Google Play feed at slash Play and
59:20
the direct RSS feed at slash RSS
59:22
on talkpython.fm. We're live
59:24
streaming most of our recordings these days. If
59:26
you want to be part of the show
59:29
and have your comments featured on the air,
59:31
be sure to subscribe to our YouTube channel
59:33
at talkpython.fm slash YouTube. This
59:35
is your host, Michael Kennedy. Thanks so much for
59:37
listening. I really appreciate it. Now get out there
59:39
and write some Python code. Thank
59:42
you.