2.5 Admins 194: Thundering Mastodon by 2.5 Admins | Podchaser

Episode from the podcast 2.5 Admins

Released Thursday, 9th May 2024

Episode Transcript

0:00

Two and a half admins, episode

0:02

194. I'm Joe. I'm

0:05

Jim. And I'm Alan. And here we are again.

0:07

And before we get started, just a

0:09

quick plug. You were on Late Night

0:11

Linux with us recently, Jim. Seems unlikely. That's

0:14

what you said on Late Night Linux. You're just trying to

0:16

freak me out now. That's the joke. One

0:18

of them is already available and the

0:20

next episode next week will also have you

0:23

on it. So, yeah, link in the show notes. Let's

0:26

do some news then. There's been a

0:28

lot of buzz about Mastodon recently and

0:30

DDoSing people's websites. Because if

0:33

you post a link to Mastodon, then

0:35

it generates a preview. And if

0:37

you've got a lot of followers and those

0:40

people are on enough different instances,

0:42

then each instance will generate that

0:44

preview. And a lot of

0:46

people have started to complain that their sites

0:48

are just getting effectively DDoSed. Yeah. Because of

0:50

the way the preview caching works, when

0:52

you send out the message to

0:54

all your followers, all

0:57

the instances of people that follow you will see that

0:59

and try to cache it. And it means that when

1:01

you first send it out, you can end up effectively

1:03

creating a barrage of traffic against your own

1:05

website. This is more an issue of

1:08

clustering a bunch of requests, you know, all

1:10

into the same few seconds

1:12

than it is like an issue

1:14

of actual overall traffic. Because

1:16

we're talking about generating previews

1:18

here. It's not even as heavy as an

1:20

actual site visit. And in theory, you

1:23

would like for everybody who is

1:25

shared or is looking at the

1:27

link that's on Mastodon to

1:29

be able to actually view your website. And

1:32

if your website were capable of handling that

1:34

level of traffic, then the level of traffic

1:36

caused from generating these previews would just be

1:38

nothing. It would just be a wash. And

1:41

in fact, we can see some of this with, you know,

1:43

2.5 Admins and Late Night Linux. We

1:45

post our episodes on Mastodon. And

1:48

we haven't experienced a huge issue

1:50

with this. Now we did see

1:53

2.5admins.com locking up a few

1:55

times, right about the time that

1:57

Joe initially made that move. And I increased the tuning

2:00

on it to help

2:02

accommodate that; that might be related.

2:04

But again, we're not actually seeing problems.

2:06

With that said, there are folks who

2:08

get a whole lot more traffic and

2:11

you know, get shared on a whole

2:13

lot more instances pretty quickly than we

2:15

do and they could see larger issues.

2:17

And where this really comes from

2:19

is the fact that it's all

2:21

about the multiple instances. This doesn't happen

2:23

on, like, a Twitter or Bluesky

2:25

or Threads, because it's basically one

2:28

single monolithic thing. Somebody shares it,

2:30

the site gets the preview,

2:32

and then everybody sees that cached copy of

2:34

the preview that lives, you know, on

2:36

Twitter or on Bluesky or on

2:39

Threads or whatever. On Mastodon, when you

2:41

share something, everybody who follows you from

2:43

a different instance, once they need to

2:45

look at that, their own instance goes

2:47

and hits your site and grabs its

2:49

own cached copy. So rather than the original

2:52

instance forwarding along the link preview to any

2:54

other instances that need it, every instance

2:56

that needs it needs to go to your

2:58

site separately and grab it. And there's

3:00

nothing in the ActivityPub

3:02

protocol to deal with this

3:04

yet, so it's something that's gonna have

3:06

to be implemented, and that's not easy.
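As a rough illustration of the fan-out described above, the number of separate preview fetches that hit the linked site is roughly the number of distinct instances among the poster's followers, versus a single fetch on a centralized platform. This is a minimal sketch, not Mastodon's actual code; the function name and the follower handles are made up for the example.

```python
def preview_fetches(follower_handles, federated=True):
    """Estimate how many separate preview fetches hit the linked site.

    follower_handles: strings of the form "user@instance.domain" (hypothetical data).
    On a centralized platform the preview is fetched once and shared.
    On a federated one, every distinct follower instance fetches its own copy.
    """
    if not federated:
        return 1
    instances = {handle.rsplit("@", 1)[1].lower() for handle in follower_handles}
    return len(instances)


# Hypothetical followers spread across three instances.
followers = [
    "alice@mastodon.example",
    "bob@fosstodon.example",
    "carol@mastodon.example",
    "dave@social.example",
]
federated_hits = preview_fetches(followers)
centralized_hits = preview_fetches(followers, federated=False)
```

The point is only that the burst scales with the count of instances rather than the count of followers.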

3:09

I'm not sure it's something that has to

3:11

be implemented, because there are security considerations here,

3:13

and, quote, fixing

3:15

this, unquote, would in part undo

3:17

some of what you get out

3:19

of federation to begin with. Part

3:21

of the whole point of having

3:23

Mastodon be a federated

3:25

platform is this idea that, you

3:27

know, the individual pieces can actually

3:29

operate individually, and you're also minimizing

3:31

some of the infosec concerns,

3:33

you know, from having all your

3:35

eggs in one basket. So it

3:37

would be relatively trivial, if we

3:40

ignore, you know, necessary patch schedules and

3:42

protocol inconsistency and whatever, it

3:44

would be fairly simple to say, okay,

3:46

well, if I'm an instance and I

3:48

see that one of my users needs

3:50

a link preview because of something that

3:52

they're looking at coming from your instance,

3:54

I can just request a cached copy of

3:56

that from your instance. Only thing is,

3:58

if you do that, now you're setting

4:00

up a chain in which you can

4:02

attack one instance by generating bogus previews

4:05

on another instance that you control and

4:07

expect that to actually travel from instance

4:09

to instance. So that's a

4:11

bit of a security concern, and I'm

4:13

not sure we actually want to fix

4:15

that. Well, also, it somewhat breaks part

4:17

of the point of the Federation in

4:19

that the idea is to distribute the

4:21

cost of the bandwidth and the processing

4:23

to create the preview. The

4:26

instance I'm on, and I have a huge

4:28

number of followers, and it's now responsible

4:30

for not just generating the preview for all

4:33

of its users, but now every other instance

4:35

is going to expect it to pay for

4:37

the bandwidth of providing at least one copy

4:39

of that preview, then that's

4:42

putting a lot more load on that one

4:44

instance. And the entire idea of Federation is

4:46

that I'm paying for my

4:48

users, not for every user in the

4:51

entire network. A lot of this comes

4:53

down to what your site is doing

4:55

in terms of generating the preview that

4:57

platforms like Twitter or Mastodon or whatever

4:59

use. And if your site

5:02

is generating really fat previews

5:04

that require a lot of bandwidth for your

5:06

site to deliver, maybe you should

5:09

look at trimming that on your own end.

5:11

Well, it depends. I think most of the

5:13

previews are actually made by software at the site that is

5:15

going to pretend to be a browser and

5:17

load the page, and actually most of the

5:19

time it's using the browser engine to render

5:21

a picture of the page. So

5:23

it's usually that

5:25

software rather than your website that's making the

5:28

preview. Yeah, but it's still hitting

5:30

your database if it's a PHP based

5:32

site, for example. Well, importantly, unlike all

5:34

like web scrapers, it probably is actually

5:36

loading all of the JavaScript and all

5:39

of the images because it's using an

5:41

actual browser engine to make a screenshot

5:43

of what the website looks like and then

5:45

scaling it down. And so as

5:47

far as the website is concerned, it's actually

5:49

no less expensive than a real visit

5:51

from a real user because it's going to run all

5:53

the JavaScript to make sure that the website actually looks

5:56

like it's going to look when you go there. how

6:00

this protocol works if every

6:02

instance is generating this as soon as I post

6:04

it, not the first time a user actually tries

6:06

to view it, because again, the idea of doing

6:08

this ahead of time is that the user sees

6:10

the preview right away, not, oh, I

6:13

have to go to the website and generate the preview as the

6:15

first time someone tries to load it. It's

6:17

effectively like however number of instances my followers

6:19

are spread across, all those people are going

6:22

to go to the website in a very,

6:24

very short time span, which is probably more

6:26

traffic than just the total you

6:28

might have gotten in a one

6:31

minute time span of all your followers going to the website

6:33

at once. Joe raised an excellent

6:35

point though. Joe mentioned that a

6:37

lot of the load issues that people are seeing are probably

6:39

not just bandwidth. That is a very

6:41

fair point. Now, the reason I was focused on bandwidth

6:44

is because this shouldn't

6:46

be generating database load on your

6:48

site because your site should already

6:50

be doing object and image caching.

6:53

And after the first load, it should be

6:55

using really nothing but bandwidth to feed the

6:57

other hoard of instances that want the exact

6:59

same thing. So if you haven't already got

7:02

that set up, yeah, this will take you

7:04

out quick because it'll hammer your

7:06

database to death when you suddenly get,

7:08

you know, potentially several hundred users in

7:10

the same second that are hitting full

7:13

fat page loads off of everything.

7:17

So if you don't already have proper

7:19

caching set up on your site, that's

7:21

the absolute first thing that you've got

7:23

to get taken care of. And honestly,

7:25

you really already needed to do that

7:27

properly. This just kind of draws

7:29

an underline under it. This might also

7:32

point out a slightly subtle part of the

7:34

idea of caching. So there's

7:36

two things that, for example, Varnish does. The first

7:38

is the problem of a thundering herd. So

7:41

normally the way caching works is the first person comes

7:43

to your website, you pay the full freight for that,

7:45

and then we save what that website was. We save

7:47

the objects at the end so we can reuse them

7:49

next time. But if while that first

7:51

person were building up all the stuff to do that,

7:53

a second person comes along, most

7:55

websites by default will do all that work

7:58

again because there's no cached copy. And

8:00

then a third person, and we'll do the work again. And

8:03

you have eight workers

8:05

building up this full version of the web

8:07

site, and only once the first person actually

8:09

finishes do we have a cached copy to

8:12

serve to the next person. And

8:14

so mitigating that thundering herd, often what

8:16

nginx can be configured to do,

8:18

and Varnish will do, is only the

8:20

first person is going to actually make it

8:22

through to the backend, and once we

8:24

have the cached copy we will then serve

8:27

the page to everybody else.
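The "only the first request reaches the backend" behaviour described above is often called request coalescing. The following is a minimal sketch of the idea in Python, not how Varnish or nginx implement it; `CoalescingCache`, `slow_backend`, and the `/post/1` key are hypothetical names for illustration.

```python
import threading
import time


class CoalescingCache:
    """Cache where concurrent misses for one key trigger a single backend fetch."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._values = {}      # key -> cached value
        self._inflight = {}    # key -> Event set when the leader finishes

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            try:
                value = self._fetch(key)          # full-price backend work
                with self._lock:
                    self._values[key] = value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                       # wake everyone who queued up
            return value
        event.wait()                              # followers just wait
        return self.get(key)                      # cached now (or retry if the leader failed)


backend_calls = []


def slow_backend(key):
    backend_calls.append(key)
    time.sleep(0.05)                              # stand-in for an expensive page build
    return f"rendered {key}"


cache = CoalescingCache(slow_backend)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get("/post/1")))
    for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Fifty simultaneous requests for the same page end up as one backend call, with the other forty-nine served from the copy the first one produced.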

8:29

Varnish has a second mechanism

8:31

called grace, where basically,

8:34

if we have a timeout, say we

8:36

only cache the page for ten minutes so that

8:38

if somebody adds more comments or

8:40

whatever, the page will stay fresh, with grace,

8:42

when the first person arrives after the cache has expired,

8:45

it's going to trigger loading the new version

8:47

in the background, but it's going to serve

8:49

that person, and everybody else that comes along until

8:51

the new cached version is done, the stale

8:54

version from ten minutes ago. That way the

8:56

website is still working, and we're giving out

8:58

the slightly stale version of the page until we

9:00

have that new version of the page to serve.
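The grace behaviour described above (serve the stale copy while a fresh one is fetched in the background) can be sketched as follows. This is an illustrative single-object cache, not Varnish's implementation; the class name, the 600-second TTL, the 60-second grace window, and the injectable clock are all assumptions made so the example is easy to follow and test.

```python
import threading
import time


class GraceCache:
    """Single-object cache with a TTL plus a grace window (illustrative only)."""

    def __init__(self, fetch, ttl, grace, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._grace = grace
        self._clock = clock
        self._lock = threading.Lock()
        self._entry = None        # (value, fetched_at) or None
        self._refresher = None    # background refresh thread, if one is running

    def get(self):
        with self._lock:
            if self._entry is not None:
                value, fetched_at = self._entry
                age = self._clock() - fetched_at
                if age <= self._ttl:
                    return value                        # still fresh
                if age <= self._ttl + self._grace:
                    if self._refresher is None:         # only one refresh at a time
                        self._refresher = threading.Thread(
                            target=self._refresh, daemon=True
                        )
                        self._refresher.start()
                    return value                        # stale, but within grace
        return self._refresh()                          # nothing usable: fetch in the foreground

    def _refresh(self):
        try:
            value = self._fetch()
            with self._lock:
                self._entry = (value, self._clock())
            return value
        finally:
            with self._lock:
                self._refresher = None

    def wait_for_refresh(self):
        """Block until any running background refresh has finished."""
        with self._lock:
            thread = self._refresher
        if thread is not None:
            thread.join()


# Demo with a fake clock so no real waiting is needed.
fake_now = [0.0]
versions = iter(["v1", "v2"])
page = GraceCache(lambda: next(versions), ttl=600, grace=60, clock=lambda: fake_now[0])

first = page.get()             # cold miss: fetched in the foreground
fake_now[0] = 601.0            # the ten-minute TTL has just expired
during_refresh = page.get()    # stale copy is served, refresh starts in the background
page.wait_for_refresh()
after_refresh = page.get()     # the refreshed copy
```

The visitor who arrives just after expiry never waits on the backend; they get the ten-minute-old page while the new one is built.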

9:03

And a poor man's version of this is

9:05

simply tuning your Apache or nginx properly

9:07

so that it doesn't attempt to serve more

9:09

concurrent page loads than your system can actually

9:12

manage. So let's say you've got a cheap

9:14

five-dollar VM over at DigitalOcean

9:16

or Vultr or wherever. If you've got your Apache

9:18

and your PHP tuned down to only

9:21

serve five or ten concurrent visitors, and it

9:23

can actually do that, and you've tested that

9:25

and that works, then you really don't need

9:27

to worry too much about that, because your

9:30

worst-case scenario is you get fifty incoming

9:32

requests simultaneously, or five hundred if you prefer,

9:34

and the first five or ten go directly

9:36

to the backend. But you can handle that,

9:38

because you've got it tuned to where you

9:40

can't have too many concurrent requests hitting

9:43

your backend for your backend

9:45

to be able to fulfill them. So

9:47

for those five or those ten you're paying the

9:49

full freight, sure, but, you know, that

9:51

only occupies half a second or so, and

9:53

then, you know, the rest of the five

9:56

hundred of the thundering herd, they do all

9:58

get fulfilled from cache. Where you have problems is

10:00

if you didn't bother tuning your PHP stack

10:02

and your web server and instead you get

10:04

those 500 incoming requests and it tries to

10:06

fulfill all 500 straight from the metal

10:09

at the same time. That happens, you

10:11

got a problem. The thing about that

10:13

is, if that happens for any other

10:15

reason, 500 people all want to view

10:17

500 different pages at the same time, you

10:20

still have a problem because rather than putting them

10:22

in a queue and servicing them as quickly as

10:24

it can, your server tries to serve all of

10:27

them at once, falls flat on its face and

10:29

you get somebody calling you and saying,

10:31

hey, website's down. Right, 504s or whatever.
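The "poor man's version" above amounts to capping how many requests the backend works on at once and letting the rest queue. In Apache and PHP-FPM that cap is set with directives such as `MaxRequestWorkers` and `pm.max_children`; the sketch below only models the idea in Python with a semaphore. The `LimitedBackend` class, the limit of 5, and the 100 simulated requests are assumptions for illustration.

```python
import threading
import time


class LimitedBackend:
    """Run at most max_workers requests at once; the rest wait their turn."""

    def __init__(self, handler, max_workers):
        self._handler = handler
        self._slots = threading.BoundedSemaphore(max_workers)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0             # highest concurrency actually observed

    def handle(self, request):
        with self._slots:         # excess requests queue here instead of piling on
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return self._handler(request)
            finally:
                with self._lock:
                    self.active -= 1


def render_page(request):
    time.sleep(0.01)              # stand-in for PHP plus database work
    return f"page for {request}"


backend = LimitedBackend(render_page, max_workers=5)
served = []
herd = [
    threading.Thread(target=lambda i=i: served.append(backend.handle(i)))
    for i in range(100)
]
for t in herd:
    t.start()
for t in herd:
    t.join()
```

All 100 requests get answered, just not simultaneously: the backend never sees more than five at a time, which is the load it was tested to survive.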

10:34

Yeah, to that exact point,

10:36

the only downside to relying

10:38

just on a low number of workers is

10:40

if it is people trying to load 10

10:42

different pages if you have a more popular

10:44

website, but if your website has that diverse

10:47

of visitors, you're probably not running off the

10:49

$5 VM at that point. You don't need

10:51

to go and set up a lot of

10:53

fancy Varnish and write a whole VCL

10:55

config when if it's

10:57

a WordPress and you're probably only going to

10:59

have one popular blog post at a time,

11:01

then there's no reason that your $5 VM

11:04

won't be able to handle that. If

11:06

it's WordPress, there are caching plugins that will do

11:08

almost all of this work for you. Install

11:11

W3 Total Cache and install

11:13

the dependencies that it's looking for. I

11:15

would recommend memcached and

11:17

PHP APC and make sure that it's

11:19

using those for all of the caching

11:21

mechanisms and like that's it, you're done.

11:23

Congratulations, you have a high performance stack.

11:25

Yeah, I've done that for news

11:27

websites for actual broadcast television stations

11:30

in other countries that had serious

11:32

amounts of traffic. And

11:34

it wasn't a $5 VM, but

11:36

it was only a couple of pretty modest web

11:38

servers. And they were able to handle huge

11:41

amounts of traffic by just using that W3 Total

11:43

cache. It's worth mentioning that the

11:45

Mastodon developers are working on a proper fix

11:47

for this, but that has been delayed. And

11:50

in the meantime, their sort of quick fix is to

11:53

make each instance wait a random time

11:55

between zero and 60 seconds before generating

11:57

the preview, which helps spread the

12:00

load. Spreading that load over even sixty

12:02

seconds makes a huge difference to what

12:04

a server is able to handle.
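To get a feel for why a random delay of up to 60 seconds helps, here is a small simulation sketch. It is not Mastodon's code; the function name, the figure of 500 instances, and the fixed random seed are assumptions chosen for the example.

```python
import random
from collections import Counter


def peak_requests_per_second(num_instances, max_delay_seconds, seed=0):
    """Worst one-second bucket when each instance waits a random delay first.

    Every instance picks a uniform random delay in [0, max_delay_seconds]
    before fetching the preview; a delay of 0 means everyone fetches at once.
    """
    rng = random.Random(seed)
    arrivals = [
        int(rng.uniform(0, max_delay_seconds)) for _ in range(num_instances)
    ]
    return max(Counter(arrivals).values())


all_at_once = peak_requests_per_second(500, 0)     # no jitter
with_jitter = peak_requests_per_second(500, 60)    # delay of 0 to 60 seconds
```

The total number of fetches is unchanged; only the busiest second gets much smaller, which is the part a small server actually struggles with.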

12:07

Using a CDN should mitigate this, shouldn't

12:09

it? Absolutely, it should. Now, with that

12:11

said, we have seen a fair number

12:13

of the complaints come in that include, you

12:15

know, hey, we use Cloudflare

12:17

and we still have this problem, or we're

12:19

still getting DDoSed. I

12:21

think it's important that we point out

12:23

that if you're using Cloudflare or

12:26

some other content distribution network and this

12:28

little bit of traffic spike coming

12:30

from Mastodon is still taking your site

12:32

down, you have not deployed your CDN

12:34

correctly and you need to fix

12:36

that. And that's something W3 Total

12:38

Cache has the stuff to do, because a

12:40

lot of those CDNs rely on your origin web

12:42

server including the right headers to say, CDN,

12:45

this is how long you can cache this for,

12:47

and these are the things you need to do to tell

12:49

the difference between a logged-in user and

12:51

a not-logged-in user, so you can tell that

12:54

it's okay to cache this even though it has

12:56

a cookie, and things like that.
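As a sketch of what "the right headers" can look like, the function below picks a `Cache-Control` value depending on whether the request carries a login session cookie. The `Cache-Control` directives used (`public`, `private`, `no-store`, `max-age`, `s-maxage`) are standard HTTP; the function name, the `session_id` cookie name, and the specific lifetimes are assumptions for illustration, and a real site or caching plugin will have its own rules.

```python
def cache_headers(request_cookies, session_cookie="session_id", shared_max_age=600):
    """Choose response headers that tell a CDN whether it may cache the page.

    request_cookies: dict of cookie name -> value sent by the client.
    A request with the (hypothetical) session cookie is treated as logged in
    and must not be stored by shared caches. Anything else, including requests
    that only carry unrelated cookies, gets a publicly cacheable response.
    """
    if session_cookie in request_cookies:
        return {"Cache-Control": "private, no-store"}
    return {"Cache-Control": f"public, max-age=60, s-maxage={shared_max_age}"}


anonymous = cache_headers({"_analytics": "abc123"})   # has a cookie, still cacheable
logged_in = cache_headers({"session_id": "deadbeef"})
```

Here `s-maxage` is the lifetime for shared caches such as a CDN, while `max-age` applies to the visitor's own browser.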

12:58

I did consulting for this kind of

13:00

stuff in the past. I did it

13:02

for a very, very large newspaper here

13:05

in Canada, and they were paying for

13:07

the most expensive Cdn if they're the

13:09

original Ot Cdm and. They.

13:11

Go to the Fiber! Just the origin

13:13

servers for the Cdn was enough traffic

13:16

that really puts the new new story

13:18

it's favorite. It hits from like two

13:20

hundred and seventy of these caching nodes

13:22

spread around the world for the CDN

13:24

alone, and that was enough to take

13:27

out their web stack because it was

13:29

a Microsoft-based one. And so

13:31

we put a bunch of previous De Vargas

13:33

mention that stuff in front of. It's an.

13:36

Invalid. At all discussing to to make the

13:38

CDN work. And yeah, we literally sold

13:40

them a service to make their CDN stop

13:42

taking out their website. It's a CDN for

13:44

the CDN. There was a CDN for the

13:46

CDN, isn't it? Almost. Or it's the caching layer

13:48

in front of the website to protect the

13:50

website from the CDN. Because their

13:52

web server was so limited, being IIS-based,

13:55

and all that backend with ASP.NET

13:57

just didn't have any of the tuning that

13:59

we've been talking about. We put

14:01

nginx and Varnish in front of it so that

14:03

we could inject all the right configuration to

14:06

protect said Microsoft servers from

14:08

the Akamai servers. I've had

14:10

to use Varnish for the exact same

14:12

thing in order to protect horrible little

14:15

vulnerable IIS ASP servers

14:17

quite a few times. Microsoft

14:20

plans to lock down Windows DNS like

14:22

never before. Well, I mean, that's a

14:24

pretty easy conclusion to draw given that

14:27

they've never locked it down at all.

14:29

Essentially, what we're looking at here is

14:31

a combination of encrypted DNS and

14:34

an odd sort of allow and block

14:36

list setup to tell a DNS server,

14:38

yeah, I am willing to resolve these

14:41

domains and I won't resolve those

14:43

other domains, along with a protocol

14:45

bump to allow, I believe this is

14:48

in Active Directory, it might be

14:50

further down the stack, you can tell

14:52

client Windows machines the only DNS you're

14:54

allowed to use is my DNS. Yeah,

14:56

I think part of this is a

14:58

response to things like browsers deciding they

15:00

wanna do their own DNS over

15:02

HTTPS and things like that, which

15:04

makes it very difficult as an admin

15:06

to block websites, right? Right, to control

15:08

what's happening on your network. And

15:11

the other thing is Zero Trust DNS, which

15:13

is encrypted and uses cryptographic authentication

15:15

for the connections between the end user

15:17

and the DNS server, so that you

15:19

know you're talking to the DNS

15:21

server you're expecting. So this ideally

15:23

would also stop machines from being tricked

15:26

into talking to, you know,

15:28

8.8.8.8, as in Google,

15:30

and things like that, so that when

15:32

you configure this for clients in a

15:34

corporate environment where the admins are pushing

15:36

the configs onto the client machines, they

15:38

know that the client machines aren't going

15:40

to get tricked into talking to something else.

15:43

To be clear, that does also

15:45

imply that your DNS server knows

15:47

exactly which authenticated user requested to

15:49

resolve mckee day eggs.com the i

15:51

guess it is mutual authentication,

15:53

so, yeah. This feels like

15:55

very good for you as an

15:57

administrator and as an organization, not

16:00

so good for the people who are

16:02

your actual users or employees. You

16:05

know little column a little column B It's

16:07

bad for users or employees in terms of

16:09

privacy for the exact reason I just mentioned

16:12

But it's potentially good for them in that

16:14

it makes it easier for admins to keep

16:16

them from clicking the shiny link and doing

16:18

Really stupid crap. Yeah, it's one

16:21

of the issues right now is that it used

16:23

to be you know We expected to have things

16:25

like not granted They usually didn't work anywhere near

16:27

as well as they were, you know, billed to

16:29

work But we used to have

16:31

like central firewalls, you know, that would do deep

16:33

packet inspection and say no, this is bad traffic

16:35

I'm not gonna let this get to my user

16:37

who I'm trying to protect Well,

16:39

you can't do that anymore when everything's

16:42

HTTPS and you know DNS over HTTPS

16:44

and yada yada yada It's it's

16:46

end-to-end encrypted. Well, if you're in the middle You

16:49

can't intervene for good any more than you can

16:51

intervene for evil It sounds like this

16:53

is gonna end up on a lot of compliance

16:55

forms possibly like I've already seen that where yeah

16:58

If you want to visit certain websites

17:01

now that have part of the Office 365 SSO, it's

17:05

like you can only do that if you're running a

17:07

machine that's running a configuration that's

17:09

compliant with the policy So

17:11

you can't just access it from any laptop. It has to

17:13

be a laptop that has all of

17:15

the corporate policies applied Like for example, maybe

17:17

using this Zero Trust DNS, which

17:20

is basically a way to integrate Windows's

17:22

DNS engine into its filtering platform,

17:24

Which is the core component of Windows

17:26

firewall and have that all happen on

17:29

the client but there's some extra pull

17:31

quotes here from Jake Williams

17:33

who's the VP of research and development at a Consultancy

17:36

on these and saying is basically providing

17:38

a way to have kind of an

17:40

input and output to the firewall Hooked

17:42

into this so that the

17:44

firewall has input on where you're going

17:46

and what's going on the windows firewall

17:49

to be clear Yes, sorry to be

17:51

clear the windows firewall you can trigger

17:53

firewall actions So data going into the

17:55

firewall will decide what the firewall does

17:57

but also trigger external actions for the

17:59

firewall output So instead of

18:01

having to reinvent this, you can have

18:03

anti-virus and your web filtering proxy and

18:05

all that stuff hooked in together and

18:08

knowing all about it. To be fair,

18:10

that also sounds like if your machine

18:12

already got owned, your attacker might put

18:14

in rules that say, hey, don't

18:17

bother firewall checking anything coming from badguy.com.

18:19

Right. Or you can only do DNS

18:21

lookups from badguy's DNS server, so you're

18:23

never going to get to the real

18:25

Google ever again. To be fair, malware

18:27

authors could already do that one way

18:29

or the other. They just break your

18:31

stack and make it non-configurable and, well,

18:34

usually break it as well. But

18:37

those attempts to lock you into a DNS server, I've

18:39

seen quite a lot of that for more than a

18:41

decade. OK,

18:43

this episode is sponsored by Tailscale.

18:46

Go to tailscale.com/25a.

18:50

Tailscale is an intuitive, programmable way to manage

18:52

a private network. It's

18:54

zero-trust network access that every organization

18:56

can use, and with Tailscale's

18:59

ACL policies, you can securely control

19:01

access to devices and services with

19:03

next-gen network access control. Loads

19:06

of the Late Night Linux family hosts

19:08

use Tailscale for all sorts, including

19:10

controlling 3D printers, remoting into their

19:13

relatives' systems for support, controlling

19:15

Home Assistant, and sending ZFS

19:17

snapshots to off-site backup locations. I

19:20

got it set up in minutes, and you can too. So

19:23

support the show and check out Tailscale for

19:25

yourself. Go to tailscale.com/25a

19:27

and try out Tailscale for

19:29

free for up to 100

19:31

devices and 3 users with

19:33

no credit card required. That's

19:36

tailscale.com/25a. Let's

19:40

do some free consulting then. But first, just a quick

19:42

thank you to everyone who supports us with PayPal and

19:44

Patreon. We really do appreciate that. If

19:47

you want to join those people, you

19:49

can go to 2.5admins.com/support. And

19:51

remember that for various amounts on Patreon, you

19:53

can get an advert-free RSS feed of either

19:55

just this show or all the shows in

19:57

the Late Night Linux family. If you want

20:00

to send any questions for Jim and Alan

20:02

or your feedback, you can email show

20:04

at 2.5admins.com. Another perk of being

20:06

a patron is you get to skip the queue, which is what

20:08

William has done. He writes, is

20:11

there any decent consumer grade backup

20:13

media for write-once archival purposes?

20:16

My use case is I want to store some

20:18

personal photos as is in some kind of off-site

20:20

storage unit or something and really

20:22

only read from these backups in the event

20:24

of a disaster. The short answer is no,

20:26

there really just isn't. The somewhat

20:28

longer answer is you can't use cheap DVD-ROMs

20:30

not only because they're far too tiny and you'll

20:33

spend your entire life trying to back anything

20:35

up, but also just because

20:37

consumer DVD-ROMs don't actually live forever. They

20:39

degrade over time and you don't know

20:41

how long they're going to take to

20:43

degrade and once they have degraded, if

20:46

you weren't paying attention when it happened, your data is just

20:48

gone and you will never ever get it back. So

20:51

we can talk about larger optical drives.

20:53

Well, that's Blu-ray. Again, you can't

20:55

use just a regular el cheapo

20:58

Blu-ray disk, so you're looking for

21:00

something called archival grade. An archival

21:02

grade disk should in theory last

21:04

at least 10 years and some

21:06

claim up to a century, but again,

21:09

long-term testing of archival optical disks

21:11

has shown that sometimes they last

21:13

longer than the regular ones and

21:15

sometimes they don't. Even

21:18

worse when you talk about the archival grade Blu-rays, I

21:20

looked up prices just before we recorded and you're

21:23

gonna spend about $200 a terabyte,

21:25

so I'm just gonna go ahead and say, no, you're

21:27

not doing that either. So

21:30

essentially what you're looking at here is going

21:32

to be long-term cold storage of magnetic media.

21:35

You can use something like LTO

21:37

tapes, but I would honestly recommend

21:39

just consumer hard drives, man. Pick

21:41

2.5 or 3.5, whichever

21:43

form factor you prefer. If

21:45

you don't want to touch these things for long periods

21:47

of time, you're just gonna put them in cold storage in

21:49

a vault somewhere, you know, make sure it's climate controlled and

21:51

you don't have any magnetic field issues, you

21:53

should be fine. I would probably recommend

21:56

do two backups onto two different drives

21:58

and store them. You still should

22:00

really check them every few years, but it's going

22:02

to be a whole lot quicker and

22:04

easier to pull them and check them because you can

22:06

literally just import the pool and scrub and make sure

22:09

that you didn't have any errors pop up. And if

22:11

you didn't, you're fine. Yeah. And

22:13

sort of that, like especially looking at comparatively

22:15

the price of the archival

22:17

grade Blu-rays for that much

22:19

money over even a 10-year

22:21

term, having a machine with

22:23

that hard drive spinning and scrubbing it

22:26

every month, it's going to cost you less per

22:28

terabyte, having a whole computer and

22:30

electricity. Easily. It's not even going to

22:33

be close. Hard drives are down

22:35

to like almost $10 a terabyte now versus $200

22:37

for the archival media. And

22:42

like Jim said, that archival media is like they

22:44

say maybe it'll last that long, but

22:46

Blu-rays haven't been around long enough to be 100% sure. And

22:50

the sample ones that the lab used lasted that

22:52

long. How long do the ones you bought at

22:54

a store that sat out on a shelf and were

22:56

in a truck for who knows how long and

22:58

all the other factors that might mean they

23:00

just don't work anymore. And if

23:02

you had two hard drives, you could maybe

23:04

pull one out one week, scrub it, make

23:06

sure it's all good, take it

23:09

back, take the other one, scrub that. So you've

23:11

always got one in the backup

23:13

location. Yeah, absolutely. And it doesn't have to be

23:15

weekly like Jim was saying. Like if you're checking

23:17

them twice a year or something, that's

23:19

probably fine too. Yeah, but don't take them both

23:21

out at the same time because then you don't

23:23

have anything in that emergency backup location. Yeah. You

23:26

don't want the house fire to hit you

23:28

while you're checking your backups. Yeah.
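For a rough sense of the price gap quoted in this answer (about $200 per terabyte for archival-grade Blu-ray versus roughly $10 per terabyte for hard drives), here is a small arithmetic sketch. The per-terabyte prices are the ones mentioned in the episode; the 4 TB photo archive is a made-up size, and the two hard-drive copies follow the suggestion to keep two drives in rotation.

```python
def media_cost(terabytes, dollars_per_tb, copies=1):
    """Up-front media cost in dollars for a given archive size."""
    return terabytes * dollars_per_tb * copies


archive_tb = 4  # hypothetical archive size

bluray_cost = media_cost(archive_tb, 200)            # one set of archival Blu-rays
hdd_cost = media_cost(archive_tb, 10, copies=2)      # two rotating hard drives
```

Even with two full copies, the hard drives come to a tenth of the cost of a single set of archival discs at these prices.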

23:31

And more importantly, this should not be the

23:33

only backup obviously like as an extra like

23:36

third or fourth backup. Yeah, fine. But

23:38

this should definitely not be your primary backup.

23:40

Agreed. If for no other reason

23:42

than that your primary backup should be a lot easier

23:44

to get at than this, honestly. Yep. Tony,

23:47

who's a patron, also skipped the queue. He writes,

23:49

I have a Microsoft SQL Server that I need

23:52

to try and get more performance out of. I

23:55

do not see any memory or CPU bottlenecks.

23:57

So I'm planning on putting in two Dell

23:59

SATA SSDs. and mirroring

24:01

them in ZFS on my Proxmox.

24:04

Would it be better to use a zvol versus

24:06

a qcow2? Also, by

24:08

using qcow2, would I prematurely wear

24:10

my SSDs? Okay, so to

24:12

start out, you're absolutely on the right track for

24:14

improving your performance on any kind of a database

24:16

engine. Yes, you do want

24:18

to put that on ZFS mirrors, and

24:22

the Dell SATA SSDs should do fine.

24:24

It's not necessarily going to be the

24:26

highest performance solid-state you could get, but

24:29

again, it should be fine. Especially if these things look

24:31

like improvements to what you have now, you should be

24:33

very happy with it. The other thing

24:35

I'll mention is usually I

24:37

prefer either raw files or qcow2 to

24:39

zvols, but you are a Proxmox user.

24:41

So Proxmox's user interface is

24:43

going to fight you pretty hard on trying

24:45

to create VMs using qcow2. I would

24:48

say just go ahead and do zvol because that's

24:50

what Proxmox really wants to do and you're a Proxmox

24:52

user. Now, the one thing

24:54

I would advise you, Proxmox is going to

24:56

by default make the volblocksize either

24:59

8K, which is far too small, or 16K

25:01

if it's the newest version of Proxmox, which

25:03

is still probably smaller than you want.
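
As a rough illustration of why the block size matters, the sketch below counts how many volblocksize-sized blocks a single guest I/O overlaps. The 64 KiB I/O size and the helper name `blocks_touched` are assumptions made for illustration, not anything Proxmox or ZFS exposes. Real behavior also depends on compression, caching, and how the guest aligns its writes.

```python
KIB = 1024


def blocks_touched(io_bytes: int, volblocksize: int, offset: int = 0) -> int:
    """Count the volblocksize-sized blocks overlapped by one I/O.

    io_bytes     -- size of the guest I/O in bytes (must be > 0)
    volblocksize -- block size of the backing volume in bytes
    offset       -- byte offset of the I/O within the volume
    """
    first_block = offset // volblocksize
    last_block = (offset + io_bytes - 1) // volblocksize
    return last_block - first_block + 1


# Assumed guest I/O size, for illustration only.
IO_SIZE = 64 * KIB

for vbs_kib in (8, 16, 64):
    n = blocks_touched(IO_SIZE, vbs_kib * KIB)
    print(f"volblocksize={vbs_kib}K: one 64K I/O touches {n} block(s)")
```

Under these assumptions, an aligned 64 KiB I/O spans eight 8K blocks, four 16K blocks, or a single 64K block.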

25:06

SQL Server has a variable page size,

25:08

but it operates typically in extents rather

25:10

than pages, and a default

25:12

extent size on SQL Server is 64K. So

25:16

I would probably recommend volblocksize equals

25:18

64K for that virtual machine. The

25:20

combination of that and two SSDs in mirror

25:22

is going to make you very happy. Yeah,

25:24

a couple of caveats. Depends on

25:27

the version of Microsoft SQL Server, but

25:29

in the past it had problems

25:32

if you tried to expose a

25:34

physical sector size that it didn't

25:36

understand. Really old ones didn't

25:38

even support 4K, but I'm sure you're probably newer than

25:40

that now. But depending

25:42

on how the hypervisor works, it might actually

25:44

expose the volblocksize as the sector

25:46

size to the VM, in which

25:49

case Windows will see that virtual hard drive

25:51

as having a sector size of 64K, and

25:54

Microsoft SQL Server might say,

25:56

no, that drive's not compatible. So if

25:58

you do see that, then you might have to configure

26:00

Proxmox to have the

26:03

volblocksize match whatever SQL Server is

26:05

looking for. But if it

26:07

doesn't yell at you, just do as Jim

26:09

said and it'll be fine. You absolutely do

26:11

not want to try to match volblocksize

26:13

to anything that an ancient version of SQL

26:15

Server thinks of as the physical sector size

26:17

of the drive because that's going

26:19

to be way, way, way too small. In the

26:21

unlikely event that that happens, I

26:23

would recommend instead... like, that is the point where

26:25

it becomes worth fighting Proxmox's UI and just creating

26:27

a raw file in a dataset and basing

26:30

your VM on that. It is possible to

26:32

do in Proxmox, it's just a pain in

26:34

the butt. Yeah and I'm sure

26:36

that there's also probably a setting somewhere in Proxmox

26:39

to just tell it what sector size to tell

26:41

the VM the disk is no matter what it

26:43

is. So I know bhyve on BSD can do

26:45

that and that's what we've done

26:47

in the past when trying to support existing

26:50

SQL Server installs that were being

26:52

migrated from VMware or whatever and we had to

26:54

make it feel the same so that it would

26:56

just keep working. What about prematurely

26:58

wearing SSDs then? That's not an issue.

27:00

Using qcow2 storage or raw file storage

27:02

is not going to produce any

27:04

amplification. It's gonna be no different than

27:07

doing zvols. Yeah, if you did a

27:09

qcow2 and left the default recordsize

27:11

of 128K, it might be a little

27:13

bit, but also, unless

27:15

you're buying stupendously low-end SSDs, you're not going

27:18

to be able to wear the SSD out,

27:20

especially a SATA-interfaced one, where you're not going

27:22

to be able to write gigabytes per second

27:24

to it 24-7 for five years. I'm gonna

27:28

push back on that one. It is entirely possible to

27:30

wear through whatever random crap

27:32

Dell sends you for SSDs in five years

27:34

depending on your workload. I have absolutely had

27:36

small business clients do it. If

27:39

this is faffing about in a home lab, no,

27:41

it's very unlikely, but if this is Microsoft SQL in

27:44

like a real production environment with, say, 10

27:46

engineers hitting it all day long, yeah,

27:48

you may very well be able to burn through

27:51

that in five years. I think the point I

27:53

failed to realize is that, yes, when you said a

27:55

Dell SSD, it's going to be whatever Dell gave

27:57

you, not the high-end one you picked that had

27:59

a high... Dell likes

28:01

to charge you quite a bit more for the drives

28:03

that actually have a reasonable number of drive writes per

28:06

day. Yeah, and when you just say Dell SATA, we don't

28:08

actually know if we're talking about, you know, if it's

28:10

going to be an off-label, you

28:12

know, rebranded consumer Samsung, which is usually

28:14

what you get when you buy, quote,

28:16

Dell-branded SSDs, you know, like for

28:18

a laptop or, you know, any consumer

28:20

device, or whether it's going to be

28:22

white-labeled enterprise SSDs, which you can

28:25

also get from Dell, which they will

28:27

charge you way too much for. Another

28:29

option, if you don't have to stick with Dell-branded

28:32

gear, and you just want a

28:34

really good, not that expensive, solid state drive

28:36

that will offer you a ton of performance

28:38

and write endurance and hardware QoS that keeps

28:40

that database operating, you know, with

28:42

consistent low latency that you're looking for. I

28:45

really like Kingston's DC600M line. They

28:48

offer about double the endurance per terabyte

28:50

of drive size that high-end

28:52

consumer drives, you know, like a

28:54

Samsung Pro offer. And

28:57

that hardware QoS is no joke, man. They

28:59

don't look quite as fast with like a

29:01

single-threaded fio run going across

29:03

them. But the difference

29:05

in latency between your worst results

29:07

and your best results is just

29:09

almost non-existent on those drives. Whereas with

29:11

consumer SSDs, it can

29:13

be quite large. What sort of

29:15

size would you be looking at for this job, do you reckon? That's

29:18

the other thing I was just about to mention to Jim's point.

29:21

Don't get the smallest SSDs that will fit your

29:24

database. The bigger the SSD is, the more endurance

29:26

it's going to have, especially if you're not using

29:28

it all. And they tend to

29:30

also perform better, just because you're spreading that

29:32

same amount of work out over more flash cells.
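
To put rough numbers on the endurance point: drive endurance is commonly quoted as drive writes per day (DWPD) over a warranty period, which converts to total terabytes written (TBW) as capacity × DWPD × 365 × years. The sketch below assumes a hypothetical 1 DWPD rating, a five-year warranty, and 0.5 TB written per day. None of these figures come from a real datasheet, and the function names are made up for illustration.

```python
def rated_tbw(capacity_tb: float, dwpd: float, warranty_years: float = 5.0) -> float:
    """Total terabytes written implied by a DWPD rating over the warranty period."""
    return capacity_tb * dwpd * 365 * warranty_years


def years_to_wear_out(tbw: float, tb_written_per_day: float) -> float:
    """Years until the rated TBW is consumed at a steady daily write volume."""
    return tbw / (tb_written_per_day * 365)


# Hypothetical inputs: same DWPD rating, same workload, two drive sizes.
DWPD = 1.0
DAILY_WRITES_TB = 0.5

for capacity_tb in (1.0, 2.0):
    tbw = rated_tbw(capacity_tb, DWPD)
    years = years_to_wear_out(tbw, DAILY_WRITES_TB)
    print(f"{capacity_tb:.0f} TB drive: {tbw:.0f} TBW, about {years:.0f} years at this workload")
```

With the same DWPD rating and the same workload, doubling the capacity doubles the rated TBW and therefore the time it takes to reach it.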

29:35

Your minimum purchase should be a one terabyte

29:37

SSD these days. There's just no reason to

29:39

buy smaller than that. And

29:41

if one terabyte isn't enough to get you

29:44

to, let's say, 50% drive space free when

29:46

you first dump your workload on it, then

29:48

you should be looking at larger drives than

29:50

that. But don't buy anything smaller than

29:52

one terabyte. There's just no reason to. Right. Well,

29:55

we'd better get out of here then. Email

29:57

show@2.5admins.com if you want to send in your questions

30:00

or feedback. You can find

30:02

me at joeress.com/mastodon. You

30:05

can find me at mercenarysysadmin.com and

30:07

I'm @allanjude. We'll see you next week.


From The Podcast

2.5 Admins

2.5 Admins is a podcast featuring two sysadmins called Allan Jude and Jim Salter, and a producer/editor who can just about configure a Samba share called Joe Ressington. Every two weeks we get together, talk about recent tech news, and answer some of your admin-related questions.
