AWS Made Easy

Ask Us Anything: Episode 10

This episode is #10, and to celebrate, Rahul and Stephen invited back Badri Varadarajan, their guest from Episodes #1 and #2.


Summary

This episode is a “What’s New Review,” in which Rahul, Stephen, and Badri review the following announcements.

Announcing Heterogeneous Clusters for Amazon SageMaker model training

In this segment, we discuss AWS’s announcement of heterogeneous clusters for SageMaker model training, which let a single training job mix instance types, for example CPU instances for data preprocessing alongside GPU instances for the training itself.
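For readers who want to try this, here is a minimal sketch of what a heterogeneous training job looks like, assuming the SageMaker Python SDK's InstanceGroup support. The entry point, role, instance types, and counts are illustrative placeholders, not values from the episode.

    from sagemaker.instance_group import InstanceGroup
    from sagemaker.pytorch import PyTorch

    # One training job, two instance groups: CPU instances for data
    # preprocessing, GPU instances for the training loop itself.
    estimator = PyTorch(
        entry_point="train.py",  # hypothetical training script
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
        framework_version="1.12",
        py_version="py38",
        instance_groups=[
            InstanceGroup("data_group", "ml.c5.18xlarge", 2),
            InstanceGroup("dnn_group", "ml.p4d.24xlarge", 1),
        ],
    )
    estimator.fit({"train": "s3://my-bucket/train/"})  # placeholder S3 input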

Amazon SageMaker Autopilot experiments are now up to 2x faster

SageMaker Autopilot experiments now run up to 2x faster thanks to an innovation called zero-shot hyperparameter initialization.

Badri mentions a paper discussing this new method. This paper is located here:
https://arxiv.org/pdf/2203.03466.pdf

The paper suggests that even larger performance gains are possible in theory. However, it is often the case that results found in a theoretical setting are optimistic compared to what is achievable in production settings.
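As context for the discussion below, here is a minimal sketch of launching an Autopilot experiment with boto3; the speedup applies to jobs like this without code changes. The job name, bucket, target column, and role are placeholders.

    import boto3

    sm = boto3.client("sagemaker")

    sm.create_auto_ml_job(
        AutoMLJobName="demo-autopilot-job",  # placeholder name
        InputDataConfig=[{
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/training-data/",  # placeholder
            }},
            "TargetAttributeName": "label",  # placeholder target column
        }],
        OutputDataConfig={"S3OutputPath": "s3://my-bucket/autopilot-output/"},
        RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    )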

Announcing General Availability of EC2 M1 Mac instances for macOS

EC2 instances running the Apple M1 processor, an extremely fast ARM-based CPU/GPU, are now generally available. The major implication is that developers targeting the Apple M1 can now run their build/test/deploy steps in an EC2-based CI/CD setup. This will be especially useful in multi-architecture setups, where developers are most likely already using CI/CD. Developers will no longer need to do M1 testing on local hardware.
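As a rough sketch, provisioning one of these instances from code looks like the following. M1 Mac instances run on Dedicated Hosts, so a host is allocated first; the region, zone, and AMI ID are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Mac instances require a Dedicated Host, allocated before launch.
    host = ec2.allocate_hosts(
        AvailabilityZone="us-east-1a",
        InstanceType="mac2.metal",  # the M1-based Mac instance type
        Quantity=1,
    )

    # Launch a macOS instance onto that host; the AMI ID is a placeholder
    # for a macOS (ARM) AMI looked up in your region.
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="mac2.metal",
        MinCount=1,
        MaxCount=1,
        Placement={"Tenancy": "host", "HostId": host["HostIds"][0]},
    )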

AWS Security Hub launches 36 new security best practice controls

AWS Security Hub has released 36 new controls for its Foundational Security Best Practices standard. Security Hub checks the assets and configuration in your AWS account against these controls. In this segment, we discuss Security Hub and talk about ways that the changes it suggests can be automated.
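For a sense of what consuming these controls programmatically looks like, here is a minimal sketch that lists active, failed findings from the standard with boto3; the PREFIX match on the generator ID is our assumption about how these findings are labeled.

    import boto3

    securityhub = boto3.client("securityhub")

    paginator = securityhub.get_paginator("get_findings")
    pages = paginator.paginate(Filters={
        "GeneratorId": [{
            "Value": "aws-foundational-security-best-practices",
            "Comparison": "PREFIX",
        }],
        "ComplianceStatus": [{"Value": "FAILED", "Comparison": "EQUALS"}],
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    })
    for page in pages:
        for finding in page["Findings"]:
            print(finding["Title"], "->", finding["Resources"][0]["Id"])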

AWS Identity and Access Management introduces IAM Roles Anywhere for workloads outside of AWS

This announcement is an interesting one. With IAM Roles Anywhere, one can enable external workloads to access AWS resources by exchanging X.509 certificates for temporary credentials, rather than using long-term AWS credentials. It will be interesting to see the use cases that evolve around IAM Roles Anywhere.
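The setup, roughly, is: register a certificate authority as a trust anchor, then create a profile listing the IAM roles external workloads may assume. A minimal sketch with boto3, assuming a PEM-encoded CA certificate on disk (file name, ARNs, and names are placeholders):

    import boto3

    ra = boto3.client("rolesanywhere")

    # Register the CA whose certificates the workloads will present.
    anchor = ra.create_trust_anchor(
        name="on-prem-ca",  # placeholder
        source={
            "sourceType": "CERTIFICATE_BUNDLE",
            "sourceData": {"x509CertificateData": open("ca.pem").read()},
        },
        enabled=True,
    )

    # A profile lists which IAM roles the external workload may assume.
    profile = ra.create_profile(
        name="on-prem-workloads",  # placeholder
        roleArns=["arn:aws:iam::123456789012:role/ExternalWorkloadRole"],
        enabled=True,
    )

The workload then exchanges its certificate for short-lived credentials using the credential helper AWS provides for this service.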


Transcript

  1. [music]

  2. Stephen

    All right. Hello, everyone, and welcome to “AWS Made Easy; Ask Us Anything”, episode number 10. We’re so glad that you could be here with us. And welcome Rahul and Badri. And also thank you to our ASL interpreters, Marisa and Karen for making this stream accessible. How’s everyone doing? Wow, we’ve made it 10 episodes. How are you, Rahul? How are you, Badri?

  3. Rahul

    I know.

  4. Badri

    Wait… How have you guys done 10 episodes already? I thought I just came to the first [inaudible 00:00:48].

  5. Rahul

    Yeah. Feels like quite an achievement and I think we should celebrate. Yeah, getting to 10 episodes is pretty awesome. When we started this series, you know, we were like, let’s just do it for fun. Let’s see, you know, how it goes. And we are already 10 episodes in, which is pretty neat.

  6. Badri

    Yeah, no I’ve been following on LinkedIn. I just didn’t realize it’s been 10 episodes.

  7. Stephen

    Oh, it’s been really fun. And it’s fun doing this with you. And so, thank you. You were our original guest in episodes one and two. So really glad to have you back for our… [crosstalk 00:01:32]. We’ll have to have you again for episode 100.

  8. Badri

    There you go.

  9. Rahul

    Awesome. So, how’s the week been for you guys or the weekend?

  10. Stephen

    Let’s see. What did I do? It was very relaxing. Well, my brother and my brother’s girlfriend came into town from Turkey, and they brought this enormous plate of baklava with them, which is my total weak point. So I’m going to have to get to the gym after this to compensate for the half a tray of baklava I ate. So that was nice. I hadn’t seen him in a while.

  11. Rahul

    Awesome. Badri, what about you?

  12. Badri

    We were going to go up to [inaudible 00:02:20], but there are fires. So, they’ve closed [inaudible 00:02:28] and everything. So, a quiet weekend.

  13. Stephen

    And Rahul, how about you?

  14. Rahul

    I had my taste of baklava. I just got back on Sunday from a five-week trip working remotely. We spent three weeks in Seattle, a week in Hawaii, and then last week I was in Dubai. So yeah, I got my taste of baklava there as well, apart from all the other Middle Eastern sweets. And then, yeah, just got back after five weeks. It’s awesome to be able to travel like that, you know, be completely remote and work from anywhere and then come back, but it also then feels even better to be back. And I apologize in advance for a little bit of humming that you might hear from some of the servo motors that are running on my printer. I have a 3D print going, and it’s been running for hours, I think almost seven hours now. And it says another 20 minutes to go. So I feel scared at this point to turn it off or pause it.

  15. Stephen

    Well, you have to power through it. What are you printing?

  16. Rahul

    I am actually printing a case, a wall-mounting case for an iPad, for my home assistant dashboard. It’s been a long time coming. And while I was away, I had a lot of my automation running. Amazingly, I have a lot of plants up on my terrace, and I built this, you know, drip irrigation system, which is all custom-controlled via an app that I wrote. And all my plants survived the six weeks, or five and a half weeks, that we were away. So I count that as an accomplishment. My final test run was about two or three days before we left. And then to have this thing run seamlessly, without the pipes bursting or, you know, something going wrong with the solenoids or the relays, was quite an achievement given that everything was built from scratch. So, yeah.

  17. Badri

    You know, what I think you should do is every time you do one more automation, you should put it on the CloudFix blog [inaudible 00:04:44].

  18. Rahul

    Yeah. Have a separate tag in there for all the home fixes and cloud fixes. Awesome.

  19. Stephen

    That’s really neat. Oh, nice work. That’s a real stress test because those were some hot weeks that you were… Yeah, those plants survived some hot weeks, so that’s…

  20. Rahul

    Oh yeah. Chennai went up to like 44 degrees centigrade or something like that in the middle.

  21. Stephen

    44, for our U.S. viewers, is what, 115, 110? It’s a lot.

  22. Rahul

    Yeah, it’s 115 plus. And it gets dry here. So you know, it was… It’s a miracle that all of it survived. I was monitoring it from the cameras while I was away. And I had to augment the automation a little bit with extra doses of water on some days, but yeah, otherwise… I was just very pleasantly surprised that it all just worked.

  23. Stephen

    Congratulations. That’s a really neat achievement.

  24. Rahul

    Yeah, we have about 15 lemons that have kind of budded, so that was a nice surprise coming back in…

  25. Stephen

    Oh, cool. All right. Well, for this episode, we’re going to do a What’s New Review. As we know, AWS is publishing announcements all the time. We have a company chatbot that gives us a ping whenever there’s a new announcement, and sometimes it goes off 5, 6, 7 times a day or more. So we thought we’d pick a selection of those from the last couple of days and go over them one at a time: our thoughts, what each one means, how it changes things or maybe doesn’t change things, and what we could do differently. So let’s just jump into it. We’ll start the segment with announcing heterogeneous clusters for Amazon SageMaker model training. Let’s play our intro.

    And speaking of SageMaker, in addition to making these segments fun, Amazon has another feature called segment detection. So when I break this video up into highlights, those little transitions help the segment detection, which uses SageMaker or some machine learning model in the background. Okay, [crosstalk 00:07:10].

  26. Rahul

    We should do a completely separate segment, by the way, on how we produce all of these videos, create the segments, and then put them up on LinkedIn, and on our entire workflow and pipeline, which is also on AWS. I think that would be a [crosstalk 00:07:24] fun conversation.

  27. Stephen

    Yeah, we could do a behind-the-scenes episode. That would be great. All right, we’ll keep that on the list. This sounds really fun.

  28. Rahul

    Yeah. So this particular one, where you have heterogeneous clusters for Amazon SageMaker model training, is definitely interesting. I think it boils down to the fact that training these models gets so expensive that everyone is looking for new ways to save money. And the fact that you can actually have different kinds of, you know, instances in a pool, depending on what your savings plans, your convertible RIs, or your RIs look like, means you might be able to leverage some of those instances in some of your training jobs. So instead of being stuck with one instance type, you could leverage any spare capacity you have lying elsewhere to run all of these models and thus save some money.

    For me though, you know, honestly, I just wish they had spot instances. But they probably don’t have spot instances for one of two reasons. One is they don’t have spare capacity. Given the amount of training and inferencing that happens right now in AWS, and the fact that for pretty much everyone the very first workloads they move to AWS are machine learning related, I can imagine a situation where they don’t have any spare capacity left to offer as spot. It would be a good, you know, experiment to run, to see if you can acquire any of the P instances, the P-family instances, on the spot market.

    But that could be reason number one. Reason number two is that it just hasn’t been implemented as a feature yet. And I don’t quite know what the complications might be, because spot has been part of the compute platform for a very, very long time now. So it should, I had assumed, be pretty standard as part of the underlying infrastructure to bring into any other service that leverages a bunch of compute. But Badri, thoughts?

  29. Badri

    Yeah, I agree. This spot thing was a question that struck me too. What’s interesting, right, is how AWS is playing to, like, three, four different personas in the ML world. This is for advanced users who also have large workloads, right? I mean, by the time you get to a point where you are controlling your own training pipeline and you have enough data to clearly separate out the data prep and training tasks, you’re a fairly advanced and heavy-duty user of SageMaker. I think we’ll talk about some other things which are for more hands-off users, right? So AWS, I think, is doing what AWS does, which is they give you everything. They give you this whole menu and you [inaudible 00:10:25] what you want to use. But I’ve felt this pain point, and it’s interesting that they did it this way. I think what’s interesting… I don’t know if they let you write different code for the two different types of instances. I read something in there that said it’s the same code that runs and you just sort of…

  30. Rahul

    Yeah, I think it’s the same code. It’s almost like an autoscaling. It’s basically like your spot fleet or your, you know, autoscaling fleet that you can have where it’s just a bunch of different resources. Some of those training models on smaller machines will run longer. Some will run faster.

  31. Stephen

    And I suppose the code can introspect itself and say, “Do I have a GPU?” and follow a certain path depending on whether it does or not.
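A minimal sketch of the introspection Stephen describes, assuming a PyTorch-based training script (the framework is our choice for illustration, not specified in the episode):

    import torch

    # The same script runs on every node in the heterogeneous cluster
    # and branches on the hardware it finds.
    if torch.cuda.is_available():
        device = torch.device("cuda")
        # ... run the GPU-heavy training loop on this node
    else:
        device = torch.device("cpu")
        # ... run CPU-side work such as data loading and preprocessing
    print(f"Running on {device}")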

  32. Badri

    Right. That’s what the documentation seems to imply. I can see it being so useful. I mean, basically, it’s a huge problem. Data preparation is a huge problem for lots of pipelines.

  33. Rahul

    Correct.

  34. Badri

    So, yeah, it makes sense to separate that. And to your point, Rahul, I think that might also let you do spot. I don’t know whether the heterogeneous clusters thing plays with spot, but there are some portions of the SageMaker workflow that do run on spot. So maybe this also lets you use spot for the compute-intensive portions, right? Because those are standard instances, I think.

  35. Rahul

    Yeah. I was reviewing some of the pricing for these clusters, and it was either on-demand or you could do reserved instances, but there was no mechanism for doing spot in there. There wasn’t spot pricing available. It’s something I need to go look into, but I didn’t see spot being available for these clusters.

  36. Badri

    Yeah, yeah [crosstalk 00:12:22].

  37. Rahul

    I’ll take a look at it for sure.

  38. Badri

    … Spot, but somewhere below there is managed spot training. I think it’s just one of those things that wasn’t the problem they were solving first.

  39. Stephen

    It’s pretty neat. I know it’s kind of a… It’s pretty neat that we’re even talking about it. I remember as a teenager, I built a Beowulf cluster, and it was so complicated… You had to match the network cards, you had to match the libraries, you had to match every last little detail or, you know, it would just go haywire. Now the idea is, oh yeah, let’s just add a few nodes randomly; maybe they’ll drop out randomly. So, it’s neat that we can even think along these lines.

  40. Rahul

    Yeah. I mean, there used to be a piece of software called… I think it was called Globus, if I’m not mistaken. And this is back in 2003; we had an eight-machine cluster that we had set up to do this, you know, distributed processing. It was an optimization problem that we were working on. And yeah, we had to manage every aspect of the distributed compute that was happening then. And now you can just build these resilient systems that assume that machines will go down; that’s the base assumption, that a machine will go down. You don’t have to start worrying about, you know, a scenario where you have to take care of an event like a machine going down. So [crosstalk 00:13:44] everything is different since then.

  41. Stephen

    Yeah. It’s pretty neat. It’s really exciting the tools we have access to. All right. Do we have anything else to say about heterogeneous clusters? We want spot.

  42. Rahul

    We definitely want spot. I think if you could run a spot fleet in this SageMaker model training, that would be a game-changer, if your costs were that much lower, just a fraction… And I’m gonna run an experiment, and let’s talk about it next week. I want to run an experiment to see what spot capacity AWS has.

  43. Stephen

    All right.

  44. Rahul

    The last time I ran an experiment to test AWS’s spot capacity, I got a call from them, because at that time I was running x1e.32xlarges to understand what spot capacity they had on that. I was trying to launch 100 of them. And I think I got a panic call from someone in AWS fleet planning or capacity planning. So…
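A version of that experiment can start small, probing spot price history for the P family with boto3. Instance types and region here are illustrative; an empty history for a type is a hint, not proof, that spot capacity isn't offered.

    import boto3
    from datetime import datetime, timedelta

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Look at the last day of spot prices for a few P-family types.
    resp = ec2.describe_spot_price_history(
        InstanceTypes=["p3.2xlarge", "p3.8xlarge", "p4d.24xlarge"],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.utcnow() - timedelta(days=1),
    )
    for entry in resp["SpotPriceHistory"]:
        print(entry["InstanceType"], entry["AvailabilityZone"],
              entry["SpotPrice"])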

  45. Stephen

    All right. Our audience, we owe you the results of an experiment. We’ll get back to you next week.

  46. Rahul

    Absolutely.

  47. Stephen

    All right. Let’s take a… Let’s switch gears to the next one. Here it goes. All right. We are talking about Amazon SageMaker Autopilot experiments now being up to 2X faster. SageMaker Autopilot, the idea is it runs a whole bunch of models. You give it some raw data, it’s going to run different feature engineering, pick a couple of different models, and give you a leaderboard of which models are doing the best for that particular data, right? Sometimes random forests do well, sometimes neural networks; depending on the structure of your data, different models can exploit it in different ways. So now 2X faster, that’s a pretty good leap. What’s driving this leap?

  48. Rahul

    Badri, you wanna start with this one?

  49. Badri

    Yeah. Yeah. So I wish they had this a few years back. Back in 2017, we were doing this exact same thing. Basically, we needed to train a computer vision model, and everybody does this, by the way. What you do is, you don’t have the data sets to train your entire big model. So you take whatever somebody else has trained, I mean, a YOLO network or a ResNet, take that and basically retrain it on a different data set; make it classify your classes instead of whatever it came with. The zero-shot hyperparameter initialization is the big one. I think that’s how they get most of it. There’s also something about the instances that are used under the hood. I don’t know that that makes it necessarily faster.

    There’s this really neat paper I read a few weeks back about how much a particular team got by using zero-shot HPI. Yeah, I think that’s the one. I checked it earlier. Those were, like, eye-popping numbers, right? 10X, even close to, like, 100X improvement in training speed. This, I think, is what SageMaker is doing to address the middle persona here, right? On one end, what we talked about earlier is, like, this power user; they understand everything they need to understand.

    Then there is this other user who is like, “Okay, I don’t know anything about this. I’ll just use Rekognition, the completely managed service.” And this is the middle user; they want to do some training, but maybe not a lot of training. I expect most people fall into this category, because the out-of-the-box models don’t quite do what they want them to do. So they want to do some training, but not a lot. So this is neat. I assume that’s what AWS did: they went and picked some reasonable defaults and got a 2X improvement.

  50. Rahul

    Yeah. I’m actually very surprised by the gap between what the paper was claiming and what AWS’s numbers are. So there probably is something fundamentally different, because… The paper, I think, says something like the eventual cost was just about 7%. There’s still a lot left on the table, and I wonder why. Either the approach is different and there’s gonna be more fine-tuning coming, or… Yeah, I guess you’re operating in kind of three different dimensions. You’re trying to basically figure out what your best answers could be in terms of latency of your inferences, the model that gives you the fastest latencies.

    The size of the model, of course, because it depends on where you want to install the model. And then the third part of it is the cost of actually training and the cost of inference. And it basically becomes a trade-off at the end of the day once you have a selection of these models. Are they doing something different to be able to get better at those three dimensions and getting models and answers across those three dimensions, which is why it’s taking them to only the 50% mark instead of the 90%?

  51. Badri

    Yeah, that’s interesting. Actually, reading this, it looks like the two data points they gave us were for different data set sizes, as you [crosstalk 00:19:26]. The interesting dimension would be the model size, not the data set size.

  52. Rahul

    Correct. Model size would be interesting to figure out, you know, because that is one of the trade-off parameters that you have to come up with. So, it’ll be interesting to see. The dataset sizes are different. And the larger your dataset size, the smaller your improvements, which to me is a little counterintuitive.

  53. Stephen

    And I wonder if it’s getting better at pruning the space of models, right? Because it has a selection of models that it’s going to run over.

  54. Rahul

    No, this is actually just hyper-parameter tuning, right? It’s not model selection. So this is primarily just hyperparameter tuning.

  55. Stephen

    Yeah. So… got it, okay.

  56. Badri

    Fewer hyperparameters, I assume, actually, might be what it is. If you look at all those, like, 10X, 100X improvements, those gains are high when you have more parameters in your model. And I think this is looking at a completely different dimension, which is why we are seeing these other results. It would’ve been interesting if they had said what gains they got for a model with 100 parameters versus 1,000 parameters. I assume the intuition you mentioned, Rahul, holds there: the speed-up would be much better for the 1,000-parameter model than the 100-parameter model.

  57. Rahul

    Yeah. Okay. Let’s make that a takeaway. We will reach out to the AWS team and get an answer on the comparison with that paper, to really understand what they did.

  58. Stephen

    Yeah.

  59. Rahul

    I think this is a very, very new announcement. We haven’t gotten down to circling back with AWS on this, but yeah, we’ll definitely want to bring this back.

  60. Stephen

    Yeah. Okay. Yeah. And it is always interesting, that difference, you know, coming from, say, an academic background: what’s theoretically possible versus what is actually possible in real-world scenarios. I remember I read this great operations management paper about distribution systems, all about supply chains. And it came up with this great algorithm. At the very end, it had this little footnote that said, oh, for N greater than two, this is computationally intractable. Oh, okay, great. I guess unless you have exactly one warehouse and one retailer, this is a great answer. But the author got their paper published. So, because Amazon obviously has to make this work in practice, it’ll be interesting to see what they ran into, which, like you said, stops them from getting to that upper bound of performance.

  61. Badri

    Right. The other interesting thing, and I’m curious to hear you guys’ thoughts on this, is they’ve offered these three personas, right? I wonder which one is most used. I never can get a sense of this. There is the expert user, there is this one, which is for the intermediate, and then there is the third, which just says, okay, I trust Kendra or I trust Rekognition or whatever.

  62. Rahul

    So I think for the vast majority… So I put it in two categories… Or actually, yeah, three categories. One is the super-advanced users who are building their own models from scratch, doing every aspect of it, controlling every parameter of it. The second is the ones who are using Autopilot to build models. And the third is the vertical AI services like Rekognition, Comprehend, Forecast, Personalize, all of those. Is that your question? Is that your third category?

  63. Badri

    Yeah, that’s my…

  64. Rahul

    Okay. So, I actually think the vast majority of machine learning customers… I mean, I’ve spoken to a bunch of them, and I get the impression that they rely a lot on hand-building their machine learning models. They use the compute, they use SageMaker notebooks, they use all of that stuff. But instead of letting something like Autopilot run and find answers, the people who are actually doing this work know what they want to do and therefore just run the models themselves, more often than not. So I’d say that has the highest use. People who are building end-user applications and want something that’s a cursory feature, or I’d say a peripheral feature, of their application tend to use the vertical AI services a lot more.

    So for example, if, you know, not-safe-for-work detection is a peripheral feature of your product, not something that’s core and central to it, like it’s feature number, you know, 75 in the list of features, then they tend to take the approach of, you know, using Rekognition or something like that to do it. I have not known of very many products that have started using these services as their first choice for their core and fundamental feature. Like, if you look at a service like transcription, if you look at what Otter is doing, or you look at, you know, a bunch of other providers who are doing transcription, you find that none of them is just packaging up… sorry, none of them is packaging Transcribe and Comprehend to do all those basic things. They are actually building custom models, right?

    And that is a little surprising. I haven’t come across any product which is just a very thin layer over these AWS services, though I think they should exist. I think it makes perfect sense to do that. So yeah, I don’t think that very many people are actually using those services. And I think a bunch of newbies who are getting into the ML world, still trying to discover it, still trying to figure out what it can do for them, are starting to use Autopilot. So, people who have all the data but don’t know how to get started with ML in general, they’re, like, okay, here’s an algorithm; here’s what I care about; let me see what it can produce.

  65. Stephen

    Having been a data science consultant… I mean, oftentimes as a new data scientist, you’re so excited about accuracy, but oftentimes the business requirement is just that you need to be right more than half the time. And so there’s a diminishing return on squeezing out that last little bit of accuracy. And often just picking the best model out of Autopilot and running with it is probably good enough.

  66. Rahul

    Yeah. And I think the reason why I like Autopilot is because it actually makes you take those trade-offs, right? It says your three dimensions are accuracy, latency of your inferences, and the size of the model. Right? It literally gives you those three parameters and it gives you the best models in those three categories, and you go pick the one that meets your business requirement. And I think using Autopilot forces you to make that trade-off across those dimensions. I don’t think most data scientists even think about it… You know, unless some constraints are governed by hardware, like model sizes, I don’t think most data scientists even think about where the trade-off line is gonna be drawn between accuracy and model size and latency. They’re trying to get the best results, and they’ll go endlessly in a number of cases, to the point of diminishing returns. Yeah.
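The leaderboard Rahul describes is also available programmatically. A minimal sketch of pulling an Autopilot job's candidates sorted by objective metric (the job name is a placeholder), from which accuracy can be weighed against latency and model size:

    import boto3

    sm = boto3.client("sagemaker")

    # List candidates for a finished Autopilot job, best metric first.
    candidates = sm.list_candidates_for_auto_ml_job(
        AutoMLJobName="demo-autopilot-job",  # placeholder
        SortBy="FinalObjectiveMetricValue",
        SortOrder="Descending",
    )
    for c in candidates["Candidates"]:
        metric = c["FinalAutoMLJobObjectiveMetric"]
        print(c["CandidateName"], metric["MetricName"], metric["Value"])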

  67. Stephen

    Again, the real world versus, I guess, the theoretical trade-off. Any final thoughts on SageMaker Autopilot experiments?

  68. Badri

    Just one. If they had this in 2017, my life would’ve been easier. I literally did this. Basically, we used to get these custom machines assembled at Fry’s to do our inferences, with, like, graphics cards in them. And talking of undifferentiated heavy lifting, it was heavy lifting; those boxes were heavy to [inaudible 00:28:26]. And our big insight was we could actually use gaming stations to do more or less the same kind of thing. So if somebody had given me this, that’s, like, one summer of my life not living [inaudible 00:28:39] in the office.

  69. Stephen

    Oh, that’s…

  70. Rahul

    We’ve all lived through scenarios like that.

  71. Stephen

    All right. That’s perfect. All right. Let’s switch over to EC2 on the M1 Mac.

  72. [music]

  73. Stephen

    All right. So this is an exciting one. I think a lot of people, Badri, will agree with you; they’ll be saying what you just said. The availability of the M1 Mac on EC2 is going to make a lot of people’s lives easier. I’ve seen some pretty beautiful racks of Mac Minis, but then again, it’s much better that Amazon do it.

  74. Rahul

    Correct. Yeah. So I think this is for anyone who is developing applications that also need, you know, clients, or something specific built for the Mac ecosystem. If you wanna put something on the App Store, you know, either your mobile apps or your iPad apps or your desktop apps, or even, for that matter, Safari plugins and so on, you require Xcode to build them, to compile them. And with this, I think they’re going right after the likes of MacStadium, who have provided this service for years. If you wanted the latest version of a Mac with a particular OS on it and wanted to test out all your, you know, builds and setups, you would basically do this.

    You would, you know, provision an instance, run all your Xcode stuff on it, build it, and deploy it from there. We’ve had Macs in the AWS ecosystem as one of the instance types for a while. Now that the new ARM-based ones have come, they’ve actually, you know, created a big problem, because now you have these ARM-based processors and you want to make sure that all of your stuff just works as is on a completely different architecture. There was support for, you know, a setup where x86 apps could still work, but they wanted you to recompile all your stuff. They made it really easy for you to recompile stuff in Xcode and generate the ARM binaries, but you needed to do it on an M1 machine. And now that this is available, I think dev teams, in general, can do this.

    The thing that I had historically disliked about the Apple ecosystem was that it created an even bigger onus where things have to build on your local machine for you to create a binary and deploy to production, right? I mean, MacStadium did some of this, but it was a pretty manual process at that. For the first time, you can actually build an entire automated CI/CD pipeline that processes all of your builds in the cloud on standardized, clean, pristine environments and can, you know, create your builds and deploy stuff into the App Store. So that, I think, is pretty neat. And these days, everyone is building, you know, iPhone apps and iPad apps for their products. So yeah, there’s a huge demand for it.

  75. Stephen

    And if you’re an Apple developer right now, we’re at this transition point where there’s a lot of Intel and a lot of Apple Silicon in the wild, and there are Apple developers who may have not switched over yet. And so this will be a great way to… Like you said, they don’t have to have a separate manual CI/CD pipeline which involves compiling locally. They can integrate this into their process exactly the same as their x86 Linux-based CI/CD, whether they’re building for Android or for Windows or Linux or whatever, this could be [crosstalk 00:32:50].

  76. Rahul

    And the thing is that there’s always going to be demand for something like this, right? I mean, the M2s are already out. The M1 Maxes and Ultras, though they’re the same architecture, the SoC is kind of different; the Ultra is literally two of these fused together. And so, I don’t know, I haven’t come across any scenarios yet, and it’d be interesting to see if people need to do specific testing for those specific environments as well.

    So yeah, it is gonna be pretty interesting to see how people use these. I think it’s really required for headless CI/CD setups. Like, you do not want to build locally. Anytime I’m doing a new project in Xcode, there’s just so much other junk on my machine that it messes something or the other up. And clean environments are, I think, the bane of most developers around the world.

  77. Stephen

    Yeah. Yeah, no, I completely agree with you. Actually, this brings up an interesting question, right? So M1 Macs are ARM-based… Well, we actually have a user who asked a question about developing on ARM. Should we pop that up on the screen?

  78. Rahul

    Yeah, let’s do that.

  79. Stephen

    All right. So David says, “What is a good way to develop on ARM aside from the M1 Macs?” Well, actually that works out really well. For those of you who don’t follow me on Twitter, which you should, @SteveJB, I just tweeted out something fun: a demo of our product called DevSpaces, which gives you that ephemeral environment. And the really good thing about DevSpaces is it runs not just on x86, but it’ll run on the AWS Graviton, which, like the M1, is an ARM-based processor. I wanted to share with you this DevSpaces video I made, and David, I hope this answers your question.

  80. Announcer

    Start the timer, and click the DevSpaces button. Wait a few seconds. Docker is pulling the Neo4j image. Copy the URI. Run the Neo4j v1 Python script. Try a Cypher query. Done. Check out try-graviton.devspaces.com today. See @SteveJB for more DevSpaces tips.

  81. Stephen

    So as a review, what just happened, and that was real-time: this is a cool little app called, I think, Termdown, which will do an ASCII-based countdown. And so what happened in those 60 seconds, and I put that there to show it was real-time, is that it cloned my Git repository and read the gitpod.yaml. And within that gitpod.yaml, it knew I wanted to use Neo4j, because I had it set up to pull the latest Neo4j. And Neo4j is multi-architecture. So, the Docker image is multi-architecture; it’ll run on x86 or ARM. So it pulled that down, and then DevSpaces can map a port to a URL.

    So Neo4j starts with that web console you just saw right there; it starts that on port 7474, which you can configure to be exposed publicly. This exposed the Neo4j web console. We then ran that Python script, which loaded up some sample data, and we can visualize it right there. And that was all literally in about… there we go, you see the counter at one minute, one second. And I think it took me three seconds to actually push the DevSpaces button. So that might be 58 seconds of clock time. But in any case, that’s how you develop on ARM if you wanna use the Graviton. So…
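A hedged reconstruction of the kind of script run in the demo, using the Neo4j Python driver; the URI, credentials, and data here are placeholders, not the ones used on air.

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))  # placeholders

    with driver.session() as session:
        # Load a sample node, then run a Cypher query against it.
        session.run("MERGE (p:Person {name: $name})", name="Ada")
        result = session.run("MATCH (n) RETURN count(n) AS nodes")
        print("Nodes in graph:", result.single()["nodes"])

    driver.close()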

  82. Rahul

    So, before we jump into the other serious stuff, I have to make a comment about the audio. What audio! That was really funky audio.

  83. Stephen

    Oh, maybe I’ve been watching too many YouTube shorts, but I used Amazon Polly. You know, you see TikTok videos, they have that synthesized voice that’s quite popular. And so I tried to replicate that with Amazon Polly. I think that was the Kimberly voice in Amazon Polly.

  84. Rahul

    You guys should check it all out…

  85. Stephen

    I didn’t wanna distract the viewers with my own voice.

  86. Rahul

    But yeah, going back to this particular one, yeah, DevSpaces is pretty neat. It gives you all these amazing clean environments to work with. Like, I’ve stopped doing local development on my machine, to be honest, and I’ve switched over to the iPad, because this shows up in a browser and I can work from pretty much anywhere. And with the last five weeks of travel, it was awesome. But coming back to this: the ARM flavor that AWS runs is Graviton, right? You’ve got the Graviton2s. Now the Graviton3 is available and GA for the C, the compute, family; the C7gs are now available in GA.

    So if you wanted to develop something for the C7g, or even the Mac for that matter: even though it’s the ARM architecture, there are different flavors of the ARM architecture. You don’t exactly know if everything will continue to work exactly the same way once you get it onto Graviton. So in an ideal scenario, you would want to develop on a Graviton instance. And what DevSpaces does for you is give you those clean environments with Graviton machines. So…

  87. Stephen

    Right there, there’s a list of all my environments of different repositories at different states, different branches. I can click on open, and there we go, brand new clean dev environment. And not just like an empty one but literally running Neo4J or any other database you want right there, along the way. So pretty…really enjoying DevSpaces.

  88. Rahul

    Yeah. My favorite experience out of all of this is actually doing PR reviews. Usually, you know, developers are sending me a bunch of them. I literally click on a link from the DevSpaces plugin that we have, which allows me to go from the PR, click a button, and it brings up an entire environment with the PR showing up with all the diffs. I get to review it, and then I run all the tests right there, without, you know, completely hemorrhaging my machine or putting it into a stalled state. I get to review the PR and I approve it right there, and then I kill the entire environment. So I never bring anything locally onto my machine. I get to test it and run it.

    In fact, earlier, you know, I used to be so frustrated with trying to build it and get it all working on my local machine. It would just take so long, I’d just do a cursory read-only review of the code. And I always felt like I was doing a shoddy job of reviewing those PRs. Now I can not only review the text and the content, but I can also run the tests, make sure all the tests pass, or even just run the code, run the app, make sure it’s all working as I expect. And once I’m happy with that, I just kill the entire environment. So I can run through PRs in a matter of, like, 10, 15 minutes now, as against…

  89. Stephen

    So you don’t have that hesitation of, “Oh, I was working on something else and I don’t wanna reset my database.” There’s none of that hesitation there. It’s just a browser tab.

  90. Rahul

    Correct. I mean, the amount of Git stashing I used to do before: you’d be working on a particular branch, someone’s PR came in, stash it all, then come back and pop it out. My local Git used to be an absolute mess, not to mention the environments. Like, God help you if someone brought in a new dependency; it would just completely derail my workflow, my entire day for that matter.

  91. Stephen

    Oh, and especially if it’s like you said, a project that’s busy and you’ve got multiple PRs from the same project, but on different branches, right?

  92. Rahul

    Correct.

  93. Stephen

    I remember one particular project where there was one effort to switch from MySQL to Postgres, but there was still active development on the MySQL branch. And so you had to keep switching back and forth, and this makes it a lot easier.

  94. Rahul

    Absolutely.

  95. Stephen

    All right. Well, let’s take a quick….we’ll do a quick break and we’ll switch gears. And when we come back, we will talk about Security Hub. All right. See you in a bit.

  96. Announcer

    Is your AWS bill going up? CloudFix makes AWS cost savings easy and helps you with your cloud hygiene. Think of CloudFix as the Norton utilities for AWS. AWS recommends hundreds of new fixes each year to help you run more efficiently and save money, but it’s difficult to track all these recommendations and tedious to implement them. Stay on top of AWS recommended savings opportunities and continuously save 25% off your AWS bill. Visit cloudfix.com for a free savings assessment.

  97. Stephen

    Okay. AWS Security Hub, 36 new security best practice controls. So, Security Hub, what is Security Hub?

  98. Rahul

    Okay. So, Badri, you wanna take a first pass at this announcement?

  99. Badri

    I’ll just say this: I don’t know how AWS expects anybody to keep track of all of this. I think we know the reasons they want to build these primitives, but they expect folks to be able to keep track of all of this and implement it themselves; they won’t just turn it on for you. I think that’s why what you guys are doing with automating this is invaluable, because at this point it stops just short of being useful. Right? You see all of this, and okay, what am I supposed to do with it? I log in today: okay, great, I can recognize all these services and I think I’m using them, but do I have a problem or not? What…

  100. Rahul

    Yeah. I think they do a pretty good job of, you know, looking at the patterns and giving you all the best practices. Like if you go to the foundational security best practices standards… Why don’t we pull that up, Stephen? I think they go to the level of literally giving you, you know, step-by-step guidance. If you just go down to that, just go down one level. Go to the… Yeah, that one. The best practices controls. So if you’ll just scroll down a bit, they will give you step by step…

  101. Stephen

    Let’s see EC2. That’s gonna be busy.

  102. Rahul

    Okay, great. So they tell you what the problem is, they give you all the recommendations, they tell you what you should do or shouldn’t do…and if you scrolled on just a little bit, they’ll give you the remediation method. Literally, point by point, they will say, do this, do this, do this, do this. And…

  103. Stephen

    So then are you expected to go and refresh this page once a day and read through this and see what to do?

  104. Rahul

    I really wish there was one simple automation or script that they just turned on for you. Like, this is a security best practice. I wonder if they ever have a conversation where they go to a customer and say, “Yeah, we know you probably don’t want the security best practice, but we’ve spent all this hard work on it.” It’s like when Bezos talks about his invariants: customers always want more, you know, faster delivery and lower cost. And he says, I have never met a customer who comes in and says, “I want my stuff to be more expensive.” Right? This is one of those things. Who goes to AWS and says, “No, I want, you know, this security to be loose,” or, “I want all these security vulnerabilities to be there in my system”? Like, why aren’t these just turned on by default?

    One of the things that AWS did quite a few years ago now is they made IAM policies, you know, forbid everything by default. And then you had to build up from there, because you can’t imagine anyone wanting to start with, “Hey, let me open up everything to the world, and then let me start figuring out what I need to, you know, shut off.” So this is similar. Why isn’t that a standard across the board? That’s something that I don’t understand. And why isn’t this just getting implemented as is?

  105. Stephen

    Well, just looking at this one: these EC2 instances should not have a public IP. I mean, I’ve violated this one a thousand times. And I thought, okay, at the time, I just wanna spin something up, test something out. I don’t care if it’s public; I’m going to SSH into it, play with it for two hours, then get rid of it. So I can see the idea that if it’s completely locked down by default, then there’s a huge learning curve, I suppose, of learning all these best practices before you can even get to your instance. But I can see the other side, where you could say, turn on auto-remediation mode, and then let me figure out how to work in this secure-by-default system.

  106. Rahul

    No, but for things like this, there are already remediations. You have SSM, for example, which will allow you to log into your instance even if it is inside a VPC. It’ll allow you to play around with it and do whatever you need to do without violating any of those controls. So I don’t really get the need for something like this to exist, because anything that’s inside a VPC should be controlled by the policies and the rules of the VPC. And if you need to, as an individual, access a particular instance for whatever reason, the last thing you should do is attach a public IP address to it. Just use SSM. You know, just log in using SSM and…

  107. Stephen

    That’s a good point.

  108. Rahul

    …do whatever you need to do. So I spent some time going through a bunch of these, and I was like, these should be turned on by default. Like, there’s literally no excuse not to.
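A small sketch of auditing the specific control discussed above (EC2 instances with public IPs) using boto3; it only reports, it does not remediate, and the SSM Session Manager route Rahul mentions is the alternative access path.

    import boto3

    ec2 = boto3.client("ec2")

    # Walk all instances and flag any with a public IP attached.
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                if "PublicIpAddress" in instance:
                    print("Public IP:", instance["InstanceId"],
                          instance["PublicIpAddress"])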

  109. Stephen

    So this was… It said automatically enable new controls, so these controls are enabled for you by default, but this doesn’t remediate by default.

  110. Rahul

    Correct. These controls are basically just, hey, red-flagging this thing for you, which is interesting and neat. But someone going in and fixing this… An average account these days has about 500 resources, right, across these services. And there are what, seven or eight steps per remediation. Who’s gonna do 500 times eight remediation steps, you know, one by one by one? No one’s gonna care about it.

    That’s where AWS falls short of, you know, helping customers actually fix these problems. They’re highlighting them and saying, we’ve done our job, now it’s your problem. And that’s kind of the whole reason why CloudFix came to be. We just gotta take all of these and put them in CloudFix. We hope customers will benefit from having all of these standard hygiene rules built and auto-remediated for them.

  111. Stephen

    Okay. Oh, okay. So, that makes it… Okay, so this security best practice stuff will get you so far, but then with CloudFix you can say, okay, now actually do it.

  112. Rahul

    Yeah.

  113. Stephen

    And say, go in there and make a change, fix it.

  114. Rahul

    Yeah. Security Hub just says, “Hey, here’s a dashboard that tells you all the things that are wrong with your account from a security standpoint.” And then you basically have a link that’ll take you to this article. Then you go to the article, and then you go one by one, following the steps. It’s impractical, in my view.

  115. Stephen

    So my first job as a teenager was network security for an email hosting startup. I remember at one point, I could configure firewalls using ipchains; that was Linux kernel 2.2. And it’s really hard. It’s only, you know, exploded by several orders of magnitude since then in terms of security. So, it is a constant battle. I’m glad that we have this, and I’m glad that we’ve got CloudFix to take us that next step of: do this, keep an eye on this, because it’s something you always have to watch. You never say, I’m done with security.

  116. Rahul

    Yeah. In fact, that brings up an interesting idea. I think we should probably start an open-source project and invite other contributors. We will go ahead and create a whole series of Change Manager templates. Let’s do that. Let’s create Change Manager templates for all of these and get AWS to certify them.

  117. Stephen

    All right. That sounds really good.

  118. Rahul

    Right. I mean, these are AWS recommendations, so like all of the other ones, let’s get AWS to certify these. And I’m pretty confident the Config Manager, the Change Manager team has gotta be pretty excited about it.
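As a taste of what such a shareable template could look like, here is a hedged sketch of registering an SSM Automation document that turns off auto-assigned public IPs on a subnet. The document name and step are illustrative, not an official AWS-certified template.

    import json

    import boto3

    ssm = boto3.client("ssm")

    # An Automation runbook with a single step that calls the EC2 API.
    doc = {
        "schemaVersion": "0.3",
        "description": "Disable auto-assign public IP on a subnet",
        "parameters": {"SubnetId": {"type": "String"}},
        "mainSteps": [{
            "name": "disablePublicIp",
            "action": "aws:executeAwsApi",
            "inputs": {
                "Service": "ec2",
                "Api": "ModifySubnetAttribute",
                "SubnetId": "{{ SubnetId }}",
                "MapPublicIpOnLaunch": {"Value": False},
            },
        }],
    }

    ssm.create_document(
        Content=json.dumps(doc),
        Name="DisableSubnetPublicIp",  # placeholder name
        DocumentType="Automation",
        DocumentFormat="JSON",
    )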

  119. Stephen

    Oh, absolutely. I mean, they’ve already signed off on the action, so it’s just, here’s a way to implement it. All right. That sounds really cool. All right. Well, audience, stay tuned, and we will keep you posted on that. It will definitely be content for a future episode. All right. Well, I think that’s a great transition. Should we talk about IAM?

  120. Rahul

    Yeah, let’s talk about IAM.

  121. Stephen

    All right, let me roll the transition. All right. AWS Identity and Access Management, or IAM, introduces IAM Roles Anywhere for workloads outside of AWS. So this is an interesting idea. IAM usually manages things within AWS; it’s saying, can my database talk to my EC2 instance, that kind of thing. But now we’re talking about IAM outside of AWS. So, what’s the purpose of this? Where would you use this?

  122. Rahul

    Huh, where do I start? So, IAM for me is, I think, one of the most painful of the AWS services, because the way it is designed and operates today adds too much friction to getting stuff done in general. I struggle [crosstalk 00:53:51] with this announcement for two reasons. Number one is that this doesn’t take away the problem of… For me, authentication is one aspect of it; authorization is the other piece. And this doesn’t solve the problem of centralizing authorization in IAM. When you have applications, this basically just gives you the certificate, and that digital certificate will be shared with the application. Your access control, your authorization, is still gonna be managed within the applications, as I understand it. I don’t know how you would manage that within IAM itself. I…

  123. Stephen

    Well, just to summarize, it could be a computer external to AWS that wants to access EC2 or S3 and they could get a temporary credential, right?

  124. Rahul

    Yeah. Let’s say something like Salesforce. I mean, you are using Salesforce, which is outside AWS. And let’s say Salesforce allows you to use a digital certificate, the X.509 certificate. Then you could actually use your user’s certificate in IAM, or get temporary credentials based on that X.509 certificate, and use that to sign into Salesforce. So that can be done. But then the access control, what data you will have access to, is managed and maintained within that system and not within IAM roles and policies. So the more I think about it, and Badri and I were just having a little bit of a chat before this: if you were to take IAM and extend it into authorization as well, what would you do?

    That’s a really hard problem, because it’s the applications that define your roles. It’s the applications that define resources. And if you were to do something like RBAC, you need to be able to somehow pass those roles and resources back to IAM, have IAM recognize them so that you can model them there, and have some way for IAM to then pass that back to the application, or have the application call into IAM to actually do the authorization resolution, the RBAC resolution. So I don’t know, it seems to get very complicated and messy, and I don’t know what use cases prompted AWS to create this. Badri, any thoughts?

  125. Badri

    Yeah, I’m with you. I mean, the one thing it does is, I guess, replace that Secrets Manager function call, right? I mean, we do have applications where all these services have credentials all over the place. And using IAM kind of solves that, but you could equally have solved it by just putting those credentials in Secrets Manager. But anything beyond that… authentication, like you said. For authorization, the only thing… I mean, we were spitballing this earlier, right? You could try and create two different IAM policies for different levels of access, but there is nothing about IAM that will check it. Unless you created two different endpoints and had different policies for the two endpoints, nothing really prevents one of your services from assuming excessive permissions.

  126. Rahul

    Yeah. I think the one thing that certainly is interesting is that this has a valid use case for temporary credentials. You make a request, you get your temporary credentials, and then you use them over there. I guess the Secrets Manager approach would be a bit of a workaround, in that you have some mechanism that constantly resets the password at whatever regular interval. And this kind of flips it and says: you get a temporary credential, it has an expiration in the certificate. So you have authorization for five minutes to do whatever you need to do, and as soon as that particular time elapses, it’s gone. And I guess that use case makes sense the more I think about it. But the difference is very, very subtle. For a lot of external systems, how many systems would need temporary access for only a certain period of time?

  127. Badri

    Yeah, right. And maybe you are worried that use case that way, sort of…

  128. Rahul

    And for a lot of third-party applications also, the more that two-factor routes start playing a part in the authentication side of it… Like, if I were a third-party app, not in the AWS ecosystem, there’s a big mandate to make sure that you have two-factor authentication working over here. How does that work with IAM?

  129. Badri

    My reading of this was that it’s entirely for, like, back-end type of access. This is for services, not humans. So, I mean, if we were not doing this, then how is this problem solved today? It’s solved by either, like, a private key, or, if you are going to use the public endpoint, you would hopefully use Secrets Manager, though I’ve seen more than enough cases of people just putting the username and password in plain text. So at least it works.

  130. Rahul

    I mean, no one can help you if you’re doing this in plain text. No amount of other, you know, checks and balances is gonna help you. But assuming that you were using something like Secrets Manager, right, I mean, this adds very limited value on top of it.

  131. Badri

    Right. [Inaudible 01:00:04].

  132. Stephen

    Well, I think [crosstalk 01:00:07]… Yeah, I think we’re gonna have to see how the usage of this evolves over time. This is one of those where they throw it out there; like you said, this isn’t an obvious replacement for anything, but we’re gonna have to see how it evolves and what usages develop. Well, it looks like we’ve run out of time, which always happens, because we have great subject matter to talk about and there’s just a limited amount of time. It’s unfortunate that we can’t just do this… Well, maybe we could do this two hours a day, and we probably still couldn’t keep up.

  133. Rahul

    No, two hours a day is not even remotely close to being able to keep up with all of these announcements, but this is awesome. So we have three takeaways that we are gonna follow up on. The first one is we are gonna run an experiment to figure out spot instance availability for the P-class instances and see if they actually have any capacity and where that might be. That would make a good conversation. The second one… remind me what the second one was. The third one, of course, was we were going to start doing some SSM templates, SSM documents, that try to capture a lot of these Security Hub policies, and hopefully kick-start an open-source project for them.

  134. Stephen

    And I think we also wanted to do a behind-the-scenes look at how we actually put this show together. We’re gonna use… What was it? The segment detector, to make this into highlights. So we’ll say number two is a behind-the-scenes of AWS Made Easy. All right. Well, thank you again, audience. Thank you, Rahul. And thank you, Badri. It’s great to have you back. And it’s great to celebrate 10 episodes. We’ll see you next week. All right. Well, thanks, everyone. And we’ll see you next time.

  135. Rahul

    Thanks, everyone.

  136. [music]

  137. Announcer

    Is your AWS public cloud bill growing? While most solutions focus on visibility, CloudFix saves you 10% to 20% on your AWS bill by finding and implementing AWS-recommended fixes that are 100% safe. There’s zero downtime and zero degradation in performance. We’ve helped organizations save millions of dollars across tens of thousands of AWS instances. Interested in seeing how much money you can save? Visit cloudfix.com to schedule a free assessment and see your yearly…