AWS Made Easy

Ask Us Anything: Episode 5 – C7g, Step Functions, and more

Analytics, Application integration, Compute, Migration

Episode 5
June 1, 2022
1 h 14 min

Today’s show followed the same “What’s New Review” format as Episode 3. We go over several important announcements, especially regarding the Graviton3 release to General Availability.


Summary

New Instance Types – C7g, M6a, C6a, oh my!

We spent the majority of the show discussing the new C7g instance, which was released to General Availability last week. It is based on the new Graviton3 processor, which we discuss at length.

We (AWS Made Easy) did a study on Graviton3 performance, comparing it with the Graviton2 and a comparable C6i instance, which runs Intel Xeon Scalable CPUs. The main lesson from the study is that, compared to the Graviton2, the Graviton3 offers a 20-35% performance increase for only a 6% cost increase. See the study for more details and benchmarks.
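To make the price-performance claim concrete, here is a small worked calculation. It is a sketch that assumes "performance increase" means throughput (work completed per hour) and that cost means the hourly instance price, both expressed as ratios relative to the Graviton2 baseline; `cost_per_unit_work` is a hypothetical helper, not part of the study.

```python
def cost_per_unit_work(throughput_ratio: float, price_ratio: float) -> float:
    """Relative cost to finish a fixed job versus the baseline (baseline = 1.0).

    throughput_ratio: new work-per-hour divided by baseline work-per-hour.
    price_ratio: new hourly price divided by baseline hourly price.
    """
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    # The job takes 1/throughput_ratio as long, billed at price_ratio per hour.
    return price_ratio / throughput_ratio


if __name__ == "__main__":
    # Graviton2 -> Graviton3 ranges taken from the study summary above.
    for speedup in (1.20, 1.35):
        saving = (1 - cost_per_unit_work(speedup, 1.06)) * 100
        print(f"{speedup:.2f}x throughput at 1.06x price: {saving:.1f}% less per unit of work")
```

Under those assumptions, a 1.20x to 1.35x speedup at 1.06x the hourly price works out to about 11.7% to 21.5% less spent per unit of work.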

The study is available here: https://awsmadeeasy.com/blog/aws-graviton3-adoption-in-esw-capital/

In the discussion, we referenced a talk about the adoption of the Graviton processor: the AWS re:Invent 2021 keynote with Peter DeSantis, at 39:31. The keynote, or at least this segment, is well worth a watch.

We spent a bit of time talking about how ARM-based processors, such as the Graviton3 and Apple’s famed M1-series of CPUs, have far lower Thermal Design Power (TDP). In addition to taking far less power, ARM-based CPUs are where the innovation is occurring. For more details, see Apple Announces The Apple Silicon M1: Ditching x86 – What to Expect, Based on A14, from AnandTech. There is a famous visual in that article, shown below.

We see Apple’s ARM-based CPUs growing in performance, whereas Intel’s CPUs are stagnating. The Graviton3 follows a similar trajectory.

We then shift gears to the AMD EPYC Milan processor, which powers the M6a and C6a instances. We couldn’t do a better job of breaking down the specs than the famous Linus Sebastian, so we played a clip instead. The key takeaway is shown in the following screengrab:

From the video The Fastest CPU on the Planet – Linus Tech Tips

But remember, if you are considering operating this in a data center, take a look at that giant heatsink. It is there because these CPUs, like their Intel counterparts, are power hungry.

From the video The Fastest CPU on the Planet – Linus Tech Tips

The concluding advice from Rahul: use the Graviton3 unless you have a compelling reason not to. Its price/performance is superior to Intel and EPYC in most general server workloads.

Announcing new workflow observability features for AWS Step Functions

We then discuss the “New workflow observability features for AWS Step Functions” announcement. There is a detailed accompanying blog post. Note that Step Functions is essentially a way of orchestrating collections of Lambda functions, which can come to resemble a collection of microservices. These are notoriously difficult to debug and reason about.
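For readers who have not seen one, a Step Functions workflow is defined in the Amazon States Language (ASL), a JSON document of named states linked by `Next`. A minimal sketch, assuming a hypothetical two-step order workflow with made-up Lambda ARNs: it builds the definition as a Python dict, serializes it to JSON, and walks the transitions from `StartAt`. It does not call AWS, and the walker only handles linear workflows (no `Choice` or `Parallel` states).

```python
import json

# Hypothetical Lambda ARNs, for illustration only.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"
CHARGE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:charge-card"


def build_state_machine() -> dict:
    """A two-step ASL definition that chains two Lambda tasks."""
    return {
        "Comment": "Validate an order, then charge the card",
        "StartAt": "ValidateOrder",
        "States": {
            "ValidateOrder": {
                "Type": "Task",
                "Resource": VALIDATE_ARN,
                "Next": "ChargeCard",
            },
            "ChargeCard": {
                "Type": "Task",
                "Resource": CHARGE_ARN,
                # Retry failed tasks with exponential backoff: 2 s, 4 s, 8 s.
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }],
                "End": True,
            },
        },
    }


def execution_path(definition: dict) -> list:
    """Follow Next pointers from StartAt to a terminal state (linear workflows only)."""
    path = []
    current = definition["StartAt"]
    while True:
        if current in path:
            raise ValueError(f"cycle detected at state {current!r}")
        path.append(current)
        state = definition["States"][current]
        if state.get("End") or state.get("Type") in ("Succeed", "Fail"):
            return path
        current = state["Next"]


if __name__ == "__main__":
    definition = build_state_machine()
    print(json.dumps(definition, indent=2))
    print(" -> ".join(execution_path(definition)))
```

Each hop in that path is a state transition; the more of them a workflow has, the more useful execution-level observability becomes.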

I showed a fun video with an example of this (language warning).

New for AWS DataSync – Move Data Between AWS and Other Public Locations

Next, we discussed the announcement entitled New for AWS DataSync – Move Data Between AWS and Other Public Locations. This announcement makes it easier to synchronize data sources such as Google Cloud Storage or Microsoft Azure Files with AWS storage services such as S3 or EFS.

Our conclusion: on the one hand, this could be a useful tool; on the other hand, it:

Encourages a multi-cloud approach, which is difficult and usually unnecessary.
Can get extremely costly, as every cloud vendor charges high egress fees.

Once Frenemies, AWS and ElasticSearch are now Besties

For our final segment, we briefly review a VentureBeat article entitled “Once Frenemies, AWS and ElasticSearch are now Besties.” It describes the relationship between Elastic Co. and AWS. Last year, AWS forked Elasticsearch into OpenSearch, due to a license change initiated by Elastic. At the time it seemed to be a sour relationship, and there was plenty of gossip about it.

However, things seem to have turned, and the Elastic / AWS partnership looks strong. How did things evolve this way? Is this a good outcome for all?

Our conclusion: the AWS Marketplace is powerful, and it was smart of Elastic to make things work. Marketplace allows you to use enterprise software such as Elasticsearch without a long and convoluted procurement process, paying by usage instead. Elastic has a strong product offering on top of Elasticsearch, namely the Elasticsearch, Logstash, Kibana (ELK) stack. The visualizations produced by Kibana are, at present, superior to those produced by Amazon.

Transcript

[music]
[00:10:17]
Stephen

All right. Hello, everyone, and welcome to episode 5 of the AWS Made Easy Ask Us Anything livestream. This is episode number 5. Thank you so much for joining us today. And before we start, I’d like to thank our ASL interpreters, Jacob and Lindsey. Thank you so much. It’s really nice to have you with us, and thank you for making this livestream more accessible. The idea of the show is Ask Us Anything, it’s all in the name: Rahul and I are gonna have a chat about a few new things in AWS over the last couple of weeks. Please post a comment on the LinkedIn or the Twitch stream, and we’ll try to get to it. All right. Rahul, how are you doing? How was your weekend?
Rahul

Good. I think there’s a lot of excitement around the new instance type. That coincided with a little holiday we had planned over the weekend. I was juggling enjoying the beach, for us out here in India, on the east coast, and trying to do some quick experiments on the GA release of the C7g, and I think we’re going to be talking a lot more about that. But overall, I enjoyed some of the air conditioning, because it is insanely hot.
Stephen

Okay. Fantastic. Well, we had the wettest May on record in Seattle, the wettest May in recorded history, and we’re just closing that out. We had a chance to go over to Discovery Park; I posted a picture on Twitter. We went to the local veterans cemetery in Discovery Park, and I spent a few minutes there. And, let’s see, I ordered a bunk bed off of Amazon that came in about 300 metal pieces. It looked like a giant Erector Set that’s supposed to, at some point, hold the kids up. But luckily, my dad loves those kinds of things. He loves those “follow the instructions” projects, and he’s volunteered to put it together later. So that freed me from that.
Rahul

That sounds interesting. We are going to be in Seattle, as you know, next week with family. And I really hope the weather can change because keeping kids indoors for long hours is going to be incredibly hard. So looking forward to Seattle weather changing and being here next week.
Stephen

You know, it’s looking pretty good this weekend. But we’ve got a selection of board games just in case.
Rahul

Fingers crossed. Awesome.
Stephen

All right. So should we get into the main announcements?
Rahul

Absolutely. Let’s dive right in.
Stephen

Well, the main one is the C7g, which has just been released to general availability. I’ll put the link in the chat right now. Put that here. So, the C7g. You see, we’re very excited about that one; I’ve been holding on to this for a while. I’ll repost the STL on Twitter so you can print your own. These are Prusa gray and orange, but they happen to match the AWS colors. So this is the latest in the Graviton series of instances. Let’s put up the article on the screen. See, here we go. Here it is. That’s the link I just posted. The C7g: the C is for compute-intensive, the 7 is seventh generation, and the g is Graviton, the ARM-based processor. So what’s special about this one? What’s special about the Gravitons? They’re different from the regular instances that you normally reach for.
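Stephen’s decoding of the name (family, generation, processor) can be done mechanically. A minimal sketch, assuming the common `<family><generation><attributes>.<size>` pattern; `parse_instance_type` and `attribute_notes` are hypothetical helpers, and the attribute table is partial, not exhaustive.

```python
import re
from typing import List, NamedTuple

# Partial table of attribute letters used in EC2 instance type names.
ATTRIBUTES = {
    "g": "AWS Graviton (ARM) processor",
    "a": "AMD processor",
    "i": "Intel processor",
    "d": "local NVMe instance storage",
    "n": "network optimized",
}

# <family letters><generation digits><attribute letters>.<size>
_NAME = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.([a-z0-9-]+)$")


class InstanceType(NamedTuple):
    family: str       # e.g. "c" = compute optimized
    generation: int   # e.g. 7 = seventh generation
    attributes: str   # e.g. "g" = Graviton
    size: str         # e.g. "xlarge"


def parse_instance_type(name: str) -> InstanceType:
    """Split a name such as 'c7g.xlarge' into its parts."""
    match = _NAME.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


def attribute_notes(name: str) -> List[str]:
    """Human-readable meaning of each attribute letter, where known."""
    attrs = parse_instance_type(name).attributes
    return [ATTRIBUTES.get(ch, f"'{ch}' (not in this table)") for ch in attrs]


def is_graviton(name: str) -> bool:
    return "g" in parse_instance_type(name).attributes


if __name__ == "__main__":
    for n in ("c7g.xlarge", "m6a.large", "c6i.2xlarge"):
        print(n, parse_instance_type(n), attribute_notes(n))
```

Special-purpose names that do not follow this pattern, such as the hyphenated high-memory `u-` types, are rejected with a `ValueError` rather than guessed at. Note that a leading `g` (as in `g5`) is a family letter, so `is_graviton("g5.xlarge")` is `False`.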
Rahul

Yeah. So these are basically the custom-built, in-house ARM-based processors that Amazon has designed. And they believe that the Graviton series of processors is very soon going to take up over 25% of all of their data centers. So Amazon is making a big bet on this new architecture and these new types of processors. They are truly proving to be incredibly powerful and energy-efficient at the same time, and are really the next generation of compute that you’re likely to see. Among all the big players right now, there’s such a push toward getting these ARM-based processors out, because of the sheer improvements you’re able to see in these compute processors. And invariably, everyone is moving to a system-on-chip kind of design, which ARM, you know, allows you to build. It’s remarkable what you can do with these chips. And I really see these as the future of computing over the next few years.
Stephen

Well, let’s have a look at it. It’s a fascinating thing just to look at. We can see this whole chiplet-based design: rather than everything on one die, we’ve got these little components that you can see surrounding the cores. In terms of specs, we’ve got 64 cores, and it’s based on Armv8.4, or what they call Neoverse V1, and the special instruction set is pretty powerful. And like you said, they’ve got three of these on a board. And then those three are orchestrated by another ARM chip called the, what is it… the Nitro TPM. That’s what’s controlling all the virtualization. So when you start up a smaller instance, it’ll slice off a piece of that CPU and hand it to you. It’s really, really pretty impressive. So what makes it better? Why would someone use this over… Well, I guess over the Graviton2 they’re claiming 25% higher performance and 50% higher memory bandwidth with DDR5, but then, granted, ARM versus x86, what’s the main selling point there?
Rahul

Yeah. So if you really look at it from a server compute standpoint, this story has changed so dramatically. You never associated ARM-based processors with, you know, server compute. I mean, until very recently, the only ARM processors you thought of were in your cell phone, or mobile phone, or when you bought a Raspberry Pi. I have an entire cluster of Raspberry Pis that I use for experiments and my home automation. But that was the extent to which you would think about using ARM just a few years ago, and the game has changed with these new architectures. Now, when you compare these ARM-designed processors with x86, here are basically the trade-offs for a server use case, right? In a data center, every operator is looking to pack as much compute as possible into every unit of server space, right?

Server sizes, as I mentioned, are measured in rack units, or U’s: you have 1U, 2U, and so on, and you want to pack as much density as you possibly can into a single rack. That’s basically the math that it boils down to. When you look at the Neoverse N1 or V1 architecture, it’s truly a game-changer, because, A, it allows for three sockets as against two. What I mean by socket is the number of places, or how many CPUs, you can actually put onto a single motherboard. The fact that this architecture allows for three sockets means you can put three CPUs on the motherboard.
Stephen

So there’ll be three of these on one board?
Rahul

Right. So you have these on one single board, and each one of those CPUs has 64 cores, which is a pretty large number. But interestingly, the architecture itself allows for up to 96 cores, if I’m not mistaken. So you could go all the way up to 96 cores. And you could push the clock speed for each one of these cores up to 3.1 gigahertz, right? From everything that I’ve read so far, Amazon is not really pushing the envelope on these processors; from all of our benchmark data, they appear to still be operating at about 2.5 gigahertz. So they really have so much more room for performance improvements.

I can’t wait to see what they’ll do with the next generation of Gravitons, because they can go from 64 cores to 92 or 96 cores, I think, on a per-CPU basis, and you already have three sockets per motherboard. On top of that, they can go from 2.5 gigahertz per core all the way up to 3.1. So there is so much headroom for performance improvement. They’ve literally just started cranking the wheels on this. In contrast, when you look at the likes of AMD and Intel, the approach they’re taking for higher performance is to increase the clock speeds, okay? Most of these processors are doing clock speeds of 5 gigahertz and more. That’s what they’re shooting for.
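Rahul’s headroom argument is multiplicative arithmetic. A back-of-envelope sketch using only the figures quoted in this conversation (three sockets per board, 64 cores at about 2.5 GHz today, up to 96 cores and 3.1 GHz possible) rather than published specs, and assuming throughput scales linearly with cores and clock, which real workloads will not do exactly. The helper names are hypothetical.

```python
# Back-of-envelope headroom using the figures quoted in this conversation,
# not published Graviton specs.
SOCKETS_PER_BOARD = 3


def cores_per_board(cores_per_cpu: int, sockets: int = SOCKETS_PER_BOARD) -> int:
    """Total physical cores on one motherboard."""
    return cores_per_cpu * sockets


def naive_headroom(cores_now: int, ghz_now: float,
                   cores_max: int, ghz_max: float) -> float:
    """Throughput multiplier if performance scaled linearly with core count
    and clock speed. Real workloads scale sub-linearly, so this is a ceiling."""
    return (cores_max / cores_now) * (ghz_max / ghz_now)


if __name__ == "__main__":
    print("cores per board today:", cores_per_board(64))               # 192
    print("cores per board at 96:", cores_per_board(96))               # 288
    print(f"naive ceiling: {naive_headroom(64, 2.5, 96, 3.1):.2f}x")  # 1.86x
```

Under those assumptions a board goes from 192 to 288 cores, and the naive ceiling is about 1.86x; treat it as an upper bound, not a forecast.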

In fact, just last week, I was really excited because AMD had a whole bunch of new announcements, where they were pushing their clock speeds per core up to 5.5 gigahertz on a sustained basis. That is pretty impressive in itself. Again, 64 cores per CPU with the next-gen processors. But the other trade-off that servers need to make is the TDP. TDP is basically how much heat is produced, or what is the thermal designed, you know, provision… that’s basically how much heat gets produced by your processors. And when you look at the AMD processors, they are pushing nearly 300 watts, which basically means you need very, very custom-made cooling mechanisms for these CPUs that AMD makes.

My guess is that, and again, these numbers haven’t been published for Graviton, but just looking at the general trend of ARM-based processors, I’m guessing that the Graviton processors have less than half the TDP of these AMD processors that are currently ruling the roost. I won’t even mention Intel, because AMD is kind of, you know, really kicking Intel’s butt at this point in time. But that’s basically the difference between those two approaches. When it comes to servers, you want the most efficient cooling for all the CPUs, so that you can push them as hard as possible, and you want to pack as many cores as you possibly can into a single rack or a single motherboard. ARM does so much better than any of the x86 architectures that it just makes sense to pack as many of those as possible into your data center.
Stephen

Yeah. That makes a lot of sense. It’s amazing thinking about the different approaches to getting more performance. On the one hand, you can just throw more power at it, and it runs hotter, and it goes faster. That’s one way. Another way is to change the design of what you’re doing, or, in this case, the architecture. And the ARM architecture has proven to be much more efficient in terms of power. I posted that TDP is thermal design power. As another example, there’s a famous chart from AnandTech, Intel versus Apple, and Apple’s chip, the M1, is also an ARM-based chip. What they’re showing is, like you said, the stagnation along the Intel line, and the innovation on the ARM side of things. It’s really impressive. I know you and I both run Apple M1 processors on our personal computers, and it’s incredible. They don’t get hot. Think about my older, you know, Core i7 ThinkPads: you’re working hard and your hands are getting sweaty from these things. And I’d imagine it’s the same in a data center with Intel versus ARM-based things: just by design, they don’t take as much power. They use a much simpler instruction set.
Rahul

Great. And I think Apple kind of blew everyone away with the performance improvements that they got out of these custom chips that they built.

And like I said, the TDPs on those are so low that my Mac Mini, which runs on M1 and has literally been running tens of browser windows 24/7, has never even gotten warm. Compare that with my old MacBook Pro, or even a Mac Pro, if you remember the dustbin design, or the trashcan design.

That one was running 16 cores, sorry, a 32-core processor, with 64 gigs of RAM. That one gets so insanely hot in comparison, and this tiny little M1 does so much better on performance and thermal efficiency. It’s just absolutely mind-boggling. In fact, even the cheese-grater Mac Pros have been made obsolete. Even though Apple continues to sell them, I don’t know why someone would go buy one. The new 16-inch MacBook Pro or the other trollers[SP] would completely destroy these cheese-grater Mac Pros.

That’s the fundamental difference between the x86 architecture and the new ARM-based ones that these guys are building.
Stephen

This is the old PowerPC cheese grater. I do think this was the nicest case I’ve ever seen, with all the machined aluminum; when you open it up, it’s all perfect. But in terms of what really matters, the architecture, you said the trash can; that’s an anomaly in the design, let’s call it that.
Rahul

True. But if you actually open it up, and I don’t know if you’ve ever had a chance to open up one of these Macs, the SoC that has everything built in is so tiny compared to the old motherboards with the CPUs and all the other peripherals added to them. It is remarkable how efficient they made it all. Yes, you have the constraint that you can’t upgrade it and do all the other stuff. But if you are purely looking at performance and efficiency, it is truly remarkable what they’ve done with these new ARM architectures.
Stephen

Yeah. It’s really impressive. One second, we’re gonna switch interpreters here. Again, thank you both for being here. It’s incredible to have one stream coming in and one stream going out, so we’re giving you a little break and switching. Thank you both for being here. We’re bringing in Lindsey. Yeah, it’s incredible with the Graviton3, and, again, going on this heat-based thing, I could never imagine having a Core i9 in my pocket. So, zooming out, how do you choose an instance type? It used to be so simple: it was big, medium, and small. Now, I was looking at US West 2, and there are 503. What do you think about when you’re choosing an instance type?
Rahul

So, basically, there are four parameters that determine what kind of instance you pick: your CPU, your memory, your network I/O, and your disk type, right? Those are your four parameters, and you have to determine your instance type based on the workload that you’re running. Let’s say you had some kind of compute-intensive workload where you just needed a ton of CPU: you would pick the C types on AWS. The C types are your compute-intensive machines, where the ratio of compute to memory is significantly higher. Then you have other workloads where, let’s say, you are trying to run a database, right? When you’re trying to run a database, what you want is good I/O so that you can load all your database tables very quickly into memory.

So you want to read very quickly, and you want as much memory as possible to keep everything in memory, because if you can do queries in memory, that is the most efficient way to run a database. So it’s not really that compute-intensive; it all hinges on being able to load as much of your tables into memory as possible. That’s where some of the R types and the N types of instances come in. The R types optimize on both I/O and memory, rather than on CPU or network. You have other instance types which optimize on the EBS volume attached to them.

So there are instance types like the Is, which you would use for file servers and things that just do tons and tons of writes onto disk. If you had something like an Elasticsearch cluster that’s pulling in a whole lot of data from different streams, dumping it all onto disk, and indexing it right away, you would possibly use something like the Is as your instance types; those are EBS optimized. So it all depends on what you need out of your workload. There are some scenarios where network I/O makes a very big difference. In fact, the larger your instance, the more network I/O you get, up to 12 or 14, I think, yeah, up to 12 Gbps on the network. But if you have smaller instance types, you probably don’t get that much.

Then there is another category, which is the burstable instances, the Ts. So you’ll see the T3 instance types, yeah, T4 and T5. T4g is available now on Graviton2, but you have the T3s and T4s. These basically are instances where you know you aren’t going to use the processor on a sustained basis, but there might be times when you need to burst your CPU load for a short period of time, and that’s when you would use these T types. So you have a wide variety; you really need to understand what the parameters of your workload are going to be, and based on that you pick the instance type.
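Rahul’s rules of thumb can be written down as a tiny decision function. A minimal sketch, assuming a hypothetical `Workload` profile with yes/no flags; the mapping (C for compute, R for memory, I for storage I/O, T for burstable, falling back to the general-purpose M family) follows the conversation and is not an AWS tool or an official recommendation. Network I/O, which Rahul ties to instance size rather than family, is left out.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload profile: which resource dominates?"""
    cpu_heavy: bool = False
    memory_heavy: bool = False
    storage_io_heavy: bool = False
    bursty: bool = False  # mostly idle, with short CPU spikes


def suggest_family(w: Workload) -> str:
    """Return an EC2 family letter using the rules of thumb from this episode."""
    if w.bursty and not (w.cpu_heavy or w.memory_heavy or w.storage_io_heavy):
        return "t"  # burstable
    if w.memory_heavy:
        return "r"  # memory optimized: keep database tables in RAM
    if w.storage_io_heavy:
        return "i"  # storage optimized: heavy writes and indexing
    if w.cpu_heavy:
        return "c"  # compute optimized: high compute-to-memory ratio
    return "m"      # general purpose default


if __name__ == "__main__":
    print(suggest_family(Workload(memory_heavy=True)))      # database -> r
    print(suggest_family(Workload(storage_io_heavy=True)))  # log ingest -> i
```

The order of the checks encodes a judgment call: a workload that is both memory-heavy and I/O-heavy, like a database, lands on R, matching the advice above.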
Stephen

I guess, making the analogy to something we both love, Lego: back in the earlier days, there were just a couple of bricks, right? There were, you know, the rectangles, the squares, the flat ones. But now we’ve got a lot more shapes to play with. So we can build things that are much more customized to exactly the workload. Whether that’s a database, a streaming platform, quick data ingest, or spot bursting, we can pick exactly what we want.
Rahul

Yeah, and I think back to AWS’s core philosophy: you need to pick the right resource for the right kind of workload. For databases, pick the right database for the kind of workload you have; pick the right kind of compute, or instance type, for your compute, memory, or I/O requirements. They are providing a massive catalog for users to pick whatever they want. Now, one thing I would say: if you’re starting afresh, pick the latest generation, because the latest generation is always going to be cheaper and faster. As for why you have this big explosion, you know, 500-odd instance types: I would cut that down by at least 80% and only focus on the new instance types, because they are faster and better. The old ones exist because AWS changed their strategy for how they passed cost benefits back to the customer.

So, in the early days, you know, when we first started, there was only one instance type, or, you know, one of each type; there was one standard. And any price reductions that they got, they would just pass straight on to the consumer. That meant every year, even if you did nothing and just stayed on the same instance type, you would suddenly see your bill go down 10%, because AWS would literally pass those cost benefits on to customers. But two or three years in, they kind of stopped doing that. They started creating new instance families and saying, “If you want to stay on the old standards, they are just gonna get more expensive. If you want the next generation, the faster, better ones, you have to make the effort of moving on to these new instance types.”

And that did, you know, require work on the part of customers to upgrade and migrate, because in some cases the moves were disruptive. But largely it was not too hard. In the early days, you also had different types of hypervisors you had to care about; a lot of that problem has kind of disappeared. But back in the day, you had to decide, you know, whether you had paravirtual (PV) or hardware virtual machine (HVM) instances, and then you had to figure out whether they were compatible, with a lot of restrictions on each. So you had to build your AMIs to be compatible with each, and they were very complicated to move. But these days they’re all pretty much on the standard platform. I would say pick the latest family and latest generation of instances. So right now you’d go with the T4s, C7s, M6s: pick the latest generation and always move to those. They’re always cheaper and better.
Stephen

All right, that’s a really good piece of advice. So pick the latest generations, and probably look for ones with the G on the end. And yeah, the other ones are still there. So in a sense, because they’re in the middle of this strategy change, we may see even fewer instance types in the future as the transition happens.
Rahul

Yeah, but actually, you’d be surprised how many customers are still running the very, very old M1 instances. [crosstalk 00:33:42.890] There are still customers running them, and, you know, AWS just continues to support them.
Stephen

Because they don’t force things until it’s really, really necessary.
Rahul

Yeah. I mean, it’s both the good and the bad. Again, back to AWS Made Easy: our recommendation is to get on the latest and greatest, because it’s just cheaper and it is faster. It’s a no-brainer trade-off on that front.
Stephen

Well, that’s actually a perfect segue. We did a study on the cost-performance of Graviton3 versus Graviton2. With cost-performance, we’re thinking about not just whether it is cheaper or faster, but some combination of the two: sometimes we’re willing to pay the same amount and have it go faster, and sometimes we’re happy to keep the performance steady and pay less. So when we’re talking about that cost-performance ratio, it could be either one of those cases or somewhere in the middle.

Looking at this study that we did, which is on our AWS Made Easy page (we posted about it yesterday), we compared similar workloads on three different instance types. The yellow here is the C6g, the older Graviton, the Graviton2. The blue is the Graviton3, and the red is a similar instance but with Intel Xeon. The vertical axis here is seconds to complete the same amount of work. So a really cool finding is that if you simply switch from Graviton2 to Graviton3, the result here is 25% to 30% better, with no transition effort and no porting, since it’s the same architecture. It’s literally just flipping that switch. It was really neat. Pretty neat…
Rahul

I will just add a caveat to that. If you are typically running an application that is based on a virtual machine, so languages that run on Java, Ruby on Rails kind of architectures, or Python, or, you know, Node.js, ones that run on some kind of a virtual machine or interpreter, you should generally not have a problem switching from x86 to ARM. The only places where you could run into some complications are scenarios where you have libraries with a native core. You know, there are some Java libraries that are written with JNI, basically C++ implementations under a Java wrapper; scenarios like that will require you to recompile the native core part so that it’s compatible with ARM processors. For the most part, you usually don’t encounter those kinds of scenarios anymore, so you should be able to take your x86-based application and port it over to ARM pretty easily.
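Rahul’s caveat is about native code bundled inside otherwise portable applications, such as JNI libraries or Python C extensions. One quick audit before moving to Graviton is to read the architecture field (`e_machine`) from each shared library’s ELF header. A minimal sketch, assuming Linux ELF binaries; `elf_machine` and `native_library_arch` are hypothetical helpers, and the machine table lists only a few common values.

```python
import struct

# Subset of ELF e_machine values from the ELF specification.
ELF_MACHINES = {
    3: "x86 (32-bit)",
    40: "ARM (32-bit)",
    62: "x86-64",
    183: "AArch64 (64-bit ARM)",
}


def elf_machine(header: bytes) -> str:
    """Return the target architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        byte_order = "<"
    elif header[5] == 2:
        byte_order = ">"
    else:
        raise ValueError("invalid EI_DATA byte")
    # e_machine is a 2-byte field at offset 18, after e_ident (16 bytes) and e_type (2 bytes).
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (e_machine={machine})")


def native_library_arch(path: str) -> str:
    """Check which architecture a shared library (e.g. a JNI .so) was built for."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Running `native_library_arch` over the `.so` files an application ships (extracted from JARs or wheels first) flags any that still report `x86-64` and need an ARM build. On the instance itself, Python’s `platform.machine()` typically reports `aarch64` on Graviton Linux and `x86_64` on Intel or AMD.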
Stephen

Yeah. In fact, there was one of the talks at re:Invent, I forget the name, I’ll post it in the show notes. In one of the talks at re:Invent, they did a study on transitioning to Graviton and ARM, and they found it took many dev teams about a week on average. Like you said, occasionally there are these underlying dependencies, where it’s usually a matter of tracking down the ARM-based dependency, maybe switching one of your compiler flags. So on something like Python or Ruby, where it’s interpreted, it’s not a big deal, but with something lower level, you have to dig deeper. Usually, though, these things have been solved and documented. I mean, the savings are so good that the whole community has an incentive to chase this.
Rahul

Absolutely. I mean, the price-performance improvement is, you know, on the order of 40% or greater, depending, somewhere between 20% and 40%. But the other key thing, I think, for our audience to understand is that when we deal with AWS, we’re dealing in terms of vCPUs, right? That’s the fundamental unit that you purchase. One would think that these are apples-to-apples comparisons, but they really are not. The way a vCPU is computed on ARM versus the way it’s computed on x86 machines is fundamentally different. With the x86 architecture, each core is a hyper-threaded core, which means you can run two threads on each core.

So when you buy a vCPU, or when AWS resells a vCPU, on the x86 architecture, what you’re effectively getting is half a core, or one thread; if you get a two-vCPU instance, you effectively get one core. On ARM, given that Graviton is single-threaded on a per-core basis, what you get is an entire core. So even if you had a state-of-the-art x86 processor running at nearly 5 gigahertz, in effect what you’re getting is only a single thread, or two threads running on that core. So you get whatever that one thread gets, and you also have the overhead of managing the threads in the pipelines, though Intel and AMD have gotten pretty efficient at managing those pipelines. On the ARM side of things, you literally get a dedicated core that processes your single thread, and that is your vCPU. So there is a fundamental difference in the units: even though we’re talking about vCPUs in both cases, it is not an apples-to-apples comparison when you actually look at what you’re getting out of the processor.
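Rahul’s point about vCPUs is easy to lose when comparing instance sizes side by side. A minimal sketch, assuming the typical case of two hardware threads per core on x86 instances with SMT (Hyper-Threading) enabled and one thread per core on Graviton; `physical_cores` is a hypothetical helper, and not every x86 instance type follows the two-thread assumption.

```python
# Assumption: x86 instances with SMT expose two vCPUs (hardware threads) per
# physical core, while Graviton exposes one vCPU per physical core.
THREADS_PER_CORE = {"x86_smt": 2, "graviton": 1}


def physical_cores(vcpus: int, cpu_kind: str) -> int:
    """Physical cores behind a vCPU count, under the assumption above."""
    threads = THREADS_PER_CORE[cpu_kind]
    if vcpus % threads:
        raise ValueError(f"{vcpus} vCPUs is not a whole number of {cpu_kind} cores")
    return vcpus // threads


if __name__ == "__main__":
    for kind in THREADS_PER_CORE:
        print(f"4 vCPUs on {kind}: {physical_cores(4, kind)} physical cores")
```

For real instance types, rather than assuming, the EC2 `DescribeInstanceTypes` API (`aws ec2 describe-instance-types`) reports `VCpuInfo`, including `DefaultCores` and `DefaultThreadsPerCore`.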
Stephen

That makes a lot of sense, right? As the processors get more and more specialized, it’s hard to imagine that this one number can distill everything you’d need to know. That’s why we did this study. Again, the link will be in the show notes, and it’s on our AWS Made Easy LinkedIn page. Here we go, I guess the main takeaway: “Graviton3 improved performance of single-threaded applications over Graviton2, but shines best in these parallel applications.” Right? Because in parallel, you get to take full advantage of all of the cores.

And then also, here’s a neat kind of summary statistic: from Graviton2 to Graviton3, the speed improvement is on the order of 20% to 35%, whereas the cost increase is only about 6%. So that’s pretty exciting. Other people have done some testing and found speedups of a similar order of magnitude. And one other thing I want to say about the Graviton is that they focused on real workloads. These aren’t, what’s the right word, artificial specs based on contrived scenarios that you wouldn’t ever actually encounter. These are real, you know: databases and analytics and ETL pipelines, the bread-and-butter stuff that we’re all doing every day. So it’s pretty exciting.
Rahul

Yeah. And from our experience working with the Graviton team at AWS over the last few months, it’s remarkable: every little optimization they’ve worked on is based on a real-world workload, like you said; it is not hypothetical. They’re not randomly trying to optimize everything. They’re literally looking at all the standard instructions that are going into the processor, and they’re optimizing those pipelines. Every time they encounter a new scenario where something is slow, they look at what kind of instructions are going in there, optimize those, and make them available to customers. So they are, like you said, working on real-world workloads, and not hypothetical scenarios or made-up benchmarks that you invariably see in a lot of places.
Stephen

Perfect. So do we have anything else to say about ARM before we talk about the other big CPU in the room, AMD’s Epyc?
Rahul

No. Get on ARM as quickly as possible. This is the bet you want to make. Like I said before, the runway for performance improvements on ARM is significantly longer than on the x86 architecture. At this point, AMD and, to some extent, Intel are literally fighting physics, to the point where they are trying to figure out how not to melt the CPU as they keep increasing the number of cores on every CPU. And the TDP is reaching insane levels at those 5.5 gigahertz clock speeds. They have to make hard trade-offs, like deciding which cores are going to be efficiency cores and which are going to be performance cores, because if they made everything a performance core, they would literally melt the CPU. So they’re fighting physics at that point.

And then you look at the trade-offs that the likes of Amazon have to make with Graviton on the ARM side, and there’s just so much headroom. You can go up to the 96 cores that the architecture supports, you can go from 2.5 gigahertz per core up to 3.1 and get a massive boost just out of that, and you already have three sockets per motherboard. So they have so much more room for performance improvements over the next few years compared to x86 that that really is the bet you want to make. So please go ahead and try it out. See if your applications work there; it’s more performant, and it saves you tons of money.
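As a rough sanity check on the clock-speed point, the sketch below computes the best case a clock bump alone could deliver. It assumes a fully compute-bound workload whose runtime scales linearly with frequency, which real workloads rarely do, so treat the result as an upper bound rather than a prediction.

```python
def clock_only_speedup(old_ghz: float, new_ghz: float) -> float:
    """Upper-bound speedup from clock frequency alone.

    Assumes a purely compute-bound workload that scales linearly with
    clock speed; memory- or I/O-bound work will see less.
    """
    if old_ghz <= 0 or new_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return new_ghz / old_ghz


print(f"{clock_only_speedup(2.5, 3.1):.2f}x")  # 1.24x
```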
Stephen

Awesome. Thanks. Thanks for sharing. That’s a really good way to get insight on the Graviton. I’m going to switch gears for one second. Welcome back, Jacob. All right.
Rahul

By the way, these guys have an incredibly hard job interpreting all the technical jargon we throw at them. And I can’t even imagine, you know what they go through as they are doing this interpretation. So, thank you, Jacob and Lindsey.
Stephen

Thank you very much. All right. So I want to share a little bit of history. I was an AMD early adopter, and this build, let me see here… AMD and Intel have been playing cat and mouse for a while. So let me show you this. This is a very flattering set of photos. This is me doing a build as a very good and very popular teenager. This computer right here on the left had the AMD K6-2 300, and I was replacing it with a Pentium III that I was very excited about. And there was a ton… Yep, see, I was so excited I documented it for posterity; there’s my Leatherman tool in the background. Who can’t love this guy? Me with my Voodoo 550.
Rahul

Those are awesome pictures.
Stephen

Thank you. Let’s see, what else do I have. Yeah, there we go. This is a background with Node.js.
Rahul

And that looks like the picture from when you had just finished and got it to boot.
Stephen

Oh, it was probably 2:00 a.m. Yeah, all the boxes are open, shrink wrap everywhere. There’s something about those good old days.

So at that time, this is again about 2000, AMD had some exciting offerings, but then Intel hit back hard and started an era of… They had a really, really successful run. I mean, remember these guys? With an ad campaign like that, how could it not be successful? But now AMD has hit back hard over the last couple of years, and they’ve introduced the Epyc Milan series.

So let’s have a look at the announcement. I’ll put this back on the screen. Let’s see… M6a. So the announcement from Amazon is that M6a and C6a are now generally available in many regions. They’re powered by third-generation AMD Epyc processors with an all-core turbo frequency of 3.6 gigahertz, a big increase over the M5. But again, this is a different ballgame from the Gravitons. The Gravitons are cheaper to run, and they don’t run as hot. So let’s just have a quick look at what a Milan chip looks like, just for visual impact.
Rahul

Absolutely.
Stephen

All right. So this is Linus from Linus Tech Tips. He’s going to show us; I’ll post the link to his video, but let’s have a look.
Linus

This is the top of the line, ladies and gentlemen: the Epyc 7763 that I’m holding in my hand. Yeah. It may only run at a mere 2.45 gigahertz base and 3.5 gigahertz boost, but don’t let that fool you. This is no Civic, okay? This is a Hummer with a Cummins swap. All right. These things are huge. I can’t get over it. Like, why did they have to make them so big? It’s not a measuring contest. It just never gets old, you know. I think the only way to get a bigger one is to buy one on lttstore.com. Oh, yeah, by the way, we’re going to have Epyc- and Threadripper-shaped ones coming soon, probably in a few months. Now, of course, Yoda would happily remind all of us that size is not important; performance is what matters.

And when it comes to performance, this is pretty much the finest display of kicking someone while they’re down that I think I have ever seen. And I don’t even feel bad. Here’s the thing. Intel’s latest and greatest 10-nanometer Ice Lake Xeons have some platform advantages, particularly when they’re paired with Intel’s Optane memory. But for purely CPU-bound applications, they’re not even competitive with AMD’s last-generation Epyc Rome chips price for price, let alone the new ones. AMD comes in with more cores, more gigahertz, more cache, and literally double the PCI Express lanes per socket. Yet AMD still prices its top dog cheaper than Intel’s top dog. And you might think, well, at $8,000 a chip, 200 bucks is a rounding error, right? But if you’re buying 1,000 of them at a time, like Intel thinks you are (sorry, Intel, I guess I’m not good enough for you), that works out to a quarter-million dollars.
Stephen

All right. So that was good to see. I want to pause it there. Look back here: look at the size of the fan he’s putting on. While he was talking, he was installing these massive supports for this enormous chunk of aluminum heatsink. And so this is what you were saying, Rahul, about TDP. This is a different philosophy of CPUs. If you want really blazing-fast performance, this is a good option, but it’s a different approach that’s really, really hot and more power-hungry than the Graviton.
Rahul

Remember, I was talking about the U, or the rack space usage. This one looks like a 4U build. I can’t imagine how that fan would fit in less than 4U. So that’s the amount of space that cooling alone would use up. If you were to build a data center based on these kinds of processors, this is the heat you’d have to figure out a way to cool, and efficiently.
Stephen

So, under what circumstances would you choose this compared to a Graviton? And when would you choose the Intel?
Rahul

So the more I think about it, I would not go for the Intel. So it’s really a choice between an AMD Epyc processor and Graviton. And at the end of the day, for any cloud workload, all you should care about is the price performance for your particular workload. For some unique scenarios, AMD probably would be better. But largely, when you look at a lot of general-purpose workloads, ARM is actually winning that race at this point from a price-performance standpoint. And given that AWS’s overheads are going to be far lower when it custom-builds its own chips, runs them, and offers them to customers, my bet is on Graviton. Even if there are a few workloads that are optimized right now for AMD, if I look at the next two or three years, I expect that pretty much every workload will be running on Graviton.
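Since the advice here is to judge by price performance for your own workload, one way to frame that is cost per completed unit of work. This is a minimal sketch: the instance labels, hourly prices, and throughput figures are placeholders, not real AWS prices or benchmark results, and you would substitute your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str             # hypothetical label, e.g. "arm-instance"
    hourly_price: float   # USD per hour; use your own region's pricing
    jobs_per_hour: float  # measured throughput for YOUR workload


def cost_per_job(c: Candidate) -> float:
    """Dollars spent per completed job on this candidate."""
    return c.hourly_price / c.jobs_per_hour


def cheapest(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate with the lowest cost per completed job."""
    return min(candidates, key=cost_per_job)


# Placeholder numbers for illustration only; they are not real AWS prices.
options = [
    Candidate("x86-instance", hourly_price=1.00, jobs_per_hour=100),
    Candidate("arm-instance", hourly_price=0.90, jobs_per_hour=110),
]
best = cheapest(options)
print(best.name, round(cost_per_job(best), 5))
```

A cheaper hourly rate only wins if throughput doesn’t drop by more than the discount, which is why measuring your own workload matters more than any single headline number.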
Stephen

That makes a lot of sense. I asked a few other colleagues the same question: when would you make this choice? And that’s basically the consensus. And again, sorry, we don’t mean to rag on Intel; it’s just the state of the world at the moment. Things go back and forth, and hopefully they’ll innovate. But I guess if you have a high-performance computing workload that is very, very tailored to a particular instruction set, with hand-coded assembly functions targeting, you know, AVX-512 or some other particular instruction set, then you’d want to go there, because there’s a huge investment in developer time. But like you’re saying, for general purpose, the Graviton3s are most likely going to be your best bet.
Rahul

Absolutely.
Stephen

All right. So should we wrap up the…anything else on the instance types? I think we’ve had a really good discussion about this.
Rahul

Yeah. I think we spent way longer than we had anticipated on this particular topic. But again, it’s just such an interesting announcement, an interesting area, where today, customers actually have to make the bet. You have to decide whether you’re going to be on x86 or the ARM type of processors and knowing all this stuff is incredibly valuable.
Stephen

Yeah, absolutely. It’s really fun to think about and see. I guess one great thing is you see that there’s healthy competition that’s driving this forward. I’m glad that there isn’t just one architecture to choose from.
Rahul

Absolutely.
Stephen

All right. Let’s switch gears to another announcement: workflow observability for AWS Step Functions. Let me put this on the screen. There we go. And I think we’ll switch interpreters. We’re ready? Okay. So these are the new workflow observability features of AWS Step Functions. So AWS Step Functions, let’s…
Rahul

Stephen, do you want to just quickly pull up the announcement?
Stephen

Yes. Oh, I’m sorry. There we go. Sorry about that. Here it is.
Rahul

Great.
Stephen

That is what this is getting at. So, zooming out for a moment: Step Functions are a way of tying together AWS services. I’ll bring Jacob back in. Step Functions tie together different AWS services in a flow-programming kind of paradigm, where you can say run this Lambda function, then when that finishes, take a certain result and put it into S3, and it ties these things together really nicely. One can think of it almost as a collection of microservices, but all the microservices are AWS-based. And the tricky bit about distributed systems in general is that they’re notoriously hard to reason about and debug. Now I’m going to play a 20-second clip that I always bring up when I talk about microservices, which I think anyone who’s ever developed in a microservice architecture may relate to.
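For readers new to Step Functions, the Lambda-then-S3 flow described here can be expressed in Amazon States Language. The sketch below builds such a definition as a Python dict and prints it as JSON. The function name, bucket, and object key are hypothetical, and this exact definition has not been deployed by us; check the Step Functions service integration docs before relying on the parameter shapes.

```python
import json

# Hypothetical names: replace with your own function and bucket.
FUNCTION_NAME = "my-processing-function"
BUCKET = "my-results-bucket"

definition = {
    "Comment": "Run a Lambda function, then write its result to S3",
    "StartAt": "ProcessInput",
    "States": {
        "ProcessInput": {
            "Type": "Task",
            # Optimized Lambda integration; the function's return value
            # appears under $.Payload in the task result.
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": FUNCTION_NAME, "Payload.$": "$"},
            # Serialize the function output so it can be used as an S3 body.
            "ResultSelector": {"body.$": "States.JsonToString($.Payload)"},
            "Next": "SaveResult",
        },
        "SaveResult": {
            "Type": "Task",
            # AWS SDK service integration calling S3 PutObject.
            "Resource": "arn:aws:states:::aws-sdk:s3:putObject",
            "Parameters": {
                "Bucket": BUCKET,
                "Key": "results/output.json",
                "Body.$": "$.body",
            },
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```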
Man1

Why is it so hard to display the birthday date on the Settings page? Why can’t we get this done this quarter?
Man2

Look, I’m sorry, we’ve been over this. It’s the design of our back end. First, there’s what we call the BINGO service. See, BINGO knows everyone’s name-o. So we get the user’s ID out of there. And from BINGO, we can call Papaya and MBS to take that user ID and turn it into a user session token. We can validate those with LMNOP. And then once we have that, we can finally pull the user’s info down from Raccoon.
Man1

Yeah, but couldn’t we have the Raccoon team basically adjust…
Man2

No, Raccoon isn’t guaranteed to have that info. Before we do this, we have to go to Wingman, and do a query to see if the user is willing to take it to the next level, or if they’re just playing the field. Now Wingman is cool but he doesn’t store any user info himself, he has to reach out to other user info provider services like RGS, BRB-DLL, Ringo 2, PLS.
Stephen

I think that illustrates the point that when you have these large distributed systems, you really have to keep the whole flowchart in your mind at once. And acknowledging this issue, look at what AWS has done. It says, “you can now navigate the details of the workflow execution.” And they posted a corresponding blog post to go with it. So now when you have a step function, which is again a directed flowchart of states, you can trace through an execution when debugging. You can see that this yellow marks where something went wrong, look at the different state transitions and where the error occurred, and inspect the input and output of each step. So it’s a really useful, clearly necessary tool for debugging and understanding how your data flows through these complex systems. So what’s been your experience with Step Functions? And where do you see this announcement fitting into the greater scheme of things?
Rahul

Yeah. So you should think of Step Functions pretty much as a workflow, right? You have states, you have a state transition system, and you have events that allow you to move from one state to another. And you have a bunch of processors, one for each state. Typically, these processors are Lambda functions or some other kind of processor that might just do compute. But in the simplest form, Lambda functions are what’s used to tie all these things together.
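To illustrate the state-transition model described here, independent of AWS, the following is a toy Python sketch: each state’s handler plays the role of a Lambda function and returns both its output and the name of the next state. It is not how Step Functions is implemented; it only shows the idea of states, transitions, and per-state processors. The state names are invented for the example.

```python
from typing import Callable, Optional

# A handler takes the current data and returns (new_data, next_state_name).
# Returning None as the next state ends the workflow.
Handler = Callable[[dict], tuple[dict, Optional[str]]]


def run_workflow(states: dict[str, Handler], start: str, data: dict) -> tuple[dict, list[str]]:
    """Run the workflow from `start`, returning final data and the path taken."""
    path: list[str] = []
    current: Optional[str] = start
    while current is not None:
        path.append(current)
        data, current = states[current](data)
    return data, path


def validate(d: dict) -> tuple[dict, Optional[str]]:
    # The outcome of this state decides which transition fires next.
    return d, ("Charge" if d["amount"] > 0 else "Reject")


def charge(d: dict) -> tuple[dict, Optional[str]]:
    return {**d, "status": "charged"}, None


def reject(d: dict) -> tuple[dict, Optional[str]]:
    return {**d, "status": "rejected"}, None


workflow = {"Validate": validate, "Charge": charge, "Reject": reject}

print(run_workflow(workflow, "Validate", {"amount": 42}))
# ({'amount': 42, 'status': 'charged'}, ['Validate', 'Charge'])
```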

Now, historically, imagine that a million transactions went through this entire workflow. Debugging an issue was invariably very, very hard. Number one, identifying an issue is hard because these are all aggregates: you need to aggregate stuff up around each state, so that when things are failing repeatedly in a particular state, you can jump on it. That kind of visibility was hard to get. The other thing was that you’re dealing with millions, if not billions, of events that go through these workflows, and you need to be able to trace them down at the level of individual events: which event got triggered, or did not get triggered? How did the state transition actually happen? What state changes happened?

Getting into all of those details for every single trace was incredibly hard to do. Therefore, any workflow that was beyond maybe five or six boxes, with a few thousand executions running per day, would be incredibly hard to debug. I think this particular announcement has made that way simpler. We have yet to see how it performs; we’ll experiment with a few use cases where we have millions of transactions going through, but it definitely makes it way easier to dive in and figure out where things went wrong.
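Tracing a single execution amounts to walking its ordered event history. The sketch below finds the state an execution was in when it failed. The event dictionaries are modeled loosely on Step Functions execution history events (a `type` field, plus `stateEnteredEventDetails` on state-entered events), but the sample history is made up, and real histories carry more fields.

```python
from typing import Optional


def find_failed_state(events: list[dict]) -> Optional[str]:
    """Return the last state entered before the first failure event.

    `events` is assumed to be in chronological order. Each event has a
    "type" string; *StateEntered events also carry
    "stateEnteredEventDetails": {"name": ...}.
    """
    last_entered: Optional[str] = None
    for event in events:
        if event["type"].endswith("StateEntered"):
            last_entered = event["stateEnteredEventDetails"]["name"]
        elif event["type"].endswith("Failed"):
            return last_entered
    return None  # no failure recorded


history = [
    {"type": "ExecutionStarted"},
    {"type": "TaskStateEntered", "stateEnteredEventDetails": {"name": "ProcessInput"}},
    {"type": "TaskStateExited"},
    {"type": "TaskStateEntered", "stateEnteredEventDetails": {"name": "SaveResult"}},
    {"type": "TaskFailed"},
    {"type": "ExecutionFailed"},
]
print(find_failed_state(history))  # SaveResult
```

With Parallel or Map states, several states can be active at once, so “last state entered” can point at the wrong branch; a fuller version would follow each event’s `previousEventId` link instead.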

The one thing that I kind of miss, and that right now has to be done separately, is aggregating stats across the different executions that go through this. For example, you want to understand how many executions failed at step three of your state machine, so that when it hits a certain threshold, you know to jump on it. That’s on the observability side of things, and it’s a little harder to do right now out of the box. You can take all the events data, put it into an S3 bucket or a Kinesis stream, run Athena queries, and create a QuickSight dashboard, alarms, and all that stuff, or you can create CloudWatch metrics to do the same thing. It’s just a little harder to do, but it’s definitely a step in the right direction toward observability and toward being able to get right into the nitty-gritty details of what’s happening with each execution.
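The per-state aggregation described here, counting how many executions failed at each state and flagging any state that crosses a threshold, can be sketched in a few lines of Python. This is an in-memory illustration with made-up data; in practice you would run the equivalent query over exported events (for example with Athena) or publish the counts as CloudWatch metrics, as mentioned above. The 1% threshold is an arbitrary example.

```python
from collections import Counter


def failures_by_state(failed_states: list[str]) -> Counter:
    """Count failures per state across many executions."""
    return Counter(failed_states)


def states_over_threshold(counts: Counter, total_executions: int, max_rate: float) -> list[str]:
    """Return states whose failure rate exceeds max_rate (e.g. 0.01 for 1%)."""
    return sorted(
        state for state, n in counts.items() if n / total_executions > max_rate
    )


# One entry per failed execution: the state it failed in (for example, derived
# from each execution's event history). Illustrative data, not real results.
failed = ["SaveResult"] * 30 + ["ProcessInput"] * 3
counts = failures_by_state(failed)
print(counts.most_common())                       # [('SaveResult', 30), ('ProcessInput', 3)]
print(states_over_threshold(counts, 1000, 0.01))  # ['SaveResult']
```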
Stephen

Yeah. That makes a lot of sense. And I like that idea of looking forward: you’d almost want a heat map, or some way to visualize which combinations of states cause things to fail. It is amazing that as we get into these more complex systems, you really need that underlying… what’s the right word? It’s an intuition of what’s happening and what can cause certain types of errors. Then you can reason about them. It’s also fun to watch the tools evolve to tackle these problems. Rahul, you and I both have plenty of experience debugging via printf and grep. And it’s exciting that we have more and more powerful tools to tackle this; as the problem gets bigger, luckily, our tools are evolving with it.
Rahul

Absolutely.
Stephen

Do we have anything else to say about this one, or should we move on to DataSync?
Rahul

No. I think we can move on to the next announcement.
Stephen

Perfect. All right. We’ll do a quick swap of the interpreters, and let’s talk about AWS DataSync. Let me cue that one up: DataSync. Here we go.
Rahul

Yeah. So AWS DataSync is actually a really neat service, if you haven’t tried it already. It is a great way to move your data into AWS from wherever your data is. You can set up a whole lot of rules, schedules, and so on. There’s tons of monitoring available to understand what’s in sync and what’s out of sync, and it’s a great way to keep different data stores on the same page. This announcement I find interesting and confusing at the same time. It seems largely driven by customers who are pushing the multi-cloud story, where they want the data not just in AWS but also replicated to Azure and GCP for disaster recovery purposes.

Now, I’m still trying to digest that use case. When I really think about it, the first thought that comes to mind is the egress charges nightmare. AWS’s egress charges are so atrociously high that if I think of taking large chunks of data that are very important to me and replicating them across different cloud providers, I am definitely going to see a massive bill. And I don’t know how this kind of setup really helps with disaster recovery, because you could do the same thing across three regions within AWS itself, or across different availability zones.
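To see why cross-cloud replication can get expensive, the sketch below estimates a monthly data-transfer-out bill. The per-GB rate is a placeholder, not a quoted AWS price, and real pricing is usually tiered by volume, so treat this as back-of-the-envelope arithmetic only.

```python
def monthly_transfer_cost(gb_per_month: float, price_per_gb: float, copies: int = 1) -> float:
    """Estimate data-transfer-out cost for replicating data to other clouds.

    price_per_gb is a placeholder; look up your provider's current
    data-transfer-out rate for your region and volume tier.
    copies is the number of destination providers receiving a full copy.
    """
    if gb_per_month < 0 or price_per_gb < 0 or copies < 0:
        raise ValueError("inputs must be non-negative")
    return gb_per_month * price_per_gb * copies


# Hypothetical: 50 TB/month sent to two other providers at an assumed $0.05/GB.
print(monthly_transfer_cost(50_000, 0.05, copies=2))
```

With those assumed inputs, that is about $5,000 a month in transfer charges alone, before any storage or request costs on either side.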
Stephen

That’s where they mention that egress part at the end. The other cloud providers have similar charges. “Oh, just a little note: you may be subject to data transfer out fees,” as Badri mentioned.
Rahul

That’s quite a concession they’ve thrown your way.
Stephen

This might hurt a lot.
Rahul

This will hurt a lot, coming from experience. I mean, take a simple thing like writing into S3 without using a VPC endpoint. That traffic goes over the internet. So if you have your compute within a VPC and you didn’t use a VPC endpoint to write even to S3, you suddenly see a massive bill, because stuff is going over the internet into S3. So it’s treated as egress and then back into S3. Which is why the first recommendation we give everybody is: for any data store that has a VPC endpoint, switch over to the VPC endpoint. You will save tons of money just on those egress charges. This feels like a massive bill that’s just waiting to come your way. You’d be better off using multiple AZs, or multiple regions at most, to replicate your data, rather than doing it across cloud providers. To be honest, this feels like a disaster waiting to happen. So I have very mixed feelings about it. I love DataSync as a service, but just a word of caution to anyone trying to use this for multi-cloud: be prepared for a massive bill, depending on how much data you’re trying to move over.
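The usual fix for the S3 case is a gateway VPC endpoint. The sketch below only builds the request parameters for the EC2 `CreateVpcEndpoint` API, so it runs without AWS credentials; the VPC ID, route table ID, and region are hypothetical placeholders.

```python
def s3_gateway_endpoint_request(vpc_id: str, region: str, route_table_ids: list[str]) -> dict:
    """Build keyword arguments for EC2 CreateVpcEndpoint (S3 gateway endpoint).

    A gateway endpoint adds routes to the given route tables so that S3
    traffic from the VPC stays on the AWS network instead of going out
    through an internet or NAT gateway.
    """
    if not route_table_ids:
        raise ValueError("a gateway endpoint needs at least one route table")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


# Hypothetical IDs for illustration.
params = s3_gateway_endpoint_request(
    "vpc-0123456789abcdef0", "us-east-1", ["rtb-0123456789abcdef0"]
)
print(params["ServiceName"])  # com.amazonaws.us-east-1.s3
```

With boto3 installed and credentials configured, you could pass these to `boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**params)`. Gateway endpoints only route traffic to S3 in the same region.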
Stephen

Yeah, absolutely. And plus, you’re paying it twice, right? Because someone’s egress is another person’s ingress.
Rahul

Ingress is usually free across most cloud providers; for example, AWS will not charge you anything for ingress. They want you to put all your data in there. Egress is what gets charged at double what it should be, because they cover the ingress cost by overcharging on egress.
Stephen

That’s why everyone wants to take data in. And okay, presumably people have thought this through. I guess I can see one use case: say you have development teams that aren’t well synchronized or well coordinated, working on different efforts, and some artifact, say some analytics, gets produced in Google Cloud that you then want to push over to S3. I can see that cropping up. But overall, you’d want to have it all under one roof and not try this multi-cloud approach, because, like you said, the bills will hurt. Unless you have a really good reason why you need to do this, just make sure you evaluate it.
Rahul

Yeah. If you have a good reason, I would love to learn it. I haven’t been able to wrap my head around a reason that requires you to be multi-cloud in that way versus multi-region within a single cloud provider. It’s something I’ve been struggling to find a good use case for.
Stephen

Perfect. All right. We’ll do one quick one, since we’ve only got about three minutes left. This one wasn’t an official AWS article, but when it came out, I thought it would spur a good discussion. It’s about the relationship between Elastic (Elastic.co) and AWS, which had a hosted Elasticsearch service. For a bit of context: about a year ago, AWS forked Elasticsearch into OpenSearch, because Elastic changed the terms of its license to restrict offering it as a hosted service, since that competed with its own business model.

What this article is getting at, and we’ll post a few links in the show notes: if you looked at the state of the world a year ago between these two companies, it looked a little bit strained. I really like Elasticsearch; I think it’s a really neat product, and it was a bit sad to see this. But looking at the state of the world now, there have been a lot of great integrations between Elastic and Amazon. In fact, I’ll pull one up. There we go… This is a recent post by Elastic from April 2022, “working with AWS to accelerate results that matter.” They’re working together to make sure all the Elastic products offer a really good experience on AWS. So what’s the moral of this whole story? What do we learn from…? I guess I would say that cooperation tends to win in these cases.
Rahul

I have a slightly different take on this one. I think this is a testament to the AWS Marketplace. If you are a software vendor, you know that your customers are in the AWS Marketplace. The AWS Marketplace has two fundamental advantages. Number one, it allows you to discover customers very quickly, and if you sign up with the partner program, AWS will help you find customers for your products within its customer base. That’s one advantage. The second advantage is for the many companies that are only now starting to move into AWS. And this is going to be, you know, 90% of the workloads we’re talking about over the next few years; there’s still a very, very huge chunk of workloads that have yet to move in.

A lot of these organizations are still struggling with the old procurement model, where you have to go through a separate procurement engagement to buy software. The Marketplace makes it really easy: with one click, you can deploy a new product within your AWS account, and it shows up as a single line item on your AWS bill. Your organization has already signed up for AWS, so it’s really easy for a vendor to deploy its product within a customer base that’s already on AWS. And for customers, it lets them bypass traditional procurement, which in some cases can be insanely painful to get through. Because it’s all beautifully integrated within the AWS ecosystem, random teams can literally click a button, get it deployed, and start using it. So that’s the advantage.

So I think, in this particular case, Elastic has realized that it does need the AWS Marketplace, because that’s where all of its potential customers are. It also has an advantage: while it open-sourced large chunks of Elasticsearch, which AWS basically took and turned into its own Elasticsearch, now OpenSearch, service, Elastic as a company still has the entire ELK stack, which includes Kibana and Logstash, and ELK as a whole is what a lot of customers want to leverage. Kibana’s visualization beats anything that AWS has, and customers want it. So it makes sense for Elastic to use the Marketplace to sell the whole stack on its own terms and pricing, even though it’s on the AWS Marketplace, and make it available to AWS customers. So rather than fight AWS on just the Elasticsearch part, they are now offering the entire stack in the Marketplace. That makes life a lot easier for both parties; they will fight a lot less. And I think customers end up benefiting the most.
Stephen

Okay. I like that take a lot. So there’s still room for innovation on top of AWS’s offering. That’s how Elastic has been able to differentiate itself and maintain a prominence in the Marketplace that doesn’t get confused with AWS’s offering of the very same product. And yet they can still work together, and it’s in everyone’s best interest to be on top of AWS, if only to circumvent that procurement process. I was in the software industry a while ago, having to deal with tender offers and the finance department and all that stuff. It’s much easier to say, “Oh, let me try this out for a few dollars an hour and see what happens.” That’s a much better way for everyone.
Rahul

Exactly.
Stephen

All right. Well, I think we’re two minutes over. It was a really fun discussion; again, we prepared a lot more content than we got through. But I really, really enjoyed all the different topics, and thank you again to Lindsey and Jacob for making this accessible to everyone. And thanks to everyone who tuned in. Is there any closing… Anything else before we sign off for the week?
Rahul

No. Again, I’d go back to the first topic: please go ahead and look at the new C7g instances; they are awesome. And take a look at the benchmark that we published on awsmadeeasy.com. We’d love your comments on that. And we’d love to get your questions. We get a lot of offline questions that we try to bake into this conversation, but feel free to post live as well during these sessions, and we’d love to answer them live. Yeah, and we look forward to seeing you next week.
Stephen

And then next week, we are on Thursday, right?
Rahul

That’s right. So the team is actually in Seattle next week, we are going to be working very closely with the AWS teams themselves. We have a lot of discussions and meetings across different product groups. And therefore, instead of doing this on Tuesday next week, we will be doing this on Thursday.
Stephen

All right, perfect. Well, looking forward to that one and hope all of you out there have a nice evening, afternoon, morning, wherever you are. And we’ll see you next Thursday. Bye, everyone.
Rahul

Thanks, everyone.
[01:14:11.506]
[music]