On this episode, David Hessing, EVP of Cost Optimization at Trilogy, and Rahul Subramaniam, AWS superfan and CTO of ESW Capital and CloudFix, dive deep into AWS cost optimization best practices: how to apply cost optimization principles when designing, configuring, and maintaining workloads in AWS Cloud environments.
Hey listeners. Welcome to the podcast. We are here today on AWS Insiders with Rahul Subramaniam, and then we’re
going to have a special guest, David Hessing. David, how are you doing today?
I’m doing really well. Good to be here.
Rahul, how’s it going today?
Awesome, Dionn. Having David on board is really, really awesome. I’m sure we’re going to have some really
interesting things to talk about. Can’t wait to get started right away.
Awesome. Actually, there was something that was on my mind. In the U.S., we finished celebrating Mother’s Day,
and I’m a mother of two. But also, I’m in the interesting spot where my mother has passed away. And so, I’m in
this really interesting moment. With all the marketing that’s coming out, the personalization, and the ability
for companies to really churn, collect, analyze, and create insights from data, I got this really interesting
email from a business, based on your past history, the purchases you made. “But if you don’t want to receive
information about Mother’s Day because of where you are in your stage of life, let us know.” And I thought that
was a really interesting use of data, insights, and personalization… The ability for our customers and
businesses to collect all this data, to really target and personalize the customer experience. And I know there
are a lot of systems and services at AWS that support those kinds of things. So I don’t know… What new things
are you seeing that are coming out, the AWS innovations that can help us move in those interesting directions?
There are so many higher-order services at AWS these days that provide incredible amounts of power and leverage
for new-age companies to come in and start challenging some of the old guard. I mean, when you look at services
like AWS Forecast and AWS Personalize, specifically for the use case that you were talking about, or Pinpoint, which
allows you to run campaigns like this in a seamless manner, in a matter of hours, if not a few minutes, it’s
quite incredible how AWS services are enabling businesses to get started without having to build up tons and
tons of infrastructure.
Yeah. And with all those services, right… So now, we’re talking about adding lots of elements to the IT
infrastructure, and we know that that can cause bloat, right? People are launching instances right and left.
They run for a minute, then they’re gone. And so, then that gives us this opportunity for cost optimization.
Some of the things we talked about with Badri were some of the smaller fixes. But I know we’re going to bring in
David, who’s going to tell us about some of the more challenging and the more complex opportunities for cost
optimization. David, take a quick moment and tell our listeners a little bit about yourself and what you’re up
to these days.
Sure. So I am an EVP of cost optimization here at ESW Capital. And for about the last year, I have been focusing
on how we can reduce our costs, primarily AWS costs, across our software portfolio in order to run that software
more cheaply and save money.
And tell us a little bit about the spectrum of opportunities for cost optimization. We know there are some
elements that a product like CloudFix can go in and fix seamlessly without having to break open the vault, in
terms of security and making changes. And then there are some more difficult opportunities. Can you just take us
through that spectrum and what that looks like?
Sure. So our model, when we go and find savings, starts from CloudFix. CloudFix is the product that we sell but
also use internally, to go and get between five and 10% savings by doing safe fixes that, as you said, are
secure, don’t cause downtime, and are really a no-brainer. Right? They’re simple things. You covered them last
week. AWS recommended things… Moving to the latest generation of something. Things that you want to just do to
save money. But beyond that, right, that only gets you, like you said, to five or 10%.
So our next level is something that we internally call… These don’t have very exciting names… Just
resource-level optimization specs. So we’ll go and write this spec and look at a set of AWS accounts, up to,
usually, about $10 million of spend, and just find where there are opportunities for savings by either
deleting… That’s obviously the easiest and simplest type of savings. Or by tuning resources at the resource
level. And we can get into some details and some examples about how that looks. Those specs actually typically
get us up into the 40 to 50% range of savings from the initial five to 10% from CloudFix.
So the final tier is called code level cost reduction specs, where we’ll actually dive into the guts of a
product and find the opportunities for savings within a product. And that is sort of the hardest type of savings
to get. It obviously, by the name, will require coding and changes to the product. But when we do go and do that
and make the investment in doing that, we’ll get the savings up all the way into the 80 to 90% savings range off
of the initial cost of running that product.
And we’re hearing from customers, especially when they’re innovating at such speeds, there’s a reluctance to
take a pause and take a moment to do cost optimization, right? Because there are fears about security, there are
fears about change. How do you approach selling the opportunity for cost optimization within an organization?
Because we’re talking about mitigating risks. As you said, there’s downtime with some of these bigger problems.
How do you sell that within the organization? Is it a straight ROI dollars? Like, “Look, here’s what it’s going
to save you.” Or how does that work?
So there are two levels, and you said the first one. The first one really is the ROI dollars, right? So when we
do these specs, that’s right in the spec, it says, “This is how much money we’re going to save, and this is the
effort it’s going to take to save it.” And that’s obviously the easiest sell, right? If the effort is less than
the savings… And we typically use a one-year ROI horizon. If the savings exceed the effort, then you’re going
to do it, because you’re going to recoup that investment within one year. And then, from that point on, that
reduction in your bill is perpetual and you basically just have that savings for as long as you’re running that product.
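David’s one-year ROI test reduces to a one-line comparison. A minimal sketch, with hypothetical numbers:

```python
def worth_doing(annual_savings: float, one_time_effort_cost: float) -> bool:
    """One-year ROI test: do the fix if the effort pays back within a year.

    After payback, the bill reduction is perpetual for as long as the
    workload keeps running.
    """
    return one_time_effort_cost < annual_savings

# Hypothetical numbers: $120k/year of savings for $80k of engineering effort.
print(worth_doing(annual_savings=120_000, one_time_effort_cost=80_000))  # True
print(worth_doing(annual_savings=30_000, one_time_effort_cost=80_000))   # False
```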
The second level is a little harder, right? The second level is more of a total cost of ownership, the total
cost of running the software. So we run a lot of legacy software here at ESW Capital, because of our business
model of acquiring older companies. Not the only piece of it, but a key one. And when you’re running older
software that’s running on servers, or on-prem, or in a data center, or wherever, you have a lot of costs
that are associated just with running that software. Obviously, the main one is headcount. People administering
those servers, making sure they’re up to date with the latest security fixes and patches. And then you also have
costs because that software tends to be brittle and you might have outages. So your software goes down, and
your customers are not very happy, and there are obviously costs associated with that.
So the second model… We’re actually working on making it a very firm model. But even before you have a
firm model that spits out numbers, the second approach to getting these sold internally is to point to the total
cost of ownership and how that is going to go down.
Rahul, you had mentioned, in another conversation, a recent acquisition, and you put your hands to your temple
and you were scratching your face, talking about, “Oh, this code is written in lisp.” Which, by the way, is one
of the languages of my youth. Thank you very much. And you were discussing, sort of, the challenges of going
back in and reworking some of that information, some of that code, in terms of optimizing it into the new
infrastructure. Tell us some good stories about that.
Yeah. I mean, at times, some of these codebases can get really hairy. There are codebases that we acquired early
on. I remember, there’s one codebase in particular that had 17 different languages that constituted the entire
product. And this was a product that was about 25 years old. And over a period of time, as new developers came
into work on the product, they just picked a completely different technology stack. And it was an absolute mess.
And I think I personally spent well over a year just deep-diving into different aspects of that particular
codebase, trying to figure out how to make it all work together. And those, of course, are one extreme end of
the spectrum. But it is not uncommon to find codebases in all kinds of different languages. Any code base that
is over 10 years old will have at least three if not more languages constituting the entire code base. You have
a set of languages that you would use for the front end and the UI. You would have a bunch of technologies that
will determine your backend.
And as stuff has moved, everything from object-oriented languages to procedural, to functional programming, to
asynchronous, event-oriented programming… As these new paradigms have evolved over the last two or three
decades, you suddenly start seeing your codebases evolve the same way. There’ll be some monoliths and cores that
will be in the old technologies and the old paradigms, and newer ones will be in newer stuff. So as you move
stuff over to AWS and find ways to rationalize the costs around them, you do have to be cognizant of where
the seams are in your application, which you can then use to figure out how you can carve out specific pieces,
and then start leveraging services to replace them.
So we’re talking about a lot of change, right? And as we think about the cost optimization… And one of the
biggest fears our customers have is change, right? We’re going to change a technology stack, we’re moving, we’re
consolidating. Which has ramifications, like you said, from everything, from personnel, and resources, all the
way through the knowledge base. Tell us a little bit… Maybe, David, you could take us through a little bit
about what change management is, and how that is going to be helpful for some of these larger cost optimization initiatives.
Okay. Sure. Change Manager is a service embedded within AWS Systems Manager, so it’s a little hard to
find. And what it does is give you a way to safely make controlled changes to your AWS environment. It has a
notion of a change template, which sets up the parameters of what is allowed to be changed within an AWS
account. And you set up, and basically install, so to speak, those templates into your AWS account. And
then, once that template is there, you then can create change requests against that template. What it allows you
to do is see every single change, and make sure that it’s done, like I said, in a controlled fashion, and a
recorded fashion. And it also is a very, very important way of managing security risk. So CloudFix uses this
mechanism to make the changes that it recommends and finds to get those cost savings. It does that through
Change Manager, in order that there’s no way, there’s no risk, there’s no worry, concern. There’s no technical
way for CloudFix to actually do anything directly within a customer’s environment.
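For a sense of the mechanics, a change request against an installed change template might be raised like this with boto3. This is a sketch: the template and runbook names are hypothetical placeholders, and the API call itself is left commented out, since it requires AWS credentials and an onboarded Change Manager setup.

```python
# Sketch: raising a change request through AWS Systems Manager Change Manager.
# The document names below are hypothetical placeholders for your own
# change template and Automation runbook.
change_request = {
    "ChangeRequestName": "retype-gp2-volumes-to-gp3",
    "DocumentName": "MyOrg-StandardChangeTemplate",  # the installed change template
    "Runbooks": [
        {
            # The Automation runbook that performs the actual change once approved.
            "DocumentName": "MyOrg-RetypeEbsVolume",
            "Parameters": {"VolumeId": ["vol-0123456789abcdef0"]},
        }
    ],
}

# With credentials in place, it would be submitted roughly like this:
# import boto3
# ssm = boto3.client("ssm")
# response = ssm.start_change_request_execution(**change_request)

print(change_request["ChangeRequestName"])
```

Every request is then visible, approvable, and auditable in Change Manager before anything runs.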
I’ll just add one other thing, which is that customers are petrified of changes being made to their accounts.
For two reasons. Reason number one is, in a lot of cases, like I discussed last time, customers actually don’t
even know what is actually specified in the resources that they have launched. And that creates a ton of
anxiety. When someone comes in and talks about cleaning up an account of resources that are currently deployed,
most customers, in my view, who’ve had resources running for a while, probably don’t really have a sense of
what’s actually running in those instances. And that causes a ton of fear. Shutting off instances, resizing
them, or even just rebooting them at times causes a ton of anxiety.
The second aspect of it is that, invariably, to make any change, you need to grant a third-party tool or third
parties, as people, incredible amounts of access to make those changes. And when you don’t have that level of
trust with either the tool or the people, you are reluctant to hand over pretty much the keys to the kingdom.
And that causes a ton of anxiety, as well. So on those two dimensions, there’s just so much anxiety that most
customers have that they don’t really want to make changes to their system, just because they don’t understand
it, or they have to grant incredible amounts of permission.
So to me, actually, Change Manager is one of those amazing diamonds amongst the AWS services that are just
completely lost in that coal mine of really badly named services. Change Manager is actually called AWS Systems
Manager Change Manager. It’s like, you wouldn’t even think about looking for a name like that across the AWS
catalog. And as David said, it’s very deep inside the Systems Manager suite of little “servicelets,” if I can
call them that.
But the reality is that Change Manager has changed the game for how changes can be implemented within your
account. The fact that you have a fully auditable, traceable mechanism of changes being made in the system, and
it’s an AWS-recommended way of doing it, is quite remarkable. The fact that very few people know about it is
really sad, but that’s what we want to share with everyone. There is this amazing service that everyone should
really be using to implement changes.
You create these SSM documents, which are actually nothing fancy. They are basically scripts, called Systems
Manager documents, that are run by Systems Manager agents, or SSM agents. And they will execute these in a very
systematic, automated manner like we discussed last time. And the amazing part over here is that you are put in
control, not the third party tool, not a third party person, but you, as the customer, the owner of the account,
get control to actually look at every SSM document, approve it, and then decide what requests get executed. So
you get a level of control that, in the past, has been virtually impossible. And Change Manager literally has
changed the game of how you make changes to your AWS accounts.
And you talk about needing this level of trust in order to embrace this technology and this process. When we
think about that, coming from the third-party vendor who we’re going to hire, or the tool, we want to know that
they’ve fought through, or they’ve been through the fires of change hell and trying to get to it. Tell us why we
should trust that they’ve encountered all of those elements, and some of the pitfalls they’ve discovered in
their process of creating this service.
Yeah. So when I started this process, it was really sort of a challenge that was thrown at me: “Hey, we, at
ESW Capital, have tens of thousands of accounts, tens of millions of dollars of AWS spend. And we know there’s a
ton of waste in there. Well, go find it.” I got a team to do that, to help me, but it was very open-ended to
start out with, which meant that in the beginning, for a few weeks, I was kind of floundering around, trying to
figure out where to find the savings. So after a couple of weeks of trying to figure out everything going on
across all of our accounts, and all that, we started to see certain patterns.
One, a big pattern we saw is when you are buying the wrong type of thing in AWS… And I’ll explain what I mean
by that. Because AWS has so many services, everything becomes quite granular, right? If you need storage, you
pay for S3 or you pay for EBS. If you need compute, you pay for… Well, there’s a whole range of options, but
the most common option is, that you pay for instances in EC2. What happens sometimes with some of our older
software is you end up paying for compute when, really, what you needed was storage. And that’s a huge savings
opportunity, because compute is typically a few orders of magnitude more expensive than storage. So that’s one
big area where we found that. Because of the way our software was running on certain servers, we basically were
incorrectly paying for the wrong type of resource.
And one thing that AWS lets you do is split that out, right? So if you can actually break out your storage and
put it in the right medium… It depends on your software, but typically, that’s going to be S3 or EFS. Then,
you can dramatically lower your costs, both on that storage directly, but more importantly, the gains are from
when you can then reduce your compute. So, that was one big pattern that emerged pretty quickly.
Another one that came out is database consolidation, right? So when we buy all these products, they all have
their own database. Some of them have many, many databases. We made an acquisition where the product had over
2000 databases. And one key insight is that, if you can take all those databases, which are all running on
some sort of container, whether within RDS or self-hosted, and consolidate them onto a huge server, you
actually save money. Because when each one is running on its own, in its
own dedicated compute and storage, typically, you need to leave something around 70% headroom to account for
periodic spikes on each of those individual databases.
But when you consolidate them onto a huge server… Big server, many cores, tons of memory… You only really
need to leave about 20% headroom. Because each individual spike that comes on each of those little databases,
first of all, doesn’t need that 70% on the huge server; relative to that huge server, 20% is enough.
And also, those spikes typically don’t happen at the same time. This is something… The insight comes
from the Erlang distribution. And because those spikes don’t happen at the same time, you need less space, and
you can run all of those databases for dramatically less. We’ve seen savings of around 65 to 70% from doing that
sort of consolidation.
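The headroom arithmetic behind that consolidation claim can be sketched directly. The database count and per-database load below are made-up illustration numbers; only the 70%-versus-20% headroom figures come from the conversation:

```python
def provisioned_capacity(steady_load: float, headroom: float) -> float:
    """Capacity to provision so that the steady load still leaves `headroom` free."""
    return steady_load / (1.0 - headroom)

n_dbs, steady_per_db = 100, 1.0  # hypothetical: 100 DBs, 1 vCPU steady load each

# Run separately, each database keeps ~70% headroom for its own spikes.
separate = n_dbs * provisioned_capacity(steady_per_db, headroom=0.70)

# Consolidated on one big server, ~20% shared headroom is enough,
# because the individual spikes rarely coincide.
consolidated = provisioned_capacity(n_dbs * steady_per_db, headroom=0.20)

print(f"separate: {separate:.0f} vCPUs, consolidated: {consolidated:.0f} vCPUs")
print(f"reduction: {1 - consolidated / separate:.0%}")
```

With these inputs the reduction lands in the low 60s of percent, in the same neighborhood as the 65 to 70% savings quoted above.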
Yeah, that’s actually quite incredible. It’s remarkable: every time we’ve looked at a data center or brought
stuff into AWS, historically, the average utilization of resources in enterprise software has been less than
2%, which is shocking. They plan so much for the odd spike that they might see that the waste you see over
there is quite remarkable. And like
David was talking about, the Erlang model. This is basically the same mechanism that, let’s say, your internet
ISP, or call centers, use to figure out what capacity they have and how many subscribers they can support. So
for example, if there are a thousand people who each have a one Gbps connection for their internet, you don’t
really need a one terabit pipe to support that, because every ISP knows that not everybody is going to be
using the internet at one Gbps all at the same time.
So there is a mathematical model, the Erlang model, that allows you to figure out how many customers
you can have for a certain pipe. And the same model really can be applied to resource planning, when you take a
large number of very small spike-y workloads, and then consolidate them on some very, very large instances. At
ESW, we’ve actually got some Aurora databases. Well, Aurora is literally a database built from the ground up
for the Cloud. And we have packed in over 2000 schemas on a single cluster. And the beauty of that
is, we would’ve had 2000 different servers, with all kinds of crazy sporadic workloads, with very little
utilization overall, or on average. And we’ve been able to put them all and consolidate them on these super
large clusters. It’s basically served us incredibly well, in terms of costs and performance at the same time.
Because each one of these little workloads, when they achieve a spike, they get to use the full power of that
super large machine that they would otherwise have not been able to use.
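A toy version of that Erlang-style reasoning, using a plain binomial model rather than the full Erlang formulas: if each subscriber is independently active only a fraction of the time, the capacity needed to cover almost every moment is far below the worst case. The 5% activity figure is an assumption for illustration.

```python
from math import comb

def capacity_for(n_users: int, p_active: float, target: float) -> int:
    """Smallest capacity c (in per-user units) with
    P(number of simultaneously active users <= c) >= target,
    assuming users are independent (binomial model)."""
    cumulative = 0.0
    for c in range(n_users + 1):
        cumulative += comb(n_users, c) * p_active**c * (1 - p_active) ** (n_users - c)
        if cumulative >= target:
            return c
    return n_users

# Hypothetical: 1,000 subscribers on 1 Gbps links, each active 5% of the time.
needed = capacity_for(1_000, 0.05, 0.999)
print(f"~{needed} Gbps covers 99.9% of moments, vs. 1,000 Gbps worst case")
```

The same shape of calculation applies to packing spiky database workloads onto one big cluster: the more independent workloads you pool, the smaller the total headroom you need relative to the sum of their peaks.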
Okay. One of the experiments that we tried early on, that… We tried to take X1e.32xl machines. Now, these are
incredibly large AWS machines. They have 128 cores, and they have four terabytes of RAM. That’s how large these
are. And we took a whole lot of Docker containers that we started packing into these super large machines. Now,
of course, we ran into one very technical problem, where Kubernetes, by default, will not allow you to have
more than 110 containers per node. But barring that, it actually brought about so much cost
savings for us, as we started packing in these workloads on single, super large machines.
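The 110-per-node limit is the Kubernetes default maximum pods per node, and it changes the packing arithmetic: on a machine the size of an x1e.32xlarge, the pod-count cap, rather than CPU or memory, can be what forces extra nodes. A sketch with hypothetical container sizes:

```python
from math import ceil

def nodes_needed(n_containers: int, cpu_per: float, mem_gb_per: float,
                 node_cpu: float = 128, node_mem_gb: float = 3904,
                 max_pods: int = 110) -> int:
    """Nodes required to host the containers; the binding constraint may be
    CPU, memory, or the Kubernetes default max-pods-per-node limit (110).
    Defaults approximate an x1e.32xlarge (128 vCPUs, ~3,904 GiB RAM)."""
    return max(
        ceil(n_containers * cpu_per / node_cpu),
        ceil(n_containers * mem_gb_per / node_mem_gb),
        ceil(n_containers / max_pods),
    )

# Hypothetical workload: 500 small containers at 0.25 vCPU / 1 GiB each.
print(nodes_needed(500, 0.25, 1.0))  # 5: the pod-count cap binds, not CPU or memory
```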
We started seeing this pattern across the board. When you have a certain kind of resource, and you have
hundreds, if not thousands, of these, if you can figure out a way to take a super large machine, a super large
instance, and
start consolidating them, it brings huge cost savings. And if that service is managed by AWS, it makes even more
sense, because they take care of all the risk. They take care of all the management headaches. And you can
literally push each one of these to their very brink, and make sure that you are leveraging the resources to the fullest.
Well, I will just add on to that and say, this DB consolidation is an interesting example. It’s right
on the border between the resource level of cost savings and a product level of change, right? Because
typically, a DB is going to be pretty well-defined from your product, but there is going to be code you’re
going to need to change. So it’s sort of the lightest level of where you’re going to have to go into your
product and probably change your DB configuration, depending on your product. Hopefully, it won’t be too hard
if your product’s code is well-organized. But it’s kind of right at that level of division between those two
cost areas. One last area, I think, worth mentioning, as an example of the harder type of savings, is when you
go all-in on an AWS service, even more so, right?
So one area that we have seen is where older products will have their own custom analytics and reporting, an
area where they crunch the numbers. They may have written it entirely themselves, or maybe they are
leveraging an older type of analytic technology. And with some AWS services, like Glue, and Athena, backed by
S3, re-doing those analytics and that reporting on those technologies leads to huge, huge cost savings. Order of
magnitude cost savings.
I mean, running Athena on S3 is pennies, typically, for storing the data in S3. And then you’re charged per
query in Athena, and you can even get the UI with QuickSight. So that’s a very common stack we go to. But this
kind of savings requires a lot of coding, right? So we did do this on one product, and we had a team of coders
working on it for about a quarter, and they went through and just rewrote every single report that was
previously running on Hadoop, with a custom UI, and converted it over. And we saved a whole bunch of money.
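A back-of-the-envelope version of that “pennies” claim, using rough list prices (about $0.023 per GB-month for S3 Standard and $5 per TB scanned for Athena; check current pricing, since these change and vary by region):

```python
def monthly_analytics_cost(data_gb: float, queries_per_month: int,
                           gb_scanned_per_query: float,
                           s3_per_gb_month: float = 0.023,
                           athena_per_tb_scanned: float = 5.0) -> float:
    """Rough monthly cost of an S3 + Athena reporting stack at list prices."""
    storage = data_gb * s3_per_gb_month
    queries = queries_per_month * gb_scanned_per_query / 1024 * athena_per_tb_scanned
    return storage + queries

# Hypothetical: 500 GB of report data, 1,000 queries/month scanning 2 GB each.
print(f"~${monthly_analytics_cost(500, 1_000, 2.0):.2f}/month")
```

Even this made-up workload comes out around twenty dollars a month, which is why replacing a self-managed Hadoop cluster with this stack can be an order-of-magnitude saving.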
And so, some of these larger cost optimization projects, like you just mentioned, like… We’re consolidating
databases, we’re rewriting how our product is accessing this information. What types of organizations, or maybe
it’s the size of the organizations… Does it make sense to start looking into these types of initiatives?
What’s the profile of an organization that should put this type of cost optimization at the top of its strategic priorities?
Everybody should be exercising good cost hygiene, and be looking to save money. I recognize that not everybody
might want to make the investments to do the harder types of cost savings. Certainly, the easy stuff is
definitely worth doing. And the resource-type optimizations are worth doing, especially for stuff that’s not
used at all, or barely used. These are just easy. You should definitely go after these and find these. I mean,
whether it’s your developers that [inaudible 00:25:48] up some test machines and then forgot about them, or
data that’s just accruing because of the way your SaaS product was written, and you never bothered to worry
about deleting data. I mean, there’s just tons of easy stuff to go and get.
Actually, one thing we’re doing internally, which is on the roadmap for CloudFix a little ways out, is cost
anomaly detection. So we’re building on an AWS service that is backed by machine learning. And AWS has a very
nice service that… You can turn on your [inaudible 00:26:19] accounts, and it will alert you if it detects a
cost anomaly. The problem is, you got to do something. Right? The problem is, you got to check it. So we built
on top of that service. And now, we have been running this internally for a while. If AWS fires off an
anomaly above a certain threshold (we’re using something on the order of $50 a day, but it’s completely
configurable), we will create a ticket, and somebody has to go investigate it and figure out what’s going on.
I mean, it’s hard to count these savings because they’re preventative, right? They never actually were on your
bill. So it takes a few assumptions about how long that cost anomaly would have run, and some other types of
decisions about how you want to count this money. But we have already saved millions just by doing this cost
anomaly detection. So back to your point, some of this stuff is so easy to go and get, and some of it’s just
good practice, that really, everybody should be doing it.
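That triage loop (an anomaly fires, and anything above a configurable daily threshold becomes a ticket) might look like this sketch. The anomaly records are simplified stand-ins for what AWS Cost Anomaly Detection reports, and the ticketing side is just a print:

```python
DAILY_IMPACT_THRESHOLD = 50.0  # dollars per day; completely configurable

def anomalies_to_ticket(anomalies: list[dict],
                        threshold: float = DAILY_IMPACT_THRESHOLD) -> list[dict]:
    """Keep only anomalies whose estimated daily impact warrants a ticket."""
    return [a for a in anomalies if a["impact_per_day"] > threshold]

# Simplified stand-ins for detected anomalies (fields are illustrative).
detected = [
    {"service": "AmazonEC2", "impact_per_day": 430.0},
    {"service": "AmazonS3", "impact_per_day": 12.5},
]
for anomaly in anomalies_to_ticket(detected):
    print(f"TICKET: investigate {anomaly['service']} "
          f"(+${anomaly['impact_per_day']:.2f}/day)")
```

The key point from the conversation is the last step: the alert alone saves nothing; the ticket forces somebody to actually investigate.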
So how are you organizing your team? So, say I’m listening in on this conversation, and I’m like, “You know
what? We are not doing this. We need a team like David’s team, that’s going to focus on this, find those
anomalies.” And, to your point, actually do something about it. What would be a best practice for resourcing
the things that you said, attacking those preventative opportunities? What does that team look like?
That’s a great question. So I can tell you what it looks like for us. It’s not a big team. It’s a team of about
five. This is all after CloudFix. This is assuming CloudFix already ran and got you your five to 10%. So after
that, what we do is, we will go and look, account-wise, and look for certain types of resources across different
accounts. And we’ve got that down to a bit of a science: the most common types of resources where we see cost
spikes, or just waste. Not only spikes; just waste to go and get and clean up.
We have sort of a standard work unit for going and checking, say, Elasticsearch/OpenSearch, right? We see where
people will go and configure a cluster for dev, and they’ll have a dedicated master node in there that you just
don’t need, and you can just get rid of it. Bring the cluster down to one node, especially for dev or QA. It
works just fine for what you’re using it for, and you just cut costs by 30% or more. Right?
So, that’s just one example.
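That dev-cluster example is easy to sanity-check with arithmetic. The hourly node rates below are made up; only the shape of the change (dropping unneeded dedicated master nodes from a dev/QA cluster) comes from the conversation:

```python
HOURS_PER_MONTH = 730

def cluster_monthly_cost(data_nodes: int, master_nodes: int,
                         data_rate: float, master_rate: float) -> float:
    """Monthly cost of a search cluster at hypothetical hourly node rates."""
    return (data_nodes * data_rate + master_nodes * master_rate) * HOURS_PER_MONTH

# Hypothetical dev cluster: 2 data nodes plus 3 dedicated masters it doesn't need.
before = cluster_monthly_cost(2, 3, data_rate=0.20, master_rate=0.10)
after = cluster_monthly_cost(2, 0, data_rate=0.20, master_rate=0.10)
print(f"monthly cost cut: {1 - after / before:.0%}")
```

With these made-up rates, removing the masters alone cuts the bill by over 40%, consistent with the “30% or more” figure.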
So we have a set of these work units that are oriented around different common resources, that have either waste
or poor usage patterns, and the team will go and run those. And they’ll go and look for those sorts of
resources, just on an ongoing basis. And then, the other thing we do is, we go and we find things at the product
level that become the code level cost reduction opportunities. But that typically requires almost an anatomy of
a product, right? So we’ll go look at a product, and we’ll say, “Hey, this product is spending $3 million
annually,” right? So we will go and do the anatomy of that product and say… Just to understand where that
spend is, how it’s broken down, and where those opportunities are for cutting that within the product.
What are your success metrics for this team? How are they evaluated?
In cost, it’s easy. Right? Because cost, it’s dollars. Right? It’s the easiest metric… We touched on the
one-year ROI that we shoot for. That’s not just on the team’s activities itself. That includes the engineering
time. So success right now is, basically, that we identify cost savings where… The cost of identifying the
cost savings, and the cost of getting and executing those cost savings, is less than the cost savings. That’s our success metric.
So ideally, this group is going to work themselves out of a job, if they do it right?
Absolutely. We are starting to see a plateau, right, with the low-hanging fruit, the easy stuff that we went and
got over the last year. But I will say, this has gone on a lot longer than I expected. When I started this over a
year ago, I thought, “Hey, maybe we’ll go cut some waste for a quarter,” right? Or maybe two quarters. And we’re
still going strong. We have a backlog of work that extends into Q3. ESW has a very large portfolio of products,
but a lot of those products are maybe a bit smaller. Each individual product’s annual spend is less. So
in that long tail of products, it may not be cost-effective to get the cost savings, right? You might not hit
that on your ROI.
So we’re doing two things. We’re looking for ways to be more efficient ourselves, right? We’re looking for ways
to identify cost savings that cost less, streamline our own work, and make it more efficient. The other thing is
that we’re turning this into a product because we think that this is something that is very valuable for
external companies as well. So if we succeed in that, then we won’t be out of a job.
I think David comes in with a relatively shorter-term stint in the cost optimization world at ESW. From my 14,
15-year experience working with AWS on this front, this is never-ending. Cost savings are just… As long as
you’re using AWS, you’re always going to have scope for optimizing costs. It’s primarily because there
are three dimensions in which things are changing.
The first one is that AWS is constantly creating new services. And if you are wise in leveraging these services,
you’re going to be using all of the new stuff that AWS is coming up with. And with every new service, there is
almost a life cycle, where you start using the service and you start seeing the benefits of it. It seems very
cheap, initially. And then, suddenly, as you use more of it, your cost spikes, and then you start
looking into why the cost spiked. Are you following all the best practices? You start optimizing the costs
over a period of time. It’s a natural life cycle of every new service that AWS has ever created. Right? So
that’s dimension number one.
Dimension number two is your dev and staging environments… Your developers are constantly adding new
resources. None of those resources are static. They’re constantly launching new services. They’re constantly
doing something new, which adds new resources that you have to keep looking at. You have to be on top
of things. You have to make sure you have all the policies in place so that you can control costs when they’re
launching these new resources, and so on.
And then the third dimension is that almost all the cost optimization stuff is trying to keep pace with
everything that AWS is coming up with. You almost feel like you’re at least six months or a year behind all the
new stuff that AWS is working on. Your cost optimization team is catching up with stuff. There’s new stuff
that’s being discovered. There’s new stuff being implemented. And there are going to be new runbooks you have
to create so that your teams and your engineering org can start adopting them, and they know how to deal with
it and how to safely execute those changes. So, given all these dimensions of change, I think there’s always
going to be room for cost optimization. As long as you’re spending money on AWS, you’re always going to need a
mechanism to save money, because there is always going to be waste.
Right? And plus, at ESW, we're always making new acquisitions as well.
That’s very true.
And that seems in line, also, with the process of going through the internal pains and gains of figuring it out,
and then productizing that and turning it into something that our customers can use, because obviously, ESW is
not the only one running into these same problems. Right?
Awesome. All right. To wrap up, David, tell us the top two tips our listeners should know about cost optimization.
The easiest cost optimization is to find stuff you're not using. We just finished a spec… Again, this is an internal spec. And something like 60% of the savings it identified was just stuff that people left running, because the advantage of the Cloud is that you click a button and you turn something on. There's no friction, which is great. That's number one. Tip number two, I think, is what Rahul said: always be looking at what AWS is rolling out, because there are easy savings to get there too. Those two tips are just the easy stuff, and they'll get you a ton of savings to start with.
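To make tip number one concrete (this is a sketch, not the speakers' internal tooling): a sweep for resources nobody is using can be as simple as flagging EBS volumes left in the "available" state, meaning they're not attached to any instance. The helper below operates on plain dicts shaped like the output of boto3's `ec2.describe_volumes()`, so the logic is easy to test; the sample volume IDs are hypothetical.

```python
# Sketch: flag unattached EBS volumes from describe_volumes-style data.
# In a real sweep you'd pass in the 'Volumes' list returned by
# boto3.client("ec2").describe_volumes(); here we use plain dicts.

def find_unattached_volumes(volumes):
    """Return the IDs of volumes not attached to any instance."""
    return [
        v["VolumeId"]
        for v in volumes
        if v.get("State") == "available"  # 'available' means not in use
    ]

sample = [
    {"VolumeId": "vol-1", "State": "in-use"},
    {"VolumeId": "vol-2", "State": "available"},
    {"VolumeId": "vol-3", "State": "available"},
]
print(find_unattached_volumes(sample))  # → ['vol-2', 'vol-3']
```

The same pattern extends to stopped instances, idle load balancers, and old snapshots: pull the inventory, filter for the "left running" signal, and review the list before deleting anything.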
And Rahul, what do you think the top two ways would be to reduce the fear of risk in an organization that's going after some of these more challenging cost optimization initiatives?
Yeah. So, to that, my tip number one would be: start leveraging Change Manager, which is AWS's best-practice tool for how you should implement changes. Then I'll address one other aspect. David already covered all the simple stuff, the easy ways to save money. But for folks who are attempting some of the hardest stuff, which is code-level changes, product-level changes, or architecture, there's a simple framework you can use to mitigate some of the risks. Number one: look at your product, and if you can find seams within it that are easily replaceable by standard out-of-the-box AWS services, go ahead and perform that one change.
So for example, if your product does full-text search, and that's nicely isolated as a service, take that and replace it with something like Elasticsearch, OpenSearch, or Kendra, whichever serves your purpose. But only if you have a very clear seam. Or if analytics is a completely separate subsystem, you can swap that out for S3 with Athena, QuickSight, or Redshift. Whatever your latency requirements are on your queries, pick one. As long as you can do that, it's a great way to save money.
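As an illustration of the "analytics seam" replacement Rahul describes (the database, bucket, and query below are hypothetical, not from the episode): once the data lands in S3, the analytics subsystem can shrink to a query submitted to Athena. The helper just builds the keyword arguments for boto3's `athena.start_query_execution()`, so it can be checked without AWS credentials; the actual call is shown in a comment.

```python
# Sketch of replacing a self-managed analytics subsystem with S3 + Athena.
# Names here (billing, example-analytics-bucket) are made up for illustration.

def build_athena_request(query, database, output_bucket):
    """Build kwargs for boto3 athena.start_query_execution()."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{output_bucket}/results/"
        },
    }

req = build_athena_request(
    "SELECT region, SUM(cost) FROM usage GROUP BY region",
    database="billing",
    output_bucket="example-analytics-bucket",
)
# To actually run it:
#   athena = boto3.client("athena")
#   athena.start_query_execution(**req)
print(req["ResultConfiguration"]["OutputLocation"])
```

Picking Athena versus Redshift here is exactly the latency trade-off mentioned above: Athena is serverless and pay-per-query, while Redshift keeps a warm cluster for consistently fast interactive queries.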
If you cannot find seams within your product, I would say you might be better off rewriting your core value proposition completely from scratch with AWS-native services, because trying to get 100% feature parity with an old monolithic code base that has accumulated a lot of cruft over the years is incredibly hard. You're better off taking a step back, identifying the core value proposition of that particular monolith or module, and asking, "What would it mean to implement that value proposition with native AWS APIs?" We usually find that if you really do that exercise, you don't need more than 10,000 lines of code to implement most of these kinds of solutions. So I would say, don't try to rewrite a monolith feature for feature, because you'd end up wasting a lot of time and effort chasing savings that probably aren't there.
The build-versus-buy challenge.
Awesome. Well, guys, this has been great. Thank you, David, for coming in and giving us a little more insight into some of the other areas of cost optimization. We've talked high level, and we've talked down to the database consolidation level. I think we hit all the angles, so I appreciate you coming out, David. Thanks for being here.
Absolutely. It’s been a pleasure.
Thanks, everyone, for listening today. If you enjoyed our podcast, please be sure to rate, review and subscribe.
See you next time on AWS Insiders.