On this episode, David Hessing, EVP of Cost Optimization at Trilogy, and Rahul Subramaniam, AWS superfan and CTO of ESW Capital and CloudFix, dive deep into AWS cost optimization best practices: how to apply cost optimization principles when designing, configuring, and maintaining workloads in AWS Cloud environments.
Hey listeners. Welcome to the podcast. We are here today on AWS Insiders with Rahul Subramaniam, and then we’re
going to have a special guest, David Hessing. David, how are you doing today?
I’m doing really well. Good to be here.
Rahul, how’s it going today?
Awesome, Dionn. Having David on board is really, really awesome. I’m sure we’re going to have some really
interesting things to talk about. Can’t wait to get started right away.
Awesome. Actually, there was something that was on my mind. In the U.S., we finished celebrating Mother’s Day,
and I’m a mother of two. But also, I’m in the interesting spot where my mother has passed away. And so, I’m in
this really interesting moment. With all the marketing that’s coming out, the personalization, and the ability
for companies to really churn, collect, analyze, and create insights from data, I got this really interesting
email from a business, based on your past history, the purchases you made. “But if you don’t want to receive
information about Mother’s Day because of where you are in your stage of life, let us know.” And I thought that
was a really interesting use of data, insights, and personalization… The ability for our customers and
businesses to collect all this data, to really target and personalize the customer experience. And I know there
are a lot of systems and services at AWS that support those kinds of things. So I don’t know… What new things
are you seeing that are coming out, the AWS innovations that can help us move in those interesting directions?
There are so many higher-order services at AWS these days that provide incredible amounts of power and leverage
for new-age companies to come in and start challenging some of the old guard. I mean, when you look at services
like AWS Forecast and AWS Personalize, specifically for the use case that you were talking about, or Pinpoint, which
allows you to run campaigns like this in a seamless manner, in a matter of hours, if not a few minutes, it’s
quite incredible how AWS services are enabling businesses to get started without having to build up tons and
tons of infrastructure.
Yeah. And with all those services, right… So now, we’re talking about adding lots of elements to the IT
infrastructure, and we know that that can cause bloat, right? People are launching instances right and left.
They run for a minute, then they’re gone. And so, then that gives us this opportunity for cost optimization.
Some of the things we talked about with Badri were some of the smaller fixes. But I know we’re going to bring in
David, who’s going to tell us about some of the more challenging and the more complex opportunities for cost
optimization. David, take a quick moment and tell our listeners a little bit about yourself and what you’re up
to these days.
Sure. So I am an EVP of cost optimization here at ESW Capital. And for about the last year, I have been focusing
on how we can reduce our costs, primarily AWS costs, across our software portfolio in order to run that software
more cheaply and save money.
And tell us a little bit about the spectrum of opportunities for cost optimization. We know there are some
elements that a product like CloudFix can go in and fix seamlessly without having to break open the vault, in
terms of security and making changes. And then there are some more difficult opportunities. Can you just take us
through that spectrum and what that looks like?
Sure. So our model, when we go and find savings, starts from CloudFix. CloudFix is the product that we sell but
also use internally, to go and get between five and 10% savings by doing safe fixes that, as you said, are
secure, don’t cause downtime, and are really a no-brainer. Right? They’re simple things. You covered them last
week. AWS recommended things… Moving to the latest generation of something. Things that you want to just do to
save money. But beyond that, right, that only gets you, like you said, to five or 10%.
So our next level is something that we internally call… These don’t have very exciting names… Just
resource-level optimization specs. So we’ll go and write this spec and look at a set of AWS accounts, up to,
usually, about $10 million of spend, and just find where there are opportunities for savings by either
deleting… That’s obviously the easiest and simplest type of savings. Or by tuning resources at the resource
level. And we can get into some details and some examples about how that looks. Those specs actually typically
get us up into the 40 to 50% range of savings from the initial five to 10% from CloudFix.
So the final tier is called code level cost reduction specs, where we’ll actually dive into the guts of a
product and find the opportunities for savings within a product. And that is sort of the hardest type of savings
to get. It obviously, by the name, will require coding and changes to the product. But when we do go and do that
and make the investment in doing that, we’ll get the savings up all the way into the 80 to 90% savings range off
of the initial cost of running that product.
And we’re hearing from customers, especially when they’re innovating at such speeds, there’s a reluctance to
take a pause and take a moment to do cost optimization, right? Because there are fears about security, there are
fears about change. How do you approach selling the opportunity for cost optimization within an organization?
Because we’re talking about mitigating risks. As you said, there’s downtime with some of these bigger problems.
How do you sell that within the organization? Is it a straight ROI dollars? Like, “Look, here’s what it’s going
to save you.” Or how does that work?
So there are two levels, and you said the first one. The first one really is the ROI dollars, right? So when we
do these specs, that’s right in the spec, it says, “This is how much money we’re going to save, and this is the
effort it’s going to take to save it.” And that’s obviously the easiest sell, right? If the effort is less than
the savings… And we typically use a one-year ROI horizon. If the savings exceed the effort, then you’re going
to do it, because you’re going to recoup that investment within one year. And then, from that point on, that
reduction in your bill is perpetual and you basically just have that savings for as long as you’re running that product.
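David’s one-year ROI test reduces to a one-line comparison. A minimal sketch, with hypothetical numbers:

```python
def worth_doing(annual_savings: float, one_time_effort_cost: float) -> bool:
    """One-year ROI test: do the fix if the effort pays back within a year.

    After payback, the bill reduction is perpetual for as long as the
    workload keeps running.
    """
    return one_time_effort_cost < annual_savings

# Hypothetical numbers: $120k/year of savings for $80k of engineering effort.
print(worth_doing(annual_savings=120_000, one_time_effort_cost=80_000))  # True
print(worth_doing(annual_savings=30_000, one_time_effort_cost=80_000))   # False
```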
The second level is a little harder, right? The second level is more of a total cost of ownership, the total
cost of running the software. So we run a lot of legacy software here at ESW Capital, because of our business
model of acquiring older companies. Not the only piece of it, but a key one. And when you’re running older
software that’s running on servers, or on-prem, or in a data center, or wherever, you have a lot of costs
that are associated just with running that software. Obviously, the main one is headcount. People administering
those servers, making sure they’re up to date with the latest security fixes and patches. And then you also have
costs because that software tends to be brittle and you might have outages. So your software goes down, and
your customers are not very happy, and there are obviously costs associated with that.
So the second model… We’re actually working on making it a very firm model. But even before you have a
firm model that spits out numbers, the second approach to getting these sold internally is to point to the total
cost of ownership and how that is going to go down.
Rahul, you had mentioned, in another conversation, a recent acquisition, and you put your hands to your temple
and you were scratching your face, talking about, “Oh, this code is written in lisp.” Which, by the way, is one
of the languages of my youth. Thank you very much. And you were discussing, sort of, the challenges of going
back in and reworking some of that information, some of that code, in terms of optimizing it into the new
infrastructure. Tell us some good stories about that.
Yeah. I mean, at times, some of these codebases can get really hairy. There are codebases that we acquired early
on. I remember, there’s one codebase in particular that had 17 different languages that constituted the entire
product. And this was a product that was about 25 years old. And over a period of time, as new developers came
into work on the product, they just picked a completely different technology stack. And it was an absolute mess.
And I think I personally spent well over a year just deep-diving into different aspects of that particular
codebase, trying to figure out how to make it all work together. And those, of course, are one extreme end of
the spectrum. But it is not uncommon to find codebases in all kinds of different languages. Any code base that
is over 10 years old will have at least three if not more languages constituting the entire code base. You have
a set of languages that you would use for the front end and the UI. You would have a bunch of technologies that
will determine your backend.
And as stuff has moved, everything from object-oriented languages to procedural, to functional programming, to
asynchronous, event-oriented programming… As these new paradigms have evolved over the last two or three
decades, you suddenly start seeing your codebases evolve the same way. There’ll be some monoliths and cores that
will be in the old technologies and the old paradigms, and newer ones will be in newer stuff. So as you move
stuff over to AWS and find ways to rationalize the costs around them, you do have to be cognizant of where
the seams are in your application, which you can then use to figure out how you can carve out specific pieces,
and then start leveraging services to replace them.
So we’re talking about a lot of change, right? And as we think about the cost optimization… And one of the
biggest fears our customers have is change, right? We’re going to change a technology stack, we’re moving, we’re
consolidating. Which has ramifications, like you said, from everything, from personnel, and resources, all the
way through the knowledge base. Tell us a little bit… Maybe, David, you could take us through a little bit
about what change management is, and how that is going to be helpful for some of these larger cost optimization initiatives.
Okay. Sure. Change Manager is a service embedded within AWS Systems Manager, so it’s a little hard to
find. And what it does is give you a way to safely make controlled changes to your AWS environment. It has a
notion of a change template, which sets up the parameters of what is allowed to be changed within an AWS
account. And you set up, and basically install, so to speak, those templates into your AWS account. And
then, once that template is there, you then can create change requests against that template. What it allows you
to do is see every single change, and make sure that it’s done, like I said, in a controlled fashion, and a
recorded fashion. And it also is a very, very important way of managing security risk. So CloudFix uses this
mechanism to make the changes that it recommends and finds to get those cost savings. It does that through
Change Manager, in order that there’s no way, there’s no risk, there’s no worry, concern. There’s no technical
way for CloudFix to actually do anything directly within a customer’s environment.
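For a sense of the mechanics, a change request against an installed change template might be raised like this with boto3. This is a sketch: the template and runbook names are hypothetical placeholders, and the API call itself is left commented out, since it requires AWS credentials and an onboarded Change Manager setup.

```python
# Sketch: raising a change request through AWS Systems Manager Change Manager.
# The document names below are hypothetical placeholders for your own
# change template and Automation runbook.
change_request = {
    "ChangeRequestName": "retype-gp2-volumes-to-gp3",
    "DocumentName": "MyOrg-StandardChangeTemplate",  # the installed change template
    "Runbooks": [
        {
            # The Automation runbook that performs the actual change once approved.
            "DocumentName": "MyOrg-RetypeEbsVolume",
            "Parameters": {"VolumeId": ["vol-0123456789abcdef0"]},
        }
    ],
}

# With credentials in place, it would be submitted roughly like this:
# import boto3
# ssm = boto3.client("ssm")
# response = ssm.start_change_request_execution(**change_request)

print(change_request["ChangeRequestName"])
```

Every request is then visible, approvable, and auditable in Change Manager before anything runs.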
I’ll just add one other thing, which is that customers are petrified of changes being made to their accounts.
For two reasons. Reason number one is, in a lot of cases, like I discussed last time, customers actually don’t
even know what is actually specified in the resources that they have launched. And that creates a ton of
anxiety. When someone comes in and talks about cleaning up an account of resources that are currently deployed,
most customers, in my view, who’ve had resources running for a while, probably don’t really have a sense of
what’s actually running in those instances. And that causes a ton of fear. Shutting off instances, resizing
them, or even just rebooting them at times causes a ton of anxiety.
The second aspect of it is that, invariably, to make any change, you need to grant a third-party tool or third
parties, as people, incredible amounts of access to make those changes. And when you don’t have that level of
trust with either the tool or the people, you are reluctant to hand over pretty much the keys to the kingdom.
And that causes a ton of anxiety, as well. So on those two dimensions, there’s just so much anxiety that most
customers have that they don’t really want to make changes to their system, just because they don’t understand
it, or they have to grant incredible amounts of permission.
So to me, actually, Change Manager is one of those amazing diamonds amongst the AWS services that are just
completely lost in that coal mine of really badly named services. Change Manager is actually called AWS Systems
Manager Change Manager. It’s like, you wouldn’t even think about looking for a name like that across the AWS
catalog. And as David said, it’s very deep inside the Systems Manager suite of little “servicelets,” if I can
call them that.
But the reality is that Change Manager has changed the game for how changes can be implemented within your
account. The fact that you have a fully auditable, traceable mechanism of changes being made in the system, and
it’s an AWS-recommended way of doing it, is quite remarkable. The fact that very few people know about it is
really sad, but that’s what we want to share with everyone. There is this amazing service that everyone should
really be using to implement changes.
You create these SSM documents, which are actually nothing fancy. They are basically scripts, called Systems
Manager documents, that are run by Systems Manager agents, or SSM agents. And they will execute these in a very
systematic, automated manner like we discussed last time. And the amazing part over here is that you are put in
control, not the third party tool, not a third party person, but you, as the customer, the owner of the account,
get control to actually look at every SSM document, approve it, and then decide what requests get executed. So
you get a level of control that, in the past, has been virtually impossible. And Change Manager literally has
changed the game of how you make changes to your AWS accounts.
And you talk about needing this level of trust in order to embrace this technology and this process. When we
think about that, coming from the third-party vendor who we’re going to hire, or the tool, we want to know that
they’ve fought through, or they’ve been through the fires of change hell and trying to get to it. Tell us why we
should trust that they’ve encountered all of those elements, and some of the pitfalls they’ve discovered in
their process of creating this service.
Yeah. So when I started this process, it was really sort of a challenge that was thrown at me: “Hey, we, at
ESW Capital, have tens of thousands of accounts, tens of millions of dollars of AWS spend. And we know there’s a
ton of waste in there. Well, go find it.” I got a team to do that, to help me, but it was very open-ended to
start out with, which meant that in the beginning, for a few weeks, I was kind of floundering around, trying to
figure out where to find the savings. So after a couple of weeks of trying to figure out everything going on
across all of our accounts, and all that, we started to see certain patterns.
One, a big pattern we saw is when you are buying the wrong type of thing in AWS… And I’ll explain what I mean
by that. Because AWS has so many services, everything becomes quite granular, right? If you need storage, you
pay for S3 or you pay for EBS. If you need compute, you pay for… Well, there’s a whole range of options, but
the most common option is, that you pay for instances in EC2. What happens sometimes with some of our older
software is you end up paying for compute when, really, what you needed was storage. And that’s a huge savings
opportunity, because compute is typically a few orders of magnitude more expensive than storage. So that’s one
big area where we found that. Because of the way our software was running on certain servers, we basically were
incorrectly paying for the wrong type of resource.
And one thing that AWS lets you do is split that out, right? So if you can actually break out your storage and
put it in the right medium… It depends on your software, but typically, that’s going to be S3 or EFS. Then,
you can dramatically lower your costs, both on that storage directly, but more importantly, the gains are from
when you can then reduce your compute. So, that was one big pattern that emerged pretty quickly.
Another one that came out is database consolidation, right? So when we buy all these products, they all have
their own database. Some of them have many, many databases. We made an acquisition where the product had over
2000 databases. And one key insight is that, if you can take all those databases, which are all running on
some sort of container, whether within RDS or self-hosted, and consolidate them onto a huge server, you
actually save money. Because when each one is running on its own, in its
own dedicated compute and storage, typically, you need to leave something around 70% headroom to account for
periodic spikes on each of those individual databases.
But when you consolidate them onto a huge server… Big server, many cores, tons of memory… You only really
need to leave about 20% headroom. Because each individual spike that comes on each of those little databases,
first of all, doesn’t need that 70% on the huge server; relative to that huge server, 20% is enough.
And also, those spikes typically don’t happen at the same time. This is something… The insight comes
from the Erlang distribution. And because those spikes don’t happen at the same time, you need less space, and
you can run all of those databases for dramatically less. We’ve seen savings of around 65 to 70% from doing that
sort of consolidation.
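The headroom arithmetic behind that consolidation claim can be sketched directly. The database count and per-database load below are made-up illustration numbers; only the 70%-versus-20% headroom figures come from the conversation:

```python
def provisioned_capacity(steady_load: float, headroom: float) -> float:
    """Capacity to provision so that the steady load still leaves `headroom` free."""
    return steady_load / (1.0 - headroom)

n_dbs, steady_per_db = 100, 1.0  # hypothetical: 100 DBs, 1 vCPU steady load each

# Run separately, each database keeps ~70% headroom for its own spikes.
separate = n_dbs * provisioned_capacity(steady_per_db, headroom=0.70)

# Consolidated on one big server, ~20% shared headroom is enough,
# because the individual spikes rarely coincide.
consolidated = provisioned_capacity(n_dbs * steady_per_db, headroom=0.20)

print(f"separate: {separate:.0f} vCPUs, consolidated: {consolidated:.0f} vCPUs")
print(f"reduction: {1 - consolidated / separate:.0%}")
```

With these inputs the reduction lands in the low 60s of percent, in the same neighborhood as the 65 to 70% savings quoted above.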
Yeah, that’s actually quite incredible. It’s remarkable: every time we’ve looked at a data center or brought
stuff into AWS, historically, the average utilization of resources in enterprise software has been less than
2%, which is shocking. They plan so much for the odd spike that they might see that the waste you see over
there is quite remarkable. And like
David was talking about, the Erlang model. This is basically the same mechanism that, let’s say, your internet
ISP, or call centers, use to figure out what capacity they have and how many subscribers they can support. So
for example, if there are a thousand people who each have a one Gbps connection for their internet, you don’t
really need a one terabit pipe to support that, because every ISP knows that not everybody is going to be
using the internet at one Gbps all at the same time.
So there is a mathematical model, the Erlang model, that allows you to figure out how many customers
you can have for a certain pipe. And the same model really can be applied to resource planning, when you take a
large number of very small spike-y workloads, and then consolidate them on some very, very large instances. At
ESW, we’ve actually got some Aurora databases. Well, Aurora is literally a database built from the ground up
for the Cloud. And we have packed in over 2000 schemas on a single cluster. And the beauty of that
is, we would’ve had 2000 different servers, with all kinds of crazy sporadic workloads, with very little
utilization overall, or on average. And we’ve been able to put them all and consolidate them on these super
large clusters. It’s basically served us incredibly well, in terms of costs and performance at the same time.
Because each one of these little workloads, when they achieve a spike, they get to use the full power of that
super large machine that they would otherwise have not been able to use.
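A toy version of that Erlang-style reasoning, using a plain binomial model rather than the full Erlang formulas: if each subscriber is independently active only a fraction of the time, the capacity needed to cover almost every moment is far below the worst case. The 5% activity figure is an assumption for illustration.

```python
from math import comb

def capacity_for(n_users: int, p_active: float, target: float) -> int:
    """Smallest capacity c (in per-user units) with
    P(number of simultaneously active users <= c) >= target,
    assuming users are independent (binomial model)."""
    cumulative = 0.0
    for c in range(n_users + 1):
        cumulative += comb(n_users, c) * p_active**c * (1 - p_active) ** (n_users - c)
        if cumulative >= target:
            return c
    return n_users

# Hypothetical: 1,000 subscribers on 1 Gbps links, each active 5% of the time.
needed = capacity_for(1_000, 0.05, 0.999)
print(f"~{needed} Gbps covers 99.9% of moments, vs. 1,000 Gbps worst case")
```

The same shape of calculation applies to packing spiky database workloads onto one big cluster: the more independent workloads you pool, the smaller the total headroom you need relative to the sum of their peaks.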
Okay. One of the experiments that we tried early on, that… We tried to take X1e.32xl machines. Now, these are
incredibly large AWS machines. They have 128 cores, and they have four terabytes of RAM. That’s how large these
are. And we took a whole lot of Docker containers that we started packing into these super large machines. Now,
of course, we ran into one very technical problem, where Kubernetes, by default, will not allow you to have
more than 110 containers per node. But barring that, it actually brought about so much cost
savings for us, as we started packing in these workloads on single, super large machines.
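The 110-per-node limit is the Kubernetes default maximum pods per node, and it changes the packing arithmetic: on a machine the size of an x1e.32xlarge, the pod-count cap, rather than CPU or memory, can be what forces extra nodes. A sketch with hypothetical container sizes:

```python
from math import ceil

def nodes_needed(n_containers: int, cpu_per: float, mem_gb_per: float,
                 node_cpu: float = 128, node_mem_gb: float = 3904,
                 max_pods: int = 110) -> int:
    """Nodes required to host the containers; the binding constraint may be
    CPU, memory, or the Kubernetes default max-pods-per-node limit (110).
    Defaults approximate an x1e.32xlarge (128 vCPUs, ~3,904 GiB RAM)."""
    return max(
        ceil(n_containers * cpu_per / node_cpu),
        ceil(n_containers * mem_gb_per / node_mem_gb),
        ceil(n_containers / max_pods),
    )

# Hypothetical workload: 500 small containers at 0.25 vCPU / 1 GiB each.
print(nodes_needed(500, 0.25, 1.0))  # 5: the pod-count cap binds, not CPU or memory
```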
We started seeing this pattern across the board. When you have a certain kind of resource, and you have
hundreds, if not thousands, of these, if you can figure out a way to take a super large machine, a super large
instance, and
start consolidating them, it brings huge cost savings. And if that service is managed by AWS, it makes even more
sense, because they take care of all the risk. They take care of all the management headaches. And you can
literally push each one of these to their very brink, and make sure that you are leveraging the resources to the fullest.
Well, I will just add on to that and say, this DB consolidation is an interesting example. It’s right
on the border between the resource level of cost savings and a product level of change, right? Because
typically, a DB is going to be pretty well-defined from your product, but there is going to be code you’re
going to need to change. So it’s sort of the lightest level of where you’re going to have to go into your
product and probably change your DB configuration, depending on your product. Hopefully, it won’t be too hard
if your product’s code is well-organized. But it’s kind of right at that level of division between those two
cost areas. One last area, I think, worth mentioning, as an example of the harder type of savings, is when you
go all-in on an AWS service, even more so, right?
So one area that we have seen is where older products will have their own custom analytics and reporting, an
area where they crunch the numbers. They may have written it entirely themselves, or maybe they are
leveraging an older type of analytic technology. And with some AWS services, like Glue, and Athena, backed by
S3, re-doing those analytics and that reporting on those technologies leads to huge, huge cost savings. Order of
magnitude cost savings.
I mean, running Athena on S3 is pennies, typically, for storing the data in S3. And then you’re charged per
query in Athena, and you can even get the UI with QuickSight. So that’s a very common stack we go to. But this
kind of savings requires a lot of coding, right? So we did do this on one product, and we had a team of coders
working on it for about a quarter, and they went through and just rewrote every single report that was
previously running on Hadoop, with a custom UI, and converted it over. And we saved a whole bunch of money.
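A back-of-the-envelope version of that “pennies” claim, using rough list prices (about $0.023 per GB-month for S3 Standard and $5 per TB scanned for Athena; check current pricing, since these change and vary by region):

```python
def monthly_analytics_cost(data_gb: float, queries_per_month: int,
                           gb_scanned_per_query: float,
                           s3_per_gb_month: float = 0.023,
                           athena_per_tb_scanned: float = 5.0) -> float:
    """Rough monthly cost of an S3 + Athena reporting stack at list prices."""
    storage = data_gb * s3_per_gb_month
    queries = queries_per_month * gb_scanned_per_query / 1024 * athena_per_tb_scanned
    return storage + queries

# Hypothetical: 500 GB of report data, 1,000 queries/month scanning 2 GB each.
print(f"~${monthly_analytics_cost(500, 1_000, 2.0):.2f}/month")
```

Even this made-up workload comes out around twenty dollars a month, which is why replacing a self-managed Hadoop cluster with this stack can be an order-of-magnitude saving.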
And so, some of these larger cost optimization projects, like you just mentioned, like… We’re consolidating
databases, we’re rewriting how our product is accessing this information. What types of organizations, or maybe
it’s the size of the organizations… Does it make sense to start looking into these types of initiatives?
What’s the profile of an organization that should put this type of cost optimization at the top of its strategic priorities?
Everybody should be exercising good cost hygiene, and be looking to save money. I recognize that not everybody
might want to make the investments to do the harder types of cost savings. Certainly, the easy stuff is
definitely worth doing. And the resource-type optimizations are worth doing, especially for stuff that’s not
used at all, or barely used. These are just easy. You should definitely go after these and find these. I mean,
whether it’s your developers that [inaudible 00:25:48] up some test machines and then forgot about them, or
data that’s just accruing because of the way your SaaS product was written, and you never bothered to worry
about deleting data. I mean, there’s just tons of easy stuff to go and get.
Actually, one thing we’re doing internally, which is on the roadmap for CloudFix a little ways out, is cost
anomaly detection. So we’re building on an AWS service that is backed by machine learning. And AWS has a very
nice service that… You can turn on your [inaudible 00:26:19] accounts, and it will alert you if it detects a
cost anomaly. The problem is, you got to do something. Right? The problem is, you got to check it. So we built
on top of that service. And now, we have been running this internally for a while. If AWS fires off an
anomaly above a certain threshold (we’re using something on the order of $50 a day, but it’s completely
configurable), we will create a ticket, and somebody has to go investigate it and figure out what’s going on.
I mean, it’s hard to count these savings because they’re preventative, right? They never actually were on your
bill. So it takes a few assumptions about how long that cost anomaly would have run, and some other types of
decisions about how you want to count this money. But we have already saved millions just by doing this cost
anomaly detection. So back to your point, some of this stuff is so easy to go and get, and some of it’s just
good practice, that really, everybody should be doing it.
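That triage loop (an anomaly fires, and anything above a configurable daily threshold becomes a ticket) might look like this sketch. The anomaly records are simplified stand-ins for what AWS Cost Anomaly Detection reports, and the ticketing side is just a print:

```python
DAILY_IMPACT_THRESHOLD = 50.0  # dollars per day; completely configurable

def anomalies_to_ticket(anomalies: list[dict],
                        threshold: float = DAILY_IMPACT_THRESHOLD) -> list[dict]:
    """Keep only anomalies whose estimated daily impact warrants a ticket."""
    return [a for a in anomalies if a["impact_per_day"] > threshold]

# Simplified stand-ins for detected anomalies (fields are illustrative).
detected = [
    {"service": "AmazonEC2", "impact_per_day": 430.0},
    {"service": "AmazonS3", "impact_per_day": 12.5},
]
for anomaly in anomalies_to_ticket(detected):
    print(f"TICKET: investigate {anomaly['service']} "
          f"(+${anomaly['impact_per_day']:.2f}/day)")
```

The key point from the conversation is the last step: the alert alone saves nothing; the ticket forces somebody to actually investigate.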
So how are you organizing your team? So, say I’m listening in on this conversation, and I’m like, “You know
what? We are not doing this. We need a team like David’s team, that’s going to focus on this, find those
anomalies.” And, to your point, actually do something about it. What would be a best practice for resourcing
the things that you said, attacking those preventative opportunities? What does that team look like?
That’s a great question. So I can tell you what it looks like for us. It’s not a big team. It’s a team of about
five. This is all after CloudFix. This is assuming CloudFix already ran and got you your five to 10%. So after
that, what we do is, we will go and look, account-wise, and look for certain types of resources across different
accounts. And we’ve got that down to a bit of a science: the most common types of resources where we see cost
spikes, or just waste. Not only spikes; just waste to go and get and clean up.
We have sort of a standard work unit for going and checking, say, Elasticsearch/OpenSearch, right? We see where
people will go and configure a cluster for dev, and they’ll have a dedicated master node in there that you just
don’t need, and you can just get rid of it. Bring the cluster down to one node, especially for dev or QA. It
works just fine for what you’re using it for, and you just cut costs by 30% or more. Right?
So, that’s just one example.
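That dev-cluster example is easy to sanity-check with arithmetic. The hourly node rates below are made up; only the shape of the change (dropping unneeded dedicated master nodes from a dev/QA cluster) comes from the conversation:

```python
HOURS_PER_MONTH = 730

def cluster_monthly_cost(data_nodes: int, master_nodes: int,
                         data_rate: float, master_rate: float) -> float:
    """Monthly cost of a search cluster at hypothetical hourly node rates."""
    return (data_nodes * data_rate + master_nodes * master_rate) * HOURS_PER_MONTH

# Hypothetical dev cluster: 2 data nodes plus 3 dedicated masters it doesn't need.
before = cluster_monthly_cost(2, 3, data_rate=0.20, master_rate=0.10)
after = cluster_monthly_cost(2, 0, data_rate=0.20, master_rate=0.10)
print(f"monthly cost cut: {1 - after / before:.0%}")
```

With these made-up rates, removing the masters alone cuts the bill by over 40%, consistent with the “30% or more” figure.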
So we have a set of these work units that are oriented around different common resources, that have either waste
or poor usage patterns, and the team will go and run those. And they’ll go and look for those sorts of
resources, just on an ongoing basis. And then, the other thing we do is, we go and we find things at the product
level that become the code level cost reduction opportunities. But that typically requires almost an anatomy of
a product, right? So we’ll go look at a product, and we’ll say, “Hey, this product is spending $3 million
annually,” right? So we will go and do the anatomy of that product and say… Just to understand where that
spend is, how it’s broken down, and where those opportunities are for cutting that within the product.
What are your success metrics for this team? How are they evaluated?
In cost, it’s easy. Right? Because cost, it’s dollars. Right? It’s the easiest metric… We touched on the
one-year ROI that we shoot for. That’s not just on the team’s activities itself. That includes the engineering
time. So success right now is, basically, that we identify cost savings where… The cost of identifying the
cost savings, and the cost of getting and executing those cost savings, is less than the cost savings. That’s our success metric.
So ideally, this group is going to work themselves out of a job, if they do it right?
Absolutely. We are starting to see a plateau, right, with the low-hanging fruit, the easy stuff that we went and
got over the last year. But I will say, this has gone on a lot longer than I expected. When I started this over a
year ago, I thought, “Hey, maybe we’ll go cut some waste for a quarter,” right? Or maybe two quarters. And we’re
still going strong. We have a backlog of work that extends into Q3. ESW has a very large portfolio of products,
but a lot of those products are maybe a bit smaller. Each individual product’s annual spend is less. So
in that long tail of products, it may not be cost-effective to get the cost savings, right? You might not hit
that on your ROI.
So we’re doing two things. We’re looking for ways to be more efficient ourselves, right? We’re looking for ways
to identify cost savings that cost less, streamline our own work, and make it more efficient. The other thing is
that we’re turning this into a product because we think that this is something that is very valuable for
external companies as well. So if we succeed in that, then we won’t be out of a job.
I think David comes in with a relatively shorter-term stint in the cost optimization world at ESW. From my 14,
15-year experience working with AWS on this front, this is never-ending. Cost savings are just… As long as
you’re using AWS, you’re always going to have scope for optimizing costs. It’s primarily because there
are three dimensions in which things are changing.
The first one is that AWS is constantly creating new services. And if you are wise in leveraging these services,
you’re going to be using all of the new stuff that AWS is coming up with. And with every new service, there is
almost a life cycle, where you start using the service and you start seeing the benefits of it. It seems very
cheap, initially. And then, suddenly, as you use more of it, your cost spikes, and then you start
looking into why the cost spiked. Are you following all the best practices? You start optimizing the costs
over a period of time. It’s a natural life cycle of every new service that AWS has ever created. Right? So
that’s dimension number one.
Dimension number two is your dev and staging environments… Your developers are constantly adding new
resources. None of those resources are static. They’re constantly launching new services. They’re constantly
doing something new, which adds new resources that you have to keep looking at. You have to be on top
of things. You have to make sure you have all the policies in place so that you can control costs when they’re
launching these new resources, and so on.
And then the third dimension is that almost all the cost optimization stuff is trying to keep pace with
everything that AWS is coming up with. You almost feel like you’re at least six months or a year behind all the
new stuff that AWS is working on. Your cost optimization team is catching up with stuff. There’s new stuff
that’s being discovered. There’s new stuff being implemented. And there are going to be new runbooks you have
to create so that your teams and your engineering org can start adopting them, and they know how to deal with
it and how to safely execute those changes. So, given all these dimensions of change, I think there’s always
going to be room for cost optimization. As long as you’re spending money on AWS, you’re always going to need a
mechanism to save money, because there is always going to be waste.
Right? And plus, at ESW, we're always making new acquisitions as well.
That’s very true.
And that seems in line, also, with the process of going through the internal pains and gains of figuring it out,
and then productizing that and turning it into something that our customers can use, because obviously, ESW is
not the only one running into these same problems. Right?
Awesome. All right. To wrap up, David, tell us the top two tips our listeners should know about cost optimization.
The easiest cost optimization is to find stuff you're not using. We just finished a spec… Again, this is an internal spec. And something like 60% of the savings it identified was just stuff that people left running, because the advantage of the Cloud is that you click a button and you turn something on. There's no friction, which is great. That's number one. Tip number two, I think, is what Rahul said: always be looking at what AWS is rolling out, because there are easy savings to get there too. Those two tips are just the easy stuff, and they'll get you a ton of savings to start with.
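To make tip number one concrete (this is a sketch, not the speakers' internal tooling): a sweep for resources nobody is using can be as simple as flagging EBS volumes left in the "available" state, meaning they're not attached to any instance. The helper below operates on plain dicts shaped like the output of boto3's `ec2.describe_volumes()`, so the logic is easy to test; the sample volume IDs are hypothetical.

```python
# Sketch: flag unattached EBS volumes from describe_volumes-style data.
# In a real sweep you'd pass in the 'Volumes' list returned by
# boto3.client("ec2").describe_volumes(); here we use plain dicts.

def find_unattached_volumes(volumes):
    """Return the IDs of volumes not attached to any instance."""
    return [
        v["VolumeId"]
        for v in volumes
        if v.get("State") == "available"  # 'available' means not in use
    ]

sample = [
    {"VolumeId": "vol-1", "State": "in-use"},
    {"VolumeId": "vol-2", "State": "available"},
    {"VolumeId": "vol-3", "State": "available"},
]
print(find_unattached_volumes(sample))  # → ['vol-2', 'vol-3']
```

The same pattern extends to stopped instances, idle load balancers, and old snapshots: pull the inventory, filter for the "left running" signal, and review the list before deleting anything.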
And Rahul, what do you think the top two ways would be to reduce the fear of risk in an organization that's going after some of these more challenging cost optimization initiatives?
Yeah. So, to that, my tip number one would be: start leveraging Change Manager, which is AWS's best-practice tool for how you should implement changes. Then I'll address one other aspect. David already covered all the simple stuff, the easy ways to save money. But for folks who are attempting some of the hardest stuff, which is code-level changes, product-level changes, or architecture, there's a simple framework you can use to mitigate some of the risks. Number one: look at your product, and if you can find seams within it that are easily replaceable by standard out-of-the-box AWS services, go ahead and perform that one change.
So for example, if your product does full-text search, and that's nicely isolated as a service, take that and replace it with something like Elasticsearch, OpenSearch, or Kendra, whichever serves your purpose. But only if you have a very clear seam. Or if analytics is a completely separate subsystem, you can swap that out for S3 with Athena, QuickSight, or Redshift. Whatever your latency requirements are on your queries, pick one. As long as you can do that, it's a great way to save money.
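As an illustration of the "analytics seam" replacement Rahul describes (the database, bucket, and query below are hypothetical, not from the episode): once the data lands in S3, the analytics subsystem can shrink to a query submitted to Athena. The helper just builds the keyword arguments for boto3's `athena.start_query_execution()`, so it can be checked without AWS credentials; the actual call is shown in a comment.

```python
# Sketch of replacing a self-managed analytics subsystem with S3 + Athena.
# Names here (billing, example-analytics-bucket) are made up for illustration.

def build_athena_request(query, database, output_bucket):
    """Build kwargs for boto3 athena.start_query_execution()."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{output_bucket}/results/"
        },
    }

req = build_athena_request(
    "SELECT region, SUM(cost) FROM usage GROUP BY region",
    database="billing",
    output_bucket="example-analytics-bucket",
)
# To actually run it:
#   athena = boto3.client("athena")
#   athena.start_query_execution(**req)
print(req["ResultConfiguration"]["OutputLocation"])
```

Picking Athena versus Redshift here is exactly the latency trade-off mentioned above: Athena is serverless and pay-per-query, while Redshift keeps a warm cluster for consistently fast interactive queries.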
If you cannot find seams within your product, I would say you might be better off rewriting your core value proposition completely from scratch with AWS-native services, because trying to get 100% feature parity with an old monolithic code base that has accumulated a lot of cruft over the years is incredibly hard. You're better off taking a step back, identifying the core value proposition of that particular monolith or module, and asking, "What would it mean to implement that value proposition with native AWS APIs?" We usually find that if you really do that exercise, you don't need more than 10,000 lines of code to implement most of these kinds of solutions. So I would say, don't try to rewrite a monolith feature for feature, because you'd end up wasting a lot of time and effort chasing savings that probably aren't there.
The build-versus-buy challenge.
Awesome. Well, guys, this has been great. Thank you, David, for coming in and giving us a little more insight into some of the other areas of cost optimization. We've talked high level, and we've talked down to the database consolidation level. I think we hit all the angles, so I appreciate you coming out, David. Thanks for being here.
Absolutely. It’s been a pleasure.
Thanks, everyone, for listening today. If you enjoyed our podcast, please be sure to rate, review and subscribe.
See you next time on AWS Insiders.