AWS Made Easy

AWS Insiders Podcast: Episode 5 – AWS Cost Optimization 101

With Badri Varadarajan, EVP of product at CloudFix

On this episode, Badri Varadarajan, EVP of product at CloudFix, and Rahul Subramaniam, CEO of CloudFix, AWS superfan, and CTO of ESW Capital, dive deep into AWS cost optimization best practices: how to apply cost optimization principles when designing, configuring, and maintaining workloads in AWS Cloud environments.

Speakers

Rahul Subramaniam

AWS superfan and CTO of ESW Capital

Rahul is currently the CEO of CloudFix and DevGraph and serves as the Head of Innovation at ESW Capital. Over the last 13 years, Rahul has acquired and transformed 140+ software products.

Badri Varadarajan

EVP of product at CloudFix

After a few years in a corporate R&D role in Silicon Valley, Badri created his own startup using cameras and AI to deliver insights on customer interactions with retail displays. In 2019, he discovered Trilogy and the rest is history. He’s worked on a CRM platform that short-circuits 10 years of salesforce innovations and gets straight to the heart of what’s possible today with the latest AWS technologies.

Transcript

  1. Dionn Schaffner

    Welcome to the podcast. So excited to have Rahul here today, as well as Badri. Badri, how are you doing today?

  2. Badri Varadarajan

    Very well. Sunny in California.

  3. Dionn Schaffner

    Awesome. Fabulous. I just have to ask, what made you decide to focus your entire career on AWS? I mean, how did
    you get involved in this? Rahul, tell us how it started. I mean, you could have been a rocket scientist, you
    could have been in some deep dark lab somewhere. Why AWS?

  4. Rahul Subramaniam

    You’re close. I mean, I did major in physics in my undergrad, so astrophysics would probably have been it, but
    before… Very early on in 2007-2008, I was grappling with a whole lot of infrastructure issues, and that’s when I
    discovered AWS. And the fact that infrastructure was being made available literally with a simple API call, just
    blew my mind. And the more I dived into it, the more I used it and the more I interacted with AWS. I got into a
    position where I was breaking almost every AWS service as they were releasing it, which had them call me pretty
    much every week about something that I broke, and just interacting with the amazingly smart folks over there
    over the years just got me hooked. I was then part of every new service they created. I was involved in trying
    everything out that they came up with. And they just became an integral part of how we did business. So I think
    our entire business got very deeply integrated with AWS, as well.

  5. Dionn Schaffner

    So you were the troublemaker is what you’re saying. You were the one causing all the trouble on the outside, so
    they’re like, “You know what, we got to get this guy in closer. So, let’s bring him into the fold a little bit.”

  6. Rahul Subramaniam

    I think I’d like to phrase it as me being that advanced tester and we collaboratively made the services good to
    the benefit of both parties, so it was a win-win.

  7. Dionn Schaffner

    That’s great. Badri, how about you? How did you end up investing in AWS and cost optimization as what you are
    living and breathing all day long?

  8. Badri Varadarajan

    I mostly blame Rahul for that. AWS was a bit of an acquired taste for me. I spent the initial part of my career
    working on the network edge, building infrastructure and algorithms that got deployed and run where you are, be
    it for connectivity or computer vision analysis and so on. And so, I figured, “If you can’t beat them, join
    them.”

  9. Dionn Schaffner

    Okay. But let’s talk a little bit more about cost optimization. There was a time before cost optimization, there
    was a dark time, we don’t speak of it much, there was lots of work. Rahul, tell us what managing 45,000 plus
    accounts looked like without some way to automate cost and performance. Take us through the dark times.

  10. Rahul Subramaniam

    Cost optimization, for me, with AWS, actually started way before we had 45,000 accounts. It started when we had
    one account, and I’m talking about early 2007. At that time, shockingly, AWS didn’t even have IAM that everyone
    is familiar with, which allows you to manage all your users, access control, and stuff around this. Back then
    you had one account. So, one username, one password, and your entire organization would have to be given that
    username and password to operate on it.

  11. Dionn Schaffner

    Yikes.

  12. Rahul Subramaniam

    Okay? And can you imagine that world back in 2007?

  13. Dionn Schaffner

    Ugh.

  14. Rahul Subramaniam

    And we had over a thousand engineers that we wanted to enable on AWS. So we had this big nightmare of a scenario
    where we had to give away our master username and password, which had all of our credit… Back then it was all
    credit cards only. So, it had our corporate credit card punched in over there, and 1000 people could do whatever
    they wanted on one account. There was no tagging, there was nothing. It was basically… The setup was primed for
    absolute chaos.

  15. Rahul Subramaniam

    And so, in 2007, I wrote our first system, which acted like IAM. And it did two things. One, it allowed users to
    log into a portal where they could request whatever resources they wanted. And it acted as a proxy to our one
    single AWS account. But the second thing that we found very soon was that people were just launching instances
    willy-nilly and never turning them off. And our credit card hit its limit so fast that we would have to
    re-fund the card multiple times in a month, which is just absolutely crazy.

  16. Dionn Schaffner

    Wow.

  17. Rahul Subramaniam

    So, one of the first cost optimization measures that we put in place was, we asked every person to put in their
    eight-hour shift into the portal and we would automatically turn off or hibernate those machines that they had
    launched when it was not their peak working hours.

  18. Dionn Schaffner

    Uh-huh.

  19. Rahul Subramaniam

    If they wanted, they could go back and turn it on if they were logging in at some odd hours. But just that cut
    our AWS costs by 66% because you’re only using it for 8 hours out of the 24 hours, right?

  20. Dionn Schaffner

    66%?

  21. Rahul Subramaniam

    Yeah.

  22. Dionn Schaffner

    Wow.

  23. Rahul Subramaniam

    That was the first cost optimization system I wrote back in 2007 and it’s been a constant journey ever since. If
    you’re spending money on a system, there’s always room to optimize those costs.
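
    The shift-based shutdown described above can be sketched in a few lines. This is a minimal illustration, not
    the 2007 system itself: the function names are hypothetical, and it assumes cost scales linearly with running
    hours (in practice, storage attached to a stopped machine is typically still billed).

```python
from datetime import time

HOURS_PER_DAY = 24


def savings_fraction(shift_hours: float) -> float:
    """Fraction of instance-hours avoided if a machine runs only during the shift."""
    return 1 - shift_hours / HOURS_PER_DAY


def should_be_running(now: time, shift_start: time, shift_end: time) -> bool:
    """True if `now` falls inside the declared shift (handles overnight shifts)."""
    if shift_start <= shift_end:
        return shift_start <= now < shift_end
    # Shift wraps past midnight, e.g. 22:00 to 06:00.
    return now >= shift_start or now < shift_end


# An 8-hour shift out of 24 hours avoids two thirds of the running hours.
print(f"{savings_fraction(8):.1%}")  # 66.7%
```

    A scheduler built on this idea would call something like `should_be_running` periodically and stop or
    hibernate any machine for which it returns False, while still letting the owner turn it back on manually.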

  24. Dionn Schaffner

    I like that because as we look at the cost optimization, we talk about that, it’s like, is it just for big
    business? Is it something that even a small startup should be concerned with? But, even if it’s just one
    account, that’s a great piece of knowledge to understand that across the spectrum, there are opportunities for
    everybody to reach some benefits from cost optimization. What do you think is the top thing customers and
    clients should know about that time in the dark ages? What do you wish everyone knew that you suffered through
    the hard way but you know now?

  25. Rahul Subramaniam

    I think the thing that we learned very quickly was the foundations of why we made the big bet on AWS. And that
    is that they are innovating at a pace that is just so remarkable. Everything that we thought of as a gap was
    very quickly closed in a matter of a year or two. So if you were ever making a long-term bet, you would bet
    on AWS, because they have a track record of constantly working on customer problems and
    turning all these standard utility functions into amazing services that are just commodity services you can use
    with a simple API, and whatever gaps are there, you can be assured that if you bring it to their notice, it gets
    solved. So with that, it becomes a no-brainer to make the long-term bet on AWS. And I think people are still
    doubtful today, but having lived through this over the last 14, 15 years, I have the experience to feel very
    confident about that long-term bet.

  26. Dionn Schaffner

    And maybe Badri, you can help us answer this question. Back to cost optimization, some businesses just aren’t
    paying attention to it yet, right? They say, “It’s not a priority. We’ve got other strategic initiatives going
    on.” What do you say back to those folks, and how do you get them to recognize the importance of cost
    optimization?

  27. Badri Varadarajan

    In a way, I’m kind of sympathetic to that idea that cost optimization isn’t a problem till it is, unfortunately.
    I think there is this great article about Dropbox, I think, where as long as the revenue was growing and the
    market was rewarding growth in valuations, they didn’t worry a jot about cost optimization, and arguably
    rightly so. But then the problem is, once that curve flattens, you don’t want to panic and sort of get
    whiplash, where you say, “Cost isn’t the problem,” and suddenly, the next quarter, it is the number one
    thing: “We don’t care. We shut down all our innovation projects.” I think you want sort of a wholesome way to
    think about cost optimization. If you keep doing cost optimization, as a matter of course, first, it’s good
    hygiene. Second, you build up your cost optimization muscles organizationally. And when it becomes a real
    problem, then you can sort of hit the ground running and take proportional measures as opposed to just going
    from not worrying about it at all to it being the only thing you worry about.

  28. Dionn Schaffner

    I like that. “Building the cost optimization muscles within the organization,” love that. So when the time
    comes, you can flex big, you’re ready. And to follow up with that, one of the other things we hear then though
    is, people are like, “Hey, yeah, we are working on cost optimization. We have an internal team. We’ve got some
    internal tools.” How do you balance that challenge?

  29. Badri Varadarajan

    One way to approach that is to ensure that you’re not just doing cost optimization by listing a bunch of tasks.
    You want to sort of go towards a goal. What you get from doing this over time is just the proof that such a
    thing is possible, right? I mean, you want to do the four-minute mile. You want to understand that costs can be reduced
    organically. And our framework to think of it is, if you’re just starting with your cost optimization journey,
    you can get to 50 to 60% cost reduction. Now and then, folks like Rahul work some magic and get 66% by doing one
    thing, but that’s the exception, not the rule. I mean, you sort of, you will get to 10% by doing something
    simple. And then the next 20% ends up being a little bit more complicated. And then the last 30% involves a
    bunch of sprints, which go deep, but it’s healthy for you to know, starting, that it is possible. It’s not a
    fool’s errand. You will get there if you do it systematically and choose your projects well.

  30. Dionn Schaffner

    Well, how do you know when you’re doing it wrong? How do you know if you’re not choosing your projects well?
    What does that look like?

  31. Badri Varadarajan

    What it looks like is quarter after quarter of potential savings. I mean, it’s very easy for you to either hire
    a vendor or do it yourself and get an impressive-looking report that says your cost can be lowered by 60%. The
    problem is that that’s not realizable. All it does is, every quarter, you feel bad about what you did not
    achieve last year.

  32. Dionn Schaffner

    There’s the money left on the table. Dang.

  33. Badri Varadarajan

    That’s right. I’m now reading this book called Switch, about organizational change. You don’t want to just paint
    a big picture and not take the first step. It’s healthy for you to sort of feed your reptilian brain by booking
    small victories. If you have a grand plan but never do anything, you’re probably doing it wrong. You should be able
    to do organic incremental improvements.

  34. Dionn Schaffner

    Rahul, I feel like you probably have some war stories about this.

  35. Rahul Subramaniam

    Yeah. Early on, when AWS had just started, I think, as Badri said, it was possible to make a few changes that
    would get big returns because there were a lot of gaps in the services, the infrastructure, the API, and stuff
    like that. But over the last 15 years, AWS has built so much maturity around how they build up their services
    that getting those big savings by doing one thing is just incredibly hard. Early in the days of our cost
    optimization efforts, we ran into a bunch of scenarios, for example, migrating databases or moving over
    applications to serverless. So, a large number of applications that we acquired were primarily on-premise
    monolithic applications. And we tried to switch them all over to services like Lambda when Lambda first came
    around.

  36. Rahul Subramaniam

    Lambda is not designed to deal with monolithic applications like the ones that we had. Right? And that meant
    that we were embarking on this major surgery on our applications, trying to replicate the same function in a
    microservices pattern. And suddenly, we had so much chaos that we just didn’t know how to manage it. And we had
    several failed exercises like that where either, over some time, we completed maybe 20 or 30% of the
    functionality as we moved over to microservices based on Lambda. Or it was just a non-starter because certain
    things were being done in a certain way that customers were comfortable with, and you would have to change the
    entire mechanics of it as it moved to the serverless world. Right? So, we just couldn’t make all those
    dimensions meet. So, we realized the hard way that those big bang approaches were very few and far between as
    the services matured over some time.

  37. Dionn Schaffner

    Well, and so you tried a ton of cost optimization tools before deciding to build your own, right? So what were
    they missing that made you think, “I can do this better. Let me just sit down and put some stuff together.” What were
    they missing and why did you think you could do it better?

  38. Rahul Subramaniam

    Yeah. First and foremost, I already have the big burden of being responsible for almost two and a half billion
    lines of code that we own across all of the companies in our portfolio. And I had no interest in building a new
    product or building and owning a new codebase. That was not the intention at all. Our default is to go and look
    for every tool out there in the market that could potentially help us solve this problem. So, we did just that.
    We tried out all the tools, but very soon we realized that there were three fundamental problems. Problem number
    one was that most of these tools ended up being visualization tools that were closing the gap on some of the
    visualization problems that AWS had. By the way, all of those are gone. Right now, if you look at AWS tools,
    they pretty much give you all the data you want, but most of the cost optimization tools that you find in the
    market today are still glorified visualization tools that take all the data that’s in AWS and present it to you
    in fancy graphs. Okay?

  39. Rahul Subramaniam

    The bottom line is it became our problem to go figure out how to realize those savings because you have to slice
    and dice all that data and figure out what the insight is, and then go figure out how to realize the savings.

  40. Rahul Subramaniam

    The second problem with these tools was that none of them fixed anything. Even if they provided some insights or
    some recommendations, they didn’t fix anything for us. More often than not, these recommendations were like, “Hey,
    resize your EC2 instances,” or “Why don’t you move to a completely different serverless platform, because the
    per-unit cost there is completely different.” All of those recommendations, while great sounding and the
    recommended savings or potential savings was 50, 60%, just realizing that was incredibly hard because you needed
    to perform major application surgery to achieve even remotely close to those kinds of savings. And we just
    couldn’t get our teams to sign off on or be successful at those major surgeries that they had to perform.

  41. Rahul Subramaniam

    And the third issue with a lot of these tools was that they were just insanely complicated. Just navigating a
    lot of these tools required almost like a Ph.D. in AWS services where you wouldn’t even know… They just kept
    slapping on stuff over the basic visualization that they started with, and you wouldn’t even know where to go
    look for insights even if they provided some insights. And a lot of these tools just got so overly complicated,
    requiring admin permissions to do anything, that it just became a no-go for a large proportion. So the admin
    permissions, and complex UI, became a big issue, as well. And because all of those hurdles were things that made
    a lot of these tools no-gos for us, we had to invest in figuring out a simpler way to realize savings, not just
    talk about potential savings.

  42. Dionn Schaffner

    This is where the rubber hits the road. It’s great in theory, but how do we instantiate and get these results?

  43. Rahul Subramaniam

    By the way, there was a year that we spent trying out all these tools and putting together a SWAT team trying to
    get all the savings. We spent a few million trying to realize these big savings but saved nothing.

  44. Dionn Schaffner

    Well, that’s part of the journey, right? And that led you to CloudFix. Okay, you get one minute to talk about
    CloudFix specifically. Well, you both do. So, Rahul, you go first, and then Badri, you’re going to follow up
    with your comments on CloudFix. Ready? Go.

  45. Rahul Subramaniam

    Very simply, what we did was we looked at all the AWS recommendations that they had made around cost. We
    filtered them down to the ones that we believed were completely non-disruptive and that we could execute
    centrally. And that’s literally what we did. And then, of course, Change Manager came at just about the right
    time, where we were able to use Change Manager as a mechanism to deploy those fixes without needing admin
    credentials. So, in effect, we were fixing the problem instead of just talking about potential. And we did not
    require admin credentials. We followed all of AWS’s best practices and recommendations to realize those savings.
    And that was basically what closed all the gaps for us that we had with the other tools.

  46. Dionn Schaffner

    Badri?

  47. Badri Varadarajan

    I have nothing to add now. AWS [inaudible 00:19:33]. CloudFix is supposed to be the “fix it, don’t talk about
    it” tool. So, that’s it. We’re done.

  48. Rahul Subramaniam

    Yeah. I mean, it’s supposed to be the simplest tool out there. With five clicks, you save 10 to 20%. It’s
    supposed to be simple.

  49. Dionn Schaffner

    I like it. And so let’s talk about the 5 to 20%. Let’s talk about the money. You mentioned that you all spent a
    million dollars trying to get to this product. How much are you saving now by having this tool in your arsenal?
    Maybe you can use percentages if you’re not going to talk real money.

  50. Rahul Subramaniam

    Yeah. Our savings are a combination of what you see in CloudFix today and stuff that will be coming up in
    CloudFix because we run all of these finders and fixes on our setup first before running it or deploying it for
    other customers. And we run it for quite a while. We measure ourselves every week, “How much do we save on an
    annualized basis? What amount of spend do we claw back every week?” That’s our metric, and we measure it in
    dollar terms, in concrete dollar terms.

  51. Rahul Subramaniam

    And today, for our spend, which across AWS customers isn’t very large, we still manage to claw back about a
    quarter of a million dollars a week, which is pretty significant, and this is, again, on an annualized basis.
    But I think, on average, for customers, you could find that, whatever your AWS spend is, when we try to buy a
    company, or when we are evaluating acquisitions, it’s a simple assumption that we make, which is, “Long term, we
    can save 50% via this incremental mechanism, but short term, 10 to 20% is a given.” It is something that we can
    absolutely go on.

  52. Dionn Schaffner

    That’s great. And if you think about it in terms of the cost optimization spectrum, CloudFix is on the easy
    side. Press five steps, and you’re going to get this kind of return. And then you’ve spoken before of the
    really harder problems that take up more of an investment of the organization to sort of go and dig in. Maybe
    Badri, can you tell us maybe, how you balance the cost of your internal resources and the risks against your
    expected ROI of getting these cost optimization results in-house? Is there, like, a magic number? Where do you
    find the tipping point for the business to decide, “Hey, yes, we are going to make this investment with our
    time, with our people, with our resources, and to really dig into this particular cost optimization problem”?
    How do we know when we should jump or when we should just wait?

  53. Badri Varadarajan

    Yeah. That’s a good question. I’d say I’d slice it into different tiers. There’s about 10 to 20% of savings.
    That’s the realm that CloudFix operates in, where your investment should really only be on the tool, not on the
    people managing the tool. The tool should just basically do it. Then the next 30% you get by investing in people
    as well. Essentially, those are the sorts of savings for which you need engineering teams to get involved. You
    want to ensure that your cost optimization does not affect the functionality of the product. So I’d almost
    operate that as an engineering project itself and look at the ROI in those terms, so you do need to account for
    what you’re spending in terms of manpower there, how many hours are people spending, and what features are they
    not shipping because they’re doing that.

  54. Badri Varadarajan

    So I divide the ROI into those two different tiers. And I think one of the reasons CloudFix exists is this
    realization that you can actually get 10 to 20% without involving people and investing brainpower in it. It’s
    just a tool that just does its thing, and you don’t need to do anything beyond click a few buttons and count the
    cash.
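
    The two ROI tiers above can be compared with simple arithmetic. The sketch below is only an illustration:
    the function name and every dollar figure are hypothetical, and it does not price in the features a team
    fails to ship while working on cost optimization.

```python
def net_annual_savings(
    annual_spend: float,
    savings_rate: float,
    engineer_hours: float = 0.0,
    hourly_cost: float = 0.0,
    tool_cost: float = 0.0,
) -> float:
    """Gross savings minus what it cost (tooling plus engineering time) to get them."""
    gross = annual_spend * savings_rate
    return gross - engineer_hours * hourly_cost - tool_cost


spend = 1_000_000  # hypothetical annual AWS spend in dollars

# Tier 1: tool-only savings, no engineering time (all numbers hypothetical).
tool_tier = net_annual_savings(spend, 0.15, tool_cost=20_000)

# Tier 2: engineering-led savings, charged against the hours spent.
people_tier = net_annual_savings(spend, 0.30, engineer_hours=2_000, hourly_cost=100)

print(f"tool tier net:   ${tool_tier:,.0f}")
print(f"people tier net: ${people_tier:,.0f}")
```

    With these made-up inputs the larger headline percentage yields the smaller net figure, which is why the
    engineering tier is worth treating as a project with its own ROI.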

  55. Rahul Subramaniam

    I’ll add one more dimension to this. The non-disruptiveness of what you’re going after is the second dimension
    in this. So I absolutely agree with Badri that the tool versus people is one dimension, but also look at the
    non-disruptiveness of the changes that you’re trying to make. And that also, again, it’s like a quadrant. So
    there are a bunch of non-disruptive changes that do require people’s assets. So for example, there’s a bunch of
    financial engineering or just process-related stuff that can save you a bunch of costs. For example, if you’re
    migrating a bunch of workloads from on-prem to AWS, you should absolutely have someone take lead and sign up for
    the Migration Acceleration Program, where AWS covers a bunch of your costs while you’re migrating all your stuff
    so that you’re not paying double during the migrations. That’s a great way to save a ton of money. Most people
    don’t even know about the Migration Acceleration Program, but it’s the easiest thing to get signed up for. All
    you need to do is tag your resources in a certain way, and you start getting credit from AWS for all of those
    workloads.

  56. Rahul Subramaniam

    Another example is, if you have certain services where your consumption of those services is just insanely
    expensive, you absolutely should have somebody go talk to the product team and see if you can work out a
    discount or a volume discount with that particular product team because AWS does that very often. If they find a
    customer using a particular service far more than anyone else, they will make concessions, and you can always go
    negotiate that.

  57. Rahul Subramaniam

    The third one is leveraging things like Savings Plans, CRIs, and reserved instances, that category. You
    need someone you can dedicate to this who can understand all of it. It’s just a bunch of financial engineering
    where it’s a trade-off between commits and discounts that you get. And you can do that and get up to a 50%
    discount on your spending, depending on how much of a trade-off you’re willing to make in terms of commitment
    versus the discount you get.
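
    The commit-versus-discount trade-off can be made concrete with a deliberately simplified model. It assumes
    you commit to a fixed capacity for the whole term at a discount and pay for it whether or not you use it,
    while on-demand is paid only for the fraction actually used. Real Savings Plan and reserved instance pricing
    varies by term and payment option, so the function names and numbers here are illustrative only.

```python
def break_even_utilization(discount: float) -> float:
    """Utilization above which a commitment beats on-demand in this simplified model.

    Committed cost is (1 - discount) * full_cost regardless of use;
    on-demand cost is utilization * full_cost. They are equal when
    utilization == 1 - discount.
    """
    return 1 - discount


def commitment_is_cheaper(discount: float, utilization: float) -> bool:
    """True if committing costs less than paying on-demand at this utilization."""
    return utilization > break_even_utilization(discount)


# At a 50% discount, the commitment pays off once the capacity is in use
# more than half of the time.
print(break_even_utilization(0.5))      # 0.5
print(commitment_is_cheaper(0.5, 0.8))  # True
print(commitment_is_cheaper(0.5, 0.3))  # False
```

    The deeper the discount, the lower the utilization needed to justify the commitment, which is the judgment
    call the dedicated person has to make.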

  58. Rahul Subramaniam

    And lastly, there is the EDP. And again, I don’t recommend anyone sign the EDP. I treat it almost like a
    handcuff around your hands. I mean, it’s not something I recommend, but that is an instrument of last resort, as
    well, to get a certain amount of discount if you’re willing to make commitments or if you’re really, really,
    really sure of what your spend over the next three to five years is going to look like.

  59. Rahul Subramaniam

    Now, those are all the things that you could do by investing in certain people and getting those cost benefits.
    And they measure near zero on the disruption side of the equation.

  60. Dionn Schaffner

    It really sounds like you need allies within your own organization sometimes to sort of come on board to
    participate in this cost-optimization journey. Engineers who are down in it every day, who are feeling the pain,
    how do they sell wanting to go do these cost-optimization projects up the organization, or sideways in the
    organization to get more folks like, “Hey, we’re going to need some talent resources. We’re going to need some
    business folks to really help us understand the business challenges of this,” how do you champion this
    throughout the organization?

  61. Badri Varadarajan

    Actually, that is one of the more important things here. A lot of this, there’s good know-how out there,
    particularly AWS publishes a bunch of these things. One problem is that they make it complicated, which is why
    tools like CloudFix are needed. But the other thing is organizational buy-in. If you look at strategies that we
    have seen that work, they switch between the small, furry mammal strategy and the apex predator strategy for
    survival. You first get some small wins and sort of earn your stripes, and then people in the organization
    take you seriously. It’s like, “Okay, hey, this project can save money. Let’s invest more here.” And then you go
    into those… I mean, you may not save a lot of money that way, but at least you’ve proven that such a thing is
    possible. And then you can switch to the apex predator strategy, which is, “We need engineering teams to care
    about this.”

  62. Badri Varadarajan

    Even things like tagging that Rahul was talking about, you need folks to buy in and go tag those resources. The
    one thing I’d recommend is to first earn your organizational stripes and then fight those battles. That’s one
    thing.

  63. Badri Varadarajan

    The other thing is just the process itself. Change Manager really makes it simple because you don’t have to get
    buy-in which involves meetings, playbooks, and people reading off of the same playbook and following processes,
    and so on. That gets baked into the AWS console itself, which is a big thing. The AWS Change Manager team wrote
    a blog post on how we approach this. I was talking to the product manager the other day, and she told me that
    has been used by other organizations as well. So there are organizational playbooks here, as well, in terms of
    how you can structure cost optimization.

  64. Rahul Subramaniam

    Yeah, I think, again, going back to what Badri was saying earlier, the two dimensions of tool versus people, it
    is, in my opinion, 100x harder once you get into that people investment domain. So, you want to maximize and
    get all your wins from the tools and the automation before you get into asking for an investment of people,
    time, and resources, because expertise is scarce, and the resources are scarce. Whatever limited resources you
    have, you want to invest them in building features and building your applications, because they have the domain
    knowledge about your business. And that’s where you want to invest that skill and expertise.

  65. Rahul Subramaniam

    The second dimension, of course, also plays a role. Most people get scared when you start moving towards any
    sort of disruptive change because they don’t understand it. They fear what the impact is going to be. And
    they’re more likely to figure out ways to reject ideas of change than be participants in it. So, you have to
    earn their trust. You have to earn your stripes as Badri said, but try to get as many wins out of the tools
    first and exhaust all of those options before you start asking for people’s resources, building up knowledge
    amongst the people, and allaying their fears about what these changes might mean for them. And that would be my
    approach.

  66. Dionn Schaffner

    We do hear from our customers who are considering various cost optimization projects, including CloudFix, that
    security is really their struggle to participate, right? They’re like, “Wait, you want me to give you admin
    access to all of this?” How do you combat that particular hurdle?

  67. Badri Varadarajan

    Yeah, absolutely. Right. I mean, security is a big concern and that’s why you want to address it organically as
    part of every project itself. So, from a tool point of view, that was actually one of the key design
    considerations that went behind CloudFix. When you go into these people projects, again, there you want to
    ensure permissions, not only in terms of who’s allowed to access it, but I think you also
    want to limit what people can do. There are two dimensions to security: (a) which tool or which person is
    allowed to do something, and (b) what they’re allowed to do. And there you want to ensure that you put the
    right service control policies in place and put other rules in place. AWS will give you these tools, but
    they do believe in just giving you tools and letting you build them yourself.

  68. Rahul Subramaniam

    Yeah, absolutely. I mean, AWS has invested a lot in creating the Well-Architected Framework that you can use to
    define your security parameters very well in the AWS framework so that you can be assured of exactly what a tool
    or people are allowed or not allowed to do. Unfortunately, for most organizations, security comes more as an
    afterthought. Like, they will first build the application, they will first do a bunch of stuff, and then they
    will think about, “Okay, now what should I do about security?” When I say “security,” I also include permissions
    and schemes that may not be secure in terms of somebody attacking your infrastructure or something like that.
    Security, for me, is also where you let somebody launch a bunch of instances without control or where you let
    people launch different kinds of resources that may or may not be auditable by the organization. So, best
    practice is setting up your service catalog, setting up policies, and things like that.

  69. Dionn Schaffner

    Let’s talk about the people. Whose lives are changed on a daily basis by implementing some of these cost-optimization
    initiatives? Like, how does their life change before a cost optimization to after?

  70. Badri Varadarajan

    I think the folks who are happiest to run this are in the CFO’s office, right? Because it’s good for them. But I think
    that’s part of the challenge of this, is to ensure that you’re making the CFO’s office happy without affecting
    the CTO’s office or the VP eng.’s job. And ideally, that’s what you want to do. You want to ensure that all your
    projects have well-known blast radii at the beginning that affect as few people in the engineering organization
    as possible, and wherever it does affect them, the effect is contained, that they know exactly what they need to
    do, or some automation, or tool, or UI messaging will tell them exactly what they need to change in their
    behavior. And hopefully it’s not [inaudible 00:33:15].

  71. Badri Varadarajan

    I’ll give you an example. You can put in a policy that says you cannot launch instances of a certain type
    because you want to protect yourself against Bitcoin mining attacks. That’s fine, but you need to have a clear
    way of communicating that and ensuring that that does not affect operations that happen as a matter of course.
    And good practice there would be, before you make a change, to do a dry run and see which operations it
    would have affected in the last three months, and target messaging toward the people who would be affected.
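As a rough sketch of the dry run Badri describes, the snippet below checks a proposed instance-type restriction against a list of past launch events and reports who would have been blocked. The event records, the `DENIED_TYPES` patterns, and the function names are hypothetical; in practice the history might come from CloudTrail `RunInstances` events, which is an assumption and not something stated in the conversation. The policy document follows the general shape of a service control policy, but it is only an illustration and should be checked against the AWS documentation before use.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta
from fnmatch import fnmatch

# Hypothetical patterns for the instance types the policy would block.
DENIED_TYPES = ["p3.*", "p4d.*"]


def build_deny_policy(denied_types):
    """Return an SCP-style document that denies launching the given types."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyListedInstanceTypes",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {"StringLike": {"ec2:InstanceType": denied_types}},
        }],
    }


def dry_run(events, denied_types, now, lookback_days=90):
    """Map each user to the past launches the policy would have blocked.

    `events` is a list of dicts with "user", "instance_type" and "time"
    keys (an assumed shape). fnmatch only approximates IAM's StringLike
    wildcard matching, which is close enough for a dry run.
    """
    cutoff = now - timedelta(days=lookback_days)
    affected = defaultdict(list)
    for event in events:
        if event["time"] < cutoff:
            continue  # older than the lookback window
        if any(fnmatch(event["instance_type"], p) for p in denied_types):
            affected[event["user"]].append(event["instance_type"])
    return dict(affected)


if __name__ == "__main__":
    now = datetime(2022, 6, 1)
    sample_events = [  # made-up history for illustration
        {"user": "alice", "instance_type": "p3.2xlarge", "time": datetime(2022, 5, 1)},
        {"user": "bob", "instance_type": "t3.micro", "time": datetime(2022, 5, 2)},
    ]
    print(json.dumps(build_deny_policy(DENIED_TYPES), indent=2))
    for user, types in dry_run(sample_events, DENIED_TYPES, now).items():
        print(f"notify {user}: policy would have blocked {types}")
```

The users returned by `dry_run` are the ones to message before the policy goes live.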

  72. Dionn Schaffner

    Tell us about a blast radius that has gotten away from you, and sort of been the worst cost-optimization
    project. And what do you think was the key contributor to something like that? Rahul, you probably have some
    good stories.

  73. Rahul Subramaniam

    There were times in the early days when we were just starting with our cost-optimization journey, we had a bunch
    of instances that were doing very little to the point that the resource utilization was near zero. And in one of
    the very early cost-optimization exercises that we did, we ended up shutting off hundreds of such machines,
    which ended up being fairly critical from an operations standpoint. Thankfully, we snapshotted all of these
    instances before we decided to shut them off, as a safety measure. So, though there was disruption, we were able
    to bring all of these instances back, but had we not put that measure in place, we would probably have suffered
    greatly. I think one of the other things that is historic, or it’s been an interesting insight for me at least,
    is, over a period of time, as organizations evolve and change, I find it really shocking how little the
    organization knows about all of its infrastructure.

  74. Rahul Subramaniam

    You have tens of thousands of machines and resources running all over the place. You would think that there’s a
    perfectly auditable manifest of exactly what machine runs what, and you know when you’re going to shut it off,
    or whatever. You’d be surprised as to how many instances are running because nobody knows what’s in there and
    they’re just petrified to shut it off, or EBS volumes that are running or that have been detached. They’re just
    literally sitting there, massive disks with terabytes of storage capacity provisioned on them, but when you ask
    someone to go delete it because nobody has touched it in a year, they’re like, “I don’t know what that has, so
    I’m just petrified to shut it off. And I don’t know anyone else who knows what that instance has.” Right? So
    you’d be surprised as to how prevalent that is across large organizations. Especially over a period of time,
    people really don’t have a sense of the inventory of infrastructure and resources that are running, and that’s a
    big cause of a lot of waste and cost, as well.
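One way to start building that inventory is to list the volumes that are not attached to anything and total up what they provision. The sketch below works on plain records shaped like the entries returned by the EC2 `DescribeVolumes` API (`VolumeId`, `State`, `Size` in GiB, `CreateTime`). The sample data and the one-year threshold are made up for illustration, and creation time is only a crude stand-in for "nobody has touched it in a year", since the volume record itself does not say when it was last used.

```python
from datetime import datetime, timedelta, timezone


def unattached_volumes(volumes, now, min_age_days=365):
    """Return volumes in the 'available' (unattached) state that were
    created at least `min_age_days` ago."""
    cutoff = now - timedelta(days=min_age_days)
    return [
        v for v in volumes
        if v["State"] == "available" and v["CreateTime"] <= cutoff
    ]


def total_gib(volumes):
    """Sum the provisioned size, in GiB, across the given volumes."""
    return sum(v["Size"] for v in volumes)


if __name__ == "__main__":
    now = datetime(2022, 6, 1, tzinfo=timezone.utc)
    sample = [  # hypothetical inventory
        {"VolumeId": "vol-aaa", "State": "available", "Size": 2000,
         "CreateTime": datetime(2020, 1, 15, tzinfo=timezone.utc)},
        {"VolumeId": "vol-bbb", "State": "in-use", "Size": 500,
         "CreateTime": datetime(2019, 3, 1, tzinfo=timezone.utc)},
        {"VolumeId": "vol-ccc", "State": "available", "Size": 100,
         "CreateTime": datetime(2022, 5, 1, tzinfo=timezone.utc)},
    ]
    stale = unattached_volumes(sample, now)
    for v in stale:
        print(f"{v['VolumeId']}: {v['Size']} GiB, unattached")
    print(f"total unattached and old: {total_gib(stale)} GiB")
```

With boto3, the input list would likely come from the `Volumes` key of `describe_volumes()`; the report is a starting point for finding owners, not a deletion list.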

  75. Dionn Schaffner

    From the business side, it’s the “Infrastructure is just there. It’s running. We’re not going to mess with it.
    We’re not going to peel back the layers and see what’s under the covers. All we know is it’s working, don’t
    touch it,” right? But at some point, there is the ROI discussion of, “Okay, we need to tighten it up. We need to
    free up some resources and do some other things. Let’s scarily open the hood, see what kinds of things fly out,
    and let’s go and address that.” And oftentimes you don’t see organizations utilizing their IT infrastructure as
    a strategic advantage, right? It’s just supporting the business. And so you have to have these moments of
    everyone around the table, we all come together and look and say, “Hey, this is really important strategically.
    We really need to dig in and make this work for us, not just keep us treading water evenly. How can this help us
    advance where we’re going?”

  76. Dionn Schaffner

    All right. So what are the key factors in making a cost-optimization project successful? Badri, you talked
    about identifying the blast radius. What other things are critical that not just IT but the whole business needs
    to make sure are in place as we launch a cost-optimization initiative?

  77. Badri Varadarajan

    Who needs to be informed, like, whose lives get impacted, as you said earlier. To Rahul’s point earlier, as
    well, a key fact Rahul mentioned is that folks actually don’t know what’s running in their AWS
    infrastructure. And in fact, within the same team, different people don’t know what others are doing. I mean,
    we’ve had both things happen. A manager signs off on something because they’re not aware of exactly what
    their DevOps engineers are doing. And you think this is a simple change, and it turns out to be
    a huge problem. To give you a concrete example, you think you’re just restarting an instance and it’s
    going to come back within three minutes, but it’s running some startup script that nobody knows about.

  78. Badri Varadarajan

    And the DevOps engineer who knew about it is on vacation. And we never even asked him because, “Hey, this is
    just a restart. Why do you need to ask anybody?” On the flip side, projects have gotten blocked for months
    together because the manager thought it was a much bigger deal than it actually was. And when the engineers got
    involved, they were like, “Ah, this is just a tag. I’ll do it. I have automation for this. I already have a way
    to address all these machines.” So, figuring out who the stakeholders are for any given project is super useful.
    And that’s also why you want to cut it up by the project. You don’t want to have this one big goal of saving
    60%. You want to slice it into a particular application or a particular service that you want to optimize.

  79. Rahul Subramaniam

    And I’ll just add one more thing. I think having automation and tooling in place is really key. When you do
    manual one-off stuff, you are more likely to make mistakes and regret them. So, whatever project you take on,
    make sure that it is automation-driven because that’s how you ensure that what you’re doing is repeatable, not
    just for one resource but across your entire setup, and you’re not going to make mistakes, because it’s all
    enshrined in code. So, automation is, I think, key. No matter what kind of project you’re delivering, try not to
    have manual steps in the process.

  80. Dionn Schaffner

    Because people are always causing problems.

  81. Rahul Subramaniam

    I mean, in reality, if you did first principles thinking and said, “What is the root cause of all bugs?”
    Somebody wrote some line of code that caused a bug.

  82. Dionn Schaffner

    Mm-hmm. Let’s talk about the talent that we have at CloudFix. So you all are attracting some amazing talent
    there. How are you doing that? How are you bringing the best and brightest minds to come in and tackle this
    problem?

  83. Rahul Subramaniam

    I think one of the advantages that we have is that we are literally working at the bleeding edge of cloud
    computing, where we are trying to stay ahead of the curve of this firehose of AWS services, product updates,
    and announcements. And that is literally the bleeding edge of computing.

  84. Dionn Schaffner

    Well, rumor has it that two out of the three first FinOps certifications are sitting with folks in the ESW Capital
    organization right now. So you truly have the best of the best in this area. You get the top two takeaways about
    cost optimization that you want our listeners to walk away with. Badri, you start.

  85. Badri Varadarajan

    Cost optimization is possible, and you can do it incrementally.

  86. Rahul Subramaniam

    Care about the dollars saved and realized. Don’t go after the potential. And the second one would be to rely on
    tools and automation as much as you can, because the minute you start needing people to do a whole bunch of
    stuff, you just get slowed like crazy.

  87. Dionn Schaffner

    Well, thank you both for this enlightening and rousing conversation. Badri, thank you. Rahul, thank you. It’s
    been a great time talking to y’all.

  88. Badri Varadarajan

    Thanks.

  89. Rahul Subramaniam

    Likewise, an absolute pleasure.

  90. Dionn Schaffner

    Thanks, everyone, for listening today. If you enjoyed our podcast, please be sure to rate, review and subscribe.
    See you next time on AWS Insiders.

  91. Dionn Schaffner

    We hope you enjoyed this episode of AWS Insiders. If so, please take a moment to rate and review the show. For
    more information on how to implement 100% safe, AWS-recommended account fixes that can save you 10
    to 20% off your AWS bill, visit cloudfix.com. Join us again next time for more secrets and strategies from top
    Amazon insiders and experts. Thank you for listening.
