On this episode, Badri Varadarajan, EVP of product at CloudFix, and Rahul Subramaniam, CEO of CloudFix, CTO of ESW Capital, and AWS superfan, dive deep into AWS cost optimization best practices: how to apply cost optimization principles when designing, configuring, and maintaining workloads in AWS Cloud environments.
Dionn Schaffner
Welcome to the podcast. So excited to have Rahul here today, as well as Badri. Badri, how are you doing today?
Badri Varadarajan
Very well. Sunny in California.
Dionn Schaffner
Awesome. Fabulous. I just have to ask, what made you decide to focus your entire career on AWS? I mean, how did
you get involved in this? Rahul, tell us how it started. I mean, you could have been a rocket scientist, you
could have been in some deep dark lab somewhere. Why AWS?
Rahul Subramaniam
You’re close. I mean, I did major in physics in my undergrad, so astrophysics would probably have been it, but
before… Very early on in 2007-2008, I was grappling with a whole lot of infrastructure issues, and that’s when I
discovered AWS. And the fact that infrastructure was being made available literally with a simple API call, just
blew my mind. And the more I dived into it, the more I used it and the more I interacted with AWS. I got into a
position where I was breaking almost every AWS service as they were releasing it, which had them call me pretty
much every week about something that I broke, and just interacting with the amazingly smart folks over there
over the years just got me hooked. I was then part of every new service they created. I was involved in trying
everything out that they came up with. And they just became an integral part of how we did business. So I think
our entire business got very deeply integrated with AWS, as well.
Dionn Schaffner
So you were the troublemaker is what you’re saying. You were the one causing all the trouble on the outside, so
they’re like, “You know what, we got to get this guy in closer. So, let’s bring him into the fold a little bit.”
Rahul Subramaniam
I think I’d like to phrase it as me being that advanced tester and we collaboratively made the services good to
the benefit of both parties, so it was a win-win.
Dionn Schaffner
That’s great. Badri, how about you? How did you end up investing in AWS and cost optimization as what you are
living and breathing all day long?
Badri Varadarajan
I mostly blame Rahul for that. AWS was a bit of an acquired taste for me. I spent the initial part of my career
working on the network edge, building infrastructure and algorithms that got deployed and run where you are, be
it for connectivity or computer vision analysis and so on. And so, I figured, “If you can’t beat them, join
them.”
Dionn Schaffner
Okay. But let’s talk a little bit more about cost optimization. There was a time before cost optimization, there
was a dark time, we don’t speak of it much, there was lots of work. Rahul, tell us what managing 45,000 plus
accounts looked like without some way to automate cost and performance. Take us through the dark times.
Rahul Subramaniam
Cost optimization, for me, with AWS, actually started way before we had 45,000 accounts. It started when we had
one account, and I’m talking about early 2007. At that time, shockingly, AWS didn’t even have IAM that everyone
is familiar with, which allows you to manage all your users, access control, and stuff around this. Back then
you had one account. So, one username, one password, and your entire organization would have to be given that
username and password to operate on it.
Dionn Schaffner
Yikes.
Rahul Subramaniam
Okay? And can you imagine that world back in 2007?
Dionn Schaffner
Ugh.
Rahul Subramaniam
And we had over a thousand engineers that we wanted to enable on AWS. So we had this big nightmare of a scenario
where we had to give away our master username and password, which had all of our credit… Back then it was all
credit cards only. So, it had our corporate credit card punched in over there, and 1000 people could do whatever
they wanted on one account. There was no tagging, there was nothing. It was basically… The setup was primed for
absolute chaos.
Rahul Subramaniam
And so, in 2007, I wrote our first system, which acted like IAM. And it did two things. One, it allowed users to
log into a portal where they could request whatever resources they wanted. And it acted as a proxy to our one
single AWS account. But the second thing that we found very soon was, that people were just launching instances
willy-nilly and never turning them off. And our credit card ran out of limits so fast that we would have to
refund the card multiple times in a month, which is just absolutely crazy.
Dionn Schaffner
Wow.
Rahul Subramaniam
So, one of the first cost optimization measures that we put in place was, we asked every person to put in their
eight-hour shift into the portal and we would automatically turn off or hibernate those machines that they had
launched when it was not their peak working hours.
Dionn Schaffner
Uh-huh.
Rahul Subramaniam
If they wanted, they could go back and turn it on if they were logging in at some odd hours. But just that cut
our AWS costs by 66% because you’re only using it for 8 hours out of the 24 hours, right?
Dionn Schaffner
66%?
Rahul Subramaniam
Yeah.
Dionn Schaffner
Wow.
Rahul Subramaniam
That was the first cost optimization system I wrote back in 2007 and it’s been a constant journey ever since. If
you’re spending money on a system, there’s always room to optimize those costs.
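The shift-based shutdown Rahul describes can be sketched as a small scheduling check. This is only an illustration, not the 2007 system: the function names, the whole-hour shift model, and the assumption that cost scales linearly with running hours are all hypothetical.

```python
def should_be_running(hour: int, shift_start: int, shift_hours: int = 8) -> bool:
    """Return True if `hour` (0-23) falls inside the user's declared shift.

    The modulo handles shifts that wrap past midnight, e.g. 22:00 to 06:00.
    """
    return (hour - shift_start) % 24 < shift_hours


def savings_fraction(shift_hours: int = 8, hours_per_day: int = 24) -> float:
    """Fraction of instance-hours avoided if machines run only during the shift.

    Assumes cost is proportional to running hours (an illustrative simplification).
    """
    return 1 - shift_hours / hours_per_day


if __name__ == "__main__":
    # A 09:00 shift: the machine stays on at 10:00 and is a shutdown candidate at 17:00.
    print(should_be_running(10, shift_start=9))
    print(should_be_running(17, shift_start=9))
    # 8 hours out of 24 leaves roughly two thirds of the hours unbilled.
    print(f"{savings_fraction():.1%}")
```

Under these assumptions, running 8 hours out of 24 avoids about two thirds of the instance-hours, which lines up with the roughly 66% figure quoted above.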
Dionn Schaffner
I like that because as we look at the cost optimization, we talk about that, it’s like, is it just for big
business? Is it something that even a small startup should be concerned with? But, even if it’s just one
account, that’s a great piece of knowledge to understand that across the spectrum, there are opportunities for
everybody to reap some benefits from cost optimization. What do you think is the top thing customers and
clients should know about that time in the dark ages? What do you wish everyone knew that you suffered through
the hard way but you know now?
Rahul Subramaniam
I think the thing that we learned very quickly was the foundations of why we made the big bet on AWS. And that
is that they are innovating at a pace that is just so remarkable. Everything that we thought of as a gap was
very quickly closed in a matter of a year or two. So if you were ever making a long-term bet, you would bet
on AWS, because they have a track record of constantly working on customer problems and
turning all these standard utility functions into amazing commodity services that you can use
with a simple API call. And whatever gaps are there, you can be assured that if you bring them to their notice, they get
solved. So with that, it becomes a no-brainer to make the long-term bet on AWS. And I think people are still
doubtful today, but having lived through this over the last 14, 15 years, I have the experience to feel very
confident about that long-term bet.
Dionn Schaffner
And maybe Badri, you can help us answer this question. Back to cost optimization, some businesses just aren’t
paying attention to it yet, right? They say, “It’s not a priority. We’ve got other strategic initiatives going
on.” What do you say back to those folks, and how do you get them to recognize the importance of cost
optimization?
Badri Varadarajan
In a way, I’m kind of sympathetic to that idea that cost optimization isn’t a problem till it is, unfortunately.
I think there is this great article about Dropbox, I think, where as long as the revenue was growing and the
market was rewarding growth in valuations, they didn’t worry a jot about cost optimization, and arguably
rightly so. But then the problem is once that curve flattens, you don’t want to go into panic and sort of
get whiplash, where you say, “Cost isn’t the problem,” and suddenly, the next quarter, it is the number one
thing: “We don’t care. We shut down all our innovation projects.” I think you want a sort of wholesome way to
think about cost optimization. If you keep doing cost optimization, as a matter of course, first, it’s good
hygiene. Second, you build up your cost optimization muscles organizationally. And when it becomes a real
problem, then you can sort of hit the ground running and take proportional measures as opposed to just going
from not worrying about it at all to it being the only thing you worry about.
Dionn Schaffner
I like that. “Building the cost optimization muscles within the organization,” love that. So when the time
comes, you can flex big, you’re ready. And to follow up with that, one of the other things we hear then though
is, people are like, “Hey, yeah, we are working on cost optimization. We have an internal team. We’ve got some
internal tools.” How do you balance that challenge?
Badri Varadarajan
One way to approach that is to ensure that you’re not just doing cost optimization by listing a bunch of tasks.
You want to sort of go towards a goal, and what you get from doing this over time is the proof that such a thing is possible,
right? I mean, you want to do the four-minute mile. You want to understand that costs can be reduced
organically. And our framework to think of it is, if you’re just starting with your cost optimization journey,
you can get to 50 to 60% cost reduction. Now and then, folks like Rahul work some magic and get 66% by doing one
thing, but that’s the exception, not the rule. I mean, you will get the first 10% by doing something
simple. And then the next 20% ends up being a little bit more complicated. And then the last 30% involves a
bunch of sprints, which go deep, but it’s healthy for you to know, starting, that it is possible. It’s not a
fool’s errand. You will get there if you do it systematically and choose your projects well.
Dionn Schaffner
Well, how do you know when you’re doing it wrong? How do you know if you’re not choosing your projects well?
What does that look like?
Badri Varadarajan
What it looks like is quarter after quarter of potential savings. I mean, it’s very easy for you to either hire
a vendor or do it yourself and get an impressive-looking report that says your cost can be lowered by 60%. The
problem is that that’s not realizable. All it does is, every quarter, you feel bad about what you did not
achieve last year.
Dionn Schaffner
There’s the money left on the table. Dang.
Badri Varadarajan
That’s right. I’m now reading this book called Switch, about organizational change. You don’t want to just paint
a big picture and not take the first step. It’s healthy for you to sort of feed your reptilian brain by booking
small victories. If you have a grand plan but never do anything, you’re probably doing it wrong. You should be able
to do organic incremental improvements.
Dionn Schaffner
Rahul, I feel like you probably have some war stories about this.
Rahul Subramaniam
Yeah. Early on, when AWS had just started, I think, as Badri said, it was possible to make a few changes that
would get big returns because there were a lot of gaps in the services, the infrastructure, the API, and stuff
like that. But over the last 15 years, AWS has built so much maturity around how they build up their services
that getting those big savings by doing one thing is just incredibly hard. Early in the days of our cost
optimization efforts, we ran into a bunch of scenarios, for example, migrating databases or moving over
applications to serverless. So, a large number of applications that we acquired were primarily on-premise
monolithic applications. And we tried to switch them all over to services like Lambda when Lambda first came
around.
Rahul Subramaniam
Lambda is not designed to deal with monolithic applications like the ones that we had. Right? And that meant
that we were embarking on this major surgery on our applications, trying to replicate the same function in a
microservices pattern. And suddenly, we had so much chaos that we just didn’t know how to manage it. And we had
several failed exercises like that where either, over some time, we completed maybe 20 or 30% of the
functionality as we moved over to microservices based on Lambda, or it was just a non-starter because certain
things were being done in a certain way that customers were comfortable with, and you would have to change the
entire mechanics of it as it moved to the serverless world. Right? So, we just couldn’t make all those
dimensions meet. So, we realized the hard way that those big bang approaches were very few and far between as
the services matured over some time.
Dionn Schaffner
Well, and so you tried a ton of cost optimization tools before deciding to build your own, right? So what were
they missing that made you think, “I can do this better. Let me just sit down and put some stuff together”? What were
they missing and why did you think you could do it better?
Rahul Subramaniam
Yeah. First and foremost, I already have the big burden of being responsible for almost two and a half billion
lines of code that we own across all of the companies in our portfolio. And I had no interest in building a new
product or building and owning a new codebase. That was not the intention at all. Our default is to go and look
for every tool out there in the market that could potentially help us solve this problem. So, we did just that.
We tried out all the tools, but very soon we realized that there were three fundamental problems. Problem number
one was that most of these tools ended up being visualization tools that were closing the gap on some of the
visualization problems that AWS had. By the way, all of those gaps are gone. Right now, if you look at AWS tools,
they pretty much give you all the data you want, but most of the cost optimization tools that you find in the
market today are still glorified visualization tools that take all the data that’s in AWS and present it to you
in fancy graphs. Okay?
Rahul Subramaniam
The bottom line is it became our problem to go figure out how to realize those savings because you have to slice
and dice all that data and figure out what the insight is, and then go figure out how to realize the savings.
Rahul Subramaniam
The second problem with these tools was that none of them fixed anything. Even if they provided some insights or
some recommendations, they didn’t fix anything for us. More often than not, these were recommendations like, “Hey,
resize your EC2 instances,” or “Why don’t you move to a completely different serverless platform, because the
per-unit cost there is completely different?” All of those recommendations sounded great, and the
potential savings were 50 or 60%, but just realizing them was incredibly hard because you needed
to perform major application surgery to achieve even remotely close to those kinds of savings. And we just
couldn’t get our teams to sign off on or be successful at those major surgeries that they had to perform.
Rahul Subramaniam
And the third issue with a lot of these tools was that they were just insanely complicated. Just navigating a
lot of these tools required almost like a Ph.D. in AWS services where you wouldn’t even know… They just kept
slapping on stuff over the basic visualization that they started with, and you wouldn’t even know where to go
look for insights even if they provided some insights. And a lot of these tools just got so overly complicated,
requiring admin permissions to do anything, that it just became a no-go for a large proportion. So the admin
permissions and complex UI became a big issue, as well. And because all of those hurdles were things that made
a lot of these tools no-gos for us, we had to invest in figuring out a simpler way to realize savings, not just
talk about potential savings.
Dionn Schaffner
This is where the rubber hits the road. It’s great in theory, but how do we instantiate and get these results?
Rahul Subramaniam
By the way, there was a year that we spent trying out all these tools and putting together a SWAT team trying to
get all the savings. We spent a few million trying to realize these big savings but saved nothing.
Dionn Schaffner
Well, that’s part of the journey, right? And that led you to CloudFix. Okay, you get one minute to talk about
CloudFix specifically. Well, you both do. So, Rahul, you go first, and then Badri, you’re going to follow up
with your comments on CloudFix. Ready? Go.
Rahul Subramaniam
Very simply, what we did was we looked at all the AWS recommendations that they had made around cost. We
filtered them down to the ones that we believed were completely non-disruptive and that we could execute
centrally. And that’s literally what we did. And then, of course, Change Manager came at just about the right
time, where we were able to use Change Manager as a mechanism to deploy those fixes without needing admin
credentials. So, in effect, we were fixing the problem instead of just talking about potential. And we did not
require admin credentials. We followed all of AWS’s best practices and recommendations to realize those savings.
And that was basically what closed all the gaps for us that we had with the other tools.
Dionn Schaffner
Badri?
Badri Varadarajan
I have nothing to add now. AWS [inaudible 00:19:33]. CloudFix is supposed to be the “fix it, don’t talk about
it” tool. So, that’s it. We’re done.
Rahul Subramaniam
Yeah. I mean, it’s supposed to be the simplest tool out there. With five clicks, you save 10 to 20%. It’s
supposed to be simple.
Dionn Schaffner
I like it. And so let’s talk about the 10 to 20%. Let’s talk about the money. You mentioned that you all spent a
million dollars trying to get to this product. How much are you saving now by having this tool in your arsenal?
Maybe you can use percentages if you’re not going to talk real money.
Rahul Subramaniam
Yeah. Our savings are a combination of what you see in CloudFix today and stuff that will be coming up in
CloudFix because we run all of these finders and fixes on our setup first before running it or deploying it for
other customers. And we run it for quite a while. We measure ourselves every week, “How much do we save on an
annualized basis? What amount of spend do we claw back every week?” That’s our metric, and we measure it in
dollar terms, in concrete dollar terms.
Rahul Subramaniam
And today, for our spend, which across AWS customers isn’t very large, we still manage to claw back about a
quarter of a million dollars a week, which is pretty significant, and this is, again, on an annualized basis.
But I think, on average, for customers, whatever your AWS spend may be… When we try to buy a
company, or when we are evaluating acquisitions, it’s a simple assumption that we make, which is, “Long term, we
can save 50% via this incremental mechanism, but short term, 10 to 20% is a given.” It is something that we can
absolutely go on.
Dionn Schaffner
That’s great. And if you think about it in terms of the cost optimization spectrum, CloudFix is on the easy
side. Five clicks, and you’re going to get this kind of return. And then you’ve spoken before of the
really hard problems that take more of an investment from the organization to sort of go and dig in. Maybe
Badri, can you tell us maybe, how you balance the cost of your internal resources and the risks against your
expected ROI of getting these cost optimization results in-house? Is there, like, a magic number? Where do you
find the tipping point for the business to decide, “Hey, yes, we are going to make this investment with our
time, with our people, with our resources, and to really dig into this particular cost optimization problem”?
How do we know when we should jump or when we should just wait?
Badri Varadarajan
Yeah. That’s a good question. I’d slice it into different tiers. There’s the first 10 to 20% of savings.
That’s the realm that CloudFix operates in, where your investment should really only be on the tool, not on the
people managing the tool. The tool should just basically do it. Then the next 30% you get by investing in people
as well. Essentially, those are the sorts of savings for which you need engineering teams to get involved. You
want to ensure that your cost optimization does not affect the functionality of the product. So I’d almost
operate that as an engineering project itself and look at the ROI in those terms, so you do need to account for
what you’re spending in terms of manpower there, how many hours are people spending, and what features are they
not shipping because they’re doing that.
Badri Varadarajan
So I divide the ROI into those two different tiers. And I think one of the reasons CloudFix exists is this
realization that you can actually get 10 to 20% without involving people and investing brainpower in it. It’s
just a tool that just does its thing, and you don’t need to do anything beyond click a few buttons and count the
cash.
Rahul Subramaniam
I’ll add one more dimension to this. The non-disruptiveness of what you’re going after is the second dimension
in this. So I absolutely agree with Badri that the tool versus people is one dimension, but also look at the
non-disruptiveness of the changes that you’re trying to make. And that also, again, it’s like a quadrant. So
there are a bunch of non-disruptive changes that do require people. So for example, there’s a bunch of
financial engineering or just process-related stuff that can save you a bunch of costs. For example, if you’re
migrating a bunch of workloads from on-prem to AWS, you should absolutely have someone take the lead and sign up for
the Migration Acceleration Program, where AWS covers a bunch of your costs while you’re migrating all your stuff
so that you’re not paying double during the migrations. That’s a great way to save a ton of money. Most people
don’t even know about the Migration Acceleration Program, but it’s the easiest thing to get signed up for. All
you need to do is tag your resources in a certain way, and you start getting credit from AWS for all of those
workloads.
Rahul Subramaniam
Another example is, if you have certain services where your consumption of those services is just insanely
expensive, you absolutely should have somebody go talk to the product team and see if you can work out a
discount or a volume discount with that particular product team because AWS does that very often. If they find a
customer using a particular service far more than anyone else, they will make concessions, and you can always go
negotiate that.
Rahul Subramaniam
The third one is leveraging things like Savings Plans, CRIs, and Reserved Instances, that category. You
need someone you can dedicate to this who can understand all of it. It’s just a bunch of financial engineering
where it’s a trade-off between commits and discounts that you get. And you can do that and get up to a 50%
discount on your spending, depending on how much of a trade-off you’re willing to make in terms of commitment
versus the discount you get.
Rahul Subramaniam
And lastly, there is the EDP. And again, I don’t recommend anyone sign the EDP. I treat it almost like a
handcuff around your hands. I mean, it’s not something I recommend, but that is an instrument of last resort, as
well, to get a certain amount of discount if you’re willing to make commitments or if you’re really, really,
really sure of what your spend over the next three to five years is going to look like.
Rahul Subramaniam
Now, those are all the things that you could do by investing in certain people and getting those cost benefits.
And they measure near zero on the disruption side of the equation.
Dionn Schaffner
It really sounds like you need allies within your own organization sometimes to sort of come on board to
participate in this cost-optimization journey. Engineers who are down in it every day, who are feeling the pain,
how do they sell these cost-optimization projects up the organization, or sideways in the
organization, to get more folks on board? Like, “Hey, we’re going to need some talent resources. We’re going to need some
business folks to really help us understand the business challenges of this.” How do you champion this
throughout the organization?
Badri Varadarajan
Actually, that is one of the more important things here. For a lot of this, there’s good know-how out there, and
AWS in particular publishes a bunch of these things. One problem is that they make it complicated, which is why
tools like CloudFix are needed. But the other thing is organizational buy-in. If you look at strategies that we
have seen work, they switch between the small, furry mammal strategy and the apex predator strategy for
survival. You first get some small wins and sort of earn your stripes, and then people in the organization
take you seriously. It’s like, “Okay, hey, this project can save money. Let’s invest more here.” And then you go
into those… I mean, you may not save a lot of money that way, but at least you’ve proven that such a thing is
possible. And then you can switch to the apex predator strategy, which is, “We need engineering teams to care
about this.”
Badri Varadarajan
Even things like tagging that Rahul was talking about, you need folks to buy in and go tag those resources. The
one thing I’d recommend is to first earn your organizational stripes and then fight those battles. That’s one
thing.
Badri Varadarajan
The other thing is just the process itself. Change Manager really makes it simple because you don’t have to get
buy-in which involves meetings, playbooks, and people reading off of the same playbook and following processes,
and so on. That gets baked into the AWS console itself, which is a big thing. The AWS Change Manager team wrote
a blog post on how we approach this. I was talking to the product manager the other day, and she told me that it
has been used by other organizations as well. So there are organizational playbooks here, as well, in terms of
how you can structure cost optimization.
Rahul Subramaniam
Yeah, I think, again, going back to what Badri was saying earlier, the two dimensions of tool versus people, it
is, in my opinion, 100x harder once you get into that people investment domain. So, you want to maximize and
get all your wins from the tools and the automation before you get into asking for an investment of people,
time, and resources, because expertise is scarce, and the resources are scarce. Whatever limited resources you
have, you want to invest them in building features and building your applications, because they have the domain
knowledge about your business. And that’s where you want to invest that skill and expertise.
Rahul Subramaniam
The second dimension, of course, also plays a role. Most people get scared when you start moving towards any
sort of disruptive change because they don’t understand it. They fear what the impact is going to be. And
they’re more likely to figure out ways to reject ideas of change than be participants in it. So, you have to
earn their trust. You have to earn your stripes as Badri said, but try to get as many wins out of the tools
first and exhaust all of those options before you start asking for people’s resources, building up knowledge
amongst the people, and allaying their fears about what these changes might mean for them. And that would be my
approach.
Dionn Schaffner
We do hear from our customers who are considering various cost optimization projects, including CloudFix, that
security is really what they struggle with when deciding to participate, right? They’re like, “Wait, you want me to give you admin
access to all of this?” How do you combat that particular hurdle?
Badri Varadarajan
Yeah, absolutely. Right. I mean, security is a big concern and that’s why you want to address it organically as
part of every project itself. So, from a tool point of view, that was actually one of the key design
considerations that went into CloudFix. When you go into these people projects, again, you want to
ensure permissions are right, not only in terms of who’s allowed to access things, but I think you also
want to limit what people can do. There are two dimensions to security: (a) which tool or which person is
allowed to do something and (b) what they’re allowed to do. And there you want to ensure that you put the
right service control policies in place and put other rules in place. AWS will give you these tools, but
they do believe in just giving you tools and letting you build it yourself.
Rahul Subramaniam
Yeah, absolutely. I mean, AWS has invested a lot in creating the Well-Architected Framework that you can use to
define your security parameters very well in the AWS framework so that you can be assured of exactly what a tool
or people are allowed or not allowed to do. Unfortunately, for most organizations, security comes more as an
afterthought. Like, they will first build the application, they will first do a bunch of stuff, and then they
will think about, “Okay, now what should I do about security?” When I say “security,” I also include permission
schemes that may not be about security in the sense of somebody attacking your infrastructure or something like that.
Security, for me, is also about whether you let somebody launch a bunch of instances without control or let
people launch different kinds of resources that may or may not be auditable by the organization. So, best
practice is setting up your Service Catalog, setting up policies, and things like that.
Dionn Schaffner
Let’s talk about the people. Whose lives are changed on a daily basis by implementing some of these cost-optimization
initiatives? Like, how does their life change from before cost optimization to after?
Badri Varadarajan
I think the folks who are happiest about this are in the CFO’s office, right? Because it’s good for them. But I think
that’s part of the challenge of this: to ensure that you’re making the CFO’s office happy without affecting
the CTO’s office or the VP of engineering’s job. And ideally, that’s what you want to do. You want to ensure that all your
projects have well-known blast radii at the beginning that affect as few people in the engineering organization
as possible, and wherever it does affect them, the effect is contained, that they know exactly what they need to
do, or some automation, or tool, or UI messaging will tell them exactly what they need to change in their
behavior. And hopefully it’s not [inaudible 00:33:15].
Badri Varadarajan
I’ll give you an example. You can put in a policy that says you cannot launch instances of a certain type
because you want to protect yourself against Bitcoin mining attacks. That’s fine, but you need to have a clear
way of communicating that and ensuring that that does not affect operations that happen as a matter of course.
And good practice there would be, before you make a change, to do a dry run, see which operations it
would have affected in the last three months, and target messaging towards people who would be affected by those.
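The dry run Badri describes can be sketched as replaying past launches against the proposed rule before turning it on. This is a minimal sketch under stated assumptions: the event records, user names, and the denied-type list are made up for illustration, and in practice the history would have to come from whatever audit log the organization keeps.

```python
from collections import defaultdict


def dry_run(past_launches, denied_types):
    """Group the past launches that the proposed rule would have blocked, by user.

    `past_launches` is a list of dicts with hypothetical keys
    "user", "instance_type", and "time".
    """
    affected = defaultdict(list)
    for event in past_launches:
        if event["instance_type"] in denied_types:
            affected[event["user"]].append(event)
    return dict(affected)


# Illustrative history covering the look-back window (made-up data).
PAST_LAUNCHES = [
    {"user": "alice", "instance_type": "p3.16xlarge", "time": "2022-03-01"},
    {"user": "bob", "instance_type": "t3.micro", "time": "2022-03-02"},
    {"user": "alice", "instance_type": "t3.micro", "time": "2022-04-11"},
]

# Proposed rule: block large GPU instances (an example choice, not a recommendation).
DENIED_TYPES = {"p3.16xlarge"}

if __name__ == "__main__":
    for user, events in dry_run(PAST_LAUNCHES, DENIED_TYPES).items():
        # These are the people to message before the policy goes live.
        print(f"{user}: {len(events)} launch(es) would have been blocked")
```

The output is the list of people whose normal operations would have been hit, which is exactly the audience for the targeted messaging mentioned above.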
Dionn Schaffner
Tell us about a blast radius that has gotten away from you and turned into your worst cost-optimization
project. And what do you think was the key contributor to something like that? Rahul, you probably have some
good stories.
Rahul Subramaniam
There were times in the early days when we were just starting with our cost-optimization journey, we had a bunch
of instances that were doing very little to the point that the resource utilization was near zero. And in one of
the very early cost-optimization exercises that we did, we ended up shutting off hundreds of such machines,
which ended up being fairly critical from an operations standpoint. Thankfully, we snapshotted all of these
instances before we decided to shut them off, as a safety measure. So, though there was disruption, we were able
to bring all of these instances back, but had we not put that measure in place, we would probably have suffered
greatly. I think one of the other things that is historic, or it’s been an interesting insight for me at least,
is, over a period of time, as organizations evolve and change, I find it really shocking how little the
organization knows about all of its infrastructure.
Rahul Subramaniam
You have tens of thousands of machines and resources running all over the place. You would think that there’s a
perfectly auditable manifest of exactly what machine runs what, and you know when you’re going to shut it off,
or whatever. You’d be surprised as to how many instances are running because nobody knows what’s in there and
they’re just petrified to shut it off, or EBS volumes that are running or that have been detached. They’re just
literally sitting there, massive disks with terabytes of storage capacity provisioned on them, but when you ask
someone to go delete it because nobody has touched it in a year, they’re like, “I don’t know what that has, so
I’m just petrified to shut it off. And I don’t know anyone else who knows what that instance has.” Right? So
you’d be surprised as to how prevalent that is across large organizations. Especially over a period of time,
people really don’t have a sense of the inventory of infrastructure and resources that are running, and that’s a
big cause of a lot of waste and cost, as well.
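[Editor's note] As a rough illustration of the inventory problem Rahul describes, the sketch below flags unattached volumes from a list of volume records. It assumes records shaped like the `Volumes` entries returned by boto3's `describe_volumes` (`VolumeId`, `Size`, `State`, `Attachments`, `CreateTime`); the sample data is made up. Creation time is used only as a crude stand-in for "nobody has touched it in a year"; a real check would need stronger usage signals, and a snapshot before any deletion.

```python
from datetime import datetime, timedelta, timezone


def find_idle_volumes(volumes, now, min_age_days=365):
    """Return volumes that are unattached and older than min_age_days.

    Assumption: age since creation is a rough proxy for "untouched".
    """
    cutoff = now - timedelta(days=min_age_days)
    return [
        v for v in volumes
        if v["State"] == "available"   # "available" means not attached to an instance
        and not v["Attachments"]
        and v["CreateTime"] <= cutoff
    ]


# Made-up inventory, for illustration only.
NOW = datetime(2022, 6, 1, tzinfo=timezone.utc)
SAMPLE_VOLUMES = [
    {"VolumeId": "vol-0aaa", "Size": 2000, "State": "available", "Attachments": [],
     "CreateTime": datetime(2020, 1, 15, tzinfo=timezone.utc)},
    {"VolumeId": "vol-0bbb", "Size": 500, "State": "in-use",
     "Attachments": [{"InstanceId": "i-0123"}],
     "CreateTime": datetime(2019, 3, 1, tzinfo=timezone.utc)},
    {"VolumeId": "vol-0ccc", "Size": 100, "State": "available", "Attachments": [],
     "CreateTime": datetime(2022, 5, 1, tzinfo=timezone.utc)},
]

if __name__ == "__main__":
    idle = find_idle_volumes(SAMPLE_VOLUMES, NOW)
    total_gib = sum(v["Size"] for v in idle)
    print(f"{len(idle)} idle volume(s), {total_gib} GiB provisioned")
    for v in idle:
        print(f"  review with owner before deleting: {v['VolumeId']}")
```

A report like this does not answer the "I don't know what that has" question, but it gives the organization a concrete list to assign owners to.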
Dionn Schaffner
From the business side, it’s the “Infrastructure is just there. It’s running. We’re not going to mess with it.
We’re not going to peel back the layers and see what’s under the covers. All we know is it’s working, don’t
touch it,” right? But at some point, there is the ROI discussion of, “Okay, we need to tighten it up. We need to
free up some resources and do some other things. Let’s scarily open the hood, see what kinds of things fly out,
and let’s go and address that.” And oftentimes you don’t see organizations utilizing their IT infrastructure as
a strategic advantage, right? It’s just supporting the business. And so you have to have these moments of
everyone around the table, we all come together and look and say, “Hey, this is really important strategically.
We really need to dig in and make this work for us, not just keep us treading water evenly. How can this help us
advance where we’re going?”
Dionn Schaffner
All right. So what are the key factors in making a cost-optimization project successful? Badri, you talked
about identifying the blast radius. What other things are critical that not just IT but the whole business needs
to make sure are in place as we launch a cost-optimization initiative?
Badri Varadarajan
Who needs to be informed, like, whose lives get impacted, as you said earlier. And to Rahul’s earlier point, a
key fact is that folks actually don’t know what’s running in their AWS infrastructure. In fact, within the same
team, different people don’t know what others are doing. I mean, we’ve had both things happen. A manager signs
off on something because they’re not aware of exactly what their DevOps engineers are doing. You think it’s a
simple change, and it turns out to be a huge problem. To give you a concrete example: you think you’re just
restarting an instance and it’s going to come back within three minutes, but it’s running some startup script
that nobody knows about.
Badri Varadarajan
And the DevOps engineer who knew about it is on vacation. And we never even asked him because, “Hey, this is
just a restart. Why do you need to ask anybody?” On the flip side, projects have gotten blocked for months
together because the manager thought it was a much bigger deal than it actually was. And when the engineers got
involved, they were like, “Ah, this is just a tag. I’ll do it. I have automation for this. I already have a way
to address all these machines.” So, figuring out who the stakeholders are for any given project is super useful.
And that’s also why you want to cut it up by the project. You don’t want to have this one big goal of saving
60%. You want to slice it into a particular application or a particular service that you want to optimize.
Rahul Subramaniam
And I’ll just add one more thing. I think having automation and tooling in place is really key. When you do
manual one-off stuff, you are more likely to make mistakes and regret them. So, whatever project you take on,
make sure that it is automation-driven because that’s how you ensure that what you’re doing is repeatable, not
just for one resource but across your entire setup, and you’re not going to make mistakes, because it’s all
enshrined in code. So, automation is, I think, key. No matter what kind of project you’re delivering, try not to
have manual steps in the process.
Dionn Schaffner
Because people are always causing problems.
Rahul Subramaniam
I mean, in reality, if you did first principles thinking and said, “What is the root cause of all bugs?”
Somebody wrote some line of code that caused a bug.
Dionn Schaffner
Mm-hmm. Let’s talk about the talent that we have at CloudFix. So you all are attracting some amazing talent
there. How are you doing that? How are you bringing the best and brightest minds to come in and tackle this
problem?
Rahul Subramaniam
I think one of the advantages that we have is that we are literally working at the bleeding edge of cloud
computing, where we are trying to stay ahead of the curve of this firehose of AWS services, product updates,
and announcements. And that is literally the bleeding edge of computing.
Dionn Schaffner
Well, rumor has it that two of the first three FinOps certifications are sitting with folks in the ESW Capital
organization right now. So you truly have the best of the best in this area. You get the top two takeaways about
cost optimization that you want our listeners to walk away with. Badri, you start.
Badri Varadarajan
Cost optimization is possible, and you can do it incrementally.
Rahul Subramaniam
Care about the dollars saved and realized. Don’t go after the potential. And the second one would be to rely on
tools and automation as much as you can, because the minute you start needing people to do a whole bunch of
stuff, you just get slowed like crazy.
Dionn Schaffner
Well, thank you both for this enlightening and rousing conversation. Badri, thank you. Rahul, thank you. It’s
been a great time talking to y’all.
Badri Varadarajan
Thanks.
Rahul Subramaniam
Likewise, an absolute pleasure.
Dionn Schaffner
Thanks, everyone, for listening today. If you enjoyed our podcast, please be sure to rate, review and subscribe.
See you next time on AWS Insiders.
Dionn Schaffner
We hope you enjoyed this episode of AWS Insiders. If so, please take a moment to rate and review the show. For
more information on how to implement a hundred percent safe, AWS-recommended account fixes that can save you 10
to 20% off your AWS bill, visit cloudfix.com. Join us again next time for more secrets and strategies from top
Amazon insiders and experts. Thank you for listening.