On this episode, Alex dives deep into the intricacies of Amazon DynamoDB, a planet-scale NoSQL database service that supports key-value and document data structures. Alex discusses the consistency and predictability in the design of DynamoDB’s performance, and how to best utilize it.
AWS superfan and CTO of ESW Capital
Author of “The DynamoDB Book”
Speaker 1
Hello and welcome to AWS Insiders. On this podcast, we’ll uncover how today’s tech leaders can stay ahead of the
constantly evolving pace of innovation at AWS. You’ll hear the secrets and strategies of Amazon’s top Product
Managers on how to reduce costs and improve performance. Today’s episode features an interview with Alex DeBrie,
Principal at DeBrie Advisory and author of The DynamoDB Book. On this episode, Alex dives deep into the
intricacies of Amazon DynamoDB, a planet-scale NoSQL database service that supports key-value and document data
structures. Alex discusses the consistency and predictability in the design of DynamoDB’s performance and how to
best utilize it. But before we get into it, here’s a brief word from our sponsor.
Speaker 2
This podcast is brought to you by CloudFix. Ready to save money on your AWS bill? CloudFix finds and implements
100% safe AWS recommended account fixes that can save you 10% to 20% on your AWS bill. Visit cloudfix.com for a
free savings assessment. And now here’s your host, AWS superfan and CTO of ESW Capital, Rahul Subramaniam.
Rahul Subramaniam
Hi everyone and welcome to another episode of AWS Insiders. Today I have with me, Alex DeBrie, the author of The
DynamoDB Book. Thanks for coming to the show, Alex. I’m really excited to dive deep into the details of DynamoDB
with you.
Alex DeBrie
Absolutely. I’m excited to be here. Thanks for having me, Rahul. This is great.
Rahul Subramaniam
Awesome. I’m surprised by how often you are the go-to person for any DynamoDB question. You’re the
first person to get tagged on questions, even before people tag the AWS team. So I couldn’t be more
excited to have you come in and give your perspective about DynamoDB. So how did you discover DynamoDB or how
did you get started with it?
Alex DeBrie
Well, the first thing I would say is probably no one’s more surprised than me that it turned out this way. I
don’t have a hard database background or even a hard computer science background or anything like that. So it
happened accidentally over time, and it’s something that I really like and am interested in, so I love being in
it, but yeah, it’s accidental. When I think about how I discovered DynamoDB, I’d put it in three
different phases. First, I was working for a company doing internal infrastructure stuff, so more
like data warehousing, working with Amazon Redshift, and different things like that. And our team just had some
need for an internal Slack application to integrate with Slack and store some data, different things like that.
Alex DeBrie
I offered to build the thing, and I didn’t want to have an EC2 instance running and a relational database
instance running and all that stuff just to handle 5-10 requests a day, not even that much. So I looked into all
this serverless stuff and found Lambda, found DynamoDB, and thought, “I’ll use this. It’s super easy, it’s low
cost, it’s going to fit in the free tier.” So I looked into doing that and I built it, and I used it totally
wrong, but it was small enough scale that it didn’t matter; I could just brute force it. But then, just taking
that, I really liked how that serverless application worked. So, six months after that, I took a job with
Serverless.com. These are the people that created the Serverless Framework and really are at the cutting edge of
making this serverless revolution happen in conjunction with AWS and Lambda and those things.
Alex DeBrie
So I’m working for Serverless.com and I see so many people using DynamoDB in their serverless applications
because of how well it worked with Lambda, and especially how relational databases didn’t work well with Lambda,
because you had all these networking requirements, you had connection limits, and then also just
the scalability of Lambda as compared to the scalability of relational databases, especially with quick spikes
and things like that, just wasn’t working well. So I started using Dynamo there, probably poorly most of the
time, and then in late 2017, I watched Rick Houlihan’s talk at AWS re:Invent. He was this guy that worked for AWS
and helped Amazon move a bunch of their internal applications from Oracle to DynamoDB, to NoSQL. He worked out
how to model a lot of these relational database patterns and shared it. I watched his talk like five, six, seven
times that break and it just blew my mind. And that’s when I actually figured out how to use Dynamo, and I would
say that’s when I discovered what it means to use it.
Rahul Subramaniam
That’s pretty neat. At what point did you realize that there’s tons of value in writing a book about DynamoDB
and what really triggered you to go do that?
Alex DeBrie
Yeah, so just to continue that story, this is late 2017, this is Christmas break. I’m watching Rick Houlihan’s
talk. I actually listened to it the first time when I was driving to work. And I was like, “Oh, this is really
amazing.” And then I watched it a few more times, took a bunch of notes, and I was like, “There’s a lot of
interesting stuff here that I didn’t get the first time.” And I just wanted to help share that with people. So
during that Christmas break, I created a website called dynamodbguide.com, and it was just 10, 15, 20 pages
about the DynamoDB documentation, but presented in a different way, in a way that made sense to me, just like,
“Hey, this is how DynamoDB is different than other databases and what it looks like.”
Alex DeBrie
So I put that together in early 2018 and it was pretty popular right away. And it just started to snowball,
where then people would be asking me questions. And a lot of times, I didn’t know the answers to the questions
because I was still pretty new to Dynamo at that point, at least to modeling Dynamo correctly, but people would
ask me a question and I’m like, “Well, I’ve got to go figure it out.” So I’m figuring stuff out and doing this and
it’s starting to build up knowledge and all that. Then I start writing blog posts. I start giving talks. I
worked with the AWS team a little bit, things like that, throughout 2018 and 2019. And around the summer of
2019, I was like, “You know what? I think there’s an opportunity here for a book, where DynamoDB is interesting
enough, distinct enough, and it has these unique advantages with serverless applications, also just with
predictability, consistency, things like that, but it’s so different and people need a comprehensive look at how
you model DynamoDB.”
Alex DeBrie
So, I started thinking about that in the summer of 2019, and was working on it, but I still had a full-time job
and it’s really hard to do that with a full-time job. So at the end of 2019, I was like, “Hey, I think I can do
this.” I put in my notice at my company and I spent the first four months of 2020 writing the
book and releasing it. And of course, right when I release… I set this release date of like April 2020, and two
weeks before that, the whole United States and the whole world shuts down with COVID, and I’m wondering whether
people are going to be spending money on books when who knows what’s happening with themselves. But I released
it and it went well, and it’s just been a fun journey since then as well.
Rahul Subramaniam
That’s a really awesome story. For most of us who come from the SQL world or the relational database world,
DynamoDB was this huge paradigm shift, and wrapping our heads around all those concepts, because you had to
think about everything differently, was really hard. So for those who are new, who are listening to this
podcast, how would you describe the underlying architecture or the uniqueness of DynamoDB, and how to think
about DynamoDB differently, especially when they come from a relational database world?
Alex DeBrie
Yep. Yeah. Sure thing. I think Dynamo is so interesting and unique, and in particular, I think it has the most
explicit trade-offs of any database. So it’s easy for me to go to a place where people are saying, “Hey, should
we use Dynamo?” and say, “I love Dynamo because it gives you this, this, and this, but it also takes away this,
this, and this on the other side, and it’s just whether you want to accept those trade-offs.” So I like the
clarity of the trade-offs. If I’m talking to people and telling them how Dynamo is different from a
relational database, I’d say the biggest thing is it’s designed for consistency and predictability, especially
around its performance and what that response time is going to be, and also just how high it can scale up
on different axes and things like that. To do that, due to its underlying architecture, it’s very much going to
force you to use what’s called the primary key to identify your items.
Alex DeBrie
So, if you’re coming from a relational database, you can query based on any column in your table, but in Dynamo,
you’re almost always going to want to filter on the primary key, which is going to be different than a primary
key in a relational database because that’s often going to be an auto-assigned [inaudible 00:08:20], whereas, in
DynamoDB, it’s going to be something meaningful in your application. It might be a username, it might be a
customer email, it might be an order ID, but whatever it is, that primary key is really going to have value in
your application because that’s what’s going to be used for retrieving your items, for writing your items back,
and things like that. So I think that’s the first thing: they’re really going to force you into using that
primary key. The other thing I would say that’s different from relational databases, or even other NoSQL
databases, is that a lot of times those databases will have a query planner, right?
Alex DeBrie
So you’ll issue a query against your database. Something internal to that database is going to take apart your
query, look at table statistics, and figure out the most efficient way to execute that query. Dynamo doesn’t have
a query planner at all. It’s basically giving you lower-level access to some basic data structures, whether
that’s like a hash map or a B-tree, some pretty basic stuff, and just making you plan for that. Basically, you
become the query planner, where you have to arrange your data in particular ways that match your access
patterns. It requires more time spent up front in thinking through your access patterns and designing for your
access patterns, but then you get those great things about Dynamo, the consistency, the predictability around
that stuff, where you know about how long anything is going to take without having to go through this query
planner, which may give you unexpected results as it hits higher load.
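To make the "hash map or B-tree" description concrete, here is a minimal in-memory sketch in Python. It is not DynamoDB's implementation; the class name `TinyTable` and its methods are hypothetical, chosen only to show why reads by primary key stay predictable: one hash lookup finds the partition, and one binary search finds the position within the sorted sort keys.

```python
import bisect
from collections import defaultdict


class TinyTable:
    """Toy stand-in for a table with a partition key and a sort key (illustration only)."""

    def __init__(self):
        # Hash map: partition key -> list of (sort_key, item), kept sorted by sort key.
        self._partitions = defaultdict(list)

    def put_item(self, pk, sk, item):
        rows = self._partitions[pk]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)        # same primary key: overwrite
        else:
            rows.insert(i, (sk, item))  # keep the partition sorted

    def get_item(self, pk, sk):
        rows = self._partitions.get(pk, [])
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            return rows[i][1]
        return None

    def query(self, pk, sk_from, sk_to):
        """Items in one partition whose sort key lies in [sk_from, sk_to]."""
        rows = self._partitions.get(pk, [])
        keys = [k for k, _ in rows]
        lo = bisect.bisect_left(keys, sk_from)
        hi = bisect.bisect_right(keys, sk_to)
        return [item for _, item in rows[lo:hi]]


table = TinyTable()
table.put_item("user#alice", "2022-01-05", {"event": "login"})
table.put_item("user#alice", "2022-02-10", {"event": "purchase"})
table.put_item("user#bob", "2022-01-07", {"event": "login"})

# The caller acts as the query planner: it must name the partition and the range.
recent = table.query("user#alice", "2022-02-01", "2022-12-31")
```

There is deliberately no method that searches by an arbitrary attribute; supporting a new access pattern in this toy means arranging the keys differently, which is the point Alex is making.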
Alex DeBrie
So those would be the first ones. I think three other pretty unique things that fall out of that would be, number
one, it’s very explicit on what the limits of the application are. So you can’t have an item over a particular
size, you can’t retrieve more than a megabyte of data in a single request, and you can’t hit a particular key
more than 3,000 times per second, which is all useful to know up front as compared to, I think, a relational
database, where there are those limits, you just don’t know what they are, and it depends a lot on how many other
queries are going on. It depends a lot on what architecture you’re on, what instance size you’re on, all sorts of
things, whereas with Dynamo, that’s pretty explicit for you.
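Because the limits are explicit, they can be written down and checked before any data is loaded. The Python sketch below is a rough planning aid, not an official calculator. The 1 MB per-request figure comes from the conversation; the 400 KB item limit is the commonly documented value and is an addition here, so verify both against current AWS documentation. The function names are hypothetical, and the size estimate uses compact JSON rather than DynamoDB's exact accounting.

```python
import json

# Assumed limits; check the current AWS documentation before relying on them.
MAX_ITEM_BYTES = 400 * 1024        # maximum size of a single item (not stated in the episode)
MAX_RESPONSE_BYTES = 1024 * 1024   # data returned by one query request (the "megabyte" above)


def approx_item_bytes(item):
    """Rough size: UTF-8 length of compact JSON. DynamoDB counts bytes differently."""
    return len(json.dumps(item, separators=(",", ":")).encode("utf-8"))


def fits_in_item(item):
    """True if the item is (approximately) under the single-item size limit."""
    return approx_item_bytes(item) <= MAX_ITEM_BYTES


def pages_needed(item_bytes, item_count):
    """Approximate number of requests needed to read item_count items of item_bytes each."""
    total = item_bytes * item_count
    return max(1, -(-total // MAX_RESPONSE_BYTES))  # ceiling division


small_ok = fits_in_item({"name": "Alice"})
large_ok = fits_in_item({"blob": "x" * 500_000})
pages = pages_needed(2048, 1000)  # 1,000 items of about 2 KB each
```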
Alex DeBrie
I’m rambling a bit here, but two other things I think are super interesting, and why Dynamo is so popular,
especially in the serverless world: one is how well it can scale up and down, right? It’s easy to scale DynamoDB
up if you have cyclical traffic during the day or the week or the month, whatever; you can scale that up and
scale it back down pretty easily. And then also, it has an HTTP-based API, so you don’t have to set up VPCs and
do all this private networking stuff to access your DynamoDB table. It’s all accessed over HTTP. It uses AWS IAM
for authentication and it just really works well that way. So it works well in serverless architecture [inaudible
00:10:57].
Rahul Subramaniam
Yeah. AWS DynamoDB is now 10 years old, right? It’s got such an amazing set of properties or characteristics
that sometimes I feel like AWS is almost underselling the service. When I first looked at DynamoDB years ago, I
looked at it and I said, “Here’s a new key-value store.” And coming from the old school relational store, when
you think of key-value, you literally think of a properties-file kind of setup or a very simple cache of sorts
where you could just query for some very basic data. For quite a while, I had not even realized that DynamoDB
could actually allow you to leverage it for so many different use cases. And I think over the last 10
years, the ways in which DynamoDB has been used to create some amazing global scale products have been
absolutely fascinating.
Alex DeBrie
I totally agree. And I think you’re right there that the marketing undersells it a little bit. They have this
purpose-built database strategy and they put Dynamo in the key-value bucket, which you absolutely can do, but it
can do so much more. It can handle all these relationships and interesting things, and all of Amazon, all of AWS,
internally is running on DynamoDB. So you can do some very complex things there. And again, yeah, I think you’re
right that they aren’t selling enough of the story around that, because it’s more complex, right? You have to
tell people, “Hey, it’s a totally different way of modeling.” You don’t want them to bring a relational mindset
to DynamoDB and use that, because they’re going to end up in a pickle there. So it’s an education problem in a
lot of senses, and I think we’re seeing a lot more of that.
Alex DeBrie
Rick Houlihan did a lot of great work here. A lot of the team is doing great work, and I think also because
Dynamo works so well in that serverless realm, it has expanded to this much larger user base, where, for the
first five, six years of its existence, it was just the really high-scale customers that needed and used
DynamoDB. And now with serverless and how well it works there, you’re like, “Okay, it works for a lot more. It
works really well with serverless, but it also works for any OLTP application if you put in the work to
understand how it works.”
Rahul Subramaniam
With that thought, you just cued me to bring up a conversation about single table design. I was first introduced
to the concept around 2018 or 19 and since then, it seems to have catalyzed a whole lot of very fascinating use
cases for DynamoDB, especially when it comes to building planet-scale applications. What’s your take on it?
Alex DeBrie
Yeah. Absolutely. So let me, I guess for the listeners, say a little bit about what single
table design is, some of the benefits, and then also some of the other things to think about with that as well.
So first of all, why single table design? I think the first thing you should know about Dynamo is that it
doesn’t have joins, right? And it does have that primary-key-based access. You’re accessing by primary key, you
don’t have a join operation, but you still need to fetch related data in DynamoDB, so how can you do that? Dynamo
does have an operation called the query operation, which allows you to retrieve a contiguous set of items. We’re
talking about a B-tree setup here; you can retrieve a range of items that have a particular partition key.
Alex DeBrie
So what you can do, and I think this is the clearest example of single table design, is you can pre-join related
data in a way that matches your access patterns, right? If you know you’re going to fetch a customer and the
customer’s most recent orders, because you want to populate their order history page, you can model your data so
that those are next to each other in a single table. You can do that query operation. It’s a very efficient,
predictable request, and you don’t have that join operation that gives you that unpredictability. So I think
that’s one of the big reasons for single table design. How do I get these disparate related items in a single
efficient request rather than doing joins in my application, right? Another reason you might want to do single
table design, especially before they had on-demand billing or some of the auto-scaling, is that it just
simplified data management, right?
Alex DeBrie
So if you have an application that has five different tables, now you have to manage capacity and throughput for
all those different tables, scale those up and down, and things like that, whereas if you put them all into a
single table, now you only manage one. You only have to scale that up and down and often whatever your biggest
use case is, it often ends up being a lot of your throughput and you can hide all your other operations in there
and get them almost for free just in your excess capacity. So, that’s pretty interesting. One thing I would like
to say… I’m not as much of a purist on single table design. I think it’s useful, especially if you’re going to be
pre-joining data, especially if you want to lower management overhead. The biggest thing I like about single
table design is that it forces you to understand, “Hey, the way I’m modeling my data in Dynamo is not going to
match how I was modeling in a relational database.”
Alex DeBrie
And it forces you to understand those principles. And if you are modeling with single table design principles,
in almost all cases you can put it together in a single table design if it’s going to work. But there also might
be reasons to split it out into different tables and usually, those are operational type reasons or other
reasons, right? So if you’re using DynamoDB streams, which is basically change data capture or change log for
your DynamoDB table, maybe you have very different use cases for some of the different entities in your
application. Well, you could split out those entities in different tables and have different stream consumers
for each of those, maybe that simplifies that access, right? Maybe you have different data exporting needs or
different backup needs, or maybe occasionally, you need to scan all items of a particular entity and that’s
easier if it’s in a separate table. So things like that. So I think it’s often a balance of, do I want to put
this into a single table or are there reasons to break off and have other tables?
Alex DeBrie
I think the key point here is to make sure you’re modeling it in a DynamoDB first way, which should be
compatible with a single table design if that works for the other considerations.
Rahul Subramaniam
I’ve heard a lot of people equate the single table design to hyper-denormalization, but it’s really so much more
than that. Could we take an example and walk through what it takes to create or design a single table schema?
Alex DeBrie
I think you’re right, it’s not just about hyper-denormalization. I think if you denormalize too much, you can get
yourself into trouble there; with denormalization you need to find the right balance. I think single
table design is more about thinking about access patterns first, rather than designing the model first and then
doing your queries, right? So if you’re working with a relational database, often you’ll have your entity
relationship diagram, you’ll have your different boxes for each entity, and then each box becomes a table in your
relational database, and then you’ll write the queries after the fact to join those together and filter as you
need to, and add indexes if you need to. Whereas with Dynamo, you’ll create your entity relationship diagram, but
then you’ll think, “Hey, what are my actual access patterns here? How am I going to read this data? How am I
going to write this data? Update the data?” Different things like that. Once you have those listed out, then you
go about actually designing your table to handle those specific access patterns in the way that you want them to.
Alex DeBrie
So in terms of examples to have here, one example I talk about a lot would be a very simplified e-commerce
example where you have a customer and orders, right? So a customer’s going to make multiple orders. That order
is going to be comprised of multiple items, right? Also, maybe the customer has multiple addresses that they’ve
stored on file for you because they want to ship to their house or to a business or to their parents’ house or
whatever that is. So you have a lot of different relationships here, right? You have customer to address,
customer to order, and order to order items.
Alex DeBrie
So those are all one-to-many relationships, but you might model those in different ways, depending on how that
works, even if they’re all in the same table. So just thinking about the easy one, and we talk about
denormalization, if we think about the customer and their addresses, you’re almost always going to be looking at
that address within the context of a customer, right? So maybe a customer’s checking out, you want to show them
their address options, where you’re doing that within the context of a customer. So in that case, you could
probably do some denormalization there and just store the addresses directly on that customer record, because
you’re not going to be accessing the address separate from the customer in any case. So that can be a
particularly easy one. But then you have, “Hey, a customer has multiple orders, and sometimes we want to show the
order history, sometimes we want to retrieve a particular order. How do we do that?”
Alex DeBrie
Well, you probably don’t want to use that same denormalization strategy of putting the orders onto that
customer item, because as they make more orders over time, it’s going to expand that customer item to where it
gets really big. So you’ll get a slower response time, and you also pay more because, in Dynamo, you are paying
based on the amount of data you’re reading at a time, and at some point, you’re going to hit a DynamoDB size
limit. You’re actually not going to be able to add any more orders there. So you probably want to split that out
separately and have an order item specifically. But then when we get into single table design, if we’re showing
that order history, often we want to get all the orders, but maybe we have some data that’s denormalized up onto
that customer that we want to fetch with that order history page.
Alex DeBrie
We can co-locate those together: give them the same partition key so they’re located together, do a query
operation, and we can fetch the customer and the customer’s orders, all in a single request, and show that
summary view. And then you can do that same strategy with an order and the order items, right? If I want to
click into an order and see all the different order items, maybe you set it up so the order itself and the
different order items are all located near each other; you can do that in a single query operation and get those
very efficiently.
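The customer and orders example can be sketched with plain Python data. Everything here is an assumption for illustration: the `PK` and `SK` attribute names, the `CUSTOMER#` and `ORDER#` prefixes, and the `query` helper that stands in for a real query operation. Addresses are stored on the customer item because they are only read in the customer's context, while each order is its own item that shares the customer's partition key.

```python
# Hypothetical single-table layout: all items for one customer share a partition key.
items = [
    {"PK": "CUSTOMER#alice", "SK": "CUSTOMER#alice", "name": "Alice",
     # Denormalized onto the customer item: addresses are never fetched on their own.
     "addresses": {"home": "1 Main St", "work": "9 Office Rd"}},
    {"PK": "CUSTOMER#alice", "SK": "ORDER#2022-03-01#1001", "total": 40},
    {"PK": "CUSTOMER#alice", "SK": "ORDER#2022-04-15#1002", "total": 25},
    {"PK": "CUSTOMER#bob", "SK": "CUSTOMER#bob", "name": "Bob", "addresses": {}},
    {"PK": "CUSTOMER#bob", "SK": "ORDER#2022-02-11#0950", "total": 12},
]


def query(table, pk, newest_first=False):
    """Stand-in for a query: every item in one partition, ordered by sort key."""
    matched = [it for it in table if it["PK"] == pk]
    return sorted(matched, key=lambda it: it["SK"], reverse=newest_first)


def order_history(table, customer_id):
    """One 'request' returns the customer item together with that customer's orders."""
    result = query(table, f"CUSTOMER#{customer_id}", newest_first=True)
    customer = next(it for it in result if it["SK"].startswith("CUSTOMER#"))
    orders = [it for it in result if it["SK"].startswith("ORDER#")]
    return customer, orders


customer, orders = order_history(items, "alice")
```

Putting the order date at the front of the sort key is one possible choice that makes "most recent orders first" fall out of the sort order. The same approach could apply to an order and its order items, with the order ID as the partition key.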
Rahul Subramaniam
Yeah, it makes a lot of sense. And I think understanding all of those nuances is really, really valuable,
because I see a lot of folks jumping straight into single table design saying, this is… And there have been a
bunch of folks who profess that you can pretty much solve any relational schema situation with single table
design, but it’s not really that. You have to really think about what your query patterns look like. You have to
design bottom-up based on the query patterns and then decide if the single table design actually works for you
or not.
Alex DeBrie
Yep. Absolutely. One thing I always tell people is to do the math. And that’s why Dynamo’s predictability and
consistency are so helpful. They’re going to charge you based on how much data you’re reading and how much data
you’re writing, and if you have indexes, you have to pay for additional writes for those, that sort of thing. But
like
you’re saying, you have to figure out your access patterns, really dig deep on that, how often am I writing this
item? How often am I updating it? How often am I reading it? What are my different patterns? How big is that
item? Knowing all those sorts of things and the specifics of your access pattern is really going to drive how
you actually model your data, which I think is different than a relational database, where there’s generally one
way to model it and then you try and figure out how to optimize your queries on top of that.
Rahul Subramaniam
You’re absolutely right. We find that every time we try to create a new DynamoDB schema, the minimum
set of requirements we need to collect before we get started involves a list of the entities, of course, then
the relative cardinality of those entities in the store, and then the most important thing is, the queries or
the access patterns that we are likely to see across these entities, but also the volume or the relative volume
of the number of queries we are making for each of those access patterns. And I think only once we have that
list, we find that we are able to create a reasonably efficient schema for DynamoDB.
Alex DeBrie
Absolutely. And I think you’re right, you can optimize that pretty well. It’s also just amazing to me that you
can get a pretty good sense of that without having to do a load test, which isn’t going to be representative of
your actual data, but you can get a decent sense of your cost where you can say, “Hey, we think we’ll have 400
transactions per second doing these types of operations. These items are going to be this big. Here’s the
ballpark range of costs we’re going to have.” And you can even look at, what if we were off by 2X, 3X, 4X,
5X, whatever, and get a sense of what your costs are going to be. I just have no idea how to do that with a
relational database. You can do some testing, but the load testing you’re going to do is just not going to be
representative of that stuff. So it’s just a lot harder, I think, to get a good test on a relational database. So
then you see people over-provisioning and paying for 8% utilization on their relational database.
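The "do the math" advice can be worked through in a few lines of Python. This sketch uses the standard capacity-unit definitions (a read unit covers up to 4 KB for a strongly consistent read and counts half for an eventually consistent read; a write unit covers up to 1 KB). The workload numbers, the 300/100 read/write split of the 400 operations per second, and the function names are hypothetical. It stops at capacity units because prices vary by region and billing mode; multiply by the current rates to get a cost.

```python
import math


def read_units(item_kb, strongly_consistent=False):
    """Read units for one read: 4 KB per unit, halved for eventually consistent reads."""
    units = math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2


def write_units(item_kb):
    """Write units for one write: 1 KB per unit, rounded up."""
    return math.ceil(item_kb)


def units_per_second(reads_per_sec, writes_per_sec, item_kb):
    """Total read and write units consumed per second for a uniform item size."""
    return {
        "read": reads_per_sec * read_units(item_kb),
        "write": writes_per_sec * write_units(item_kb),
    }


# Hypothetical workload: 400 operations/second (300 reads, 100 writes) on 2 KB items.
baseline = units_per_second(300, 100, 2)

# "What if we were off by 2X, 3X, 5X?" Re-run the same arithmetic with a multiplier.
scenarios = {
    factor: units_per_second(300 * factor, 100 * factor, 2)
    for factor in (1, 2, 3, 5)
}
```

Index writes, which Alex mentions as an extra charge, could be added as further `write_units` terms per index in the same style.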
Rahul Subramaniam
Yeah. I think that’s one of the fundamental mind shifts that happen when you come to DynamoDB because when you
start off with a relational database, the first thing you start worrying about is, “Okay, how many cores do I
need? How much memory do I have?” Then you start looking at IOPS to see how much is going to get loaded in
memory. You’re literally focusing on all the wrong stuff. If you really think about… If your job is to design
the data model and stuff like that, then having to deal with the memory and your IOPS and your network IOPS, how
much network load is going to happen if my storage is going to be on attached storage or whatever, you are then
spending so much energy trying to solve for stuff that may or may not be in your control. Like tomorrow, if
there’s a little jitter in the network or something’s wrong with one disk or any of these parameters goes awry,
you basically then end up in a situation where the performance is completely a function of the infrastructure,
not the database itself.
Alex DeBrie
Yeah, absolutely. I think related to that, you’re just talking about the depth of things you have to know about
when you’re modeling with relational databases, and I think the learning curve difference between relational and
Dynamo is interesting. I think it’s easier to get started and know the basics of a relational database,
and that’s why it’s pretty popular and people like it, but I think it’s really hard to get to the point where
you understand how all those different things are affecting your query, especially at a high scale, when you
talk about disk and your kernel and the cache and all these disk buffers and all sorts of things like that that
you have to know to really know that relational database well, and how it’s going to hold up under pressure,
network jitter, and all those sorts of things. Whereas with Dynamo, I think it’s harder to get started. You have
to learn this completely different way of modeling. It’s not like the Excel table people think of with a
relational database; it’s just totally different.
Alex DeBrie
But I think you can become an expert to where you have pretty good confidence in how this is going to perform at
a high scale and what you need to press on. I think you can become an expert in that a lot quicker in Dynamo
than you can in a relational database.
Rahul Subramaniam
Yeah. And also, I feel like the parameters are so much simpler. It’s a different paradigm, but in
Dynamo, the number of parameters you have to juggle to get to the right solution is so much smaller. There was a
time when I was working with a relational store, which was basically a MySQL store, and once it got to a certain
scale, I literally had to get down to the level of understanding what the difference between the two engines
was and how the engines were operating and what they were optimized for, and then literally going in and
optimizing the engine parameters. That just felt like I was wasting energy in the wrong place. So beyond a
certain basic scale, I almost feel like you have to go explore other data stores, with relational stuff falling
apart at some point of scale.
Alex DeBrie
Yeah, you can do some amazing things with relational databases, but again, you really need to get down
into the details. You probably need to plan a lot in your application for how you’re doing
sharding or different things like that, where you’re relaxing constraints anyway, whereas again, like you’re
saying, with Dynamo, you learn five key concepts, and then you know the parameters to look for. I go to places
all the time where I do training and by the end of it, they know 80%, 90% of what they’re going to need to know
for most of their data models. And if they have a really hard question, again, you just need to point them in
the direction and say, “Hey, here are the factors to consider.” These are the same factors that we talked about
before, just how do you apply them to your situation? But it’s basic factors, it’s math you can do in an Excel
Spreadsheet pretty easily, rather than having to do a load test or really get down into the deep details of your
hardware.
Rahul Subramaniam
Thanks, Alex. This has been such a fun and insightful conversation so far, but unfortunately, we’ve run out of
time for this episode, and I still feel like we have so much more to talk about. I’d love to explore DynamoDB
further
in an extended conversation. For the audience, if you’ve liked what you’ve heard so far, please review and
comment on the episode, and don’t forget to join us next week as we continue this conversation with Alex about
DynamoDB. Thank you.
Speaker 1
We hope you enjoyed this episode of AWS Insiders. If so, please take a moment to rate and review the show. For
more information on how to implement a hundred percent safe AWS recommended account fixes that can save you 10%
to 20% of your AWS bill, visit cloudfix.com. Join us again next time for more secrets and strategies from top
Amazon insiders and experts. Thank you for listening.