AWS Made Easy

Tip #34: How to optimize for AWS operational resilience

Build stability and trust by optimizing your AWS infrastructure for operational resilience

The AWS definition for operational resilience is:

Operational resilience is the ability to provide continuous service through people, processes, and technology that are aware of and adaptive to constant change. It is a real-time, execution-oriented norm embedded in the culture of AWS that is distinct from traditional approaches in Business Continuity, Disaster Recovery, and Crisis Management, which rely primarily on centralized, hierarchical programs focused on documentation development and maintenance.

Why should you optimize for operational resilience?

The reason is simple. If you don’t, even minor failures can cause prolonged downtime, scaring away customers and angering stakeholders. According to an IDC study, the average annual cost of downtime for Fortune 1000 companies ranges from $1.25 billion to $2.5 billion.

How can you achieve operational resilience?

Don’t rely on assumptions; confirm the facts. Review your setup against the four AWS operational resilience pillars: Infrastructure, Operations, Security, and Software. Make sure your application can handle failures and recover from them quickly. If you want to dig deeper, check out the AWS training material for each of these pillars.
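One common way to make an application handle failures and recover quickly is to retry transient errors (throttling, timeouts) with capped exponential backoff and jitter. The sketch below is a generic illustration, not an AWS API: `call_with_backoff` and `TransientError` are hypothetical names, and the delay values are arbitrary defaults. The AWS SDKs already ship with configurable retry behavior, so check your SDK’s settings before rolling your own.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # The delay cap doubles each attempt (up to max_delay); picking a random
            # delay below the cap keeps many clients from retrying in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Usage: a simulated flaky call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated throttling")
    return "ok"


result = call_with_backoff(flaky, sleep=lambda _: None)  # skip real sleeping in the demo
print(result, attempts["count"])  # prints: ok 3
```

Injecting `sleep` keeps the function easy to test; in production you would leave the default `time.sleep`.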

Infrastructure

Take advantage of the highly resilient infrastructure provided by AWS, and host your resources in multiple Availability Zones to greatly reduce the risk that a localized failure, such as a hardware fault, power outage, or natural disaster, takes your application down. AWS provides a plethora of managed services for building highly available and scalable applications, so you don’t have to reinvent the wheel.
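To see why spreading resources across Availability Zones matters, the sketch below places replicas round-robin across a set of AZs and checks whether losing any single AZ still leaves enough healthy capacity. The helper names, AZ names, and replica counts are hypothetical and for illustration only; in practice, an EC2 Auto Scaling group configured with subnets in several AZs does this balancing for you.

```python
from collections import Counter
from typing import Dict, List


def spread_across_azs(replica_count: int, azs: List[str]) -> Dict[str, int]:
    """Assign replicas to Availability Zones round-robin (hypothetical helper)."""
    if not azs:
        raise ValueError("need at least one Availability Zone")
    placement = Counter(azs[i % len(azs)] for i in range(replica_count))
    return {az: placement[az] for az in azs}


def survives_single_az_loss(placement: Dict[str, int], min_healthy: int) -> bool:
    """True if losing any one AZ still leaves at least `min_healthy` replicas."""
    total = sum(placement.values())
    return all(total - count >= min_healthy for count in placement.values())


azs = ["us-east-1a", "us-east-1b", "us-east-1c"]  # example AZ names
multi_az = spread_across_azs(6, azs)
single_az = spread_across_azs(6, azs[:1])

print(multi_az)  # {'us-east-1a': 2, 'us-east-1b': 2, 'us-east-1c': 2}
print(survives_single_az_loss(multi_az, min_healthy=4))   # True
print(survives_single_az_loss(single_az, min_healthy=4))  # False
```

With the same six replicas, the three-AZ layout keeps four running after an AZ outage, while the single-AZ layout loses everything.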

Operations

If you haven’t already, create playbooks so your team knows how to handle failures and disaster recovery, and practice them before going into production. Create regular backups to avoid or minimize data loss.

Feed the lessons learned from handling issues and incidents back into these playbooks to continuously improve them.

Security

AWS follows a shared responsibility model when it comes to security. In short, AWS is responsible for the security of its hardware and infrastructure, and it provides excellent tools so you can implement the required security measures on your side. Always follow the principle of least privilege to keep your data safe and limit the blast radius if credentials are compromised, and create regular backups so you can recover from ransomware attacks and similar incidents.
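Least privilege in practice means granting only the specific actions on the specific resources a workload needs. The sketch below builds an IAM policy document (using the standard `Version`/`Statement`/`Effect`/`Action`/`Resource` structure) that allows reading and writing objects in a single S3 bucket, plus a simple check for wildcard grants. The bucket name `example-app-reports` and both helper functions are hypothetical; the wildcard check is a rough lint, not a substitute for a proper policy review.

```python
import json
from typing import Any, Dict, List


def least_privilege_s3_policy(bucket: str, actions: List[str]) -> Dict[str, Any]:
    """Build an IAM policy allowing only `actions` on objects in one bucket (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def _as_list(value: Any) -> List[str]:
    # IAM allows Action/Resource to be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def has_wildcards(policy: Dict[str, Any]) -> bool:
    """Flag statements granting '*' actions/resources or service-wide actions like 's3:*'."""
    for statement in policy["Statement"]:
        if any(a == "*" or a.endswith(":*") for a in _as_list(statement["Action"])):
            return True
        if any(r == "*" for r in _as_list(statement["Resource"])):
            return True
    return False


policy = least_privilege_s3_policy("example-app-reports", ["s3:GetObject", "s3:PutObject"])
print(json.dumps(policy, indent=2))
print(has_wildcards(policy))  # False
```

Scoping the `Resource` to one bucket means a leaked credential for this workload cannot touch any other bucket in the account.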

Software

AWS Managed Services helps you reduce the risks of provisioning, running, and supporting your infrastructure by automating common activities such as change requests, monitoring, patch management, security, and backup services.

AWS provides complete toolkits to improve the stability and security of your software delivery, such as AWS CodeDeploy and AWS CodePipeline.

Conclusion

Optimizing for operational resilience is often overlooked until a company feels the pain for the first time. Plan ahead, and retain your customers through stability and trustworthiness.
