97 Things Every Cloud Engineer Should Know - Collective Wisdom from the Experts

It’s my turn to publish a review here :slight_smile:

Some time ago I finished reading this book that @jamie_bright recommended to me. The book is titled 97 Things Every Cloud Engineer Should Know and, as you might expect, its premise is to be a collection of wisdom relevant to Cloud Engineers (I like to think Cloud Practitioners) like you and me, put together by the great Emily Freeman and Nathen Harvey.

To achieve that, the book is structured quite differently from the usual book. It reads like a collection of Medium articles, usually around two pages long, from almost 100 different authors, including our own @bnwoods and @rwsweeney! There are some GREAT stories and nuggets of information in there (like theirs), but some articles can be quite disappointing…

The book is divided into 12 parts: Fundamentals, Architecture, Migration, Security and Compliance, Operations and Reliability, Software Development, Cloud Economics and Measuring Spend, Automation, Data, Networking, Organizational Culture, and Personal and Professional Development, in exactly this order.

I’d like to highlight Nathen Harvey’s article in Fundamentals. I see many people who decide that, say, “AWS is the next big thing” and start studying for a certification right away, but forget to nail down the basics. I like his article because it provides exactly that: the basics, straight from the source, the NIST definition. Cloud Computing is so much more than just “someone else’s computer”.

Dan Moore’s article also hits the nail on the head on a fundamental cloud basic: use managed services. There is no point in moving to any Cloud Service Provider if you don’t intend to use it as more than just “someone else’s computer”. Learn and embrace the Shared Responsibility Model. The higher you go up that scale, the more you hand off to the Cloud Provider and the more you can focus on what matters: your business. If you think it’s cheaper to run your own database, storage, or whatever-it-is, you are doing your math wrong.

There are two articles, one by Duncan Mackenzie and the other by Lisa Huynh, that cover such a core concept that I believe they should be part of Fundamentals instead. They go over the basics of scalability: scaling up/vertically vs. out/horizontally. This is a core concept whenever you build an application, cloud or otherwise. Scaling up/vertically is probably easier to achieve: throw more RAM and CPU at the problem and you might get a faster answer back. Scaling out/horizontally can get a bit trickier -

Mattias Geniar tells the reader something that was really refreshing to hear: it’s OK not to run Kubernetes, even if it seems like the entire planet is currently doing it.

Mike Silverman delivers a clear message that I think most people fail to understand: Lift-and-Shift as a model for moving to the cloud is unlikely to succeed. It’s really important to understand, as Dan Moore said, that the cloud isn’t about “someone else’s computer”. If you treat it like that, failure is the likely result.

Security and Compliance
I really like Fernando Duran’s view on SSHing into production: don’t! With the cloud, it’s easier than ever to just spin up a new instance with the fix/update/change/put-your-ssh-need-here instead of changing a moving part in a production environment. That is, assuming you are using servers at all, right?

Stephen Kenzly is 100% honest about something that anyone who has ever built on top of AWS already knows: IAM is @!#^$ hard! It is also essential for proper security, so learning how to write good policies is key. He then walks us through how AWS evaluates IAM policies, which can be really helpful when troubleshooting an access problem.
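To make the core idea of policy evaluation concrete, here is a deliberately simplified sketch (my own, not from the article) of the rule at its heart: an explicit Deny always wins, and without a matching Allow the request is implicitly denied. It only models identity-based policies in a single account; the dict layout and the `evaluate` function are hypothetical, and real IAM also weighs conditions, resource-based policies, permissions boundaries, SCPs, and session policies.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified model of IAM identity-based policy evaluation.
# Ignored on purpose: Condition blocks, NotAction/NotResource, resource-based
# policies, permissions boundaries, SCPs, and session policies.


def _matches(patterns, value):
    """True if value matches any pattern; '*' and '?' act as wildcards (approximation)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def evaluate(policies, action, resource):
    """Return 'Allow' or 'Deny' for a single request."""
    allowed = False
    for policy in policies:
        for stmt in policy.get("Statement", []):
            if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
                if stmt["Effect"] == "Deny":
                    return "Deny"  # an explicit Deny overrides any Allow
                if stmt["Effect"] == "Allow":
                    allowed = True
    # No explicit Deny: allowed only if some statement allowed it,
    # otherwise fall back to the default implicit deny.
    return "Allow" if allowed else "Deny"


# Example policy (bucket name and account ID are made up).
policies = [
    {"Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
    ]}
]

print(evaluate(policies, "s3:GetObject", "arn:aws:s3:::example-bucket/report.csv"))
print(evaluate(policies, "s3:DeleteObject", "arn:aws:s3:::example-bucket/report.csv"))
print(evaluate(policies, "ec2:StartInstances",
               "arn:aws:ec2:us-east-1:123456789012:instance/i-0abc"))
```

The second request is denied even though `s3:*` allows it, because the explicit Deny wins; the third is denied simply because nothing allows it. That distinction (explicit vs. implicit deny) is usually the first thing to check when an access problem shows up.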

Operations and Reliability
Never take a dependency on a single region! That’s what Derek Martin, and any good Cloud Architect, will tell you. Of course, that’s easier said than done, so Derek provides a few tips and considerations to keep in mind when building multi-region architectures.

The trio Kit Merker, Brian Singer and Alex Nauda talk about the basics of Service-Level Objectives (SLOs, for those more familiar with them). It’s fairly common to hear customers, managers, peers, and teammates aiming for 100% uptime, for perfection. Even if that were possible, it would be really expensive. Understanding SLOs helps you define a better target for your service/application in the cloud, leveraging, of course, those sweet managed services that offer higher (but not perfect!) availability than you would achieve doing it yourself.
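A quick way to see why 100% is the wrong target is to turn an availability SLO into an "error budget": the downtime it still allows. This is a back-of-the-envelope sketch assuming a simple time-based availability SLO over a 30-day window; the function name and the window length are my own choices, not something taken from the article.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over a time window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


# Compare a few common targets over a 30-day window.
for target in (99.0, 99.9, 99.99, 100.0):
    budget = error_budget_minutes(target)
    print(f"{target}% over 30 days -> {budget:.1f} min of downtime allowed")
```

For example, 99.9% over 30 days leaves about 43 minutes of downtime, while 100% leaves none at all, which is exactly why chasing perfection gets so expensive.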

A really boring (at least to me), yet really important topic is Monitoring. Tidjani Belmansour does a great job of introducing Monitoring in the Cloud Computing space, going over the basics of what monitoring actually is and what to monitor in the cloud, from User Experience to Billing.

Software Development
“It works on my machine!” is heard way too often. But the cloud doesn’t care, and Alessandro Diaferia makes sure the reader understands that. He goes over the infrastructure basics that a developer should be aware of, like infrastructure as code, CI/CD pipelines, container orchestration, etc.

Cloud Economics and Measuring Spend
When we run our services on premises, the cost of operation is a really vague concept, especially to developers. It isn’t surprising to see requests to the infra team for servers with 64 GB of RAM, just in case. FinOps is real and strong in the cloud, and Deepak Ramchandani Vensi does a great job introducing its basics, including CapEx vs. OpEx.

On a very similar topic, Michael Winslow mentions something that many have probably heard when bringing up service costs to a developer: “I’m a cloud engineer, not an accountant!” He goes over some tips that every cloud practitioner should be aware of, like always reading the /price page of the service of choice.

Networking
Networking is one of the core foundations of any datacenter, including the cloud. David Murray, a principal solutions architect at AWS, goes to great lengths to make sure the reader understands that, despite the Shared Responsibility Model, understanding networking is still the customer’s responsibility when building in the cloud.

Organizational Culture
Our own Skycrafter @bnwoods has a GREAT article about organizational silos. We already know she’s a great author (have you read all her insightful comments here in the forum?), but she really shines going over the silo challenge that many organizations face and how we, cloud practitioners, can help break silos down.

Still on culture, Tiffany Jachja shares 3 lessons she learned working with DevOps: define your target outcomes, safe environments, and architect your technology. Along the way, she shares invaluable external resources, such as the books Accelerate by Nicole Forsgren and Dare to Lead by Brené Brown.

Personal and Professional Development
Last (literally, since it’s the book’s last article), but not least, another Skycrafter, @rwsweeney, writes a bit about her REALLY cool journey to becoming a DevOps Engineer (now SRE) in such a short time. Spoiler: community and dedication play big roles in her story!

I just briefly highlighted a few of the articles you’ll find in the book. And, although not every single one of them is an A+ in my personal view, I still highly recommend this book, especially for those closer to the beginning of their cloud career. You can find it at most booksellers, like Amazon. Red Hat is also making the eBook available for free over here.

Have you read it? What were your favorite parts?
Based on my review, is this something you’re going to add to your reading list?
