SuperPumpup (dot com)
General awesomeness may be found here.

14 October 2013

OpsWorks and System Architecture

TL;DR Moving to OpsWorks has been very helpful in getting our system infrastructure represented and reproducible as code. I don't think it's a great tool for us to stay on as things grow in complexity, or if we simply want better ease of use. It's a great stepping stone.

At OwnLocal, I've been involved in porting our application infrastructure from EC2 instances controlled by Capistrano to Amazon's new product, OpsWorks.

I was impressed by the Capistrano scripts, though some things caused us a lot of pain:

  • We did not always get consistent deploys
  • We had to hard-code IP addresses in our deploy files
  • Server configuration was unreliable (some of our servers had customized configurations that were problematic to rebuild)

What I understand about OpsWorks so far has taken a lot of effort to learn, and I'm certainly no expert (yet?), but we now have servers that spin up on weekdays as our system comes under load and shut down as the load abates, without any intervention.

The first time I saw my load-based servers spin up was amazing

However, I have felt some tension between what I consider good system design (an ecosystem of small applications) and the way that OpsWorks "Stacks" work. Oddly, when you add an App to a stack using their friendly GUI, it adds the app to all the servers, and then a deploy command is triggered on ALL instances by default. There's no obvious way to deploy several different Rails app layers within the same stack (so communicating between service applications requires hard-coding addresses).

This design makes it seem easier to leave all your logic in One Big App

I have also been frustrated that, although I have to write my own cookbooks to run anything other than the most trivial vanilla "Stack", I do not have access to the most powerful Chef concepts, like "Search". Plus, debugging is an absolute nightmare. Amazon does a lot of "crafting" of the settings used on lifecycle events, and those settings are not easily (if at all?) replicated in something like a Vagrant environment. Therefore the feedback cycle on customizing scripts is ungodly slow at times (tens of minutes for some aspects of the life cycle).
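To make the missing-search point concrete: with a Chef server, a recipe can ask for something like search(:node, "role:rails-app") and get back the matching nodes. Without search, one rough substitute is to dig through the attributes that OpsWorks passes into each Chef run. The sketch below is plain Ruby over a hand-written hash, so it runs without Chef. The layer names, hostnames, IPs, the "status" field, and the layer_ips helper are all assumptions made up for illustration, and the hash layout is modeled on the OpsWorks stack configuration JSON rather than copied from a real stack.

```ruby
# Hypothetical stand-in for the attributes OpsWorks hands to a Chef run.
# Assumed layout: opsworks -> layers -> <layer shortname> -> instances -> <hostname>.
NODE = {
  'opsworks' => {
    'layers' => {
      'rails-app' => {
        'instances' => {
          'rails-app1' => { 'private_ip' => '10.0.0.11', 'status' => 'online' },
          'rails-app2' => { 'private_ip' => '10.0.0.12', 'status' => 'stopped' }
        }
      },
      'worker' => {
        'instances' => {
          'worker1' => { 'private_ip' => '10.0.0.21', 'status' => 'online' }
        }
      }
    }
  }
}.freeze

# Return the private IPs of the online instances in one layer.
# An unknown layer yields an empty list instead of raising.
def layer_ips(node, layer)
  instances = node.dig('opsworks', 'layers', layer, 'instances') || {}
  instances.values
           .select { |instance| instance['status'] == 'online' }
           .map { |instance| instance['private_ip'] }
           .sort
end

puts layer_ips(NODE, 'rails-app').inspect
```

This only sees instances inside the current stack, which is exactly the limitation: a service living in another stack is invisible to a lookup like this, so its address still has to be hard-coded somewhere.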

Now I'm at a place where I have the set of cookbooks written to be able to deploy my infrastructure as code (GO ME!) and OpsWorks helped me get there. However, I'm really looking for a justifiable way to jump ship and go to a hand-rolled cloud (maybe VPC) that runs Chef (probably paying Opscode) with more modern tools like knife and search. In fact, I'll probably achieve this the way all great change is done - slowly and deliberately, one service at a time.

Goals for moving forward:

  • Search for nodes (or settings) in a sane way
  • Reconfigure multiple "stacks" more quickly (to install a package the "OpsWorks way", I click a lot, which is "convenient", though it's a pain to do on two stacks - like Production and Staging)
  • Have better visualization tools
  • Have a better deployment tool ("cap -S staging deploy branch=test_something_dangerous")
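As a rough illustration of the second goal, here is a minimal sketch of keeping shared settings in one place and deriving each stack from them, so that adding a package is a one-line change instead of the same clicks repeated per stack. It is plain Ruby, not an OpsWorks or Chef API; the BASE and STACKS hashes, the package names, the instance counts, and the stack_config helper are all hypothetical.

```ruby
# Settings every stack shares (hypothetical values).
BASE = {
  'packages' => %w[imagemagick nodejs],
  'instance_count' => 1
}.freeze

# Per-stack overrides (hypothetical values).
STACKS = {
  'staging' => { 'rails_env' => 'staging' },
  'production' => {
    'rails_env' => 'production',
    'instance_count' => 4,
    'packages' => %w[monit]
  }
}.freeze

# Build the full configuration for one stack.
# Package lists are combined; every other key is replaced by the override.
def stack_config(name)
  overrides = STACKS.fetch(name) { raise ArgumentError, "unknown stack: #{name}" }
  BASE.merge(overrides) do |key, base_value, override|
    key == 'packages' ? (base_value + override).uniq : override
  end
end

STACKS.each_key do |name|
  puts "#{name}: #{stack_config(name).inspect}"
end
```

With something along these lines, a new package added to BASE shows up in both Production and Staging at once, and the differences between the two stacks stay visible in a single file.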

So in all, OpsWorks is great. If you want an easy-ish way to move to infrastructure as code, I encourage you to check it out. However, it's not the be-all and end-all, and it won't save you from learning Chef. So if your application has some significant complexity already, it may not be right for you.
