How dotCloud, Instagram and One Crafty Systems Administrator are Using PagerDuty

Monitoring your infrastructure can be challenging, which is why you have tools in place to make sure you don’t miss a beat when things go wrong. You’ve probably got Nagios monitoring your overall infrastructure, Pingdom or Neustar WPM monitoring your website, Boundary and New Relic monitoring your apps, or something completely homegrown watching everything else. Whatever you use, your systems are configured in a way that is unique to your situation.

Since we can all learn from one another, here are a few ways your peers manage and maintain their IT infrastructure, and how they use PagerDuty to get alerted when systems go down.

dotCloud: Organizing a 24×7 bullet-proof on-call rotation with PagerDuty
by dotCloud – (@dot_cloud)

Instagram: What Powers Instagram: Hundreds of Instances, Dozens of Technologies
by Instagram Engineering – (@instagram)

Twilio + PagerDuty = PhoneDuty
by David S. Shafer, manager of the group responsible for enterprise storage systems at a major national research university (@davidsshafer)

Posted in Operations Performance.