john Archives

On-Call Best Practices: Page Your Manager

Aug 13, 2015

By John Laban | In Alerting, Best Practices & Insights, Operations Performance

Having one person on-call isn’t enough. What happens if your on-call engineer sleeps through an alert? What happens if their phone’s battery dies without them knowing, or if an alert arrives at a really inconvenient time, such as when they are stuck on a bus or in traffic? It will happen. We present best practices for backup: one or more people waiting in the wings, ready to spring into action if your primary on-call is unable to respond.

Approaching the Hiring of Engineers as a Machine Learning Problem

Oct 16, 2012

By John Laban | In Alerting, Operations Performance

Tags hiring, Hiring Best Practices, Hiring Developers, Hiring Engineers, Jobs at PagerDuty, john, machine learning

Hiring software engineers is hard. We all know this. If you get past the problem of sourcing and landing good candidates (which is hard in…

Pressure Release Valves

Jan 27, 2012

By John Laban | In Reliability

Tags increasing-availability, john, MTTR, reliability

This is the fourth in a series of posts on increasing overall availability of your service or system. Have you ever gotten paged, and known…

A Standard Operating Procedure for when s*IT hits the fan

Nov 08, 2011

By John Laban | In Reliability

Tags increasing-availability, john, MTTR, reliability

This is the third in a series of posts on increasing overall availability of your service or system. In the first post of this series, we…

More control over Optimistic Locking in Rails

Oct 10, 2011

By John Laban | In Reliability

Tags Code, john, Optimistic Locking, reliability

Like pretty much everything else in Rails, optimistic locking is nice and easy to set up: you simply add a “lock_version” column to your ActiveRecord model…

Availability lessons from shoe companies and ancient warlords

Oct 03, 2011

By John Laban | In Reliability

Tags increasing-availability, john, MTTR, reliability

This is the second in a series of posts on increasing overall availability of your service or system. In the first post of this series,…

New APIs Available Now

Jun 20, 2011

By John Laban | In Announcements, Features

Tags API, john

Have you ever said to yourself: “PagerDuty is great, but I wish I could better integrate it into the custom tools I already use.” Or…

Standing on the shoulders of giants and stumbling with them – the Amazon AWS outage’s "pain" statistics

Apr 22, 2011

By John Laban | In Reliability

Tags john, reliability

Today, at around 1am Pacific Time, Amazon began having major problems with some of their cloud infrastructure: specifically with their EC2, EBS, and RDS offerings. We’d like to share some statistics on the alerts we sent out – via phone or SMS – during the outage.

The ups and downs of Availability

Apr 18, 2011

By John Laban | In Reliability

Tags increasing-availability, john, MTBF, MTTR, reliability, SLA

This post is meant as a quick introduction to some concepts of system availability, so that subsequent posts in this series make sense. I’ll go over concepts like availability, SLA, mean time between failure, mean time to recovery, etc.

On-Call Best Practices: Part 1

Mar 30, 2011

By John Laban | In Alerting, Best Practices & Insights, Operations Performance

Tags Best Practices, john, On-call

This is Part 1 in a multi-part series dealing with tips for being on-call.

