<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PagerDuty Blog</title>
	<atom:link href="http://blog.pagerduty.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.pagerduty.com</link>
	<description></description>
	<lastBuildDate>Tue, 31 Aug 2010 23:34:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Load Balancers need static IPs!</title>
		<link>http://blog.pagerduty.com/2010/08/31/load-balancers-need-static-ips/</link>
		<comments>http://blog.pagerduty.com/2010/08/31/load-balancers-need-static-ips/#comments</comments>
		<pubDate>Tue, 31 Aug 2010 23:02:11 +0000</pubDate>
		<dc:creator>Andrew Miklas</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=269</guid>
		<description><![CDATA[We&#8217;ve been hosting PagerDuty on AWS for about the last year. One of the biggest draws to the platform for us was the promise of ready-built components &#8212; on AWS there&#8217;s no need to run your own redundant DB setup or load balancer, &#8230; <a href="http://blog.pagerduty.com/2010/08/31/load-balancers-need-static-ips/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve been hosting PagerDuty on <a href="http://aws.amazon.com/">AWS</a> for about the last year. One of the biggest draws to the platform for us was the promise of ready-built components &#8212; on AWS there&#8217;s no need to run your own <a href="http://aws.amazon.com/rds/">redundant DB setup</a> or <a href="http://aws.amazon.com/elasticloadbalancing/">load balancer</a>, since Amazon provides them: pre-built and professionally managed.</p>
<p>Well, that&#8217;s the theory, anyway. Unfortunately, every time I&#8217;ve evaluated any AWS service beyond their simple EC2 hosting, AWS has come up short. Perhaps most frustrating, their services cover 95% of what we need. But without fail, they are lacking some small but critical piece of functionality.<span id="more-269"></span></p>
<p>Consider AWS&#8217;s <a href="http://aws.amazon.com/elasticloadbalancing/">elastic load balancer</a> (ELB), for example. It provides an easy way to distribute traffic fairly over all of your front-end instances.  It can automatically stop routing requests to failed instances, completely hiding network and instance failures from the user. The ELB can even automatically spin up new instances in response to traffic spikes. All of this would take some serious engineering effort to replicate on your own.</p>
<p>Unfortunately, it&#8217;s totally unusable in many real-world deployments. The problem is that Amazon doesn&#8217;t assign static IPs to their load balancers. Instead, you get a hostname and are told to set up CNAME records aliasing www.yourdomain.com to the ELB&#8217;s name. This has three serious problems.</p>
<p>First, you can&#8217;t use a CNAME for the root of a domain.  This is because a CNAME record can&#8217;t coexist with any other record at the same point in the DNS hierarchy, and the root of a domain always carries SOA and NS records.  As a result, if your site is hosted at yourdomain.com, you&#8217;ll need to move it to www.yourdomain.com. Of course, even with redirects in place at the original domain, this sort of branding change is going to be unacceptable to many businesses.</p>
<p>Second, you can&#8217;t properly accept email to a domain hosted by an ELB.  This too is due to a DNS limitation &#8212; you can&#8217;t have an MX record and a CNAME record at the same point in the DNS hierarchy.  While you might be able to accept mail if you run an SMTP server on the machines behind the ELB, this is far from a typical configuration.  At PagerDuty, this is a showstopper, since we need to be able to both host a site and accept mail at yoursubdomain.pagerduty.com.</p>
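<p>To make the two DNS restrictions above concrete, here is a minimal Python sketch that scans a list of records and reports every name where a CNAME sits next to another record type. The <code>(name, type)</code> tuples and the function name <code>find_cname_conflicts</code> are made up for this example; it is not a real zone-file parser.</p>

```python
from collections import defaultdict

def find_cname_conflicts(records):
    """Return names that hold a CNAME together with any other record type.

    `records` is a list of (name, record_type) tuples. This shape is an
    assumption made for the example, not a real zone-file format.
    """
    types_by_name = defaultdict(set)
    for name, record_type in records:
        types_by_name[name].add(record_type.upper())
    conflicts = {}
    for name, types in types_by_name.items():
        # A CNAME is only legal when it is the sole record at that name.
        if "CNAME" in types and types != {"CNAME"}:
            conflicts[name] = sorted(types - {"CNAME"})
    return conflicts

zone = [
    ("yourdomain.com", "SOA"),
    ("yourdomain.com", "NS"),
    ("yourdomain.com", "CNAME"),       # root aliased to the ELB: not allowed
    ("www.yourdomain.com", "CNAME"),   # fine: nothing else lives at this name
    ("mail.yourdomain.com", "MX"),
    ("mail.yourdomain.com", "CNAME"),  # MX next to a CNAME: not allowed
]
conflicts = find_cname_conflicts(zone)
```

<p>Here the root and the mail name are both flagged, while the plain www alias passes.</p>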
<p>Finally, you have no &#8220;out&#8221; if the ELB blows up, short of adjusting your DNS records and waiting for cached records to expire. This is a big problem for us, since we&#8217;re very hesitant to introduce components into PagerDuty&#8217;s infrastructure that we can&#8217;t quickly swap out in the event of a problem.</p>
<p>The solution to this problem is simple &#8212; it should be possible to map an Amazon Elastic IP to an ELB. Since the ELB would now have a static IP, the DNS issues would be solved. And if the ELB blew up, you could simply provision another and remap the IP &#8212; no DNS changes required. I realize that ELB&#8217;s &#8220;no static IP&#8221; architecture is probably a deeply baked in design decision &#8212; but unfortunately, a LB without a static IP isn&#8217;t really usable.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2010/08/31/load-balancers-need-static-ips/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>3 Major New Features &#8211; Part 3: PagerDuty &amp; Cloudkick Partnership</title>
		<link>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-3-pagerduty-cloudkick-partnership/</link>
		<comments>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-3-pagerduty-cloudkick-partnership/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 22:46:11 +0000</pubDate>
		<dc:creator>Andrew Miklas</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=144</guid>
		<description><![CDATA[This is the third article of a three-part series about the latest improvements to PagerDuty.  Be sure to check out Part 1 and Part 2. One of the biggest challenges in creating PagerDuty was determining how to get the wide variety &#8230; <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-3-pagerduty-cloudkick-partnership/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>This is the third article of a three-part series about the latest improvements to PagerDuty.  Be sure to check out <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-1-integration-api">Part 1</a></em><em> and <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-2-the-nagios-pagerduty-api">Part 2</a></em><em>.</em></p>
<p>One of the biggest challenges in creating PagerDuty was determining how to get the wide variety of monitoring systems out there and PagerDuty to talk to each other.  Up until now, we&#8217;ve relied on email to be the common interface between monitoring tools and PagerDuty; however, there&#8217;s a limit to the amount of information we can extract from free-text messages.  We now offer an API that can be used to programmatically send monitoring events to PD, but we recognize that this can be a bit of a chore to set up with your favorite monitoring tool.</p>
<p>The ideal scenario would be if your monitoring tools were PagerDuty-aware: if they &#8220;just knew&#8221; about your PagerDuty account and could seamlessly send events directly to PagerDuty.  We&#8217;re beginning to do just that by building partnerships with the best cloud-based (software-as-a-service) and run-it-yourself monitoring systems.</p>
<p><img class="size-full wp-image-231 alignleft" style="margin: 15px 15px 10px 0;" title="Cloudkick" src="http://blog.pagerduty.com/wp-uploads/2010/08/cloudkick.gif" alt="Cloudkick" width="100" height="28" /></p>
<p>We&#8217;re pleased to announce that <a href="https://www.cloudkick.com/">Cloudkick</a> is the first monitoring tool to include out-of-the-box integration capability with PagerDuty.  Cloudkick is a recognized leader in cloud-based monitoring.  They offer comprehensive connectivity and host-level monitoring across not only every major cloud provider, but also machines in your own data center via their agent software.  What&#8217;s more, Cloudkick&#8217;s plugin architecture makes it easy to set up application-level monitoring. Plugins exist for most major server software, but you can easily write new ones for your custom apps.  For a limited time, Cloudkick is offering all PagerDuty customers a 15% discount &#8212; see their <a href="https://www.cloudkick.com/t/pagerduty">pricing page</a> for more details.</p>
<p>To learn more about how Cloudkick can help you keep a handle on your systems, take a look at their <a href="https://www.cloudkick.com/features">feature list</a>.  Be sure to check out our <a href="http://www.pagerduty.com/docs/guides/cloudkick-integration-guide">Cloudkick integration guide</a> to see just how easy it is to have Cloudkick alerts delivered using your existing PagerDuty configuration.</p>
<p><img class="aligncenter size-full wp-image-251" title="Cloudkick notification via PagerDuty" src="http://blog.pagerduty.com/wp-uploads/2010/08/cloudkick1.gif" alt="Cloudkick notification via PagerDuty" width="530" height="410" /></p>
<p>Stay tuned for more partnership announcements &#8212; we hope to get out-of-the-box PagerDuty support into every popular monitoring tool over the next few months.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-3-pagerduty-cloudkick-partnership/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>3 Major New Features &#8211; Part 2: The Nagios -&gt; PagerDuty API</title>
		<link>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-2-the-nagios-pagerduty-api/</link>
		<comments>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-2-the-nagios-pagerduty-api/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 22:44:25 +0000</pubDate>
		<dc:creator>Andrew Miklas</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Features]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=140</guid>
		<description><![CDATA[This is the second article of a three-part series about the latest improvements to PagerDuty. Be sure to check out Part 1 and Part 3. We&#8217;ve just released a Nagios API for PagerDuty.  If you&#8217;re using Nagios to monitor your hosts, &#8230; <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-2-the-nagios-pagerduty-api/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>This is the second article of a three-part series about the latest improvements to PagerDuty. </em><em>Be sure to check out <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-1-integration-api">Part 1</a></em><em> and <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-3-pagerduty-cloudkick-partnership">Part 3</a></em><em>.</em></p>
<p><img class="size-full wp-image-222 alignleft" style="margin: 10px 15px 10px 0;" title="Nagios" src="http://blog.pagerduty.com/wp-uploads/2010/08/nagios1.gif" alt="Nagios" width="100" height="24" />We&#8217;ve just released a <a href="http://www.nagios.org/">Nagios</a> API for PagerDuty.  If you&#8217;re using Nagios to monitor your hosts, you no longer have to use PagerDuty&#8217;s email integration mechanism to get SMSes and phone calls from your Nagios installation.  Instead, you can completely bypass the email step and have Nagios directly communicate problem, acknowledgement, and recovery messages to PagerDuty via an HTTPS API.</p>
<p><img class="aligncenter size-full wp-image-225" title="Add a Nagios service" src="http://blog.pagerduty.com/wp-uploads/2010/08/new_service.gif" alt="Add a Nagios service" width="530" height="475" /></p>
<p>The main benefit of the API over the email integration mechanism is that PagerDuty can now automatically close out incidents when Nagios reports that the problem has been fixed.  No more getting a call 30 minutes after fixing a problem because you forgot to mark the incident as resolved in PagerDuty!  Also, since the API allows us to distinguish between PROBLEM and RECOVERY messages, PagerDuty will no longer spuriously start the alerting process on a RECOVERY message.</p>
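<p>As a rough illustration of why structured messages help, the Python sketch below maps a Nagios notification type to the kind of event PagerDuty would act on. The mapping table and the function name <code>event_type_for</code> are assumptions made for this example; the actual integration is the Perl script described in the integration guide, and it may work differently.</p>

```python
# Assumed mapping from Nagios notification types to PagerDuty event types.
# It is illustrative only and not taken from the real integration script.
NAGIOS_TO_PAGERDUTY = {
    "PROBLEM": "trigger",
    "ACKNOWLEDGEMENT": "acknowledge",
    "RECOVERY": "resolve",
}

def event_type_for(notification_type):
    """Return the PagerDuty event type for a Nagios notification, or None.

    Returning None for unknown types means nothing is sent, so a message
    that is not a PROBLEM can never start the alerting process by accident.
    """
    return NAGIOS_TO_PAGERDUTY.get(notification_type.strip().upper())
```

<p>With email, every message looks like a new problem; with an explicit type, a RECOVERY can close the incident instead of opening one.</p>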
<p>Using the new Nagios API is simple &#8212; you create a Nagios service within PagerDuty, copy a little Perl script to your Nagios server, and then add a &#8220;pseudo-contact&#8221; to your Nagios config corresponding to the new service.  For step-by-step details on how to do this, please take a look at our <a href="http://www.pagerduty.com/docs/guides/nagios-integration-guide">Nagios integration guide</a>.</p>
<p>By switching your Nagios installation to use the API, you&#8217;ll be able to benefit from a number of new PagerDuty features we have planned.  One feature now in the works is the ability to have PagerDuty send out email and SMS alerts when an incident is resolved.  With this feature, you&#8217;ll be able to see at a glance whether an issue has resolved itself before crawling out of bed at 3am.</p>
<p>Another feature we&#8217;re now considering is the ability to assign Nagios alerts to different PagerDuty Escalation Policies based on Nagios variables such as the HOSTGROUP and SERVICEGROUP.  Let us know if this sounds useful to you &#8212; we&#8217;d love to know if this is something that your ops team would use.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-2-the-nagios-pagerduty-api/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>3 Major New Features &#8211; Part 1: Integration API</title>
		<link>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-1-integration-api/</link>
		<comments>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-1-integration-api/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 22:43:03 +0000</pubDate>
		<dc:creator>Alex Solomon</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Features]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=150</guid>
		<description><![CDATA[This is the first article of a three-part series about the latest improvements to PagerDuty. Be sure to check out Part 2 and Part 3. Today, we are proud to announce a major release for PagerDuty. We are launching not &#8230; <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-1-integration-api/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>This is the first article of a three-part series about the latest improvements to PagerDuty. </em><em>Be sure to check out <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-2-the-nagios-pagerduty-api">Part 2</a></em><em> and <a href="http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-3-pagerduty-cloudkick-partnership">Part 3</a></em><em>.</em></p>
<p>Today, we are proud to announce a major release for PagerDuty. We are launching not 1, nor 2 but <strong>3 major new features</strong>:</p>
<ul>
<li>Our new integration API.</li>
<li>Our new Nagios plugin and Nagios integration guide.</li>
<li>Our new Cloudkick plugin and Cloudkick integration guide.</li>
</ul>
<p>This is a lot of news to take in all at once. Thus, we have broken up the announcement into 3 parts. Part 1, which is the article you&#8217;re reading now, will cover the new PagerDuty integration API. <a href="http://blog.pagerduty.com/2010/08/02/3-major-new-features-part-2-the-nagios-pagerduty-api">Part 2</a> will cover the new Nagios plugin and corresponding Nagios integration guide. <a href="http://blog.pagerduty.com/2010/08/02/3-major-new-features-part-3-pagerduty-cloudkick-partnership">Part 3</a> will cover integrating PagerDuty with the Cloudkick cloud server monitoring system.</p>
<h1>The integration API</h1>
<p>The new integration API allows you to add PagerDuty&#8217;s advanced alerting functionality to any system that can make an API call. The API provides a simple and powerful interface to PagerDuty and allows you to add phone, SMS and email alerting to your monitoring tools, ticketing systems, and custom software.</p>
<p>The only requirement to integrate PD with your systems is that your tool must be able to make an HTTP API call, or at least invoke a command-line script which then calls our API (Hint: most monitoring tools can do this).</p>
<h1>The API in a nutshell</h1>
<p>The integration API is very simple. It allows your system to send events to PagerDuty. We support 3 event types:</p>
<ul>
<li>Trigger</li>
<li>Acknowledge</li>
<li>Resolve</li>
</ul>
<p><strong>Trigger events</strong> should be sent out by your systems when problems occur. They result in the creation of an incident in PagerDuty; once the incident is created, we start alerting the on-call engineer.</p>
<p><strong>Acknowledge events</strong> are used to acknowledge incidents (no surprise there). Normally, you&#8217;ll ack an incident when you receive the phone call or SMS alert. We&#8217;ve added support for the acknowledge event, in case you have a monitoring system (or custom software) that sends out acks.</p>
<p>And finally, <strong>resolve events</strong> are used to resolve an incident in PagerDuty. This allows your monitoring systems to automatically resolve an incident in PagerDuty when the underlying problem is fixed.</p>
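<p>To give a feel for the three event types, here is a small Python sketch that builds the body of one event as JSON. The field names (<code>service_key</code>, <code>event_type</code>, <code>description</code>, <code>incident_key</code>) and the helper <code>build_event</code> are assumptions made for illustration, and no request is actually sent; the API documentation is the authority on the real request format.</p>

```python
import json

EVENT_TYPES = ("trigger", "acknowledge", "resolve")

def build_event(service_key, event_type, description, incident_key=None):
    """Build a JSON body for one event.

    Every field name here is an illustrative assumption; check the API
    documentation for the format the service actually expects.
    """
    if event_type not in EVENT_TYPES:
        raise ValueError("unknown event type: " + event_type)
    event = {
        "service_key": service_key,
        "event_type": event_type,
        "description": description,
    }
    if incident_key is not None:
        # A shared key lets later acknowledge and resolve events refer to
        # the incident that an earlier trigger opened.
        event["incident_key"] = incident_key
    return json.dumps(event, sort_keys=True)

body = build_event("YOUR-SERVICE-KEY", "trigger", "Disk full on db1", "db1/disk")
```

<p>A monitoring tool would typically send a trigger like this when a check fails, then a resolve with the same incident key once the check passes again.</p>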
<p>To learn more about the integration API, please take a look at our API documentation here: <a href="http://www.pagerduty.com/docs/api/api-documentation" target="_blank">http://www.pagerduty.com/docs/api/api-documentation</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2010/08/03/3-major-new-features-part-1-integration-api/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Assorted New Features</title>
		<link>http://blog.pagerduty.com/2010/06/14/assorted-new-features/</link>
		<comments>http://blog.pagerduty.com/2010/06/14/assorted-new-features/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 19:42:38 +0000</pubDate>
		<dc:creator>Andrew Miklas</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=117</guid>
		<description><![CDATA[Browsing through our UserVoice feature requests is a pretty humbling experience for all of us working on PagerDuty.  It seems that as far as we&#8217;ve come with PagerDuty in our first year, we have at least another ten years of &#8230; <a href="http://blog.pagerduty.com/2010/06/14/assorted-new-features/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Browsing through our UserVoice feature requests is a pretty humbling experience for all of us working on PagerDuty.  It seems that as far as we&#8217;ve come with PagerDuty in our first year, we have at least another ten years of work ahead!</p>
<p>We&#8217;re just putting the finishing touches on the Nagios integration API now.  We&#8217;re going to have the first version of this out in the next two weeks or so.  But in the meantime, we thought we should show you all some of the other smaller features we&#8217;ve recently launched.</p>
<h1>Looping on Escalation Chains</h1>
<p>Up until now, incidents ran exactly once through their escalation policies.  Thus, unanswered incidents remained assigned to the last person on the escalation chain. Needless to say, this caused some problems if an alert made it through the escalation process without anyone taking action.</p>
<p>To ensure that open incidents are <em>always </em>dealt with, the final rule of an escalation policy can now direct PagerDuty to reassign the incident to the first person in the chain, and begin the escalation process anew.</p>
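<p>The difference between the old and new behavior can be sketched in a few lines of Python. The function name <code>assignee_at</code> and the idea of counting escalation steps from zero are assumptions made for this example, not a description of how PagerDuty is implemented.</p>

```python
def assignee_at(chain, step, loop=True):
    """Return who holds an unanswered incident at escalation step `step`.

    `chain` is the ordered list of people in the escalation policy and
    `step` counts escalations starting from 0.
    """
    if not chain:
        raise ValueError("escalation chain is empty")
    if loop:
        # New behavior: after the last person, start over at the first.
        return chain[step % len(chain)]
    # Old behavior: the incident stays with the last person in the chain.
    return chain[min(step, len(chain) - 1)]
```

<p>With a three-person chain, step 3 lands back on the first person when looping is on, and stays with the third person when it is off.</p>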
<p>We&#8217;re especially curious if anyone needs additional flexibility in the escalation policies.  Would the ability to loop back to a rule other than the first be useful to anyone?</p>
<p><img class="aligncenter size-full wp-image-127" title="PagerDuty Escalation Policies" src="http://blog.pagerduty.com/wp-uploads/2010/06/pagerduty_escalation.png" alt="PagerDuty Escalation Policies" width="530" height="301" /></p>
<h1>Better Regex Support</h1>
<p>We&#8217;ve made it possible to specify both &#8220;AND&#8221; and &#8220;OR&#8221; trigger message regex filters.  We&#8217;ve also added the option to filter incoming messages based on the &#8220;from&#8221; address.  If you&#8217;ve ever accidentally hit &#8220;reply-all&#8221; on a trigger message you&#8217;ve been CC&#8217;ed on, you&#8217;ll know exactly why we&#8217;ve added the from filter option.</p>
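<p>Here is a minimal Python sketch of how combined filters of this kind can behave. The function names, the message dictionary, and the choice to match against the subject line are all assumptions made for the example; PagerDuty&#8217;s own matching rules may differ.</p>

```python
import re

def message_matches(patterns, mode, text):
    """Apply several regex filters to `text`, combined with "AND" or "OR"."""
    results = (re.search(pattern, text) is not None for pattern in patterns)
    if mode == "AND":
        return all(results)
    if mode == "OR":
        return any(results)
    raise ValueError("mode must be 'AND' or 'OR'")

def accept(message, sender_pattern, patterns, mode):
    """Accept a message only if its sender and its subject both pass."""
    if re.search(sender_pattern, message["from"]) is None:
        # Replies from a person are dropped before the subject is examined.
        return False
    return message_matches(patterns, mode, message["subject"])
```

<p>The sender check is what stops an accidental reply-all from opening a new incident, since the reply does not come from the monitoring system&#8217;s address.</p>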
<p><img class="aligncenter size-full wp-image-129" title="PagerDuty Service Email Filters" src="http://blog.pagerduty.com/wp-uploads/2010/06/pagerduty_regex.png" alt="PagerDuty Service Email Filters" width="530" height="222" /></p>
<h1>SSL &amp; TLS</h1>
<p>Our customers often find themselves needing to log into their PagerDuty accounts while on open WiFi points at airports, coffee shops, and the like.  Up until now, this was sort of a dicey proposition, since only the PagerDuty login and billing pages were SSL protected.  By popular request, we&#8217;ve added the option to enable SSL across your entire PagerDuty account.  To enable this option, get your account owner to visit the &#8220;Account Settings&#8221; page and flip on the SSL option.</p>
<p><img class="aligncenter size-full wp-image-124" title="PagerDuty Account  Settings - SSL" src="http://blog.pagerduty.com/wp-uploads/2010/06/pagerduty_ssl.png" alt="PagerDuty Account Settings - SSL" width="530" height="242" /></p>
<p>By the way, we&#8217;ve also configured our mail servers to accept TLS protected SMTP sessions&#8230; perfect in case you suspect your network operator or upstream provider has some <a title="BOFH" href="http://www.theregister.co.uk/odds/bofh/">BOFH</a> tendencies.  Simply configure your outbound mail servers to use TLS opportunistically, and you should be all set.  If you&#8217;d like to check to see if your mail is being received encrypted at our end, click &#8220;View Email&#8221; on an incident trigger and then use the &#8220;View Raw Message&#8221; link.  If the message is encrypted, the last hop listed in the receive headers will mention a TLS-enabled connection.</p>
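<p>If you would rather check a saved raw message with a script, the Python sketch below looks at the most recently added <code>Received</code> header and searches it for a TLS marker. The function name <code>received_over_tls</code> is made up, and the markers it looks for (&#8220;TLS&#8221; and &#8220;ESMTPS&#8221;) are an assumption about how mail servers commonly annotate encrypted hops, so treat a negative result as a hint rather than proof.</p>

```python
from email import message_from_string

def received_over_tls(raw_message):
    """Guess whether the final hop used TLS from the newest Received header."""
    headers = message_from_string(raw_message).get_all("Received") or []
    if not headers:
        return False
    # Each server prepends its own header, so the first one is the last hop.
    newest = headers[0].upper()
    return "TLS" in newest or "ESMTPS" in newest

# Hypothetical raw message; host names and addresses are placeholders.
sample = (
    "Received: from mail.example.com (mail.example.com [192.0.2.10])\n"
    "\t(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))\n"
    "\tby mx.example.net (Postfix) with ESMTP id ABC123\n"
    "Received: from localhost by mail.example.com with SMTP\n"
    "Subject: PROBLEM: db1 is DOWN\n"
    "\n"
    "body\n"
)
encrypted = received_over_tls(sample)
```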
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2010/06/14/assorted-new-features/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PagerDuty 2.0</title>
		<link>http://blog.pagerduty.com/2010/04/12/pagerduty-20/</link>
		<comments>http://blog.pagerduty.com/2010/04/12/pagerduty-20/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 05:48:56 +0000</pubDate>
		<dc:creator>Alex Solomon</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=89</guid>
		<description><![CDATA[We&#8217;re happy to announce we&#8217;ve released the new version of PagerDuty, which has multi-incident support. To try it out, just log into your PagerDuty account. This new feature corrects an over-simplification in PagerDuty&#8217;s design: up to now, PD required you &#8230; <a href="http://blog.pagerduty.com/2010/04/12/pagerduty-20/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re happy to announce we&#8217;ve released the new version of PagerDuty, which has multi-incident support. To try it out, just log into your PagerDuty account.</p>
<p>This new feature corrects an over-simplification in PagerDuty&#8217;s design: up to now, PD required you to create a new alarm for each type of problem that your monitoring systems are capable of detecting. Unfortunately, this doesn&#8217;t work very well if you&#8217;re using a monitoring tool like Nagios, which can monitor thousands of hosts and services at once. The new release can now handle multiple open incidents from a single monitoring system; we call this &#8220;multi-incident support&#8221;.</p>
<p>Here&#8217;s a quick summary of the changes in the new release:</p>
<ul>
<li>Alarms have been renamed to Services.</li>
<li>Alarm Groups have been renamed to Escalation Policies.</li>
<li>Services can now track multiple open incidents at once.</li>
<li>Incident &#8220;suppression&#8221; has been renamed to &#8220;acknowledgement&#8221;.</li>
<li>The amount of time an incident stays Acknowledged is now configurable on a service-by-service basis.</li>
</ul>
<p>The new version of PD is 100% backwards compatible with the previous version. Yes, we&#8217;ve renamed a bunch of stuff, but we&#8217;ve been very careful to retain the same behavior as the old version for your existing services. Read on for more details.</p>
<h2>The big change: Multi-Incident Support</h2>
<p>PagerDuty is now capable of tracking multiple open concurrent incidents.  Put another way, your monitoring system can tell PagerDuty about 100 simultaneous and independent problems without you needing to create 100 PagerDuty alarms (as was the case in the old version of PD).</p>
<p>PagerDuty now uses “incidents” rather than “alarms” as the main object.  Your support team will be acknowledging, escalating, and resolving incidents, instead of alarms.  Incidents in PagerDuty are similar to tickets in a bug tracking system: they are created when a problem is detected, and are resolved or closed when the problem is fixed.</p>
<p>Since PagerDuty can now handle hundreds of open incidents at once, we’ve tried to carefully design PagerDuty’s interface to make it easy to work with large collections of incidents.  The new Incidents and Dashboard tabs feature tables that let you see all of the open incidents assigned to you at a glance.  You can also easily triage your incidents straight from these pages using the controls located at the top of the table.</p>
<p style="text-align: center;"><a href="http://blog.pagerduty.com/wp-uploads/2010/03/incidents_tab2.png" target="_blank"><img class="aligncenter size-full wp-image-81" title="Incidents Tab" src="http://blog.pagerduty.com/wp-uploads/2010/03/incidents_tab.png" alt="Incidents tab" width="477" height="271" /></a></p>
<h2 style="text-align: left;">Turning on multi-incident support for your PagerDuty services</h2>
<p style="text-align: left;">By default, the PagerDuty services still work the same way they&#8217;ve always worked: they can only have one incident open at once. The reason for this is to maintain backwards compatibility.</p>
<p>You can enable multi-incident support for any existing service. Here&#8217;s how:</p>
<ol>
<li>Click on the &#8220;Services&#8221; tab, and click the &#8220;Edit&#8221; link (under Actions) for the service you wish to modify.</li>
<li>Under the &#8220;Email integration settings&#8221; section, you&#8217;ll see 3 options:
<ul>
<li>Open a new incident for each trigger email</li>
<li>Open a new incident for each new trigger email subject</li>
<li>Open a new incident only if an open incident does not already exist</li>
</ul>
<p style="text-align: left;"><a href="http://blog.pagerduty.com/wp-uploads/2010/03/service_email_incident_creation2.png" target="_blank"><img class="aligncenter size-full wp-image-77" title="service_email_incident_creation2" src="http://blog.pagerduty.com/wp-uploads/2010/03/service_email_incident_creation.png" alt="Email integration settings" width="477" height="258" /></a><br />
The first option, if selected, will cause the service to open a new incident for each trigger email sent to the service&#8217;s email address.</p>
<p>The second option, if selected, will cause the service to open a new incident based on the email subject: if an open incident with the same subject already exists, the email is appended to this incident; if not, a new incident is created.</p>
<p>The third option, which should be selected by default for an existing service, allows a service to maintain the behavior of the old version of PagerDuty. It basically turns multi-incident support off: if selected, the service can only have one open incident at any one time. When the service receives a trigger email, it opens a new incident if the service doesn&#8217;t already have an open incident; otherwise, it appends the email to the open incident.</li>
<li>To turn multi-incident support on, select either the first or second option.</li>
<li>Click &#8220;Save changes&#8221; at the bottom of the page, and you&#8217;re done.</li>
</ol>
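<p>The three email integration options described above can be summarized as a small decision function. This Python sketch is only an illustration: the mode names (<code>per_email</code>, <code>per_subject</code>, <code>single</code>) and the function <code>route_trigger_email</code> are made up, and open incidents are represented simply by their subjects.</p>

```python
def route_trigger_email(open_incidents, subject, mode):
    """Decide what one trigger email does under each of the three settings.

    `open_incidents` lists the subjects of the incidents currently open.
    Returns ("new", subject) or ("append", subject_of_existing_incident).
    """
    if mode == "per_email":
        # Option 1: every trigger email opens its own incident.
        return ("new", subject)
    if mode == "per_subject":
        # Option 2: emails with a subject already open are appended.
        if subject in open_incidents:
            return ("append", subject)
        return ("new", subject)
    if mode == "single":
        # Option 3: at most one open incident per service.
        if open_incidents:
            return ("append", open_incidents[0])
        return ("new", subject)
    raise ValueError("unknown mode: " + mode)
```

<p>For a tool like Nagios that puts the host and service in the subject line, the second option groups repeat notifications about the same problem while keeping unrelated problems separate.</p>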
<h2>Alarms are now Services</h2>
<p>We’ve renamed “alarms” to “services”.  Services are now used only to represent an integration point between PagerDuty and your monitoring services. Currently, the PagerDuty services integrate with your monitoring systems via email integration (just like in the old version of PD). In the coming weeks, we will also add support for an HTTP-based API for the PagerDuty services. This will allow your monitoring systems to trigger/acknowledge/resolve incidents in PagerDuty via a synchronous API call.</p>
<p>For similar reasons, we’ve renamed “alarm groups” to “escalation policies”.  We think the new name better captures the use of these objects.</p>
<h2>Incident &#8220;suppression&#8221; is now incident &#8220;acknowledgement&#8221;</h2>
<p>We’ve also renamed incident “suppression” to “acknowledge”.  As before, this feature is used to temporarily prevent an incident from generating alerts.  We thought the word “acknowledge” better captured the purpose of the feature: “stop bothering me about this problem for now… I’m working on it!”.</p>
<p>We&#8217;ve also made the acknowledgement timeout configurable on a service-by-service basis. This means that you can set the amount of time that an incident stays in the Acknowledged state before it reverts back to Triggered and alerts you again. The timeout is set to 30 minutes by default for each service, but you can change it or even turn it off easily:</p>
<ol>
<li>Click on the &#8220;Services&#8221; tab, and click the &#8220;Edit&#8221; link (under Actions) for the service you wish to modify.</li>
<li>Under the &#8220;Incident settings&#8221; section, you&#8217;ll see an entry for the &#8220;Incident ack timeout&#8221;.
<p style="text-align: center;"><a href="http://blog.pagerduty.com/wp-uploads/2010/04/incident_ack_timeout.png"><img class="size-full wp-image-106 aligncenter" style="border: 1px solid #666666;" title="incident_ack_timeout" src="http://blog.pagerduty.com/wp-uploads/2010/04/incident_ack_timeout.png" alt="Incident ack timeout" width="477" height="155" /></a></p>
</li>
<li>By default, the timeout is set to &#8220;30 minutes&#8221;. To modify the timeout, click and change the value of this drop-down. You can also disable the timeout altogether, by unchecking the checkbox labeled &#8220;Enable a timeout for incidents left in the Acknowledged state for too long&#8221;. We recommend leaving the timeout enabled, to ensure you don&#8217;t forget incidents in the Acknowledged state.</li>
<li>Click &#8220;Save changes&#8221; at the bottom of the page, and you&#8217;re done.</li>
</ol>
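<p>The effect of the acknowledgement timeout can be sketched as follows. This Python snippet is an illustration under simple assumptions: times are plain minute counts, the function name <code>incident_state</code> is made up, and a timeout of <code>None</code> stands in for the disabled checkbox.</p>

```python
def incident_state(acknowledged_at, now, timeout_minutes=30):
    """Return the state of an acknowledged incident at time `now`.

    Times are minutes since an arbitrary starting point. Passing
    timeout_minutes=None models a service with the timeout disabled.
    """
    if timeout_minutes is not None and now - acknowledged_at >= timeout_minutes:
        # The ack has expired: the incident alerts again.
        return "triggered"
    return "acknowledged"
```

<p>With the default of 30 minutes, an incident acknowledged at minute 0 is still quiet at minute 29 and starts alerting again at minute 30.</p>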
<h2>What’s next?</h2>
<p>Next up is support for a PagerDuty API. This will make it easier to integrate PagerDuty with popular monitoring tools like Nagios, Zenoss, monit, Munin and many others. The API will allow your monitoring system to trigger, acknowledge and resolve incidents directly in PagerDuty, via a synchronous call to the API.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2010/04/12/pagerduty-20/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Preview release of the new &#8220;multi-incident&#8221; version of PagerDuty</title>
		<link>http://blog.pagerduty.com/2010/03/18/preview-release-multi-incident/</link>
		<comments>http://blog.pagerduty.com/2010/03/18/preview-release-multi-incident/#comments</comments>
		<pubDate>Thu, 18 Mar 2010 20:11:00 +0000</pubDate>
		<dc:creator>Alex Solomon</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Features]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=61</guid>
		<description><![CDATA[We&#8217;ve been carefully reviewing your feature requests to try to understand how best to improve PagerDuty.  One feature request came up far more often than the rest: make it easier to integrate PagerDuty with monitoring tools.  We&#8217;ve taken this request &#8230; <a href="http://blog.pagerduty.com/2010/03/18/preview-release-multi-incident/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve been carefully reviewing your feature requests to try to understand how best to improve PagerDuty.  One feature request came up far more often than the rest: make it easier to integrate PagerDuty with monitoring tools.  We&#8217;ve taken this request to heart and have begun reworking PagerDuty so that we will soon be able to support API integration with monitoring systems like Nagios.</p>
<p>Before we can release an API for PagerDuty, though, we need to correct some over-simplifications in PagerDuty&#8217;s design.  Up until now, PD required you to create a new alarm for each kind of problem that your monitoring systems are capable of detecting.  Unfortunately, this doesn&#8217;t work so well if you&#8217;re using a monitoring tool like Nagios that can track thousands of conditions at once.</p>
<p>So, for the past few weeks, we&#8217;ve been busy re-designing PD so that it can handle multiple open incidents from a single monitoring service.  We&#8217;re just about ready to roll out this new-and-improved version of PagerDuty, but before we do, we&#8217;d like to give you the chance to familiarize yourself with the system, and let us know if there&#8217;s any way we can make the new system even better prior to launch.</p>
<h2>How do I try it out?</h2>
<p>Glad you asked!  For at least the next week, we&#8217;re going to run a preview of the new PagerDuty system.  To log in, visit:</p>
<p style="padding-left: 30px;"><strong>http://&lt;your-subdomain&gt;.pd-staging.com</strong></p>
<p>and use your normal PagerDuty email and password.</p>
<p>All of your data has been migrated from your PagerDuty account, so you can see exactly how the system will look once we update the software on our production servers.  The preview release is fully functional, so please feel free to kick the tires and have it dispatch a few alerts for you.  Don&#8217;t worry &#8212; nothing you do in your preview account will have any impact on your production environment.  Of course, all SMS and phone calls made from the preview environment will be free of charge.</p>
<p>In order to maintain backward compatibility, we&#8217;ve configured all existing alarms to only support one active incident at once.  To remove this restriction, simply:</p>
<ol>
<li>Click the &#8220;Services&#8221; tab</li>
<li>Select one of your existing alarms</li>
<li>Click &#8220;Edit this service&#8221; on the right side of the screen</li>
<li>Switch the incident creation mode to &#8220;Open a new incident for each trigger email&#8221;</li>
<li>Click &#8220;Save Changes&#8221;</li>
</ol>
<p style="text-align: center;">
<h4><a href="http://blog.pagerduty.com/wp-uploads/2010/03/service_email_incident_creation_big.png" target="_blank"><img class="aligncenter size-full wp-image-77" title="service_email_incident_creation2" src="http://blog.pagerduty.com/wp-uploads/2010/03/service_email_incident_creation2.png" alt="service_email_incident_creation2" width="477" height="258" /></a></h4>
<h2>The big change: Multi-Incident Support</h2>
<p>PagerDuty is now capable of tracking multiple open concurrent incidents.  Put another way, your monitoring system can tell PagerDuty about 100 simultaneous and independent problems without you needing to create 100 PagerDuty alarms, as is the case now.</p>
<p style="text-align: left;">PagerDuty now uses &#8220;incidents&#8221; rather than &#8220;alarms&#8221; as the main object.  Your support team will be acknowledging, escalating, and resolving incidents, instead of the alarms that they work with now.  Incidents in PagerDuty are similar to tickets in a bug tracking system: they are created when a problem is detected, and are resolved or closed when the problem is fixed.</p>
<p>Since PagerDuty can now handle hundreds of open incidents at once, we&#8217;ve tried to carefully design PagerDuty&#8217;s interface to make it easy to work with large collections of incidents.  The new Incidents and Dashboard tabs feature tables that let you see all of the open incidents assigned to you at a glance.  You can also easily triage your incidents straight from these pages using the controls located at the top of the table.</p>
<p style="text-align: center;"><a href="http://blog.pagerduty.com/wp-uploads/2010/03/incidents_tab_big.png" target="_blank"><img class="aligncenter size-full wp-image-81" title="incidents_tab2" src="http://blog.pagerduty.com/wp-uploads/2010/03/incidents_tab2.png" alt="incidents_tab2" width="477" height="271" /></a></p>
<p style="text-align: center;">
<p style="text-align: left;">One of the biggest advantages to PagerDuty&#8217;s existing single-incident design is that it can&#8217;t generate alert storms.  Even if Nagios sends hundreds of emails to PagerDuty at once, you&#8217;ll only receive one set of phone calls and SMS messages.  We&#8217;ve been careful to preserve this feature in the new version of the product.  PagerDuty will intelligently bundle multiple incidents into a single set of notifications so that you aren&#8217;t overwhelmed with alerts.</p>
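<p>As a rough illustration of what bundling means, the Python sketch below collapses a burst of new incidents into one message per person. It only shows the idea and is not how PagerDuty actually does it: the function name, the incident IDs, and the message wording are all made up for this example.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bundle_notifications(new_incidents: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse (incident_id, assignee) pairs into one message per assignee."""
    by_assignee: Dict[str, List[str]] = defaultdict(list)
    for incident_id, assignee in new_incidents:
        by_assignee[assignee].append(incident_id)

    messages = {}
    for assignee, ids in by_assignee.items():
        if len(ids) == 1:
            messages[assignee] = f"1 new incident: {ids[0]}"
        else:
            messages[assignee] = f"{len(ids)} new incidents: {', '.join(ids)}"
    return messages
```

<p>However many incidents arrive in one batch, each person gets a single message that lists them, rather than one call or SMS per incident.</p>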
<h2>Other changes</h2>
<p>We&#8217;ve made a few other small changes to support the new multi-incident functionality.</p>
<p>First, we&#8217;ve renamed &#8220;alarms&#8221; to &#8220;services&#8221;.  Alarms/services are now used only to represent an integration point between PagerDuty and your monitoring services.  Currently, PagerDuty only has one type of service: the simple email-triggered mechanism you used in the previous version of PagerDuty.  In the coming weeks, we will be adding support for API-driven services so that we can offer even closer integration with products like Nagios.</p>
<p>For similar reasons, we&#8217;ve renamed &#8220;alarm groups&#8221; to &#8220;escalation policies&#8221;.  We think the new name better captures the use of these objects.</p>
<p>Finally, we&#8217;ve renamed incident &#8220;suppression&#8221; to &#8220;acknowledge&#8221;.  As before, this feature is used to temporarily prevent an incident from generating alerts.  We thought the word &#8220;acknowledge&#8221; better captured the purpose of the feature: &#8220;stop bothering me about this problem for now&#8230; I&#8217;m working on it!&#8221;.</p>
<h2>What&#8217;s next</h2>
<p>Next up is support for a PagerDuty API.  Once we&#8217;ve deployed PagerDuty multi-incident to production and ensured that everyone is comfortable with the new system, we&#8217;ll announce our plans for the API.  Stay tuned for more info!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2010/03/18/preview-release-multi-incident/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>New Feature: Alarm Auto-resolution</title>
		<link>http://blog.pagerduty.com/2010/03/07/new-feature-alarm-auto-resolution/</link>
		<comments>http://blog.pagerduty.com/2010/03/07/new-feature-alarm-auto-resolution/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 05:16:49 +0000</pubDate>
		<dc:creator>Alex Solomon</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=55</guid>
		<description><![CDATA[We&#8217;d like to announce a new PagerDuty feature: auto-resolution of alarms. Auto-resolution is a setting on the PagerDuty alarms; if enabled, an alarm will automatically resolve itself after a specified amount of time. Alarm auto-resolution is an important safety mechanism &#8230; <a href="http://blog.pagerduty.com/2010/03/07/new-feature-alarm-auto-resolution/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;d like to announce a new PagerDuty feature: auto-resolution of alarms. Auto-resolution is a setting on the PagerDuty alarms; if enabled, an alarm will automatically resolve itself after a specified amount of time.</p>
<p>Alarm auto-resolution is an important safety mechanism in case you forget an alarm in the Triggered state. This all makes perfect sense if you understand how the PagerDuty alarms work.</p>
<p>Alarms in PagerDuty are stateful. Each alarm starts out in the Idle state. Upon receiving a trigger email, the alarm transitions to the Triggered state and begins to alert your team based on the rules specified by the alarm&#8217;s alarm group. However, if an already Triggered alarm receives additional trigger emails, it logs them but <em>does not restart the alerting process</em>. This can be dangerous, as I&#8217;ll explain below.</p>
<p>In the normal case, an alarm is triggered and notifies the person on-call. That person receives the phone/SMS/email alert, fixes the problem and resolves the alarm. In some cases, the person on-call does not receive the alert (this can happen if your cell phone&#8217;s battery dies, or it has no reception, or you forget your phone in another room and go to sleep). In these cases, the alarm is automatically escalated to a secondary person, who then picks up the alert and resolves the alarm. It&#8217;s also possible (and this has happened a few times to some of our customers) that an alarm triggers and contacts all of the people in the escalation chain, but nobody picks it up.</p>
<p>When an alarm runs out of people to notify, it stays in the Triggered state until someone resolves it. This is a dangerous state for an alarm to be in, because, as I mentioned above, any trigger emails to the alarm will not restart the alerting process. The alarm must be explicitly resolved to re-enable alerting.</p>
<p><img class="aligncenter size-full wp-image-58" title="autores_600" src="http://blog.pagerduty.com/wp-uploads/2010/03/autores_600.png" alt="autores_600" width="600" height="140" /><br />
This is where auto-resolution comes in. We strongly recommend you turn it on for all of your alarms. Here&#8217;s how to enable auto-resolution for an alarm:</p>
<ol>
<li>Click on the Alarms tab, and click one of your alarms.</li>
<li>Near the top of the page, you&#8217;ll see &#8220;Auto resolve&#8221;. Click &#8220;change&#8221;.</li>
<li>Set the amount of time after which the alarm is auto-resolved. This should be set according to the amount of time an alarm would take to run out of people to notify (as specified by the rules set in your alarm group).</li>
</ol>
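<p>The alarm behaviour described in this post can be sketched as a small state machine. The Python below is a simplified model for illustration, not our production code: the <code>Alarm</code> class and its method names are hypothetical, and it assumes that a resolved alarm goes back to the Idle state.</p>

```python
from datetime import datetime, timedelta
from typing import List, Optional


class Alarm:
    """Hypothetical model of the stateful alarm described in this post."""

    def __init__(self, auto_resolve_after: Optional[timedelta] = None):
        self.state = "idle"
        self.triggered_at: Optional[datetime] = None
        self.auto_resolve_after = auto_resolve_after  # None means auto-resolution is off
        self.log: List[datetime] = []

    def tick(self, now: datetime) -> None:
        """Auto-resolve a Triggered alarm once the configured time has passed."""
        if (self.state == "triggered" and self.auto_resolve_after is not None
                and now - self.triggered_at >= self.auto_resolve_after):
            self.resolve()

    def receive_trigger_email(self, now: datetime) -> bool:
        """Log the email; return True only if it starts the alerting process."""
        self.tick(now)
        self.log.append(now)
        if self.state == "triggered":
            return False  # already Triggered: logged, but nobody is alerted again
        self.state = "triggered"
        self.triggered_at = now
        return True

    def resolve(self) -> None:
        """Assumption: resolving returns the alarm to Idle so it can alert again."""
        self.state = "idle"
        self.triggered_at = None
```

<p>In this model, an alarm without auto-resolution that is left Triggered at 1:00 silently swallows a second trigger email at 4:00. The same alarm with a one-hour auto-resolve has returned to Idle by then, so the 4:00 email starts a fresh round of alerts.</p>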
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2010/03/07/new-feature-alarm-auto-resolution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Feature: Reports</title>
		<link>http://blog.pagerduty.com/2009/11/23/new-feature-reports/</link>
		<comments>http://blog.pagerduty.com/2009/11/23/new-feature-reports/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 11:05:43 +0000</pubDate>
		<dc:creator>Andrew Miklas</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=39</guid>
		<description><![CDATA[For about the last month, we&#8217;ve been busy at work on our most requested feature: billing. Hmm&#8230; ok, perhaps billing isn&#8217;t quite the #1 requested feature request, but surprisingly, we did actually have a few customers who were asking for &#8230; <a href="http://blog.pagerduty.com/2009/11/23/new-feature-reports/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For about the last month, we&#8217;ve been busy at work on our most requested feature: billing. Hmm&#8230; ok, perhaps billing isn&#8217;t quite the #1 requested feature, but surprisingly, we did actually have a few customers who were asking for it.</p>
<p>Yesterday, we rolled out PagerDuty&#8217;s reporting component, which is probably the most user-visible component of the billing project. Reports will give you a &#8220;phone bill&#8221; style view of all the alerts <a href="http://www.pagerduty.com">PagerDuty</a> sent in a month, along with who received the alert and which alarm triggered each alert. These reports should help you determine which of our <a href="http://www.pagerduty.com/plans">pricing plans</a> is best for your organization. For some of our customers, reports will also be useful when billing internal departments for out-of-hour service requests dispatched by PagerDuty.</p>
<div id="attachment_47" class="wp-caption alignnone" style="width: 610px"><img class="size-full wp-image-47" title="monthly_alert_report1" src="http://blog.pagerduty.com/wp-uploads/2009/11/monthly_alert_report1.png" alt="Shows who PagerDuty contacted, which method we used, and which alarm triggered the alert." width="600" height="388" /><p class="wp-caption-text">Reports show who PagerDuty contacted, which method we used, when we sent out the alert, and which alarm triggered the alert.</p></div>
<p>If you have other types of reports you&#8217;d like to be able to generate from your PagerDuty account data, please let us know. One feature we&#8217;re thinking of adding to the reporting module is the ability to download the reports in CSV format. We haven&#8217;t started work on this yet, but if it&#8217;s a heavily requested feature, we could look at sliding it up a bit in the work queue.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2009/11/23/new-feature-reports/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New PagerDuty Feature: Alarm Groups</title>
		<link>http://blog.pagerduty.com/2009/09/09/new-pagerduty-feature-alarm-groups/</link>
		<comments>http://blog.pagerduty.com/2009/09/09/new-pagerduty-feature-alarm-groups/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 06:45:57 +0000</pubDate>
		<dc:creator>Andrew Miklas</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Features]]></category>

		<guid isPermaLink="false">http://blog.pagerduty.com/?p=16</guid>
		<description><![CDATA[We are proud to announce the release of a brand new feature to PagerDuty: alarm groups. Sounds simple, but it&#8217;s actually quite a sizable update to our system. To sum it up, alarm groups allow you to route problems differently &#8230; <a href="http://blog.pagerduty.com/2009/09/09/new-pagerduty-feature-alarm-groups/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We are proud to announce the release of a brand new feature to PagerDuty: <a href="http://www.pagerduty.com/tour/alarm-groups" target="_blank">alarm groups</a>. Sounds simple, but it&#8217;s actually quite a sizable update to our system. To sum it up, alarm groups allow you to route problems differently depending on their source.</p>
<p>First of all, alarm groups allow you to organize your alarms into groups. For example, you might want to create a group called &#8220;DB Alarms&#8221; for your database alarms, and another group called &#8220;Website Alarms&#8221; for alarms related to your site.</p>
<p>Secondly, and more importantly, alarm groups allow you to specify what happens when an alarm in the group is triggered. Each alarm group has a set of rules called Alerting and Escalation rules. These rules specify who to contact when an alarm is triggered, and when to escalate if the person does not acknowledge the alert.</p>
<p><img class="alignnone size-full wp-image-27" title="alarm_groups" src="http://blog.pagerduty.com/wp-uploads/2009/09/alarm_groups.png" alt="alarm_groups" width="600" height="232" /></p>
<p>As you can see in the example above (of an alarm group for database-specific alarms), the first rule says to contact the on-call person on the &#8220;DB Admins Primary&#8221; schedule. The escalation timeout for the first rule is 5 minutes; this means that if the Primary DB on-call doesn&#8217;t acknowledge the alert within 5 min, it will be escalated. If this happens, the second alerting &amp; escalation rule is invoked, and so on.</p>
<p>You can set the alerting rules to contact a specific individual (like rule 4 above) or the person that is on-call on a specific <a href="http://www.pagerduty.com/tour/on-call-scheduling" target="_blank">on-call schedule</a> (like rules 1 to 3 above).</p>
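<p>One way to picture how the rules are evaluated is the short Python sketch below. It is only an illustration under stated assumptions: the <code>active_rule</code> function is hypothetical, and apart from the first rule (the &#8220;DB Admins Primary&#8221; schedule with a 5 minute timeout), the rule targets and timeouts are invented for the example.</p>

```python
from typing import List, Optional, Tuple

# (who to contact, minutes to wait for an acknowledgement before escalating)
Rule = Tuple[str, int]


def active_rule(rules: List[Rule], minutes_unacknowledged: int) -> Optional[str]:
    """Return the target of the rule in effect, or None once every rule has timed out."""
    elapsed = 0
    for target, timeout in rules:
        elapsed += timeout
        if elapsed > minutes_unacknowledged:
            return target
    return None


# Only the first entry comes from the example above; the others are made up.
db_alarm_group = [
    ("schedule: DB Admins Primary", 5),
    ("schedule: DB Admins Secondary", 10),
    ("user: Example Manager", 15),
]
```

<p>With these sample rules, an alert that has gone unacknowledged for 4 minutes is still with the primary on-call, at 5 minutes it moves to the secondary, and after 30 minutes the alarm group has run out of people to notify.</p>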
<p>With this new release, we have also lifted the restriction of only 3 on-call schedules (aka rotations). You now have the ability to create as many schedules as you need. You can browse all of your on-call schedules by clicking the On-call Schedules tab.</p>
<p><img class="alignnone size-full wp-image-24" title="schedule_index" src="http://blog.pagerduty.com/wp-uploads/2009/09/schedule_index.png" alt="schedule_index" width="600" height="454" /></p>
<p>The new alarm groups feature is available to all existing accounts under the Alarm Groups tab. We&#8217;ve created a &#8220;Default&#8221; alarm group for you already, and have put all of your existing alarms in this group. Your alerting and escalation settings now reside under this &#8220;Default&#8221; alarm group. To access them, click on the Alarm Groups tab. In the alarm groups table, click on Default (under Alarm group name).</p>
<p>Rest assured, all of your settings, on-call schedules, alerting and escalation rules, users, and user contact rules are unchanged. If you have any questions or feedback, please <a href="mailto:support@pagerduty.com">get in touch</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pagerduty.com/2009/09/09/new-pagerduty-feature-alarm-groups/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
