Greenhouse Recruiting Unavailable

Incident Report for Greenhouse

Postmortem

WHAT HAPPENED?

At 5:51pm EST on December 27th, 2017, a Greenhouse Recruiting silo became unexpectedly unavailable. The cause was identified and resolved at 5:57pm EST. Only customers who use Silo 1 (a large proportion of our customers) were affected by this outage. Other silos, our Job Boards infrastructure, our APIs and Greenhouse Onboarding were unaffected.

WHAT WAS THE EFFECT?

All affected users (see below for more detail) could not access the site.

WHO WAS AFFECTED?

This disruption affected all customers trying to access Silo 1 of Greenhouse Recruiting at https://app.greenhouse.io. Customers on other silos, those who access Greenhouse Recruiting via SSO, candidates applying via job boards, and our APIs were unaffected.

WHAT WAS THE CAUSE?

The outage was caused by an unexpected interaction between an Amazon Elastic Load Balancer (ELB) component and the technology that deploys and monitors our virtualized infrastructure. An unusual characteristic of an ELB is that it may routinely change its own IP address in a manner that is beyond our control. In this instance, an IP address change by the ELB caused our monitoring systems to believe our web servers were unavailable and prevented our routing layer from sending traffic to them, causing the downtime. We were able to update the configuration of the routing layer at 5:56pm EST, resolving the issue at 5:57pm EST.
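
For readers who want a concrete picture of this failure mode, the sketch below shows one general way a routing component can tolerate a load balancer changing its IP address: it re-resolves the load balancer's hostname after a short interval instead of holding on to the address it saw at startup. This is an illustration only and does not describe our actual routing layer; the class name, the `ttl_seconds` default and the hostname in the usage note are all hypothetical.

```python
import socket
import time
from typing import Callable, List, Optional


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the sorted IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


class RefreshingUpstream:
    """Hypothetical helper: caches a hostname's addresses for at most
    ttl_seconds, then resolves again so an IP change is picked up."""

    def __init__(
        self,
        hostname: str,
        ttl_seconds: float = 30.0,  # assumed value, tune to the DNS record TTL
        resolver: Callable[[str], List[str]] = resolve_ipv4,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.hostname = hostname
        self.ttl_seconds = ttl_seconds
        self._resolver = resolver
        self._clock = clock
        self._addresses: List[str] = []
        self._expires_at: Optional[float] = None

    def addresses(self) -> List[str]:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            try:
                fresh = self._resolver(self.hostname)
            except OSError:
                # Resolution failed: keep serving the last known addresses.
                fresh = []
            if fresh:
                self._addresses = fresh
            self._expires_at = now + self.ttl_seconds
        return list(self._addresses)
```

A caller would construct something like `RefreshingUpstream("my-elb.example.com")` and call `addresses()` before each routing decision. Keeping the last known addresses when a lookup fails is a design choice in this sketch, so that a brief DNS problem does not by itself take the upstream out of rotation.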

WHAT ARE WE DOING TO PREVENT THIS FROM OCCURRING AGAIN?

We have identified two solutions that we will pursue to prevent this from occurring again:

Work is already in progress to move to a new form of load balancer that does not periodically change its own IP address.
We are tracking several fixes to our deployment and monitoring technology that should mitigate this problem, regardless of the load balancer type or configuration.

We take the reliability of our software very seriously, and are committed to making changes to prevent similar issues from occurring again. Please accept our apologies for any inconvenience caused. If you have any questions, please reach out to your Account Manager or support@greenhouse.io.

Posted Dec 28, 2017 - 14:49 EST

Resolved

This incident has been resolved. For approximately 11 minutes, Greenhouse Recruiting was unavailable for a majority of customers. We will provide a postmortem once we have identified the root cause. We apologize for any inconvenience.

Posted Dec 27, 2017 - 18:11 EST

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Dec 27, 2017 - 17:57 EST

Investigating

We are currently investigating site outages for Greenhouse Recruiting. We will follow up when we have more information available.

Posted Dec 27, 2017 - 17:54 EST