At 5:51pm EST on December 27th, 2017, a Greenhouse Recruiting silo became unexpectedly unavailable. The cause was identified and the issue was resolved at 5:57pm EST. Only customers who use Silo 1 (a large proportion of our customers) were affected by this outage. Other silos, our Job Boards infrastructure, our APIs, and Greenhouse Onboarding remained unaffected.
WHAT WAS THE EFFECT?
Affected users (see below for details) were unable to access Greenhouse Recruiting.
WHO WAS AFFECTED?
This disruption affected all customers trying to access Silo 1 of Greenhouse Recruiting at https://app.greenhouse.io. Customers on other silos, customers who access Greenhouse Recruiting via SSO, candidates applying via job boards, and our APIs were unaffected.
WHAT WAS THE CAUSE?
The outage was caused by an unexpected interaction between an Amazon Elastic Load Balancer (ELB) component and the technology that deploys and monitors our virtualized infrastructure. An unusual characteristic of an ELB is that it may routinely change its own IP address in a manner beyond our control. In this instance, an IP address change by the ELB caused our monitoring systems to conclude that our web servers were unavailable, which stopped our routing layer from sending traffic to them and caused the downtime. We updated the routing layer's configuration at 5:56pm EST, and service was restored at 5:57pm EST.
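This notice does not name the monitoring or routing software involved, so what follows is only an illustrative sketch of the general failure mode. It assumes a health check resolves the load balancer's hostname to an IP address once and keeps probing that IP. It does not describe Greenhouse's actual systems or the fix they applied. The hostname `lb.example.internal`, the IP addresses, and the `PinnedHealthCheck` / `ResolvingHealthCheck` classes are all hypothetical. DNS and reachability are simulated in memory so the example runs without a network.

```python
from typing import Callable, Dict, Set

# Hypothetical stand-ins: a resolver maps a hostname to its current IPs,
# and a probe reports whether a given IP is currently answering.
Resolver = Callable[[str], Set[str]]
Probe = Callable[[str], bool]


class PinnedHealthCheck:
    """Resolves the load balancer hostname once and probes those IPs forever."""

    def __init__(self, hostname: str, resolve: Resolver, probe: Probe) -> None:
        self.ips = resolve(hostname)  # cached at startup, never refreshed
        self.probe = probe

    def healthy(self) -> bool:
        return any(self.probe(ip) for ip in self.ips)


class ResolvingHealthCheck:
    """Re-resolves the hostname on every check, so an IP change is picked up."""

    def __init__(self, hostname: str, resolve: Resolver, probe: Probe) -> None:
        self.hostname = hostname
        self.resolve = resolve
        self.probe = probe

    def healthy(self) -> bool:
        return any(self.probe(ip) for ip in self.resolve(self.hostname))


# Simulated DNS: the (hypothetical) load balancer hostname and its current IPs.
HOST = "lb.example.internal"
dns: Dict[str, Set[str]] = {HOST: {"10.0.0.1"}}


def resolve(name: str) -> Set[str]:
    return set(dns[name])


def probe(ip: str) -> bool:
    # Only the load balancer's current IPs answer; a retired IP goes dark.
    return ip in dns[HOST]


pinned = PinnedHealthCheck(HOST, resolve, probe)
resolving = ResolvingHealthCheck(HOST, resolve, probe)
print(pinned.healthy(), resolving.healthy())  # True True

# The load balancer changes its own IP address.
dns[HOST] = {"10.0.0.2"}
print(pinned.healthy(), resolving.healthy())  # False True
```

After the simulated IP change, the pinned check reports the backend as down even though the load balancer is serving traffic on its new address. The check that resolves the hostname on every probe keeps passing.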
WHAT ARE WE DOING TO PREVENT THIS FROM OCCURRING AGAIN?
We have identified two solutions that we will pursue to prevent this from occurring again:
We take the reliability of our software very seriously and are committed to making changes that prevent similar issues from occurring again. Please accept our apologies for any inconvenience caused. If you have any questions, please reach out to your Account Manager or email@example.com.