WHAT HAPPENED?
Greenhouse Job Boards and the Job Board API were unavailable for approximately 10 minutes, starting at 10:18pm EDT on May 8th, 2017. This was due to a problem with the caching layer in our job boards infrastructure.
WHAT WAS THE EFFECT?
Greenhouse Job Boards and the Job Board API were unavailable for approximately 10 minutes. Other parts of the Greenhouse platform, such as Greenhouse Recruiting, Greenhouse Onboarding, and Greenhouse Analytics, were not affected and remained 100% available. External candidates could not submit applications during this period.
WHO WAS AFFECTED?
This disruption affected all of our customers using hosted or embedded job boards, as well as any career sites built using the Job Board API.
WHAT WAS THE CAUSE?
Due to a memory issue, our caching servers were automatically restarted at 10:18pm EDT on May 8th, 2017. Unfortunately, the web servers that depended on these caching servers were unable to connect to them after the restart. This caused the web servers to enter an 'unhealthy' state, prompting our routing infrastructure to serve error pages with the status of 503 Service Unavailable.
Once the issue was identified, it was resolved by restarting the web servers. By 10:28pm EDT, the job boards and Job Board API were fully functional.
WHAT ARE WE DOING TO PREVENT THIS FROM OCCURRING AGAIN?
We will be upgrading our caching infrastructure to make these memory issues less likely to occur in the future and to ensure that restarts are handled more gracefully. We will also be improving our monitoring of this part of our infrastructure so that we get an earlier indication when something is wrong.
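As an illustration only, and not a description of our actual implementation, handling a cache restart gracefully could look something like the Python sketch below. The names ReconnectingCache, CacheUnavailable, fetch_with_fallback, connect, and load_from_source are hypothetical and chosen for this example. The idea is to discard a stale connection when a call fails, retry with a short backoff, and fall back to the underlying data source if the cache stays unreachable, rather than leaving the web server in an unhealthy state.

```python
import time


class CacheUnavailable(Exception):
    """Raised when the cache cannot be reached after all retries."""


class ReconnectingCache:
    """Hypothetical cache wrapper that reconnects after a cache restart.

    `connect` is an assumed factory returning an object with a `get(key)`
    method that raises ConnectionError when the cache is unreachable.
    """

    def __init__(self, connect, max_attempts=3, base_delay=0.1, sleep=time.sleep):
        self._connect = connect
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep
        self._conn = None

    def _connection(self):
        # Open a connection lazily so a dropped one can be replaced later.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def get(self, key):
        for attempt in range(self._max_attempts):
            try:
                return self._connection().get(key)
            except ConnectionError:
                # Drop the stale connection so the next attempt reconnects.
                self._conn = None
                if attempt < self._max_attempts - 1:
                    # Exponential backoff between attempts.
                    self._sleep(self._base_delay * (2 ** attempt))
        raise CacheUnavailable(key)


def fetch_with_fallback(cache, key, load_from_source):
    """Serve from the cache when possible, otherwise from the data source."""
    try:
        value = cache.get(key)
    except CacheUnavailable:
        return load_from_source(key)
    if value is None:
        # Cache miss: load the value directly.
        return load_from_source(key)
    return value
```

In a sketch like this, a cache restart costs a few slower requests while connections are re-established, instead of requiring the web servers themselves to be restarted.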
We take the availability of our software very seriously and are committed to making changes to prevent this kind of downtime from happening again. Please accept our apologies for any inconvenience caused.