System Outage on Friday Feb 12

Incident Report for Documill View API

Resolved

On Friday Feb 12, about 14:18 UTC, processing nodes on Amazon EC2 failed and brought our backend processing cluster down with it. The outage was noticed immediately and steps were taken to investigate the issue. While we were still investigating the issue, nodes got back to working condition and backend processing cluster was running again at 14:51 UTC. However, due to human error, the updating of the statuspage's metrics was not restarted until about 17:20 UTC, which explains the longer pause in the metrics graphs. Due to this incident we have taken steps to make our service even more robust and tolerant of these kinds of events.

Posted Feb 16, 2016 - 08:49 EET