[RESOLVED] Unable to send or receive messages
Incident Report for Spruce Health
Postmortem

Incident Summary

Spruce experienced a service incident from 6:04 pm to 7:55 pm PDT on Wednesday, September 11, 2019, during which customers may have experienced the following impacts:

  • Delayed delivery of inbound communication via SMS, fax, phone, and voicemail throughout this time.
  • Disrupted access to the Spruce inbox, preventing outbound communication from 6:46 pm to 7:55 pm PDT.

There was no interruption to inbound calls, which remained fully functional throughout this time.

After assessing the severity of the incident, we triggered our backup communications system at 7:20 pm PDT, which notified customers who had signed up for this service of inbound communication during the incident.

Analysis and Response

As part of a regular deployment to our Spruce platform, we executed a database modification that ran much longer than anticipated. During this time, writes to the database stalled and then failed, causing delayed delivery of inbound communication. Eventually the buildup of stalls consumed all database resources, causing the Spruce inbox to become inaccessible.

Our engineering team was involved immediately. After assessment, we decided to allow the database modification to complete in order to safeguard data integrity. Having made this decision, the team took steps to accelerate the database modification. We also triggered our backup communications system.
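
For readers curious about the mechanics: one common way to keep a large database modification from blocking writes for a long time is to apply it in small batches, each in its own short transaction. The sketch below is illustrative only and is not a description of Spruce's actual procedure or database. It uses Python's built-in sqlite3 module, and the table name `messages`, the column `archived`, and the function names are all hypothetical.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Set messages.archived to 0 wherever it is NULL, one small batch at a time.

    Hypothetical example: the table and column names are assumptions.
    """
    total = 0
    while True:
        # Each batch runs in its own short transaction, so other writers
        # are only blocked for the duration of a single batch.
        with conn:
            cur = conn.execute(
                "UPDATE messages SET archived = 0 "
                "WHERE id IN ("
                "  SELECT id FROM messages WHERE archived IS NULL LIMIT ?"
                ")",
                (batch_size,),
            )
        if cur.rowcount == 0:
            break  # nothing left to update
        total += cur.rowcount
    return total


def demo():
    # Build a throwaway in-memory table with 2500 rows needing the backfill.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, archived INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages (archived) VALUES (?)", [(None,)] * 2500
    )
    conn.commit()
    updated = backfill_in_batches(conn, batch_size=1000)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE archived IS NULL"
    ).fetchone()[0]
    return updated, remaining
```

Calling `demo()` runs the backfill in three batches against the in-memory table. On a production database the right batch size, and whether to pause between batches, would depend on the engine and the traffic pattern.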

Action Items

As a result of this incident, we have identified the following action items:

  • Adopt a new protocol for executing database operations on large, high-traffic data models in order to prevent service disruptions at any hour of the day.
  • Work with a database consultant to optimize the databases used across our platform.

In addition, we encourage every customer to subscribe to our backup communications system, which successfully delivered events to customers during this incident, even when the inbox was inaccessible. Please review more information here or contact Spruce Support from the app.

Posted Sep 12, 2019 - 15:52 PDT

Resolved
The Spruce platform should be fully recovered and functional. We will dig deeper into what happened and post an incident report here, likely tomorrow (September 12, 2019). We understand how important it is for Spruce to be fully operational and highly available, and we apologize for the inconvenience this caused.
Posted Sep 11, 2019 - 20:31 PDT
Monitoring
The Spruce inbox is functional and accessible again. Inbound and outbound calls, faxes, SMS, and secure messages should also be functional now.

Any inbound voicemails, SMS, or faxes received between 6:04 pm PDT and 7:50 pm PDT will be delayed but will eventually show up in your inbox.

Any message that a provider or patient attempted to send from the app between 6:04 pm PDT and 7:50 pm PDT would not have been delivered during this time period.
Posted Sep 11, 2019 - 19:55 PDT
Update
The Spruce inbox is inaccessible on mobile and desktop. We are actively working to get the system back up and running as quickly as possible. We will update this page as soon as we have made progress.
Posted Sep 11, 2019 - 18:46 PDT
Update
We have identified the issue as a database update that is taking longer than expected to complete, during which all message posts are failing. We are monitoring the status of the database update and expect it to take another 30 minutes to complete.
Posted Sep 11, 2019 - 18:39 PDT
Identified
We are currently investigating an issue where posting a message in Spruce (SMS, secure messages, internal messages, and outbound fax) or receiving a message is failing. Receiving voicemails and faxes will also be impacted, but they will eventually be delivered to your inbox.

Inbound and outbound calls remain functional.
Posted Sep 11, 2019 - 18:04 PDT
This incident affected: Web App, Mobile Apps, and SMS Routing.