Postmortem -
Jan 25, 17:46 PST
Resolved -
The Spruce system experienced an outage from 6:45am PT to 9:45am PT on January 25, 2023. During this time:
- The Spruce inbox failed to load for patients and providers
- Providers could not place voice or video calls to patients, given that the inbox would not load
- Providers could not send SMS or secure messages to patients
- Patients could not send secure messages to practices
- Inbound faxes, SMS messages, and emails arrived in the Spruce inbox with a delay
- Inbound calls were operational during this time; however, voicemails arrived in the Spruce inbox with a delay
- Workflow automations ran, but with a delay
The outage was caused by CPU exhaustion on one of the core databases. The engineering team believes the CPU exhaustion resulted from a frequently run, inefficient query whose load built up over time. That query was optimized as part of the fix for this incident. Over the next several days, the engineering team will closely monitor the system for any sign of database CPU exhaustion, especially during peak hours.
We know how important it is for Spruce to be fully operational at all times. Building and maintaining a medical communications system gives us all immense energy every day, and it is not a job we take lightly. We're very sorry for the outage and its impact on practices and patients. We will continue to work hard in pursuit of a highly available and reliable service. If you have any questions at all, please don't hesitate to reach us at support@sprucehealth.com.
Jan 25, 17:44 PST
Monitoring -
The system should be fully functional at this point. We are continuing to monitor it and will post an incident report once we have had a chance to investigate more deeply.
Jan 25, 10:05 PST
Identified -
We are starting to see some recovery. We are slowly ramping the system back up to full service while watching the impact on the database and on overall CPU usage. We'll keep updating this page as we have more to share.
Jan 25, 09:45 PST
Update -
We have made a database optimization for a high-frequency lookup. We have intentionally brought down the API layer that clients connect to while we work to ensure that the rest of the system is functional. Once all asynchronous work has been completed, we will bring the API layer back up gradually to confirm that the CPU performance issues do not recur.
Jan 25, 09:21 PST
Update -
We are continuing to investigate the issue and unfortunately have not yet found a root cause. We are all hands on deck working to identify the reason for the outage.
Jan 25, 08:52 PST
Update -
The backup system for notifying providers of incoming SMS, fax, and call events and voicemails has now been activated. Anyone who has registered contact information for our backup system will now be notified by email. You can read more about the backup system here: https://help.sprucehealth.com/article/424-spruce-backup-system
Jan 25, 08:20 PST
Update -
We have not identified the root cause yet. The inbox continues to be down for most users. We are actively investigating this issue.
Jan 25, 07:51 PST
Investigating -
The Spruce inbox is currently unable to load. As a result, patients and providers cannot view or send messages, send faxes, or make calls. Inbound calls should be working. Inbound faxes are likely delayed.
Jan 25, 07:17 PST