Spruce system availability issues
Incident Report for Spruce Health
Postmortem

Incident time: 7:32am PT - ~2pm PT

Summary: The Amazon Web Services (AWS) us-east-1 region experienced API-related issues that prevented the AWS console from loading and impacted services due to impaired network connectivity.

This resulted in the following customer-facing issues:

  • Inbound calls may have failed to route to appropriate agents
  • Inbound SMS, voicemails, and call events may not have been delivered to the inbox. We will be retroactively delivering these to the appropriate inbox.
  • Notifications were significantly delayed, which resulted in patients missing incoming video calls
  • Due to the delayed notifications, many providers could not connect with patients over video calls
  • Autoresponders failed to trigger during the incident
  • Workflows that customers have in place likely did not trigger accurately during this time
Posted Dec 08, 2021 - 11:01 PST

Resolved
Notifications are no longer delayed, and the system is operational again. We will investigate further internally to gain more insight and reduce the likelihood of delayed notifications in the future.
Posted Dec 07, 2021 - 14:28 PST
Update
Notifications continue to be delivered late, and sometimes in quick succession when there is a series of notifications for a particular user. When the user clicks on a notification, they may not find a new message, because the notification may be for a message that they have already read. Patients may also receive delayed notifications of an incoming call from a provider.

The delay appears to be caused by AWS being slow to hand notifications off to the Apple and Google servers, which in turn deliver them to the end user's iPhone or Android device. While this last-mile delivery is partly out of our control, we're doing all we can to monitor the situation, provide updates, and identify ways to improve the system for the long term.
Posted Dec 07, 2021 - 13:21 PST
Update
Still monitoring. The only known system-wide impact at this time is delayed notifications.

Note that the delayed notifications may affect the incoming video call notification for patients, so patients may be unaware of a video call from their provider. If you plan to hold a video call with a patient, we suggest that the patient keep the Spruce app open while waiting for your call.
Posted Dec 07, 2021 - 11:07 PST
Monitoring
Quick update on the overall health of the system thus far:
- App notifications are delayed.
- Spruce inbox is functional for most users, though it may be slow to load from time to time.
- Sending and receiving of secure messages should be functional. This includes internal notes, secure messages between patients and providers, and team conversations.
- Inbound calls are functional, though there continue to be intermittent failures.
- Inbound SMS is functional, though there continue to be intermittent failures.
- Outbound calls and SMS should be functional.
- Fax remains functional.
- Video calling remains functional, though there may be intermittent failures since Twilio is reporting failures.

We will continue to monitor the issue and post updates.
Posted Dec 07, 2021 - 09:39 PST
Update
In our testing, inbound calls are routing to the appropriate agents at this time. Intermittent issues are still possible for some users, given that Twilio continues to report issues with call routing.
Posted Dec 07, 2021 - 09:07 PST
Update
Inbound SMS is being accepted by our system now, though there may still be intermittent failures in SMS being accepted.
Posted Dec 07, 2021 - 08:54 PST
Identified
Given the underlying AWS issue, the system is impacted in the following ways:
- Spruce inbox may be slow to load
- Inbound calls are not being directed to the appropriate agents
- SMS is not being accepted by our system. Our underlying telecom provider, Twilio, is also reporting issues, given that it relies on AWS.
- Outbound calls and SMS seem functional.
- Posting messages into conversations within Spruce, for both patients and providers, seems to be functional.
Posted Dec 07, 2021 - 08:43 PST
Update
The issue appears to stem from our underlying cloud service provider, AWS, which is experiencing an outage. While the AWS status page does not report an issue, several people on the internet are reporting similar issues with their AWS environments, as you can see here: https://downdetector.com/status/aws-amazon-web-services/
Posted Dec 07, 2021 - 08:06 PST
Investigating
We are currently investigating error rates for inbound calls and SMS, and the possibility that the Spruce inbox is slow to load. The issue started around 7:32am PT.
Posted Dec 07, 2021 - 08:02 PST
This incident affected: Web App, Mobile Apps, Phone Call Routing, and SMS Routing.