We have successfully mitigated last week’s service disruption, and end-to-end delivery times are back within normal ranges.
The following are details on the root cause:
-Last week we implemented a series of planned networking changes in our data centers ahead of a scheduled network upgrade.
-As part of the planned changes, we shifted our traffic away from the data center where the changes were to take place.
-We have shifted traffic in this manner many times this year without issue.
-This time, the added load on one of our data centers was high enough to expose a previously unknown scaling bottleneck in the mail sending system.
-This scaling bottleneck has been isolated to a single component within our software system.
Our engineering team rolled out a series of changes on September 1 to mitigate the root cause of the disruption.
Here are the changes we implemented and their results:
-We rolled out load balancing improvements across our data centers.
-These changes led to a substantial improvement in the processing of our traffic.
-All new traffic is now being processed at expected levels of performance.
-Performance has remained steady as email volumes have increased throughout the day.
-Emails that were previously delayed have now been sent.
We deeply apologize for the inconvenience this disruption caused. Please contact our support team with any questions: https://support.sendgrid.com/hc/en-us