We recently had an incident where our alerts didn't fire because of a bad instance behind a load balancer. The system was not "down enough" to trigger an alert; however, the error rate was dramatically higher than normal.
Alerting when the error rate rises above a threshold should help in many scenarios where simpler triggers do not work well. In a cloud-hosted environment, transient errors are common. Being able to tell transient errors apart from a genuine issue is critically important, both to catch real incidents and to avoid false positives.
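As a rough illustration of the kind of check being requested, here is a minimal sketch of a sliding-window error-rate trigger. Everything in it is an assumption made for illustration: the `ErrorRateAlert` class, the 5% threshold, the 100-request window, and the 20-request minimum are hypothetical and do not describe any existing feature or API.

```python
from collections import deque


class ErrorRateAlert:
    """Hypothetical sliding-window error-rate check (all names are illustrative)."""

    def __init__(self, threshold=0.05, window=100, min_samples=20):
        self.threshold = threshold            # fire when the error fraction exceeds this
        self.min_samples = min_samples        # too few requests is not meaningful
        self.results = deque(maxlen=window)   # True means the request failed

    def record(self, failed):
        """Record one request outcome; return True if the alert should fire."""
        self.results.append(bool(failed))
        if len(self.results) < self.min_samples:
            return False
        error_rate = sum(self.results) / len(self.results)
        return error_rate > self.threshold


# One bad instance out of four behind a load balancer: roughly 25% of requests fail.
alert = ErrorRateAlert()
fired = any([alert.record(i % 4 == 0) for i in range(100)])
```

With these assumed settings, a single transient failure among 100 requests stays under the threshold, while a sustained failure rate like the one from a single bad instance crosses it even though the service as a whole never looks "down".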
In Review
Feature Request
Alerting
Over 2 years ago
shahiddev