Allow for X number of consecutive successful checks before marking as recovered

In some cases, we have seen a single successful check clear an incident, only for the incident to reopen shortly afterward because the service was still recovering rather than fully recovered. In the heat of the moment, muting or disabling multiple checks to avoid this is not feasible for us.

Rather than treating 1 successful check as a recovery, it would be nice to require 3 (or some configurable number of) successful checks in a row before issuing the all clear. This would avoid the case where a server comes back up, is immediately inundated with requests, goes back down, retriggers the check failure, and alerts our paging system again, so our on-calls keep getting paged. It is the same idea as an AWS scaling event: X triggered events to scale a service out, then Y triggered events to scale it back in.
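
As a rough illustration of the requested behavior, here is a minimal Python sketch of a consecutive-success threshold. Everything in it (`CheckState`, `recovery_threshold`, `failure_threshold`, `replay`) is a hypothetical name made up for this example, not part of any existing product API, and the default of 3 is just the number mentioned above.

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Incident state for a single check (hypothetical model, not a real API)."""
    recovery_threshold: int = 3  # assumed: successes in a row needed to clear
    failure_threshold: int = 1   # assumed: failures in a row needed to open
    incident_open: bool = False
    _successes: int = 0
    _failures: int = 0

    def record(self, success: bool) -> str:
        """Record one result; return 'opened', 'recovered', or 'no_change'."""
        if success:
            self._successes += 1
            self._failures = 0  # any success breaks a failure streak
            if self.incident_open and self._successes >= self.recovery_threshold:
                self.incident_open = False
                return "recovered"
        else:
            self._failures += 1
            self._successes = 0  # any failure resets progress toward recovery
            if not self.incident_open and self._failures >= self.failure_threshold:
                self.incident_open = True
                return "opened"
        return "no_change"


def replay(results, **kwargs):
    """Feed a sequence of check results through a fresh CheckState."""
    state = CheckState(**kwargs)
    return [state.record(r) for r in results]


# A flapping service: down, briefly up, down again, then stable.
flapping = [False, True, False, True, True, True]
with_threshold = replay(flapping, recovery_threshold=3)
without_threshold = replay(flapping, recovery_threshold=1)
```

With a threshold of 3, the flapping sequence opens one incident and clears it only after the third success in a row. With a threshold of 1 (today's behavior as described above), the same sequence opens and clears the incident twice, which is the extra paging this request is trying to avoid.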

Status

In Review

Board

💡 Feature Request

Date

About 1 year ago

Author

Rick Clymer
