Allow for different time-based failure thresholds

Lately we have had a system alert in the middle of the night after 2 failures in a row. During our overnight hours we are a bit more fault tolerant: we scale our services down and know there will be a period while they scale back up if demand exists. Because a scaling event takes a few minutes, 2 consecutive failures are bound to happen during those periods, and they trigger our incident.io alert flow. By the time the escalation is sent and acknowledged, the check has usually recovered because our systems have scaled up.

I know one option is to require 4 failures in a row instead of 2 all day long. But I'd really prefer a rule that lets me say: between 8 am and 8 pm, 2 failures in a row trigger an alert, and from 8 pm to 8 am, it takes 4 failures in a row.
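A minimal sketch of the requested rule, for illustration only. It assumes the threshold is chosen from the local time of day at evaluation; `ThresholdWindow`, `threshold_at`, and `should_alert` are hypothetical names and do not correspond to any existing product setting or API.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ThresholdWindow:
    """Hypothetical config entry: a daily time window and its failure threshold."""
    start: time             # inclusive
    end: time               # exclusive; may be earlier than start to wrap past midnight
    failures_to_alert: int  # consecutive failures required before alerting

    def contains(self, t: time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 20:00 to 08:00.
        return t >= self.start or t < self.end


# The two windows described in the request (assumed to be local time).
WINDOWS = [
    ThresholdWindow(time(8, 0), time(20, 0), failures_to_alert=2),
    ThresholdWindow(time(20, 0), time(8, 0), failures_to_alert=4),
]


def threshold_at(t: time, windows=WINDOWS, default=2) -> int:
    """Return the threshold of the first window containing t, else the default."""
    for window in windows:
        if window.contains(t):
            return window.failures_to_alert
    return default


def should_alert(consecutive_failures: int, t: time) -> bool:
    """True when the failure streak meets the threshold in force at time t."""
    return consecutive_failures >= threshold_at(t)
```

With these windows, 2 consecutive failures at 03:00 would not alert, while the same streak at 12:00 would.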

I'm going to look into adding an alert counter in incident.io to help with this, but I feel this should be an alert configuration option rather than a workaround.


Status

In Review

Board

💡 Feature Request

Date

1 day ago

Author

Rick Clymer
