Creating Alert Checkers
To create an alert checker, go to /add_alert_check/ or click Add Checker on the menu bar of the alert checkers page.
When you create an alert checker, first choose the entity type that you are building the alert for.
Examples of entity types include
You can see a complete list of entity types in the Entity Type Reference, complete with metric and attribute lists and examples.
They are mostly self-explanatory: if you want to alert on metrics that exist on ports, pick 'Port'; if you want something that has to do with a sensor, pick the matching sensor entity type.
Device is a special case. It allows you to alert on device-level metrics, such as whether the device is up or down, its uptime, and the ping/SNMP response times. The entity type Device has nothing to do with ports or sensors on the device itself.
Once you have picked the entity type, a few more fields need to be filled in, but these are simple: pick a name for the alert and a message to be included when an alert is sent out.
Entity type should be set to the entity type that you're creating a checker for.
Alert Name is a unique text ID used to identify your checker in the UI. It must be unique or adding the checker will fail.
Message is a meaningful text message sent along with any alerts generated by this checker. It should direct the recipient to the cause and importance of the problem.
Alert Delay allows you to delay an alert for X checks before it is raised. You can use this to 'smooth' noisy alerts and suppress alerts for traffic and processor spikes.
Send Recovery allows you to enable or disable the sending of recovery notifications.
Severity is currently locked at critical.
Use Alert Delay to set the number of poller runs your alert checker should wait before it generates notifications.
An alert entry that is being delayed will be in the delayed state and show as orange in the UI.
This is useful when you're creating a check for processor usage and don't want to be alerted on every temporary CPU spike, but only on persistently high load conditions. If you set a delay of, say, 2, it will take 3 poller runs before the state is set to failed and notifications are generated.
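The delay behaviour described above can be sketched in a few lines of Python. This is only an illustration of the counting, not the actual poller code: the function name entry_states is hypothetical, and the sketch assumes that a passing check resets the delay counter, which this page does not state.

```python
def entry_states(check_failed, delay):
    """Return the state of an alert entry after each poller run.

    check_failed: list of booleans, True when the conditions tripped on that run.
    delay: number of failing poller runs to wait before alerting.
    """
    states = []
    consecutive = 0  # failing runs in a row so far
    for failed in check_failed:
        if not failed:
            # Assumption: a passing run resets the delay counter.
            consecutive = 0
            states.append("ok")
            continue
        consecutive += 1
        # With delay=2 the first two failing runs are "delayed";
        # the third failing run sets the state to "failed".
        states.append("failed" if consecutive > delay else "delayed")
    return states


if __name__ == "__main__":
    print(entry_states([True, True, True, False, True], delay=2))
```

With a delay of 2, three failing runs in a row give delayed, delayed, failed, which matches the "3 poller runs" described above.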
Next we have the Checker Conditions pane.
This pane allows you to configure the actual rules that will trigger your alert. The conditions are entered as text, with one condition per line. A condition consists of three values:
- the name of the metric to be tested
- a 'test' evaluator (
- a value to test against
Syntax of test evaluators:
||less or equals||
||greater or equals||
||match with wildcard||
You can use
||not match with wildcard||
You can use
||match for regular expression||
||not match for regular expression||
||in a list||
||not in a list||
In this pane you also configure whether your checker requires all conditions to match to trip, or any condition.
An example of a condition to test if traffic on a port exceeds 80% would be:
ifInOctets_perc gt 80
ifOutOctets_perc gt 80
You might want to give this checker a delay of 4, so that the alert only trips when the port exceeds 80% capacity for 20 minutes or more (assuming a 5-minute polling interval).
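As a rough illustration of how such condition lines could be read and combined, here is a minimal Python sketch. It is not the product's implementation: parse_conditions, checker_trips and the require_all flag are hypothetical names, only the gt evaluator used in the example above is modelled, and the metric values are made up.

```python
# Only the evaluator used in the example above is modelled here.
EVALUATORS = {
    "gt": lambda actual, wanted: float(actual) > float(wanted),
}


def parse_conditions(text):
    """Split condition text into (metric, test, value) tuples, one per line."""
    conditions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        metric, test, value = line.split(None, 2)
        if test not in EVALUATORS:
            raise ValueError(f"unsupported evaluator: {test}")
        conditions.append((metric, test, value))
    return conditions


def checker_trips(conditions, metrics, require_all=False):
    """Return True when the conditions trip for one entity's metrics."""
    results = [
        EVALUATORS[test](metrics[metric], value)
        for metric, test, value in conditions
    ]
    return all(results) if require_all else any(results)


if __name__ == "__main__":
    conditions = parse_conditions("ifInOctets_perc gt 80\nifOutOctets_perc gt 80")
    port = {"ifInOctets_perc": 91, "ifOutOctets_perc": 12}  # made-up sample values
    print(checker_trips(conditions, port))                    # any condition
    print(checker_trips(conditions, port, require_all=True))  # all conditions
```

For the traffic example, requiring any condition trips the checker when either direction exceeds 80%, while requiring all conditions trips it only when both directions do.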
The associations pane allows you to define an initial set of rules to match entities to your checker. These rules define the subset of entities of the chosen entity type that this alert checker will apply to. The initial form allows the creation of a single association, but multiple associations can be added later for more flexibility.
The format of the association rules is similar to that of the checker conditions and uses the same test evaluators described above. The association conditions can match against device attributes like hostname, os, distribution, location and sysObjectID, and against entity attributes like a port's ifDescr, ifAlias or ifSpeed, a processor's description or a BGP session's remote AS.
There are some differences between checker conditions and associations:
- Instead of using metrics, you’ll be using attributes
- To match all possible devices or entities, simply match all device hostnames: Device Hostname match *
- Rules are built using a hierarchical rule builder which can create complex nested rulesets
That last difference allows for more specific filtering. For example, you might want to match all sensors of class airflow and, when that nets you too many results, add a match on their description (sensor_descr). Or you might want to match all ports of type ethernetCsmacd, but only certain ones with a specific description.
An example to match all "Swap" memory pools on all Linux devices would be:
Device OS equals linux
Memory Pool Description match swap*
On saving this checker, you should end up with a new checker with all of your Linux Swap pools associated. The entry statuses will turn to
OK as they're polled.
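To make the idea of a nested ruleset more concrete, the following Python sketch evaluates a small rule tree against one entity's attributes. Everything in it is an assumption made for illustration: the dictionary layout, the condition values AND and OR, the attribute keys device_os and mempool_descr, and the case-insensitive wildcard matching (suggested by swap* matching "Swap" pools) are not taken from the product.

```python
from fnmatch import fnmatchcase


def rule_holds(rule, attributes):
    """Evaluate a single attribute rule; only 'equals' and 'match' are modelled."""
    actual = str(attributes.get(rule["attribute"], ""))
    wanted = rule["value"]
    if rule["test"] == "equals":
        return actual == wanted
    if rule["test"] == "match":
        # Assumption: wildcard matching ignores case.
        return fnmatchcase(actual.lower(), wanted.lower())
    raise ValueError(f"unsupported evaluator: {rule['test']}")


def matches(node, attributes):
    """Recursively evaluate a group of rules (AND/OR) or a single rule."""
    if "rules" in node:
        results = [matches(child, attributes) for child in node["rules"]]
        return all(results) if node["condition"] == "AND" else any(results)
    return rule_holds(node, attributes)


# Hypothetical encoding of the Linux swap example above.
ruleset = {
    "condition": "AND",
    "rules": [
        {"attribute": "device_os", "test": "equals", "value": "linux"},
        {"attribute": "mempool_descr", "test": "match", "value": "swap*"},
    ],
}

if __name__ == "__main__":
    pool = {"device_os": "linux", "mempool_descr": "Swap space"}
    print(matches(ruleset, pool))
```

Because groups can contain other groups, the same function handles arbitrarily deep rulesets, such as an OR group of descriptions nested inside an AND group with the device OS.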
Alert Checker Examples
Pre-written examples of alert checkers complete with example association rules can be found on the Alerting Examples page.