Creating Alert Checkers

To create an alert checker, go to /add_alert_check/ or click Add Checker in the menu bar of the alert checkers page.

Entity type

When you create an alert, first choose the 'entity' type that you are building the alert for. Examples of entity types include Port, Device and Sensor.

Entity Types

The Entity Type Reference contains a complete list of entity types, along with their metrics, attributes and examples.

They are mostly self-explanatory. If you want to alert on metrics that exist on ports, pick Port; if you want something that has to do with a sensor, pick Sensor. Device is a special case: it lets you alert on device-level metrics, such as whether the device is up or down, its uptime and its ping/SNMP response times. The entity type Device has nothing to do with the ports or sensors on the device itself.

Checker Details

Once you have picked the entity type, there are a few more fields to fill in, but these are simple: pick a name for the alert, and write a message to be included when an alert is sent out.

Checker Details Pane

Entity type should be set to the entity type that you're creating a checker for.
Alert Name is a unique text id used to identify your checker in the UI. It must be unique or adding the checker will fail.
Message is a meaningful text message sent along with any alerts generated by this checker. It should be used to direct the recipient to the cause and importance of the problem.
Alert Delay allows you to hold back an alert for X checks before notifications are sent. You can use this to 'smooth' noisy alerts and to suppress alerts for traffic and processor spikes.
Send Recovery allows you to enable or disable the sending of recovery notifications.
Severity is currently locked at critical.

Use Alert Delay to set the number of poller runs your alert checker should wait before it generates notifications. An alert entry that is being delayed is in the delayed state and shows as orange in the UI. This is useful when, for example, you're creating a check for processor usage and want to be alerted only on persistently high load, not on every temporary CPU spike. If you set a delay of, say, 2, it takes 3 poller runs before the state is set to failed and notifications are generated.
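
The delay behaviour described above can be sketched as a small state function. This is only an illustration, not the actual poller code: the function name is hypothetical, and it assumes that the failing runs must be consecutive and that a single passing check resets the counter.

```python
def entry_state(check_results, delay):
    """Return the alert entry state after each poller run.

    check_results: list of booleans, True when the checker conditions matched
                   on that poller run (i.e. the check failed).
    delay:         the Alert Delay value.
    Assumption: a passing check resets the count of failing runs.
    """
    states = []
    consecutive = 0  # failing poller runs in a row
    for failed in check_results:
        if not failed:
            consecutive = 0
            states.append("ok")
            continue
        consecutive += 1
        # The first `delay` failing runs are held back as "delayed";
        # the next failing run sets the state to "failed".
        states.append("failed" if consecutive > delay else "delayed")
    return states


print(entry_state([False, True, True, True, False], delay=2))
```

With a delay of 2, the first two failing runs are reported as delayed and the third as failed, which matches the "3 poller runs" in the example above.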

Checker Conditions

Next we have the Checker Conditions pane.

Checker Conditions Pane

This pane allows you to configure the actual rules that will trigger your alert. The conditions are entered as text, with one condition per line. A condition consists of three values:

the name of the metric to be tested
a 'test' evaluator (le, ge, lt, gt, eq, ne, match, notmatch, regexp, !regexp, in, notin, between, notbetween)
a value to test against

Syntax of test evaluators:

Test	Alternate	Meaning	Syntax
`le`	`<=`	less than or equal to	`metric le number`
`ge`	`>=`	greater than or equal to	`metric ge number`
`lt`	`less` `<`	less than	`metric lt number`
`gt`	`greater` `>`	greater than	`metric gt number`
`equals`	`eq` `is` `==` `=`	equals	`metric equals numbervalue/text`
`notequals`	`isnot` `ne` `!=`	not equals	`metric notequals numbervalue/text`
`match`	`matches`	match with wildcard	`metric match *text` `metric match text*` `metric match *text*` You can use `?` or `*` as a wildcard. In the code we generate SQL, where `?` is replaced with `.` and an asterisk `*` is replaced with `.*`
`notmatch`	`notmatches` `!match`	not match with wildcard	`metric notmatch *text` `metric notmatch text*` `metric notmatch *text*` You can use `?` or `*` as a wildcard. In the code we generate SQL, where `?` is replaced with `.` and an asterisk `*` is replaced with `.*`
`regexp`	`regex`	match for regular expression	`metric regexp <regex>`
`notregexp`	`notregex` `!regexp` `!regex`	not match for regular expression	`metric notregexp <regex>`
`in`	`list`	in a list	`metric in 1,2,3,4,5` `metric in bla,blabla,blablabla`
`notin`	`!in` `!list` `notlist`	not in a list	`metric notin 1,2,3,4,5` `metric notin bla,blabla,blablabla`
`between`		between two values	`metric between 100,9000`
`notbetween`		not between two values	`metric notbetween 512,8192`
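
To make the table more concrete, here is a minimal Python sketch of how a single condition line could be evaluated. It is not the product's implementation: the function names, the alias map layout, the case-insensitive and anchored wildcard matching, the string comparison used for equals, and the inclusive bounds for between are all assumptions made for illustration. The metric name sensor_value in the usage lines is also hypothetical.

```python
import re

# Alternate spellings from the table, mapped to the primary test name.
ALIASES = {
    "<=": "le", ">=": "ge", "less": "lt", "<": "lt", "greater": "gt", ">": "gt",
    "eq": "equals", "is": "equals", "==": "equals", "=": "equals",
    "isnot": "notequals", "ne": "notequals", "!=": "notequals",
    "matches": "match", "notmatches": "notmatch", "!match": "notmatch",
    "regex": "regexp", "notregex": "notregexp", "!regexp": "notregexp",
    "!regex": "notregexp", "list": "in", "!in": "notin", "!list": "notin",
    "notlist": "notin",
}


def wildcard_to_regex(pattern):
    """Replace ? with . and * with .*; escape everything else (anchoring is assumed)."""
    body = "".join({"?": ".", "*": ".*"}.get(ch, re.escape(ch)) for ch in pattern)
    return "^" + body + "$"


def evaluate(condition, metrics):
    """Evaluate one 'metric test value' line against a dict of metric values."""
    metric, test, value = condition.split(None, 2)
    test = ALIASES.get(test, test)
    actual = metrics[metric]
    negate = test.startswith("not")
    if test in ("le", "ge", "lt", "gt"):
        a, v = float(actual), float(value)
        return {"le": a <= v, "ge": a >= v, "lt": a < v, "gt": a > v}[test]
    if test in ("equals", "notequals"):
        hit = str(actual) == value  # assumed: plain string comparison
    elif test in ("match", "notmatch"):
        hit = re.search(wildcard_to_regex(value), str(actual), re.IGNORECASE) is not None
    elif test in ("regexp", "notregexp"):
        hit = re.search(value, str(actual)) is not None
    elif test in ("in", "notin"):
        hit = str(actual) in value.split(",")
    elif test in ("between", "notbetween"):
        low, high = (float(x) for x in value.split(","))
        hit = low <= float(actual) <= high  # assumed: bounds are inclusive
    else:
        raise ValueError(f"unsupported test: {test}")
    return not hit if negate else hit


print(evaluate("ifInOctets_perc gt 80", {"ifInOctets_perc": 91}))
print(evaluate("ifAlias match Cust*", {"ifAlias": "Customer A"}))
print(evaluate("sensor_value between 100,9000", {"sensor_value": 250}))
```

Keeping the aliases in one lookup table means each test only has to be implemented once under its primary name.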

In this pane you also configure whether your checker trips only when all conditions match, or as soon as any single condition matches.

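For example, to test whether inbound or outbound traffic on a port exceeds 80% of its capacity, you would enter these two conditions and set the checker to trip on any condition: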

ifInOctets_perc gt 80
ifOutOctets_perc gt 80

You might want to set this checker to have a delay of 4, so that, with a 5-minute polling interval, the alert only trips when the port exceeds 80% capacity for 20 minutes or more.
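
The any/all choice and the delay arithmetic for this example can be sketched as follows. The function names and the sample metric values are hypothetical, and the 5-minute polling interval is an assumption; adjust it to match your poller schedule.

```python
def checker_trips(results, require_all):
    """Combine per-condition results according to the all/any setting."""
    return all(results) if require_all else any(results)


def minutes_until_alert(delay, poll_interval_minutes):
    """Minutes between the first failing check and the check that sets 'failed'.

    The first `delay` failing runs are held back, so the state changes
    on run number delay + 1, which is `delay` intervals after the first.
    """
    return delay * poll_interval_minutes


# Hypothetical sample values for one port: quiet inbound, busy outbound.
metrics = {"ifInOctets_perc": 35.0, "ifOutOctets_perc": 86.5}
results = [metrics["ifInOctets_perc"] > 80, metrics["ifOutOctets_perc"] > 80]

print(checker_trips(results, require_all=False))  # "any" setting
print(checker_trips(results, require_all=True))   # "all" setting
print(minutes_until_alert(delay=4, poll_interval_minutes=5))
```

With the "any" setting, a port that is saturated in only one direction still trips the checker, which is usually what you want for a capacity alert.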

Associations

The associations pane allows you to define an initial set of rules to match entities to your checker. These rules define the subset of entities of the entity type you've chosen that this alert checker will apply to. The initial form allows the creation of a single association, but multiple associations can be added later for more flexibility.

Associations Pane

The format of the alerter association rules is similar to that of the checker conditions and uses the same test evaluators explained in the table above. The association conditions can match against device attributes like hostname, os, distribution, location and sysObjectID, and against entity attributes like a port's ifDescr, ifAlias or ifSpeed, a processor's description or a BGP session's remote AS.

There are some differences between checker conditions and associations:

Instead of using metrics, you’ll be using attributes.
To match all possible devices or entities, simply match all device hostnames with the rule Device Hostname match *.
Rules are built using a hierarchical rule builder, which can create complex nested rulesets.

The last difference allows for more specific filtering. For example, you may want to match all sensors of class airflow, but when that returns too many results, you can add a match on the description, sensor_descr. Or you may want to match all ports with ifType ethernetCsmacd, but only those with a specific description in ifAlias.

An example to match all "Swap" memory pools on all Linux devices would be:

Device OS equals linux
Memory Pool Description match swap*
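
As an illustration of what these two rules select, here is a small Python sketch that filters a hypothetical list of memory pools. The data layout, function names and sample hostnames are invented for the example. It assumes that both rules must hold for a pool to be associated and that wildcard matching ignores case, and it uses Python's fnmatch module as a stand-in for the product's own wildcard handling.

```python
from fnmatch import fnmatchcase


def wildcard_match(text, pattern):
    """Wildcard match with ? and *, ignoring case (an assumption)."""
    return fnmatchcase(text.lower(), pattern.lower())


def is_associated(device, pool):
    """Apply both example rules: Device OS equals linux, description match swap*."""
    return device["os"] == "linux" and wildcard_match(pool["descr"], "swap*")


# Hypothetical inventory: (device attributes, memory pool attributes).
pools = [
    ({"hostname": "web1", "os": "linux"}, {"descr": "Swap space"}),
    ({"hostname": "web1", "os": "linux"}, {"descr": "Physical memory"}),
    ({"hostname": "fw1", "os": "freebsd"}, {"descr": "Swap space"}),
]

matched = [(d["hostname"], p["descr"]) for d, p in pools if is_associated(d, p)]
print(matched)
```

Only the swap pool on the Linux device is selected; the physical memory pool and the swap pool on the non-Linux device are filtered out.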

On saving this checker, you should end up with a new checker with all of your Linux Swap pools associated. The entry statuses will turn to OK as they're polled.

Alert Checker Examples

Pre-written examples of alert checkers complete with example association rules can be found on the Alerting Examples page.