Creating Alert Checkers

To create an alert checker, go to /add_alert_check/ or click Add Checker in the menu bar of the alert checkers page.

Entity type

When you create an alert checker, you first need to choose the entity type that you are building the alert for. Examples of entity types include Port, Device and Sensor.

[Screenshot: Entity Types]

You can see a complete list of entity types in the Entity Type Reference, complete with metric and attribute lists and examples.

Most entity types are self-explanatory: if you want to alert on metrics that exist on ports, pick Port; if you want something to do with a sensor, pick Sensor. Device is a special case. It lets you alert on device-level metrics, such as whether the device is up or down, its uptime, and its ping/SNMP response times. The Device entity type has nothing to do with the ports or sensors on the device itself.

Checker Details

Once you have picked the entity type, a few more fields need to be filled in. These are simple: pick a name for the alert, and pick a message to be included when an alert is sent out.

[Screenshot: Checker Details pane]

  • Entity type should be set to the entity type that you're creating a checker for.
  • Alert Name is a unique text id used to identify your checker in the UI. It must be unique or adding the checker will fail.
  • Message is a meaningful text message sent along with any alerts generated by this checker. It should direct the recipient to the cause and importance of the problem.
  • Alert Delay lets you delay an alert for X checks before notifications are sent. You can use this to 'smooth' noisy alerts and to suppress alerts for short traffic and processor spikes.
  • Send Recovery allows you to enable or disable the sending of recovery notifications.
  • Severity is currently locked at critical.

Use Alert Delay to set the number of poller runs your alert checker should wait before it generates notifications. An alert entry that is being delayed is in the delayed state and shows as orange in the UI. This is useful when you're creating a check for processor usage and don't want to be alerted on every temporary CPU spike, only on persistently high load. If you set a delay of, say, 2, it takes 3 poller runs before the state is set to failed and notifications are generated.
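To make the delay arithmetic concrete, here is a minimal Python sketch of how one alert entry's state could evolve over consecutive poller runs. It is a simplified model, not Observium's code: each run either trips the conditions or not, a tripping entry stays "delayed" until it has tripped delay + 1 runs in a row, and a passing run resets it to "ok". The function name entry_states and the state strings are hypothetical.

```python
def entry_states(run_results, delay):
    """Return the state of one alert entry after each poller run.

    run_results: booleans, True when the checker conditions tripped on that run.
    delay: the checker's Alert Delay setting.
    Hypothetical, simplified model for illustration only.
    """
    states = []
    consecutive_failures = 0
    for tripped in run_results:
        if tripped:
            consecutive_failures += 1
            # Notifications start only after delay + 1 tripping runs in a row.
            if consecutive_failures > delay:
                states.append("failed")
            else:
                states.append("delayed")
        else:
            # A passing run clears the streak, e.g. after a short CPU spike.
            consecutive_failures = 0
            states.append("ok")
    return states


# A delay of 2 needs 3 consecutive tripping runs before the state is "failed".
print(entry_states([True, True, True], delay=2))
# ['delayed', 'delayed', 'failed']

# A two-run spike never reaches "failed" with the same delay.
print(entry_states([True, True, False, True], delay=2))
# ['delayed', 'delayed', 'ok', 'delayed']
```

In this model a delay of 0 sends notifications on the very first tripping run.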

Checker Conditions

Next we have the Checker Conditions pane.

[Screenshot: Checker Conditions pane]

This pane allows you to configure the actual rules that will trigger your alert. The conditions are entered as text, with one condition per line. A condition consists of three values:

  • the name of the metric to be tested
  • a 'test' evaluator (le, ge, lt, gt, eq, ne, match, notmatch, regexp, !regexp, in, notin)
  • a value to test against

Syntax of the test evaluators:

Test | Alternates | Meaning | Syntax
le | <= | less than or equal to | metric le numbervalue
ge | >= | greater than or equal to | metric ge numbervalue
lt | less, < | less than | metric lt numbervalue
gt | greater, > | greater than | metric gt numbervalue
equals | eq, is, ==, = | equal to | metric equals numbervalue/text
notequals | isnot, ne, != | not equal to | metric notequals numbervalue/text
match | matches | match with wildcard | metric match text*, metric match *text, metric match *text*
notmatch | notmatches, !match | no match with wildcard | metric notmatch text*, metric notmatch *text, metric notmatch *text*
regexp | regex | match for regular expression | metric regexp <regex>
notregexp | notregex, !regexp, !regex | no match for regular expression | metric notregexp <regex>
in | list | in a list | metric in 1,2,3,4,5 or metric in bla,blabla,blablabla
notin | !in, !list, notlist | not in a list | metric notin 1,2,3,4,5 or metric notin bla,blabla,blablabla

In match and notmatch patterns you can use ? or * as wildcards. The checker code generates SQL in which ? is replaced with . (any single character) and * is replaced with .* (any sequence of characters).
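The evaluators in the table above can be modelled in a few lines of Python, which can help when experimenting with condition syntax. This is a hedged sketch, not Observium's implementation (which generates SQL). Beyond what the table states, it assumes that match patterns are anchored at both ends (so text* means "starts with text"), that matching is case-insensitive, that characters other than ? and * are literal, and that in/notin compare values as text. The names ALIASES, wildcard_to_regex and evaluate are hypothetical.

```python
import re

# Every spelling from the table, mapped to one canonical test name.
ALIASES = {
    "le": "le", "<=": "le",
    "ge": "ge", ">=": "ge",
    "lt": "lt", "less": "lt", "<": "lt",
    "gt": "gt", "greater": "gt", ">": "gt",
    "equals": "eq", "eq": "eq", "is": "eq", "==": "eq", "=": "eq",
    "notequals": "ne", "ne": "ne", "isnot": "ne", "!=": "ne",
    "match": "match", "matches": "match",
    "notmatch": "notmatch", "notmatches": "notmatch", "!match": "notmatch",
    "regexp": "regexp", "regex": "regexp",
    "notregexp": "notregexp", "notregex": "notregexp",
    "!regexp": "notregexp", "!regex": "notregexp",
    "in": "in", "list": "in",
    "notin": "notin", "!in": "notin", "!list": "notin", "notlist": "notin",
}


def wildcard_to_regex(pattern):
    """Turn a match/notmatch pattern into a regex: ? -> . and * -> .*"""
    # Assumption: every other character is taken literally.
    return "".join(
        "." if ch == "?" else ".*" if ch == "*" else re.escape(ch) for ch in pattern
    )


def _equal(a, b):
    # Compare numerically when both sides parse as numbers, otherwise as text.
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        return str(a) == str(b)


def evaluate(metric_value, test, value):
    """Return True if 'metric_value <test> value' holds (hypothetical sketch)."""
    test = ALIASES[test]
    text = str(metric_value)
    if test in ("le", "ge", "lt", "gt"):
        a, b = float(metric_value), float(value)
        return {"le": a <= b, "ge": a >= b, "lt": a < b, "gt": a > b}[test]
    if test in ("eq", "ne"):
        return _equal(metric_value, value) == (test == "eq")
    if test in ("match", "notmatch"):
        # Assumed anchored at both ends and case-insensitive.
        hit = re.fullmatch(wildcard_to_regex(value), text, re.IGNORECASE) is not None
        return hit == (test == "match")
    if test in ("regexp", "notregexp"):
        # Assumed unanchored (like SQL REGEXP) and case-insensitive.
        hit = re.search(value, text, re.IGNORECASE) is not None
        return hit == (test == "regexp")
    # in / notin: compare against a comma-separated list, as text.
    items = [item.strip() for item in value.split(",")]
    return (text in items) == (test == "in")


print(evaluate(85, "gt", "80"))                     # True
print(evaluate("Processor", "match", "*processor"))  # True
print(evaluate("eth10", "match", "eth?"))           # False
print(evaluate(3, "notin", "1,2,3,4,5"))            # False
```

Note how eth? does not match eth10: the ? wildcard stands for exactly one character.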

In this pane you also choose whether your checker trips only when all conditions match, or when any one condition matches.

An example of a condition to test if traffic on a port exceeds 80% would be:

ifInOctets_perc gt 80
ifOutOctets_perc gt 80
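The all/any setting decides how these two lines combine. The hypothetical Python sketch below supports only the numeric evaluators and assumes the current metric values for a port are available in a dict keyed by metric name; the function checker_trips and the sample values are illustrative, not Observium's code.

```python
import operator

# Only the four numeric evaluators are modelled here.
NUMERIC_TESTS = {"le": operator.le, "ge": operator.ge,
                 "lt": operator.lt, "gt": operator.gt}


def checker_trips(conditions, metrics, require_all):
    """Evaluate one condition per line and combine the results with all() or any()."""
    results = []
    for line in conditions.strip().splitlines():
        metric, test, value = line.split()
        results.append(NUMERIC_TESTS[test](float(metrics[metric]), float(value)))
    return all(results) if require_all else any(results)


conditions = """
ifInOctets_perc gt 80
ifOutOctets_perc gt 80
"""
# Hypothetical current values for one port: busy inbound, quiet outbound.
port = {"ifInOctets_perc": 92, "ifOutOctets_perc": 15}

print(checker_trips(conditions, port, require_all=True))   # False
print(checker_trips(conditions, port, require_all=False))  # True
```

With all, both directions must be above 80% at the same time; with any, either direction is enough.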

You might want to set this checker to have a delay of 4, so that the alert only trips when the port exceeds 80% capacity for 20 minutes or more, assuming the poller runs every 5 minutes.
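The 20-minute figure follows from the delay rule: a delay of N means N + 1 consecutive tripping runs, and the first and last of those runs are N poller intervals apart. A tiny sketch of that arithmetic, assuming a fixed 5-minute poller interval (the function name is hypothetical):

```python
def minutes_until_notification(delay, poll_interval_minutes=5):
    """Minutes from the first tripping poll to the poll that sends notifications.

    Assumes a fixed poller interval and delay + 1 consecutive tripping runs.
    """
    tripping_runs = delay + 1
    # The first and last tripping runs are (tripping_runs - 1) intervals apart.
    return (tripping_runs - 1) * poll_interval_minutes


print(minutes_until_notification(4))  # 20
print(minutes_until_notification(2))  # 10
```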

Associations

The Associations pane lets you define an initial set of rules that match entities to your checker. These rules define the subset of entities of your chosen entity type that this alert checker will apply to. The initial form allows the creation of a single association, but multiple associations can be added later for more flexibility.

[Screenshot: Associations pane]

Device and entity association conditions use the same format as checker conditions, with the same test evaluators explained in the table above. Device association conditions match against device attributes such as hostname, os, distribution, location and sysObjectID. Entity association conditions likewise match against entity attributes, such as a port's ifDescr, ifAlias or ifSpeed, a processor's description, or a BGP session's remote AS.

There are some differences between checker conditions and associations:

  • instead of using metrics, you’ll be using attributes
  • you can’t use a device attribute twice in the same association rule, so, for example, multiple “hostname match bla” statements within the same association rule won’t work. You will need to add multiple association rules instead.
  • for a single device association line, you can have multiple entity association lines

That last difference allows for more specific filtering. For example, you might want to match all sensors of class airflow, but if that nets you too many results, you can add a match on their description, sensor_descr. Similarly, you might want to match all ports with ifType ethernetCsmacd, but only certain ones with a specific description in ifAlias.
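The rule that a device attribute can appear only once per association rule is easy to trip over. As a hypothetical pre-flight check (not part of Observium), this Python sketch lists device attributes that appear on more than one line of a rule's device association, which signals that those conditions need to be split across separate association rules:

```python
from collections import Counter


def duplicate_device_attributes(device_conditions):
    """Return device attributes used on more than one line of a single rule.

    device_conditions: one rule's device association text,
    one "attribute test value" condition per line.
    """
    attributes = [
        line.split(None, 1)[0]
        for line in device_conditions.splitlines()
        if line.strip()
    ]
    counts = Counter(attributes)
    return sorted(attr for attr, n in counts.items() if n > 1)


rule = """
hostname match core*
hostname match edge*
os equals ios
"""
print(duplicate_device_attributes(rule))  # ['hostname']
```

Here the two hostname lines would go into two separate association rules, each keeping the os equals ios line.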

An example to match all "Processor" memory pools on all Cisco IOS devices would be:

Device

os equals ios

Entity

mempool_descr match *processor

To match all possible devices or entities, simply use an asterisk *.
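Putting the example together, the Python sketch below models how one association rule could select entities. Several details are assumptions not confirmed on this page: every line of a device or entity association must hold for the rule to apply, a lone * matches everything, and matching is case-insensitive. Only the equals and match evaluators are implemented, and the function names and sample attribute values are hypothetical.

```python
import re


def _condition_holds(attributes, line):
    """Evaluate one "attribute test value" line; supports equals and match only."""
    name, test, value = line.split(None, 2)
    actual = str(attributes.get(name, ""))
    if test == "equals":
        return actual == value
    if test == "match":
        # ? -> . and * -> .*, anchored; case-insensitivity is an assumption.
        pattern = "".join(
            "." if c == "?" else ".*" if c == "*" else re.escape(c) for c in value
        )
        return re.fullmatch(pattern, actual, re.IGNORECASE) is not None
    raise ValueError(f"unsupported test in this sketch: {test}")


def rule_applies(device_conditions, entity_conditions, device, entity):
    """True if a device/entity pair is selected by one association rule."""

    def all_hold(conditions, attributes):
        lines = [ln.strip() for ln in conditions.splitlines() if ln.strip()]
        if lines == ["*"]:
            return True  # a lone asterisk matches everything
        return all(_condition_holds(attributes, ln) for ln in lines)

    return all_hold(device_conditions, device) and all_hold(entity_conditions, entity)


router = {"hostname": "core-rtr1", "os": "ios"}
for pool in ({"mempool_descr": "Processor"}, {"mempool_descr": "I/O"}):
    print(pool["mempool_descr"],
          rule_applies("os equals ios", "mempool_descr match *processor", router, pool))
# Processor True
# I/O False
```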

After creating a checker, go back to the alerts_checks or alerts page and run "regenerate". This rebuilds the alerts table to include the newly associated alerts. Alert entries are also regenerated automatically at the end of a discovery run, keeping your alerts up to date as you add and remove components from your network.

You should end up with all of your Cisco processor memory pools added to the checker.
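Regeneration can be thought of as re-running every association rule against every candidate entity. In this hypothetical Python sketch each rule is reduced to a predicate, and an entity gets an alert entry when any of the checker's rules selects it. That OR behaviour between rules is an inference from the advice to split repeated device attributes into multiple association rules, not something this page states outright.

```python
def regenerate(rules, candidates):
    """Rebuild the alert entries for one checker.

    rules: association rules, each modelled as a function (device, entity) -> bool.
    candidates: (device, entity) pairs of the checker's entity type.
    An entity gets an alert entry if ANY rule selects it (assumed).
    """
    return [
        (device["hostname"], entity["descr"])
        for device, entity in candidates
        if any(rule(device, entity) for rule in rules)
    ]


# Two rules, because one rule cannot test "hostname" twice.
rules = [
    lambda d, e: d["hostname"].startswith("core"),
    lambda d, e: d["hostname"].startswith("edge"),
]
candidates = [
    ({"hostname": "core1"}, {"descr": "Processor"}),
    ({"hostname": "edge7"}, {"descr": "Processor"}),
    ({"hostname": "lab3"}, {"descr": "Processor"}),
]
print(regenerate(rules, candidates))
# [('core1', 'Processor'), ('edge7', 'Processor')]
```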

Alert Checker Examples

Pre-written examples of alert checkers complete with example association rules can be found on the Alerting Examples page.