Creating Alert Checkers
To create an alert check you need to go to /add_alert_check/ or click on Add Checker on the menu bar of the alert checkers page.
Entity type
First, when you create an alert, you need to choose the entity type that you are building the alert for.
Examples of entity types include Port, Device and Sensor.
Entity Types
You can see a complete list of entity types in the Entity Type Reference, complete with metric and attribute lists and examples.
They are mostly self-explanatory: if you want to alert on metrics that exist on ports, pick Port; if you want something that has to do with a sensor, pick Sensor. Device is a special case and allows you to alert on device-level metrics, such as whether the device is up or down, its uptime and the ping/SNMP response times. The entity type Device has nothing to do with ports or sensors on the device itself.
Checker Details
Once you have picked the entity type, there are a couple more things to fill in, but these are simple: pick a name for the alert, and pick a message to be included when an alert is sent out.
- Entity type should be set to the entity type that you're creating a checker for.
- Alert Name is a unique text ID used to identify your checker in the UI. It must be unique or adding the checker will fail.
- Message is a meaningful text message sent along with any alerts generated by this checker. It should be used to direct the recipient to the cause and importance of the problem.
- Alert Delay allows you to delay an alert for X checks before a notification is sent. You can use this to 'smooth' noisy alerts and suppress alerts for traffic and processor spikes.
- Send Recovery allows you to enable or disable the sending of recovery notifications.
- Severity is currently locked at critical.
Use Alert Delay to set the number of poller runs your alert checker should wait before it generates notifications. An alert entry which is being delayed will be in the delayed state and show as orange in the UI.
This is useful when you're creating a check for processor usage but don't want to be alerted on every temporary CPU spike, only on persistently high load conditions. If you set a delay of, say, 2, it'll take 3 poller runs before the state is set to failed and notifications are generated.
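The delay behaviour described above can be sketched in a few lines of Python. This is only an illustration of the counting rule, not the poller's actual code: the function name `alert_states` and the `ok` state label are assumptions made for the example, while `delayed` and `failed` are the states named in the text.

```python
def alert_states(check_failed, delay):
    """Return the alert state after each poller run.

    check_failed: list of booleans, True when the checker conditions tripped.
    delay: the Alert Delay value (number of failing runs to hold back).
    """
    states = []
    consecutive = 0  # failing poller runs seen in a row
    for failed in check_failed:
        if not failed:
            consecutive = 0
            states.append("ok")  # hypothetical label for a passing check
            continue
        consecutive += 1
        # The first `delay` failing runs are held back; the next one alerts.
        states.append("delayed" if consecutive <= delay else "failed")
    return states


# A spike lasting two runs never alerts; sustained load alerts on run 3.
print(alert_states([True, True, False], delay=2))
print(alert_states([True, True, True, True], delay=2))
```

With a delay of 2, the first list never reaches failed, while the second reaches it on the third consecutive failing run.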
Checker Conditions
Next we have the Checker Conditions pane.
This pane allows you to configure the actual rules that will trigger your alert. The conditions are entered as text, with one condition per line. A condition consists of three values:
- the name of the metric to be tested
- a 'test' evaluator (le, ge, lt, gt, eq, ne, match, notmatch, regexp, !regexp, in, notin)
- a value to test against
Syntax of test evaluators:
Test | Alternate | Meaning | Syntax |
---|---|---|---|
le | <= | less or equals | metric le numbervalue |
ge | >= | greater or equals | metric ge numbervalue |
lt | less < | less than | metric lt numbervalue |
gt | greater > | greater than | metric gt numbervalue |
equals | eq is == = | equals | metric equals numbervalue/text |
notequals | isnot ne != | not equals | metric notequals numbervalue/text |
match | matches | match with wildcard | metric match text*; metric match *text; metric match *text*. You can use ? or * as wildcard; in the code we generate SQL, where ? is replaced with . and an asterisk * is replaced with .* |
notmatch | notmatches !match | not match with wildcard | metric notmatch text*; metric notmatch *text; metric notmatch *text*. You can use ? or * as wildcard; in the code we generate SQL, where ? is replaced with . and an asterisk * is replaced with .* |
regexp | regex | match for regular expression | metric regexp <regex> |
notregexp | notregex !regexp !regex | not match for regular expression | metric notregexp <regex> |
in | list | in a list | metric in 1,2,3,4,5; metric in bla,blabla,blablabla |
notin | !in !list notin notlist | not in a list | metric notin 1,2,3,4,5; metric notin bla,blabla,blablabla |
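To make the table more concrete, here is a rough Python sketch of how a single condition line could be parsed and evaluated. It is not the product's implementation: the names `ALIASES`, `wildcard_to_regex` and `evaluate` are invented for this example, and the anchored, case-insensitive wildcard matching and the numeric fallback for equals are assumptions. Only the test names, their alternates and the ? to . and * to .* translation come from the table.

```python
import re

# Alternate spellings from the table, mapped to one canonical test name.
ALIASES = {
    "<=": "le", ">=": "ge", "less": "lt", "<": "lt", "greater": "gt", ">": "gt",
    "eq": "equals", "is": "equals", "==": "equals", "=": "equals",
    "isnot": "notequals", "ne": "notequals", "!=": "notequals",
    "matches": "match", "notmatches": "notmatch", "!match": "notmatch",
    "regex": "regexp", "notregex": "notregexp",
    "!regexp": "notregexp", "!regex": "notregexp",
    "list": "in", "!in": "notin", "!list": "notin", "notlist": "notin",
}


def wildcard_to_regex(pattern):
    """Translate ? to . and * to .*, escaping every other character."""
    parts = []
    for ch in pattern:
        parts.append(".*" if ch == "*" else "." if ch == "?" else re.escape(ch))
    return "^" + "".join(parts) + "$"  # anchoring is an assumption


def same_value(actual, value):
    """Compare numerically when both sides are numbers, else as text."""
    try:
        return float(actual) == float(value)
    except (TypeError, ValueError):
        return str(actual) == value


def evaluate(condition, values):
    """Evaluate one 'metric test value' line against a dict of values."""
    metric, test, value = condition.split(None, 2)
    test = ALIASES.get(test, test)
    actual = values[metric]
    if test in ("le", "ge", "lt", "gt"):
        a, b = float(actual), float(value)
        return {"le": a <= b, "ge": a >= b, "lt": a < b, "gt": a > b}[test]
    if test in ("equals", "notequals"):
        return same_value(actual, value) == (test == "equals")
    if test in ("match", "notmatch"):
        regex = wildcard_to_regex(value)
        hit = re.match(regex, str(actual), re.IGNORECASE) is not None
        return hit == (test == "match")
    if test in ("regexp", "notregexp"):
        hit = re.search(value, str(actual)) is not None
        return hit == (test == "regexp")
    if test in ("in", "notin"):
        hit = str(actual) in [item.strip() for item in value.split(",")]
        return hit == (test == "in")
    raise ValueError(f"unknown test: {test}")


# Made-up sample values, for illustration only.
sample = {"ifInOctets_perc": 91, "ifAlias": "Transit: upstream"}
print(evaluate("ifInOctets_perc gt 80", sample))
print(evaluate("ifAlias match transit*", sample))
```

Mapping every alternate to one canonical name first keeps the evaluation logic down to one branch per row of the table.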
In this pane you also configure whether your checker requires all conditions to match to trip, or any condition.
An example of conditions to test whether inbound or outbound traffic on a port exceeds 80% would be:
ifInOctets_perc gt 80
ifOutOctets_perc gt 80
You might want to set this checker to have a delay of 4, so that the alert only trips when the port exceeds 80% capacity for 20 minutes or more.
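The difference between requiring all conditions and any condition, and the arithmetic behind the 20-minute figure, can be sketched as follows. The function names and sample values are invented for this illustration, and the 5-minute poller interval is an assumption that is not stated on this page.

```python
POLL_INTERVAL_MINUTES = 5  # assumption: one poller run every 5 minutes


def checker_trips(conditions, metrics, require_all):
    """conditions: list of (metric, threshold) pairs, each tested with 'gt'."""
    results = [metrics[name] > threshold for name, threshold in conditions]
    return all(results) if require_all else any(results)


def minutes_until_alert(delay):
    """Time between the first failing check and the notification."""
    return delay * POLL_INTERVAL_MINUTES


conditions = [("ifInOctets_perc", 80), ("ifOutOctets_perc", 80)]
sample = {"ifInOctets_perc": 92.5, "ifOutOctets_perc": 12.0}  # made-up values

print(checker_trips(conditions, sample, require_all=False))  # any condition
print(checker_trips(conditions, sample, require_all=True))   # all conditions
print(minutes_until_alert(4))
```

For a port that is saturated in only one direction, the checker trips in any mode but not in all mode, so any is the natural choice for this example.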
Associations
The associations pane allows you to define an initial set of rules to match entities to your checker. These rules define the subset of entities of the entity type you've chosen that this alert checker will apply to. The initial form allows the creation of a single association, but multiple associations can be added later for more flexibility.
The format of both the device and entity association conditions is the same as for the checker conditions, and they use the same test evaluators explained in the table above. The device association conditions match against device attributes like hostname, os, distribution, location and sysObjectID. The entity association conditions likewise match against entity attributes like a port's ifDescr, ifAlias or ifSpeed, a processor's description or a BGP session's remote AS.
There are some differences between checker conditions and associations:
- instead of using metrics, you’ll be using attributes
- you can’t use a device attribute twice in the same association rule, so for example multiple “hostname match bla” statements within the same association rule won’t work. You will need to add multiple association rules.
- for a single device association line, you can have multiple entity association lines
That last point allows for more specific filtering. For example, you might want to match all sensors of class airflow, but when that nets you too many results, you can add a match on the description, sensor_descr. Or you might want to match all ports with an ifType of ethernetCsmacd, but only those with a specific description in ifAlias.
An example to match all "Processor" memory pools on all Cisco IOS devices would be:
Device
os equals ios
Entity
mempool_descr match *processor
To match all possible devices or entities, simply use an asterisk *.
After creating a checker you need to go back to the alerts_checks or alerts page and run "regenerate". This will rebuild the alerts table to include the newly associated alerts. Alert entries are automatically regenerated at the end of a discovery run, keeping your alerts updated as you add and remove components from your network.
You should end up with all of your Cisco processor memory pools added to the checker.
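As a rough illustration of how the example association narrows things down, the following Python sketch filters a small made-up inventory with the same device and entity rules. The `matches` helper and the inventory layout are hypothetical, and it assumes that every line in an association must hold. It uses Python's `fnmatch` wildcards as a stand-in for the match evaluator, which also treats square brackets specially, unlike the evaluator described above.

```python
from fnmatch import fnmatchcase


def matches(rules, attributes):
    """rules: list of (attribute, test, value); every rule must hold."""
    for attribute, test, value in rules:
        actual = str(attributes.get(attribute, ""))
        if test == "equals":
            ok = actual == value
        elif test == "match":
            # Case-insensitive wildcard comparison (an assumption).
            ok = fnmatchcase(actual.lower(), value.lower())
        else:
            raise ValueError(f"test not covered by this sketch: {test}")
        if not ok:
            return False
    return True


# Made-up inventory: one device/entity attribute pair per memory pool.
inventory = [
    {"device": {"hostname": "core1", "os": "ios"},
     "entity": {"mempool_descr": "Processor"}},
    {"device": {"hostname": "core1", "os": "ios"},
     "entity": {"mempool_descr": "I/O"}},
    {"device": {"hostname": "edge1", "os": "junos"},
     "entity": {"mempool_descr": "Processor"}},
]

device_rules = [("os", "equals", "ios")]
entity_rules = [("mempool_descr", "match", "*processor")]

selected = [
    item for item in inventory
    if matches(device_rules, item["device"])
    and matches(entity_rules, item["entity"])
]
for item in selected:
    print(item["device"]["hostname"], item["entity"]["mempool_descr"])
```

Only the Processor pool on the IOS device is selected: the I/O pool fails the entity rule and the non-IOS device fails the device rule.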
Alert Checker Examples
Pre-written examples of alert checkers complete with example association rules can be found on the Alerting Examples page.