Performance Tuning

Opcode cacher

An opcode cacher compiles PHP code to opcodes, so the code doesn't have to be parsed and compiled on every page visit. It automatically detects when the code has changed and recompiles it.

OPcache

Starting with PHP 5.5, Zend OPcache is integrated and shipped with PHP, and PHP 7 includes it by default. For the web user interface (WUI), OPcache is enabled by default, but for CLI scripts you need to enable it in your php.ini:

opcache.enable_cli=1

To check whether OPcache is used by Observium, go to the About page (in your WUI) and look at "Version Information". You should see "OPcache: ENABLED" next to the PHP version. For the same check on the CLI, request version information from any script, e.g.:

./discovery.php -VV

Old PHP versions

We strongly recommend against using old PHP versions (below 5.6; see the minimum requirements) and alternative accelerators such as XCache and APC.

Fast userspace caching

Since version 17.2.8348, Observium includes an additional userspace caching mechanism, used to speed up data loading on pages. This caching ONLY works with PHP version 5.5 and up; please see the minimum requirements.

By default, Observium detects the best available way to cache user data on the server, trying in order: Zend Memory Cache, APCu, SQLite, files.

Additional caching drivers such as Memcached, Redis, Predis and SSDB may be added later.

Currently, the only very fast in-memory caching driver available is provided by the PECL APCu extension. To use it, install the extension and reload your Apache server:

aptitude install php-apcu
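Once installed, APCu is available to the web SAPI, but CLI scripts (such as the poller run from cron) need it enabled separately. A minimal php.ini sketch, assuming the stock APCu ini directive names:

```ini
; php.ini — allow CLI scripts to use the APCu userspace cache as well
apc.enable_cli=1
```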

To select the cache driver manually, set the following in config.php:

$config['cache']['driver']                 = 'auto';    // Driver to use for caching (auto, zendshm, apcu, xcache, sqlite, files)

To see debug information about caching, append the string "cache_info" to the end of any URL. If you have trouble with caching, disable it in config.php by setting:

$config['cache']['enable']                 = FALSE;

Multiple poller instances

One poller instance can only poll so many devices in the 5 minutes it has before it must start again. Running more pollers in parallel allows Observium to check more devices in the same amount of time. The install instructions set the poller to run two concurrent poller threads, which is only sufficient for small installs.

Syntax:

*/5 * * * * root /opt/observium/poller-wrapper.py <number of pollers> >> /dev/null 2>&1

Example for a system with 4 cores/8 threads and a fast I/O subsystem, running 10 parallel pollers:

*/5 * * * * root /opt/observium/poller-wrapper.py 10 >> /dev/null 2>&1

Do note that adding pollers will only increase performance until your MySQL database becomes the bottleneck or, more likely, until all the RRD writes start to saturate your disk I/O.

If you try to run too many poller processes on storage without enough I/O capacity, you'll simply cause disk thrashing and make the web interface slow. Ideally, the entire poller-wrapper run should take as close to 300 seconds as possible to ensure the lowest average load.
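As a back-of-the-envelope guide, the sizing logic above can be sketched as shell arithmetic. The device count and per-device timing below are hypothetical examples, not measured Observium values:

```shell
# Rough capacity estimate: with ~200 devices averaging ~15s each and a
# 300s cron window, you need at least ceil(200 * 15 / 300) parallel pollers.
devices=200          # number of devices to poll (assumption)
secs_per_device=15   # average poll time per device in seconds (assumption)
window=300           # cron interval in seconds
workers=$(( (devices * secs_per_device + window - 1) / window ))
echo "minimum pollers: $workers"   # -> minimum pollers: 10
```

Measure your real per-device poll times on the "Polling Information" page before settling on a number, and stay below the point where disk I/O becomes the bottleneck.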

Performance data for the poller can be seen on the "Polling Information" page at /pollerlog/ on your Observium installation.

Putting the RRDs on a RAM disk

See the separate Persistent RAM disk RRD storage page to find out how to set up a RAM disk with sync to the disks so you don't lose your data when your machine crashes/reboots.
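For reference, a bare tmpfs mount looks like the fragment below. The mount point and size are example assumptions, and on its own this is volatile storage, so follow the persistent RAM disk setup from the linked page rather than using this mount alone:

```
# /etc/fstab — RAM-backed filesystem for RRDs (example values)
tmpfs  /opt/observium/rrd  tmpfs  defaults,size=4G  0  0
```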

Separate disk for MySQL

When all the I/O for the RRDs is clobbering your disk, your MySQL database will likely become slow too, due to disk congestion. This slows down the web interface as well. It is advised to put the database on a separate disk (separate physical storage medium, not another LV on the same disk/RAID) for this reason.
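A minimal sketch of pointing MySQL at a dedicated disk is shown below. The mount point is a hypothetical example; stop MySQL and copy the existing data directory to the new location before applying it:

```ini
# /etc/mysql/my.cnf (path varies by distribution)
[mysqld]
datadir = /srv/mysql-data
```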

Note that running your MySQL server on another machine increases the latency per query. When a lot of queries are done this could somewhat influence Observium's performance (however, moving MySQL to another machine than the one with all the RRD I/O could still prove to be a valuable enhancement).

Observium fires off a lot of SNMP queries to your devices. Each poller has to wait for a reply to come back before it can continue. If your uplink is congested, or latency to your devices is high, fewer devices can be polled in the same time frame. To remedy this somewhat, you could increase the number of parallel pollers (if congestion is not the issue) or upgrade your uplink so more traffic fits through the pipe (until we run into "speed of light in a fiber" issues).