
How to add these performance monitors

Information here covers memory & CPU loading, hard disk temperatures, and Cable Modem monitoring.  I have also used MRTG for monitoring signal levels and error rates on the EUMETCast satellite data service, and for monitoring timekeeping using NTP.  If you are using Windows Vista or Windows-7 you may first need to install and enable the Windows SNMP component.
  

Memory and CPU load

Once you have installed SNMP Informant on each PC you wish to monitor, you can access the data directly from MRTG, as each value has a specific SNMP object ID (OID); the script fragments are shown below. To keep the configuration file clean, I actually used include statements in the mrtg.cfg file, such as:

  Include: narvik-monitor.inc

As PC Narvik has two CPUs, there are two instances 48 and 49 listed in the [Target] line in the sample below.

Contents of narvik-monitor.inc

 
#---------------------------------------------------------------
# PC Narvik - Memory
#---------------------------------------------------------------

Target[Narvik-mem]: 1.3.6.1.4.1.9600.1.1.2.19.0&1.3.6.1.4.1.9600.1.1.2.2.0:public@127.0.0.1 * 1024
MaxBytes[Narvik-mem]: 8000000000
Options[Narvik-mem]: integer, gauge, nopercent, growright, unknaszero
YLegend[Narvik-mem]: Memory
ShortLegend[Narvik-mem]: B
LegendI[Narvik-mem]: Used  
LegendO[Narvik-mem]: Avail  
Legend1[Narvik-mem]: Memory committed
Legend2[Narvik-mem]: Memory available
Title[Narvik-mem]: Narvik Memory
PageTop[Narvik-mem]: <H2>PC Narvik - Memory</H2>

#---------------------------------------------------------------
# PC Narvik - CPU load, dual-core CPU
#---------------------------------------------------------------

Target[Narvik-CPU]: 1.3.6.1.4.1.9600.1.1.5.1.5.1.48&1.3.6.1.4.1.9600.1.1.5.1.5.1.49:public@narvik
MaxBytes[Narvik-CPU]: 100
YLegend[Narvik-CPU]: CPU %
ShortLegend[Narvik-CPU]: %
LegendI[Narvik-CPU]: CPU 1
LegendO[Narvik-CPU]: CPU 2
Legend1[Narvik-CPU]: CPU 1 usage
Legend2[Narvik-CPU]: CPU 2 usage
Options[Narvik-CPU]: integer, gauge, nopercent, growright, unknaszero
Title[Narvik-CPU]: Narvik CPU
PageTop[Narvik-CPU]: <H2>PC Narvik - CPU load</H2>
# If PC Narvik were a single-core CPU, use two instances of object 48, as MRTG requires that 
# you have two variables returned.  You may also want to prevent display of the second output
# line by adding the "no-output" option (noo) to the Options line:
Target[Narvik-CPU]: 1.3.6.1.4.1.9600.1.1.5.1.5.1.48&1.3.6.1.4.1.9600.1.1.5.1.5.1.48:public@narvik
Options[Narvik-CPU]: integer, gauge, nopercent, growright, noo
# I found that on a lower-spec PC (Bacchus), returning the CPU twice caused an artificially
# high value to be returned for the second call (presumably the CPU busy processing the first
# request?!), so I actually changed to using the SNMP value: Maximum Number of Process Contexts
# i.e.  .1.3.6.1.2.1.25.1.7.0 (check this on your system using GetIF), which returns integer 0.
Target[Bacchus-CPU]: 1.3.6.1.4.1.9600.1.1.5.1.5.1.48&1.3.6.1.2.1.25.1.7.0:public@192.168.0.4
 

As this is my first attempt, any suggestions for improvements are welcome.  The only thing noticeably different is using OIDs in the [Target] line, as described here, and I used the GetIF program and the MIBs from SNMP Informant to work out what to monitor.  There are a lot more parameters available from the free SNMP Informant.  I added the unknaszero option so that when the PC is offline, the zero CPU and memory usage are clearly visible.  If you are running MRTG on the PC which is being monitored, then you don't need to specify the PC name in the Target line; you can use the loopback IP instead (127.0.0.1).

All running under Windows, including Vista! Here's some current data:

[Graphs: Memory (Used / Free); CPU 1 & 2 usage]

Note on quad-core CPU systems

With common systems now having up to 4 CPUs (well, dual-core plus hyperthreading), you may find it more helpful to plot the total CPU usage as well as, or even in place of, the four individual CPU graphs plotted on two separate graphs. SNMP Informant supports "_Total" as well as "0", "1", "2" and "3" for the individual CPU OIDs.   "_Total" is represented as the six-character, counted ASCII string: 6.95.84.111.116.97.108   I have used the "localhost" IP address in the example below, rather than the PC name, as you can re-use that string on any PC!

#---------------------------------------------------------------
#	PC Alta - total CPU load
#---------------------------------------------------------------

Target[Alta-CPU-total]: 1.3.6.1.4.1.9600.1.1.5.1.5.6.95.84.111.116.97.108&1.3.6.1.4.1.9600.1.1.5.1.5.6.95.84.111.116.97.108:public@127.0.0.1
MaxBytes[Alta-CPU-total]: 100
YLegend[Alta-CPU-total]: Total CPU %
ShortLegend[Alta-CPU-total]: %
LegendI[Alta-CPU-total]: Total CPU
Legend1[Alta-CPU-total]: Total CPU usage
Options[Alta-CPU-total]: integer, gauge, nopercent, growright, unknaszero, noo
Title[Alta-CPU-total]: Alta Total CPU
PageTop[Alta-CPU-total]: <H2>PC Alta - Total CPU load</H2>
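
The counted ASCII string used for "_Total" can be worked out mechanically. Here is a minimal Python sketch, not part of the MRTG setup itself; the function name and the constant are made up for illustration, and the base OID is simply copied from the Target lines on this page.

```python
# Sketch: build the OID suffix for a string-indexed SNMP instance.
# A counted ASCII string is the length of the name followed by the
# ASCII code of each character, all joined with dots.

def counted_ascii_suffix(name: str) -> str:
    """Return the length-prefixed ASCII OID suffix for an instance name."""
    codes = [len(name)] + [ord(ch) for ch in name]
    return ".".join(str(code) for code in codes)

# Base OID copied from the CPU Target lines on this page (hypothetical constant name).
CPU_USAGE_BASE = "1.3.6.1.4.1.9600.1.1.5.1.5"

if __name__ == "__main__":
    for instance in ("_Total", "0", "1"):
        print(instance, "->", CPU_USAGE_BASE + "." + counted_ascii_suffix(instance))
```

The same rule explains the "1.48" and "1.49" endings in the Narvik CPU example: they are the one-character strings "0" and "1".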

Here's an example comparing _Total with the four separate CPUs for PC Alta:

[Graphs: PC Alta total CPU usage; CPU 1 & 2; CPU 3 & 4]

Note on virtual and physical memory sizes exceeding 4GB

As the amount of physical memory has increased, virtual memory may now exceed 4GB, as may the physical memory in the machine and that available for use.  Consequently, I have revised the object IDs (OIDs) I use for memory from those which report in bytes to those which report in kilobytes, and I multiply the result obtained by 1024 to get the actual number in bytes. It seems that the OIDs which report in bytes are only 32-bit values, whereas MRTG can work with 64-bit integers.  You can see the change here.

Earlier call using byte OIDs:

  Target[Molde-mem]: 1.3.6.1.4.1.9600.1.1.2.4.0&1.3.6.1.4.1.9600.1.1.2.1.0:public@127.0.0.1

Later call using kilobyte OIDs and multiplying the returned value:

  Target[Molde-mem]: 1.3.6.1.4.1.9600.1.1.2.19.0&1.3.6.1.4.1.9600.1.1.2.2.0:public@127.0.0.1 * 1024

Watch out for the MaxBytes value, which you may need to increase as well!
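
To see why the byte-based OIDs run out of room, here is a small Python sketch of the arithmetic. It assumes only that the byte OIDs are unsigned 32-bit values, as suggested above; what the SNMP agent actually returns once the limit is exceeded is not covered here, and the function names are illustrative.

```python
# Sketch: why a byte count above 4GB cannot be carried in a 32-bit value,
# while the same quantity reported in kilobytes fits comfortably.

UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit integer can hold

def fits_in_32_bits(value: int) -> bool:
    """True if the value can be represented as an unsigned 32-bit integer."""
    return 0 <= value <= UINT32_MAX

def bytes_from_kilobytes(kilobytes: int) -> int:
    """Mirror the ' * 1024' at the end of the MRTG Target line."""
    return kilobytes * 1024

if __name__ == "__main__":
    six_gb_in_bytes = 6 * 1024**3
    six_gb_in_kb = 6 * 1024**2
    print("6GB in bytes fits in 32 bits:", fits_in_32_bits(six_gb_in_bytes))
    print("6GB in kilobytes fits in 32 bits:", fits_in_32_bits(six_gb_in_kb))
    print("Recovered byte count:", bytes_from_kilobytes(six_gb_in_kb))
```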

 

Disk space usage

Windows since Windows 2000 has included basic performance monitoring counters which include a set for disk usage measurement.  For each disk, there are at least three basic values available: disk size, disk used, and disk space units.  Having the units specified separately means that to get the disk used in bytes you must multiply the disk-used by the disk-allocation-units.  Fortunately, MRTG allows a target to be specified as A * B, so that's not a problem.  What is slightly more of an issue is the variable number of disks, which means that as disks come and go on your system - even plugging in a USB memory stick, for example - the index in the table of disks of a particular drive may vary, at least if you have a RAMdisk or extra partition with a high drive-letter - T: or Z:, for example.  I haven't discovered a way of fixing this so far, but that may just be my ignorance of using SNMP!

How to determine the drive index

You need to "walk the MIB" for the PC in question.  To do this, download a program such as GetIF version 2.3.1 which allows this - direct download link.  Having installed GetIF, open the program, enter the PC's name or IP address into the Host Name box, and press the Start button.

Now move to the MBrowser tab, and enter the string:

  .iso.org.dod.internet.mgmt.mib-2.host.hrStorage

in the first box (to replace .iso), and press the Start button.  You should see the list box window at the bottom populate with a set of values, 92 values in the example below:

You can now scroll down the list of values to find the index for the name of the monitored volume (drive T: in this case).  Click on the entry in the list, and its description and value will appear.

where the OID (object ID) is given in the last line as ".1.3.6.1.2.1.25.2.3.1.3.10", so the index is "10", and you can now scroll further down to find the corresponding allocation units (".1.3.6.1.2.1.25.2.3.1.4.10") and storage used (".1.3.6.1.2.1.25.2.3.1.6.10").  

  

  

So the storage used in bytes is the product of the numbers returned from these two values (4096 * 33933 = 138989568 bytes, 132.55MB).
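
The index-to-OID step and the multiplication can be sketched in a few lines of Python. This is an illustration only: the column numbers 3, 4 and 6 are the ones shown above, column 5 is the storage size column of the standard Host Resources MIB, and the function names are made up.

```python
# Sketch: derive the hrStorage OIDs for a given drive index and compute
# the space used in bytes from allocation units and units used.

HR_STORAGE_ENTRY = ".1.3.6.1.2.1.25.2.3.1"
COLUMNS = {"descr": 3, "allocation_units": 4, "size": 5, "used": 6}

def storage_oids(index: int) -> dict:
    """Return the OID of each hrStorage column for one table index."""
    return {name: f"{HR_STORAGE_ENTRY}.{column}.{index}"
            for name, column in COLUMNS.items()}

def used_bytes(allocation_units: int, used_units: int) -> int:
    """Bytes used = size of one allocation unit * number of units used."""
    return allocation_units * used_units

if __name__ == "__main__":
    for name, oid in storage_oids(10).items():
        print(f"{name:17s} {oid}")
    total = used_bytes(4096, 33933)
    print(total, "bytes =", round(total / 1024**2, 2), "MB")
```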

How to use the values in SNMP monitoring

As MRTG requires two values to be returned by the Target string (at least I think it does....), you need to add a second OID so that two values are returned.  I simply used a value which returns an integer on my system.  I don't know whether you could just put a zero.  Please tell me if you know better!  So the Target is:

  OID1 & OID2 * OID1 & OID2.

Please note that to stop the Web page becoming excessively wide, I have shown the Target line below as:

  <A> *
  <B>

whereas it should be written on a single line since MRTG doesn't allow continuation lines (as far as I know):

  <A> * <B>
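
If you generate include files with a script, the single-line requirement is easy to satisfy. The following Python sketch assembles an A * B Target line from its parts; the helper name and its parameters are hypothetical, and the values in the example are those used in the Gemini include file that follows.

```python
# Sketch: assemble a two-part "A * B" MRTG Target on one line.
# Each part is "<dummy OID>&<real OID>:<community>@<host>", and the
# spaces around the "*" are kept, as in the layout shown above.

def disk_target(name: str, dummy_oid: str, used_oid: str,
                units_oid: str, community: str, host: str) -> str:
    """Return a complete single-line Target entry for disk space used."""
    part_a = f"{dummy_oid}&{used_oid}:{community}@{host}"
    part_b = f"{dummy_oid}&{units_oid}:{community}@{host}"
    return f"Target[{name}]: {part_a} * {part_b}"

if __name__ == "__main__":
    print(disk_target(
        name="Gemini-fsy",
        dummy_oid="1.3.6.1.2.1.25.1.7.0",          # returns an integer, output is hidden
        used_oid=".1.3.6.1.2.1.25.2.3.1.6.10",     # storage used, index 10
        units_oid=".1.3.6.1.2.1.25.2.3.1.4.10",    # allocation units, index 10
        community="public",
        host="gemini",
    ))
```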

Contents of gemini-disk-used.inc

Target[Gemini-fsy]: 1.3.6.1.2.1.25.1.7.0&.1.3.6.1.2.1.25.2.3.1.6.10:public@gemini * 
    1.3.6.1.2.1.25.1.7.0&.1.3.6.1.2.1.25.2.3.1.4.10:public@gemini

MaxBytes[Gemini-fsy]: 40000000000
Options[Gemini-fsy]: integer, gauge, nopercent, growright, unknaszero, noi
YLegend[Gemini-fsy]: Temp disk used
ShortLegend[Gemini-fsy]: B
LegendO[Gemini-fsy]: Size  &nbsp;
Legend2[Gemini-fsy]: Temp disk used
Title[Gemini-fsy]: Gemini - 40GB temp disk
PageTop[Gemini-fsy]: <H2>PC Gemini - Temp disk used</H2>

Sample Results of disk space monitoring


   

Monitoring CPU core temperature

If you install and run the SpeedFan program from: http://www.almico.com/speedfan.php you can then install the SpeedFan SNMP Extension package from: http://deve.loping.net/projects/sfsnmp/ which provides SNMP access to the data provided by SpeedFan.  The temperatures are in SNMP OIDs numbered:

.1.3.6.1.4.1.30503.1.2.1
.1.3.6.1.4.1.30503.1.2.2
.1.3.6.1.4.1.30503.1.2.3

and so forth.  If you examine the output from the SpeedFan program temperatures display, and compare it to the values shown by GetIF, on the MBrowser tab, starting with OID:

.1.3.6.1.4.1.30503.1.2

you should be able to determine which OIDs correspond to the CPU core temperatures on your own system.  You can then create a suitable include file for MRTG such as the one shown below.  However, on my system the values are returned in centi-degrees (i.e. 3000 = 30°C), and so need to be divided by 100 to get the actual temperature.  You can achieve this using the arithmetic capability of MRTG by adding a " / 100" after the usual string:

   OID&OID:public@host / 100

Please note that there must be spaces around the components of the arithmetic expression.  Here's my include file:

Contents of narvik-cpu-temp.inc

#---------------------------------------------------------------
# PC Narvik - CPU core temperatures
#---------------------------------------------------------------

Target[narvik_cpu_temp]: .1.3.6.1.4.1.30503.1.2.3&.1.3.6.1.4.1.30503.1.2.4:public@localhost / 100
MaxBytes[narvik_cpu_temp]: 100
MaxBytes2[narvik_cpu_temp]: 100
Title[narvik_cpu_temp]: CPU core temperatures for PC Narvik
Options[narvik_cpu_temp]: integer, gauge, nopercent, growright, unknaszero
YLegend[narvik_cpu_temp]: Temperature °C
ShortLegend[narvik_cpu_temp]: °C
Legend1[narvik_cpu_temp]: CPU core 0 temperature in °C
Legend2[narvik_cpu_temp]: CPU core 1 temperature in °C
LegendI[narvik_cpu_temp]: CPU core 0 :
LegendO[narvik_cpu_temp]: CPU core 1 :
PageTop[narvik_cpu_temp]: <H1>PC Narvik -- CPU Core Temperatures</H1>

Here's some current data:

[Graph: PC Narvik CPU core temperatures]

Please note that the SpeedFan program must be running continuously for the SNMP DLL to see any data.  This should not be too much of a visual distraction as the program minimises to a simple system tray icon.   Note that the temperature is a snapshot at the instant when MRTG reads the data - it isn't an averaged value over the last five minutes.

Using automatically with Windows Vista and Windows-7

 

Monitoring DISK temperature

If you are fortunate enough to have a PC which is supported by the Mother Board Monitor program, you can just use that and add the appropriate SNMP objects as described above.  My PCs did not support MBM, so I wrote a small program which accesses the S.M.A.R.T. data provided by some hard disks and the BIOS.  Not all PCs do this, and not all PCs make all of the data accessible.  To test your PC, download the DiskTemp.exe program, and run it from the command-line.  You should see four lines like:

C:\>DiskTemp.exe
30
33
0
0

C:\>

As the program returns the disk temperatures, you can plot them in MRTG like this, using MRTG's ability to read the output of a command-line program.

Contents of narvik-disk-temp.inc

#---------------------------------------------------------------
# PC Narvik - disk temperatures
#---------------------------------------------------------------

Target[narvik_disk_temp]: `DiskTemp`
MaxBytes[narvik_disk_temp]: 100
MaxBytes2[narvik_disk_temp]: 100
Title[narvik_disk_temp]: Disk temperatures for PC Narvik
Options[narvik_disk_temp]: integer, gauge, nopercent, growright, unknaszero
YLegend[narvik_disk_temp]: Temperature °C
ShortLegend[narvik_disk_temp]: °C
Legend1[narvik_disk_temp]: Disk 0 temperature in °C
Legend2[narvik_disk_temp]: Disk 1 temperature in °C
LegendI[narvik_disk_temp]: Disk 0:
LegendO[narvik_disk_temp]: Disk 1:
PageTop[narvik_disk_temp]: <H1>PC Narvik -- Disk Temperatures</H1>

Here's some current data:

[Graph: PC Narvik disk temperatures; 750GB Samsung HD753LJ and 1TB Samsung HD103SI]

Here's another example, showing what happened when I replaced a 750GB 7200rpm standard disk with a 1TB "eco" disk spinning at just 5400rpm, and with a slower seek speed.  While the green line is more or less constant allowing for the daily temperature changes, the blue line showing the second disk on PC Narvik has dropped significantly from being a few degrees above the 750GB disk to being a degree or three below.  A lower working temperature should produce greater reliability, and it's a few watts less power consumption.  Performance of the PC appears to be unaffected.
The horizontal lines from before 0800 to after 1100 were the nonexistent values while the PC was powered down for the disk clone.  After seeing those misleading values, I added unknaszero to the options shown above.
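
The four lines printed by DiskTemp.exe match what MRTG expects from an external command: the first value, the second value, an uptime string and a target name. If you want to feed MRTG from your own script, here is a minimal Python sketch of that output format; the function name is hypothetical and the readings are placeholders rather than real sensor data.

```python
# Sketch: print two readings in the four-line format MRTG reads from an
# external command given in backticks on the Target line.

def mrtg_lines(first: float, second: float,
               uptime: str = "0", name: str = "0") -> str:
    """Return the four lines: value 1, value 2, uptime string, target name."""
    return "\n".join([str(int(first)), str(int(second)), uptime, name])

if __name__ == "__main__":
    # Placeholder readings; a real script would query the hardware here.
    print(mrtg_lines(30, 33))
```

A script like this would be called from the Target line in backticks, in the same way as DiskTemp above.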

Vista and Windows-7

I found that the code I used required to be run in Administrator mode with Windows Vista and Windows-7, which meant that I could not start MRTG automatically at startup.  I decided that the best way to work round this problem was to write a separate program (DiskTemperatureReporter) to read the disk temperatures, and then deposit a file in the \MRTG\bin\ directory so that MRTG could read the data with the configuration:

Target[stamsund_disk_temp]: `type disk-temps.dat`

This enabled me to capture what was possibly the hottest May day ever in Edinburgh - 2010 May 23 - and the next couple of days for comparison!  Here's the screen-shot.  The DiskTemperatureReporter program is unpublished, but available on an as-is basis by e-mail request.  It needs to be started by hand when the user logs into the PC.

Alternatively, you could use the SpeedFan & SNMP add-on mentioned above which includes disk temperature monitoring.
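
The file-drop approach can be sketched in Python as follows. This is not the DiskTemperatureReporter program, just an illustration of the idea: the function name is made up, and the four-line file layout is an assumption based on the DiskTemp output shown earlier. Writing to a temporary file and then renaming it avoids MRTG reading a half-written file.

```python
# Sketch: write two readings to a data file that MRTG can read with
# `type disk-temps.dat`, replacing the old file in a single step.
import os
import tempfile

def write_mrtg_file(path: str, first: float, second: float) -> None:
    """Write value 1, value 2 and two filler lines, then swap the file in."""
    directory = os.path.dirname(os.path.abspath(path))
    handle, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(handle, "w") as temp_file:
        temp_file.write(f"{int(first)}\n{int(second)}\n0\n0\n")
    os.replace(temp_path, path)  # overwrite any previous data file

if __name__ == "__main__":
    # Placeholder readings; a real reporter would read the S.M.A.R.T. data.
    write_mrtg_file("disk-temps.dat", 30, 33)
```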

 

Air Temperature

For air-temperature monitoring upstairs and downstairs I use a couple of simple USB sensors:

   http://www.pcsensor.com/index.php?_a=viewProd&productId=6

coupled with a simple program I wrote myself.  The MRTG config lines are much like the disk temperature lines above.  Contact me if interested in a copy of the program.   If the product URL has changed, look for the product "USB TEMPer" on the site.

One thing I have been playing with since June 2011 is to record the temperatures as 1000 times the actual value, i.e. milli-degrees.   This has two consequences - first that the data from the probes can be recorded with greater precision.  Although the probes are only accurate to within a couple of degrees, they do record the data to greater precision - perhaps half or a quarter degree C.  Secondly, and perhaps more importantly, using milli-degrees allows MRTG to record a more precise average for the week, month and year graphs which are no longer limited to integer temperature values.  To allow this, the program has an option to multiply returned values by 1000, enabled with the -1K parameter.
For outside temperatures, which could be negative in Celsius, I have decided to record in Fahrenheit, where values are most unlikely to go negative here in Edinburgh.  This is enabled with the -F option to the program.
MRTG would normally display values such as 20,000 (i.e. from 20°C) as "20.0 k", where the "k" is the standard thousands units multiplier.  With capital "K" also being temperature in Kelvins, this is doubly undesirable, so the display of the units multiplier is suppressed with the "kMG" entry (see below).  To make the value displayed below the graphs correct, the "Factor" entry is set to 0.001.  Fortunately, although the documentation doesn't say so explicitly, a floating point value is accepted here as the multiplier.

Example graphs may be found here.
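
The scaling described above is simple arithmetic, sketched here in Python. The function is a hypothetical stand-in for what the -1K and -F options do, not the actual program: it converts to Fahrenheit if asked, multiplies by 1000 if asked, and rounds to the integer that MRTG stores.

```python
# Sketch: turn a Celsius reading into the integer MRTG records, with
# optional Fahrenheit conversion and optional milli-degree scaling.

def to_mrtg_value(celsius: float, scale_1k: bool = False,
                  fahrenheit: bool = False) -> int:
    """Return the integer to report for one temperature reading."""
    value = celsius * 9.0 / 5.0 + 32.0 if fahrenheit else celsius
    if scale_1k:
        value *= 1000
    return int(round(value))

if __name__ == "__main__":
    reading = 20.25
    print("Plain integer:", to_mrtg_value(reading))
    print("Milli-degrees:", to_mrtg_value(reading, scale_1k=True))
    print("Milli-degrees F for -5 C:",
          to_mrtg_value(-5.0, scale_1k=True, fahrenheit=True))
    # With Factor set to 0.001, the stored value displays as degrees again.
    print("Displayed:", to_mrtg_value(reading, scale_1k=True) * 0.001)
```

Without the scaling, the quarter-degree in 20.25°C is lost as soon as the value is rounded to an integer; with it, the stored value keeps that resolution.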


Ideas for the contents of: narvik-air-temperatures.inc

#---------------------------------------------------------------
#	PC Narvik - Example for indoor air temperature
#---------------------------------------------------------------

Target[Narvik_air_temp]: `GetAirTemp  -1K`
MaxBytes[Narvik_air_temp]: 100000
MaxBytes2[Narvik_air_temp]: 100000
Title[Narvik_air_temp]: Air temperature for PC Narvik
Options[Narvik_air_temp]: integer, gauge, nopercent, growright, unknaszero, noo
YLegend[Narvik_air_temp]: Temperature °C
ShortLegend[Narvik_air_temp]: °C
kMG[Narvik_air_temp]: ,,
Factor[Narvik_air_temp]: 0.001
Legend1[Narvik_air_temp]: Air temperature in °C
LegendI[Narvik_air_temp]: Air temperature °C
PageTop[Narvik_air_temp]: <H1>PC Narvik -- Air Temperature</H1>

#---------------------------------------------------------------
#	PC Narvik - Example for outside air temperature
#---------------------------------------------------------------

Target[Narvik_air_temp_b]: `GetAirTemp -1K  -F`
MaxBytes[Narvik_air_temp_b]: 140000
MaxBytes2[Narvik_air_temp_b]: 140000
Title[Narvik_air_temp_b]: Outside air temperature
Options[Narvik_air_temp_b]: integer, gauge, nopercent, growright, unknaszero, noo
YLegend[Narvik_air_temp_b]: Temperature °F
ShortLegend[Narvik_air_temp_b]: °F
kMG[Narvik_air_temp_b]: ,,
Factor[Narvik_air_temp_b]: 0.001
Legend1[Narvik_air_temp_b]: Outside air temperature in °F
LegendI[Narvik_air_temp_b]: Outside Air temperature °F
PageTop[Narvik_air_temp_b]: <H1>Outside Air Temperature</H1>

 

Cable Modem Signal levels

I found that the Motorola Cable Modem I have happens to report its signal levels via SNMP, provided you know the right object ID (OID).  I'm not sure where I found this data from, but you might want to search Google with "snmp oid docsIfDownChannelPower" or look here:

http://www.oidview.com/mibs/0/DOCS-IF-MIB.html

http://support.ipmonitor.com/mibs/DOCS-IF-MIB/item.aspx?id=docsIfDownChannelPower

The only thing of note is that I put the IP address for the cable modem into my Hosts file as "cm-hfc", since my ISP only provides a dynamic IP.

Contents of: cable-modem.inc

#---------------------------------------------------------------
# Cable modem RF signal levels
#---------------------------------------------------------------

Target[CM-levels]: 1.3.6.1.2.1.10.127.1.1.1.1.6.3&1.3.6.1.2.1.10.127.1.2.2.1.3.2:public@cm-hfc / 10
AbsMax[CM-levels]: 70
MaxBytes[CM-levels]: 70
Title[CM-levels]: Motorola SB5101E Cable Modem - RF Signal Levels
Options[CM-levels]: integer, gauge, nopercent, growright, unknaszero
ShortLegend[CM-levels]: dBmV
YLegend[CM-levels]: dBmV
LegendO[CM-levels]: Transmit level&nbsp;
LegendI[CM-levels]: Received level&nbsp;
Legend2[CM-levels]: Transmit (upstream) level: +30..+56dBmV is OK
Legend1[CM-levels]: Received (downstream) level: -10..+10dBmV is OK
PageTop[CM-levels]: <H2>Motorola SB5101E Cable Modem - RF Signal Levels</H2>

#---------------------------------------------------------------
# Cable modem SNR
#---------------------------------------------------------------

Target[CM-SNR]: 1.3.6.1.2.1.10.127.1.1.4.1.5.3&1.3.6.1.2.1.10.127.1.1.4.1.5.3:public@cm-hfc / 10
AbsMax[CM-SNR]: 70
MaxBytes[CM-SNR]: 70
Title[CM-SNR]: Motorola SB5101E Cable Modem - SNR & bandwidth
Options[CM-SNR]: integer, gauge, nopercent, growright, unknaszero, noo
ShortLegend[CM-SNR]: dB
YLegend[CM-SNR]: dB
LegendI[CM-SNR]: RX SNR&nbsp;
Legend1[CM-SNR]: Received SNR (dB)
PageTop[CM-SNR]: <H2>Motorola SB5101E Cable Modem - SNR</H2>

#---------------------------------------------------------------
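
The " / 10" in the Target lines is there because the modem reports its levels in tenths of a unit. As a small Python sketch of that conversion, together with a check against the "OK" ranges given in the legends above (the function names and sample readings are made up for illustration):

```python
# Sketch: convert raw cable modem readings (tenths of a dBmV) and compare
# them with the acceptable ranges quoted in the Legend lines.

DOWNSTREAM_OK = (-10.0, 10.0)   # received level, dBmV
UPSTREAM_OK = (30.0, 56.0)      # transmit level, dBmV

def tenths_to_dbmv(raw: int) -> float:
    """Convert a reading in tenths of a dBmV to dBmV."""
    return raw / 10.0

def in_range(value: float, limits: tuple) -> bool:
    """True if the value lies within the inclusive (low, high) limits."""
    low, high = limits
    return low <= value <= high

if __name__ == "__main__":
    # Placeholder raw readings, not taken from a real modem.
    downstream, upstream = tenths_to_dbmv(-35), tenths_to_dbmv(452)
    print("Downstream", downstream, "dBmV OK:", in_range(downstream, DOWNSTREAM_OK))
    print("Upstream", upstream, "dBmV OK:", in_range(upstream, UPSTREAM_OK))
```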

Since Virgin Media chose to remove any SNMP capability from their modems, I have since been using a screen-scraper program to capture the data. This has recently been updated to work with the even more inaccessible signal level information from the VM "Superhub".

  

Monitoring network I/O

I haven't said anything about monitoring network I/O as this is already covered in the MRTG documentation.

I did happen to capture this screenshot, showing just how useful having monitoring on your PCs can be.  In this case, I had changed the firewall on a PC, and suddenly the network input to all devices on the network shot up.  Checking with Wireshark on a laptop PC, the software was sending out 254 ARP packets every 120 seconds, and each packet was the maximum wire size (1514 bytes).  Other ARP packets were either 42 or 60 bytes!  I've reported this to the developers.

 


Acknowledgements: the SNMP work was triggered by an e-mail exchange with Lonni J Friedman who asked how I got MRTG working under Vista (answer: Run As Administrator, having added SNMP and allowed it through the firewall), but who had the performance monitoring working under Windows XP!  Steve Catto first introduced me to MRTG - thanks Steve!

Update: I just found this PDF document which covers monitoring a Windows system with MRTG. Local copy as the original http://www.sans.edu/resources/student_presentations/mrtg_notes.pdf is no longer online.

 
Copyright © David Taylor  Edinburgh SatSignal home page Last modified: 2011 Jul 28 at 14:12:10