
INTRODUCTION
============

This is a set of tools intended to help with the creation and graphing of
RRD files. Specifically, it includes one command to create a set of similar
RRDs and another to graph them in moderately elaborate ways. They share a
common library (actually a Perl module) in charge of parsing the
configuration files. Initially, the tools did not help create the web pages
that would very likely show the graphs, but two extra commands were added
later to address this point.

They were developed during our initial experiments with RRDtool, when the
search for acceptable consolidation functions led to continuous re-creation
of the RRDs from previously gathered data stored in text files.
They have been used extensively in two areas: storing performance data from
our servers, and plotting SNMP-retrieved values from our squid cache
servers.

A single configuration file can describe one RRD file structure for
multiple nodes, or many similar RRD files for a single node (or for several
nodes), as is the case for running processes (httpd, ftpd, ...).

As an operational extension, a set of utilities that sample SNMP agents using
the information from the configuration files is included in the snmp-kit
directory.



USAGE
=====

The basic usage is the same for all the commands (RRDnode, described below,
takes a node name instead of a configuration file). The first argument is
mandatory and names the configuration file to read. Any additional arguments
are the nodes the command should be applied to. With only one argument, the
node list is taken from the configuration file.
[NOTE: The first argument can actually be a comma-separated list of config
files, with no embedded spaces]
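
As an illustration only (the tools themselves are written in Perl, and this
is not their code), the argument convention above could be sketched in Python
like this. The function name parse_command_line is made up for the example.

```python
def parse_command_line(args):
    """Split command arguments into config files and an optional node list.

    'args' is the argument list without the program name. This is only a
    sketch of the convention described above, not the tools' own parser.
    """
    if not args:
        raise ValueError("the configuration file argument is mandatory")
    # The first argument may hold several config files separated by commas.
    config_files = args[0].split(",")
    # Remaining arguments are nodes; None means "take the node list from
    # the configuration file".
    nodes = args[1:] or None
    return config_files, nodes


print(parse_command_line(["servers,proxies", "host1", "host2"]))
print(parse_command_line(["servers"]))
```

Returning None for the node list keeps "no nodes given" distinct from an
explicit list, which is the case where the configuration file decides.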

RRDcreator - Creates the RRDs according to the configuration file.
RRDgrapher - This is the advanced plotter. It extracts all the graphs defined
             in the configuration file and plots them. Line and area-stacked
             graphs are available, and any consolidation function can be
             used. Multinode plots of a single data source are also
             available.
RRDweb     - Creates a basic navigation tree, with one page at the top of the
             graphs hierarchy that links to the node pages and ends with HTML
             pages with an MRTG-like history.


RRDnode    - In this case, the mandatory parameter is not a conf file but a
             node name. It creates a page with pointers to all the graphs
             defined for one particular node.
             (Depends on RRDweb)


tagged configurations
---------------------

As mentioned in the introduction, a single configuration file can define
multiple instances of an RRD for one host. This is achieved using 'tagged'
configs. When the configuration file supplied to the commands has the form
config.tag, config is used as the real configuration file, and tag is used
as a key to search within that configuration file and also to name the
resulting RRD. More information on this subject can be found in the
TAGGED CONFIGURATIONS section.



THE CONF FILES
==============

Comment lines begin with #, and definition lines must begin with a valid
keyword. Some variables defined in the RRDutils module can be overridden
in the configuration file using the keywords:
DataDir  - Changes the top directory of the RRD hierarchy
GraphDir - The same for the generated graphs
DefGraph - Changes the default graph size (pixels). Both width and height
    must be indicated

The rest of the configuration file closely follows the rrdcreate format.
The available keywords and their meanings are:

BaseStep - the base interval of the RRD (--step option)
Start    - the first value to be kept (the --start option)
TimeStep - a multiplier for the value of the base interval. It only
    affects the consolidation function definitions, acting as a change of
    scale for them. It is intended to make the configuration file easier
    to read, but its real utility is much clearer in the EXAMPLES file

H - Adds the listed hosts to those affected by the supplied configuration
    file. It can appear on multiple lines, even repeating the tag name,
    which is optional. The Unix-like usage can be written as
       H[tag] host1 host2 host3

T - Adds indexes to an SNMP table. The first token is the name of the table
    owner, and the remaining ones are the actual index values. It is not
    necessary to list every index on the same line, because it's possible to
    list the same host on multiple lines.
       T host index1 index2 index3

[data_source] - Defines a data source called data_source. Any additional
    text is taken as a label for the data source and, as such, is
    included on the graphs. No white space or directory separators (/) are
    allowed in the data source name. If any text is present just before the
    opening square bracket, it is taken as the OID name of the entity.
    Optionally, the next line can include extra information for the data
    source, in the Unix usage format
    [C|G|D|A] [ beat [ [min] max ] ]                (defaults C 3 U U)
    The letters C, G, D and A select among the data source types allowed
    by RRDtool (COUNTER, GAUGE, DERIVE and ABSOLUTE), and the other
    parameters are self-explanatory. The first numerical parameter is the
    beat (heartbeat), written as a multiple of BaseStep, thus measuring the
    number of PDPs that may remain unknown. If a second numerical parameter
    is present, it is the maximum value; if a third is present as well, the
    second is the minimum and the third the maximum. Either can be set to
    'U' for unknown.
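
To make the defaults and the optional fields concrete, here is a hedged
Python sketch that turns such an extra-information line into a DS argument
in the standard rrdcreate form DS:name:TYPE:heartbeat:min:max. The function
name ds_definition is hypothetical, and the letter-to-type mapping and the
conversion of the beat into seconds (beat times BaseStep) are assumptions
drawn from the description above, not taken from the RRDutils code.

```python
# Assumed mapping from the one-letter codes to RRDtool data source types.
DS_TYPES = {"C": "COUNTER", "G": "GAUGE", "D": "DERIVE", "A": "ABSOLUTE"}


def ds_definition(name, extra_line, base_step):
    """Build an rrdcreate DS argument from '[C|G|D|A] [beat [[min] max]]'."""
    tokens = extra_line.split()
    # Defaults, as documented: C 3 U U
    ds_type, beat, minimum, maximum = "C", 3, "U", "U"
    if tokens and tokens[0] in DS_TYPES:
        ds_type = tokens.pop(0)
    if tokens:
        beat = int(tokens.pop(0))
    if len(tokens) == 1:
        maximum = tokens[0]            # a lone value is the maximum
    elif len(tokens) == 2:
        minimum, maximum = tokens      # two values are min and max
    elif len(tokens) > 2:
        raise ValueError("too many parameters: %r" % extra_line)
    # The beat is a multiple of BaseStep; rrdcreate wants seconds.
    heartbeat = beat * base_step
    return "DS:%s:%s:%d:%s:%s" % (
        name, DS_TYPES[ds_type], heartbeat, minimum, maximum)


print(ds_definition("load", "G 2 0 100", base_step=300))
print(ds_definition("octets", "", base_step=300))
```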

CF - Defines an RRA, with up to four possible parameters. The initial 'CF'
    may be omitted, but the type of consolidation function is mandatory.
    If a single value is supplied, it must be the number of rows to keep in
    the RRA. The extra parameters are told apart by the presence of a
    decimal point, which is mandatory for xff (0 <= xff < 1) and forbidden
    for steps (an integer). The number of steps is multiplied by the
    current TimeStep value to convert it into BaseStep units.
    Default values are 0.5 (xff), 1 (steps) and 192 (rows).
    Unix usage:
    [CF] type [ [xff] [steps] rows ]
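
The way the optional parameters are told apart can be sketched in Python as
follows. This is an illustration under assumptions, not the RRDutils parser:
the function name rra_definition is made up, and the sketch assumes the
consolidation function type is written with its full RRDtool name. The
result uses the standard rrdcreate form RRA:CF:xff:steps:rows.

```python
CF_TYPES = ("AVERAGE", "MIN", "MAX", "LAST")


def rra_definition(line, time_step=1):
    """Build an rrdcreate RRA argument from '[CF] type [[xff] [steps] rows]'."""
    tokens = line.split()
    if tokens and tokens[0] == "CF":       # the leading keyword is optional
        tokens.pop(0)
    if not tokens or tokens[0] not in CF_TYPES:
        raise ValueError("consolidation function type is mandatory")
    cf = tokens.pop(0)

    xff, steps, rows = 0.5, 1, 192         # documented defaults
    integers = []
    for token in tokens:
        if "." in token:                   # a decimal point marks the xff
            xff = float(token)
        else:
            integers.append(int(token))
    if len(integers) == 1:
        rows = integers[0]                 # a lone integer is the row count
    elif len(integers) == 2:
        steps, rows = integers
    elif len(integers) > 2:
        raise ValueError("too many integer parameters: %r" % line)
    if not 0 <= xff < 1:
        raise ValueError("xff must satisfy 0 <= xff < 1")

    # Steps are given in TimeStep units; convert them to BaseStep units.
    return "RRA:%s:%s:%d:%d" % (cf, xff, steps * time_step, rows)


print(rra_definition("CF AVERAGE 0.5 6 700", time_step=2))
print(rra_definition("MAX 400"))
```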

Graph Definitions
-----------------

Although graphs are also defined in the configuration file, their definition
is complex enough to deserve a section of its own.

I - Defines an extra time interval for graphs, supplying a name and a length
    in seconds. Day, week, month, quarter and year intervals are already
    defined

O  - Is used to define common rrdgraph options (such as --logarithmic).
     Later occurrences append to the options list. They are written after
     any other options, so the graph size can be overridden
O= - The same as the previous one, but instead of appending, it resets the
     options to those indicated
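
The difference between appending and resetting can be shown with a small
Python sketch. It is only an illustration: the function name
collect_graph_options is hypothetical, and the sketch assumes the options
follow the keyword on the same line, with or without a space after 'O='.

```python
def collect_graph_options(config_lines):
    """Accumulate rrdgraph options from 'O' (append) and 'O=' (reset) lines."""
    options = []
    for line in config_lines:
        stripped = line.strip()
        if stripped.startswith("O="):
            # Reset: forget what was collected so far.
            options = stripped[2:].split()
        elif stripped.split()[:1] == ["O"]:
            # Append to the options collected so far.
            options.extend(stripped.split()[1:])
    return options


lines = [
    "O --logarithmic",
    "O --rigid",
    "O= --lazy",
    "O --no-legend",
]
print(collect_graph_options(lines))
```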

The plots themselves are indicated with

P  - Line plot definition
PS - Area stacked plot definition
PH - Plots one datasource for every node in the same graph. Multiple ds
     produce multiple plots

The plot definition line is slightly complex, but also rather complete
(within its limitations). The minimum requirements are a time interval
and a number of data sources to plot. The graph file name is constructed as
a join of the ds names, but a shorter graph file name can be supplied right
after the type of plot. Only lowercase letters and underscores can be used
for this name, and one leading underscore (allowed for readability) is
discarded. In both cases, the time interval is appended to the graph name.

The default consolidation function is AVERAGE, but other CF can be chosen
by prefixing the data source name with a character-underscore sequence, with
the CF type coded into the character as: a(VERAGE), m(IN), M(AX), l(AST).

A non-standard size for the graph can be supplied before the time interval
in 300x90 format. Either the width or the height may be omitted, but the 'x'
is mandatory to distinguish the size from a numerically named time interval
(which is not allowed anyway).

The unix-usage format looks like
   P[S|H][_plot_name] [[width]x[height]] interval [m_]ds1 ds2 ...
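
As a rough illustration of how such a line breaks down, here is a Python
sketch. It is not the RRDgrapher parser: the function name parse_plot and
the returned dictionary layout are made up for the example, and it assumes
that any ds beginning with one of the a_, m_, M_ or l_ prefixes carries a CF
selector. It deliberately does not build the final graph file name.

```python
import re

# CF prefix letters as listed above; AVERAGE is the default.
CF_BY_PREFIX = {"a": "AVERAGE", "m": "MIN", "M": "MAX", "l": "LAST"}
PLOT_KINDS = {"": "line", "S": "area-stacked", "H": "multinode"}


def parse_plot(line):
    """Parse 'P[S|H][_plot_name] [[width]x[height]] interval [m_]ds1 ds2 ...'."""
    tokens = line.split()
    # One leading underscore before the plot name is discarded.
    head = re.fullmatch(r"P([SH]?)_?([a-z_]*)", tokens[0]) if tokens else None
    if head is None:
        raise ValueError("not a plot definition: %r" % line)
    rest = tokens[1:]

    # Optional size such as 300x90, 300x or x90; the 'x' is what marks it.
    width = height = None
    size = re.fullmatch(r"(\d*)x(\d*)", rest[0]) if rest else None
    if size and (size.group(1) or size.group(2)):
        width = int(size.group(1)) if size.group(1) else None
        height = int(size.group(2)) if size.group(2) else None
        rest = rest[1:]

    if len(rest) < 2:
        raise ValueError("a time interval and at least one ds are required")

    sources = []
    for token in rest[1:]:
        prefixed = re.fullmatch(r"([amMl])_(.+)", token)
        if prefixed:
            sources.append((prefixed.group(2), CF_BY_PREFIX[prefixed.group(1)]))
        else:
            sources.append((token, "AVERAGE"))

    return {
        "kind": PLOT_KINDS[head.group(1)],
        "name": head.group(2) or None,   # None: name is built from ds names
        "width": width,
        "height": height,
        "interval": rest[0],
        "sources": sources,
    }


print(parse_plot("PS_cpu 300x90 week user M_system"))
```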


TAGGED CONFIGURATIONS
=====================

The tagged configurations allow the use of a single configuration file to
define multiple RRD files with the same structure. They came to life to
avoid repeating the same configuration file under different names when
dealing with process accounting. To keep a record of the resource usage
of different processes, a different RRD file is needed for each one. Without
tags, if you have a bunch of nodes running, for example, web, ftp and mail
servers, you must keep three copies of the same configuration file, whose
only difference (if any) is the affected hosts.

The tagged configurations add a little complexity to the configuration file
and to the usage of the commands, and a lot of flexibility to the whole
system. Instead of multiple configurations affecting different hosts, you can
keep a single configuration file affecting multiple hosts and multiple
'tags', which act like instances of the configuration file. Instead of one
Host line per file, there are several, one for each tag:
Hhttp host1 host2 host3 host4
Hftp  host1 host3
Hmail host2 host3 host4
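
A small Python sketch shows how such lines could be gathered into one host
list per tag. It is an illustration only, with a hypothetical function name
(hosts_by_tag), and it assumes that no other definition line starts with a
capital H.

```python
def hosts_by_tag(config_lines):
    """Collect the hosts named on 'H[tag] host ...' lines, keyed by tag.

    An untagged 'H' line is stored under the empty string.
    """
    result = {}
    for line in config_lines:
        tokens = line.split()
        if not tokens or not tokens[0].startswith("H"):
            continue                      # comments, blanks, other keywords
        tag = tokens[0][1:]
        # The same tag may appear on several lines; hosts accumulate.
        result.setdefault(tag, []).extend(tokens[1:])
    return result


lines = [
    "Hhttp host1 host2 host3 host4",
    "Hftp  host1 host3",
    "Hmail host2 host3 host4",
]
print(hosts_by_tag(lines)["ftp"])
```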

The usage of the commands changes slightly to accommodate tagged
configurations. Instead of being passed blindly to the RRDutils functions,
the config file name supplied on the command line is split at the dot. The
first part is the 'physical' config file, and as such is passed to the
RRDutils functions. The second part is the 'tag', which is used to extract
the affected hosts and as the real name for the RRD files and graphs, in
the same way as a normal configuration file name.
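
The split itself could look like the following Python sketch. The function
name split_tagged_config is made up, and two points are assumptions rather
than documented behaviour: the split is done at the first dot of the file
name (ignoring any directory part), and an argument without a dot is treated
as an untagged configuration.

```python
import os.path


def split_tagged_config(argument):
    """Split 'config.tag' into ('config', 'tag'); tag is None if absent."""
    directory, base = os.path.split(argument)
    config, dot, tag = base.partition(".")
    if not dot:
        return argument, None             # plain, untagged configuration
    return os.path.join(directory, config), tag


print(split_tagged_config("procs.http"))
print(split_tagged_config("procs"))
```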

