
RiverMuse architecture
RiverMuse Core 3.4.3 & 3.4.4
Introduction
RiverMuse Core is an event management application used to monitor and manage the status of a network. RiverMuse Core uses rsyslogd to receive syslog events as well as SNMP traps. An event is a notable occurrence on the network: for example, an interface going up or down, a ping failure, a received trap, or an error rate exceeding a predefined threshold might all be classified as events.
A stream of events is sent to RiverMuse Core, which uses a set of rules to determine which events should be upgraded to alerts. Alerts are correlated events that the rules recognize as important enough to require the attention of a network operator.
For example, RiverMuse Core Edition automatically de-duplicates repeated event messages and correlates failure and recovery pairs of event messages.
Architecture
RiverMuse is built around a MySQL database named rivermuse. The RiverMuse system contains two key executables, omosd and yarpd, as well as a number of agents.
MySQL is not a fundamental requirement: future releases of RiverMuse are intended to work with any database.
omosd
omosd, the open network management operating system daemon, talks directly to the rivermuse database. omosd is responsible for deciding which monitored events become alerts and events in the system, and for translating between monitored events and RiverMuse alerts and events.
omosd writes alerts and events into the rivermuse database through stored procedures that abstract the table schema.
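The rules omosd uses to translate monitored messages into RiverMuse events are not described on this page. As a rough illustration only, the sketch below models translation as an ordered list of regular-expression rules applied to each incoming syslog line, with unmatched lines discarded. The input line format, the Event fields, the message patterns, and the alert type names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    host: str
    alert_type: str   # hypothetical RiverMuse alert type name
    detail: str       # e.g. the interface or target named in the message


# Hypothetical translation rules, tried in order against the message text.
RULES = [
    (re.compile(r"Interface (?P<detail>\S+), changed state to down"), "LinkDown"),
    (re.compile(r"Interface (?P<detail>\S+), changed state to up"), "LinkUp"),
    (re.compile(r"ping to (?P<detail>\S+) failed"), "PingFail"),
]

# Assumed input shape: "<host> <message>" (timestamp and priority already stripped).
LINE = re.compile(r"(?P<host>\S+)\s+(?P<message>.*)")


def translate(line: str) -> Optional[Event]:
    """Return an Event for a recognized line, or None if no rule matches."""
    parsed = LINE.match(line)
    if parsed is None:
        return None
    for pattern, alert_type in RULES:
        hit = pattern.search(parsed["message"])
        if hit is not None:
            return Event(parsed["host"], alert_type, hit["detail"])
    return None  # unrecognized messages are not promoted to events
```

Keeping the rules as data rather than code mirrors the idea that the rule set, not the daemon, decides what becomes an event.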
yarpd
yarpd is a rule-processing daemon. yarpd monitors the rivermuse database for new alerts and for changes to existing alerts, and then executes the rule logic stored in the database to perform correlation. For example, link-up and link-down pairing is done by rules executed by the yarp daemon.
link-up, link-down: A common problem in network management is that the status of a port or network connection is reported as a series of status changes. When a link goes down you receive a link-down alert; when the link comes back up, you receive a link-up alert. In the absence of any correlation, you would be overwhelmed by hundreds of link-up and link-down alerts, particularly in the common case of intermittent failures causing rapid port cycling. Link-up and link-down pairing is a simple correlation whereby the arrival of a link-up alert causes the system to automatically close any active link-down alert for the same port. For a given port, you will therefore see either a single link-down alert or a single link-up alert, which represents the real-time status of the link or port.
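A minimal sketch of link-up/link-down pairing, assuming alerts are keyed by device and port. It is not yarpd's rule language. It also assumes, from the one-alert-per-port behavior described above, that a link-down closes an active link-up just as a link-up closes a link-down. The class names and the "LinkDown"/"LinkUp" type names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LinkAlert:
    device: str
    port: str
    kind: str            # "LinkDown" or "LinkUp" (hypothetical type names)
    is_open: bool = True


class LinkPairing:
    """Keeps at most one open link alert per (device, port)."""

    def __init__(self) -> None:
        self.current: Dict[Tuple[str, str], LinkAlert] = {}
        self.history: List[LinkAlert] = []

    def on_alert(self, device: str, port: str, kind: str) -> LinkAlert:
        key = (device, port)
        previous = self.current.get(key)
        if previous is not None and previous.kind == kind:
            return previous              # same status repeated: keep the existing alert
        if previous is not None:
            previous.is_open = False     # opposite status arrived: close the old alert
        alert = LinkAlert(device, port, kind)
        self.current[key] = alert
        self.history.append(alert)
        return alert

    def open_alerts(self) -> List[LinkAlert]:
        return [a for a in self.history if a.is_open]
```

Feeding a rapidly cycling port (down, up, down, up) through on_alert leaves four alerts in the history but only one open alert, the final LinkUp, which is the real-time status of the port.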
yarpd performs correlation in RiverMuse by allowing you to define conditions. Conditions are Boolean filters acting on the attributes of an alert. Conditions also control the monitoring of variables that are linked to predefined types of actions, which are executed when a condition triggers.
The rule inside a condition uses the $ syntax to build a Boolean expression, which evaluates to either TRUE or FALSE. TRUE causes the condition to fire, whereas FALSE causes the condition to be ignored.
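The exact $ syntax accepted by yarpd is not specified on this page. Assuming a rule reads something like `$severity >= 4 and $type == 'PingFail'`, where `$name` refers to an alert attribute, a condition can be sketched as a small restricted expression evaluator that returns TRUE (fire) or FALSE (ignore). The attribute names and the Python-style operators are assumptions, not yarpd's documented grammar.

```python
import ast
import operator
import re
from typing import Any, Mapping

# Supported comparison operators (an assumed subset).
_COMPARE = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
}


def _evaluate(node: ast.AST, alert: Mapping[str, Any]) -> Any:
    if isinstance(node, ast.BoolOp):
        results = (_evaluate(v, alert) for v in node.values)  # lazy: short-circuits
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _evaluate(node.operand, alert)
    if isinstance(node, ast.Compare):
        left = _evaluate(node.left, alert)
        for op, right_node in zip(node.ops, node.comparators):
            right = _evaluate(right_node, alert)
            compare = _COMPARE.get(type(op))
            if compare is None:
                raise ValueError(f"unsupported operator: {type(op).__name__}")
            if not compare(left, right):
                return False
            left = right                 # supports chains such as 3 <= $x < 6
        return True
    if isinstance(node, ast.Name):
        return alert[node.id]            # a $attribute reference
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def condition_fires(rule: str, alert: Mapping[str, Any]) -> bool:
    """Return True (fire) or False (ignore) for one alert."""
    # Rewrite "$name" to "name" so Python's parser accepts the rule.
    # Limitation: a "$" inside a quoted string would be rewritten too.
    source = re.sub(r"\$([A-Za-z_]\w*)", r"\1", rule)
    tree = ast.parse(source, mode="eval")
    return bool(_evaluate(tree.body, alert))
```

Walking the parsed tree and rejecting anything outside a small whitelist keeps the filter purely Boolean: a rule can compare attributes but cannot call functions or run arbitrary code, unlike a plain `eval`.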
For further information on yarpd, refer to the yarpd overview.
RiverMuse desktop
The RiverMuse desktop is a thin client application. The RiverMuse desktop retrieves all its data from the rivermuse database.

Alerts and Events
An alert represents the current state of a set of events. For example, suppose the system polls an interface on server 1 every minute using ping. If it receives a ping fail, the conclusion is that the server is unreachable due to a network fault, and ping fails will continue to arrive every minute until the fault is resolved.
The difference between an alert and an event is that when the first ping fail arrives, the system creates both an alert and an event; when the second ping fail arrives, the system creates another event but not another alert. The system does not create a second alert because a key feature of the RiverMuse fault management tool is de-duplication.
de-duplication: Many event sources monitored by RiverMuse agents produce repetitive data. For example, many network management systems monitor the status of servers using a simple ICMP ping. When a server goes down and is pinged every minute, a ping fail is reported every minute; without de-duplication you would be deluged by ping fail alerts. De-duplication automatically recognizes such duplicates. Instead of multiple alerts, you receive one alert that records the number of times it has occurred, together with time stamps for the first and last occurrence.
An alert records the state of a device, whereas an event is an occurrence relevant to a device or to an alert. RiverMuse supplies the following event types:
- NewAlert
- DuplicateAlert
- AssignedToUser
- DeAssignedFromUser
- AssignmentAcknowledged
- AssignmentUnacknowledged
- RuleUpdate
- AlertClosed
In the previous example, the event created on the second ping fail would be a DuplicateAlert event. Two fields in the alert are updated: the count goes from one to two, and the last_occurred time stamp is set to the current time. The alert now shows that there are two events, the last one occurring now and the first one a minute ago. Each further ping fail creates a new event.
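The de-duplication behavior can be sketched as follows, assuming that two occurrences are duplicates when they share the same node and alert type (RiverMuse's actual matching key is not documented here). The count and last_occurred fields and the NewAlert and DuplicateAlert event types follow the text; first_occurred, Deduplicator, and receive are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Alert:
    node: str
    alert_type: str
    count: int
    first_occurred: datetime   # hypothetical field name
    last_occurred: datetime


class Deduplicator:
    """Folds repeated occurrences into one alert and logs an event for each."""

    def __init__(self) -> None:
        self.alerts: Dict[Tuple[str, str], Alert] = {}
        self.events: List[Tuple[str, Tuple[str, str], datetime]] = []

    def receive(self, node: str, alert_type: str, when: datetime) -> Alert:
        key = (node, alert_type)            # assumed de-duplication key
        alert = self.alerts.get(key)
        if alert is None:
            alert = Alert(node, alert_type, 1, when, when)
            self.alerts[key] = alert
            self.events.append(("NewAlert", key, when))
        else:
            alert.count += 1                # the count goes from one to two, ...
            alert.last_occurred = when      # ... and last_occurred moves to now
            self.events.append(("DuplicateAlert", key, when))
        return alert
```

Two ping fails on server1 one minute apart therefore produce one alert with a count of 2 and two events, a NewAlert followed by a DuplicateAlert.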
An event in the lifecycle of an alert
Duplicate occurrences are just one example of the system using events.
Alerts have an owner and a state that indicate workflow. For example, an operator at a network operations center who sees a new alert may assign it to an engineer who deals with ping fail alerts. The engineer then acknowledges receipt of the assignment, may update the alert journal and run specific tools on the alert (e.g., traceroute), and eventually closes the alert.
Instead of simply updating the alert, RiverMuse creates a new event for each action in the life of the alert. Stored procedures and triggers defined in the rivermuse database schema then update the status of the alert. For example, the system (the system user) owns an alert and its event when they are first created. If you assign the alert to engineer Bob, the GUI creates an assign event in the rivermuse database, which processes the assign event and updates the alert to set the owner to Bob.
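To illustrate the pattern of clients writing only events while the database derives alert state, the following sketch uses SQLite triggers through Python's built-in sqlite3 module. RiverMuse itself uses MySQL stored procedures and triggers, whose syntax differs; the table names, column names, and the record_event helper are hypothetical.

```python
import sqlite3
from typing import Optional

# Hypothetical schema; RiverMuse's real tables and MySQL triggers differ.
SCHEMA = """
CREATE TABLE alerts (
    alert_id INTEGER PRIMARY KEY,
    owner    TEXT NOT NULL DEFAULT 'system',  -- the system user owns new alerts
    state    TEXT NOT NULL DEFAULT 'open'
);
CREATE TABLE events (
    event_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    alert_id   INTEGER NOT NULL REFERENCES alerts(alert_id),
    event_type TEXT NOT NULL,
    user_name  TEXT
);
-- Triggers, not clients, turn each new event into an alert update.
CREATE TRIGGER apply_assignment AFTER INSERT ON events
WHEN NEW.event_type = 'AssignedToUser'
BEGIN
    UPDATE alerts SET owner = NEW.user_name WHERE alert_id = NEW.alert_id;
END;
CREATE TRIGGER apply_close AFTER INSERT ON events
WHEN NEW.event_type = 'AlertClosed'
BEGIN
    UPDATE alerts SET state = 'closed' WHERE alert_id = NEW.alert_id;
END;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_event(conn: sqlite3.Connection, alert_id: int, event_type: str,
                 user_name: Optional[str] = None) -> None:
    """Clients such as the GUI only insert events; triggers update the alert."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO events (alert_id, event_type, user_name) VALUES (?, ?, ?)",
            (alert_id, event_type, user_name),
        )
```

With this design the client never updates the alerts table directly: after record_event(conn, 1, "AssignedToUser", "Bob") the alert's owner is Bob, and the events table keeps the full history of every action taken on the alert.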