ELSA was written because commercial tools were either lacking in features or cost prohibitive. The only tool that provided the features I needed was Splunk, but it was too expensive and too slow to receive the log volume I wanted on the hardware I had available. ELSA is inspired by Splunk but prioritizes speed over dashboards and presentation.
In designing ELSA, I tried the following components but found them too slow. They are listed from fastest to slowest indexing speed, based on informal (non-scientific) testing:
1. Tokyo Cabinet
2. TokuDB MySQL plugin
3. Elastic Search (Lucene)
4. MySQL Fulltext
What is ODE?
ODE is a fully integrated open source log management platform for collecting, indexing, and analyzing both structured and unstructured data from many sources. It is a centralized syslog framework built on Syslog-NG, MySQL, and Sphinx full-text search. It provides a fully asynchronous web-based query interface that normalizes logs and makes searching billions of them for arbitrary strings as easy as searching the web. ODE also includes tools for assigning permissions to view the logs, sending email-based alerts, scheduling queries, and creating graphs.
Some features include:
● High-volume log receiving and indexing (a single node can receive > 30k logs/sec, sustained)
● Full Active Directory/LDAP integration for authentication, authorization, and email settings
● Ability to generate instant ad-hoc reports/graphs on arbitrary queries even on enormous data sets
● Dashboards using Google Visualizations
● Scheduled searches, email alerts, and report generation
● Plugin architecture for web interface
● Distributed architecture for clusters
● Ships with normalization for some Cisco logs, Snort/Suricata, Bro, and Windows via Eventlog-to-Syslog or Snare
ELSA achieves n-node scalability by allowing every log-receiving node to operate completely independently of the others. Queries from a client go through the API, which sends them to all nodes in parallel, so a query takes only as long as the slowest node's response. The API aggregates the query results before returning them to the client. Response times vary depending on the number of query terms and their selectivity, but a given node on modest hardware takes about one half second per billion log entries.
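The parallel fan-out described above can be illustrated with a small Python sketch. It is not ELSA's API code: the node dictionaries, the simulated latencies, and the `query_node` and `federated_query` names are all assumptions made up for this example. It only shows why the total query time tracks the slowest node rather than the sum of all nodes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_node(node, query):
    """Stand-in for a per-node search (hypothetical; a real client would call the node over the network)."""
    time.sleep(node["latency"])  # simulated response time of this node
    return [row for row in node["logs"] if query in row]


def federated_query(nodes, query):
    """Send the query to every node at once, then merge the per-node results."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map runs the calls concurrently and yields results in node order.
        per_node = pool.map(lambda n: query_node(n, query), nodes)
        return [row for rows in per_node for row in rows]


# Three made-up nodes with different simulated response times.
NODES = [
    {"latency": 0.05, "logs": ["sshd failed login", "kernel oom"]},
    {"latency": 0.20, "logs": ["sshd accepted key", "cron started"]},
    {"latency": 0.10, "logs": ["nginx 404", "sshd failed login"]},
]

start = time.monotonic()
results = federated_query(NODES, "sshd")
elapsed = time.monotonic() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

In this sketch the latencies sum to 0.35 seconds, but the elapsed time should stay close to the 0.20 seconds of the slowest node because the three calls overlap.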
Log reception rates greater than 50,000 events per second per node are achieved through the use of a fast pattern parser in Syslog-NG called PatternDB. The pattern parser allows Syslog-NG to normalize logs without resorting to computationally expensive regular expressions, which is what makes sustained high reception rates possible. Syslog-NG pipes the logs directly to a Perl program that further normalizes them and prepares large text files for batch insertion into MySQL. MySQL is capable of inserting over 100,000 rows per second when batch loading like this. After each batch is loaded, Sphinx indexes the newly inserted rows in temporary indexes, then indexes them again in larger batches every few hours in permanent indexes.
Sphinx can create temporary indexes at a rate of 50,000 logs per second and consolidate these temporary indexes at around 35,000 logs per second, which becomes the terminal sustained rate for a given node. The effective bursting rate is around 100,000 logs per second, which is the upper bound of Syslog-NG on most platforms. If indexing cannot keep up, a backlog of raw text files will accumulate. In this way, peaks of several hours or more can be endured without log loss, at the cost of an indexing delay.
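To make the backlog behavior concrete, here is a rough back-of-the-envelope calculation in Python. The burst and indexing rates are the figures quoted above. The one-hour burst length and the 10,000 logs/sec steady-state arrival rate are assumptions chosen only for illustration, and the helper names are hypothetical.

```python
def backlog_after_burst(burst_rate, index_rate, burst_seconds):
    """Logs left waiting in raw text files when the burst ends."""
    return max(0, burst_rate - index_rate) * burst_seconds


def drain_seconds(backlog, index_rate, normal_rate):
    """Time to clear the backlog while logs keep arriving at normal_rate."""
    spare = index_rate - normal_rate  # indexing capacity left over for the backlog
    if spare <= 0:
        raise ValueError("indexing never catches up at this arrival rate")
    return backlog / spare


BURST_RATE = 100_000   # logs/sec, burst ceiling quoted above
INDEX_RATE = 35_000    # logs/sec, sustained indexing rate quoted above
NORMAL_RATE = 10_000   # logs/sec, assumed steady-state arrival rate (illustrative)

backlog = backlog_after_burst(BURST_RATE, INDEX_RATE, burst_seconds=3600)
delay = drain_seconds(backlog, INDEX_RATE, NORMAL_RATE)

print(f"backlog after a 1-hour burst: {backlog:,} logs")
print(f"time to catch up: {delay / 3600:.1f} hours")
```

Under these assumptions, a one-hour burst at the full 100,000 logs/sec leaves 234 million logs unindexed, and the node needs about 2.6 hours to catch up. No logs are lost, but the newest logs are not searchable until the backlog clears.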
The overall flow diagram looks like this:
Network → Syslog-NG (PatternDB) → Raw text file
HTTP upload → Raw text file
Batch load (by default every minute):
Raw text file → MySQL → Sphinx
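The batch-load step in the flow above can be sketched as follows. This is a minimal Python illustration, not ELSA's actual loader (which is written in Perl): the `syslogs_batch` table name, the column list, and the helper names are all hypothetical. The sketch writes normalized rows to a tab-separated file and builds a MySQL `LOAD DATA LOCAL INFILE` statement for it. Executing the statement and triggering Sphinx indexing are left out.

```python
import os
import tempfile


def to_line(row):
    """Join fields with tabs; flatten embedded tabs/newlines so columns stay aligned."""
    return "\t".join(str(f).replace("\t", " ").replace("\n", " ") for f in row)


def write_batch(rows, directory):
    """Write one batch of normalized rows to a new text file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".tsv", dir=directory)
    with os.fdopen(fd, "w", newline="\n") as handle:
        for row in rows:
            handle.write(to_line(row) + "\n")
    return path


def load_statement(path, table):
    """Build the MySQL bulk-load statement for one batch file (table and columns are hypothetical)."""
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
        "(timestamp, host, program, msg)"
    )


# Made-up sample rows standing in for logs already normalized upstream.
rows = [
    (1700000000, "fw01", "sshd", "Failed password for root"),
    (1700000001, "web02", "nginx", "GET /index.html\t200"),
]

with tempfile.TemporaryDirectory() as tmp:
    path = write_batch(rows, tmp)
    with open(path) as handle:
        lines = handle.read().splitlines()
    sql = load_statement(path, "syslogs_batch")

print(sql)
```

Loading one large file per interval, rather than inserting rows one at a time, is what lets MySQL reach the batch insert rates mentioned earlier.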