• Why ELSA?

      ELSA was written because commercial tools were both lacking and cost-prohibitive. The only tool that provided the features I needed was Splunk, but it was too expensive and too slow to receive the log volume I wanted on the hardware I had available. ELSA is inspired by Splunk but focuses on speed rather than dashboards and presentation.

      In designing ELSA, I tried the following components but found them too slow. Here they are ordered from fastest to slowest for indexing speeds (non-scientifically tested):
      1. Tokyo Cabinet
      2. MongoDB
      3. TokuDB MySQL plugin
      4. Elastic Search (Lucene)
      5. Splunk
      6. HBase
      7. CouchDB
      8. MySQL Fulltext

    • What is ODE?

      ODE is a fully integrated open source log management platform for collecting, indexing, and analyzing both structured and unstructured data from many sources. It is a centralized syslog framework built on Syslog-NG, MySQL, and Sphinx full-text search. It provides a fully asynchronous web-based query interface that normalizes logs and makes searching billions of them for arbitrary strings as easy as searching the web. ODE includes tools for assigning permissions to view the logs, as well as email-based alerts, scheduled queries, and graphing.
      Some features include:

      ● High-volume log receiving/indexing (a single node can receive > 30k logs/sec, sustained)
      ● Full Active Directory/LDAP integration for authentication, authorization, email settings
      ● Ability to generate instant ad-hoc reports/graphs on arbitrary queries even on enormous data sets
      ● Dashboards using Google Visualizations
      ● Scheduled searches with email alerts and report generation
      ● Plugin architecture for web interface
      ● Distributed architecture for clusters
      ● Ships with normalization for some Cisco logs, Snort/Suricata, Bro, and Windows via Eventlog-to-Syslog or Snare

    • Capabilities

      ELSA achieves n-node scalability by allowing every log-receiving node to operate completely independently of the others. Queries from a client are sent through the API to the nodes in parallel, so a query takes only as long as the slowest node's response. Query results are aggregated by the API before being sent back to the client. Response times vary with the number of query terms and their selectivity, but a given node on modest hardware takes about half a second per billion log entries.

      Log reception rates greater than 50,000 events per second per node are achieved through the use of a fast pattern parser in Syslog-NG called PatternDB. The pattern parser allows Syslog-NG to normalize logs without resorting to computationally expensive regular expressions, which allows for sustained high log reception rates. Syslog-NG pipes the logs directly to a Perl program that further normalizes them and prepares large text files for batch inserting into MySQL. MySQL is capable of inserting over 100,000 rows per second when batch loading like this. After each batch is loaded, Sphinx indexes the newly inserted rows in temporary indexes, then again in larger batches every few hours in permanent indexes.

      Sphinx can create temporary indexes at a rate of 50,000 logs per second and consolidate these temporary indexes at around 35,000 logs per second, which becomes the terminal sustained rate for a given node. The effective bursting rate is around 100,000 logs per second, which is the upper bound of Syslog-NG on most platforms. If indexing cannot keep up, a backlog of raw text files will accumulate. In this way, peaks of several hours or more can be endured without log loss but with an indexing delay.

      The overall flow diagram looks like this:

      Live, continuously:
      Network → Syslog-NG (PatternDB) → Raw text file
      HTTP upload → Raw text file
      Batch load (by default every minute):
      Raw text file → MySQL → Sphinx

    • Calculating Disk Requirements

      The basic rule of thumb is that ELSA will require about 50% more disk than flat log files; this covers both archived and indexed logs. Archived logs require about 10% of the flat-file size and log indexes require about 40-50% of it, so together the overall penalty is roughly 50%.

      To specify how much disk to use, see the config file entry for log_size_limit, which is the total limit ELSA will use. Within that limit, the archive section’s config value for “percentage” dictates what percentage of the overall log_size_limit will be used for archive, and the rest will be used for indexed logs. If you do not wish to archive, set the percentage to zero and all space will be for the index, or vice versa.
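
      As a sketch, the two settings described above might look like this in elsa_node.conf (the values are illustrative only: roughly 200 GB total, with one third of it reserved for archive):

      "log_size_limit": 200000000000,
      "archive": {
          "percentage": 33
      }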

    • Choosing the right Hardware

      A single ELSA node will comfortably handle about 10,000 events/second, sustained, even with slow disk. As shown above, ELSA will happily handle 50,000 events/second for long periods of time, but eventually index consolidation becomes necessary, and that's where the 10,000-30,000 events/second rate comes in. A virtual machine probably won't handle more than 10,000 events/second unless it has fairly fast disk (15,000 RPM drives, for instance) and the disk is set to "high" in the hypervisor, but a standalone server will be able to run at around 30,000 events/second on moderate server hardware.

      The recommendation is a minimum of two cores, but as described above, there is enough work for four. RAM requirements are a bit less obvious. The more RAM you have, the more disk cache you get, which helps performance if an entire index fits in the cache. A typical consolidated ("permanent") index is about 7 gigabytes on disk (for 10 million events), so I recommend 8 GB of RAM for best performance, though 2-4 GB will work fine.

      RAM also comes into play in temporary index count. When ELSA finds that the amount of free RAM has become too small or the amount of RAM ELSA uses has surpassed a configured limit (80 percent and 40 percent, by default, respectively), it will consolidate indexes before hitting its size limit (10 million events, by default). So, more RAM will allow ELSA to have more temporary indexes and be more efficient about consolidating them.

      In conclusion, if you are shopping for hardware for ELSA, you don't need more than four CPUs, but you should try to get as much disk and RAM as possible.

    • Installation

      Installation is done by running the install.sh file obtained either by downloading from the sources online or grabbing from the install tarball featured on the ELSA Google Code home page. When install.sh runs, it will check for the existence of /etc/elsa_vars.sh to see if there are any local customizations, such as passwords, file locations, etc. to apply. The install.sh script will update itself if it finds a newer version online, so be sure to store any changes in /etc/elsa_vars.sh.
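
      For example, a minimal /etc/elsa_vars.sh might just override the install and data locations. This is only a sketch: the variable names below are assumptions based on the $BASE_DIR/$DATA_DIR defaults described under File Locations, so confirm them against the variables your copy of install.sh actually reads.

      # Sketch only: variable names assumed from the documented install defaults
      BASE_DIR="/usr/local"
      DATA_DIR="/data"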

      The install.sh script should be run separately for a node install and a web install. You can install both like this: sh install.sh node && sh install.sh web. Installation will attempt to download and install all prerequisites and initialize databases and folders. It does not require any interaction.

      Currently, Linux and FreeBSD 8.x are supported, with Linux distros based on Debian (including Ubuntu), RedHat (including CentOS), and SuSE tested. install.sh should run and succeed on these distributions, assuming that the defaults are chosen and that no existing configurations conflict.

    • Supported Operating System

      Tested on:
      ● Ubuntu 12.04
      ● Ubuntu 14.04
      ● RedHat 6.6
      ● CentOS 6.6

    • Installation using Packages

      ODE Installation

      Note: You may still edit elsa_vars.sh under the /etc directory before running the package to make any configuration changes, as is the case with the original ELSA installation.

      Debian – Ubuntu 12.04 and 14.04

      Download the package,

      wget https://s3-us-west-1.amazonaws.com/ode0.3/ode_0.3-2_all.deb

      Run the package,

      sudo dpkg -i ode_0.3-2_all.deb
      sudo apt-get install -f   (this command is not required for updates)

      Note: dpkg will complain of missing dependencies (for a fresh install); ignore it.

      RPM – RHEL 6.6 and Centos 6.5

      Download the package,

      wget https://s3-us-west-1.amazonaws.com/ode0.3/ode-0.3-3.noarch.rpm

      or, using curl:

      curl -L -o ode-0.3-3.noarch.rpm https://s3-us-west-1.amazonaws.com/ode0.3/ode-0.3-3.noarch.rpm

      Run the package,

      sudo yum -y install ode-0.3-3.noarch.rpm  (for fresh install)
      sudo yum -y update ode-0.3-3.noarch.rpm (for upgrade from ODE 0.1 to 0.3)

      After an upgrade (ODE 0.1 to 0.3) you may have to restart services due to a bug in the 0.1 package,

      service syslog-ng restart
      service searchd restart
      service starman restart

      Note: Restarting these services is only required when upgrading from 0.1 to 0.3, not for fresh installs.

      Enable port 80 (firewall blocks port 80 by default on Centos),

      sudo vi /etc/sysconfig/iptables

      Copy the ssh accept line and change the port to 80
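
      The copied line should end up looking roughly like this (a sketch based on the stock CentOS 6 iptables rules format):

      -A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT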

      sudo service iptables restart

      reference: http://www.binarytides.com/open-http-port-iptables-centos/

      Verify Installation

      Check the log,

      vi /var/log/ode_install.log

      Run web UI,


      Removing Packages

      In case you want to remove the installed package, note that this will delete all ODE-related data and configuration on your machine.
      Debian (ubuntu),

      apt-get --purge autoremove ode

      RPM (RHEL/Centos),

      yum remove ode

      AWS Images

      You may also use the pre-built ODE images (medium and large systems) on AWS for quick installation or evaluation. You can search for “opallios” in the Community AMIs for these images.

      ● Ubuntu 14.04 Med – ODE-0.3-ubuntu-14.04-med-Opallios
      ● RedHat 6.6 Med – ODE-0.3-rhel-6.6-med-Opallios

    • File Locations

      The main ELSA configuration files are /etc/elsa_node.conf and /etc/elsa_web.conf. All configuration is controlled through these files, except for query permissions which are stored in the database and administered through the web interface. Nodes read in the elsa_node.conf file every batch load, so changes may be made to it without having to restart Syslog-NG.

      Most Linux distributions do not ship recent versions of Syslog-NG. Therefore, the install compiles it from source and installs it to $BASE_DIR/syslog-ng with the configuration file in $BASE_DIR/syslog-ng/etc/, where it will be read by default. By default, $BASE_DIR is /usr/local and $DATA_DIR is /data. Syslog-NG writes raw files to $DATA_DIR/elsa/tmp/buffers/ and loads them into the index and archive tables at an interval configured in the elsa_node.conf file, which is 60 seconds by default. The files are deleted upon successful load. When the logs are bulk inserted into the database, Sphinx is called to index the new rows. When indexing is complete, the loader notes the new index in the database which will make it available to the next query. Indexes are stored in $DATA_DIR/sphinx and comprise about as much space as the raw data stored in MySQL.

      Archive tables typically compress at a 10:1 ratio, and therefore use only about 5% of the total space allocated to logs compared with the index tables and indexes themselves. The index tables are necessary because Sphinx searches return only the ID’s of the matching logs, not the logs themselves, therefore a primary key lookup is required to retrieve the raw log for display. For this reason, archive tables alone are insufficient because they do not contain a primary key.

      If desired, MySQL database files can be stored in a specified directory by adding the “mysql_dir” directive to elsa_node.conf and pointing it to a folder created which has proper permissions and SELinux/apparmor security settings.
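
      A sketch of that directive as it might appear in elsa_node.conf (the directory path is just an example):

      "mysql_dir": "/data/elsa_mysql"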

      Hosting all files locally

      If your ELSA web server will not have Internet access, you will need to host the Javascript for the web pages locally. To do this, after installing:

      cd /usr/local/elsa/web/inc
      wget "http://yuilibrary.com/downloads/yui2/yui_2.9.0.zip"
      unzip yui_2.9.0.zip

      Edit the elsa_web.conf file and set yui/local to be “inc” and comment out “version” and “modifier.”

      Caveats for Local File Hosting

      If Internet access is not available, some plugins will not function correctly. In particular the whois plugin uses an external web service to do lookups, and these will not be possible without Internet connectivity. In addition, dashboards will not work if the client’s browser does not have connectivity to Google to pull down their graphing library.

    • Web Server

      The web frontend is typically served with Apache, but the Plack Perl module allows for any web server to be used, including a standalone server called Starman which can be downloaded from CPAN. Any implementation will still have all authentication features available because they are implemented in the underlying Perl.

      The server is backed by the ELSA web database (elsa_web by default), which stores user information including permissions, the query log, stored results, and query schedules for alerting.

      Admins are designated by configuration variables in the elsa_web.conf file, either by system group when using local auth, or by LDAP/AD group when using LDAP auth. To designate a group as an admin, add the group to the array in the configuration. Under the “none” auth mode, all users are admins because they are all logged in under a single pseudo-username.

      The web server is required for both log collectors and log searchers (node and web) because searches query nodes (peers) using a web services API.

    • Web Configuration

      Most settings in the elsa_web.conf and elsa_node.conf files should be fine with the defaults, but there are a few important settings which need to be changed depending on the environment.
      ● Nodes: Contains the connection information to the log node databases which hold the actual data.
      ● Auth_method: Controls how authentication and authorization occurs. For LDAP, the ldap settings must also be filled out.
      ● Link_key: should be changed to something other than the default. It is used to salt the auth hashes for permalinks.
      ● Email: For alerts and archive query notifications, you need to setup the email server to use. If you wish to get the actual results from an alert, in addition to a link to the results, add the following config to the email section:
      "email": {
          "include_data": 1
      }
      ● Meta_db: Should point to the database which stores the web management information. This can reside on a node, but probably shouldn’t. The performance won’t be much of a factor, so running this locally on the web server should be fine.
      ● Excluded_classes: If you want to remove some classes from the menus and searches altogether, configure the config entry for excluded_classes like this:
      "excluded_classes": {
          "BRO_SSL": 1
      }
      ● APIKeys: The “apikeys” hash holds all known username/apikey combinations, such as:
      "apikeys": { "elsa": "abc" }
      ● Peers: Configuration for how this ELSA node will talk to other ELSA nodes. Note that an entry for the node itself (e.g. 127.0.0.1) is required for any query to complete. An example configuration is:
      "peers": {
          "127.0.0.1": {
              "url": "http://127.0.0.1",
              "user": "elsa",
              "apikey": "abc"
          }
      }
      ● Default OR: By default, all search terms are required to be found in the event to constitute a match (AND). If you wish, you can set the config value "default_or" to a true value so that the search matches if any of the given terms match:
      "default_or": 1

      Node Configuration

      ● Database: Edit the connection settings for the local database, if non-default.
      ● Log_size_limit: Total size in bytes allowed for all logs and indexes.
      ● Sphinx/perm_index_size: This setting must be tweaked so that perm_index_size number of logs come into the system before (num_indexes* sphinx/index_interval) seconds pass.
      ● Archive/percentage: Percentage of log_size_limit reserved for archive.
      ● Archive/days: Max number of days to retain logs for in the archive
      ● Sphinx/days: Max number of days to retain logs for in the indexes
      ● forwarding/forward_only: This node will only forward logs and not index them.
      ● forwarding/destinations: An array of hashes of forwarders, as detailed in the Forwarding section.

      Forwarding Logs

      ELSA can be setup to forward (replicate) logs to an unlimited number of destinations in several ways:

      Method Config Directive
      File Copy cp
      SSH scp
      HTTP/S url

      File Copy

      Configuration options:

      Option Meaning Required
      dir Directory to copy the file to. This can be a destination where backup agent reads from or an NFS mount. Yes


      SSH

      Configuration options:

      Option Meaning Required
      user Username for SSH Yes
      password Password for the user If no key_path
      key_path Path for RSA/DSA keypair files (.pub) If no password
      host IP or DNS name of host to forward to Yes
      dir Remote directory to copy to Yes
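
      For example, a destination entry using the scp method might look like this (a sketch only; the host, user, key path, and directory are placeholders):

      { "method": "scp", "user": "elsa", "key_path": "/home/elsa/.ssh/id_rsa", "host": "backup.example.com", "dir": "/data/elsa_backup" }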


      HTTP/S

      Configuration items:

      Option Meaning Required
      url Full URL, (including https://), of where to send logs Yes
      verify_mode Boolean indicating whether strict SSL certificate checking is to be enforced. Use zero for certificates that don't have a trusted certificate authority on the forwarder (self-signed certificates, for instance) No
      timeout Number of seconds to issue a timeout on. Defaults to zero (no timeout) No
      ca_file SSL certificate authority file to use to verify the remote server’s certificate No
      cert_file Client-side SSL certificate the server may require to verify the client’s identity No
      key_file Key corresponding with cert_file No

      An example forwarding configuration may look like this:

      "forwarding": {
          "forward_only": "1",
          "destinations": [
              { "method": "url", "url": "http://example.com/API/upload" },
              { "method": "url", "url": "https://secure.example.com/API/upload", "ca_file": "/etc/mycafile.pem" }
          ]
      }

      Low volume configuration tuning

      If your ELSA node isn’t receiving many logs (less than a few hundred per minute), you may need to tune your setup so that permanent indexes aren’t underutilized. There are at most num_indexes number of permanent indexes, and if there isn’t a free one available, the oldest one will be overwritten. If this happens before the log_size_limit has been reached, then it means that you rolled logs before you wanted to. This means you need to tweak some settings in elsa_node.conf:

      ● Increase num_indexes to something larger like 400
      ● Increase allowed_temp_percent to 80

      This should give you 0.8 x 400 x 60 seconds of time before temp indexes get rolled into a perm index, and should give you more perm indexes before they get rolled. With 400 perm indexes, that should be more than 88 days of possible index time. If that's still not enough, increase index_interval from 60 seconds to something larger (this will extend the "lifetime" of a temp index).

      If you set num_indexes to be larger than 200, you should increase the open files limit for searchd (Sphinx). You can do this on Linux by editing /etc/security/limits.conf and adding:

      root soft nofile 100000
      root hard nofile 200000

      Then logout, login, and restart searchd.

      Changing num_indexes

      If you change the num_indexes setting in /etc/elsa_node.conf, you will need to regenerate the /usr/local/etc/sphinx.conf file. To do so, either delete or move the existing sphinx.conf file and then run:

      echo "" | perl /usr/local/elsa/node/elsa.pl -on
      pkill searchd
      /usr/local/sphinx/bin/searchd --config /usr/local/etc/sphinx.conf

      This will regenerate the config file using the new num_indexes value. There is one last step: instantiate the actual Sphinx files by running indexer on these previously non-existent indexes. This step depends on the new value of num_indexes. In this example, we have changed num_indexes from 200 to 400, so we need to instantiate indexes 201 through 400. We do this as follows:

      for COUNTER in `seq 201 400`; do /usr/local/sphinx/bin/indexer --config /usr/local/etc/sphinx.conf temp_$COUNTER perm_$COUNTER; done

      Now, restart searchd and the new indexes should be available.

      Making changes to syslog-ng.conf

      install.sh will use /usr/local/elsa/node/conf/syslog-ng.conf as a template, using /etc/elsa_syslog-ng.conf (if it exists) as a reference for any persistent changes, and write the combination to /usr/local/syslog-ng/etc/syslog-ng.conf which is what is actually run. So, put any local changes in /etc/elsa_syslog-ng.conf to make sure they survive an update. Keep in mind that the file is included before the log {} statements, so you can redefine sources and destinations there, or put in additional log {} statements.
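
      As a sketch of what /etc/elsa_syslog-ng.conf could contain, the following defines an extra UDP listener and an additional log path. The source name, port, and file destination are examples for illustration, not ELSA defaults:

      # Example only: extra UDP listener on a non-default port
      source s_extra_udp { udp(ip(0.0.0.0) port(5514)); };
      # Example only: copy everything from that listener to a local file for debugging
      destination d_debug_file { file("/var/log/elsa_extra.log"); };
      log { source(s_extra_udp); destination(d_debug_file); };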

      Firewall Settings

      Source          Destination        Port
      Web Clients     Web Node           TCP 80/443
      Web Node        LDAP/AD Server     TCP 389/636
      Web Node        Log Node           TCP 3306 (deprecated)
      Web Node        Log Node           TCP 9306, formerly 3307 (deprecated)
      Web Node        Log Node           TCP 80/443
      Log Clients     Log Node           TCP/UDP 514
    • API Keys – API section.

      The literal structure of an API key as it is transmitted is an HTTP Authorization header of the form: Authorization: ApiKey username:timestamp:hash, where the hash is derived from the timestamp and the API key.

      As an example, if the API key were "abc," the username "myuser," and the timestamp 1364322947, the request header would begin:

      Authorization: ApiKey myuser:1364322947:


      To revoke an API key, simply remove that username from the list of “apikeys” in elsa_web.conf or change the key for that username to reset it.

    • Preferences

      You can set per-user preferences by navigating to the “Preferences” dialog under the “ELSA” menu in the upper-left-hand corner of the page. Preference changes will take effect at the next page load.

      Type Name Value to Enable Function
      default_settings reuse_tab 0 Overrides server setting for whether or not to reuse the current tab for each new query
      default_settings grid_display 1 Defaults results to grid view
      default_settings no_column_wrap 1 Disables column wrapping in grid view
      custom openfpc_username User name for sending to OpenFPC if pcap_url is set
      custom openfpc_password OpenFPC password
      default_settings pcap_offset Number of seconds before/after to set get_pcap retrieval time to
      default_settings use_utc 1 Display all dates in UTC (GMT)
      default_settings orderby_dir DESC Default to reverse sort (descending)
      default_settings timeout Override the system default for query timeout
      default_settings default_or 1 Override the system default for making events match if any of the query terms match instead of if all query terms match
      default_settings limit 100 Default limit to use for number of results to return
      default_settings rows_per_page 15 Default for rows per page of results when displayed
    • Keyboard Shortcuts

      Key Action
      F8 Closes all result tabs
      F9 Closes all result tabs before active
      F10 Closes all tabs except active
    • Adding Parsers

      In order to add parsers, you need to add patterns to the patterndb.xml file. If you need to create new log classes and fields, it's not too hard, but right now there is no web interface (that's planned for the future). You'll need to add classes to the "classes" table, fields to the "fields" table, then use the offsets listed under $Field_order in web/lib/Fields.pm to create the right entries in "fields_classes_map." Please note that the class name MUST be upper-case. Other than those few database entries, adding the pattern and restarting syslog-ng and apache is all you have to do. The new fields will show up in the web interface, etc. If you can, try to create patterns which re-use existing classes and fields; then just dropping them into the patterndb.xml file will instantly make them parse correctly, with no DB work or restarts needed. I plan on making a blog post on how to do this soon, but let me know if you run into any troubles. Here's an example to get you started:
      Example log program: test_prog
      Example log message: source_ip sent 50 bytes to destination_ip from user joe

      Pick a class_id greater than 10000 for your own custom classes. Let's say this is the first one, so your new class_id will be 10000. What to insert into the syslog database on the log node:
      INSERT INTO classes (id, class) VALUES (10000, "NEWCLASS");
      Our fields will be conn_bytes, srcip, and dstip, which already exist in the “fields” table as well as “myuser” which we will create here for demonstration purposes:
      INSERT INTO fields (field, field_type, pattern_type) VALUES ("myuser", "string", "QSTRING");

      INSERT INTO fields_classes_map (class_id, field_id, field_order)
        VALUES ((SELECT id FROM classes WHERE class="NEWCLASS"), (SELECT id FROM fields WHERE field="srcip"), 5);
      INSERT INTO fields_classes_map (class_id, field_id, field_order)
        VALUES ((SELECT id FROM classes WHERE class="NEWCLASS"), (SELECT id FROM fields WHERE field="conn_bytes"), 6);
      INSERT INTO fields_classes_map (class_id, field_id, field_order)
        VALUES ((SELECT id FROM classes WHERE class="NEWCLASS"), (SELECT id FROM fields WHERE field="dstip"), 7);
      Now add the string field "myuser" at field_order 11, which maps to the first string column "s0":
      INSERT INTO fields_classes_map (class_id, field_id, field_order)
        VALUES ((SELECT id FROM classes WHERE class="NEWCLASS"), (SELECT id FROM fields WHERE field="myuser"), 11);
      5, 6, and 7 correspond to the first integer columns in the schema: "i0," "i1," and "i2." In the pattern below, we're extracting the data and calling it i0-i2 so that it goes into the log database correctly. The above SQL maps the names of these fields in the context of this class to those columns in the raw database when performing searches.
      Example pattern:

      source_ip @IPv4:i0:@ sent @ESTRING:i1: @bytes to destination_ip @IPv4:i2:@ from user @ANYSTRING:s0:@

      Example message matched by this pattern:

      source_ip sent 50 bytes to destination_ip from user joe
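
      For orientation, a patterndb rule wrapping this pattern might look roughly like the skeleton below. This is only a sketch of the generic syslog-ng patterndb v4 structure: the ruleset and rule ids must be unique strings of your choosing, and whether ELSA expects the numeric class_id or the class name in the rule's class attribute should be confirmed against an existing rule in patterndb.xml.

      <ruleset name="test_prog" id="unique-ruleset-id">
        <pattern>test_prog</pattern>
        <rules>
          <!-- Sketch only: id and class attribute conventions should be checked against an existing ELSA rule -->
          <rule provider="local" id="unique-rule-id" class="10000">
            <patterns>
              <pattern>source_ip @IPv4:i0:@ sent @ESTRING:i1: @bytes to destination_ip @IPv4:i2:@ from user @ANYSTRING:s0:@</pattern>
            </patterns>
          </rule>
        </rules>
      </ruleset>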



      Add this to the patterndb.xml inside the appropriate ruleset/rule elements. You can test this on a log node using the /usr/local/syslog-ng/bin/pdbtool utility like so:
      /usr/local/syslog-ng/bin/pdbtool test -p /usr/local/elsa/node/conf/patterndb.xml
      This should print out all of the correct test values. You can test it against example messages as well like this:
      /usr/local/syslog-ng/bin/pdbtool match -p /usr/local/elsa/node/conf/patterndb.xml -P test_prog -M "source_ip sent 50 bytes to destination_ip from user joe"
      After the patterndb.xml file and the database are updated, you will need to restart syslog-ng:
      service syslog-ng restart
      If you are already logged into ELSA, simply refreshing the page should make those new classes and fields available.

    • Configuring IDS to Forward Logs

      There are two ways to configure Snort to send logs: configure either barnyard or Snort itself to send logs to local syslog. Both configuration entries (in either snort.conf or barnyard.conf) will look like this:
      output alert_syslog: LOG_LOCAL6 LOG_ALERT

      To log to local syslog from Suricata, edit the “outputs” stanza to contain:
      - syslog:
          enabled: yes
          identity: "snort"
          facility: local6

      Forwarding Local Logs to ELSA
      You will then need to configure the local syslog on the box that is running Snort to forward logs to ELSA.
      If the box is running a simple syslogd, it would look like this to forward all logs to ELSA (which is usually a good idea):
      *.* @ip.address.of.elsa

      If it’s running syslog-ng, use this:
      source src { unix-dgram("/dev/log"); };
      filter f_local6 { facility(local6); };
      destination d_elsa { udp("ip.address.of.elsa"); };
      log { source(src); filter(f_local6); destination(d_elsa); };

    • Eventlog-to-Syslog

      Sending logs from Windows servers is best achieved with the free, open-source program Eventlog-to-Syslog. It’s incredibly easy to install:
      1. Login as an administrator or use runas
      2. Copy evtsys.exe and evtsys.dll to the Windows machine's system directory (e.g. C:\Windows\System32).
      3. Install with: evtsys.exe -i -h ip.of.elsa.node
      4. Profit
      The logs will be sent using the syslog protocol to your ELSA server where they will be parsed as the class “WINDOWS” and available for reporting, etc.

    • Datasources

      ELSA can be configured to query external datasources with the same framework as native ELSA data. Datasources are defined by plugins. The only plugin currently available is for databases. Database datasources are added under the “datasource” configuration section, like this:
      "datasources": {
          "database": {
              "hr_database": {
                  "alias": "hr",
                  "dsn": "dbi:Oracle:Oracle_HR_database",
                  "username": "scott",
                  "password": "tiger",
                  "query_template": "SELECT %s FROM (SELECT person AS name, dept AS department, email_address AS email) derived WHERE %s %s ORDER BY %s LIMIT %d,%d",
                  "fields": [
                      { "name": "name" },
                      { "name": "department" },
                      { "name": "email" }
                  ]
              }
          }
      }
      The configuration items for a database datasource are as follows:
      Item Purpose
      alias What the datasource will be referred to when querying
      dsn Connection string for Perl
      username User
      password Password
      query_template sprintf formatted query with the placeholders listed below
      fields A list of hashes containing name (required), type (optional, default is char), and alias which functions as both an alternative name for the field as well as the special aliases “count” to refer to the column to use for summation and “timestamp” which defines the column to use in time-based charts.

      Query_template parameters (all are required):
      1. The columns for SELECT
      2. The expression for WHERE
      3. The column for GROUP BY
      4. The column for ORDER BY
      5. OFFSET
      6. LIMIT
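
      Once defined, the datasource can be queried from the search bar with the datasource directive described under Queries. For example (the search term is illustrative only):

      datasource:hr name:smith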

    • Sourcetypes

      ELSA ships with several plugins:
      ● Windows logs from Eventlog-to-Syslog
      ● Snort/Suricata logs
      ● Bro logs
      ● URL logs from httpry_logger

      List of classes supported out of the box

      ● BRO_CONN
      ● BRO_DNS
      ● BRO_FILE
      ● BRO_FILES
      ● BRO_FTP
      ● BRO_HTTP
      ● BRO_IRC
      ● BRO_NOTICE
      ● BRO_SMTP
      ● BRO_SSH
      ● BRO_SSL
      ● BRO_SYSLOG
      ● BRO_TUNNEL
      ● BRO_WEIRD
      ● CEF
      ● CISCO_WARN
      ● DHCP
      ● ELSA_OPS
      ● EXCHANGE
      ● FIREEYE
      ● FTP
      ● NAT
      ● NETFLOW
      ● SNORT
      ● SSH_LOGIN
      ● SSH_LOGOUT
      ● URL
      ● VPN
      ● WINDOWS

      These plugins tell the web server what to do when a user clicks the "Info" link next to each log. It can do anything, but it is designed for returning useful information in a dialog panel in ELSA with an actions menu. An example that ships with ELSA: if a StreamDB (or OpenFPC) URL is configured, any log that has an IP address in it will have a "getPcap" option which will autofill pcap request parameters for one-click access to the traffic related to the log being viewed.
      New plugins can be added easily by subclassing the “Info” Perl class and editing the elsa_web.conf file to include them. Contributions are welcome!

    • Livetail

      Livetail is deprecated until further notice due to stability issues. ELSA has the ability to allow each user to get a live feed of a given search delivered to a browser window. Livetail allows you to use full PCRE to search incoming logs without impacting logging performance. This is done by forking a separate process on each node that reads the text file being written by the main logging process, ensuring that no extra load is put on the main process and therefore avoiding log loss in high volume situations.
      Starting a Livetail
      To start a livetail, simply choose the “Livetail” option from the “Index” button, which will open a new window. Your search will begin immediately and results will be displayed from all nodes as they are available. Your browser window will poll the server every five seconds for new results. The window will scroll with new results. If you keep your mouse pointer over the window, it will cease scrolling.
      Ending a Livetail
      Livetails will automatically be cancelled when you close the browser window. If the browser crashes and the livetail continues, it will be replaced by any livetail you start again, or will timeout after an hour. An administrator can cancel all livetails by choosing “Cancel Livetails” from the “Admin” menu.
      Livetail results are temporary and cannot be saved. You can copy and paste data from the window, or run a normal ELSA search to save the data.

    • Queries


      Query syntax is loosely based on Google search syntax. Terms are searched as whole keywords (no wildcards). Searches may contain boolean operators: a plus sign specifies that a term is required, a minus sign negates it, and no sign indicates an "OR." Parentheses may be used to group terms. Numeric fields, including hosts, may have greater-than or less-than (and equal-to) operators combined with the boolean operators.

      Boolean Operators

      Operator Meaning
      +keyword Query MUST include the keyword
      -keyword Query MUST NOT include the keyword
      keyword Query MAY include the keyword (OR)

      Range Operators

      Range operators can only be used to filter search results, not provide the results to be filtered. That is, you must include a keyword in addition to the range operator. You can provide a single range operator; they do not need to be in pairs.

      Operator Meaning
      attribute>value Attribute MAY be greater than value
      attribute<value Attribute MAY be less than value
      attribute>=value Attribute MAY be greater than or equal to value
      attribute<=value Attribute MAY be less than or equal to value
      +attribute>value Attribute MUST be greater than value
      +attribute<value Attribute MUST be less than value
      +attribute>=value Attribute MUST be greater than or equal to value
      +attribute<=value Attribute MUST be less than or equal to value
      -attribute>value Attribute MUST NOT be greater than value
      -attribute<value Attribute MUST NOT be less than value
      -attribute>=value Attribute MUST NOT be greater than or equal to value
      -attribute<=value Attribute MUST NOT be less than or equal to value
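
      For example, to require a keyword while also requiring a numeric range (the field and values here are illustrative only):

      +logon +eventid>=4624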


      Queries can have transforms applied to them. Transforms are covered later in the documentation. The syntax for using transforms is represented below.

      Term Meaning
      search clause Any combination of keywords and filters, as defined above
      transform name Name of the transform as defined in the transform plugin
      param Parameter supplied to the transform to direct its behavior

      search clause [ | transform name([param1,param2,paramN]) ] [ | transform name([param1,param2,paramN]) ]


      New Transforms

      median Finds the field's median value for the given result of the subquery. For example, "class:xxx | median(bytes)"
      min Finds the field's minimum value for the given result of the subquery. For example, "class:xxx | min(eventid)"
      max Finds the field's maximum value for the given result of the subquery. For example, "class:xxx | max(eventid)"
      avg Finds the field's average value for the given result of the subquery. For example, "class:xxx | avg(bytes)"



      Queries have a number of modifiers in the form of directives which instruct ELSA how to query.

      Term Meaning Default Value Query can Batch Example
      limit Return this number of results. A limit of zero means return an unlimited number, which constitutes a bulk query and forces the query to run in batch mode, with results delivered via a link in an email. 100 Batch can occur when limit set to 0 or > Max matches (default is 1000) limit:1000
      cutoff Like limit, except it tells ELSA to stop searching after finding this many records, which is valuable when searching a common term and the total number of hits (as opposed to total returned) is irrelevant. undefined No cutoff:100
      offset Partners with limit to indicate how far into a result set to go before returning results. Meaningless unless a limit larger than the default 100 is used. 0 No offset:900
      orderby Order results by this attribute. Technically, undefined, but effectively timestamp, ascending in most scenarios. No orderby:host
      orderby_dir Direction to order results. Must be used in conjunction with orderby. asc No orderby_dir:desc
      start Quoted value representing the earliest timestamp to return. Valid values are almost any common date representation. undefined No start:"2013-01-01 00:00:00"
      end Quoted value representing the latest timestamp to return. Valid values are as with start. undefined No end:"2013-01-01 00:00:00"
      groupby Similar to SQL GROUP BY, returns the unique values for a given attribute and the count of the distinct values. undefined No groupby:host
      node Apply a filter for results only from this node (subject to boolean representations as detailed above). undefined No node:
      datasource Query the given datasource as configured in the elsa_web.conf file. undefined No datasource:hr
      timeout Stop querying and return any results found after this number of seconds. 300 No timeout:10
      archive If set to a true value, query will be run on archived data instead of indexed data, batching if the estimated query time exceeds the configured value (with a default of 30 seconds). 0 Yes, if estimated time is > query_time_batch_threshold (30 seconds by default) archive:1
      analytics If set to a true value, the query will automatically be batched and have no limit set. Results will be saved to a bulk result file, with a link to that file emailed. 0 Yes, always analytics:1
      nobatch Run the query in the foreground, regardless of the estimated time it will take. 0 No, never nobatch:1
      livetail Deprecated
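
      Several directives can be combined in a single query; for example (the values are illustrative only):

      class:windows groupby:eventid start:"2013-01-01 00:00:00" end:"2013-01-02 00:00:00" limit:1000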

      Query examples

      Queries can be very simple, like looking for any mention of an IP address:

      Or a website:


      Here is an example query for finding Symantec Anti-Virus alerts on Windows logs on ten hosts that does not contain the keyword “TrackingCookie”

      eventid:51 host> host< -TrackingCookie

      One could also look for account lockouts that do not come from certain hosts:

      class:windows locked -host> -host<

      To see what hosts have had lockout events, one could run:

      class:windows "locked out"

      and choose the ANY.host field from the “Report On” menu. Here’s an example showing hits from website example.com or website bad.com:

      site:example.com OR site:bad.com


      You can change the column used to order your query as well as the direction using the orderby and orderby_dir keywords. For instance, to order a query by host in reverse order, use: orderby:host orderby_dir:desc. The default is orderby:timestamp orderby_dir:ASC.


      Keywords are the words indexed and available for searching. Note that you cannot search for a partial keyword; it must be complete. Also note that keywords are comprised not only of alphanumeric words, but also of hyphens, dots, and at-signs. So, these are all complete keywords:

      Searches for 1.1 or example.com or ip.addr would all fail to find these terms. If you need to perform searches on partial keywords, you need to switch from an index query to an archive query by clicking the “Index” pull-down menu and choosing archive. Keep in mind that archive searches are slow, so narrowing down a time period will help significantly.

      Search data flow

      When the API issues the query, it is parsed and sent to Sphinx. The API then receives the IDs of the matching logs and queries MySQL for those IDs:

      Query → Parse → Authorize → Log → Sphinx → MySQL → Aggregation → Presentation

    • Archive Queries

      Queries for logs in the archive tables take much longer than indexed queries. For this reason, they are run in the background and the requester is notified via email when the query results are ready. The results are viewed through the link in the email or through the web interface menu for “Saved Results.” Archive queries are run exactly like normal queries except that the “Index” toggle button is changed to “Archive.” They may be performed on the same time range available in the indexed logs as a way of performing wildcard searches not restricted to a keyword. For example, if it was necessary to find a log matching a partial word, one could run an archive search with a narrow time selection. A user may only run a single archive query at a time to prevent system overload. In addition, there is a configuration variable specifying how many concurrent users may run an archive query (the default is four). Most systems can search about 10 million logs per minute per node from the archive. The overall flow looks like this:

      Archive Query → Parse → Authorize → Log → Batch message to user

      (then in background) → MySQL → Store in web MySQL → Email

    • Transforms

      ELSA has a powerful feature called transforms which allow you to pass the results of a query to a backend plugin. The plugins that currently ship with ELSA include whois, dnsdb, and CIF (Collective Intelligence Framework). There are also utility transforms filter, grep, and sum.


      Transforms are modeled after UNIX-style command pipes, like this:

      site:www.google.com | whois | sum(descr)

      This command finds all URL requests for site www.google.com, passes those results to the whois plugin which attaches new fields like org and description, and then passes those results to the sum transform which takes the argument “descr” indicating which field to sum. The result is a graph of the unique “descr” field as provided by the whois plugin.

      Plugins take the syntactical form of:

      query | plugin_1(arg1,arg2,argn) | plugin_n(arg1,arg2,argn)

      Current Plugins

      The currently shipped plugins are:

      Name Args Description Configuration
      whois Queries the ARIN and RIPE online databases to add network owner info web: "transforms/whois/known_subnets", "transforms/whois/known_orgs"
      dnsdb Queries isc.dnsdb.org's database (if an API key is provided) web: "transforms/dnsdb/limit", "transforms/dnsdb/apikey"
      cif Queries a local Collective Intelligence Framework server web: "transforms/whois/known_subnets", "transforms/whois/known_orgs", "transforms/cif/base_url"
      grep regex on field, regex on value Only passes results that match the test
      filter regex on field, regex on value Only passes results that do not match the test
      sum field Sums the total found for the given field
      anonymize Anonymizes any IPs found that match the configuration for "transforms/whois/known_subnets" web: "transforms/whois/known_subnets"
      database (example) field to pass to database Adds the record found in the database to the displayed record after using the given field as a lookup in the database web: "transforms/database/"
      geoip Uses the local GeoIP database to attach geo info to any IPs or hostnames found
      has value, operator (defaults to >), field Defaults to returning only records that have more than the given count in a groupby result. Args can change the operator to less than, etc., and also specify a specific field in a non-groupby result.
      interval Calculates the number of seconds elapsed between records returned and adds that value as a transform field
      local Returns only records which have a value in the configured local subnets web: "transforms/whois/known_subnets"
      remote Returns only records which do not have a value in the configured local subnets web: "transforms/whois/known_subnets"
      parse pattern_name Given the name of a configured pattern, uses preconfigured regular expressions to extract fields from result messages. It can be used as a way of post-search parsing. web: "transforms/parse/(pattern_name)"
      scanmd5 Checks all configured URL sources for hits on any MD5s contained in a record. By default, it will check Shadowserver, but can also check VirusTotal if an API key is configured. web: "transforms/scanmd5/virustotal_apikey"
    • Subsearches

      Subsearches are a special kind of transform that is built-in to ELSA. They are used to take the results of a groupby (report) query and concatenate those results as an OR onto a second query. For example:

      dstip: groupby:srcip | subsearch(dstip:

      This query will find all source IP’s that talked to and then find any of those IP’s which also talked to You can mix in other transforms as well:

      dstip: groupby:srcip | subsearch(dstip: | whois | filter(cc,us)

      This will find IP’s which talked to both,, and are not in the US.

      Subsearches can be chained together arbitrarily:

      dstip: groupby:srcip | subsearch(dstip: groupby:srcip) | subsearch(class:windows groupby:eventid)

      This will find all unique Windows event ID’s for hosts that talked to both and

      To make a field from the source groupby become a specific field in the subsearch, you can pass a second argument:

      dstip: groupby:srcip | subsearch(dstip:,srcip)

      This will mandate that the subsearch uses srcip:host for each host found in the first query.

    • Saved Searches (Macros)

      Any completed search can be saved by clicking the "Results" button and choosing "Save Search." This will bring up a dialog box asking for a name to save the search as. The name must be alphanumeric plus underscore. You can view and edit all saved searches using the "Saved Searches" menu option in the "ELSA" menu in the upper-left-hand part of the menu bar at the top of the ELSA page.


      Any saved search can be invoked inside another query by using the dollar-sign-name convention. For example, if there is a saved search named "trojan" which was saved with a query like this: +sig_msg:trojan, then you can invoke that query within any other query like this: srcip: $trojan. The query will be interpolated and fully evaluate to srcip: +sig_msg:trojan.

      Built-in Macros

      The system will auto-populate some network-based macros for convenience if the whois transform configuration has been entered. ELSA ships with default values of RFC1918 IP space:

      "whois": {
          "known_subnets": {
              "10.0.0.0": {
                  "end": "10.255.255.255",
                  "org": "MyOrg"
              }
          }
      }

      Edit the "known_subnets" stanza to add your local org-specific values. ELSA will use these values to create macros for srcip and dstip such that the macros $myorg, $src_myorg, and $dst_myorg will be available automatically and will resolve to srcip> srcip< dstip> dstip< range filters for $myorg, and the src and dst versions for $src_myorg and $dst_myorg, respectively.

      Having these macros available can greatly aid searching for IDS events, HTTP events, and many others. For instance, you can easily find HTTP POSTs to your org by searching +method:POST +$dst_myorg.

      These built-in macros will be overridden by user-created macros of the same name.

    • Monitoring

      You can use the “Stats” page under the “Admin” menu on the web interface to see what ELSA’s usage looks like. To diagnose problems, refer to the $DATA_DIR/elsa/log directory, especially the node.log and web.log files, respectively.

      You may also want to look for network problems on nodes, especially kernel drops. You can check for errors like this with this command:

      netstat -s | grep -i errors

      Look at the whole output of “netstat -s” for context if you see errors.

      It may also be a good idea to establish a log that you know should periodically occur. Then do a query on the web interface and report on a time value, such as hour or day, and look for any fluctuations in that value that could indicate log loss.

    • Dashboards


      To create a dashboard, click on the “ELSA” menu button in the upper-left-hand corner of the main web interface. A dialog box will open showing a grid of current dashboards you’ve created as well as a link for “Create/import.” Click the link to open another dialog which will ask for params:

      ● Description: What the title of the dashboard page will show.

      ● Alias: The final part of the URL used for accessing, e.g. http://elsa/dashboard/alias

      ● Auth required: The level of authorization, can be none, authentication, or a specific group.

      ● Import: You can paste import configuration (as generated by a dashboard export) here to auto-create all of these parameters, plus all of the underlying charts.

      ● Groups: This field shows up when you’ve selected “Specific Groups” as the auth. You can paste in a groupname here, or use the drop down later.

      Once created, a dashboard appears on the table of dashboards and the “Actions” button will allow you to view, edit, and delete the dashboard.


      Dashboards present a way to provide access to charts which have underlying queries that some users would not normally have permissions to query on their own. It is essentially a way to delegate query access for the purposes of making charts and is especially helpful for making reports that are customer-facing. Queries are logged and authorized as if they were made by the creator of the chart. A log is noted in the web.log file to record that the query was run on the behalf of another user. As previously stated, access to the dashboard itself can be governed, so there is essentially a two-tiered access system: the first is access to the dashboard, the second is the access to the data.

      Currently, only a single group can be permitted access if using group-specific authorization. This restriction may be lifted in the future.

      Adding Charts

      Charts can be added either from the main ELSA query interface using the “Results” button and “Add to dashboard” or you can do so from the “edit dashboard” interface if you’ve chosen the “edit” option from the “Actions” menu in the “Dashboards” dialog. When adding a chart from the main query interface, you must choose a dashboard and a chart, which can be “New Chart” to create one. The dashboard must exist beforehand, so you may need to create a dashboard first.

      Adding Queries

      Queries are easiest to add using the above method in which a query you’ve just run in the standard ELSA query interface is added via the “Results” button. If the query has a “Report On” or “groupby” value, that value will be used to create the chart. Otherwise, the query will be plotted over time by count of occurrences.

      Editing Charts

      Charts can be edited from the edit dashboard interface in two ways: the appearance and the queries. The appearance will dictate what kind of chart it is, the title, and other formatting variables. The queries dictate the underlying data series. When editing charts, changes appear live as you edit.

      Advanced Chart Editing

      In some cases, you may need to edit the actual JSON used to construct a dashboard to get exactly the right look and feel. Here’s an excerpt from the ELSA mailing list touching on how to do that:

      the width (fixed) is located into /opt/elsa/web/lib/Web/GoogleDashboard.pm

      I set

      our $Default_width = 1850;

      to fit a 46" screen at 1920 pixels

      For the number of elements (charts?) I found something into /opt/elsa/web/lib/API/Charts.pm

      There is:

      # Sanity check

      if ($args->{x} > 2){
          die('Cannot have more than 3 charts on one line: ' . $args->{x});
      }

      So there are 3 charts per line; if you need more, increase the number 2.

      For the height there isn’t a unique value to modify as width but as Martin suggested to me some post ago, you can export dashboard, modify values, then create new dashboard with new values.

      For example for a map I needed to set height:

      "y" : "6",
      "options" : {
          "width" : 940,
          "height": 500,

      In this case you can modify height (and width) for each chart.

      Chart Types

      ELSA uses Google Visualizations to draw charts. See their documentation for what chart types are available. Of particular note is the "table" type, which is hidden under the "More" menu of available chart types. It's a great way to display long text in a readable format.

    • Viewing Dashboards

      Dashboards are accessed via /dashboard/alias, where alias is the value configured when the dashboard was created (see the Alias parameter above).

    • Performance Considerations

      Take care when creating charts that the queries used do not tax the system too much. This can happen when a query is not selective enough. That is, there is more “grouping” than “searching” occurring. For anything less than a billion records, this should not be much of an issue, but if your query returns more than a billion or so, you may notice that it can take seconds or minutes for the charts to load.


    • GeoIP Support

      In addition to whois lookups, ELSA has a transform for GeoIP provided by MaxMind.com. By default, ELSA will use the country database provided with the standard Perl module, but you can download the free city database from MaxMind. The transform works like any other transform, e.g.:

      site:www.google.com | geoip

      This will attach the location fields to results. Results that have these fields can then be exported using the GoogleEarth export, which returns a .kml file suitable for opening in Google Earth or Google Maps.


    • Command-line Interface and API

      ELSA ships with a command-line interface, elsa/web/cli.pl, which can be run from a shell on the web frontend host. This can be helpful for testing or for piping results to other programs. However, the Perl API provides a much more comprehensive method for accessing ELSA in a scripted fashion. You can use the included cli.pl as an example of using the API.

    • Alerts

      Any query that has been run may be turned into an alert by clicking the “Results…” menu button and choosing “alert.” This will execute the exact same search after every new batch of logs is loaded, and will notify the user via email of new hits in a manner similar to the archive search results.

    • Scheduled Queries

      Any query may be scheduled to run at a specified interval. This can be useful for creating daily or hourly reports. Creating the scheduled query is similar to creating the alert in that you choose the option from the “Results…” button after performing a search you wish to create a report from.

    • Permissions

      Log access is permitted by allowing certain groups either universal access (admins) or a whitelist of attributes. The attributes can be log hosts (the hosts that initially generate the logs), ranges of hosts (by IP), log classes, or log nodes (the nodes that store the logs). Groups can be either AD groups or local system groups, as per the configuration. Those in the admins group have the “Admin” drop-down menu next to the “ELSA” drop down menu in the web interface which has a “Manage Permissions” item which opens a new window for administrating group privileges.

    • Troubleshooting

      Log Files


      node.log (in $DATA_DIR/elsa/log)

      This is the main ELSA log on each log node. It will contain any errors or information regarding the recording and indexing of logs. If no new logs are coming in, this is the first log file to check.


      Apache error log

      This log can be named differently or be in /var/log/httpd. It is the standard Apache log file, which is the first place to check if any "Query Failed" error messages appear on the web interface. Errors only show up here if they are major enough to break the underlying ELSA code. Typically, these kinds of errors are connectivity or permissions related.


      web.log (in $DATA_DIR/elsa/log)

      This is the main ELSA log for the web interface. It has information on any ELSA-specific actions initiated from the web interface. If queries are not returning the results you expect, check this log.


      Syslog-NG internal log

      Syslog-NG's internal log file will give low-level debugging info like raw message rates. It should generally only be needed when you're not sure that a node is receiving logs.


      This file contains the query log generated by the Sphinx searchd daemon. It should not normally be needed, but can be a good place to get a feel for what queries the system is running and how long they are taking.


      This is the Sphinx searchd daemon log and will contain info on index rotation.

      Common Troubleshooting Symptoms

      Symptom: Chronic warnings in the web UI for "couldn't connect to MySQL"
      Resolution: This can happen when a web frontend has issues and the MySQL server decides it no longer wishes to speak to that frontend because of too many dropped connections. To fix it, log into the node referred to in the message and issue:
      mysqladmin flush-hosts
      which will cause the MySQL daemon to once again accept connections from the "flaky" frontend.

      Symptom: "Query Failed" red error message
      Resolution: This is a low-level error indicating that there is a connectivity or permissions problem between the web server and the MySQL/Sphinx daemons on the node. It will also show up in node.log as "No nodes available." You can verify database connectivity by manually running mysql -h <node IP> -uelsa -p syslog and mysql -h <node IP> -P9306. If both work, then the problem may be something more specific to either MySQL or Sphinx. To troubleshoot that, run tcpdump -i eth0 -n -s0 -X "port 3306 or port 9306" and watch the traffic to see what is occurring when you run a query.


    • Steps to Configure Fluentd

      1. Installation

           1.1. RVM Installation (rpm based boxes only)

      The Fluentd plugins are written in Ruby and distributed as gems, so RVM is a prerequisite on RHEL/CentOS boxes. Follow the steps below to install RVM.


        #Step 1: Upgrade Packages

      a)      Login as root
      b)      $ yum update
      c)      $ yum groupinstall "Development Tools"


        #Step 2: Installing Recommended Packages

      a)      $ yum install gcc-c++ patch readline readline-devel zlib zlib-devel
      b)      $ yum install libyaml-devel libffi-devel openssl-devel make
      c)      $ yum install bzip2 autoconf automake libtool bison iconv-devel


        #Step 3: Install RVM (Ruby Version Manager)

      $ gpg --keyserver hkp://keys.gnupg.net --recv-keys D39DC0E3
      $ \curl -sSL https://get.rvm.io | bash -s stable
      $ source /etc/profile.d/rvm.sh
      $ rvm autolibs disable
      $ rvm requirements # manually install these
      $ rvm reload
      $ rvm install 1.9.3
      $ ruby -v    # check ruby version


          1.2. FluentD Installation

      Run fluentd_ode.sh to install Fluentd and the related plugins. Make sure the file has execute permissions.
      $ cd /usr/local/elsa/contrib/fluentd
      $ ./fluentd_ode.sh


          1.3. Validate data directories in td-agent.conf file

      Typically the Fluentd data directories are created under $DATA_DIR/fluentd. If DATA_DIR is not set, you will find them under /data/fluentd. Verify that these match the directories referenced in the /etc/td-agent/td-agent.conf file; the paths should be identical.
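      For example, a quick way to compare them (assuming the default /data/fluentd location) is to list the data directories and grep the paths referenced in the config:

      $ ls -d /data/fluentd/*/
      $ grep -E 'path|pos_file|buffer_path' /etc/td-agent/td-agent.conf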

      2. Configuring FluentD

      Fluentd needs to be configured for each data source type. Because the input message format varies from use case to use case, sample messages and their corresponding configurations are provided below as a reference for configuring your own data sources.

          2.1. Create Tags:

      Users need to create a tag for each message type they want to process, in the file located at /etc/td-agent/plugin/log_tags.rb.


      The existing file contains the following tags, which can be used as a reference:
      $apache_tag = '%ode-5-10005:'
      $json_tag = '%ode-6-10006:'
      $custom_tag ='%ode-5-10001:'
      $netflow_tag = 'netflow_syslog'
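
      If you need to process an additional message type, add another global variable in the same style. The tag name and numeric code below are placeholders for illustration, not values shipped with ELSA:

      # Hypothetical tag for a new message type (name and code are placeholders)
      $mytype_tag = '%ode-5-10007:'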

          2.2. Create formatter plugins

      For each message type, a formatter plugin needs to be created using the tags defined above. Below is the example of the formatter plugin created for JSON messages. Note that the name of the created file must start with formatter_; refer to the file /etc/td-agent/plugin/formatter_json_ltsv.rb. Make the required changes as indicated by the inline comments in the example below.


      require_relative 'log_tags'

      module Fluent
        module TextFormatter
          class LTSVFormatter1 < Formatter                 # --- class name to be changed per message type
            Plugin.register_formatter('json_ltsv', self)   # --- formatter name should be changed as per message type
            #   include Configurable # This enables the use of config_param
            include HandleTagAndTimeMixin # If you wish to use tag_key, time_key, etc.
            #   def configure(conf)
            #     super
            #   end
            config_param :delimiter, :string, :default => "\t"
            config_param :label_delimiter, :string, :default => ":"

            def format(tag, time, record)
              filter_record(tag, time, record)
              formatted = $json_tag + record.inject('') { |result, pair|   # --- use the tag created in section 2.1
                result << @delimiter if result.length.nonzero?
                result << "#{pair.first}#{@label_delimiter}#{pair.last}"
              }
              formatted << "\n"
            end
          end
        end
      end


          2.3. FluentD configuration for Json Messages

              i) For Source type File :

      In the example below, note the following:

      For the <source> directive:

      path — directory path where the input files will be stored.

      pos_file — path where Fluentd stores the read position of each input file.

      For the record_transformer directive:

      keep_keys — the fields of interest from the input message that you want to store in the database.

      For the file output directive:

      path — output directory path where the flattened messages are written.


      The following needs to be created in the /etc/td-agent/td-agent.conf file:

      # source for file input

      type tail
      format json
      read_from_head true
      path /data/fluentd/json_log/in_files/*
      pos_file /data/fluentd/json_log/out_files/json.log.pos
      tag json000

      type record_transformer
      renew_record true

      type flatten_hash
      add_tag_prefix flattened.
      separator _

      type file
      format json_ltsv
      append true
      delimiter ,
      label_delimiter =
      path /data/fluentd/json_log/out_files/json
      buffer_type file
      time_slice_format  out
      append true
      flush_interval  1s
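
      The listing above shows only the parameters; the Fluentd directive wrappers (<source>, <filter>, <match>) do not appear in it. The sketch below is one plausible way to arrange the same parameters in td-agent.conf for the json000 tag. The match patterns, and the choice of declaring record_transformer as a <filter> (Fluentd v0.12+) rather than as a <match>, are assumptions; adjust them to the plugin versions you actually have installed.

      # Hedged sketch: the parameters above wrapped in Fluentd directives (assumed layout)
      <source>
        type tail
        format json
        read_from_head true
        path /data/fluentd/json_log/in_files/*
        pos_file /data/fluentd/json_log/out_files/json.log.pos
        tag json000
      </source>

      <filter json000>
        type record_transformer
        renew_record true
      </filter>

      <match json000>
        type flatten_hash
        add_tag_prefix flattened.
        separator _
      </match>

      <match flattened.json000>
        type file
        format json_ltsv
        append true
        delimiter ,
        label_delimiter =
        path /data/fluentd/json_log/out_files/json
        buffer_type file
        time_slice_format out
        flush_interval 1s
      </match>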

              ii) For Source type Stream :

      In the example below, note the following:

      For the <source> directive:

      port — port number on which messages will be received (this should be different from already-reserved ports such as 514).

      The rest of the settings remain the same as for the file source described above.

      # source as stream

      type tcp
      format json
      port 5170
      tag json000

      type record_transformer
      renew_record true

      type flatten_hash
      add_tag_prefix flattened.
      separator _

      type file
      format json_ltsv
      append true
      delimiter ,
      label_delimiter =
      path /data/fluentd/json_log/out_files/json
      buffer_type file
      buffer_path /data/fluentd/json_log/out_files/buffer
      time_slice_format  out
      append true
      flush_interval  1s
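
      As in the file case, the stream parameters above would sit inside a <source> directive; the downstream <filter>/<match> blocks are the same as in the file sketch shown earlier. The wrapper below is an assumed layout, not taken from the original listing:

      # Hedged sketch: TCP stream input wrapped in a <source> directive (assumed layout)
      <source>
        type tcp
        format json
        port 5170
        tag json000
      </source>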

       2.4. FluentD configuration for Apache Messages

          1. For Source type File :

      In the example below, note the following:

      For the <source> directive:

      path — directory path where the input files will be stored.

      pos_file — path where Fluentd stores the read position of each input file.

      For the record_transformer directive:

      keep_keys — the fields of interest from the input message that you want to store in the database.

      For the file output directive:

      path — output directory path where the flattened messages are written.


      The following needs to be created in the /etc/td-agent/td-agent.conf file:

      # source for file input

      type tail
      format apache
      read_from_head true
      path /data/fluentd/apache_log/in_files/*
      pos_file /data/fluentd/apache_log/out_files/apache.log.pos
      tag apache000

      type record_transformer
      renew_record true
      keep_keys host,user,method,path,code,size,referer,agent

      type flatten_hash
      add_tag_prefix flattened.
      separator _

      type file
      format apache_ltsv
      append true
      delimiter ,
      label_delimiter =
      path /data/fluentd/apache_log/out_files/apache
      buffer_type file
      buffer_path /data/fluentd/apache_log/out_files/buffer
      time_slice_format  out
      append true
      flush_interval  1s
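
      The same assumed wrapper layout as in the JSON sketch applies here, using the apache000 tag and the apache_ltsv output format. For example, the record_transformer step with keep_keys would look roughly like this (the <filter> wrapper is an assumption):

      # Hedged sketch: keep only the Apache fields of interest (directive wrapper assumed)
      <filter apache000>
        type record_transformer
        renew_record true
        keep_keys host,user,method,path,code,size,referer,agent
      </filter>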


      2) For Source type Stream :

      In the example below, note the following:

      For the <source> directive:

      port — port number on which messages will be received (this should be different from already-reserved ports such as 514).

      The rest of the settings remain the same as for the file source described above.

      # source as stream

      type tcp
      format apache2
      port 5170
      tag apache000

      type record_transformer
      renew_record true
      keep_keys host,user,method,path,code,size,referer,agent

      type flatten_hash
      add_tag_prefix flattened.
      separator _

      type file
      format apache_ltsv
      append true
      delimiter ,
      label_delimiter =
      path /data/fluentd/apache_log/out_files/apache
      buffer_type file
      buffer_path /data/fluentd/apache_log/out_files/buffer
      time_slice_format out
      append true
      flush_interval 1s

              2.5. FluentD configuration for Netflow Messages

                  1) For Source type File :

      In the example below, note the following:

      For the <source> directive:

      path — directory path where the input files will be stored.

      pos_file — path where Fluentd stores the read position of each input file.

      For the file output directive:

      path — output directory path where the flattened messages are written.


      The following needs to be created in the /etc/td-agent/td-agent.conf file:

      # source for file input

      type tail
      format none
      read_from_head true
      path /data/fluentd/netflow_log/in_files/*
      pos_file /data/fluentd/netflow_log/out_files/netflow.log.pos
      tag net000

      type file
      format netflow_ltsv
      path /data/fluentd/netflow_log/out_files/netflow
      buffer_type file
      buffer_path /data/fluentd/netflow_log/out_files/buffer
      time_slice_format out
      append true
      flush_interval  1s


              2) For Source type Stream :

      In the example below, note the following:

      For the <source> directive:

      port — port number on which messages will be received (this should be different from already-reserved ports such as 514).

      The rest of the settings remain the same as for the file source described above.

      # source as stream

      type tcp
      format none
      port 5170
      tag net000

      type file
      format netflow_ltsv
      path /data/fluentd/netflow_log/out_files/netflow
      buffer_type file
      buffer_path /data/fluentd/netflow_log/out_files/buffer
      time_slice_format out
      append true
      flush_interval  1s

           2.6. FluentD configuration for Customized Messages

               1. For Source type File :

      In the example below, note the following:

      For the <source> directive:

      path — directory path where the input files will be stored.

      pos_file — path where Fluentd stores the read position of each input file.

      For the file output directive:

      path — output directory path where the flattened messages are written.


      The following needs to be created in the /etc/td-agent/td-agent.conf file:

      # source for file input

      type tail
      format none
      read_from_head true
      path /data/fluentd/custom_log/in_files/*
      pos_file /data/fluentd/custom_log/out_files/custom.log.pos
      tag custom000

      type file
      format custom_ltsv
      path /data/fluentd/custom_log/out_files/custom
      buffer_type file
      buffer_path /data/fluentd/custom_log/out_files/buffer
      time_slice_format out
      append true
      flush_interval 1s

               2) For Source type Stream :

      In the example below, note the following:

      For the <source> directive:

      port — port number on which messages will be received (this should be different from already-reserved ports such as 514).

      The rest of the settings remain the same as for the file source described above.

      # source as stream

      type tcp
      message_key format none
      port 5170
      tag custom000

      type file
      format custom_ltsv
      path /data/fluentd/custom_log/out_files/custom
      buffer_type file
      buffer_path /data/fluentd/custom_log/out_files/buffer
      time_slice_format out
      append true
      flush_interval 1s


          2.7. Patterns configuration


      Users need to create patterns for each message type, as explained below, and copy them into the patterndb.xml file located in the /usr/local/elsa/node/conf directory.

      Check the program tag in the patterns below; it should be the same as the tags defined in the file /usr/local/elsa/contrib/plugin/log_tags.rb.

      You can add your own tags for new message types and reference them in the patterns in the same way.

              1. Pattern for Json Messages

      #pattern for Json


      empty=false,boxId=1,networks_0=Network 1,networks_1=Network 4,srcLocation_countryName=Local,

      # Test if json parser is working
      > /usr/local/syslog-ng/bin/pdbtool match -p /usr/local/elsa/node/conf/patterndb.xml -P %ode-6-10006 -M "networks_0=Network 1,networks_1=Network 4,srcLocation_countryName=Local,srcLocation_countryCode=Local,"

      1. This should parse srcIp, destIp, srcPort, dstPort.
      2. Push the changes to merged.xml:

      > sudo sh -c "sh /usr/local/elsa/contrib/install.sh node set_syslogng_conf"

       2) Pattern for Apache messages

      #pattern for apache


      (compatible; U; AnyEvent-HTTP/2.22; +http://software.schmorp.de/pkg/AnyEvent)
      Mozilla/5.0 (compatible; U; AnyEvent-HTTP/2.22; +http://software.schmorp.de/pkg/AnyEvent)

                   # Test if Apache parser is working
      /usr/local/syslog-ng/bin/pdbtool match -p /usr/local/elsa/node/conf/patterndb.xml -P %ode-5-10005 -M "host=,
      (compatible; U; AnyEvent-HTTP/2.22; +http://software.schmorp.de/pkg/AnyEvent)%ode-5-10005:host=::1,
      user=-,method=GET,path=/,code=200,size=137896,referer=-,agent=curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/
      Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"


      1. This should parse host, user, method, path, etc.
      2. Push the changes to merged.xml

      > sudo sh -c "sh /usr/local/elsa/contrib/install.sh node set_syslogng_conf"


          3) Pattern for Netflow messages

      #pattern for netflow




      tcp||35843||443|30486|2173|US|Palo Alto, CA|37.376202|-122.182602|HPES - Hewlett-Packard Company


                   # Test if Netflow parser is working
      /usr/local/syslog-ng/bin/pdbtool match -p /usr/local/elsa/node/conf/patterndb.xml -P netflow_syslog -M
      "tcp||35843||222|30486|2173|US|Palo Alto, CA|37.376202|-122.182602|HPES - Hewlett-Packard Company"

      1. This should parse class, proto, srcip, srcport, dstip, dstport, conn_bytes, asn, etc.
      2. Push the changes to merged.xml

      > sudo sh -c "sh /usr/local/elsa/contrib/install.sh node set_syslogng_conf"


          4) Pattern for customized messages

      # Pattern for custom message

       @ESTRING:s0: @@ESTRING:s1: @@ESTRING:i0: @This is event

      Sample message for program '%ode-5-10001': Server1 Warning 1 This is event 1
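
      For reference, a patterndb rule for this message type would normally be wrapped in XML along the lines of the hedged skeleton below. The ruleset name, rule id, and class attribute are placeholders, and the pattern string is copied verbatim from the line above (it may be truncated in this document); adjust everything to your actual patterndb.xml.

      <!-- Hedged skeleton of a patterndb entry for the customized messages; names and ids are placeholders -->
      <patterndb version="4" pub_date="2015-01-01">
        <ruleset name="ode_custom" id="ode-custom-ruleset">
          <pattern>%ode-5-10001</pattern>
          <rules>
            <rule provider="local" id="10001" class="custom">
              <patterns>
                <pattern>@ESTRING:s0: @@ESTRING:s1: @@ESTRING:i0: @This is event</pattern>
              </patterns>
            </rule>
          </rules>
        </ruleset>
      </patterndb>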

                   # Test if customized parser is working
      /usr/local/syslog-ng/bin/pdbtool match -p /usr/local/elsa/node/conf/patterndb.xml -P %ode-5-10001 -M
      "Server5 AllEvents 9900 This is event 9900 This is padding for all the events, except the specia2 cases listed above

      1. This should parse eventid, servername, etc.
      2. Push the changes to merged.xml

      > sudo sh -c "sh /usr/local/elsa/contrib/install.sh node set_syslogng_conf"


      2.8. Configuring syslog-ng

      Copy the following lines into the syslog-ng configuration file (/usr/local/syslog-ng/etc/syslog-ng.conf).

      Place these lines just above source s_network near the beginning of the file. Also check the directories mentioned;
      they should be exactly the same as those configured in td-agent.conf in the steps above.

      source s_file {
      file("/data/fluentd/json_log/out_files/json.out.log" follow_freq(1));       # out file for JSON messages
      file("/data/fluentd/apache_log/out_files/apache.out.log" follow_freq(1));   # out file for Apache messages
      file("/data/fluentd/netflow_log/out_files/netflow.out.log" follow_freq(1)); # out file for Netflow messages
      file("/data/fluentd/custom_log/out_files/custom.out.log" follow_freq(1));   # out file for customized messages
      };

      Make the changes below to the existing log statement at the bottom of the file:
      log {
      source(s_file);           # add this
      # rewrite(r_from_pipes);  # comment this out
      # rewrite(r_pipes);       # comment this out
      destination(d_elsa);      # add this

       2.9. Create classes and rules in DB

      Refer to the file /usr/local/elsa/contrib/fluentd/sample-classdb.sh.
      All of the rules should be executed against the syslog_data database in MySQL.
      Change the file permissions to make it executable if you want to create the rules for the sample data.

       3. Log rotation

      Over time, the files generated in the out_files directories of the various message types will grow. To rotate them, follow the steps below:

      Create a config file under /etc/logrotate.d/; you can use the files already present there as a template.
      For JSON messages it should look like this:
      /data/fluentd/json_log/out_files/json.out.log {
      rotate 5
      olddir /data/fluentd/json_log/out_files/old
      create 640 td-agent td-agent
      }
      Check that the file path matches your setup.
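
      A complete logrotate entry typically also specifies the rotation frequency and a few safety options. The sketch below adds daily, missingok, and compress as assumed defaults that are not part of the original example:

      # Hedged example; daily, missingok, and compress are assumptions
      /data/fluentd/json_log/out_files/json.out.log {
          daily
          rotate 5
          missingok
          compress
          olddir /data/fluentd/json_log/out_files/old
          create 640 td-agent td-agent
      }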
      Rotation is currently scheduled on a daily basis via a cron job; this can be changed to a cron job at any desired frequency.
      To test, run:
      $ /etc/cron.daily/logrotate
      Then check for the rotated file in the /data/fluentd/json_log/out_files/old directory.

      4. Directory Setup & Structure  

          4.1. Data directories

      Because a large volume of data is expected to be processed, input and output file directories are created under the data directory. The structure for each message type is as follows:


      For Json Messages

      $DATA_DIR/fluentd/json_log/in_files                     — for input files

      $DATA_DIR/fluentd/json_log/out_files                  — for out files

      $DATA_DIR/fluentd/json_log/out_files/old_files  —  for log rotated files

      For Apache Messages

      $DATA_DIR/fluentd/apache_log/in_files                     — for input files

      $DATA_DIR/fluentd/apache_log/out_files                  — for out files

      $DATA_DIR/fluentd/apache_log/out_files/old_files  —  for log rotated files


      For Netflow Messages

      $DATA_DIR/fluentd/netflow_log/in_files                     — for input files

      $DATA_DIR/fluentd/netflow_log/out_files                  — for out files

      $DATA_DIR/fluentd/netflow_log/out_files/old_files  —  for log rotated files

      For custom  Messages

      $DATA_DIR/fluentd/custom_log/in_files                     — for input files

      $DATA_DIR/fluentd/custom_log/out_files                  — for out files

      $DATA_DIR/fluentd/custom_log/out_files/old_files  —  for log rotated files

      • Log directory

      You can check the td-agent log at /var/log/td-agent/td-agent.log to troubleshoot any issue related to td-agent.

      5. Message setup Verification

          5.1. To test the setup for each message type, create files containing sample messages under the following directories (you can test with any or all of them):

      $DATA_DIR/fluentd/json_log/in_files
      $DATA_DIR/fluentd/apache_log/in_files
      $DATA_DIR/fluentd/netflow_log/in_files
      $DATA_DIR/fluentd/custom_log/in_files

          5.2. Start Services
      $ service td-agent start
      $ service td-agent reload
      $ service syslog-ng restart

          5.3. Verify output files

          The following output files should be created:


      $DATA_DIR/fluentd/json_log/out_files/json.out.log

      $DATA_DIR/fluentd/apache_log/out_files/apache.out.log

      $DATA_DIR/fluentd/netflow_log/out_files/netflow.out.log

      $DATA_DIR/fluentd/custom_log/out_files/custom.out.log

          5.4.  Query Web UI

      Query the web UI for relevant messages

          5.5.  Streaming option verification

      Similar to the file setup explained above, if messages are to be sourced via a stream, follow the steps below:

      a) Open /etc/td-agent/td-agent.conf file

      b) Uncomment the sources for streaming (port 5170 is currently configured; change it as per your requirements). Make sure the port you use is different from reserved ports such as 514.

      c) Restart td-agent service

      You can test with the netcat utility by executing the following at the $ prompt:


      echo '{"startTime":141741690024939218,"endTime":141741690025403185,"srcMac":"00:14:22:18:DA:7A","destMac":"00:17:C5:15:AC:C4",
      "networks":["Network 1","Network 4"],"srcLocation":{"countryName":"Local","countryCode":"Local","longitude":0.0,
      "latitude":2.0}}' | nc 5170

      d) Check the output on web UI.