Processing, Analyzing & Visualizing

Queries

Syntax

Query syntax is loosely based on Google search syntax. Terms are searched as whole keywords (no wildcards). Searches may contain boolean operators: a plus sign marks a term as required, a minus sign negates it, and no sign means the term is an "OR." Parentheses may be used to group terms. Numeric fields, including hosts, may use greater-than or less-than (and equal-to) operators, combined with the boolean operators.

Boolean Operators

Operator Meaning
+keyword Query MUST include the keyword
-keyword Query MUST NOT include the keyword
keyword Query MAY include the keyword (OR)
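
For example, the following query requires firewall-class results, accepts either of two IP keywords, and excludes results containing "denied" (the class name and values here are illustrative):

+class:firewall (10.0.20.1 10.0.20.2) -denied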

Range Operators

Range operators can only be used to filter search results; they cannot produce results on their own, so you must include a keyword in addition to the range operator. You can provide a single range operator; they do not need to be used in pairs. See the example following the table below.

Operator Meaning
attribute>value Attribute MAY be greater than value
attribute<value Attribute MAY be less than value
attribute>=value Attribute MAY be greater than or equal to value
attribute<=value Attribute MAY be less than or equal to value
+attribute>value Attribute MUST be greater than value
+attribute<value Attribute MUST be less than value
+attribute>=value Attribute MUST be greater than or equal to value
+attribute<=value Attribute MUST be less than or equal to value
-attribute>value Attribute MUST NOT be greater than value
-attribute<value Attribute MUST NOT be less than value
-attribute>=value Attribute MUST NOT be greater than or equal to value
-attribute<=value Attribute MUST NOT be less than or equal to value
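
For example, to find "denied" events from hosts in a given address range, pair a keyword with range operators (values illustrative):

denied +host>=10.0.0.10 +host<=10.0.0.20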

Transforms

Queries can have transforms applied to them. Transforms are covered later in the documentation. The syntax for using transforms is represented below.

Term Meaning
search clause Any combination of keywords and filters, as defined above
transform name Name of the transform as defined in the transform plugin
param Parameter supplied to the transform to direct its behavior

<search clause> [ | <transform name>([param1,param2,paramN]) ] [ | <transform name>([param1,param2,paramN]) ]


New Transforms

median: Finds the field's median value for the given result of the subquery. For example: class:xxx | median(bytes)
min: Finds the field's minimum value for the given result of the subquery. For example: class:xxx | min(eventid)
max: Finds the field's maximum value for the given result of the subquery. For example: class:xxx | max(eventid)
avg: Finds the field's average value for the given result of the subquery. For example: class:xxx | avg(bytes)


Directives

Queries support a number of modifiers in the form of directives, which instruct ELSA how to run the query.

limit: Return this number of results. A limit of zero means return an unlimited number, which constitutes a bulk query and forces the query to run in batch mode, with results delivered via a link in an email. Default: 100. Batches: when limit is set to 0 or greater than max matches (default 1000). Example: limit:1000
cutoff: Like limit, except it tells ELSA to stop searching after finding this many records, which is valuable when searching a common term and the total number of hits (as opposed to the total returned) is irrelevant. Default: undefined. Batches: no. Example: cutoff:100
offset: Partners with limit to indicate how far into a result set to go before returning results. Meaningless unless a limit larger than the default 100 is used. Default: 0. Batches: no. Example: offset:900
orderby: Order results by this attribute. Default: technically undefined, but effectively timestamp ascending in most scenarios. Batches: no. Example: orderby:host
orderby_dir: Direction in which to order results. Must be used in conjunction with orderby. Default: asc. Batches: no. Example: orderby_dir:desc
start: Quoted value representing the earliest timestamp to return. Valid values include almost any common date representation. Default: undefined. Batches: no. Example: start:"2013-01-01 00:00:00"
end: Quoted value representing the latest timestamp to return. Valid values are as with start. Default: undefined. Batches: no. Example: end:"2013-01-01 00:00:00"
groupby: Similar to SQL GROUP BY; returns the unique values of a given attribute and the count for each distinct value. Default: undefined. Batches: no. Example: groupby:host
node: Filter for results only from this node (subject to the boolean operators detailed above). Default: undefined. Batches: no. Example: node:192.168.1.1
datasource: Query the given datasource as configured in the elsa_web.conf file. Default: undefined. Batches: no. Example: datasource:hr
timeout: Stop querying and return any results found after this number of seconds. Default: 300. Batches: no. Example: timeout:10
archive: If set to a true value, the query runs on archived data instead of indexed data. Default: 0. Batches: yes, if the estimated query time exceeds query_time_batch_threshold (default 30 seconds). Example: archive:1
analytics: If set to a true value, the query is automatically batched with no limit set. Results are saved to a bulk result file, with a link to that file emailed. Default: 0. Batches: always. Example: analytics:1
nobatch: Run the query in the foreground, regardless of its estimated run time. Default: 0. Batches: never. Example: nobatch:1
livetail: Deprecated.
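
Directives combine freely with keywords, filters, and each other. For example, the following reports the top 500 hosts with lockout events during a one-day window (values illustrative):

class:windows "locked out" groupby:host limit:500 start:"2013-01-01 00:00:00" end:"2013-01-02 00:00:00"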

Query examples

Queries can be very simple, like looking for any mention of an IP address:

10.0.20.1

Or a website:

site:www.google.com

Here is an example query for finding Symantec Anti-Virus alerts in Windows logs on ten hosts that do not contain the keyword "TrackingCookie":

eventid:51 host>10.0.0.10 host<10.0.0.20 -TrackingCookie

One could also look for account lockouts that do not come from certain hosts:

class:windows locked -host>10.0.0.10 -host<10.0.0.20

To see what hosts have had lockout events, one could run:

class:windows "locked out"

and choose the ANY.host field from the “Report On” menu. Here’s an example showing hits from website example.com or website bad.com:

site:example.com OR site:bad.com

Ordering

You can change the column used to order your query, as well as the direction, using the orderby and orderby_dir directives. For instance, to order a query by host in reverse order, use: orderby:host orderby_dir:desc. The default is orderby:timestamp orderby_dir:asc.

Keywords

Keywords are the words indexed and available for searching. Note that you cannot search for a partial keyword; it must be complete. Also note that keywords are composed not only of alphanumeric characters but also of hyphens, dots, and at-signs. So, these are all single, complete keywords:

1.1.1.1
this-example.com
me@example.com
mal.form.ed-.ip.addr

Searches for 1.1, example.com, or ip.addr would all fail to find these terms. If you need to search on partial keywords, switch from an index query to an archive query by clicking the "Index" pull-down menu and choosing "Archive." Keep in mind that archive searches are slow, so narrowing down the time period will help significantly.

Search data flow

When the API issues a query, the query is parsed and sent to Sphinx. The API then receives the IDs of the matching logs and queries MySQL for those IDs:

Query → Parse → Authorize → Log → Sphinx → MySQL → Aggregation → Presentation

Archive Queries

Queries for logs in the archive tables take much longer than indexed queries. For this reason, they are run in the background, and the requester is notified via email when the query results are ready. The results are viewed through the link in the email or through the "Saved Results" menu in the web interface. Archive queries are run exactly like normal queries, except that the "Index" toggle button is changed to "Archive." They may be performed on the same time range available in the indexed logs as a way of performing wildcard searches not restricted to whole keywords. For example, if it was necessary to find a log matching a partial word, one could run an archive search with a narrow time selection. A user may only run a single archive query at a time to prevent system overload. In addition, a configuration variable specifies how many concurrent users may run archive queries (the default is four). Most systems can search about 10 million logs per minute per node from the archive. The overall flow looks like this:

Archive Query → Parse → Authorize → Log → Batch message to user

(then in background) → MySQL → Store in web MySQL → Email
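
The archive:1 directive described above offers another way to start an archive query straight from the search bar. For example, a partial-word hunt over a narrow window might look like this (values illustrative):

exampl archive:1 start:"2013-01-01 00:00:00" end:"2013-01-01 06:00:00"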


Transforms

ELSA has a powerful feature called transforms, which allows you to pass the results of a query to a backend plugin. The plugins that currently ship with ELSA include whois, dnsdb, and CIF (Collective Intelligence Framework). There are also the utility transforms filter, grep, and sum.

Syntax

Transforms are modeled after UNIX-style command pipes, like this:

site:www.google.com | whois | sum(descr)

This command finds all URL requests for the site www.google.com, passes those results to the whois plugin, which attaches new fields like org and description, and then passes those results to the sum transform, which takes the argument "descr" indicating which field to sum. The result is a graph of the counts of each unique "descr" value as provided by the whois plugin.

Plugins take the syntactical form of:

query | plugin_1(arg1,arg2,argn) | plugin_n(arg1,arg2,argn)

Current Plugins

The currently shipped plugins are:

whois: Queries the ARIN and RIPE online databases to add network owner info. Configuration (web): "transforms/whois/known_subnets", "transforms/whois/known_orgs"
dnsdb: Queries isc.dnsdb.org's database (if an API key is provided). Configuration (web): "transforms/dnsdb/limit", "transforms/dnsdb/apikey"
cif: Queries a local Collective Intelligence Framework server. Configuration (web): "transforms/whois/known_subnets", "transforms/whois/known_orgs", "transforms/cif/base_url"
grep (args: regex on field, regex on value): Only passes results that match the test.
filter (args: regex on field, regex on value): Only passes results that do not match the test.
sum (args: field): Sums the total found for the given field.
anonymize: Anonymizes any IPs found that match the configuration for "transforms/whois/known_subnets". Configuration (web): "transforms/whois/known_subnets"
database (example plugin; args: field to pass to the database): Adds the record found in the database to the displayed record after using the given field as a lookup in the database. Configuration (web): "transforms/database/"
geoip: Uses the local GeoIP database to attach geographic info to any IPs or hostnames found.
has (args: value, operator (defaults to >), field): By default, returns only records that have more than the given count in a groupby result. The args can change the operator to less than, etc., and can also specify a particular field in a non-groupby result.
interval: Calculates the number of seconds elapsed between returned records and adds that value as a transform field.
local: Returns only records which have a value in the configured local subnets. Configuration (web): "transforms/whois/known_subnets"
remote: Returns only records which do not have a value in the configured local subnets. Configuration (web): "transforms/whois/known_subnets"
parse (args: pattern_name): Given the name of a configured pattern, uses preconfigured regular expressions to extract fields from result messages. This can be used as a way of post-search parsing. Configuration (web): "transforms/parse/(pattern_name)"
scanmd5: Checks all configured URL sources for hits on any MD5s contained in a record. By default, it checks Shadowserver, but it can also check VirusTotal if an API key is configured. Configuration (web): "transforms/scanmd5/virustotal_apikey"
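
As an illustration of how the utility transforms compose (the class and field names here are illustrative), the following keeps only results whose site field ends in .ru and then sums the bytes field:

class:url | grep(site,\.ru$) | sum(bytes)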

Subsearches

Subsearches are a special kind of transform built into ELSA. They are used to take the results of a groupby (report) query and concatenate those results as an OR onto a second query. For example:

dstip:1.1.1.1 groupby:srcip | subsearch(dstip:2.2.2.2)

This query will find all source IPs that talked to 1.1.1.1 and then find any of those IPs which also talked to 2.2.2.2. You can mix in other transforms as well:

dstip:1.1.1.1 groupby:srcip | subsearch(dstip:2.2.2.2) | whois | filter(cc,us)

This will find IPs which talked to both 1.1.1.1 and 2.2.2.2 and are not in the US.

Subsearches can be chained together arbitrarily:

dstip:1.1.1.1 groupby:srcip | subsearch(dstip:2.2.2.2 groupby:srcip) | subsearch(class:windows groupby:eventid)

This will find all unique Windows event IDs for hosts that talked to both 1.1.1.1 and 2.2.2.2.

To make a field from the source groupby become a specific field in the subsearch, you can pass a second argument:

dstip:1.1.1.1 groupby:srcip | subsearch(dstip:2.2.2.2,srcip)

This will mandate that the subsearch uses srcip:<host> for each host found in the first query.

Saved Searches (Macros)

Any completed search can be saved by clicking the "Results" button and choosing "Save Search." This will bring up a dialog box asking for a name to save the search as. The name must be alphanumeric plus underscore. You can view and edit all saved searches using the "Saved Searches" option in the "ELSA" menu in the upper-left-hand part of the menu bar at the top of the ELSA page.

Macros

Any saved search can be invoked inside another query by using the dollar-sign-name convention. For example, if there is a saved search named "trojan" which was saved with a query like +sig_msg:trojan, then you can invoke that query within any other query like this: srcip:1.1.1.1 $trojan. The query will be interpolated and fully evaluate to srcip:1.1.1.1 +sig_msg:trojan.

Built-in Macros

The system will auto-populate some network-based macros for convenience if the whois transform configuration has been entered. ELSA ships with default values of RFC1918 IP space:

"whois": {
    "known_subnets": {
        "10.0.0.0": {
            "end": "10.255.255.255",
            "org": "MyOrg"
        },

Edit the "known_subnets" stanza to add your local org-specific values. ELSA will use these values to create macros on srcip and dstip such that the macros $myorg, $src_myorg, and $dst_myorg are available automatically, resolving as shown below.
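
With the default RFC1918 entry shown above, the generated macros would resolve to:

$myorg: srcip>10.0.0.0 srcip<10.255.255.255 dstip>10.0.0.0 dstip<10.255.255.255
$src_myorg: srcip>10.0.0.0 srcip<10.255.255.255
$dst_myorg: dstip>10.0.0.0 dstip<10.255.255.255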

Having these macros available can greatly aid searching for IDS events, HTTP events, and many others. For instance, you can easily find HTTP POSTs to your org by searching +method:POST +$dst_myorg.

These built-in macros will be overridden by user-created macros of the same name.

Monitoring

You can use the "Stats" page under the "Admin" menu on the web interface to see what ELSA's usage looks like. To diagnose problems, refer to the $DATA_DIR/elsa/log directory, especially the node.log and web.log files.

You may also want to look for network problems on nodes, especially kernel drops. You can check for errors like this with this command:

netstat -s | grep -i errors

Look at the whole output of “netstat -s” for context if you see errors.

It may also be a good idea to establish a log that you know should occur periodically. Then run a query on the web interface, report on a time value such as hour or day, and look for any fluctuations in that value that could indicate log loss.
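
For example, if a cron job on each host logs a known marker message every few minutes (the marker text here is hypothetical), run:

"ELSA_HEARTBEAT"

and report on a time value such as hour from the "Report On" menu; a dip in any bucket suggests log loss.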


Dashboards

Creating

To create a dashboard, click the "ELSA" menu button in the upper-left-hand corner of the main web interface. A dialog box will open showing a grid of the dashboards you've created, as well as a link for "Create/import." Click the link to open another dialog which will ask for these parameters:

  • Description: What the title of the dashboard page will show.
  • Alias: The final part of the URL used to access it, e.g. http://elsa/dashboard/alias
  • Auth required: The level of authorization required: none, authentication, or a specific group.
  • Import: You can paste an import configuration (as generated by a dashboard export) here to auto-create all of these parameters, plus all of the underlying charts.
  • Groups: This field shows up when you've selected "Specific Groups" as the auth level. You can paste in a group name here, or use the drop-down later.

Once created, a dashboard appears on the table of dashboards and the “Actions” button will allow you to view, edit, and delete the dashboard.

Authorization

Dashboards provide a way to grant access to charts whose underlying queries some users would not normally have permission to run on their own. It is essentially a way to delegate query access for the purpose of making charts and is especially helpful for customer-facing reports. Queries are logged and authorized as if they were made by the creator of the chart, and a note is recorded in the web.log file that the query was run on behalf of another user. As previously stated, access to the dashboard itself can be governed, so there is essentially a two-tiered access system: the first tier is access to the dashboard, the second is access to the data.

Currently, only a single group can be permitted access if using group-specific authorization. This restriction may be lifted in the future.

Adding Charts

Charts can be added either from the main ELSA query interface, using the "Results" button and "Add to dashboard," or from the "edit dashboard" interface if you've chosen the "edit" option from the "Actions" menu in the "Dashboards" dialog. When adding a chart from the main query interface, you must choose a dashboard and a chart, which can be "New Chart" to create one. The dashboard must exist beforehand, so you may need to create one first.

Adding Queries

Queries are easiest to add using the above method in which a query you’ve just run in the standard ELSA query interface is added via the “Results” button. If the query has a “Report On” or “groupby” value, that value will be used to create the chart. Otherwise, the query will be plotted over time by count of occurrences.

Editing Charts

Charts can be edited from the edit dashboard interface in two ways: the appearance and the queries. The appearance will dictate what kind of chart it is, the title, and other formatting variables. The queries dictate the underlying data series. When editing charts, changes appear live as you edit.

Advanced Chart Editing

In some cases, you may need to edit the actual JSON used to construct a dashboard to get exactly the right look and feel. Here’s an excerpt from the ELSA mailing list touching on how to do that:


The width (fixed) is located in /opt/elsa/web/lib/Web/GoogleDashboard.pm.

I set

our $Default_width = 1850;

to fit a 46″ screen at 1920 pixels.

For the number of elements (charts) per line, I found something in /opt/elsa/web/lib/API/Charts.pm.

There is:

# Sanity check
if ($args->{x} > 2){
    die('Cannot have more than 3 charts on one line: ' . $args->{x});
}

So there are three charts per line; if you need more, increase the number 2.

For the height there isn't a single value to modify as there is for the width, but as Martin suggested to me some posts ago, you can export the dashboard, modify the values, then create a new dashboard with the new values.

For example, for a map I needed to set the height:

"y": "6",
"options": {
    "width": 940,
    "height": 500,

In this case you can modify the height (and width) of each chart.

Chart Types

ELSA uses Google Visualizations to draw charts. See their documentation for the available chart types. Of particular note is the "table" type, which is hidden under the "More" menu of available chart types. It's a great way to display long text in a readable format.

Viewing Dashboards

Dashboards are accessed via <elsa>/dashboard/<alias>?<time unit>=<number of units>&start=<ISO start>&end=<ISO end>. All parameters after the alias are optional, and the default is to show the past seven days by hour. To see, for instance, the last fifteen minutes by second, you'd use <alias>?seconds=900, which would give you extremely granular data. You could also view any time period at any granularity by providing a more specific start and/or end, such as <alias>?seconds=900&start=two days ago or <alias>?seconds=900&start=2012-08-27 00:00:00. You can add &refresh=<seconds> to a dashboard URL to have it refresh every n seconds, where n is at least five.

Performance Considerations

Take care when creating charts that the underlying queries do not tax the system too much. This can happen when a query is not selective enough, that is, when there is more "grouping" than "searching" occurring. For anything under a billion records this should not be much of an issue, but if your query covers more than a billion or so, you may notice that the charts take seconds or minutes to load.


GeoIP Support

In addition to whois lookups, ELSA has a transform for GeoIP lookups provided by MaxMind.com. By default, ELSA will use the country database that ships with the standard Perl module, but you can also download MaxMind's free city database. The transform works like any other transform, e.g.:

site:www.google.com | geoip

This will attach the location fields to results. Results that have these fields can then be exported using the GoogleEarth export, which returns a .kml file suitable for opening in Google Earth or Google Maps.


Alerts

Any query that has been run may be turned into an alert by clicking the “Results…” menu button and choosing “alert.” This will execute the exact same search after every new batch of logs is loaded, and will notify the user via email of new hits in a manner similar to the archive search results.

Scheduled Queries

Any query may be scheduled to run at a specified interval. This can be useful for creating daily or hourly reports. Creating the scheduled query is similar to creating the alert in that you choose the option from the “Results…” button after performing a search you wish to create a report from.

Command-line Interface and API

ELSA ships with a command-line interface, elsa/web/cli.pl, which can be run from a shell on the web frontend host. This can be helpful for testing or for piping results to other programs. However, the Perl API provides a much more comprehensive way to access ELSA in a scripted fashion. You can use the included cli.pl as an example of how to use the API.
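
A hypothetical invocation might look like the following; the exact arguments vary by version, so consult the script itself (or run it without arguments) for its actual usage:

perl cli.pl 'class:windows "locked out" groupby:host'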