Data collection and analysis rules

Data collection and analysis rules are used to detect incidents by analyzing events saved in the KUMA database.

Unlike streaming correlation and retrospective scanning, data collection and analysis rules preprocess data with SQL queries (including additional calculations, grouping, and aggregation using SQL) and run regularly on a schedule. With correctly configured data collection and analysis rules, the SIEM can help you detect advanced threats.

To manage the section, you need one of the following roles: General administrator, Tenant administrator, Tier 1 analyst, Tier 2 analyst.

When creating or editing data collection and analysis rules, you need to specify the settings listed in the table below.

Settings of data collection and analysis rules

Setting	Description
Name	Required setting. Unique name of the resource. Must contain 1 to 128 Unicode characters.
Tenant	Required setting. The name of the tenant that owns the resource. If you have access to only one tenant, this field is filled in automatically. If you have access to multiple tenants, the name of the first tenant from your list of available tenants is inserted. You can select any tenant from this list.
Sql	Required setting. The SQL query must contain an aggregation function with a LIMIT and/or a data grouping with a LIMIT. You must use a LIMIT value between 1 and 10,000.
Examples of SQL queries:
A query containing only an aggregation function:
SELECT count(DeviceCustomFloatingPoint1) AS `Aggregate` FROM `events` WHERE Type = 1 ORDER BY Aggregate DESC LIMIT 10
A query containing only data grouping:
SELECT SourceAddress, DeviceCustomFloatingPoint1 FROM `events` WHERE Type = 1 GROUP BY SourceAddress, DeviceCustomFloatingPoint1 ORDER BY DeviceCustomFloatingPoint1 DESC LIMIT 10
A query containing an aggregation function and data grouping:
SELECT SourceAddress, sum(DeviceCustomFloatingPoint1) FROM `events` WHERE Type = 1 GROUP BY SourceAddress, DeviceCustomFloatingPoint1 ORDER BY DeviceCustomFloatingPoint1 DESC LIMIT 10
A query containing an expression using aggregation functions:
SELECT stddevPop(DeviceCustomFloatingPoint1) + avg(DeviceCustomFloatingPoint1) AS `Aggregate` FROM `events` WHERE Type = 1 ORDER BY Aggregate DESC LIMIT 10
You can also use SQL function sets: `enrich` and `lookup`.
Query interval	Required setting. The interval for executing the SQL query. You can specify the interval in minutes, hours, and days. The minimum interval is 1 minute. The default timeout of the SQL query is equal to the interval that you specify in this field. If the execution of the SQL query takes longer than the timeout, an error occurs. In this case, we recommend increasing the interval. For example, if the interval is 1 minute, and the query takes 80 seconds to execute, we recommend setting the interval to at least 90 seconds.
Tags	Optional setting. Tags for resource search.
Depth	Optional setting. Expression for the lower bound of the interval for searching events in the database. To select a value from the list or to specify the depth as a relative interval, place the cursor in the field. For example, if you want to find all events from one hour ago to now, set the relative interval of `now-1h`.
Description	Optional setting. Description of the data collection and analysis rule.
Mapping	Settings for mapping the fields of an SQL query result to KUMA events:
Source field is the field from the SQL query result that you want to convert into a KUMA event.
Event field is the KUMA event field. You can select one of the values in the list by placing the mouse cursor in this field.
Label is a unique custom label for event fields that begin with DeviceCustom*.
You can add or delete table rows. To add a new table row, click Add mapping. To delete a row, select the check box next to the row and click the button.
If you do not want to fill in the fields manually, you can click the Add mapping from SQL button. The field mapping table is then populated with the fields of the SQL query, including aliases (if any). For example, if an SQL query field is `SourceAddress` and this name matches the name of an event field, the value is inserted into the Event field column of the field mapping table. Clicking the Add mapping from SQL button again does not refresh the table; the fields from the SQL query are added to it again.
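
The constraints on the Sql and Query interval settings can be checked before a rule is saved. The following is a minimal Python sketch of such a pre-check, not KUMA's own validation logic. The names `AGGREGATES`, `check_rule_query`, and `query_times_out` are hypothetical, the list of aggregate functions is an assumed subset, and the regular expressions do not handle subqueries or comments.

```python
import re

# Assumption: a small, illustrative subset of aggregate function names.
AGGREGATES = ("count", "sum", "avg", "min", "max", "stddevPop")


def check_rule_query(sql: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []

    # The query must end up with a LIMIT between 1 and 10,000.
    limit = re.search(r"\bLIMIT\s+(\d+)", sql, re.IGNORECASE)
    if limit is None:
        problems.append("query has no LIMIT clause")
    elif not 1 <= int(limit.group(1)) <= 10_000:
        problems.append("LIMIT must be between 1 and 10,000")

    # The query must contain an aggregation function and/or a GROUP BY.
    has_group = re.search(r"\bGROUP\s+BY\b", sql, re.IGNORECASE) is not None
    agg_pattern = r"\b(?:" + "|".join(AGGREGATES) + r")\s*\("
    has_agg = re.search(agg_pattern, sql, re.IGNORECASE) is not None
    if not (has_group or has_agg):
        problems.append("query needs an aggregation function and/or GROUP BY")

    return problems


def query_times_out(interval_minutes: int, duration_seconds: float) -> bool:
    """The default timeout equals the query interval, so a longer run fails."""
    return duration_seconds > interval_minutes * 60


ok_query = (
    "SELECT count(DeviceCustomFloatingPoint1) AS `Aggregate` FROM `events` "
    "WHERE Type = 1 ORDER BY Aggregate DESC LIMIT 10"
)
bad_query = "SELECT SourceAddress FROM `events` LIMIT 20000"

print(check_rule_query(ok_query))
print(check_rule_query(bad_query))
print(query_times_out(1, 80))
```

In this sketch, the first query from the table passes both checks, the second query is reported for an out-of-range LIMIT and for missing aggregation or grouping, and a query that runs for 80 seconds exceeds a 1-minute interval.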

You can create a data collection and analysis rule in one of the following ways:

In the Resources → Resources and services → Data collection and analysis rules section.
In the Events section.

To create a data collection and analysis rule in the Events section:

Create or generate an SQL query and click the button.
A new browser tab for creating a data collection and analysis rule opens with the SQL query and Depth fields pre-filled. The field mapping table is also populated automatically if you did not use an asterisk (*) in the SQL query.
Fill in the required fields.
If necessary, you can change the value in the Query interval field.
Save the settings.

The data collection and analysis rule is saved and is available in the Resources and services → Data collection and analysis rules section. For a data collection and analysis rule to run, you must create a scheduler for it.
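
To illustrate how a field mapping table could be derived from a query that does not use an asterisk, here is a rough Python sketch. It is an assumption about the general idea, not a description of how KUMA implements the Add mapping from SQL button. `EVENT_FIELDS` is a hypothetical subset of event field names, `split_top_level` and `mapping_from_sql` are illustrative helpers, and the way unaliased expressions are handled is a guess.

```python
import re

# Assumption: a hypothetical subset of KUMA event field names.
EVENT_FIELDS = {"SourceAddress", "DestinationAddress", "DeviceCustomFloatingPoint1"}


def split_top_level(select_list: str) -> list[str]:
    """Split a SELECT list on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in select_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def mapping_from_sql(sql: str) -> list[dict]:
    """Build mapping rows from the SELECT list; return [] if it uses '*'."""
    match = re.search(r"\bSELECT\s+(.*?)\s+FROM\b", sql, re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    parts = split_top_level(match.group(1))
    if any(part == "*" for part in parts):
        return []  # nothing to map automatically when an asterisk is used

    rows = []
    for expr in parts:
        # Prefer the alias when one is given, for example: sum(x) AS `Total`.
        alias = re.search(r"\bAS\s+`?(\w+)`?\s*$", expr, re.IGNORECASE)
        source = alias.group(1) if alias else expr.strip("`")
        # Pre-fill the event field only when the names match exactly.
        event_field = source if source in EVENT_FIELDS else ""
        rows.append({"source_field": source, "event_field": event_field})
    return rows


rows = mapping_from_sql(
    "SELECT SourceAddress, sum(DeviceCustomFloatingPoint1) AS `Total` "
    "FROM `events` WHERE Type = 1 GROUP BY SourceAddress LIMIT 10"
)
for row in rows:
    print(row)
```

In this sketch, `SourceAddress` matches an event field name and is pre-filled in the event field column, while the `Total` alias has no matching event field and is left for manual selection.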
