Timeseries Features DSL

Defines how time series data must be aggregated into features for learning and inference.

Features define how to aggregate numerical data from a measurement. A measurement defines an object type in the data source:

  • For an Elasticsearch data source, the measurement is not used. You can set doc_type in the data source settings of config.yml; it defaults to doc if not set.
  • For an InfluxDB data source, the measurement is the series measurement name.

Accepted parameters to define model features are:

name

Name given to this feature

field

TSDB column name

measurement

TSDB table to fetch data from

match_all

List of TSDB tag names and values that must be matched when fetching the data

metric

An aggregation operator supported by the TSDB

default

A default float value that replaces NaN values returned by the TSDB, or previous to fill the series with the previous non-NaN value

transform

Either null or diff. Additional processing to apply to TSDB data in order to generate the feature

scores

Either min_max or standardize. Additional processing to apply to TSDB data in order to generate the feature

anomaly_type

One of low_high, low, or high, according to the type of abnormal values to detect
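As an illustration only, the parameters above can be sketched as a Python dictionary with a small checker. The dictionary form, the tag/value layout of match_all, every value shown (avg_cpu, cpu, usage_percent, web-01), and the helper names check_feature, fill_default, and diff are assumptions made for this example; they are not taken from the product. The fill and diff helpers show one plausible reading of default and transform, not the actual implementation.

```python
import math

# Hypothetical feature definition built from the parameter names listed above.
# All values are illustrative, and the dict layout itself is an assumption.
feature = {
    "name": "avg_cpu",
    "measurement": "cpu",
    "field": "usage_percent",
    "metric": "avg",
    "match_all": [{"tag": "host", "value": "web-01"}],  # assumed tag/value layout
    "default": "previous",
    "transform": "diff",
    "scores": "min_max",
    "anomaly_type": "low_high",
}

# Accepted values as documented in the parameter list.
ALLOWED = {
    "transform": {None, "diff"},
    "scores": {"min_max", "standardize"},
    "anomaly_type": {"low_high", "low", "high"},
}


def check_feature(feat):
    """Return a list of problems found in a feature definition."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in feat and feat[key] not in allowed:
            problems.append(f"invalid {key}: {feat[key]!r}")
    default = feat.get("default")
    if default not in (None, "previous") and not isinstance(default, (int, float)):
        problems.append(f"invalid default: {default!r}")
    return problems


def fill_default(values, default):
    """Replace NaN with a constant, or with the previous non-NaN value."""
    filled, last = [], math.nan
    for v in values:
        if math.isnan(v):
            v = last if default == "previous" else float(default)
        else:
            last = v
        filled.append(v)
    return filled


def diff(values):
    """Successive differences: the assumed meaning of transform 'diff'."""
    return [b - a for a, b in zip(values, values[1:])]
```

With these helpers, check_feature(feature) returns an empty list, and diff(fill_default([1.0, math.nan, 3.0], "previous")) fills the gap with 1.0 before taking differences. A leading NaN stays NaN under previous in this sketch, because there is no earlier value to copy.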

The list below defines the aggregation operators that can be used to define features. These values are extracted from numeric fields in the data source documents.

Note

This version does not support feature generation by a provided script, nor extracting features from categories, i.e., non-numeric fields.

The supported metric operators are: count, min, max, sum, avg, sum_of_squares, variance, and std_deviation.
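To make the operator names concrete, here is a minimal Python sketch of what each one computes over a single bucket of numeric values. In practice the TSDB performs the aggregation; the function name aggregate is hypothetical, and the use of population variance (dividing by n) is an assumption rather than documented behavior.

```python
import math


def aggregate(values, metric):
    """Apply one of the listed metric operators to a bucket of numeric values."""
    n = len(values)
    if metric == "count":
        return n
    if n == 0:
        # An empty bucket has no meaningful value; NaN stands in for it here.
        return math.nan
    total = sum(values)
    mean = total / n
    # Population variance (divide by n): an assumption for this sketch.
    variance = sum((v - mean) ** 2 for v in values) / n
    ops = {
        "min": min(values),
        "max": max(values),
        "sum": total,
        "avg": mean,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }
    return ops[metric]


bucket = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = {m: aggregate(bucket, m) for m in ("count", "avg", "variance", "std_deviation")}
```

For the sample bucket, summary holds a count of 8, an average of 5.0, a variance of 4.0, and a standard deviation of 2.0. An empty bucket yields NaN for every operator except count, which is the case the default parameter is meant to handle.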

Note

With InfluxDB, this version supports the following additional operators: derivative, integral, spread, mode, 5percentile, 10percentile, 90percentile, 95percentile, and stddev.