A few concepts are core to Loud ML. Understanding them from the outset will make the learning process much easier.
Time Series Database (TSDB)
A time series database (TSDB) is a software system that is optimized for handling time series data: arrays of numbers indexed by time (a datetime or a datetime range).
Near Real Time (NRT)
Loud ML is a near real time API that aggregates data from external databases. This means there is a slight latency (normally 1 to 30 seconds) between the time you index documents in the TSDB and the time they become available for the search and aggregation performed by Loud ML.
Bucket
A bucket is an external storage system that supports data query and aggregation. For example, it can be a TSDB or any other document source supported by the API. This object is expressed in YAML format and defined in the configuration.
Features
A machine learning model uses features to represent changes in the data. With Loud ML, the user assigns these features when creating the model. For example, a feature can be avg(cpu_load), which represents the average calculated on the document field named cpu_load. Features are defined at model creation time and are used in both Training and Inference.
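To make the idea of an aggregated feature concrete, here is a minimal Python sketch of what avg(cpu_load) could look like when computed per time bucket. It is an illustration only and does not use the Loud ML API: the function name avg_per_bucket, the sample values, and the 60-second bucket width are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def avg_per_bucket(points, bucket_interval):
    """Group (timestamp_seconds, value) points into fixed-width time buckets
    and return {bucket_start: average value}, ordered by bucket start."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_interval)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical cpu_load samples: (seconds since start, load in percent)
samples = [(0, 50.0), (20, 70.0), (40, 60.0), (60, 90.0), (90, 70.0)]

features = avg_per_bucket(samples, bucket_interval=60)
print(features)  # {0: 60.0, 60: 80.0}
```

Each bucket yields one feature value, so the model sees one number per interval rather than every raw sample.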
Baseline
Data is dynamic and changes over time. A baseline provides a fit for the expected normal value and the expected normal range of the data at a given point in time. The fit is good when the observed data falls within the predicted range. For example, if a battery voltage is measured at 9.56V under the current operating temperature and the baseline predicts 9.23V to 9.88V, we can assume normal operating conditions.
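The battery example above reduces to a range check. The following sketch shows that check in Python; the function name is_normal is hypothetical and the predicted bounds are simply passed in as numbers, whereas in practice they would come from the trained model.

```python
def is_normal(observed, lower, upper):
    """Return True when the observed value lies inside the predicted range."""
    return lower <= observed <= upper


# Values from the battery example: observed 9.56V, predicted range 9.23V to 9.88V.
print(is_normal(9.56, 9.23, 9.88))   # True
# A hypothetical reading outside the predicted range.
print(is_normal(10.10, 9.23, 9.88))  # False
```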
Training
Historical data is used to baseline normal patterns in the data. Training a model means discovering the baseline that fits the data. The training result is saved on the local filesystem, so training does not have to be repeated unnecessarily. Training outputs the model loss value, i.e. a measure of the fit between the baseline and the original data. A low loss value indicates a good fit.
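As an illustration of how a loss value relates to the quality of the fit, the sketch below compares two candidate baselines against the same observed data using mean squared error. The choice of mean squared error is an assumption made for this example, since the text above does not say which loss function Loud ML reports, and all data values are invented.

```python
def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical voltage readings and two hypothetical baselines.
observed = [9.5, 9.6, 9.4, 9.7]
good_baseline = [9.5, 9.5, 9.5, 9.6]    # stays close to the data
poor_baseline = [8.0, 11.0, 8.5, 10.5]  # far from the data

good_loss = mean_squared_error(observed, good_baseline)
poor_loss = mean_squared_error(observed, poor_baseline)

# The baseline that tracks the data more closely has the lower loss.
print(good_loss < poor_loss)  # True
```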
Inference
After training your model, you can perform inference. This means your model can repeat the operations that it knows (the ones discovered through training) using brand new data. For example, with time series data, running inference means your model will predict future data based on present and past data: if your features include a temperature aggregate and max(cpu_load), and your bucket_interval is 60s, the model predicts the temperature and load for the next minute.
You can run inference on both past (historical) data, usually to verify the model's accuracy, and present data.
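Running inference on past data to check accuracy can be pictured as replaying the history one bucket at a time and comparing each prediction with the value that actually followed. The sketch below does this with a deliberately naive stand-in model (the mean of the last few buckets). It is not the model Loud ML trains; the names predict_next and replay, the window size, and the data are assumptions for illustration.

```python
def predict_next(history, window=3):
    """Stand-in model: predict the next bucket as the mean of the last `window` buckets."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def replay(series, window=3):
    """Replay past data bucket by bucket and return (actual, predicted) pairs
    for every bucket that has at least `window` buckets of history before it."""
    pairs = []
    for i in range(window, len(series)):
        pairs.append((series[i], predict_next(series[:i], window)))
    return pairs


# Hypothetical max(cpu_load) values, one per 60s bucket.
cpu_load_max = [40.0, 42.0, 44.0, 43.0, 45.0, 90.0]

pairs = replay(cpu_load_max)
errors = [abs(actual - predicted) for actual, predicted in pairs]
print(errors)  # [1.0, 2.0, 46.0]
```

Small errors suggest the model tracks the data well; the large error on the last bucket is the kind of deviation that would stand out against the baseline.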