Basic Concepts

There are a few concepts that are core to Loud ML. Understanding these concepts from the outset will greatly ease the learning process.

Time Series Database (TSDB)

A time series database (TSDB) is a software system that is optimized for handling time series data: arrays of numbers indexed by time (a datetime or a datetime range).
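
For example, the CPU load of a server sampled every 30 seconds is a time series: each point pairs a timestamp with a numeric value. The snippet below is purely illustrative (the field name and the values are placeholders), shown in YAML for readability:

    # An illustrative time series: the cpu_load field sampled over time.
    - timestamp: 2018-01-01T00:00:00Z
      cpu_load: 0.41
    - timestamp: 2018-01-01T00:00:30Z
      cpu_load: 0.45
    - timestamp: 2018-01-01T00:01:00Z
      cpu_load: 0.39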

Near Real Time (NRT)

Loud ML is a near real time API that aggregates data from external databases. In practice, this means there is a slight latency (normally one to 30 seconds) between the time you index documents in the TSDB and the time they become available for search and aggregation by Loud ML.

Data Source

A data source is an external system that supports data query and aggregation. For example, it can be a TSDB or any other document source supported by the API. Each data source is declared in YAML format in the configuration.
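
As a sketch, a data source declaration in the configuration might look like the following. The key names and values (type, addr, database, and so on) are illustrative assumptions for an InfluxDB-style TSDB, not an authoritative schema; refer to the configuration reference for the exact keys supported by each data source type.

    # Hypothetical data source declaration (YAML). Key names are assumptions.
    datasources:
      - name: my-datasource     # name that models will reference
        type: influxdb          # kind of external system (here, a TSDB)
        addr: localhost:8086    # address of the database server
        database: telegraf      # database to query and aggregate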

Model

A machine learning model uses features to represent changes in the data. With Loud ML, these features are assigned by the user when creating the model. For example, a feature can be avg(cpu_load) to represent the average computed over the document field named cpu_load. The features are defined at model creation time and are used in both Training and Inference.
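
For illustration, a model declaration could tie such a feature to a data source and an aggregation interval. The structure below is a hedged sketch in YAML: the key names (default_datasource, metric, field, and so on) are assumptions rather than the authoritative model schema.

    # Hypothetical model declaration (YAML). Key names are assumptions.
    name: cpu-load-model
    default_datasource: my-datasource   # data source to query (declared above)
    bucket_interval: 60s                # width of each aggregation bucket
    features:
      - name: avg_cpu_load
        metric: avg                     # aggregation applied to the field
        field: cpu_load                 # document field, as in avg(cpu_load)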

Training

This is the magic part: your model learns the behavior of its features from historical data. This operation is called training. It can be CPU intensive and data hungry; the more data, the longer the training. The training result is saved on the local filesystem, so it does not have to be repeated unless necessary. Training outputs the model accuracy as a percentage: the higher the accuracy, the more suitable the model is for production use.

Inference

After training, you can perform inference: your model applies what it learned during training to brand new data. For example, with time series data, running inference means your model predicts future data points based on present and past data. If your features are avg(cpu_temperature) and max(cpu_load), and your bucket_interval is 60s, the model will predict the temperature and the load for the next minute.
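
Continuing that example, the corresponding feature section of the model could be sketched as below (same caveat: the key names are assumptions); each predicted bucket then carries one value per feature, here the average temperature and the maximum load over the next 60 seconds.

    # Illustrative features matching avg(cpu_temperature) and max(cpu_load),
    # aggregated over 60-second buckets. Key names are assumptions.
    bucket_interval: 60s
    features:
      - name: avg_cpu_temperature
        metric: avg
        field: cpu_temperature
      - name: max_cpu_load
        metric: max
        field: cpu_load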

You can run inference on both historical data (usually to verify the model accuracy) and on present data.