Data Transformations

Binning

The binning transformation divides the value range of a field into intervals and counts the number of values that fall within each interval. Here is an example of applying the transformation:

    let t = datatable.transform("bin", ["col1"]);

The returned result t is a DataTable in which each tuple (i.e., row) represents one bin (interval). It has three fields (i.e., columns): “x0” (the lower bound of the bin, inclusive), “x1” (the upper bound of the bin, exclusive except for the last bin, whose upper bound is inclusive), and “col1_count” (the number of values in the bin).

Parameter   Required?   Explanation         Default value
binWidth    optional    width of each bin   computed using Sturges' formula
min         optional    minimum bin value   minimum field value
max         optional    maximum bin value   maximum field value
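To make the bin boundaries concrete, here is a minimal standalone sketch of the same computation on a plain array of row objects. The function binField and its options object are hypothetical illustrations, not part of the DataTable API. The sketch assumes numeric values, a non-empty field, and Sturges' formula in its usual form, k = ceil(log2(n)) + 1 bins, when no binWidth is given.

```javascript
// Hypothetical helper (not the DataTable API): bins the numeric values of
// `field` in an array of row objects and returns one row per bin.
function binField(rows, field, { binWidth, min, max } = {}) {
  const values = rows.map((r) => r[field]).filter((v) => Number.isFinite(v));
  const lo = min ?? Math.min(...values);
  const hi = max ?? Math.max(...values);
  const span = hi - lo;

  let width;
  let binCount;
  if (binWidth !== undefined) {
    width = binWidth;
    binCount = Math.max(1, Math.ceil(span / width));
  } else {
    // Sturges' formula: k = ceil(log2(n)) + 1 bins spanning [lo, hi].
    binCount = span > 0 ? Math.ceil(Math.log2(values.length)) + 1 : 1;
    width = span > 0 ? span / binCount : 1;
  }

  const counts = new Array(binCount).fill(0);
  for (const v of values) {
    if (v < lo || v > hi) continue; // outside the requested [min, max] range
    // Bins are [x0, x1); clamping sends v === hi into the last bin,
    // whose upper bound is inclusive.
    const i = Math.min(Math.floor((v - lo) / width), binCount - 1);
    counts[i] += 1;
  }

  return counts.map((count, i) => ({
    x0: lo + i * width,
    x1: lo + (i + 1) * width,
    [`${field}_count`]: count,
  }));
}
```

For example, with the six values 1, 2, 2, 3, 5, 5, Sturges' formula gives ceil(log2(6)) + 1 = 4 bins of width 1, and both 5s land in the last bin because its upper bound is inclusive.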

Filtering

The filtering transformation removes the tuples in a data table that do not satisfy user-defined criteria, which are specified as an array of predicates. Here is an example of applying the transformation:

    let t = datatable.transform("filter", [{field: "col1", value: "value1"}]);

The returned result t is a DataTable, where the rows are a subset of the rows in the original data table.
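The exact semantics of a predicate are not spelled out here, so the following sketch assumes that a predicate {field, value} means "row[field] equals value" and that a row is kept only when it satisfies every predicate in the array (logical AND). filterRows is a hypothetical helper operating on a plain array of objects, not the library's implementation.

```javascript
// Hypothetical sketch (not the library's implementation). Assumes each
// predicate {field, value} tests row[field] === value, and that a row is
// kept only if it passes every predicate (logical AND).
function filterRows(rows, predicates) {
  // Array.prototype.filter returns a new array; `rows` itself is untouched.
  return rows.filter((row) =>
    predicates.every((p) => row[p.field] === p.value)
  );
}

// Example: keep only the rows whose "col1" equals "value1".
const table = [
  { col1: "value1", col2: 1 },
  { col1: "value2", col2: 2 },
  { col1: "value1", col2: 3 },
];
const kept = filterRows(table, [{ field: "col1", value: "value1" }]);
console.log(kept.map((r) => r.col2)); // [ 1, 3 ]
```

Under this assumption an empty predicate array keeps every row, since Array.prototype.every returns true for an empty array.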

Kernel Density Estimation

The KDE transformation estimates the probability density of a field using an Epanechnikov kernel:

    let t = datatable.transform("kde", ["col1"]);

The returned result t is a DataTable. Each tuple (i.e., row) has two fields (i.e., columns): “col1” (value samples from the input field), and “col1_density” (the estimated probability density for the value sample).

Parameter   Required?   Explanation           Default value
min         optional    minimum value         minimum field value
max         optional    maximum value         maximum field value
bandwidth   required    smoothing parameter   none
interval    required    width of bin          none
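The following sketch shows what an Epanechnikov KDE computes, under stated assumptions: the kernel is K(u) = 0.75(1 - u^2) for |u| <= 1 and 0 otherwise, the estimator is the standard f(x) = (1 / (n h)) sum_i K((x - x_i) / h) with h = bandwidth, and the density is sampled at min, min + interval, min + 2 interval, and so on up to max. kdeField is a hypothetical helper, and the library may place its sample points differently.

```javascript
// Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) for |u| <= 1, and 0 otherwise.
function epanechnikov(u) {
  return Math.abs(u) <= 1 ? 0.75 * (1 - u * u) : 0;
}

// Hypothetical helper (not the DataTable API). Evaluates
// f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h), with h = bandwidth,
// at x = min, min + interval, min + 2 * interval, ... while x <= max.
function kdeField(rows, field, { bandwidth, interval, min, max }) {
  const values = rows.map((r) => r[field]).filter((v) => Number.isFinite(v));
  const lo = min ?? Math.min(...values);
  const hi = max ?? Math.max(...values);
  const n = values.length;

  // Compute the sample count up front (with a small tolerance) instead of
  // repeatedly adding `interval`, which would accumulate rounding error.
  const steps = Math.floor((hi - lo) / interval + 1e-9);
  const result = [];
  for (let i = 0; i <= steps; i++) {
    const x = lo + i * interval;
    let sum = 0;
    for (const xi of values) sum += epanechnikov((x - xi) / bandwidth);
    result.push({ [field]: x, [`${field}_density`]: sum / (n * bandwidth) });
  }
  return result;
}
```

Because the result is a probability density, the densities multiplied by interval and summed should come out close to 1 whenever [min, max] covers the data plus one bandwidth on each side.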

Sorting

The sorting transformation orders the tuples in a data table by the values of the specified fields, in the order in which the fields are listed. Sorting is in ascending order by default. For example, the following code sorts the table rows first by “col1” and then by “col2”:

    datatable.transform("sort", ["col1", "col2"]);

Sorting modifies the original data table in place, and the call returns nothing (its return type is void).
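A multi-field ascending sort is commonly implemented with a comparator that compares the first field and consults later fields only to break ties. The sketch below, using the hypothetical name sortRows, does this on a plain array of objects in place and returns undefined, mirroring the behavior described above; it is not the library's code.

```javascript
// Hypothetical sketch (not the library's code): sorts `rows` in place in
// ascending order by `fields`, using later fields only to break ties.
function sortRows(rows, fields) {
  rows.sort((a, b) => {
    for (const f of fields) {
      if (a[f] < b[f]) return -1;
      if (a[f] > b[f]) return 1;
    }
    return 0; // equal on every field: keep original relative order (stable sort)
  });
  // No return statement: the caller observes the result through the
  // mutated array, just as with the in-place transformation above.
}

const sortExample = [
  { col1: 2, col2: "b" },
  { col1: 1, col2: "z" },
  { col1: 2, col2: "a" },
];
sortRows(sortExample, ["col1", "col2"]);
console.log(sortExample.map((r) => `${r.col1}${r.col2}`)); // [ '1z', '2a', '2b' ]
```

Array.prototype.sort has been required to be stable since ES2019, so rows that tie on every listed field keep their original relative order.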