CREATE MODEL
: This statement invokes Drifto’s autoML capabilities. CREATE MODEL creates
a new model and stores it in your model registry under the given name.
By default, Drifto uses an internal MLflow instance, but it can be
configured to use an external instance as well. As with regular create
statements, use CREATE OR REPLACE MODEL to overwrite an existing model
with the same name. You may also use this statement to copy a
pre-existing ML model (see examples below).
The SELECT subquery in the create statement should return the full
training dataset (held-out sets are automatically subsampled but can be
manually specified as well). Each row in the dataset corresponds to one
data point for ML training, and each column is a feature. The WITH clause
is optional, but the target column is expected to be called target.
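To make the expected dataset shape concrete, here is a minimal Python sketch of the idea, not Drifto's implementation. Each row is one data point, every column other than target is a feature, and a held-out set is subsampled at random. The helper names `split_holdout` and `features_and_target`, the 20% held-out fraction, and the toy column names are all assumptions chosen for illustration.

```python
import random


def split_holdout(rows, holdout_fraction=0.2, seed=0):
    """Randomly subsample a held-out set from the full training dataset.

    Hypothetical helper: the sampling strategy and fraction that Drifto
    actually uses are not specified in this document.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_holdout = int(len(rows) * holdout_fraction)
    holdout_idx = set(indices[:n_holdout])
    train = [r for i, r in enumerate(rows) if i not in holdout_idx]
    holdout = [r for i, r in enumerate(rows) if i in holdout_idx]
    return train, holdout


def features_and_target(row, target_column="target"):
    """Separate one row into its feature dict and its target value."""
    features = {k: v for k, v in row.items() if k != target_column}
    return features, row[target_column]


# Toy stand-in for the result of a SELECT subquery: one dict per row.
# The column names "age" and "visits" are illustrative only.
rows = [
    {"age": 20 + i, "visits": i % 5, "target": i % 2}
    for i in range(10)
]

train_rows, holdout_rows = split_holdout(rows)
first_features, first_target = features_and_target(rows[0])
```

With ten rows and a 20% fraction, eight rows remain for training and two are held out; no row appears in both sets.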
The ML training routine will train and validate a variety of ML architectures.
You can control this automated model search with argument flags.
Options = {'auto', 'gbm', 'linear', 'dnn', 'drf'}.
The 'auto' option selects an autoML routine that will try multiple different algorithms and hyperparameters and select the best one.
The 'gbm' option selects a gradient-boosting machine (GBM) algorithm.
The 'linear' option selects a generalized linear model (GLM).
The 'dnn' option selects a fully-connected feed-forward deep neural net.
The 'drf' option selects a distributed random forest (DRF) architecture that includes extreme random forests (XRF).
Options (Classification) = {'auc', 'accu', 'logloss'}.
Options (Regression) = {'mse', 'mae', 'r2'}.
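The metric names above correspond to standard definitions. As an illustration only, the sketch below computes a few of them in plain Python and then picks the best of several candidates by validation score, which is the general idea behind an automated model search. It assumes 'accu' abbreviates accuracy; the function names, candidate labels, and toy predictions are hypothetical, and none of this is Drifto's code.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy; p_pred holds predicted P(label == 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best(scores, higher_is_better):
    """Return the candidate name with the best validation score."""
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)


# Toy regression targets and predictions from two hypothetical candidates.
y_val = [1.0, 2.0, 3.0]
predictions = {
    "candidate_a": [1.0, 2.0, 4.0],
    "candidate_b": [1.5, 2.5, 3.5],
}

# Score every candidate with MSE, where lower is better.
val_scores = {name: mse(y_val, pred) for name, pred in predictions.items()}
best = select_best(val_scores, higher_is_better=False)
```

Whether higher or lower is better depends on the metric: accuracy, AUC, and R² are maximized, while log loss, MSE, and MAE are minimized, so the selection step needs to know the direction.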