Platform Description
GLOBEM provides three major modules and a few utility functions:
- Feature Preparation Module
- Model Computation Module
- Configuration Module
Each algorithm (`DepressionDetectionAlgorithmBase`, defined in `algorithm/base.py`) consists of these three modules to form a complete pipeline that leads to one (or more) machine learning models: from feature preparation (as model input) to model computation (to obtain model output), with parameters controlled by the configuration module.
Input
After dataset preparation (as explained on the Setup page), an initial input data point will be a standard (`feature matrix`, `label`) pair.
- `label`: the ground truth (currently a binary label) indicating a subject's self-reported depressive symptom status on a certain date.
- `feature matrix`: given the date of the `label`, the feature matrix includes the daily feature vectors from the past four weeks, with dimension `(28, # of features)`.
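For concreteness, here is a minimal sketch of such a data point in NumPy; the feature count used below is a hypothetical placeholder and depends on the dataset:

```python
import numpy as np

# A hypothetical data point: 28 daily feature vectors leading up to the
# label date. The feature count (10 here) is a placeholder.
num_features = 10
feature_matrix = np.random.rand(28, num_features)  # shape: (28, # of features)
label = 1  # binary ground truth: 1 = depressive symptoms reported, 0 = not

data_point = (feature_matrix, label)
print(data_point[0].shape, data_point[1])  # (28, 10) 1
```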
Feature Preparation Module
This module defines the features used by an algorithm as its input.
The function `DepressionDetectionAlgorithmBase.prep_data_repo` determines this process for an algorithm.
For traditional machine learning algorithms, this can be basic feature selection, aggregation, and filtering (e.g., mean, std) along the feature matrix's temporal dimension (e.g., Canzian et al., Saeb et al.), or complex feature extraction (e.g., Xu et al., Chikersal et al.).
For deep learning algorithms, this is a definition of a feature data feeding process (i.e., a data generator) that prepares data for deep model training (e.g., ERM).
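To illustrate the traditional machine learning case mentioned above, the sketch below shows a hypothetical aggregation step that collapses the temporal dimension into per-feature statistics; it is an example of the idea, not GLOBEM's actual `prep_data_repo` implementation:

```python
import numpy as np

def aggregate_features(feature_matrix: np.ndarray) -> np.ndarray:
    """Hypothetical basic aggregation: collapse the (28, # of features)
    temporal dimension into per-feature mean and std (in the spirit of
    Canzian et al. and Saeb et al.)."""
    means = feature_matrix.mean(axis=0)  # mean of each feature over 28 days
    stds = feature_matrix.std(axis=0)    # std of each feature over 28 days
    return np.concatenate([means, stds])

feature_matrix = np.random.rand(28, 10)
x = aggregate_features(feature_matrix)
print(x.shape)  # (20,) -- a flat vector ready for a traditional ML model
```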
Model Computation Module
This module defines the model construction and training process.
The function `DepressionDetectionAlgorithmBase.prep_model` determines the prediction model generated by the algorithm.
The `prep_model` function returns a `DepressionDetectionClassifierBase` object that specifies the model design, training, and prediction process.
For traditional machine learning algorithms, this can be some off-the-shelf model such as an SVM (e.g., Farhan et al.), or some customized statistical model (e.g., Lu et al.) that is ready to be trained with input data.
For deep learning algorithms, this is a definition of the deep model architecture and training process (e.g., IRM), which builds a deep model that is ready to be trained with input data.
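For intuition, a traditional machine learning classifier object in the spirit of `DepressionDetectionClassifierBase` might simply wrap an off-the-shelf model. The sketch below is a hypothetical illustration, not the real interface (which is defined in `algorithm/base.py`):

```python
import numpy as np
from sklearn.svm import SVC

class ExampleSVMClassifier:
    """Hypothetical stand-in for a DepressionDetectionClassifierBase
    subclass: bundles model design, training, and prediction."""

    def __init__(self):
        # Model design: an off-the-shelf SVM (cf. Farhan et al.)
        self.model = SVC(kernel="rbf")

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        # Training process
        self.model.fit(X, y)

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Prediction process
        return self.model.predict(X)

# Usage with aggregated features like those in the previous sketch:
X = np.random.rand(50, 20)            # 50 samples x 20 aggregated features
y = np.random.randint(0, 2, size=50)  # binary depression labels
clf = ExampleSVMClassifier()
clf.fit(X, y)
print(clf.predict(X[:5]))
```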
Multiple Models from One Algorithm
It is worth noting that one algorithm can define multiple models.
For example, ERM can use different deep learning architectures such as ERM-1D-CNN, ERM-2D-CNN, ERM-Transformer; DANN can take each dataset as a domain (DANN-dataset as domain), or each person as a domain (DANN-person as domain).
This is controlled by the config files and the `algorithm_factory`.
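A factory of this kind commonly maps a model's unique config name to the corresponding algorithm setup. The sketch below illustrates the general registry pattern using the config names from this page; it is an assumption for illustration, not the actual `algorithm_factory` code:

```python
# Hypothetical registry-style factory: the unique config file name
# selects which algorithm variant (and thus which model) gets built.
ALGORITHM_REGISTRY = {
    "ml_chikersal": lambda config: ("Chikersal et al.", config),
    "dl_dann_ds_as_domain": lambda config: ("DANN, dataset as domain", config),
    "dl_dann_person_as_domain": lambda config: ("DANN, person as domain", config),
}

def algorithm_factory_sketch(config_name: str, config: dict):
    """Return the algorithm setup matching a config file's unique name."""
    return ALGORITHM_REGISTRY[config_name](config)

print(algorithm_factory_sketch("dl_dann_person_as_domain", {}))
```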
The next page introduces how these two parts work together.
Configuration Module
This module provides the flexibility of controlling different parameters in the Feature Preparation Module and the Model Computation Module.
Each algorithm has its own unique parameters that can be added to this module.
The platform employs a simple `yaml` file system.
Each model (NOT algorithm) has its own config `yaml` file in the `config` folder with a unique file name.
For example, Chikersal et al. can have one model, and its config file is `config/ml_chikersal.yaml`;
DANN can have two models, so it has two config files: `config/dl_dann_ds_as_domain.yaml` and `config/dl_dann_person_as_domain.yaml`, respectively.
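For instance, a model's config file can be loaded and inspected with standard PyYAML; the snippet below is a generic sketch (the keys inside each file are specific to its algorithm):

```python
import yaml

# Generic sketch: load one model's config by its unique file name.
# (Run from the repository root so the relative path resolves.)
with open("config/ml_chikersal.yaml") as f:
    config = yaml.safe_load(f)

# The resulting dict carries the parameters consumed by the Feature
# Preparation Module and the Model Computation Module for this model.
print(config)
```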