# How to Extend GLOBEM

GLOBEM also provides flexible ways for researchers and developers to add new algorithms, new datasets, and new modeling targets.
## How to add a new algorithm
The platform supports researchers in developing their own algorithms easily. Reading through Platform Description before implementing new algorithms is strongly recommended.
A new algorithm just needs to extend the abstract class `DepressionDetectionAlgorithmBase` and follow these steps:

- Define the function `prep_data_repo` (as the feature preparation module). It takes a `DatasetDict` as input and returns a `DataRepo` object (see the definition here), a simple data object that saves `X`, `y`, and `pids` (participant ids). It can be used for preparing both training and testing sets.
- Define the function `prep_model` (as the model computation module). It returns a `DepressionDetectionClassifierBase` object (see the definition here), which needs to support `fit` (model training), `predict` (model prediction), and `predict_proba` (model prediction with probability distribution).
- Add a configuration file in `config` (as the configuration module). At least one yaml file with a unique name needs to be put in the `config` folder. The config file contains controllable parameters that can be adjusted manually. Please refer to `config/README.md` for more details.
- Register the new algorithm in `algorithm/algorithm_factory.py` by adding the appropriate class import and if-else logic.
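The steps above can be sketched as follows. This is a minimal, self-contained illustration: the stand-in base classes, the `DataRepo` container, and the `{pid: (features, label)}` dataset shape only mimic the interfaces described above (the real definitions live in the `algorithm` and `data` modules), and `MyAlgorithm`/`MyClassifier` are hypothetical names.

```python
from dataclasses import dataclass

# Stand-ins that mimic the interfaces described above; the real classes
# live in the GLOBEM codebase (algorithm/ and data/ modules).
@dataclass
class DataRepo:
    X: list      # feature vectors
    y: list      # labels
    pids: list   # participant ids

class DepressionDetectionClassifierBase:
    def fit(self, X, y): ...
    def predict(self, X): ...
    def predict_proba(self, X): ...

class DepressionDetectionAlgorithmBase:
    def prep_data_repo(self, dataset_dict): ...
    def prep_model(self): ...

# Hypothetical classifier: predicts the majority label seen during fit().
class MyClassifier(DepressionDetectionClassifierBase):
    def fit(self, X, y):
        self.majority = max(set(y), key=y.count)
    def predict(self, X):
        return [self.majority for _ in X]
    def predict_proba(self, X):
        return [[0.0, 1.0] if self.majority else [1.0, 0.0] for _ in X]

# Hypothetical algorithm wiring the two modules together.
class MyAlgorithm(DepressionDetectionAlgorithmBase):
    def prep_data_repo(self, dataset_dict):
        # Flatten an assumed {pid: (features, label)} mapping into X, y, pids.
        X, y, pids = [], [], []
        for pid, (features, label) in dataset_dict.items():
            X.append(features); y.append(label); pids.append(pid)
        return DataRepo(X=X, y=y, pids=pids)
    def prep_model(self):
        return MyClassifier()

algo = MyAlgorithm()
repo = algo.prep_data_repo({"p1": ([0.1, 0.2], True),
                            "p2": ([0.3, 0.4], False),
                            "p3": ([0.2, 0.1], True)})
model = algo.prep_model()
model.fit(repo.X, repo.y)
print(model.predict(repo.X))  # [True, True, True]
```

After this, the remaining work is the yaml config file and the factory registration.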
The platform further provides two templates for easier implementation of common traditional ML and DL algorithms.
### How to add an ML algorithm
We provide a basic traditional machine learning algorithm `DepressionDetectionAlgorithm_ML_basic` that extends `DepressionDetectionAlgorithmBase`.
Its `prep_data_repo` function

- takes the feature vector on the same day as the collected label,
- performs feature normalization,
- filters empty features and days with a large amount of missing data,
- imputes the rest of the missing data using the median,
- puts the data into a `DataRepo` and returns it.

Its `prep_model` function is left empty for custom implementation.
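As a rough illustration of those preparation steps (not the actual GLOBEM implementation; the missing-data threshold and step ordering are assumptions), the filtering, median imputation, and normalization could look like:

```python
import numpy as np

def prep_features(X, max_missing_ratio=0.5):
    """Sketch of the ML feature preparation steps: drop mostly-missing days,
    impute remaining NaNs with the per-feature median, then z-normalize.
    The 0.5 threshold is illustrative, not GLOBEM's actual choice."""
    X = np.asarray(X, dtype=float)
    # Filter out days (rows) with a large amount of missing data.
    keep = np.isnan(X).mean(axis=1) <= max_missing_ratio
    X = X[keep]
    # Impute the rest of the missing data using the per-feature median.
    medians = np.nanmedian(X, axis=0)
    X = np.where(np.isnan(X), medians, X)
    # Normalize each feature to zero mean and unit variance.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - X.mean(axis=0)) / std, keep

X = [[1.0, np.nan], [2.0, 4.0], [np.nan, np.nan], [3.0, 8.0]]
Xn, keep = prep_features(X)
print(keep)      # [ True  True False  True]
print(Xn.shape)  # (3, 2)
```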
This object can serve as a starting point, and other traditional ML algorithms can extend `DepressionDetectionAlgorithm_ML_basic`. For example, the implementation of Saeb et al.'s algorithm can be found at `algorithm/ml_saeb.py` and `config/ml_saeb.yaml`.
### How to add a DL algorithm
We use ERM (`algorithm/dl_erm.py`) as the basic deep learning algorithm `DepressionDetectionAlgorithm_DL_erm` that extends `DepressionDetectionAlgorithmBase`.
Its `prep_data_repo` function

- prepares a set of `MultiSourceDataGenerator` data loaders as the training & validation or testing set,
- puts them into a `DataRepo` and returns it.
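For intuition, a multi-source batch generator in the spirit of `MultiSourceDataGenerator` could look like the simplified stand-in below (class name, constructor, and the `{source_name: (X, y)}` input shape are all assumptions, not the real class):

```python
import numpy as np

# Simplified stand-in for a multi-source data loader: yields batches that mix
# samples drawn from several source datasets (e.g., one per institution/year).
class MultiSourceBatchGenerator:
    def __init__(self, sources, batch_size=4, seed=0):
        # sources: {source_name: (X, y)} with array-like features and labels
        self.sources = {k: (np.asarray(X), np.asarray(y))
                        for k, (X, y) in sources.items()}
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        names = list(self.sources)
        while True:
            xs, ys = [], []
            for _ in range(self.batch_size):
                # Pick a source at random, then a sample within it.
                name = names[self.rng.integers(len(names))]
                X, y = self.sources[name]
                i = self.rng.integers(len(y))
                xs.append(X[i]); ys.append(y[i])
            yield np.stack(xs), np.array(ys)

gen = iter(MultiSourceBatchGenerator({
    "ds_a": ([[0.0, 1.0]] * 5, [0] * 5),
    "ds_b": ([[1.0, 0.0]] * 5, [1] * 5),
}, batch_size=4))
xb, yb = next(gen)
print(xb.shape, yb.shape)  # (4, 2) (4,)
```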
Its `prep_model` function

- defines a standard deep-learning classifier `DepressionDetectionClassifier_DL_erm` that extends `DepressionDetectionClassifierBase`,
- defines how a deep model should be trained, saved, and evaluated.
The training setup is parameterized in config files such as `config/dl_erm_1dCNN.yaml`.
This algorithm can serve as a starting point, and other DL algorithms can extend `DepressionDetectionAlgorithm_DL_erm` and `DepressionDetectionClassifier_DL_erm`. For example, the implementation of the IRM algorithm can be found at `algorithm/dl_irm.py` and `config/dl_irm.yaml`.
For both traditional ML and DL algorithms, if the pre-implemented templates do not help, developers can also start from the plain `DepressionDetectionAlgorithmBase` and `DepressionDetectionClassifierBase`.
## How to add a new dataset

To include a new dataset in the pipeline, follow these steps:
- Define the name of the new dataset with the template `[group name]_[dataset NO in the group]`, e.g., `ABC_1`.
- Following the same structure as other dataset folders in `data_raw`, the new dataset folder (e.g., `ABC_1`) needs to contain three subfolders. Please refer to the GLOBEM Datasets page for more details:
  - `FeatureData`
    - A csv file `rapids.csv` indexed by `pid` and `date` for feature data, and separate files `[data_type].csv` indexed by `pid` and `date` for each data type.
      - Each row is a feature vector of a subject at a given date. Example columns: [`pid`, `date`, `feature1`, `feature2`...].
      - Columns include all sensor features of Phone Location, Phone Screen, Calls, Bluetooth, Fitbit Steps, and Fitbit Sleep from the RAPIDS toolkit.
  - `SurveyData`
    - csv files indexed by `pid` and `date` for label data.
      - For depression detection specifically, there are two files: `dep_weekly.csv` and `dep_endterm.csv`.
      - For other tasks, there are three files: `pre.csv`, `post.csv`, and `ema.csv`.
  - `ParticipantsInfoData`
    - A csv file `platform.csv` indexed by `pid` for the data collection device platform (i.e., iOS or Android).
      - Example columns of the file: [`pid`, `platform`].
- Register the new path in `data/data_factory.py` by adding new key-value pairs to the following dictionaries: `feature_folder`, `survey_folder`, and `device_info_folder` (e.g., adding `{"ABC": {1: ...}}`).
- Register the new dataset key in `config/global_config.yaml` into `global_config["all"]["ds_keys"]` (e.g., appending `"ABC_1"`).
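The registration step can be sketched as follows. The dictionaries are shown empty here for illustration (the real ones in `data/data_factory.py` already contain the existing datasets), and the folder paths are assumptions following the `data_raw/[group name]_[NO]/` layout described above:

```python
# Sketch of registering dataset group "ABC", number 1, in the three
# dictionaries of data/data_factory.py: group name -> dataset NO -> path.
feature_folder: dict = {}
survey_folder: dict = {}
device_info_folder: dict = {}

feature_folder.setdefault("ABC", {})[1] = "data_raw/ABC_1/FeatureData"
survey_folder.setdefault("ABC", {})[1] = "data_raw/ABC_1/SurveyData"
device_info_folder.setdefault("ABC", {})[1] = "data_raw/ABC_1/ParticipantsInfoData"

# A dataset key such as "ABC_1" (as appended to ds_keys) splits back
# into the group name and dataset number used for the lookup.
group, num = "ABC_1".rsplit("_", 1)
print(feature_folder[group][int(num)])  # data_raw/ABC_1/FeatureData
```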
## How to add a new modeling target

Our current platform only supports binary classification tasks; future work is needed to extend it to multi-class classification and regression tasks. To build a model for a new target other than depression detection, please follow these steps:
- Pick a column in either `ema.csv` or `post.csv` (see `data_raw/README.md` for more details) as the target name.
  - Note that the picked column needs to be consistent across all datasets defined in `config/global_config.yaml`. A column in `pre.csv` would also work as long as the date can be handled correctly. Here `UCLA_10items_POST` from `post.csv`, a metric measuring loneliness, is used as an example.
- Define the binary label for the target in `data/data_factory.py`'s `threshold_book`.
  - A simple threshold-based method is used: add a `key:value` pair to the `threshold_book`, where `key` is the target name and `value` is a dictionary `{"threshold_as_false": th1, "threshold_as_true": th2}` (note that `th1` is different from `th2`).
  - For example, for `UCLA_10items_POST`, scores <= 24 will be defined as `False`, and scores > 24 will be `True`. This corresponds to adding the following `key:value` pair to the `threshold_book`: `"UCLA_10items_POST": {"threshold_as_false": 24, "threshold_as_true": 25}`.
- Define it in `config/global_config.yaml` to involve it in the pipeline.
  - Replace `global_config["all"]["prediction_tasks"]` with `[the new target]`. Continuing the example, it will be `["UCLA_10items_POST"]`.
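The threshold logic above can be sketched with the `UCLA_10items_POST` example entry; the `binarize` helper is hypothetical (the actual lookup lives in `data/data_factory.py`):

```python
# Sketch of the threshold_book lookup: scores <= threshold_as_false map to
# False and scores >= threshold_as_true map to True. The two thresholds must
# differ; if they are not adjacent, scores in the gap are left undefined
# here and returned as None.
threshold_book = {
    "UCLA_10items_POST": {"threshold_as_false": 24, "threshold_as_true": 25},
}

def binarize(target: str, score: float):
    th = threshold_book[target]
    if score <= th["threshold_as_false"]:
        return False
    if score >= th["threshold_as_true"]:
        return True
    return None  # undefined gap between the two thresholds

print([binarize("UCLA_10items_POST", s) for s in (20, 24, 25, 30)])
# [False, False, True, True]
```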