How to Run an Algorithm
Command Template
The template command for running an algorithm:
```bash
conda activate globem
python evaluation/model_train_eval.py \
    --config_name=[a model config file name] \
    --pred_target=[prediction target] \
    --eval_task=[evaluation task]
```
- `config_name` corresponds to the specific model to evaluate. Please refer to the benchmark page for the list of algorithms, and the codebase for the config files.
- `pred_target` supports the depression detection task:
  - `dep_weekly`: weekly depression status prediction
  - `dep_endterm`: end-term depression status prediction
- `eval_task` supports a few single- or multi-dataset evaluation setups, including:
  - `single`: 5-fold cross-user validation on a single dataset
  - `single_within_user`: within-user training/testing on a single dataset (the first 80% of data for training and the remaining 20% for testing)
  - `allbutone`: leave-one-dataset-out setup
  - `crosscovid`: pre/post-COVID setup; only supports certain datasets that cover years before and after COVID
  - `crossgroup`: cross-institute or cross-year evaluation
  - `two_overlap`: train/test on overlapping users between two datasets
  - `all`: run all evaluation setups
Examples
An example of running Chikersal et al.'s algorithm on each dataset to predict weekly depression:
```bash
python evaluation/model_train_eval.py \
    --config_name=ml_chikersal \
    --pred_target=dep_weekly \
    --eval_task=single_within_user
```
An example of running the Reorder algorithm with a leave-one-dataset-out setup to predict weekly depression (note that deep learning algorithms are not compatible with the end-term prediction task due to the limited data size):
```bash
python evaluation/model_train_eval.py \
    --config_name=dl_reorder \
    --pred_target=dep_weekly \
    --eval_task=allbutone
```
The model training and evaluation results will be saved in the `evaluation_output` folder under the corresponding path. Reading the results is straightforward:
```python
import pickle
import pandas

# Results of Chikersal et al.'s algorithm on the within-user task
with open("evaluation_output/evaluation_single_dataset_within_user/dep_weekly/ml_chikersal.pkl", "rb") as f:
    evaluation_results = pickle.load(f)
df = pandas.DataFrame(evaluation_results["results_repo"]["dep_weekly"]).T
print(df[["test_balanced_acc", "test_roc_auc"]])

# Results of the Reorder algorithm on the leave-one-dataset-out task
with open("evaluation_output/evaluation_allbutone_datasets/dep_weekly/dl_reorder.pkl", "rb") as f:
    evaluation_results = pickle.load(f)
df = pandas.DataFrame(evaluation_results["results_repo"]["dep_weekly"]).T
print(df[["test_balanced_acc", "test_roc_auc"]])
```
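For a quick cross-dataset summary, a minimal sketch like the one below also works; it assumes (as in the tables later in this section) that `df` has one row per dataset and that the metric columns hold numeric values:

```python
# Minimal sketch (assumption: one row per dataset, numeric metric columns).
# Averages each metric across datasets for a one-line summary.
summary = df[["test_balanced_acc", "test_roc_auc"]].astype(float).mean()
print(summary)
```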
Please refer to the `analysis` folder in the GLOBEM codebase for more examples of results processing.
Code Breakdown
The two examples above are equivalent to the following code blocks, which show the process broken down step by step:
Chikersal et al.'s algorithm doing the single-dataset within-user evaluation task:
```python
import pandas
from data_loader import data_loader_ml
from utils import train_eval_pipeline
from algorithm import algorithm_factory

ds_keys = ["INS-W_1", "INS-W_2", "INS-W_3", "INS-W_4"]  # list of datasets to be included
pred_targets = ["dep_weekly"]  # list of prediction targets
config_name = "ml_chikersal"  # model config

# Load the data, prepare the algorithm, and run the evaluation pipeline
dataset_dict = data_loader_ml.data_loader(ds_keys_dict={pt: ds_keys for pt in pred_targets})
algorithm = algorithm_factory.load_algorithm(config_name=config_name)
evaluation_results = train_eval_pipeline.single_dataset_within_user_driver(
    dataset_dict, pred_targets, ds_keys, algorithm, verbose=0)

df = pandas.DataFrame(evaluation_results["results_repo"][pred_targets[0]]).T
print(df[["test_balanced_acc", "test_roc_auc"]])
```
| Model | Metric | INS-1 | INS-2 | INS-3 | INS-4 |
|---|---|---|---|---|---|
| Chikersal et al. | Balanced Accuracy | 0.656 | 0.611 | 0.641 | 0.690 |
| Chikersal et al. | ROC AUC | 0.726 | 0.679 | 0.695 | 0.763 |
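The CLI wrapper saves these results to disk automatically, as described earlier. When running the breakdown code directly, a sketch like the following, assuming the same output layout as the `.pkl` path used above, persists them so the reading snippet works unchanged:

```python
import os
import pickle

# Assumption: mirror the output layout used by the CLI wrapper above.
out_dir = "evaluation_output/evaluation_single_dataset_within_user/dep_weekly"
os.makedirs(out_dir, exist_ok=True)
with open(os.path.join(out_dir, "ml_chikersal.pkl"), "wb") as f:
    pickle.dump(evaluation_results, f)
```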
The Reorder algorithm doing the leave-one-dataset-out generalization task:
```python
import pandas
from data_loader import data_loader_dl
from utils import train_eval_pipeline
from algorithm import algorithm_factory

ds_keys = ["INS-W_1", "INS-W_2", "INS-W_3", "INS-W_4"]  # list of datasets to be included
pred_targets = ["dep_weekly"]  # list of prediction targets
config_name = "dl_reorder"  # model config

# Load the data, prepare the algorithm, and run the evaluation pipeline
dataset_dict = data_loader_dl.data_loader_dl_placeholder(pred_targets, ds_keys)
algorithm = algorithm_factory.load_algorithm(config_name=config_name)
evaluation_results = train_eval_pipeline.allbutone_datasets_driver(
    dataset_dict, pred_targets, ds_keys, algorithm, verbose=0)

df = pandas.DataFrame(evaluation_results["results_repo"][pred_targets[0]]).T
print(df[["test_balanced_acc", "test_roc_auc"]])
```
| Model | Metric | INS-1 | INS-2 | INS-3 | INS-4 |
|---|---|---|---|---|---|
| Reorder | Balanced Accuracy | 0.548 | 0.542 | 0.530 | 0.568 |
| Reorder | ROC AUC | 0.567 | 0.564 | 0.552 | 0.571 |
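To put the two setups side by side, a small sketch like this can help; `df_single` and `df_allbutone` are hypothetical names for the DataFrames produced by the two breakdown snippets above (rename them if you run both in one session):

```python
import pandas

# Hypothetical names: df_single and df_allbutone hold the results of the
# two breakdown snippets above, one row per dataset key.
comparison = pandas.concat(
    {"single_within_user": df_single[["test_balanced_acc", "test_roc_auc"]],
     "allbutone": df_allbutone[["test_balanced_acc", "test_roc_auc"]]},
    axis=1,
)
print(comparison.round(3))
```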
There are also some additional evaluation parameters that can be set through config files. Please refer to the `config` folder in the GLOBEM codebase.
From Sample Data to Full Data
By default, the algorithms run on sample data from the datasets. To switch to the full data, follow these simple steps:
- Access and download the complete GLOBEM datasets from the PhysioNet page.
- Unzip the downloaded data and put the datasets (each one is a unique folder) into the `data_raw` folder. Please refer to `data_raw/README.md` for more dataset details.
- Go to `config/global_config.py` and set `global_config["all"]["ds_keys"]` to use the right dataset folders (see the sketch below).
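As a concrete illustration, a minimal sketch of that last edit might look like the following; the dataset keys shown are an assumption and should match the folder names you placed under `data_raw`:

```python
# In config/global_config.py -- hypothetical values; use the folder names
# of the full datasets you downloaded into data_raw/.
global_config["all"]["ds_keys"] = ["INS-W_1", "INS-W_2", "INS-W_3", "INS-W_4"]
```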