Overview

To download the dataset, please visit our PhysioNet page.

To learn more about the data collection study, please visit our UW-EXP study page.

Dataset Introduction

The GLOBEM datasets are the first released multi-year mobile and wearable sensing datasets. They cover four years of data collection studies (2018-2021) conducted at the University of Washington and led by the UW-EXP Study Team.

The four yearly datasets are named as follows:

  • INS-W_1 (2018) - 155 participants
  • INS-W_2 (2019) - 218 participants
  • INS-W_3 (2020) - 137 participants
  • INS-W_4 (2021) - 195 participants
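
As a quick sanity check on the numbers above, the following minimal sketch sums the per-year counts. The dictionary layout and function names are illustrative assumptions, not part of the released data. Because some participants took part in more than one year, the total counts person-year records rather than unique individuals.

```python
# Participant counts per yearly dataset, copied from the list above.
# The dictionary layout is an illustrative assumption, not a released file format.
DATASETS = {
    "INS-W_1": {"year": 2018, "participants": 155},
    "INS-W_2": {"year": 2019, "participants": 218},
    "INS-W_3": {"year": 2020, "participants": 137},
    "INS-W_4": {"year": 2021, "participants": 195},
}


def total_person_years(datasets):
    """Sum the per-dataset counts (person-year records, not unique people)."""
    return sum(info["participants"] for info in datasets.values())


def largest_dataset(datasets):
    """Return the name of the dataset with the most participants."""
    return max(datasets, key=lambda name: datasets[name]["participants"])


print(total_person_years(DATASETS))  # 705
print(largest_dataset(DATASETS))     # INS-W_2
```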

Our datasets have a high representation of females (58.9%), immigrants (24.2%), first-generations (38.2%), and people with disabilities (9.1%). They also cover a wide range of races, with Asian (53.9%) and White (31.9%) being dominant (Hispanic/Latino 7.4%, Black/African American 3.3%).

We envision that our datasets can:

  • Serve as a reliable testbed for a fair comparison between behavior modeling algorithms.
  • Support multiple cross-dataset generalization tasks to evaluate an algorithm’s generalizability (e.g., different datasets across different years) and robustness (e.g., same users over multiple years).
  • Assist ML and Ubicomp researchers in developing more generalizable longitudinal behavior modeling algorithms.
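
The cross-dataset tasks mentioned above could be enumerated along the following lines. This is a minimal sketch of two common evaluation schemes (leave-one-dataset-out and pairwise transfer), assuming only the four dataset names. It is not necessarily the protocol used by the dataset authors, and the function names are hypothetical.

```python
from itertools import permutations

DATASET_NAMES = ["INS-W_1", "INS-W_2", "INS-W_3", "INS-W_4"]


def leave_one_dataset_out(names):
    """Yield (train_names, test_name): train on all datasets except one."""
    for held_out in names:
        train = [name for name in names if name != held_out]
        yield train, held_out


def pairwise_transfer(names):
    """Return every ordered (train_name, test_name) pair of distinct datasets."""
    return list(permutations(names, 2))


for train, test in leave_one_dataset_out(DATASET_NAMES):
    print(f"train on {train}, test on {test}")
```

Leave-one-dataset-out yields four splits, while pairwise transfer yields twelve ordered train/test pairs, so the second scheme separates, for example, training on 2018 and testing on 2021 from the reverse direction.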

Data Collection and Processing

Data Collection

The overall data collection procedure is shown in the following figure:


Overview of the data collection studies

We developed a mobile app using the AWARE framework that continuously collects location, phone usage (screen status), Bluetooth scans, and call logs. The app is compatible with both the iOS and Android platforms. Participants installed the app on their smartphones and left it running in the background. In addition, we provided Fitbits to collect their physical activity and sleep behavior. The mobile app and wearable passively collected sensor data 24×7 during the study. The average study length is 78 days per person per year across the four datasets.

Meanwhile, surveys were delivered to participants at the start and end of the study, as well as during it. These surveys cover a wide range of participants' life experiences, including personality, physical well-being, mental well-being, social justice, and substance usage.

Data Processing

Due to the sensitive nature of the dataset, we release our feature-level data with open credentialed access.

We utilize RAPIDS, an open-source platform that provides a Reproducible Analysis Pipeline for Data Streams. It supports feature extraction from data collected via multiple mobile and wearable devices with various time windows.
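
To illustrate what feature extraction over time windows means in general, here is a minimal sketch that counts screen unlocks per day and per part of day. It does not use RAPIDS or reproduce its implementation. The event format, the window boundaries, and all names are assumptions made for illustration, and the events are made-up values rather than GLOBEM data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw screen events as (ISO timestamp, status). Not real GLOBEM data.
EVENTS = [
    ("2020-03-02T08:15:00", "unlock"),
    ("2020-03-02T13:40:00", "unlock"),
    ("2020-03-02T22:05:00", "unlock"),
    ("2020-03-03T09:30:00", "unlock"),
]

# Hypothetical time windows, each covering the hours [start, end).
WINDOWS = {
    "night": (0, 6),
    "morning": (6, 12),
    "afternoon": (12, 18),
    "evening": (18, 24),
}


def window_of(hour):
    """Map an hour of the day (0-23) to the name of its time window."""
    for name, (start, end) in WINDOWS.items():
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")


def count_unlocks(events):
    """Count unlock events per (date, window) pair."""
    counts = defaultdict(int)
    for stamp, status in events:
        if status != "unlock":
            continue
        ts = datetime.fromisoformat(stamp)
        counts[(ts.date().isoformat(), window_of(ts.hour))] += 1
    return dict(counts)


for key, value in sorted(count_unlocks(EVENTS).items()):
    print(key, value)
```

Changing the window boundaries, or using a single whole-day window, yields the same feature at a different time granularity.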

Data Types

Each of our datasets includes three types of data: Feature Data, Survey Data, and Participant Info Data.

Survey Data

We collected survey data at multiple stages of the study, including

  • Pre/Post Surveys: collected at the start/end of the study
  • EMA Surveys: collected regularly during the study

Behavior Feature Data

Each year, our data collection study lasted three months and collected data from a mobile phone and a wearable fitness tracker 24×7. The behavior feature data types include

  • Location
  • PhoneUsage
  • Call
  • Bluetooth
  • PhysicalActivity
  • Sleep

Participant Info Data

We also collected additional information about participants, including the smartphone platform on which they installed the data collection app, as well as their demographics (e.g., age, gender, racial group).

Examples

The datasets capture various aspects of participants' life experiences, such as general behavior patterns, the weekly routine cycle, the impact of COVID (Year 3, 2020), and the gradual recovery after COVID (Year 4, 2021).


Timeseries data examples

Meanwhile, each year's dataset has its own unique characteristics. Here we show one example from each data type.


Distribution examples