Analyzing data

Downloading data

You can use the command npm run getdata (which runs the script scripts/download_data.mjs) to download your data from Firebase. If run with no arguments, the script will ask you five questions before downloading the data:

  • Whether you want to download testing data (generated by running your experiment in development mode) or real data (generated from the deployment of your experiment)
  • Whether you want to download all data, or only data from participants marked as complete (participants who reached the end of the experiment)
  • Which deployment branch you want to download data from (defaults to the current branch)
  • Whether you want to download participants' recruitment info, such as platform-specific ID numbers (defaults to False)
  • What file name (or full path) to save the data into

Alternatively, you can specify any number of these through command-line arguments:

npm run getdata -- --type <real|testing> --complete_only <all|complete_only> --branch_name <name> --filename <name-or-path>

The script will prompt you for any arguments not specified on the command line.

Analyzing data

Once you have a data file, you can load and analyze it in the programming language of your choice. The following examples all use Python. The data file is saved as JSON, so the built-in json module is the simplest way to load it:

python
import json

DATA_PATH = '...'

with open(DATA_PATH, 'r') as f:
  raw_data = json.load(f)

The loaded object is a list with one entry per participant. Each entry is a dictionary with two keys: id is the ID of the participant's record in the database, and data is a dictionary containing the participant's data. Many of the fields in data are populated automatically by Smile, alongside the data recorded by your experiment. The structure of the data should match the structure of the data attribute in your smilestore in smilestore.js.
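
For example, you can inspect the first participant's database ID and the top-level fields stored under data (a minimal sketch, assuming raw_data was loaded as above):

python
# Inspect the first participant's entry
first = raw_data[0]
print(first['id'])                   # database ID of this participant's record
print(sorted(first['data'].keys()))  # fields populated by Smile and by your experiment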

For a slightly nicer data loading experience (which loads data into Python objects with attributes, rather than just dictionaries), you can use the following extension of types.SimpleNamespace:

python
import json
from types import SimpleNamespace


class ExtendedSimpleNamespace(SimpleNamespace):
    # Minimal sketch of the extension: a SimpleNamespace that also
    # supports dict-style access; add further helpers as needed.
    def __getitem__(self, key):
        return getattr(self, key)


DATA_PATH = '...'

with open(DATA_PATH, 'r') as f:
    data = json.load(f, object_hook=lambda d: ExtendedSimpleNamespace(**d))

Data provenance

Smile tracks the provenance of each data file, recording information about the version of the code (including the git commit hash) that was running when the data was collected. This allows you to know exactly which version of the code was used to generate any given data file.

In the Firestore document, this information is stored in the smileConfig.github field and includes the repo name, owner, branch, last commit message, last commit hash, and the URL of the commit.

Because this information is stored alongside the data, you can link directly to the code that generated it, even as the code evolves during development.
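
As a rough sketch, you can read this provenance information straight from the exported JSON (assuming the raw_data list loaded with json above; the exact nesting of the smileConfig.github field in your export may differ slightly):

python
# Print the provenance info stored with each participant's record
for entry in raw_data:
    github_info = entry['data']['smileConfig']['github']  # repo, owner, branch, commit info
    print(entry['id'], github_info)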

Python Analysis Library (smiledata)

Smile includes a Python library called smiledata for analyzing exported experiment data. The library is located in analysis/lib/smiledata and uses Polars for fast DataFrame operations.

Installation

The library uses uv for dependency management. To set up:

bash
# Navigate to the analysis folder
cd analysis

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
uv sync

# For Jupyter notebook support
uv sync --extra jupyter

# For Marimo notebook support
uv sync --extra marimo

# For development (includes testing tools)
uv sync --extra dev

# Combine multiple extras
uv sync --extra jupyter --extra dev

# Install everything (jupyter, marimo, and dev tools)
uv sync --extra all

Quick Start

python
from smiledata import load_json

# Load exported data
data = load_json("data/my-experiment-2025-01-15.json")

# Get summary statistics
print(data.summary())
# {'total': 50, 'complete': 45, 'withdrawn': 2, 'incomplete': 3}

# Filter to complete participants only
complete = data.complete_only()

# Convert to Polars DataFrame for analysis
trials_df = complete.to_trials_df()

Loading Data

load_json(path)

Load a single JSON export file:

python
from smiledata import load_json

data = load_json("data/experiment-2025-01-15.json")

load_folder(folder, pattern="*.json")

Load all JSON files from a folder:

python
from smiledata import load_folder

# Load all JSON files
data = load_folder("data/")

# Load files matching a pattern
data = load_folder("data/", pattern="*-production-*.json")

load_latest(folder)

Load the most recently modified JSON file:

python
from smiledata.loader import load_latest

data = load_latest("data/")

Working with Datasets

The SmileDataset class provides methods for filtering and transforming participant data.

Filtering

python
# Filter to complete participants (consented, done, not withdrawn)
complete = data.complete_only()

# Filter by experimental condition
condition_a = data.by_condition(condition="A")

# Filter by multiple conditions
filtered = data.by_condition(condition="A", block_order="1")

# Filter by recruitment service
prolific_only = data.by_recruitment("prolific")

# Custom filter with lambda
many_trials = data.filter(lambda p: p.trial_count > 100)

# Chain filters
result = data.complete_only().by_condition(condition="A")

Statistics

python
# Counts
print(f"Total: {data.participant_count}")
print(f"Complete: {data.complete_count}")
print(f"Withdrawn: {data.withdrawn_count}")

# Full summary
summary = data.summary()
# {'total': 50, 'complete': 45, 'withdrawn': 2, 'incomplete': 3}

Converting to DataFrames

python
import polars as pl

# Participant-level DataFrame (one row per participant)
participants_df = data.to_participants_df()

# Trial-level DataFrame (one row per trial, all participants)
trials_df = data.to_trials_df()

# Without participant ID column
trials_df = data.to_trials_df(include_participant_id=False)

# Demographics DataFrame
demographics_df = data.demographics_df()

# Specific page data (from pageData_* fields)
quiz_df = data.to_page_data_df("instructionsQuiz")

Extracting Page Data into DataFrames

Smile uses route-based data recording, where each page/route in your experiment can record its own data. This data is stored in pageData_<routeName> fields. The smiledata library provides convenient methods to extract this data into DataFrames for analysis.

Discovering Available Pages

First, check what pages have recorded data:

python
# See all pages that have data across participants
print(data.available_pages())
# Output: ['consent', 'demograph', 'device', 'experiment', 'feedback', 'quiz']

Extracting Trial Data from a Page

Use to_trials_df(page=...) or to_page_data_df(page_name) to extract data from a specific page:

python
# Get trial data from the main experiment page
trials_df = data.to_trials_df(page="experiment")

# Or equivalently
trials_df = data.to_page_data_df("experiment")

# Get quiz attempts
quiz_df = data.to_page_data_df("quiz")

The resulting DataFrame includes automatic columns:

  • participant_id: The participant's Firebase document ID
  • visit: The visit number (0-indexed) if the participant visited the page multiple times
  • index: The index of this entry within the visit (for multiple records per visit)
  • timestamp: When the data was recorded (if available)

Plus all the fields you recorded in each data entry.
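
For instance, you can use these automatic columns to check how much data each participant recorded on a page, or to restrict analysis to the first visit (a minimal Polars sketch, assuming the trials_df extracted above):

python
import polars as pl

# How many entries did each participant record on this page?
counts = trials_df.group_by("participant_id").agg(pl.len().alias("n_entries"))

# Keep only data from each participant's first visit to the page
first_visit_only = trials_df.filter(pl.col("visit") == 0)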

Understanding the Visit Structure

When a participant visits a page multiple times (e.g., going back and forth between routes), data is organized by visit:

python
# Access raw page data structure for a participant
page_data = participant.get_page_data("experiment")
# {
#   "visit_0": {"data": [...], "timestamps": [...]},
#   "visit_1": {"data": [...], "timestamps": [...]}
# }

# Get all entries flattened into a list
entries = participant.get_page_data_entries("experiment")

# Get just the last submission (most recent)
last_entry = participant.get_form("experiment")

# Get a specific visit
first_visit = participant.get_form("experiment", visit=0)

# Get all visits as a list
all_visits = participant.get_form("experiment", all_visits=True)

Organizing Data for DataFrame Extraction

Important

For to_page_data_df() and to_trials_df(page=...) to work correctly, the data you record must be organized consistently.

Consistent Field Names

Each data entry recorded on a page should have the same field names across all entries and all participants. This ensures the resulting DataFrame has consistent columns:

javascript
// Good: Consistent field names across all trials
recordPageData({
  trial_num: 1,
  stimulus: 'imageA.png',
  response: 'left',
  rt: 523,
  correct: true,
})

recordPageData({
  trial_num: 2,
  stimulus: 'imageB.png',
  response: 'right',
  rt: 612,
  correct: false,
})

javascript
// Bad: Inconsistent field names (will create sparse columns)
recordPageData({
  trialNumber: 1, // Different name
  stim: 'imageA.png', // Different name
  response: 'left',
  reactionTime: 523, // Different name
})

recordPageData({
  trial_num: 2,
  stimulus: 'imageB.png',
  response: 'right',
  rt: 612,
})

Consistent Data Types

Keep data types consistent for each field:

javascript
// Good: rt is always a number
recordPageData({ rt: 523 })
recordPageData({ rt: 612 })

// Bad: rt is sometimes a number, sometimes a string
recordPageData({ rt: 523 })
recordPageData({ rt: 'timeout' }) // Will cause issues

If you need to handle special cases like timeouts, use a consistent approach:

javascript
// Better: Use null for missing values, add a separate status field
recordPageData({ rt: null, status: 'timeout' })
recordPageData({ rt: 523, status: 'responded' })
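
On the analysis side, this convention makes it easy to separate responses from timeouts once the data is in a trials DataFrame (a sketch, assuming the rt and status fields recorded above):

python
import polars as pl

# Separate responses from timeouts using the status field recorded above
responded = trials_df.filter(pl.col("status") == "responded")
timeouts = trials_df.filter(pl.col("status") == "timeout")

# Mean RT over answered trials (null rt values are ignored by Polars)
mean_rt = responded.select(pl.col("rt").mean())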

Flat Structure Preferred

Keep your data entries flat (not deeply nested) for easier DataFrame conversion:

javascript
// Good: Flat structure
recordPageData({
  trial_num: 1,
  stimulus_type: 'congruent',
  stimulus_position: 'left',
  response: 'left',
  rt: 523,
})

// Avoid: Deeply nested structure
recordPageData({
  trial: {
    num: 1,
    stimulus: {
      type: 'congruent',
      position: 'left',
    },
  },
  response: {
    key: 'left',
    rt: 523,
  },
})

If you do use nested structures, you may need to flatten them during analysis:

python
from smiledata.transforms import flatten_nested

# Flatten nested dict
flat = flatten_nested({"a": {"b": 1, "c": 2}})
# Result: {"a.b": 1, "a.c": 2}

Working with Individual Participants

The Participant class wraps a single participant's data:

python
# Access participants by index
first = data[0]

# Iterate over participants
for participant in data:
    print(participant.id)

# Key properties
print(participant.id)           # Firebase document ID
print(participant.seed_id)      # UUID for random seeding
print(participant.consented)    # True/False
print(participant.withdrawn)    # True/False
print(participant.done)         # True/False
print(participant.is_complete)  # True if consented, done, and not withdrawn

# Access data fields
print(participant.demographics)  # demographicForm data (most recent)
print(participant.device_info)   # deviceForm data (most recent)
print(participant.feedback)      # feedbackForm data (most recent)
print(participant.quiz)          # quiz data (final attempt)
print(participant.conditions)    # Experimental conditions
print(participant.config)        # smileConfig
print(participant.study_data)    # studyData array (legacy)
print(participant.trial_count)   # Number of trials

# Data provenance (GitHub info)
print(participant.github_info)       # Full github provenance dict
print(participant.git_commit)        # Commit hash that generated this data
print(participant.git_branch)        # Branch name
print(participant.git_repo)          # Repository name
print(participant.git_owner)         # Repository owner
print(participant.git_commit_url)    # URL to the commit on GitHub
print(participant.git_commit_message) # Commit message

# Access arbitrary fields
value = participant.get("customField", default=None)

Flexible Form Data Access

The get_form() method provides flexible access to form data with support for multiple visits (useful when participants can retake a quiz or revisit a form):

python
# Get the most recent submission (default)
demographics = participant.get_form("demograph")

# Get a specific visit (0-indexed)
first_quiz_attempt = participant.get_form("quiz", visit=0)
second_quiz_attempt = participant.get_form("quiz", visit=1)

# Get all attempts as a list (with visit/index metadata)
all_quiz_attempts = participant.get_form("quiz", all_visits=True)

Page Data Access

Access raw page data or flattened entries:

python
# Raw page data with visit structure
page_data = participant.get_page_data("experiment")
# Returns: {"visit_0": {"data": [...], "timestamps": [...]}, ...}

# All entries flattened into a list
entries = participant.get_page_data_entries("experiment")
# Returns: [{"visit": 0, "index": 0, "timestamp": ..., ...}, ...]

# Get all pageData_* fields
all_page_data = participant.get_all_page_data()

Route Order and Timing

See how participants navigated through the experiment:

python
# Text summary of routes and time spent
print(participant.route_order_summary())
# Output:
# Route Order:
#    1. consent                        2.5s
#    2. instructions                  15.3s
#    3. experiment                   5m 23.1s
#    4. feedback                       45.2s
#    5. thanks                         current
#   Total:                           6m 26.1s

# Visual subway-style timeline
ax = participant.plot_route_order()

# With explicit dark mode
ax = participant.plot_route_order(mode="dark")

Converting Individual Data to DataFrame

python
# Convert individual's study_data to DataFrame (legacy format)
trials_df = participant.study_data_to_polars()

Analyzing Trial Data

python
import polars as pl
from smiledata import load_json

# Load and filter data
data = load_json("data/experiment.json")
complete = data.complete_only()

# Get trials DataFrame
trials = complete.to_trials_df()

# Basic statistics with Polars
mean_rt = trials.group_by("participant_id").agg(
    pl.col("rt").mean().alias("mean_rt"),
    pl.col("correct").mean().alias("accuracy")
)

# Filter trials
correct_trials = trials.filter(pl.col("correct") == 1)

# Analyze by condition
by_condition = trials.group_by("condition").agg(
    pl.col("rt").mean().alias("mean_rt"),
    pl.col("rt").std().alias("std_rt"),
    pl.col("correct").mean().alias("accuracy")
)

Plotting

The library includes built-in plotting functions using seaborn and matplotlib. All plots support dark mode and automatically adapt their colors to the current theme.

Dark Mode Support

Plots automatically detect the current theme when used in marimo notebooks. You can also manually specify the theme:

python
from smiledata import detect_theme, get_theme_colors

# Auto-detect current theme (works in marimo)
theme = detect_theme()  # Returns "light" or "dark"

# Get theme-appropriate colors
colors = get_theme_colors()  # Auto-detect
colors = get_theme_colors("dark")  # Force dark mode

# Available color keys:
# - text_color: Primary text color
# - muted_color: Secondary/muted text
# - node_fill: Fill color for chart elements
# - edge_color: Border/edge colors
# - grid_color: Grid line color

All built-in plotting functions:

  • Make figure and axes backgrounds transparent
  • Automatically style text, ticks, and spines for the current theme
  • Work seamlessly in both light and dark notebook environments
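
If you build custom matplotlib plots alongside the built-in ones, you can reuse the same palette via get_theme_colors (a minimal sketch, assuming the color keys listed above and a trials_df like the one built earlier):

python
import matplotlib.pyplot as plt
from smiledata import get_theme_colors

colors = get_theme_colors()  # auto-detect light or dark theme

# Style a custom histogram with the same theme colors used by the built-in plots
fig, ax = plt.subplots()
ax.hist(trials_df["rt"].to_list(), bins=30, color=colors["node_fill"], edgecolor=colors["edge_color"])
ax.tick_params(colors=colors["text_color"])
ax.set_title("RT distribution", color=colors["text_color"])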

Available Plot Types

python
import matplotlib.pyplot as plt
from smiledata.plotting import (
    plot_completion_rate,
    plot_trial_rt_distribution,
    plot_accuracy_by_condition,
    plot_rt_by_condition,
    plot_participant_timeline,
)

plot_completion_rate(dataset) - Horizontal bar chart showing the number of complete, withdrawn, and incomplete participants.

python
ax = plot_completion_rate(data)

plot_trial_rt_distribution(trials_df, rt_column="rt") - Histogram showing the distribution of reaction times across all trials.

python
trials = data.complete_only().to_trials_df(page="experiment")
ax = plot_trial_rt_distribution(trials, rt_column="rt", bins=50)

plot_accuracy_by_condition(trials_df, condition_col) - Bar chart with standard error bars showing accuracy across experimental conditions.

python
ax = plot_accuracy_by_condition(trials, condition_col="condition", correct_col="correct")

plot_rt_by_condition(trials_df, condition_col) - Box plot showing reaction time distributions across experimental conditions.

python
ax = plot_rt_by_condition(trials, condition_col="condition", rt_column="rt")

plot_participant_timeline(dataset) - Line plot showing cumulative participant enrollment over time.

python
ax = plot_participant_timeline(data)

Participant Route Visualization

Individual participants have a method to visualize their route through the experiment:

python
participant = data[0]

# Text summary of routes and time spent
print(participant.route_order_summary())

# Subway-style visual timeline
ax = participant.plot_route_order()

# With explicit theme control
ax = participant.plot_route_order(mode="dark")

Composing Multi-Panel Figures

All plotting functions accept an optional ax parameter, allowing you to compose multiple plots into a single figure:

python
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
plot_trial_rt_distribution(trials, ax=axes[0])
plot_accuracy_by_condition(trials, condition_col="condition", ax=axes[1])
plt.tight_layout()

Using with Notebooks

Marimo

bash
uv run marimo edit notebooks/analysis.py

python
import marimo as mo
from smiledata import load_json

data = load_json("../data/experiment.json")
mo.md(f"Loaded {len(data)} participants")

Jupyter

bash
uv run jupyter lab

python
from smiledata import load_json

data = load_json("data/experiment.json")
data.summary()

Running Tests

You can run Python tests from the project root using npm:

bash
# Run Python tests
npm run test:python

# Run Python tests with coverage
npm run test:python:cov

# Run both JS and Python tests
npm run test:all

Or run directly from the analysis folder:

bash
cd analysis

# Run all tests
uv run pytest

# Run with coverage report
uv run pytest --cov=smiledata --cov-report=term-missing

# Run specific test file
uv run pytest tests/test_participant.py

Why Polars instead of Pandas?

The smiledata library uses Polars rather than Pandas for several reasons:

  1. Performance: Polars is significantly faster, especially for large datasets
  2. Memory efficiency: Uses Apache Arrow for zero-copy data access
  3. Modern API: Cleaner, more consistent syntax
  4. Type safety: Better type inference and validation

If you need Pandas compatibility, you can easily convert:

python
pandas_df = trials_df.to_pandas()
