Analyzing data
Downloading data
You can use the command `npm run getdata` (which runs the script `scripts/download_data.mjs`) to download your data from Firebase. If run with no arguments, the script will ask you four questions before downloading the data:
- Do you want to download testing data (generated by running your experiment in development mode) or real data (generated by the deployed experiment)?
- Do you want to download all data, or only data from participants marked as complete (those who reached the end of the experiment)?
- Which deployment branch do you want to download data from (defaults to the `main` branch)?
- What file name (or full path) should the data be saved to?
Alternatively, you can specify any number of these through command-line arguments:

```bash
npm run getdata -- --type <real|testing> --complete_only <all|complete_only> --branch_name <name> --filename <name-or-path>
```

The script will prompt you for any arguments not specified on the command line.
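If you prefer to trigger downloads from a Python script or notebook, one option is to assemble the same command and pass it to `subprocess.run`. The following is a minimal sketch. The helper names `getdata_command` and `download` are hypothetical, and only the flags come from the command above. Options left as `None` are omitted, so the script prompts for them interactively. The sketch assumes you run it from your Smile project's root directory, where `npm run getdata` is defined.

```python
import subprocess

def getdata_command(data_type=None, complete_only=None, branch_name=None, filename=None):
    """Build the argument list for `npm run getdata`, skipping unset options."""
    if data_type not in (None, 'real', 'testing'):
        raise ValueError("data_type must be 'real' or 'testing'")
    if complete_only not in (None, 'all', 'complete_only'):
        raise ValueError("complete_only must be 'all' or 'complete_only'")
    options = {
        '--type': data_type,
        '--complete_only': complete_only,
        '--branch_name': branch_name,
        '--filename': filename,
    }
    args = []
    for flag, value in options.items():
        if value is not None:
            args.extend([flag, value])
    # `--` tells npm to forward the remaining flags to the script itself
    return ['npm', 'run', 'getdata'] + (['--'] + args if args else [])

def download(**kwargs):
    """Run the download; any option not given is asked for at the prompt."""
    # check=True raises CalledProcessError if the script exits with an error
    subprocess.run(getdata_command(**kwargs), check=True)
```

For example, `download(data_type='real', complete_only='complete_only')` mirrors the command-line call above and leaves the branch and file name to be answered at the prompt. `subprocess.run` inherits your terminal's standard input, so those prompts still work.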
Analyzing data
Once you have a data file, you can load and analyze it in any programming language. The examples below use Python. Because the data file is saved as JSON, the standard-library `json` module is the simplest way to load it:
```python
import json

DATA_PATH = '...'  # path to the file saved by npm run getdata
with open(DATA_PATH, 'r') as f:
    raw_data = json.load(f)
```
The loaded object is a list in which each entry is a dictionary with two keys: `id`, the ID of the participant's data in the database, and `data`, a dictionary containing the participant's data. Many of its fields are populated automatically by 🫠 Smile, and it also contains the data your experiment records. The structure of `data` should match that of the `data` attribute of your `smilestore` in `smilestore.js`.
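The following minimal sketch shows how to work with this list-of-dictionaries structure. The helpers index participants by their database `id` and count how often each top-level field appears in `data`, which is a quick way to spot participants with missing fields. The function names and the tiny inline dataset (including its `done` and `trials` fields) are hypothetical and not part of Smile.

```python
import json
from collections import Counter

def index_by_participant(raw_data):
    """Map each database document ID to that participant's data dictionary."""
    return {entry['id']: entry['data'] for entry in raw_data}

def field_coverage(raw_data):
    """Count how many participants have each top-level key in their data."""
    counts = Counter()
    for entry in raw_data:
        counts.update(entry['data'].keys())
    return counts

# Hypothetical stand-in for json.load(f); real files contain many more fields
raw_data = json.loads('[{"id": "p1", "data": {"done": true, "trials": []}}, '
                      '{"id": "p2", "data": {"done": false}}]')
by_id = index_by_participant(raw_data)
coverage = field_coverage(raw_data)
```

Here `coverage['trials']` is 1 while there are two participants, which flags `p2` as missing that field.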
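For a slightly nicer loading experience, you can load the data into Python objects with attributes rather than plain dictionaries. To do this, use an extension of `types.SimpleNamespace`, such as the following minimal one that also keeps dictionary-style access working:

```python
import json
from types import SimpleNamespace
class ExtendedSimpleNamespace(SimpleNamespace):
    """SimpleNamespace that also allows dict-style access, e.g. entry['data']."""
    def __getitem__(self, key):
        return getattr(self, key)

DATA_PATH = '...'
with open(DATA_PATH, 'r') as f:
    data = json.load(f, object_hook=lambda d: ExtendedSimpleNamespace(**d))
```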
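The object hook is applied to every JSON object as it is parsed. As a result, nested fields become attributes at every level, while JSON arrays stay Python lists. The following illustration uses plain `types.SimpleNamespace` on a made-up two-participant string; the `done` field is hypothetical.

```python
import json
from types import SimpleNamespace

# Hypothetical file contents; real Smile data has many more fields
text = '[{"id": "p1", "data": {"done": true}}, {"id": "p2", "data": {"done": false}}]'
records = json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

# Nested objects are reached with attribute access instead of ['key'] lookups
finished = [r.id for r in records if r.data.done]

# vars() turns a namespace back into a plain dict when another tool expects one
first_data = vars(records[0].data)
```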
Data provenance
Smile tracks the provenance of each data file. It records which version of your code generated the data, including the git commit hash at the time the experiment was run. This lets you determine exactly which version of the code produced any given data file.
In the Firestore document, this information is stored in the `smile_config.github` field. It includes the repository name, owner, branch, last commit message, last commit hash, and the URL of the commit.
Because this information is stored alongside the data, you can link directly to the code that generated it, even as that code evolves during development.
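The following minimal sketch groups participant IDs by the provenance record they carry, so you can check whether everyone in a file ran the same version of your code. It assumes each entry's `data` mirrors the Firestore document, which places the record at `entry['data']['smile_config']['github']`. Records are compared whole, so the sketch does not assume particular key names inside `github`. The function name `group_by_provenance` is hypothetical.

```python
import json
from collections import defaultdict

def group_by_provenance(raw_data):
    """Group participant IDs by the code version recorded in smile_config.github.

    Assumes the provenance record sits at entry['data']['smile_config']['github'].
    Participants without a record are grouped under the key 'null'.
    """
    groups = defaultdict(list)
    for entry in raw_data:
        github = entry['data'].get('smile_config', {}).get('github')
        # Serialize with sorted keys so identical records map to the same key
        key = json.dumps(github, sort_keys=True)
        groups[key].append(entry['id'])
    return dict(groups)
```

If the result has more than one key, the file mixes data from different code versions.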