Submission instructions

Dataframe format

The submission consists of a single file with the following columns:

participant_id
cohort
research_stage
predicted_age_at_research_stage

Here is an example of a valid prediction file:

import pandas as pd
pd.read_csv('examples/group_1__test_final.csv')

	participant_id	cohort	research_stage	predicted_age_at_research_stage
0	0	10k	00_00_visit	54.6
1	0	10k	02_00_visit	56.8
2	0	10k	01_00_call	55.8
3	1	10k	04_00_visit	49.2
4	1	10k	02_00_visit	47.8
5	1	10k	00_00_visit	45.2
6	1	10k	01_00_call	46.9
7	1	10k	03_00_call	48.7
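A table with this layout could be assembled from model outputs roughly as follows. This is a minimal sketch: the name `rows` and the values in it are illustrative, not real predictions.

```python
import pandas as pd

# Hypothetical predictions: (participant_id, cohort, research_stage, predicted age).
rows = [
    (0, "10k", "00_00_visit", 54.6),
    (0, "10k", "02_00_visit", 56.8),
    (1, "10k", "00_00_visit", 45.2),
]

columns = ["participant_id", "cohort", "research_stage", "predicted_age_at_research_stage"]
submission = pd.DataFrame(rows, columns=columns)

# Each (participant_id, cohort, research_stage) combination may appear only once.
assert not submission.duplicated(subset=columns[:-1]).any()
```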

File naming

The file is saved as a CSV with the following name template:

group_{0}__{test01}.csv
Replace {0} with your group number and {test01} with the name of the current test set (ask the mentors).

If you are unsure what your group number is, you can run the following code:

from glob import glob
group = glob('/home/ec2-user/studies/group*')[0].split('/')[-1]

See also the example_notebooks/prediction_example.ipynb notebook, which includes relevant code for submission (including file naming).
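The naming template can also be applied in code. The sketch below is not taken from the example notebook: the helper `submission_file_name` is hypothetical, the group number and test set name are placeholders, and the file is written to a temporary folder for illustration only.

```python
import os
import tempfile

import pandas as pd


def submission_file_name(group_number: int, test_set: str) -> str:
    """Build a file name following the group_{0}__{test01}.csv template."""
    return f"group_{group_number}__{test_set}.csv"


# Illustrative values only; use your own group number and the current test set name.
file_name = submission_file_name(1, "test01")

# Tiny placeholder frame standing in for real predictions.
submission = pd.DataFrame(
    {
        "participant_id": [0],
        "cohort": ["10k"],
        "research_stage": ["00_00_visit"],
        "predicted_age_at_research_stage": [54.6],
    }
)

# index=False keeps the pandas row index out of the file,
# so it contains only the four required columns.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, file_name)
submission.to_csv(out_path, index=False)
```

The two underscores between the group and the test set name matter: the validation function splits the file name on them.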

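import os

import pandas as pd


def validate_y_pred(path_pred: str) -> pd.Series:
    """
    Validates the predictions file and returns its content as a pandas Series.

    :param path_pred: The path to the predictions file.

    Returns the y_pred pandas Series, indexed by participant_id, cohort and research_stage.
    """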
    required_levels = ['participant_id', 'cohort', 'research_stage', 'predicted_age_at_research_stage']
    assert os.path.exists(path_pred), 'The predictions file does not exist.'
    y_pred = pd.read_csv(path_pred)
    basename = os.path.basename(path_pred)
    group, subset, *_ = basename.split('.')[0].split("__")
    print(f'group: {group}\nsubset: {subset}')

    # Check that all required columns are present in the file
    if not set(required_levels).issubset(set(y_pred.columns)):
        raise ValueError(f"y_pred should have at least columns {required_levels}.")

    y_pred = y_pred.set_index(required_levels[:-1])[required_levels[-1]]

    if not isinstance(y_pred.index, pd.MultiIndex):
        raise ValueError("y_pred should have a MultiIndex.")

    # Check if the series has unique indices per row
    if not y_pred.index.is_unique:
        raise ValueError("y_pred should have unique indices per row.")

    # Check if the series has numerical dtype
    if not pd.api.types.is_numeric_dtype(y_pred.dtype):
        raise ValueError("The values in y_pred should have a numerical dtype.")

    # Check if the series has no missing values
    if y_pred.isna().any():
        raise ValueError("y_pred should not have missing values.")

    # Check if the series has numerical values between 0 and 200
    if not (y_pred >= 0).all() or not (y_pred <= 200).all():
        raise ValueError("The values in y_pred should be between 0 and 200.")

    return y_pred

Validating the output

After creating the file, validate it with the validate_y_pred function defined above. The function parses the group and subset from the file name and checks the file's content. Make sure that the printed group name and subset match your group and the current submission.

validate_y_pred('examples/group_1__test_final.csv')

group: group_1
subset: test_final

participant_id  cohort  research_stage
0               10k     00_00_visit       54.6
                        02_00_visit       56.8
                        01_00_call        55.8
1               10k     04_00_visit       49.2
                        02_00_visit       47.8
                        00_00_visit       45.2
                        01_00_call        46.9
                        03_00_call        48.7
Name: predicted_age_at_research_stage, dtype: float64
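If the validation raises the unique-index error, the offending rows can be listed before the file is regenerated. This is a minimal sketch on a made-up frame; the names `df` and `duplicates` are illustrative.

```python
import pandas as pd

key_columns = ["participant_id", "cohort", "research_stage"]

# Illustrative frame with one repeated (participant, cohort, stage) combination.
df = pd.DataFrame(
    {
        "participant_id": [0, 0, 1],
        "cohort": ["10k", "10k", "10k"],
        "research_stage": ["00_00_visit", "00_00_visit", "00_00_visit"],
        "predicted_age_at_research_stage": [54.6, 54.9, 45.2],
    }
)

# keep=False marks every row that takes part in a duplicate, not only the repeats.
duplicates = df[df.duplicated(subset=key_columns, keep=False)]
print(duplicates)
```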

Uploading

Once the file with the predictions on the test set has been created, submit it through the “egress” folder. This folder is located under the studies folder and can be found with the following code:

from glob import glob
egress_path = glob('/home/ec2-user/studies/*egress*')[0]

Make sure that the submission file is saved in this folder.
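If the file was first written elsewhere, it could be copied into the egress folder along these lines. This is a sketch only: `copy_to_egress` is a hypothetical helper, and it assumes that exactly one folder under the studies folder has "egress" in its name.

```python
import os
import shutil
from glob import glob


def copy_to_egress(path_pred: str, studies_dir: str = "/home/ec2-user/studies") -> str:
    """Copy a prediction file into the egress folder and return the new path."""
    matches = glob(os.path.join(studies_dir, "*egress*"))
    if not matches:
        raise FileNotFoundError(f"No egress folder found under {studies_dir}.")
    # Keep the original file name so the naming template is preserved.
    destination = os.path.join(matches[0], os.path.basename(path_pred))
    shutil.copy(path_pred, destination)
    return destination
```

Calling `copy_to_egress('group_1__test01.csv')` would then return the path of the copy inside the egress folder; the file name here is a placeholder.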

Finalizing the submission

Once the file is in the egress folder on the platform, complete the following steps:

Go to the Workspace page on the platform console.
Locate your user’s current workspace.
Press the Egress Store button.
Make sure that your submission file is listed.
Press Submit Egress Request. The hackathon team will receive an email notifying them of your submission.