Submission instructions

Dataframe format

The submission includes a single file that contains the following columns:

  • participant_id
  • cohort
  • research_stage
  • predicted_age_at_research_stage

Here is an example of a valid prediction file:

import pandas as pd
pd.read_csv('examples/group_1__test_final.csv')
   participant_id  cohort  research_stage  predicted_age_at_research_stage
0               0     10k     00_00_visit                             54.6
1               0     10k     02_00_visit                             56.8
2               0     10k      01_00_call                             55.8
3               1     10k     04_00_visit                             49.2
4               1     10k     02_00_visit                             47.8
5               1     10k     00_00_visit                             45.2
6               1     10k      01_00_call                             46.9
7               1     10k      03_00_call                             48.7
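
As a minimal sketch of how such a file could be assembled, the snippet below builds a DataFrame with the four required columns and serializes it to CSV text. The records are hypothetical placeholders copied from the example above, not real predictions, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical predictions: (participant_id, cohort, research_stage, predicted age).
# The values mirror a few rows of the example file and are for illustration only.
records = [
    (0, "10k", "00_00_visit", 54.6),
    (0, "10k", "02_00_visit", 56.8),
    (1, "10k", "04_00_visit", 49.2),
]

columns = [
    "participant_id",
    "cohort",
    "research_stage",
    "predicted_age_at_research_stage",
]
submission = pd.DataFrame(records, columns=columns)

# index=False keeps the row index out of the CSV, so the output holds
# only the four required columns.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a file path to to_csv instead of capturing the text writes the same content to disk.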

File naming

The file is saved as a CSV with the following name template:

group_{0}__{test01}.csv
Replace {0} with your group number and {test01} with the current test set name (ask the mentors).

If you are unsure what your group number is, you can run the following code:

from glob import glob

# Name of the first folder matching 'group*' under the studies folder
group = glob('/home/ec2-user/studies/group*')[0].split('/')[-1]

See also the example_notebooks/prediction_example.ipynb notebook that includes relevant code for submission (including file naming).
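
The name template can also be filled in programmatically. The sketch below assumes the group folder is named like group_1, as the example file name suggests. submission_filename is a hypothetical helper, not part of the provided notebooks.

```python
import os

def submission_filename(group_dir: str, subset: str) -> str:
    """Build '<group folder name>__<subset>.csv' (hypothetical helper).

    Assumes the group folder is named like 'group_1'.
    """
    group = os.path.basename(group_dir.rstrip("/"))
    return f"{group}__{subset}.csv"

# 'test01' stands in for the current test set name given by the mentors.
name = submission_filename("/home/ec2-user/studies/group_1", "test01")
print(name)  # group_1__test01.csv
```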

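import os
import pandas as pd

def validate_y_pred(path_pred: str) -> pd.Series:
    """
    Validates the predictions file and returns its content as a pandas Series.

    :param path_pred: The path to the predictions file.

    :return: The predictions as a pandas Series indexed by
        participant_id, cohort, and research_stage.
    """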
    required_levels = ['participant_id', 'cohort', 'research_stage', 'predicted_age_at_research_stage']
    assert os.path.exists(path_pred), 'The predictions file does not exist.'
    y_pred = pd.read_csv(path_pred)
    basename = os.path.basename(path_pred)
    group, subset, *_ = basename.split('.')[0].split("__")
    print(f'group: {group}\nsubset: {subset}')

    # Check that the file contains all required columns
    if not set(required_levels).issubset(set(y_pred.columns)):
        raise ValueError(f"y_pred should have at least columns {required_levels}.")

    y_pred = y_pred.set_index(required_levels[:-1])[required_levels[-1]]

    if not isinstance(y_pred.index, pd.MultiIndex):
        raise ValueError("y_pred should have a MultiIndex.")

    # Check if the series has unique indices per row
    if not y_pred.index.is_unique:
        raise ValueError("y_pred should have unique indices per row.")

    # Check if the series has numerical dtype
    if not pd.api.types.is_numeric_dtype(y_pred.dtype):
        raise ValueError("The values in y_pred should have a numerical dtype.")

    # Check if the series has no missing values
    if y_pred.isna().any():
        raise ValueError("y_pred should not have missing values.")

    # Check if the series has numerical values between 0 and 200
    if not (y_pred >= 0).all() or not (y_pred <= 200).all():
        raise ValueError("The values in y_pred should be between 0 and 200.")

    return y_pred

Validating the output

After creating the file, validate it with the validate_y_pred function defined above. It parses the group and subset from the file name and checks the file content. Make sure that the printed group name and subset match your group and the current submission.

validate_y_pred('examples/group_1__test_final.csv')
group: group_1
subset: test_final
participant_id  cohort  research_stage
0               10k     00_00_visit       54.6
                        02_00_visit       56.8
                        01_00_call        55.8
1               10k     04_00_visit       49.2
                        02_00_visit       47.8
                        00_00_visit       45.2
                        01_00_call        46.9
                        03_00_call        48.7
Name: predicted_age_at_research_stage, dtype: float64
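
validate_y_pred only prints the group and subset parsed from the file name. For a stricter check of the name, a sketch along the following lines could be used. The regular expression is an assumption derived from the name template, and parse_submission_name is a hypothetical helper.

```python
import os
import re
from typing import Tuple

# Assumed pattern derived from the template: 'group_<number>__<subset>.csv'.
NAME_PATTERN = re.compile(r"^group_(\d+)__(\w+)\.csv$")

def parse_submission_name(path: str) -> Tuple[str, str]:
    """Return (group number, subset) or raise if the name breaks the template."""
    basename = os.path.basename(path)
    match = NAME_PATTERN.match(basename)
    if match is None:
        raise ValueError(f"Unexpected file name: {basename}")
    return match.group(1), match.group(2)

group_number, subset = parse_submission_name("examples/group_1__test_final.csv")
print(group_number, subset)  # 1 test_final
```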

Uploading

Once the predictions file for the test set has been created, submit it through the “egress” folder. This folder is located under the studies folder and can be found with the following code:

egress_path = glob('/home/ec2-user/studies/*egress*')[0]

Make sure that the submission file is saved in this folder.
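
If the file was first written somewhere else, it can be copied into the egress folder. The sketch below is one possible way to do this with the standard library. copy_to_egress is a hypothetical helper, and the demo uses temporary folders in place of the real egress path.

```python
import os
import shutil
import tempfile

def copy_to_egress(path_pred: str, egress_path: str) -> str:
    """Copy the predictions file into the egress folder and return the new path."""
    if not os.path.isdir(egress_path):
        raise FileNotFoundError(f"Egress folder not found: {egress_path}")
    target = os.path.join(egress_path, os.path.basename(path_pred))
    shutil.copy(path_pred, target)
    return target

# Demo: temporary folders stand in for the working and egress folders.
with tempfile.TemporaryDirectory() as workdir:
    egress_dir = os.path.join(workdir, "egress")
    os.mkdir(egress_dir)
    source = os.path.join(workdir, "group_1__test01.csv")
    with open(source, "w") as handle:
        handle.write("participant_id,cohort,research_stage,predicted_age_at_research_stage\n")
    copied = copy_to_egress(source, egress_dir)
    copied_ok = os.path.isfile(copied)
    copied_name = os.path.basename(copied)
```

On the platform, egress_path from the snippet above would take the place of the temporary egress folder.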

Finalizing the submission

Once the file is in the egress folder, finalize the submission on the platform:

  1. Go to the Workspace page on the platform console.
  2. Locate your user’s current workspace.
  3. Press the Egress Store button.
  4. Make sure that your submission file is listed.
  5. Press Submit Egress Request. The hackathon team will receive an email notifying them of your submission.