The submission includes a single file that contains the following columns:
participant_id
cohort
research_stage
predicted_age_at_research_stage
Here is an example of a valid prediction file:
import pandas as pd
pd.read_csv('examples/group_1__test_final.csv')
   participant_id  cohort  research_stage  predicted_age_at_research_stage
0               0     10k     00_00_visit                             54.6
1               0     10k     02_00_visit                             56.8
2               0     10k      01_00_call                             55.8
3               1     10k     04_00_visit                             49.2
4               1     10k     02_00_visit                             47.8
5               1     10k     00_00_visit                             45.2
6               1     10k      01_00_call                             46.9
7               1     10k      03_00_call                             48.7
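As a minimal sketch of how such a file could be produced, the snippet below builds a small DataFrame with the four required columns and serializes it to CSV text. The rows are hypothetical values chosen for illustration, not real model output, and the in-memory buffer stands in for an actual file path.

```python
import io

import pandas as pd

# Hypothetical predictions, for illustration only (not real model output).
rows = [
    (0, "10k", "00_00_visit", 54.6),
    (0, "10k", "01_00_call", 55.8),
    (1, "10k", "00_00_visit", 45.2),
]
columns = [
    "participant_id",
    "cohort",
    "research_stage",
    "predicted_age_at_research_stage",
]
predictions = pd.DataFrame(rows, columns=columns)

# index=False keeps the pandas row index out of the output, so the CSV
# contains exactly the four required columns.
buffer = io.StringIO()
predictions.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To write an actual file, pass a path to `to_csv` instead of the buffer. Each combination of participant_id, cohort and research_stage should appear only once, since the validation function rejects duplicate index rows.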
File naming
The file is saved as a CSV with the following name template:
group_{0}__{test01}.csv
Replace {0} with your group number and {test01} with the name of the current test set (ask the mentors).
If you are unsure what your group number is you can run the following code:
from glob import glob
group = glob('/home/ec2-user/studies/group*')[0].split('/')[-1]
See also the example_notebooks/prediction_example.ipynb notebook that includes relevant code for submission (including file naming).
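The name template can be filled in with a small helper, sketched below. The function name `submission_filename` is hypothetical and not part of the course material. The group number 1 and test set name `test_final` are taken from the example file shown earlier; substitute your own values.

```python
def submission_filename(group_number: int, test_name: str) -> str:
    """Fill the group_{0}__{test01}.csv template (hypothetical helper)."""
    # Note the double underscore between the group part and the test set name.
    return f"group_{group_number}__{test_name}.csv"


filename = submission_filename(1, "test_final")
print(filename)
```

The double underscore matters: the validation function splits the file name on "__" to recover the group and the subset.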
import os

import pandas as pd


def validate_y_pred(path_pred: str) -> pd.Series:
    """
    Validates the predictions file and returns its content as a pandas Series.

    :path_pred: The path to the predictions file.

    Returns the y_pred pandas Series.
    """
    required_levels = ['participant_id', 'cohort', 'research_stage', 'predicted_age_at_research_stage']
    assert os.path.exists(path_pred), 'The predictions file does not exist.'
    y_pred = pd.read_csv(path_pred)

    # The file name encodes the group and the test subset, separated by "__"
    basename = os.path.basename(path_pred)
    group, subset, *_ = basename.split('.')[0].split("__")
    print(f'group: {group}\nsubset: {subset}')

    # Check if the file has all the required columns
    if not set(required_levels).issubset(set(y_pred.columns)):
        raise ValueError(f"y_pred should have at least columns {required_levels}.")

    # Index by the first three columns and keep the predictions as the values
    y_pred = y_pred.set_index(required_levels[:-1])[required_levels[-1]]
    if not isinstance(y_pred.index, pd.MultiIndex):
        raise ValueError("y_pred should have a MultiIndex.")

    # Check if the series has unique indices per row
    if not y_pred.index.is_unique:
        raise ValueError("y_pred should have unique indices per row.")

    # Check if the series has numerical dtype
    if not pd.api.types.is_numeric_dtype(y_pred.dtype):
        raise ValueError("The values in y_pred should have a numerical dtype.")

    # Check if the series has no missing values
    if y_pred.isna().any():
        raise ValueError("y_pred should not have missing values.")

    # Check if the series has numerical values between 0 and 200
    if not (y_pred >= 0).all() or not (y_pred <= 200).all():
        raise ValueError("The values in y_pred should be between 0 and 200.")

    return y_pred
Validating the output
After creating the file, validate it with the validate_y_pred function. It checks both the file name and the file content. Make sure that the printed group name and subset match your group and the current submission.
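To see what the printed group name and subset will look like, the file name parsing can be tried in isolation. The sketch below reuses the same split logic as validate_y_pred. The helper name `parse_submission_name` is hypothetical, and the path is the example file mentioned earlier (the file does not need to exist for this check).

```python
import os
from typing import Tuple


def parse_submission_name(path_pred: str) -> Tuple[str, str]:
    """Split a predictions file name into (group, subset) (hypothetical helper)."""
    basename = os.path.basename(path_pred)
    # Drop everything after the first dot, then split on the double underscore.
    group, subset, *_ = basename.split('.')[0].split("__")
    return group, subset


group, subset = parse_submission_name("examples/group_1__test_final.csv")
print(f"group: {group}\nsubset: {subset}")
```

A file name without a double underscore raises a ValueError at the unpacking step, and a test set name containing a dot is cut at the first dot, so keep both parts free of dots.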
Once the predictions file for the test set has been created, submit it through the “egress” folder. This folder is located under the studies folder and can be found with the following code: