Config

Configuration parameters and synthetic dataset creation

get_dictionary_properties_file_path

 get_dictionary_properties_file_path ()

*Get the file path for the dictionary properties file. At this point the file only includes field_type properties. TODO: move to a config file or DB.

Args: none.

Returns: str: the path to the file.*


get_data_coding_file_path

 get_data_coding_file_path ()

*Get the file path for the data coding file. TODO: move to a config file or DB.

Args: none.

Returns: str: the path to the file.*


generate_synthetic_data

 generate_synthetic_data (n:int=1000)

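*Generate a sample DataFrame containing research-stage date, age, sex, and numeric value data.

Args: n: The number of rows in the generated DataFrame.

Returns: A pandas DataFrame indexed by ‘participant_id’ with columns ‘date_of_research_stage’, ‘age_at_research_stage’, ‘sex’, ‘val1’, and ‘val2’.*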


generate_synthetic_data_like

 generate_synthetic_data_like (df:pandas.core.frame.DataFrame, n:int=1000,
                               random_seed:int=42)

*Generate a sample DataFrame containing the same columns as df, but with random data.

Args:

df: The DataFrame whose columns should be used.
n: The number of rows in the generated DataFrame.
random_seed: The seed for the random number generator.

Returns: A pandas DataFrame with the same columns as df.*


generate_categorical_synthetic_data

 generate_categorical_synthetic_data (n:int=1000)

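*Generate a sample DataFrame containing research-stage date, age, sex, and categorical value data.

Args: n: The number of rows in the generated DataFrame.

Returns: A pandas DataFrame indexed by ‘participant_id’ with columns ‘date_of_research_stage’, ‘age_at_research_stage’, ‘sex’, and the categorical columns ‘val1’ and ‘val2’.*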

data = generate_synthetic_data()
data.head()

                date_of_research_stage  age_at_research_stage  sex        val1       val2
participant_id
0                           2020-11-16              54.422828    1  103.721478  48.846734
1                           2021-06-08              65.232948    0  129.512280  54.583974
2                           2020-08-16              42.413863    1  114.878851  52.193946
3                           2021-04-13              57.872618    1  113.653117  51.826225
4                           2023-07-17              70.640233    1  129.669937  56.631272
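
The output above shows the shape of the generated frame: a `participant_id` index, a date column, a continuous age, a 0/1 `sex` code, and two numeric values. A minimal sketch of a generator that produces the same schema might look like the following. The function name `make_synthetic_frame`, the date window, the age range, and the linear relation between age and the values are all assumptions for illustration; this is not the library's implementation.

```python
import numpy as np
import pandas as pd


def make_synthetic_frame(n: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in, not the library's generate_synthetic_data."""
    rng = np.random.default_rng(seed)

    # Assumed: visit dates drawn uniformly from a four-year window.
    offsets = rng.integers(0, 4 * 365, size=n)
    dates = pd.Timestamp("2020-01-01") + pd.to_timedelta(offsets, unit="D")

    age = rng.uniform(35.0, 75.0, size=n)  # assumed age range
    sex = rng.integers(0, 2, size=n)       # 0/1 coding, as in the output above

    # Assumed: both values rise linearly with age, plus Gaussian noise.
    val1 = 80.0 + 0.6 * age + rng.normal(0.0, 8.0, size=n)
    val2 = 40.0 + 0.2 * age + rng.normal(0.0, 2.0, size=n)

    return pd.DataFrame(
        {
            "date_of_research_stage": dates,
            "age_at_research_stage": age,
            "sex": sex,
            "val1": val1,
            "val2": val2,
        },
        index=pd.RangeIndex(n, name="participant_id"),
    )


sample = make_synthetic_frame(n=5, seed=1)
print(sample)
```

Passing an explicit seed keeps the sketch reproducible, which is convenient when the synthetic frame feeds tests or documentation examples.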
generate_synthetic_data_like(data.head(), n=5)

                date_of_research_stage  age_at_research_stage  sex        val1       val2
participant_id
0                           2020-08-16              57.872618    1  113.653117  48.846734
1                           2021-04-13              65.232948    1  103.721478  56.631272
2                           2023-07-17              42.413863    1  129.669937  54.583974
3                           2021-06-08              54.422828    0  114.878851  52.193946
4                           2020-11-16              70.640233    1  129.512280  51.826225
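
In the output above, every column holds the same five values as the input, only in a different order, and each column is reordered independently of the others. That is consistent with per-column resampling, although the source does not confirm how the function works internally. A minimal sketch under that assumption follows; the name `resample_columns_like` and the rule of sampling with replacement only when `n` exceeds the input length are hypothetical, and the toy input uses rounded values.

```python
import numpy as np
import pandas as pd


def resample_columns_like(
    df: pd.DataFrame, n: int = 1000, random_seed: int = 42
) -> pd.DataFrame:
    """Hypothetical sketch: resample each column of df independently."""
    rng = np.random.default_rng(random_seed)

    # Assumed rule: a pure shuffle is only possible when n <= len(df);
    # otherwise fall back to sampling with replacement.
    replace = n > len(df)

    columns = {}
    for col in df.columns:
        # Draw fresh row positions per column, which breaks row-wise links.
        idx = rng.choice(len(df), size=n, replace=replace)
        columns[col] = df[col].to_numpy()[idx]

    return pd.DataFrame(columns, index=pd.RangeIndex(n, name=df.index.name))


# Toy input with rounded values, for illustration only.
source = pd.DataFrame(
    {"age": [54.4, 65.2, 42.4, 57.9, 70.6], "sex": [1, 0, 1, 1, 1]},
    index=pd.RangeIndex(5, name="participant_id"),
)
shuffled = resample_columns_like(source, n=5)
print(shuffled)
```

Resampling columns independently preserves each column's marginal distribution but discards any correlation between columns, so a frame built this way is suited to testing schemas and pipelines rather than statistical relationships.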
data = generate_categorical_synthetic_data()
data.head()

                date_of_research_stage  age_at_research_stage  sex  val1  val2
participant_id
0                           2021-09-24              69.788555    1     E     A
1                           2021-03-02              36.289947    1     C     B
2                           2022-06-15              61.501970    1     C     C
3                           2020-07-23              46.299262    0     B     A
4                           2021-03-03              70.127055    1     B     C
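
The categorical variant replaces the numeric values with letter codes. A minimal sketch of how such columns could be produced is shown below. The function name `make_categorical_frame`, the level sets `A` to `E` and `A` to `C`, the uniform sampling, and the use of the pandas `category` dtype are assumptions based only on the five rows printed above; the date column is omitted for brevity.

```python
import numpy as np
import pandas as pd


def make_categorical_frame(n: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in, not the library's implementation."""
    rng = np.random.default_rng(seed)

    levels1 = list("ABCDE")  # assumed levels for val1
    levels2 = list("ABC")    # assumed levels for val2

    return pd.DataFrame(
        {
            "age_at_research_stage": rng.uniform(35.0, 75.0, size=n),
            "sex": rng.integers(0, 2, size=n),
            # Declaring the categories keeps unseen levels in the dtype.
            "val1": pd.Categorical(rng.choice(levels1, size=n), categories=levels1),
            "val2": pd.Categorical(rng.choice(levels2, size=n), categories=levels2),
        },
        index=pd.RangeIndex(n, name="participant_id"),
    )


cat_sample = make_categorical_frame(n=5, seed=1)
print(cat_sample.dtypes)
```

Fixing the category list up front means that a small sample which happens to miss a level still reports the full set of levels, which keeps downstream group-bys and plots stable.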