get_dictionary_properties_file_path
get_dictionary_properties_file_path ()
*Get the file path for dictionary properties. TODO: move to a config file or DB. At this point the file only includes field_type properties.
Args: None.
Returns: str: the path to the file*
get_data_coding_file_path
get_data_coding_file_path ()
*Get the file path for the data coding file. TODO: move to a config file or DB.
Args: None.
Returns: str: the path to the file*
generate_synthetic_data
generate_synthetic_data (n:int=1000)
*Generates a sample DataFrame containing age, gender, and value data.
Args: n: The number of rows in the generated DataFrame.
Returns: A pandas DataFrame with columns ‘age’, ‘gender’, and ‘val’.*
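As a rough illustration of the kind of frame described above, here is a minimal sketch. It is not the library's implementation: the function name `sketch_synthetic_data`, the `seed` parameter, the value ranges, and the linear relation between `val` and `age` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def sketch_synthetic_data(n: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for generate_synthetic_data (assumed behaviour)."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(35, 73, size=n)        # assumed age range
    gender = rng.integers(0, 2, size=n)      # 0 or 1
    # Assumed relation: val grows with age, shifts with gender, plus noise.
    val = 50 + age + 5 * gender + rng.normal(0, 10, size=n)
    return pd.DataFrame(
        {"age": age, "gender": gender, "val": val},
        index=pd.RangeIndex(n, name="participant_id"),
    )


sample = sketch_synthetic_data(n=5)
print(sample)
```

Seeding the generator keeps the output stable across runs, which is convenient when the frame is used in tests or documentation.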
generate_synthetic_data_like
generate_synthetic_data_like (df:pandas.core.frame.DataFrame, n:int=1000, random_seed:int=42)
*Generate a sample DataFrame containing the same columns as df, but with random data.
Args:
df: The DataFrame whose columns should be used.
n: The number of rows in the generated DataFrame.
random_seed: The seed for the random number generator.
Returns: A pandas DataFrame with the same columns as df.*
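One way to produce a frame "like" another is to resample each column independently from its observed values. The sketch below assumes that approach; the source does not confirm how `generate_synthetic_data_like` works internally, and the name `sketch_data_like` is hypothetical. It samples with replacement, which is a choice made for this example.

```python
import numpy as np
import pandas as pd


def sketch_data_like(df: pd.DataFrame, n: int = 1000,
                     random_seed: int = 42) -> pd.DataFrame:
    """Hypothetical sketch: resample every column of df independently."""
    rng = np.random.default_rng(random_seed)
    columns = {}
    for name in df.columns:
        # Draw n values (with replacement) from this column's observed values.
        columns[name] = rng.choice(df[name].to_numpy(), size=n, replace=True)
    out = pd.DataFrame(columns)
    out.index.name = df.index.name  # keep the index label, e.g. participant_id
    return out


source_df = pd.DataFrame({"age": [54.4, 65.2, 42.4], "gender": [1, 0, 1]})
print(sketch_data_like(source_df, n=5))
```

Because columns are drawn independently, any correlation between columns in `df` is lost in the result. That is often acceptable for test fixtures, but not for statistical modelling.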
generate_categorical_synthetic_data
generate_categorical_synthetic_data (n:int=1000)
*Generates a sample DataFrame containing age, gender, and categorical value data.
Args: n: The number of rows in the generated DataFrame.
Returns: A pandas DataFrame with columns ‘age’, ‘gender’, and ‘val1’.*
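For the categorical variant, the value column can be drawn from a fixed set of labels. The following is a sketch under assumptions, not the library's code: the name `sketch_categorical_column`, the label set `A` to `E`, and the optional `probs` weights are hypothetical.

```python
import numpy as np
import pandas as pd


def sketch_categorical_column(n: int = 1000, labels=("A", "B", "C", "D", "E"),
                              probs=None, seed: int = 0) -> pd.Series:
    """Hypothetical sketch: draw n labels, optionally with given probabilities."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(list(labels), size=n, p=probs)
    # Store as a pandas categorical so the full label set is kept,
    # even if some labels happen not to be drawn.
    values = pd.Categorical(draws, categories=list(labels))
    return pd.Series(values, index=pd.RangeIndex(n, name="participant_id"),
                     name="val1")


col = sketch_categorical_column(n=5)
print(col)
```

Passing `probs` (a sequence that sums to 1) skews the draw toward particular labels, which can be useful for simulating unbalanced categories.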
data = generate_synthetic_data()
data.head()
participant_id | | | | | |
0 | 2020-11-16 | 54.422828 | 1 | 103.721478 | 48.846734 |
1 | 2021-06-08 | 65.232948 | 0 | 129.512280 | 54.583974 |
2 | 2020-08-16 | 42.413863 | 1 | 114.878851 | 52.193946 |
3 | 2021-04-13 | 57.872618 | 1 | 113.653117 | 51.826225 |
4 | 2023-07-17 | 70.640233 | 1 | 129.669937 | 56.631272 |
generate_synthetic_data_like(data.head(), n=5)
participant_id | | | | | |
0 | 2020-08-16 | 57.872618 | 1 | 113.653117 | 48.846734 |
1 | 2021-04-13 | 65.232948 | 1 | 103.721478 | 56.631272 |
2 | 2023-07-17 | 42.413863 | 1 | 129.669937 | 54.583974 |
3 | 2021-06-08 | 54.422828 | 0 | 114.878851 | 52.193946 |
4 | 2020-11-16 | 70.640233 | 1 | 129.512280 | 51.826225 |
data = generate_categorical_synthetic_data()
data.head()
participant_id | | | | | |
0 | 2021-09-24 | 69.788555 | 1 | E | A |
1 | 2021-03-02 | 36.289947 | 1 | C | B |
2 | 2022-06-15 | 61.501970 | 1 | C | C |
3 | 2020-07-23 | 46.299262 | 0 | B | A |
4 | 2021-03-03 | 70.127055 | 1 | B | C |