# Generate synthetic data
data = generate_synthetic_data(n=1000)
Basic plots
get_gender_indices
get_gender_indices (df, gender='male', gender_col='sex')
*Returns the indices of the requested gender from a specified column in a pandas DataFrame.*
Parameters:
- df: pandas DataFrame.
- gender: str, ‘male’ or ‘female’, the gender to filter by.
- gender_col: str, the name of the column containing gender information.
Returns:
- indices: Index object with the indices of the rows matching the requested gender.
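The filtering described above can be sketched with a boolean mask. This is a hypothetical stand-in, not the library's code: the name `gender_indices_sketch` is invented for illustration, and the numeric encoding (female = 0, male = 1) is an assumption borrowed from the `hist_ecdf_plots` docstring further down this page.

```python
import pandas as pd

# Assumed encoding, taken from the hist_ecdf_plots docstring: female=0, male=1.
GENDER_CODES = {"female": 0, "male": 1}


def gender_indices_sketch(df: pd.DataFrame, gender: str = "male",
                          gender_col: str = "sex") -> pd.Index:
    """Hypothetical stand-in for get_gender_indices (illustration only)."""
    if gender not in GENDER_CODES:
        raise ValueError(f"gender must be one of {sorted(GENDER_CODES)}, got {gender!r}")
    mask = df[gender_col] == GENDER_CODES[gender]  # True for rows of the requested gender
    return df.loc[mask].index


# Toy frame with a non-default index, so the returned labels are easy to see.
toy = pd.DataFrame({"sex": [1, 0, 0, 1], "val1": [1.0, 2.0, 3.0, 4.0]},
                   index=["a", "b", "c", "d"])
male_idx = gender_indices_sketch(toy, "male")
female_idx = gender_indices_sketch(toy, "female")
```

The returned `Index` can be passed to `toy.loc[...]` to select the matching rows.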
data_histplot
data_histplot (data:pandas.core.frame.DataFrame, col:str, feature_str:Optional[str]=None, gender_col:str='sex', plot_both_genders:bool=True, ax=None)
*Plots a histogram from a DataFrame for a specific column.*
Args:
- data (pd.DataFrame): The DataFrame containing the data to plot.
- col (str): The name of the column to plot.
- feature_str (Optional[str], optional): The name of the feature to plot. Defaults to None.
- gender_col (str, optional): The name of the column containing gender information. Defaults to “sex”.
- plot_both_genders (bool, optional): Whether to plot both genders or just one. Defaults to True.
- ax (matplotlib Axes, optional): The axis to plot on. Defaults to None.
data_histplot(data=data, col="val1", plot_both_genders=False)
data_histplot(data=data, col="val1")
data_ecdfplot
data_ecdfplot (data:pandas.core.frame.DataFrame, col:str, feature_str:Optional[str]=None, gender_col:str='sex', plot_both_genders:bool=True, ax=None)
*Plots an empirical cumulative distribution function (ECDF) from a DataFrame for a specific column.*
Args:
- data (pd.DataFrame): The DataFrame containing the data to plot.
- col (str): The name of the column to plot.
- feature_str (Optional[str], optional): The name of the feature to plot. Defaults to None.
- gender_col (str, optional): The name of the column containing gender information. Defaults to “sex”.
- plot_both_genders (bool, optional): Whether to plot both genders or just one. Defaults to True.
- ax (matplotlib Axes, optional): The axis to plot on. Defaults to None.
data_ecdfplot(data=data, col="val1", plot_both_genders=False)
data_ecdfplot(data=data, col="val1")
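For readers unfamiliar with the ECDF: its value at x is the fraction of observations less than or equal to x. The sketch below computes those values directly with NumPy, independently of however `data_ecdfplot` draws the curve; the helper name `ecdf_sketch` is hypothetical.

```python
import numpy as np


def ecdf_sketch(values):
    """Return sorted values x and cumulative proportions y, with y[i] = (i + 1) / n."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])      # drop missing values, then sort ascending
    n = x.size
    y = np.arange(1, n + 1) / n       # fraction of observations <= x[i]
    return x, y


x, y = ecdf_sketch([3.0, 1.0, 2.0, 2.0])
```

Plotting `y` against `x` as a step function gives the ECDF curve.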
hist_ecdf_plots
hist_ecdf_plots (data:pandas.core.frame.DataFrame, col:str, feature_str:Optional[str]=None, gender_col:str='sex', plot_both_genders:bool=True)
*Plots histograms and empirical cumulative distribution functions (ECDFs) from a DataFrame for a specific column.*
Args:
- data: The input DataFrame containing the data to plot.
- col: The column name to plot.
- feature_str: The title of the plot. If not provided, the column name will be used.
- gender_col: The column name indicating sex (default is ‘sex’ - female:0; male:1).
- plot_both_genders (bool, optional): Whether to plot both genders or just one. Defaults to True.
Returns: None
hist_ecdf_plots(data=data, col="val1")
plot_stats
plot_stats (data:pandas.core.frame.DataFrame, col:str, ax:matplotlib.axes._axes.Axes, color:str, x_position:float=0.3, label:Optional[str]='All')
*Adds a text box to an axis object with summary statistics for a given column in a pandas DataFrame.*
Args:
- data (pd.DataFrame): The pandas DataFrame containing the data to calculate statistics for.
- col (str): The name of the column to calculate statistics for.
- ax (plt.Axes): The axis object to add the text box to.
- color (str): The color of the text box.
- x_position (float, optional): The x position of the text box. Defaults to 0.3.
- label (Optional[str], optional): The label to display in the text box. Defaults to “All”.
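A minimal sketch of the idea, using matplotlib's `Axes.text` with a `bbox`. The function name `stats_box_sketch`, the statistics shown (count, mean, standard deviation), the vertical position, and the box styling are all assumptions made for illustration; the source does not say which statistics `plot_stats` reports.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def stats_box_sketch(data, col, ax, color, x_position=0.3, label="All"):
    """Hypothetical stand-in for plot_stats; statistics and layout are assumptions."""
    s = data[col].dropna()
    text = f"{label}\nN = {s.size}\nMean = {s.mean():.2f}\nSD = {s.std():.2f}"
    return ax.text(
        x_position, 0.95, text,
        transform=ax.transAxes,          # position in axes-fraction coordinates
        va="top", color=color,
        bbox=dict(boxstyle="round", facecolor="white", edgecolor=color),
    )


toy = pd.DataFrame({"val1": [1.0, 2.0, 3.0, 4.0]})
fig, ax = plt.subplots()
ax.hist(toy["val1"], bins=4)
box = stats_box_sketch(toy, "val1", ax, color="tab:blue")
```

Using axes-fraction coordinates keeps the box in the same place regardless of the data range, which is why a single `x_position` float is enough to place one box per group side by side.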
plot_hist_stats
plot_hist_stats (data:pandas.core.frame.DataFrame, col:str, feature_str:Optional[str]=None, gender_col:str='sex', plot_both_genders:bool=True)
*Plots a histogram of a given column in a pandas DataFrame and adds summary statistics to the plot.*
Args:
- data (pd.DataFrame): The pandas DataFrame containing the data to plot.
- col (str): The name of the column to plot.
- feature_str (Optional[str], optional): A string describing the feature being plotted. Defaults to None.
- gender_col (str, optional): The name of the column containing gender information. Defaults to “sex”.
- plot_both_genders (bool, optional): Whether to plot statistics separately for males and females. Defaults to True.
# Generate synthetic data
data = generate_synthetic_data(n=1000)
plot_hist_stats(data, "val1", plot_both_genders=False)
plot_hist_stats(data, "val1")
plot_data_collection
plot_data_collection (data:pandas.core.frame.DataFrame, date_col:str='collection_date', feature_str:Optional[str]=None, ax:Optional[matplotlib.axes._axes.Axes]=None)
*Plots a histogram of the specified date column in a pandas DataFrame, excluding the last point from the plot.*
Args:
- data (pd.DataFrame): The pandas DataFrame containing the data to plot.
- date_col (str, optional): The name of the column containing the dates. Defaults to “collection_date”.
- feature_str (Optional[str], optional): The name of the feature to plot. If None, the name of the date column will be used. Defaults to None.
- ax (Optional[plt.Axes], optional): The axis object to plot on. If None, a new figure and axis will be created. Defaults to None.
plot_data_collection(data, date_col="date_of_research_stage", feature_str="val1")
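The docstring does not say what "the last point" is. One possible reading, sketched below purely as an assumption, is that records are counted per calendar period and the final period is dropped because it may be incomplete. The helper name `collection_counts_sketch` and its `freq` parameter are hypothetical and do not appear in the library.

```python
import pandas as pd


def collection_counts_sketch(data, date_col="collection_date", freq="M"):
    """Count records per period and drop the last period (assumed interpretation)."""
    dates = pd.to_datetime(data[date_col]).dropna()
    counts = dates.dt.to_period(freq).value_counts().sort_index()
    return counts.iloc[:-1]  # exclude the final, possibly incomplete, period


toy = pd.DataFrame({"collection_date": ["2021-01-05", "2021-01-20",
                                        "2021-02-10", "2021-03-01"]})
counts = collection_counts_sketch(toy)
```

Here the March 2021 count is dropped, leaving January and February.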
show_fundus
show_fundus (fname:str)
plot_horizontal_count_bars
plot_horizontal_count_bars (data, column_name, hue=None, n=20)
*Plots horizontal bar charts with counts.*
Parameters:
- data (pd.DataFrame): DataFrame containing the data
- column_name (str): Column name for the y-axis
- hue (str, optional): Column name for the hue (default is None)
- n (int, optional): Number of top categories to display (default is 20)
Returns:
- ax (Axes object): The plot
# Generate categorical synthetic data
categorical_data = generate_categorical_synthetic_data(n=1000)
plot_horizontal_count_bars(categorical_data, 'val1', hue='val2', n=3)
plot_horizontal_count_bars(categorical_data, 'val1', hue=None)
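The counts behind such a chart can be sketched with pandas alone. This is an illustrative approximation, not the library's implementation: the helper name `top_n_counts_sketch` is hypothetical, and it assumes "top categories" means the `n` most frequent values of `column_name`.

```python
import pandas as pd


def top_n_counts_sketch(data, column_name, hue=None, n=20):
    """Counts for the n most frequent categories, optionally split by a hue column."""
    top = data[column_name].value_counts().head(n).index   # most frequent first
    subset = data[data[column_name].isin(top)]
    if hue is None:
        return subset[column_name].value_counts().reindex(top)
    # One row per category, one column per hue level.
    return pd.crosstab(subset[column_name], subset[hue]).reindex(top)


toy = pd.DataFrame({"val1": ["a", "a", "a", "b", "b", "c"],
                    "val2": ["x", "y", "x", "x", "x", "y"]})
counts = top_n_counts_sketch(toy, "val1", n=2)
by_hue = top_n_counts_sketch(toy, "val1", hue="val2", n=2)
```

Either result can be drawn as horizontal bars with pandas' own `.plot.barh()`.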