14 - human genetics

The Human Phenotype Project collects genomic variation data on all its participants. Together with the project's deep phenotypes, the genomic data make it possible to investigate the progression of disease and to explore personalized treatments. We genotype millions of positions by low-pass sequencing combined with imputation, using the Gencove platform. Genotype imputation is the process of statistically inferring unobserved genotypes from known haplotypes in a population. Gencove genotype imputation is highly accurate (accuracy > 98%) (Wasik et al. 2021).

The information is stored in a number of statistics parquet files:
- main.parquet: sample metadata, including QC statistics, paths to PLINK variant files (raw and post-QC), and principal components (PCs).
- variant_qc.parquet: variant QC statistics.
- relatives/plink_ibs.parquet: IBS calculated by PLINK for pairs of participants.
- relatives/king_kinship.parquet: KING kinship coefficients for pairs of participants.

In addition, the PLINK text file pca/eigenvec.var contains the principal component loadings.
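As a rough illustration of how pairwise kinship coefficients such as those in relatives/king_kinship.parquet are typically interpreted, the sketch below maps a coefficient to a degree of relatedness using the standard KING inference cut-offs. The column names (`sample_1`, `sample_2`, `kinship`) and the toy values are assumptions made for this example; the actual schema of the parquet file may differ and should be checked before adapting it.

```python
import pandas as pd

# Standard KING inference cut-offs: lower bound of each kinship range.
KING_THRESHOLDS = [
    (0.354, "duplicate/MZ twin"),
    (0.177, "1st degree"),
    (0.0884, "2nd degree"),
    (0.0442, "3rd degree"),
]


def kinship_to_degree(kinship: float) -> str:
    """Return the relatedness label for a KING kinship coefficient."""
    for lower_bound, label in KING_THRESHOLDS:
        if kinship > lower_bound:
            return label
    return "unrelated"


# Toy stand-in for a pairwise kinship table (hypothetical column names).
pairs = pd.DataFrame(
    {
        "sample_1": ["A", "A", "B", "C"],
        "sample_2": ["B", "C", "D", "D"],
        "kinship": [0.25, 0.12, 0.01, 0.40],
    }
)

# Label every pair, then keep only first- and second-degree relatives.
pairs["degree"] = pairs["kinship"].map(kinship_to_degree)
close_relatives = pairs[pairs["degree"].isin(["1st degree", "2nd degree"])]
print(close_relatives)
```

With a real table, the same mapping could be applied after loading the file with `pd.read_parquet`, once the coefficient column has been identified.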

from pheno_utils import PhenoLoader
dl = PhenoLoader('human_genetics', age_sex_dataset=None)
dl
DataLoader for human_genetics with
119 fields
1 tables: ['main']
dl.dict
field_string description_string parent_dataframe relative_location value_type units sampling_rate item_type array cohorts data_type debut pandas_dtype
tabular_field_name
collection_date Collection date The date of downloading Gencove results from t... NaN main.parquet Time Time NaN Data Single 10K Tabular 2019-03-11 datetime64[ns]
version Gencove version Gencove API version 1 or 2 NaN main.parquet Categorical (multiple) NaN NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_bases Bases sequenced number of total bases sequenced. from genecov ... NaN main.parquet Integer Count NaN Data Single 10K Tabular 2019-03-11 float64
genecov_qc_bases_dedup Deduplicated bases number of deduplicated bases. from genecov qc ... NaN main.parquet Integer Count NaN Data Single 10K Tabular 2019-03-11 float64
gencove_qc_bases_dedup_mapped Deduplicated bases aligned number of deduplicated bases that have aligned... NaN main.parquet Integer Count NaN Data Single 10K Tabular 2019-03-11 float64
genecov_qc_effective_coverage Effective coverage effective coverage. from genecov qc file NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
genecov_qc_fraction_contamination DNA contamination contamination by DNA from another sample of th... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
genecov_qc_snps SNPs covered number of variants in reference panel that are... NaN main.parquet Integer Count NaN Data Single 10K Tabular 2019-03-11 float64
genecov_qc_format_passed Passed FASTQ format validity Indicates proper formatting of the input FASTQ... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_r1_eq_r2_passed Passed paired-end count Indicates number of bases in R1 file equal to ... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_r1_r2_ids_match_passed Passed paired-end match Indicates R1 read identifiers match R2 read id... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_bases_passed Passed Bases sequenced Indicates number of total bases sequenced grea... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_bases_dedup_passed Passed Deduplicated bases Indicates number of deduplicated bases greater... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_bases_dedup_mapped_passed Passed Deduplicated bases aligned Indicates number of deduplicated bases aligned... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_effective_coverage_passed Passed Effective coverage Indicates effective coverage greater than 0. f... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_fraction_contamination_passed Passed DNA contamination Indicates contamination by DNA less than 0.06.... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
genecov_qc_snps_passed Passed SNPs covered Indicates number of variants covered by at lea... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 object
heterozygosity_proportion Heterozygosity proportion Estimated heterozygosity proportion across a s... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
pass_het_check Passed Heterozygosity proportion Indicates heterozygosity proportion is less th... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 bool
sample_qc_gp_pass_rate Genotype Probabilities pass rate Proportion of genotypes with maximum GP greate... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
sample_qc_gp_pass_check Passed Genotype Probabilities Indicates proportion of GP greater than 0.9 is... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 bool
genetic_sex DNA sex Sex as determined by the percentage of read-al... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 float64
chr_x_proportion Proportion of alignment to chromosome X percentage of reads aligned to chromosome X NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
chr_y_proportion Proportion of alignment to chromosome Y percentage of reads aligned to chromosome Y NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
sample_qc_gender_match_check Passed gender check Indicates inferred_gender and submitted_gender... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 bool
relatives Relatives based on IBD estimation List of second and first degree relatives acco... NaN main.parquet String Text NaN Data Single 10K Tabular 2019-03-11 object
self_report_ashkenaz_proportion Self-reported ashkenazi descent proportion Proportion of great-grandparents with ashkenaz... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
self_report_middleeastern_proportion Self-reported Middle-Eastern descent proportion Proportion of great-grandparents with Middle-E... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
self_report_northafrican_proportion Self-reported North-African descent proportion Proportion of great-grandparents with ashkenaz... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
self_report_sephardi_proportion Self-reported Sephardi descent proportion Proportion of great-grandparents with North-Af... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
self_report_yemen_proportion Self-reported Yemen descent proportion Proportion of great-grandparents with Yemen de... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
self_report_unknownother_proportion Self-reported unknown or other descent proport... Proportion of great-grandparents with unknown ... NaN main.parquet Continuous Precent NaN Data Single 10K Tabular 2019-03-11 float64
pc1 Principal component 1 Score for projected principal component 1 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc2 Principal component 2 Score for projected principal component 2 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc3 Principal component 3 Score for projected principal component 3 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc4 Principal component 4 Score for projected principal component 4 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc5 Principal component 5 Score for projected principal component 5 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc6 Principal component 6 Score for projected principal component 6 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc7 Principal component 7 Score for projected principal component 7 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc8 Principal component 8 Score for projected principal component 8 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc9 Principal component 9 Score for projected principal component 9 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc10 Principal component 10 Score for projected principal component 10 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc11 Principal component 11 Score for projected principal component 11 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc12 Principal component 12 Score for projected principal component 12 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc13 Principal component 13 Score for projected principal component 13 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc14 Principal component 14 Score for projected principal component 14 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc15 Principal component 15 Score for projected principal component 15 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc16 Principal component 16 Score for projected principal component 16 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc17 Principal component 17 Score for projected principal component 17 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc18 Principal component 18 Score for projected principal component 18 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc19 Principal component 19 Score for projected principal component 19 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
pc20 Principal component 20 Score for projected principal component 20 NaN main.parquet Continuous None NaN Data Single 10K Tabular 2019-03-11 float64
used_in_pca_calculation Included in principal component analysis Indicates samples was in the input for princip... NaN main.parquet Categorical (single) Boolean NaN Data Single 10K Tabular 2019-03-11 bool
postqc_bed_chr_1 PLINK BED file for post QC genotype data chr1 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_1 PLINK FAM file accompanying QC .bed chr1 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_1 PLINK BIM file for the post QC genotype data chr1 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_2 PLINK BED file for post QC genotype data chr2 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_2 PLINK FAM file accompanying QC .bed chr2 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_2 PLINK BIM file for the post QC genotype data chr2 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_3 PLINK BED file for post QC genotype data chr3 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_3 PLINK FAM file accompanying QC .bed chr3 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_3 PLINK BIM file for the post QC genotype data chr3 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_4 PLINK BED file for post QC genotype data chr4 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_4 PLINK FAM file accompanying QC .bed chr4 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_4 PLINK BIM file for the post QC genotype data chr4 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_5 PLINK BED file for post QC genotype data chr5 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_5 PLINK FAM file accompanying QC .bed chr5 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_5 PLINK BIM file for the post QC genotype data chr5 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_6 PLINK BED file for post QC genotype data chr6 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_6 PLINK FAM file accompanying QC .bed chr6 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_6 PLINK BIM file for the post QC genotype data chr6 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_7 PLINK BED file for post QC genotype data chr7 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_7 PLINK FAM file accompanying QC .bed chr7 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_7 PLINK BIM file for the post QC genotype data chr7 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_8 PLINK BED file for post QC genotype data chr8 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_8 PLINK FAM file accompanying QC .bed chr8 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_8 PLINK BIM file for the post QC genotype data chr8 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_9 PLINK BED file for post QC genotype data chr9 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_9 PLINK FAM file accompanying QC .bed chr9 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_9 PLINK BIM file for the post QC genotype data chr9 PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_10 PLINK BED file for post QC genotype data chr10 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_10 PLINK FAM file accompanying QC .bed chr10 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_10 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_11 PLINK BED file for post QC genotype data chr11 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_11 PLINK FAM file accompanying QC .bed chr11 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_11 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_12 PLINK BED file for post QC genotype data chr12 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_12 PLINK FAM file accompanying QC .bed chr12 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_12 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_13 PLINK BED file for post QC genotype data chr13 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_13 PLINK FAM file accompanying QC .bed chr13 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_13 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_14 PLINK BED file for post QC genotype data chr14 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_14 PLINK FAM file accompanying QC .bed chr14 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_14 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_15 PLINK BED file for post QC genotype data chr15 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_15 PLINK FAM file accompanying QC .bed chr15 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_15 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_16 PLINK BED file for post QC genotype data chr16 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_16 PLINK FAM file accompanying QC .bed chr16 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_16 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_17 PLINK BED file for post QC genotype data chr17 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_17 PLINK FAM file accompanying QC .bed chr17 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_17 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_18 PLINK BED file for post QC genotype data chr18 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_18 PLINK FAM file accompanying QC .bed chr18 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_18 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_19 PLINK BED file for post QC genotype data chr19 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_19 PLINK FAM file accompanying QC .bed chr19 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_19 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_20 PLINK BED file for post QC genotype data chr20 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_20 PLINK FAM file accompanying QC .bed chr20 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_20 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_21 PLINK BED file for post QC genotype data chr21 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_21 PLINK FAM file accompanying QC .bed chr21 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_21 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bed_chr_22 PLINK BED file for post QC genotype data chr22 Raw genotype calls (per chromosome) in binary ... NaN main.parquet NaN NaN NaN Bulk Single 10K Compressed binary file 2019-03-11 object
postqc_fam_chr_22 PLINK FAM file accompanying QC .bed chr22 PLINK sample information file accompanying a .... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
postqc_bim_chr_22 PLINK BIM file for the post QC genotype data c... PLINK variant information file (per chromosome... NaN main.parquet NaN NaN NaN Bulk Single 10K Text file 2019-03-11 object
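The boolean QC fields listed above (`pass_het_check`, `sample_qc_gp_pass_check`, `sample_qc_gender_match_check`) can be combined into a single sample filter. The following is a minimal sketch on a toy DataFrame that only borrows those field names from the data dictionary. The helper `passes_all_qc`, the sample index, and the values are hypothetical, and which flags to require is an analysis choice rather than a documented recommendation.

```python
import pandas as pd

# Boolean sample-level QC fields taken from the data dictionary above.
QC_FLAGS = [
    "pass_het_check",
    "sample_qc_gp_pass_check",
    "sample_qc_gender_match_check",
]


def passes_all_qc(df: pd.DataFrame, flags=QC_FLAGS) -> pd.Series:
    """True for rows where every listed QC flag is True (missing counts as fail)."""
    return df[flags].eq(True).all(axis=1)


# Toy stand-in for the 'main' table (hypothetical samples and values).
toy = pd.DataFrame(
    {
        "pass_het_check": [True, True, False],
        "sample_qc_gp_pass_check": [True, False, True],
        "sample_qc_gender_match_check": [True, True, True],
        "pc1": [0.01, -0.02, 0.03],
    },
    index=pd.Index(["s1", "s2", "s3"], name="sample"),
)

# Keep only the samples that pass every check.
clean = toy[passes_all_qc(toy)]
print(clean)
```

Treating missing flags as failures is the conservative choice here. A more permissive analysis might handle missing values separately.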