Bastian Greshake and I met at OpenCon during the ContentMine session, where he asked whether it would be possible to extract the genotype, the SNP, the dependent variable and the strength of the relation (most typically reported as an odds ratio, a confidence interval and a p-value) from the literature. I thought it most likely would be possible and started fiddling around. I got the data from the section below into a dataframe.
In this Chinese elderly population, prevalence of overweight, central obesity, diabetes, dyslipidemia, hypertension, and MetS were 48.3%, 71.0%, 32.4%, 75.7%, 68.3% and 54.5%, respectively. In the cross-sectional analyses, no SNP was found to be associated with MetS. Genotype TT of SNP rs4402960 within the gene IGF2BP2 was associated with overweight (odds ratio (OR) = 0.479, 95% confidence interval (CI): 0.316-0.724, p = 0.001) and genotype CA of SNP rs1801131 within the gene MTHFR was associated with hypertension (OR = 1.560, 95% CI: 1.194–2.240, p = 0.001). However, these associations were not observed in the longitudinal analyses.
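For anyone who wants to experiment with this kind of extraction, here is a minimal Python sketch. It is not the R code linked in this post, only an illustration written against the two association sentences quoted above. The names `ASSOCIATION` and `extract_associations` are made up for this example, and the pattern assumes the exact phrasing "genotype XX of SNP rs... within the gene ... was associated with ...", so it will likely miss other wordings.

```python
import re

# The two association sentences from the abstract quoted above.
ABSTRACT = (
    "Genotype TT of SNP rs4402960 within the gene IGF2BP2 was associated "
    "with overweight (odds ratio (OR) = 0.479, 95% confidence interval "
    "(CI): 0.316-0.724, p = 0.001) and genotype CA of SNP rs1801131 within "
    "the gene MTHFR was associated with hypertension (OR = 1.560, 95% CI: "
    "1.194–2.240, p = 0.001)."
)

# A plain decimal number such as 0.479 or 1.560.
NUMBER = r"\d*\.?\d+"

# Hypothetical pattern, tuned only to the phrasing in this one abstract.
ASSOCIATION = re.compile(
    r"[Gg]enotype\s+(?P<genotype>[ACGT]{2})\s+of\s+SNP\s+(?P<snp>rs\d+)"
    r"\s+within\s+the\s+gene\s+(?P<gene>\w+)"
    r"\s+was\s+associated\s+with\s+(?P<phenotype>[\w\s]+?)\s*\("
    rf".*?OR\)?\s*=\s*(?P<odds_ratio>{NUMBER})"
    # The interval bounds are separated by a hyphen or an en dash.
    rf".*?CI\)?:\s*(?P<ci_low>{NUMBER})\s*[-–]\s*(?P<ci_high>{NUMBER})"
    rf".*?p\s*=\s*(?P<p_value>{NUMBER})"
)


def extract_associations(text):
    """Return one dict per genotype-phenotype association found in text."""
    rows = []
    for match in ASSOCIATION.finditer(text):
        row = match.groupdict()
        # Convert the numeric fields from strings to floats.
        for key in ("odds_ratio", "ci_low", "ci_high", "p_value"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_associations(ABSTRACT):
        print(row)
```

The list of dicts can be handed to a dataframe library if a tabular view is wanted. Note that the two confidence intervals use different dash characters, which is the sort of detail that easily breaks an extraction.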
Here are the results as I extracted them (I also recalculated some values, because it is possible and interesting).
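As an illustration of why recalculation is possible at all: an odds ratio with its 95% confidence interval is enough to approximate the standard error, the z statistic and a two-sided p-value. The Python sketch below shows that idea. It is an assumed example, not necessarily one of the recalculations in my results, and it relies on the interval being a Wald-type interval that is symmetric on the log scale. The function name `p_from_or_ci` is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate SE, z and two-sided p from an odds ratio and its CI.

    Assumes a Wald-type interval on the log odds ratio, i.e.
    log(OR) +/- z_crit * SE, with z_crit = 1.96 for a 95% interval.
    """
    # The CI width on the log scale spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided p-value under the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    # Values as reported in the abstract quoted above.
    reported = [
        ("rs4402960", 0.479, 0.316, 0.724),
        ("rs1801131", 1.560, 1.194, 2.240),
    ]
    for snp, odds_ratio, low, high in reported:
        se, z, p = p_from_or_ci(odds_ratio, low, high)
        print(f"{snp}: SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Comparing the recomputed p-value with the reported one is a quick consistency check on both the paper and the extraction, within the limits of rounding in the reported numbers.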
I thought I'd share, since a refined version down the road might run in a script with getpapers to continuously update the genotype and phenotype relations reported in the literature, which could then serve as a database for meta-analysis, for example. I would love your input, and you can view the R code I used here.
Edit: I already see that something went wrong with the p-value extraction, by the way.