GWAS-bionets
This project, in collaboration with Janssen, aims to apply GWAS networks to whole-exome sequencing (WES) data from the UK Biobank, addressing both scaling and biostatistical challenges. A long-term objective is to extend the approach to even larger datasets, specifically whole-genome sequencing (WGS) data. Ultimately, the research seeks to unlock the tool's potential for gaining genetic insights into psoriasis and other autoimmune diseases. So far, a stable consensus network has been created (Figures 1 and 2 explain how to build a consensus network and its stable version). The next step is to upgrade the existing network methods and add new ones to make this network more robust.
Figure 1 (above) shows how a consensus network can be created from a typical fileset (.bim, .bed and .fam files, which can be converted to PLINK 2.0 format files). The network methods currently used are Heinz, HotNet2 and SigMod; others, such as dmGWAS and Hierarchical HotNet, are planned. The last box (the orange one, reading left to right) explains the criterion used to select genes for the consensus network: a gene is considered part of the consensus network if it is selected by at least two of the three methods. However, because the process of assigning a P-value to each gene associated with a SNP is stochastic, each run can produce a different consensus network. A stable consensus version is therefore suggested (Figure 2, below).
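The selection criterion in the orange box can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the function name `consensus_genes`, the `min_methods` parameter and the toy gene names are all assumptions, and each method's solution is assumed to be available as a plain set of gene symbols.

```python
from collections import Counter


def consensus_genes(solutions, min_methods=2):
    """Return the genes selected by at least `min_methods` methods.

    `solutions` maps a method name to the set of genes it selected.
    (Hypothetical helper; the name and signature are illustrative.)
    """
    # Count, for every gene, how many methods selected it.
    counts = Counter(gene for genes in solutions.values() for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= min_methods}


# Toy gene sets (made up for illustration, not real results).
solutions = {
    "heinz": {"GENE_A", "GENE_B", "GENE_C"},
    "hotnet2": {"GENE_B", "GENE_C", "GENE_D"},
    "sigmod": {"GENE_C", "GENE_E"},
}

consensus = consensus_genes(solutions)
print(sorted(consensus))  # ['GENE_B', 'GENE_C']
```

Keeping the threshold as a parameter means the same helper would still apply once further methods, such as dmGWAS or Hierarchical HotNet, are added.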
Figure 2 (above) shows how to create a stable consensus network from the same fileset used for the consensus network. We follow a cross-validation-style procedure, splitting the data k times (the figure illustrates k = 5, where each new fileset contains 80% of the whole data). This generates k filesets, each of which follows the same PLINK and then MAGMA workflow. Each network method is then run on each of the k filesets (in the figure, 3 methods and 5 filesets yield 15 solutions). Finally, the orange box proposes an ad hoc criterion: select the genes that appear in at least 11 of the solutions. In general, however, it is preferable to look at the modules (subnetworks) that appear across different numbers of solutions, since they can contain insightful genes related to a phenotype (disease).
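The two bookkeeping steps of this procedure, building the k subsampled filesets and counting how often each gene appears across solutions, can be sketched as follows. This is a hedged illustration under stated assumptions: the helper names `leave_one_fold_out` and `stable_consensus`, the sample IDs and the toy solutions are hypothetical, and the PLINK, MAGMA and network-method runs themselves are not shown.

```python
import random
from collections import Counter


def leave_one_fold_out(sample_ids, k=5, seed=0):
    """Return k subsets of the samples, each omitting one of k folds.

    With k = 5, every subset keeps about 80% of the samples.
    (Hypothetical helper; the real pipeline may split differently.)
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    return [
        [s for j, fold in enumerate(folds) if j != i for s in fold]
        for i in range(k)
    ]


def stable_consensus(solutions, min_solutions=11):
    """Return the genes that appear in at least `min_solutions` solutions."""
    counts = Counter(gene for sol in solutions for gene in set(sol))
    return {gene for gene, n in counts.items() if n >= min_solutions}


# 100 made-up sample IDs split into 5 subsets of 80 samples each.
subsets = leave_one_fold_out([f"S{i}" for i in range(100)], k=5)
print([len(s) for s in subsets])  # [80, 80, 80, 80, 80]

# 15 toy solutions (3 methods x 5 filesets), invented for illustration:
# GENE_A appears in all 15, GENE_B in 11 and GENE_C in 10.
toy_solutions = [
    {"GENE_A"}
    | ({"GENE_B"} if i < 11 else set())
    | ({"GENE_C"} if i < 10 else set())
    for i in range(15)
]
stable = stable_consensus(toy_solutions)
print(sorted(stable))  # ['GENE_A', 'GENE_B']
```

In practice, each subset could be written out as a sample list and used to subset the original fileset, for example through PLINK's `--keep` option, before running the MAGMA step. Lowering `min_solutions` is one simple way to explore modules that appear in fewer solutions.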
Figure 3 (above) displays a stable consensus network, and highlights subnetworks of interrelated genes possibly contributing to biological pathways associated with a phenotype (disease).