Novosibirsk State University Scientists Develop One of the Largest Genetic Association Databases

NSU scientists have developed one of the largest genetic association databases in the world. The database contains billions of associations between genome variants and human traits, identified in hundreds of studies conducted by the international scientific community. Information about these associations improves our understanding of human genetics and biology and can contribute to the diagnosis, prevention, and treatment of diseases. The results of this work were published in the open-access journal “Nucleic Acids Research”.

Genome-wide association studies (GWAS) are the primary tool for identifying genetic factors that influence quantitative traits and the risk of developing common human diseases. Information about the associations identified in a GWAS helps researchers study the etiology of human diseases and develop risk prediction models. It can also be useful in the search for candidate biomarkers, therapeutic interventions, and targets for these interventions. The number of genetic associations studied by the scientific community is growing rapidly, but the use of this data is limited by its large volume and the lack of uniform standards for format and quality.

For many years, scientists from the Theoretical and Applied Functional Genomics Laboratory of the NSU Natural Sciences Department, in collaboration with colleagues from PolyKnomics (the Netherlands), collected information on associations and developed a computing infrastructure and computational methods for unification, quality control, and analysis. By collecting and processing tens of terabytes of raw data, the researchers obtained one of the world’s largest databases of genetic associations. Tatyana Shashkova, Junior Researcher at the Laboratory, described the result:

“We hope the database we developed can be used to solve a wide range of problems from fundamental research of human genetics to the development of predictive models and the search for candidate therapeutic effects”.

The database contains complete results of association studies for more than 7,000 traits, including quantitative traits, common diseases, and levels of metabolites, proteins, and glycans, as well as the results of several large-scale studies of gene transcription control. Overall, the database contains data on more than 75 billion genetic associations. To provide access to the database, the PheLiGe web interface was developed. The team also created the GWAS-MAP system, which provides access to the database and a wide range of analyses through a command line interface.

PolyKnomics CEO Lennart Karssen added, “The technological solution we developed with NSU is multipurpose. For example, it can be scaled to store and process information about millions of genomes. Such big data emerges in the context of national biobanking programs or genomic breeding programs”.


The diagram above illustrates the data processing model. The integration module is responsible for converting the summary statistics of genome-wide association studies into a universal format and controlling the data quality. The reference table is used to test and filter allelic variants. If summary statistics pass quality control, they are uploaded together with metadata to databases (DB module). Finally, the data is made available to an external user through a web interface.
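The flow described above can be illustrated with a short Python sketch. This is not the GWAS-MAP implementation: the record fields, the column mapping, the reference table contents, and the in-memory "database" are all hypothetical, chosen only to show the order of the steps (conversion to a unified format, quality control against a reference table, and upload together with metadata).

```python
from dataclasses import dataclass

# Hypothetical unified record; field names are illustrative assumptions,
# not the format actually used by GWAS-MAP.
@dataclass(frozen=True)
class Association:
    variant_id: str
    effect_allele: str
    other_allele: str
    beta: float
    se: float
    p_value: float

def harmonize(raw: dict, column_map: dict) -> Association:
    """Convert one study-specific row into the unified format."""
    def get(field: str):
        return raw[column_map[field]]
    return Association(
        variant_id=str(get("variant_id")),
        effect_allele=str(get("effect_allele")).upper(),
        other_allele=str(get("other_allele")).upper(),
        beta=float(get("beta")),
        se=float(get("se")),
        p_value=float(get("p_value")),
    )

def passes_qc(rec: Association, reference: dict) -> bool:
    """Check a record against the reference table and basic sanity rules."""
    known = reference.get(rec.variant_id)
    if known is None:
        return False  # variant is absent from the reference table
    if {rec.effect_allele, rec.other_allele} != set(known):
        return False  # alleles disagree with the reference
    return rec.se > 0 and 0.0 <= rec.p_value <= 1.0

def integrate(rows, column_map, reference, database, study_metadata) -> int:
    """Harmonize rows, filter them, and store survivors with their metadata."""
    kept = [
        rec
        for rec in (harmonize(row, column_map) for row in rows)
        if passes_qc(rec, reference)
    ]
    if kept:
        database[study_metadata["study_id"]] = {
            "metadata": study_metadata,
            "associations": kept,
        }
    return len(kept)
```

In this sketch a study is stored only if at least one of its records passes the checks. A real pipeline would also need to handle strand flips, genome builds, and allele frequency checks, which are left out here.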

Source: Novosibirsk State Technical University
