Our Research
Our research is dedicated to understanding the impact of genes on specific traits by developing robust genome sequencing data analysis methods, including our unique "Genomics Analysis and Annotation Pipeline." This pipeline integrates bioinformatics, statistical genomics, and population genetics. Additionally, we are enhancing our research by applying machine learning techniques and utilizing variant-, gene-, or pathway-based modeling and network simulations, leveraging our expertise in genetics, bioinformatics, and statistical analysis.
Our primary goals involve developing rapid analytic approaches using machine learning computational techniques to analyze exome/genome sequencing data, enabling large-scale discovery of rare mutations and aiding in the development of diagnostic tools for Mendelian disorders. Furthermore, we aim to identify genes with incomplete penetrance to better understand genetic factors contributing to complex traits and diseases. Additionally, we seek to pinpoint population-specific disease-causing mutations, genes, and pathways, leading to more tailored diagnosis and treatment strategies.
Our research focuses on advancing genomics through innovative analytical approaches and cutting-edge technologies to unravel the complex relationships between genes, traits, and diseases, ultimately improving human health management. At the ONAT Lab, we explore a wide range of rare and complex diseases, including metabolic disorders, neurodegenerative disorders, and behavioral/psychiatric disorders. We utilize state-of-the-art omics technologies to investigate disease mechanisms, identify disease genes, predict patient outcomes, and develop therapeutic strategies, with a particular focus on integrating and analyzing multi-omics data.
Collaboration is central to our research: we work with leading scientists worldwide, including Nobel laureates and professors from prestigious institutions. Our interdisciplinary research group focuses on unraveling the determinants and pathogenesis of human diseases through innovative strategies and reproducible "Genomics Pipelines." Overall, our work at the ONAT Lab holds significant promise for improving our understanding of human diseases and developing new treatments across various disorders.
Genetic Basis of Metabolic Phenotypes: Obesity and PCOS
Obesity is a global health concern associated with increased health risks and mortality rates. Although numerous genetic loci have been linked to obesity, most of the genetic variability in obesity remains unexplained, and the specific genes and biological processes underlying obesity susceptibility are still largely unknown. Similarly, the causal genes and pathways contributing to polycystic ovary syndrome (PCOS), a condition prevalent in women and characterized by hyperandrogenism and chronic oligo-anovulation, are not well understood. Given that a significant proportion of women with PCOS are also overweight or obese, genes implicated in obesity may also play a role in the development of PCOS. Identifying the causal genes responsible for obesity and/or PCOS can pave the way for targeted and effective therapies. To address this, we have collected extensive clinical and family history data, as well as blood samples, from 2,149 individuals for the obesity cohort and 218 individuals for the PCOS cohort. We conducted exome sequencing on 984 individuals from 764 families in the obesity cohort and 123 individuals from 108 families in the PCOS cohort. Additionally, we sequenced individuals from other cohorts, including those with delayed sleep phase disorder (DSPD) and extreme leanness. Our control cohort comprised 6,090 individuals from the Turkish population, 830 of whom underwent whole-genome sequencing. To identify causal mutations associated with obesity and PCOS, we have developed robust pipelines encompassing variant discovery, data quality checks, variant annotation, variant filtering, variant and gene prioritization, and gene discovery.
These pipelines have been refined over the years through the integration of diverse approaches from computer science and biology, such as bioinformatics, statistical genomics, machine learning, natural language processing, modeling and simulations, and population genetics. As a result of our extensive efforts, our genome bank has reached a size where achieving genome-wide statistical significance is possible. By utilizing these pipelines and analyzing the vast amount of genomic data we have accumulated, we aim to uncover the causal mutations, genes, and pathways involved in obesity and PCOS. This research has the potential to significantly advance our understanding of these complex traits and facilitate the development of targeted therapeutic interventions.
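As an illustration of the gene prioritization stage, the following is a minimal sketch of a gene-level rare-variant burden comparison between cases and controls, not the lab's actual pipeline. It uses a one-sided Fisher's exact test and a Bonferroni threshold; the gene names, carrier counts, and the assumption of roughly 20,000 tested genes are hypothetical, while the cohort sizes (984 sequenced obesity cases, 6,090 controls) are taken from the text above.

```python
from math import comb

# Bonferroni-style threshold, assuming ~20,000 protein-coding genes are tested.
GENOME_WIDE_ALPHA = 0.05 / 20_000


def fisher_one_sided(case_carriers, case_total, ctrl_carriers, ctrl_total):
    """P(at least `case_carriers` carriers fall among cases) under the
    hypergeometric null of no enrichment in cases."""
    carriers = case_carriers + ctrl_carriers
    total = case_total + ctrl_total
    denom = comb(total, case_total)
    numer = 0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        numer += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numer / denom


def rank_genes(burden, case_total, ctrl_total):
    """burden maps gene -> (carriers among cases, carriers among controls).
    Returns (gene, p_value, passes_threshold) tuples, smallest p first."""
    rows = []
    for gene, (case_carriers, ctrl_carriers) in burden.items():
        p = fisher_one_sided(case_carriers, case_total, ctrl_carriers, ctrl_total)
        rows.append((gene, p, p < GENOME_WIDE_ALPHA))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical carrier counts for two placeholder genes.
example = {"GENE_A": (12, 10), "GENE_B": (3, 20)}
for gene, p, significant in rank_genes(example, case_total=984, ctrl_total=6090):
    print(gene, f"{p:.3g}", significant)
```

A real analysis would also need to account for relatedness within families and for population structure, which this sketch ignores.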
Genetic Basis of Neurodegenerative Phenotypes: Essential Tremor and Parkinson’s Disease
Neurological movement disorders, such as Parkinson's disease (PD) and essential tremor (ETM), are highly prevalent in the community. ETM, first identified in 1887, is characterized by trembling of the hands and arms, as well as mild head shaking and vocal tremors. Its prevalence ranges from 0.3% to 4% in the general population and rises with age, reaching up to 22% in advanced age. PD, on the other hand, is characterized by tremor, bradykinesia (slowness of movement), rigidity, and postural instability. The prevalence of PD also increases with age, affecting around 1-2% of individuals over the age of 65 and 4% of individuals over 85. The major cause of PD is the loss of dopamine-producing cells in a region of the brain called the substantia nigra, along with the presence of Lewy bodies, which are aggregates of abnormal proteins. Similar features of dopaminergic cell death and Lewy bodies have also been observed in ETM, although the genetic basis of both disorders remains poorly understood, mainly because it is difficult to obtain a sufficient number of affected families for study. In our research, we aim to identify the genetic and molecular basis of these common neurological movement disorders. We have been studying six consanguineous ETM families, investigating both genetic and functional causes. Our team has established a comprehensive database and DNA bank that includes 75 large hereditary families (spanning 4-6 generations) affected by PD and ETM. Through internationally recognized clinical scoring and the collection of genetic information, we have gathered samples from multiple generations of these families. Additionally, we have a DNA bank specifically focused on 60 probands (the first affected family members through whom each family was ascertained) with familial Parkinson's disease.
The existence of our DNA bank, along with the substantial number of families affected by PD and ETM, provides us with an advantage for conducting more accurate and efficient research. By leveraging these resources, we aim to unravel the genetic and molecular mechanisms underlying these disorders, leading to a better understanding of their pathogenesis and potentially enabling the development of targeted therapies.
Genetic Basis of Behavioral and Psychiatric Phenotypes: Sleep Disorders and ADHD
Disease gene identification studies have evolved over time, initially relying on the identification of genomic regions with chromosomal abnormalities or linking phenotypes to polymorphic markers. Subsequently, the focus shifted to mapping critical loci to identify causal variants. In the present era of next-generation sequencing, the identification of causal variants involves the utilization of complex algorithms and prioritization pipelines. In line with this, we have recently proposed a novel approach known as 'reverse phenotyping' for complex diseases, which can be considered a modern form of 'forward genetics'. This approach involves initially identifying a candidate genomic variant and then conducting clinical assessments in consanguineous families to confirm its role. The identification of causal variants for complex disorders presents several challenges due to shared symptoms, significant epidemiological comorbidity, and extensive allelic and locus heterogeneity. These characteristics make it difficult to establish causality solely through large-scale association or case-control studies. However, through our research, we have demonstrated the applicability of reverse phenotyping in disease gene identification for complex phenotypes such as Delayed Sleep Phase Disorder (DSPD) and Attention Deficit Hyperactivity Disorder (ADHD). By employing this approach, we have made significant strides in understanding the genetic underpinnings of these complex disorders. Overall, reverse phenotyping offers a promising strategy for unraveling the causative variants of complex diseases. By combining genomic analysis with clinical assessments in consanguineous families, we can overcome the challenges associated with these disorders and gain a deeper understanding of their genetic basis.
Reverse Phenotyping of the Complex Phenotypes and Rare Diseases in Our Genome Bank
Over the past 12 years, we have conducted exome/genome sequencing on a total of 1,284 individuals with various phenotypes, including obesity, PCOS, ETM, PD, ADHD, DSPD, and extreme leanness. Additionally, we have collected clinical and genetic data from more than 3,000 intrafamilial patients and controls to perform segregation and genotyping analyses. To expand our genomic resources, we have established collaborations with Rockefeller, Yale, UCSD, and Koç universities, combining genome sequencing data from over 6,400 Turkish individuals. This comprehensive effort has resulted in the creation of the largest Turkish genome bank in the world. Furthermore, we have gathered detailed clinical information from our cohort, encompassing parameters such as diabetes, hyperinsulinemia and insulin resistance, hypertension, hyperlipidemia, thyroiditis, coronary artery disease, sleep patterns, depression, antidepressant use, addiction, and psychiatric diseases. Our control database includes both neurologically healthy individuals and patients with complex neurodevelopmental and neurodegenerative disorders, such as dementia, Alzheimer's disease (AD), migraine, ataxias, multiple sclerosis (MS), epilepsy, dystonia, dystrophies, and mental retardation (MR), among others. Building upon this foundation, we propose utilizing a "reverse phenotyping" approach to discover disease genes associated with complex phenotypes within our database. In reverse phenotyping, the process begins by selecting candidate genes that are known to be associated with the complex phenotype of interest. Variants in these genes are then prioritized based on variant prediction tools and variome databases, including resources such as the GME Variome, gnomAD, BRAVO, and Centers for Mendelian Genomics databases. Subsequently, individuals carrying the prioritized mutations are subjected to comprehensive clinical assessments to determine the specific phenotype associated with these variants.
Construction of extended pedigrees and establishing genotype-phenotype correlations are crucial, particularly within large consanguineous families where segregation analysis is highly informative. Finally, we conduct population screening for the mutant allele in large, well-characterized cohorts, followed by targeted sequencing of the candidate gene in individuals who tested negative for the identified mutations. This comprehensive approach allows us to determine the full spectrum of mutations associated with the complex phenotype under investigation. By leveraging our extensive database and employing the reverse phenotyping strategy, we aim to make significant strides in the discovery of disease genes implicated in complex phenotypes, contributing to our understanding of the genetic basis of these conditions.
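The first steps of reverse phenotyping described above (candidate gene selection, variant prioritization, and identification of carriers for clinical assessment) can be sketched as follows. This is an illustrative outline under stated assumptions, not the lab's implementation: the `Variant` fields, the allele-frequency cutoff of 1%, and the pathogenicity score cutoff of 0.8 are all hypothetical placeholders for values that would come from variome databases and prediction tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    variant_id: str
    population_af: float   # allele frequency in a reference variome (hypothetical field)
    pathogenicity: float   # 0..1 score from a prediction tool (hypothetical field)
    carriers: tuple        # sample IDs in the genome bank carrying this variant


def prioritize(variants, candidate_genes, max_af=0.01, min_score=0.8):
    """Keep rare, predicted-damaging variants in candidate genes,
    most damaging first and, among ties, rarest first."""
    kept = [
        v for v in variants
        if v.gene in candidate_genes
        and v.population_af <= max_af
        and v.pathogenicity >= min_score
    ]
    return sorted(kept, key=lambda v: (-v.pathogenicity, v.population_af))


def recall_list(prioritized):
    """Map each carrier to the variants that motivate a clinical re-assessment."""
    by_sample = {}
    for v in prioritized:
        for sample in v.carriers:
            by_sample.setdefault(sample, []).append(v.variant_id)
    return by_sample


# Hypothetical genome-bank records.
bank = [
    Variant("GENE_A", "var1", 0.0005, 0.95, ("S1", "S2")),
    Variant("GENE_A", "var2", 0.2000, 0.99, ("S3",)),   # too common, filtered out
    Variant("GENE_B", "var3", 0.0010, 0.90, ("S2",)),
    Variant("GENE_C", "var4", 0.0001, 0.99, ("S4",)),   # not a candidate gene
]
top = prioritize(bank, candidate_genes={"GENE_A", "GENE_B"})
print(recall_list(top))
```

The later steps (pedigree extension, segregation analysis, and population screening) depend on clinical and family data and are not modeled here.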
Identification of Evolutionary Patterns in Core Clock Proteins and Their Involvement in Sleep Disorders
Cell-autonomous transcriptional and translational feedback loops (TTFLs) play a significant role in influencing the circadian rhythms of various organisms. While there are evolutionary and biochemical differences in these molecular pathways among different kingdoms of life, mammalian circadian rhythms remain consistent and intact. The term "TTFLs" refers to the interdependent expression of core clock genes, which are regulated by their own gene products, resulting in robust daily rhythms. Two interconnected feedback loops contribute to proper sleep and physiological cycles. The primary TTFL is governed by two activator proteins (CLOCK and BMAL1) and two repressor proteins (PER and CRY). CLOCK and BMAL1 form a heterodimeric transcriptional activator complex that binds to enhancer regions (E-boxes) in CRY (1/2) and PER (1/2/3) genes. The absence of these transcription factors completely blocks transcriptional initiation. Once the gene products of PER and CRY reach a certain concentration in the cytoplasm, they translocate to the nucleus and repress CLOCK:BMAL1-driven transcription, including that of their own genes. The translocation process and stability of these proteins rely on various casein kinases and phosphatases. The components of the main TTFL, including the kinases and phosphatases, are classified as core clock proteins as they directly control the daily oscillation of core clock genes. A secondary TTFL also contributes to the molecular pathway of circadian rhythm. This loop is regulated by nuclear receptors RORα, REV-ERBα (also known as NR1D1), and REV-ERBβ (also known as NR1D2). Their expression is also controlled by the E-boxes through the heterodimeric binding of CLOCK and BMAL1. Once the gene products of RORα and REV-ERBα translocate to the nucleus, they interact with REV response element sequences to activate and repress BMAL1 transcription, respectively. Loss of either of these feedback loops would disrupt the circadian rhythm.
In turn, these two TTFLs ensure the daily rhythmicity of the core clock genes and clock-controlled genes. Core clock proteins not only act as transcription factors for their own expression but also regulate the expression of many essential genes through their enhancer-box sequences. Moreover, variations in core clock genes that alter their protein sequences can lead to various phenotypic differences, as these genes are involved in essential pathways throughout the body. Previous studies have demonstrated the pathogenicity of an exon deletion in the CRY1 tail, resulting in Attention Deficit Hyperactivity Disorder (ADHD) as well as sleep disorders. Cryptochromes play a critical role not only in the sleep-wake cycle but also in DNA repair, cell cycle regulation, and cellular metabolism. Identifying disease-causing variants in core clock genes is essential for understanding the behavioral activities that arise from molecular mechanisms in living organisms. Core clock genes influence a wide range of behavioral and physiological processes. Mammalian species exhibit distinct transcription-translation feedback loops that oscillate approximately every 24 hours, entraining peripheral tissues to the light-dark cycle. Consequently, different species exhibit varied activity patterns in accordance with their daily actions. Over the course of evolution, phenotypic traits are susceptible to divergence due to subtle changes in protein sequences. Therefore, this study aims to assess specific residues of core clock proteins to determine whether there are significant amino acid sites that may be responsible for diurnal or non-diurnal behavior from an evolutionary perspective. A specific residue of a core clock protein that exhibits a non-random distribution of amino acids exclusively within one of the species groups (diurnal or non-diurnal) will be considered an important site for further data analysis, provided it is statistically significant.
Additionally, the association of these residues with sleep disorders will be investigated using a reverse phenotyping approach. Overall, this study will pioneer research on the phenotypic impact of core clock proteins and their direct relationship to behavioral traits, within an evolutionary framework. The analysis of cohort data will provide insights into genomic medicine as well.
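One way the residue-level criterion above could be tested is with a label-permutation test on a single alignment column. The sketch below is an assumption about how such a test might look, not the method used in the study: the distance statistic, the number of permutations, and the example column are illustrative choices. It also treats species as independent, which ignores shared ancestry; a real analysis would need a phylogenetically informed correction and multiple-testing control across residues.

```python
import random
from collections import Counter


def group_distance(residues, is_diurnal):
    """Total variation distance between the amino-acid frequency
    distributions of diurnal and non-diurnal species at one residue."""
    diurnal = Counter(r for r, d in zip(residues, is_diurnal) if d)
    other = Counter(r for r, d in zip(residues, is_diurnal) if not d)
    n_d, n_o = sum(diurnal.values()), sum(other.values())
    if n_d == 0 or n_o == 0:
        raise ValueError("both groups need at least one species")
    return 0.5 * sum(abs(diurnal[a] / n_d - other[a] / n_o) for a in set(residues))


def permutation_p(residues, is_diurnal, n_perm=2000, seed=0):
    """Fraction of label shufflings whose group distance is at least as
    large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = group_distance(residues, is_diurnal)
    labels = list(is_diurnal)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # group sizes stay fixed, only assignments change
        if group_distance(residues, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical alignment column: one amino acid per species.
column = list("SSSSSSAAAAAA")
diurnal_flags = [True] * 6 + [False] * 6
print(permutation_p(column, diurnal_flags))
```

A column that separates the two groups perfectly yields a small p-value, while a fully conserved column yields exactly 1.0 because every shuffling reproduces the observed distance of zero.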
Efficient and Automatic Prioritization of Pathogenic Mutations
Analysis of NGS data sets presents significant challenges, requiring a systematic and intelligent approach to efficiently process the data. While efforts have been made to standardize basic data processing steps, the annotation, filtration, and prioritization stages involve numerous parameters, leaving researchers with substantial manual work when using these pipelines. Therefore, innovative approaches are necessary to develop efficient and automatic prioritization of pathogenic mutations in NGS studies. Over the past decade, NGS has presented researchers with various challenges. Sample quality, library preparation, and read size are critical areas where problems can arise. Additionally, the sheer volume of data being generated poses a challenge, along with data analysis tasks such as sequence alignment, mapping multi-reads, identifying redundant sequences, and detecting genotypes and SNPs/indels. The primary goal in addressing these challenges is to accurately identify variants. However, the long-term challenge in disease gene identification studies lies in prioritizing specific variants that have a causal impact on the disease. Among the millions of variants in each human genome, the majority are rare and do not have a significant effect on the phenotype. While extensive efforts have been made to identify pathogenic mutations using genome data, current methods still struggle to efficiently prioritize true pathogenic mutations in patients. To address this challenge, it is possible to cluster mutations by disease groups using extensive annotations. Our aim is to develop an analysis pipeline that can efficiently and automatically prioritize pathogenic mutations in patients' genomes. This will involve considering the disease type, training models on extensive annotations at the variant-, gene-, and pathway-levels, and applying statistical genomics to training sets and high-quality genetic variants.
Typical basic steps in an NGS analysis and discovery pipeline should include: (i) data quality check and filtering procedures, (ii) alignment of sequences to the reference genome, (iii) variant calling and annotation, and (iv) statistical evaluation of gene clusters, networks, or regulatory circuits. The tools embedded within high-throughput analysis pipelines must address two critical factors: (1) selecting the right tools from the many available options, and (2) ensuring the quality and sustainability of tool implementation. Given the scale of big data, it is crucial to have a fast and accurate workflow, particularly when computational resources are limited. The tools must efficiently utilize available CPU power, employing optimized parallelization and data distribution across multiple threads to significantly reduce processing time. Controlling the workflow with a script running in a Linux shell provides ease of modification. It is important to note that the causality of variation and disease pathology cannot be modeled solely using statistical variables. Machine learning methods are necessary when analyzing large and complex data sets. In our research, we aim to apply machine learning techniques for quality control and variant prioritization using omics data, such as transcriptomics (RNA-Seq), epigenomics (ChIP-Seq), glycomics, lipidomics, microbiomics, and phenomics (OMIM, ClinVar, HGMD, MGI). Furthermore, phenome-wide, pathway-wide, and trans-ome-wide associations can be used to connect phenotypes with omics networks and identify clinically relevant disease biomarkers.
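The idea of distributing work across multiple threads can be illustrated with a small sketch that applies a quality filter to variant calls one chromosome at a time. The record fields (`qual`, `depth`), the cutoffs, and the function names are hypothetical and do not describe the lab's pipeline. In Python, threads mainly help when each task waits on an external tool or on I/O; for pure-Python CPU-bound work a process pool would usually be the better choice.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_chromosome(item, min_qual=30.0, min_depth=10):
    """Keep calls on one chromosome that pass basic quality cutoffs.
    `item` is a (chromosome, list of call dicts) pair."""
    chrom, calls = item
    kept = [c for c in calls if c["qual"] >= min_qual and c["depth"] >= min_depth]
    return chrom, kept


def filter_all(calls_by_chrom, workers=4):
    """Run the per-chromosome filter in a thread pool and
    collect the results back into a dict keyed by chromosome."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(filter_chromosome, calls_by_chrom.items())
        return dict(results)


# Hypothetical variant calls grouped by chromosome.
calls = {
    "chr1": [
        {"pos": 100, "qual": 50.0, "depth": 20},
        {"pos": 200, "qual": 10.0, "depth": 40},  # fails the quality cutoff
    ],
    "chr2": [
        {"pos": 5, "qual": 99.0, "depth": 3},     # fails the depth cutoff
    ],
}
print(filter_all(calls))
```

Splitting by chromosome keeps the tasks independent, so the same structure carries over if each task instead launches an external command from a shell-driven workflow.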
Description of the Genomic Structure of the Turkish Population
Turkey's unique geographical location at the crossroads of Africa, Asia, and Europe has made it a crucial hub for human migrations throughout history, giving rise to ancient civilizations and a rich genetic diversity. Investigating the fine-scale genetic structure of the Turkish population provides valuable insights into the historical events that shaped the genetic makeup of both Middle Eastern and European populations. This understanding is crucial for advancing human genetic research, particularly in the context of Mendelian and complex diseases. The field of medical genetics in Turkey is closely intertwined with the region's rich migration history, encompassing the Middle East, the Balkans, and the Mediterranean basin. This region, which is currently home to approximately 10% of the world's population, has served as a central hub for population admixture and human migration. In a notable study published in September 2016 in Nature Genetics, researchers explored the genomic landscape of the region, offering a comprehensive view of genetic variation for enhanced disease-associated gene discovery and comparisons with global ancestral populations. The study involved research groups from various countries, including Algeria, Egypt, France, Morocco, Iran, Iraq, Israel, Jordan, Lebanon, Libya, Pakistan, Qatar, Saudi Arabia, Syria, Tunisia, Turkey, the United Arab Emirates, and the United States. Whole-exome data was collected and compared with data from the 1000 Genomes Project Populations, revealing distinct clusters of European and Asian populations, as well as high levels of divergence among Middle Eastern regions. Notably, the Turkish Peninsula and Syrian Desert populations showed higher levels of European admixture. Consanguineous populations, such as those found in the Middle East, have played a significant role in facilitating the identification of Mendelian disease genes. 
The region has the highest levels of consanguinity in the world, particularly on the southern and eastern rims of the Mediterranean basin, the Middle East, Mesopotamia, the Gulf, and the Indian subcontinent to southeast Asia. The preference for consanguineous marriages in these regions is rooted in historical and contemporary cultural factors, including the maintenance of family structure and property, financial advantages related to dowry, improved relations with in-laws, and perceived stability of intrafamilial marriages. Although the rates of consanguineous marriages vary across regions and countries, first cousin marriages are estimated to be around 25% on average. Moreover, when considering multiple layers of consanguinity resulting from endogamy, the levels of homozygosity and inbreeding coefficients increase, leading to a higher incidence of recessive diseases. Since the 1980s, the genomic resources of the Mediterranean basin and the Middle East have made significant contributions to global efforts focused on identifying disease-associated genes. Researchers have successfully identified genes associated with neurodevelopmental disorders and rare diseases in Mediterranean and Middle Eastern families. Furthermore, the presence of excellent clinical medicine services and functional biobanks, funded by regional and EU resources, has further facilitated medical genetics research. Today, every major medical school in Turkey has a medical genetics department, and several referral centers offer exome sequencing for research and diagnostic purposes. The genetic landscape of the Turkish population has been a subject of intensive investigation, building upon the previous findings. A collaborative research effort involving the Greater Middle East (GME) team aimed to delve deeper into the genetic structure of the Turkish population. 
The study utilized a comprehensive genome bank consisting of 5,400 Turkish individuals, including 830 individuals with genome sequencing data. Through sophisticated computational biology, bioinformatics, and statistical genomics approaches, the research aimed to provide a comprehensive understanding of the genetic makeup of the Turkish population. The study revealed extensive admixture between Turkish and European populations, shedding light on the historical interactions that have shaped the genetic diversity of modern-day Anatolian and European populations. Additionally, the study highlighted a high prevalence of consanguineous marriages in Turkey, leading to increased levels of inbreeding and longer runs of homozygosity. These observations underscore the significance of the Turkish population in identifying recessive disease genes. Applying this database to unsolved recessive conditions reduced the number of potential disease-causing variants by a factor of seven. These results not only reveal the variegated genetic architecture of the Turkish population but also support future discoveries in human genetics, particularly in the fields of Mendelian and population genetics.
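To make the link between consanguinity and runs of homozygosity concrete, the sketch below detects runs of consecutive homozygous markers and summarizes them as the fraction of the covered genome lying in such runs (often written F_ROH). It is a simplified illustration, not the method used in the study: the 1 Mb minimum run length, the marker positions, and the covered length are assumptions, and real tools also tolerate occasional heterozygous calls and gaps. For reference, the expected inbreeding coefficient for offspring of first cousins is 1/16.

```python
def runs_of_homozygosity(positions, is_homozygous, min_length_bp=1_000_000):
    """Return (start, end) spans of consecutive homozygous markers on one
    chromosome that are at least `min_length_bp` long.
    `positions` must be sorted in ascending order."""
    runs = []
    start = prev = None
    for pos, hom in zip(positions, is_homozygous):
        if hom:
            if start is None:
                start = pos
            prev = pos
        else:
            if start is not None and prev - start >= min_length_bp:
                runs.append((start, prev))
            start = prev = None
    # Close a run that extends to the last marker.
    if start is not None and prev - start >= min_length_bp:
        runs.append((start, prev))
    return runs


def f_roh(runs, covered_length_bp):
    """Fraction of the covered sequence that lies inside the detected runs."""
    return sum(end - start for start, end in runs) / covered_length_bp


# Hypothetical markers on a 10 Mb stretch of one chromosome.
positions = [0, 500_000, 1_200_000, 1_300_000, 2_000_000, 3_500_000]
homozygous = [True, True, True, False, True, True]
runs = runs_of_homozygosity(positions, homozygous)
print(runs, f_roh(runs, covered_length_bp=10_000_000))
```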
CAMRQ Syndrome
Our research group has made significant discoveries related to cerebellar ataxia, mental retardation, and dysequilibrium syndrome (CAMRQ, OMIM: 224050). Through the use of various genetic mapping techniques, we identified three genes that are causally associated with this syndrome. The first gene, VLDLR (very low-density lipoprotein receptor), was discovered through linkage analysis and genetic mapping. The second gene, WDR81 (WD repeat domain 81), and the third gene, ATP8A2 (P4-type transmembrane protein ATPase, aminophospholipid transporter, class I, type 8A, member 2), were identified through homozygosity mapping and whole exome sequencing. To further investigate these findings, our team traveled to different regions of Turkey to visit families affected by CAMRQ. This allowed us to gather valuable data and insights directly from the individuals and their communities. Our research efforts garnered significant attention from international media outlets and prestigious scientific journals, including Nature and Science. One notable aspect of our research was the discovery of individuals with quadrupedalism, which was documented by the BBC in 2006. This finding shed light on an important event in human evolution, suggesting that a mutated gene may play a role in the ability to walk on all fours. The implications of these mutations and whether these individuals represent a regression to an evolutionary stage prior to upright walking sparked considerable debate. Our groundbreaking research has been published in high-impact journals such as PNAS, Genome Research, and EJHG. Additionally, we have presented our findings at prestigious national and international conferences through conference papers, posters, and oral presentations. Overall, our research has significantly contributed to the understanding of CAMRQ and its genetic causes. 
The attention and recognition received from the scientific community and media underscore the importance and impact of our findings in unraveling the complexities of human evolution and genetic disorders.
Funding
The lab's research has been funded by the Scientific and Technological Research Council of Türkiye (TÜBİTAK), Bezmialem Vakıf University (Startup), Transatlantic Networks of Excellence (Brown Fat and Cardiovascular Health Network), and HHMI (Rockefeller University Center for Clinical and Translational Science, RUCCTS).