EMERGING RESEARCH SCHOLARS AI PhD MENTORS
Karina A Alvina
The Alviña lab is interested in investigating neural mechanisms altered by environmental factors such as stress, dietary habits, and exercise. We are also interested in uncovering how these mechanisms can sometimes lead to unhealthy cognitive aging and neurodegenerative disorders, and social behavior disruption such as that observed in autism spectrum disorder.
Abbas Babajani-Feremi PhD
Dr. Babajani’s lab is dedicated to advancing the integration of neuroimaging and electrophysiological modalities, such as magnetoencephalography (MEG), intracranial electroencephalography (EEG), and functional MRI (fMRI), with cutting-edge AI techniques, including deep learning algorithms. We focus on developing AI-driven methodologies to better understand and diagnose neurological disorders, particularly epilepsy and neurodegenerative diseases such as Alzheimer’s Disease (AD), Lewy Body Dementia (LBD), and Parkinson’s Disease (PD). By leveraging multi-modal neuroimaging and electrophysiological data, we aim to uncover novel biomarkers and neural patterns that can enhance diagnostic accuracy, guide therapeutic interventions, and deepen our understanding of disease mechanisms. Additionally, we are exploring innovative AI approaches to decode speech from brain signals, with the goal of creating advanced brain-computer interface (BCI) systems. These systems hold significant potential for enhancing communication capabilities in individuals with speech impairments, particularly in patients with conditions such as amyotrophic lateral sclerosis (ALS) or those recovering from stroke, offering new pathways for interaction and improving quality of life. We welcome ERS-AI PhD students with a background in AI and a passion for interdisciplinary research to join our team in pushing the boundaries of neuroscience and AI integration.
Tezcan Ozrazgat Baslanti Ph.D.
The ERS-AI PhD student will work on research projects consistent with our long-term goal of implementing artificial intelligence for autonomous phenotyping and communication of a patient’s condition in a fair and reproducible manner based on multimodal data. The student will work on multiple projects to gain experience with different types of data from different domains, such as nephrology and critical care, using different types of techniques.
Sara N Burke
Higher cognitive functions that decline in old age and the early stages of AD, such as memory and executive functions, are supported by neural networks distributed across the medial temporal lobe (MTL) and prefrontal cortex (PFC). Critically, these structures are among the earliest to accumulate pathology in AD, of which aging is the single greatest risk factor. While the precise mechanisms that render the aged brain vulnerable to neurodegeneration remain to be determined, it is known that aging is associated with a host of regionally specific neurobiological alterations within the PFC and MTL that do not correlate. This fact presents a major challenge for the development of effective therapeutics because higher cognition is supported by networks distributed across these vulnerable areas. Thus, targeted interventions that restore function in one brain region may neglect or exacerbate dysfunction in another, hindering the restoration of normal cognition. As such, interventions that target the optimization of “cognitive networks” rather than discrete brain regions may be more effective for improving behavioral outcomes in older adults. In order to do this, we need new technologies that can link cellular changes at the microscopic level to global changes in macroscopic brain networks that cooperate to support higher cognitive function. A current focus of my research program that implements artificial intelligence is developing methods that can link cellular changes to global brain connectivity through machine learning that can co-register different imaging platforms and classify cellular activity.
Gemma Casadesus
Neuroinflammation that results from chronic inflammatory states is recognized as a key driver of Alzheimer’s disease (AD). The exposome, which includes poor diet, is regarded as an important source of chronic inflammation and late-onset AD risk. In fact, obesity-induced type 2 diabetes (T2D) is the greatest risk factor for late-onset AD after aging. However, the mechanistic processes that link chronic metabolic stress to neuroinflammation and AD progression are not fully known. Addressing this relationship using a targeted hypothesis-based approach is difficult given the multifactorial nature of potential interactions (multiple disease factors, time, and sex). This is further confounded by changing and/or incomplete experimental design factors across animal studies. Therefore, our lab is tackling these challenges by studying the relationship between METS/T2D progression and AD in a more global, genome-wide manner. We use standard bioinformatics tools to comb through RNAseq pathways and novel gene targets, and we are developing machine learning approaches, including analyses of sparse matrices, to combine RNAseq data with AD- and METS-related neurodegenerative and metabolic changes in both sexes and over time. This allows for more complex comparisons to identify associations across a group of variables and potentially novel targets. We also use these tools to create snapshot representations of complicated mechanisms, such as RNA editing, that have been observed in AD but have not yet been connected to disease pathogenesis in a tangible way.
Paramita Chakrabarty PhD
Dr. Chakrabarty is a neuroscientist interested in characterizing the etiology of Alzheimer’s disease and Parkinson’s disease. Towards this broad aim, her lab has generated various rodent models that are analyzed using transcriptomics, proteomics, and MRI. Associations of these molecular-omic profiles with neuropathology and memory impairment could lead to novel target discovery and validation of biologically relevant pathways in rodent models of Alzheimer’s disease. In the future, such data would be helpful in predicting disease onset, disease course, and response to targeted therapies in elderly patients who are vulnerable to Alzheimer’s dementia.
Erica A Dale
More than half of the ~275,000 traumatic spinal cord injuries (SCI) that occur globally each year are at the cervical level, leading to paralysis and respiratory compromise or failure. Approximately 20-30% of cervical SCI (cSCI) patients will require ventilator support, for which there are very few therapeutic options for recovery. Indeed, the leading cause of morbidity and mortality after cSCI is respiratory compromise. Even in cases where mechanical ventilation is not required, many people with SCI are unable to cough to clear their airways and thus die of pneumonia. Acute epidural electrical stimulation has emerged as a strategy to restore vital motor, sensory, and autonomic functions in both experimental and clinical settings after SCI. For example, after spinal injury, epidural stimulation improves cardiovascular, bladder, and trunk stability via neuromodulation of spinal neural networks. More recently, we have shown modest success in eliciting respiratory neuroplasticity in the spinal neural network controlling breathing after short-term epidural stimulation in rats. Though limited underlying mechanisms have been proposed, to date little is known about how epidural stimulation elicits this motor function at the neuronal level. Even less is known about the capacity for epidural stimulation to promote long-lasting recovery and device independence, or about the stimulation paradigms by which this could occur. Thus, it is imperative to functionally map the stimulation parameter space in order to characterize and optimize recovery.
Holger Russ
The emphasis of the Russ lab is on developing innovative regenerative medicine approaches, with a focus on understanding the underlying molecular and cellular mechanisms resulting in human autoimmunity and an emphasis on type 1 diabetes (T1D). His lab employs state-of-the-art human pluripotent stem cell technology and primary human cell/tissue culture with genome engineering approaches to model disease and potentially treat patients. The Russ lab has successfully worked on different aspects of translational research, which led to several original and important contributions to the fields of pancreatic beta cell, thymus, and pluripotent stem cell biology. Dr. Russ’s long-term goal is to understand why T1D develops, to develop novel intervention and treatment modalities, and to train and enable the next generation of scientists focused on regenerative medicine strategies. Current opportunities include leveraging machine learning and computer vision to investigate thymic cell heterogeneity and T cell receptor repertoire diversity in individuals affected by disease compared to healthy controls.
Habibeh Khoshbouei
Landmark scientific discoveries support the neural population doctrine, in which the neuronal population, not the single neuron, is the essential unit of computation in many brain regions. New computing technologies have enabled neuroscience research at the level of the neural population. The long-term goal of our research is to apply artificial intelligence to the analysis of dopamine neural populations to decode neural dynamics. We recently employed live-cell calcium imaging in midbrain slices of DAT-cre/loxP-GCaMP6f (DAT-GCaMP6f) mice of either sex, together with computational analyses, to show that functional network connectivity differs greatly between the substantia nigra pars compacta (SNc) and ventral tegmental area (VTA). Using complex network analysis, we found a higher incidence of hyperconnected (i.e., hub-like) neurons in the VTA than in the SNc. The lower number of hyperconnected neurons in the SNc is consistent with the interpretation that the SNc dopamine neuronal network is less resilient to the neuronal loss implicated in neurological disorders. Our ongoing studies extend this work to in vivo studies in freely moving DAT-GCaMP6f mice of either sex via live-cell calcium imaging through microendoscopic lenses. This approach enables imaging of previously inaccessible dopamine neuronal populations deep within the midbrain of freely moving animals exposed to saline or methamphetamine.
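As a rough illustration of how hub-like neurons might be identified from calcium traces, the sketch below thresholds pairwise Pearson correlations into a binary functional network and counts each neuron's connections. This is only one common approach and is not necessarily the lab's method. The function names, the correlation threshold, the degree cutoff, and the toy traces are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length traces (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def node_degrees(traces, corr_threshold=0.5):
    """Count, for each neuron, how many other neurons it is correlated with.

    Two neurons are treated as functionally connected when their correlation
    is at least corr_threshold (an assumed, illustrative cutoff).
    """
    n = len(traces)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(traces[i], traces[j]) >= corr_threshold:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def find_hubs(degrees, min_degree):
    """Return indices of neurons whose degree is at least min_degree."""
    return [i for i, d in enumerate(degrees) if d >= min_degree]


# Toy fluorescence traces (hypothetical values): neuron 0 shares activity with
# each of neurons 1-3, while neurons 1-3 are mutually uncorrelated.
toy_traces = [
    [5, 1, 1, 1],  # neuron 0
    [3, 1, 3, 1],  # neuron 1
    [3, 3, 1, 1],  # neuron 2
    [3, 1, 1, 3],  # neuron 3
]

degrees = node_degrees(toy_traces, corr_threshold=0.5)
hubs = find_hubs(degrees, min_degree=2)
print(degrees)  # [3, 1, 1, 1]
print(hubs)     # [0]
```

In this toy network, only neuron 0 is connected to more than one partner, so it is the single hub. Real analyses would typically work on many more neurons and choose thresholds with statistical controls.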
Damon G Lamb Ph.D.
Dr. Lamb is an assistant professor of Psychiatry at the University of Florida and a Health Research Scientist at the Brain Rehabilitation Research Center at the Malcom Randall VAMC in Gainesville, FL. He is interested in the complex interaction of autonomics, emotional function and cognition. His undergraduate training was at the University of Maryland in Computer Engineering and Mathematics. He then earned a Master of Science in Computer Science from the University of Chicago and his PhD in Neuroscience from Emory University, where he focused on biophysical computational modeling of autonomic neuronal networks. He conducts clinical-translational research and education in human neuroimaging of psychiatric and related disorders.
Dominick Lemas Ph.D.
Breastfeeding is associated with positive maternal-child health outcomes that include reducing transmission of non-HIV infections such as COVID-19 from mother to baby. In the United States, rates of breastfeeding differ significantly depending on the race and income status of the mother. Mothers with lower rates of breastfeeding tend to be young, low-income, African American, unmarried, less educated, participants in the Supplemental Nutrition Program for Women, Infants, and Children (WIC), overweight or obese before pregnancy, and more likely to report their pregnancy was unintended. Given the complexity in breastfeeding disparities, there is an urgent need to develop breastfeeding interventions that include vulnerable and hard-to-reach populations. Electronic health records (EHRs) represent a unique data source that contains longitudinal clinical data linked to non-clinical data sources such as residential location, race, socio-economic status, and other social determinants of health (SDoH). The goal of this project is to leverage mom-baby linked EHRs to estimate geospatial patterns in breastfeeding and characterize the SDoH that impact breastfeeding outcomes in vulnerable and hard-to-reach populations.
Mei Liu
My long-term research goal is to develop innovative Artificial Intelligence/Machine Learning (AI/ML) methods to support Predictive, Preventive, Personalized, and Participatory (P4) medicine. The ERS-AI PhD student will be working on collaborative research projects that address challenges in EHR-data analysis such as federated learning, transfer learning, and personalized learning for more accurate and robust disease prediction and risk factor identification. Developing AI/ML algorithms that will improve model reproducibility, interpretability, transportability, and fairness will be a central focus of the research projects. The first project to which the student will be recruited will involve model development for acute kidney injury (AKI) prediction, prognosis, and sub-phenotyping using multi-institutional electronic health records (EHRs).
Mamoun Mardini Ph.D.
Our proposed project leverages the utility of artificial intelligence (AI) to personalize management of patients with cardiogenic shock. Cardiogenic shock is a very serious clinical situation that occurs when a patient’s heart cannot pump sufficient blood and oxygen, which can lead to failure of other organs such as the lung, brain, kidney, and liver. This is a medical emergency requiring immediate treatment. The most effective therapy for this critically ill cohort is heart transplantation (HTx). However, HTx is a highly complex process and a major outcome determinant that requires interactions among multiple advanced specialties over a considerable amount of time (often weeks to months) to identify an appropriate donor and a mechanical circulatory support (MCS) device to “bridge” the patient over this time period to achieve medical stability until a suitable donor heart is identified. Selection of the appropriate “bridging strategy” to HTx is one of the main challenges. The current clinical practice relies on subjective patient and provider experiences with few broad MCS principles. This state of evidence scarcity results in bias, care disparity, heuristics, and decision fatigue, and is recognized as an outcome limitation. Our central hypothesis is that using AI to develop a data-driven precision medicine approach for this complex and heterogeneous patient cohort can enhance clinical practice and result in better outcomes. Additionally, building a graphical platform that shows an interactive presentation of each MCS platform can promote shared decision-making between patients and clinicians. To the best of our knowledge, our proposal is the first effort to utilize advanced computational models and a data-driven approach to guide heart failure and replacement therapies.
Preclinical Assays of Hippocampal-Prefrontal Cortical Circuit Engagement for Application in Therapeutic Development
The high failure rate of translating discovery science to positive clinical outcomes in the treatment of psychiatric diseases demonstrates the necessity of improving the efficiency and rigor of the therapeutic development pipeline. To this end, the critical importance of advancing the discovery of in vivo physiological and behavioral measures of the engagement of specific circuits for normal cognitive function has been acknowledged across funding initiatives. The hippocampus (HPC)-prefrontal cortical (PFC) circuit is critical for affective processing as well as higher cognitive functions and is vulnerable in a number of mental health disorders. Although disrupted functional connectivity in the HPC-PFC circuit is a common feature of anxiety, bipolar disorder, schizophrenia, and autism, how local cellular interactions within this circuit manifest as large-scale temporal coordination to support higher cognitive functions remains unknown. Addressing this fundamental gap in our knowledge will establish a foundation for using circuit-based models for therapeutic target discovery and as screening tools for novel drug efficacy. The long-term goal of this proposal, in line with the Funding Opportunity Announcement (PAR-19-289), is to enhance the therapeutic development pipeline for mental illness treatment by optimizing, evaluating, and mechanistically testing neurophysiological and behavioral measures of circuit engagement. The primary objective of this proposal, which is the first step towards achieving our goal, is to relate behavioral performance on the rodent analog of the Paired Associates Learning task (PAL), part of the human Cambridge Neuropsychological Test Automated Batteries [CANTAB] assessment, and surface EEG recordings to invasive neurophysiological measures of neural coordination in the HPC-PFC circuit.
Through an innovative series of experiments that integrate in vivo neurophysiological local field potential (LFP) recordings, circuit manipulation, surface EEG, and behavior, we will optimize, evaluate, and mechanistically test novel noninvasive biomarkers of HPC-PFC circuit engagement by pursuing the following specific aims: 1) Optimize behavioral and non-invasive EEG biomarkers for inferring HPC-PFC circuit engagement and temporal coordination, 2) Evaluate behavioral and non-invasive EEG biomarkers for determining HPC-PFC circuit engagement through pharmacological manipulation, and 3) Mechanistically test HPC-PFC projections as a driver of surface EEG organization. The proposed research is innovative because it integrates a clinically relevant behavioral task, designed to be analogous to human cognitive assessments, with surface EEG measures that translate across mammals. This will enable the optimization, evaluation, and testing of novel and translatable measures of HPC-PFC circuit engagement in the context of higher cognition and global neural organization. The significance of this contribution will be to provide novel diagnostic tools that can be used to enhance the therapeutic development pipeline for treating mental illness.
The student selected for this project will work on interfacing predictive algorithms, leveraging AI tools and techniques, to anticipate intracortical activity based on cortical EEG and behavior. Through this work, the student will make advancements that are directly translatable to the clinic.
Matthew E Merritt Ph.D.
Dr. Merritt’s project uses AI approaches, primarily neural networks, for automated quantitation and denoising of nuclear magnetic resonance (NMR) data. AI has well-known abilities for performing image recognition, and by its very nature, a neural network can evaluate a target image almost instantaneously once it is trained. The speed and robustness of neural network approaches suggest that their application to the spectral denoising/fitting and quantitation problem in NMR could be very profitable. Initial results using a deep learning neural network produced an increase in signal-to-noise ratio (SNR) of 200 to 1 for 13C NMR spectra (1). Using traditional Fourier transformation methods, the SNR is proportional to the square root of the number of scans, which means that it takes 4 times the number of scans to give twice the SNR. A gain in SNR of 200 is equivalent to running the same sample 40,000 times longer. Given that most 13C spectra acquired in my lab take at least 6 hours to acquire, the time savings possible with this approach are truly transformational.
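The scan-count arithmetic above follows directly from the square-root relationship between SNR and the number of scans. The short sketch below works it through. The function names are hypothetical and the only assumption is that SNR scales as the square root of the number of averaged scans.

```python
import math


def scan_multiplier_for_snr_gain(snr_gain):
    """Factor by which the scan count must grow to reach snr_gain by averaging alone.

    Assumes SNR is proportional to sqrt(number of scans), so the required
    scan count grows with the square of the desired SNR gain.
    """
    return snr_gain ** 2


def snr_gain_from_scan_multiplier(scan_multiplier):
    """SNR gain obtained by acquiring scan_multiplier times as many scans."""
    return math.sqrt(scan_multiplier)


print(scan_multiplier_for_snr_gain(2))       # 4
print(scan_multiplier_for_snr_gain(200))     # 40000
print(snr_gain_from_scan_multiplier(40000))  # 200.0
```

Doubling the SNR therefore costs four times the acquisition, and a 200-fold gain by signal averaging alone would cost 40,000 times the acquisition.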
Nancy Padilla-Coreano
The Padilla-Coreano Lab studies how the brain facilitates social behaviors using tools at the intersection of neuroscience and Artificial Intelligence. Specifically, the lab uses mouse models to study the neural mechanisms of social competence, that is, how we adjust our social behavior based on information. Two key elements of this research goal are being able to measure social behaviors and understanding the relationship between behavior and brain activity. The lab uses Artificial Intelligence to tackle both key elements. The PI is a co-developer of a recent Deep Learning tool (AlphaTracker) that does pose estimation for multiple animal tracking (Padilla-Coreano et al., 2020 preprint). In addition, the lab has active collaborations with machine learning scientists at UF to create new tools that analyze behavior by incorporating temporal information and structure for unbiased automatic behavior classification. Furthermore, the lab is focused on studying neural function at the network level. By recording neural activity of multiple brain regions simultaneously, we can identify which circuits and sequences of circuits lead to important social behaviors. Given the complexity of the data (both neural and behavioral), Artificial Intelligence helps identify the causal relationship between neural activity and behavior. The PI has applied similar approaches to predict behaviors and conditions from neural activity, and the lab will expand this approach to consider neural activity from a whole network.
Paola Giusti-Rodriguez Ph.D.
The Giusti-Rodriguez Lab works at the intersection of neuroscience, human genetics, and functional genomics, and aims to maximize the tools and techniques of these fields to advance our understanding of the genetics of neuropsychiatric disorders. AI in genomics is growing rapidly, and deep learning methods have been applied to the analysis of diverse datatypes, including DNA and RNA-sequencing, methylation, DNA accessibility and chromatin, and 3D genome organization. The Giusti-Rodríguez lab will generate diverse data types using mouse, postmortem human brain tissue, iPSCs, etc., and has access to many external datasets through existing collaborations and/or publicly available datasets. The Giusti-Rodríguez lab will apply machine learning and artificial intelligence approaches to multiomics datatypes relevant to understanding specific susceptibilities to psychiatric disorders and to parse out genetic underpinnings in individuals from diverse populations and complex admixture.
Pinaki Sarder
Dr. Sarder’s lab develops novel computational methods to study and understand tissue micro-anatomy using multi-modal whole-slide microscopy images as well as associated molecular omics data. Our method facilitates decision making in a clinical work-flow (both for diagnosis and predicting progression of diseases), and also allows studying fundamental systems biology of disease dynamics. Currently, our major focus involves studying chronic kidney diseases as well as ‘reference’ organ systems across scale.
My laboratory uses rodents to investigate behavioral and neural mechanisms of cognition, motivation, and decision making in the context of psychiatric disorders and advanced age. Much of our research over the past 15 years has focused on developing rat models that recapitulate features of human behavior that predict vulnerability to substance use or are disrupted in advanced age, such as executive functions and cost/benefit decision making. In the next phase of our research, we are focused on collecting large-scale datasets of behavioral/cognitive and neurobiological variables from which we can extract factors that might provide mechanistic insight into vulnerabilities in these conditions. As one example, we have recently collected data from a large number of young adult and aged rats across four tasks that assess different elements of cognition. We are in the process of obtaining multiple measures from each task (e.g., performance accuracy, speed, persistence, variability), and will use the data to attempt to derive overarching factors that predict broad areas of age-related cognitive decline. The goal is to link these factors with neurobiological measures (through transcriptomic and/or proteomic approaches), to identify targets for remediating cognitive impairments. In addition, by isolating variables that predict cognitive trajectories, we will be able to focus our research on therapeutic approaches toward targets that are most relevant to cognitive impairment. Additional projects in our lab involve computer vision approaches to analysis of behavioral data, and computational models of cost/benefit decision making.
Benjamin Shickel PhD
The ERS-AI PhD student will be recruited into projects exploring the application of multi-modal foundation models for a variety of clinical applications and patient health modeling. Briefly, foundation models comprise a recent class of large-scale machine learning frameworks based on the Transformer model architecture that are designed to formulate scalable data-driven representations from voluminous data, merging AI principles of supervised, unsupervised, and self-supervised learning techniques; such data representations can be applied to several downstream AI tasks. Foundation models are currently popularized by innovations in natural language processing (NLP). The ERS-AI PhD student will research the translation of these discoveries into the healthcare domain by developing foundation models of patient health that integrate granular and temporal health data from multiple modalities (e.g., continuous and discrete electronic health record measurements, clinical notes, radiography, omics data) for unified health representations that can be applied to downstream clinical prediction tasks (e.g., sepsis, acute kidney injury, mortality). Methods to measure and improve explainability, fairness, and causality of foundation models will be a large focus of the research projects. The first project to which the student will be recruited will involve the development of a Transformer foundation model for dynamic monitoring of acute kidney injury (AKI).
Nikhil Urs Ph.D.
My research interests broadly cover dopamine neurotransmission in neurological and psychiatric disorders. My primary research focus is to learn more about the dopamine system by deciphering a) signaling pathways involved in dopamine (DA) neurotransmission, b) functional dopamine neuronal circuits, and c) how these integrate and manifest behaviorally in an organism. This integrated approach will, in parallel, allow us to fine-tune dopamine neurotransmission and devise novel drug- and gene-based therapeutic approaches to treat dopamine-related disorders such as PD and schizophrenia. One of the main projects in the lab studies cortical dopamine circuits in motivated behavior and how these circuits regulate striatal dopamine. Our goal is to manipulate these circuits and assess the effects on behavior. We will simultaneously measure calcium or dopamine dynamics in the brain during behavior using fiber photometry with the fluorescent biosensors GCaMP and dLight. The photometry data need to be extracted from the RZ10 photometry unit using Python and MATLAB, which requires coding knowledge. This is essential because we need to extract fluorescent signal data during particular behavioral events (cue, approach, reward, avoidance, etc.) over time, i.e., across a single training session or multiple days of training.
In addition, we will also study the effects of these cortical circuits on motor learning and behavior, for which we will use DeepLabCut (http://www.mackenziemathislab.org/deeplabcut), an open-source software package that uses machine learning to track fine and gross motor movements in rodents.
ERS-AI scholars will be trained by us to learn and use Python/MATLAB and DeepLabCut as part of their research projects.
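To give a sense of the event-aligned extraction described above, here is a minimal Python sketch that cuts windows of a fluorescence signal around behavioral event times and normalizes them to a pre-event baseline. It is only an illustration under stated assumptions: it does not use the RZ10 or any vendor API, the signal is assumed to be already loaded as a plain list of samples, and all function names, window sizes, and toy values are hypothetical.

```python
def peri_event_windows(signal, event_indices, pre, post):
    """Cut a window of `pre` samples before and `post` samples from each event onward.

    Events whose window would run past either end of the recording are skipped.
    """
    windows = []
    for idx in event_indices:
        start, stop = idx - pre, idx + post
        if start < 0 or stop > len(signal):
            continue
        windows.append(signal[start:stop])
    return windows


def delta_f_over_f(window, n_baseline):
    """Normalize a window as (F - F0) / F0, with F0 the mean of the first n_baseline samples.

    Assumes the baseline mean is non-zero.
    """
    baseline = sum(window[:n_baseline]) / n_baseline
    return [(value - baseline) / baseline for value in window]


def mean_trace(windows):
    """Average equal-length windows sample by sample."""
    n = len(windows)
    return [sum(values) / n for values in zip(*windows)]


# Toy fluorescence signal (hypothetical values) with transients at samples 3 and 8.
signal = [10, 10, 10, 20, 10, 10, 10, 10, 30, 10, 10, 10]
cue_onsets = [3, 8, 11]  # the event at sample 11 is too close to the end and is skipped

windows = peri_event_windows(signal, cue_onsets, pre=2, post=2)
normalized = [delta_f_over_f(w, n_baseline=2) for w in windows]
print(windows)                 # [[10, 10, 20, 10], [10, 10, 30, 10]]
print(mean_trace(normalized))  # [0.0, 0.0, 1.5, 0.0]
```

The same windowing can be repeated per event type (cue, approach, reward, avoidance) and per session to compare responses across days of training.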
Eric Wang
The brain is a complex network of multiple cell types, each with its own transcriptome and proteome. Modern technologies facilitate high-throughput, precise measurement of transcriptomes in particular; this is now commonly performed using bulk tissue, as well as at the single cell and subcellular levels. These techniques are useful for studies of the brain, given the complex morphology of cell types such as neurons, whose gene products must be transported from nuclei to synapses, sometimes millimeters away. Many neurological and neurodegenerative diseases are caused by mutations in genes that cause downstream changes to RNA metabolism and intracellular transport. One of these is myotonic dystrophy, a repeat expansion disease with symptoms manifesting in muscle, heart, and brain tissues. Some of the symptoms potentially mediated by brain dysfunction include profound hypersomnolence, altered regulation of circadian rhythms, executive dysfunction, problems with learning/memory, and white matter atrophy. We currently do not understand which cell types and brain regions are affected in this disease, and are studying post-mortem samples from myotonic dystrophy patients to better understand disease pathogenesis. We seek to profile transcriptomes and proteomes using bulk tissue, transcriptomes at the single cell level (with a focus on RNA splicing isoforms), and transcriptomes with spatial information. In addition, we seek to better understand the variability in somatic repeat expansion across brain regions and cell types, and will employ cutting-edge optical mapping approaches coupled to transcriptome profiling approaches to obtain this information. All of these techniques and studies require extensive computational analyses. We routinely write custom code to analyze these datasets, and also leverage existing packages (e.g., in Python and R).
Artificial intelligence approaches such as Bayesian Inference, mixture models, and linear regression will be employed in this project, and deep learning approaches will also be applied when appropriate. Overall, these efforts will not only provide insights into myotonic dystrophy pathogenesis, but also inform studies of repeat expansion disease and brain diseases in general.
Yonghui Wu
In the last two decades, the introduction of targeted anticancer therapies has revolutionized the treatment of hematological malignancies such as multiple myeloma, chronic myeloid leukemia, and solid malignancies such as breast and renal carcinoma. Contemporary cancer therapy has led to a 23% reduction in cancer-related mortality rate and a rapid increase in cancer survivorship in the last 15 years. However, some devastating side effects of these treatments have also resulted in increased morbidity and mortality. For example, cardiotoxicity is one of the well-documented adverse events of cancer treatments resulting either from accelerated development of cardiovascular diseases in cancer patients or from the direct effects of the treatment on the structure and function of the heart. The goal of this project is to develop predictive models for the identification of cancer patients with a high risk of cardiotoxicity to prevent or minimize the risk of cardiotoxicity in cancer treatments.
Mingyi Xie
Gene expression, the flow of genetic information from DNA to messenger RNA (mRNA) to protein, involves delicate regulation by a group of small RNAs named microRNAs (miRNA). The development of high-throughput next-generation sequencing technology and the advancement in artificial intelligence provide new opportunities for miRNA target identification. We aim to develop an innovative machine learning framework to efficiently predict high-confidence miRNA-mRNA interaction pairs in cancer patients with contrastive convolutional neural networks based on the combination of heterogeneous RNA-seq data (miRNA, mRNA, and miRNA-mRNA hybrids). We also anticipate profiling and validating the effect of discovered miRNA-mRNA pairs in patient samples to facilitate the hypothesis generation process for potential cancer therapeutics. Collectively, our efforts will result in rapid and accurate identification of high-quality miRNA-mRNA pairs with our proposed model, which would accelerate the process of elucidating the underlying mechanism of cancer progression and provide the basis for improving current therapeutic interventions. Additionally, apart from identifying the miRNA-mRNA pairs, our proposed framework also has the potential to be applied in other types of cancer to facilitate the development of cancer therapeutics.
Jie Xu Ph.D.
This project aims to develop machine learning methods for identifying sub-phenotypes of Alzheimer’s disease (AD) and AD-related dementias (ADRD). Using electronic health records (EHRs) from patients diagnosed with AD/ADRD, we will retrospectively review their structured EHR data, clinical notes, and neuroimages, develop machine learning methods for connecting these data sources, and computationally derive AD/ADRD sub-phenotypes based on hierarchical clustering. Interfaces with Data Science/AI: Students are required to develop machine learning methods that connect different data modalities and AI methods that derive disease subtypes from large-scale health data.
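As a rough illustration of the hierarchical clustering step, the sketch below groups a handful of made-up patient feature vectors with average-linkage agglomerative clustering. The function name, the two-dimensional features, and the choice of average linkage are all assumptions made for illustration; they are not the lab's actual pipeline, which would operate on features derived from EHRs, notes, and images.

```python
from itertools import combinations
from math import dist


def agglomerative_clusters(points, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per point."""
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def average_distance(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical, already-standardized features for six patients.
patients = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (2.9, 3.1), (3.0, 3.0), (3.2, 2.8)]
labels = agglomerative_clusters(patients, n_clusters=2)
print(labels)
```

In practice a library routine (for example, the hierarchical clustering tools in SciPy) would likely replace this hand-written loop, and the number of clusters would be chosen from the data rather than fixed in advance.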
Rui Yin
In this project, we will develop and validate research-grade computable phenotyping (CP) algorithms and tools, leveraging advanced natural language processing (NLP) methods, to accurately identify AD/ADRD drug repurposing study cohorts and then extract and standardize relevant patient characteristics (e.g., APOE) and outcomes (e.g., ADRD subtypes and severity) from real-world data (RWD). The algorithms will be developed and internally validated at OneFlorida and externally validated with INSIGHT data. This work will address misclassification errors and incomplete information through CP and clinical NLP. Neither NLP nor CP is a novel method; nevertheless, there has been no systematic investigation of their use for AD/ADRD drug repurposing. Our effort will be the first to make resources publicly available to support AD/ADRD drug repurposing research using RWD. With the CP/NLP pipeline, we can accurately identify study cohorts and extract drug exposures, outcomes, and other important confounders and potential effect modifiers, which enables more precise estimation of the treatment effects of candidate repurposing drugs from RWD.
Sai Zhang
The research in my lab is focused on developing novel statistical and machine learning methods to decipher the genomic basis of human complex diseases (e.g., cardiovascular and neurodegenerative diseases) by reasoning over large-scale genetic, multiomic, and clinical datasets. Our efforts will contribute to a better understanding of disease biology and mechanisms, the discovery of novel therapeutic targets, and the development of new treatments, and will eventually pave the way towards personalized medicine.
The genetic risk loci of many complex diseases are in noncoding genomic regions, which do not encode any proteins and are usually referred to as the “dark genome”. The molecular function of these regions remains largely unclear, although they play a critical role in regulating cell development and function. In a prioritized project in my lab, we will develop advanced machine learning models, especially deep learning models, to decode the cell-type-specific functional impacts of noncoding genetic variants. The project has three aims.
Aim 1. Develop deep learning models to map scATAC-seq peaks from DNA sequences. My previous work on deep learning modeling of high-throughput sequencing data [1,2,3] and my access to single-cell multiome profiling of >1 million cells from >40 human tissues across healthy and diseased contexts (ENCODE4) provide a solid foundation for this project. After model training, we will be able to comprehensively characterize the sequence-based features of cis-regulatory elements (CREs) specific to different cell types.
Aim 2. Perform in silico mutagenesis to identify cell-type-specific noncoding variants. By performing saturation in silico mutagenesis with the trained deep learning model, we will quantify the effects of individual mutations on CREs in different cell types, which indicates the regulatory impact of each mutation. Genetic variants whose effects show high cell-type specificity are of particular interest because such specificity implies strong associations with cell function and disease etiology.
Aim 3. Combine predicted mutation effects with genetic association to fine-map disease risk loci and prioritize candidate disease genes. We recently developed a Bayesian model to effectively fine-map disease risk loci by integrating epigenomic profiling with GWAS summary data [4,5]. Following this idea, we will develop a novel Bayesian model to integrate the cell-type-specific functional profiling with genetic data (we have access to multiple genetic databases, including the UK Biobank, TOPMed, and MVP) to infer disease loci and genes at a higher resolution. The cellular heterogeneity of diseases will also be dissected based on heritability analysis.
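The flow from Aim 2 to Aim 3 can be sketched in a few lines, under heavy simplifying assumptions. Here a toy motif counter stands in for the trained deep learning model, every single-base substitution is scored against the reference sequence, and the resulting effect sizes are turned into priors for a textbook single-causal-variant posterior (posterior proportional to prior times Bayes factor). The motif, the sequence, the pseudocount, the Bayes factors, and all function names are hypothetical; the lab's actual Bayesian model is not described here and is likely far richer.

```python
BASES = "ACGT"


def toy_accessibility_score(sequence, motif="GATA"):
    """Hypothetical stand-in for a trained model: counts a made-up motif."""
    width = len(motif)
    return float(sum(sequence[i:i + width] == motif
                     for i in range(len(sequence) - width + 1)))


def saturation_mutagenesis(sequence, score_fn):
    """Score every single-base substitution relative to the reference."""
    reference = score_fn(sequence)
    effects = {}
    for pos, ref_base in enumerate(sequence):
        for alt in BASES:
            if alt != ref_base:
                mutant = sequence[:pos] + alt + sequence[pos + 1:]
                effects[(pos, alt)] = score_fn(mutant) - reference
    return effects


def posterior_inclusion_probabilities(bayes_factors, priors):
    """Single-causal-variant posterior: proportional to prior x Bayes factor."""
    weights = [bf * prior for bf, prior in zip(bayes_factors, priors)]
    total = sum(weights)
    return [w / total for w in weights]


# Aim 2 analogue: effect of each possible substitution in a toy sequence.
effects = saturation_mutagenesis("CCGATACC", toy_accessibility_score)

# Aim 3 analogue: two hypothetical variants with equal association evidence.
variants = [(0, "A"), (2, "A")]          # (position, alternate base)
bayes_factors = [10.0, 10.0]             # made-up association Bayes factors
raw = [0.1 + abs(effects[v]) for v in variants]  # 0.1 is an arbitrary pseudocount
priors = [r / sum(raw) for r in raw]
pips = posterior_inclusion_probabilities(bayes_factors, priors)
print(pips)
```

With equal association evidence, the variant that disrupts the toy motif receives the larger posterior, which is the intuition behind using functional predictions to sharpen fine-mapping. Running the same mutagenesis with one scoring function per cell type would give the cell-type-specific effects that Aim 2 describes.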