I started out working on plants and plastid genomes. While studying plastome structural variation and plant phylogeny, I became fascinated by the coevolutionary dynamics between nuclear and plastid genomes.

After joining NYU, I continued to explore host-symbiont interactions, now in the context of humans and their microbes.

Estimating disease similarity with metagenomic data of the human gut

This work was inspired by the observation that individuals with autism spectrum disorder are more likely to experience gastrointestinal disturbances such as constipation, diarrhea, or abdominal pain. In addition, assessing disease similarity is an essential step that precedes disease-based approaches to drug repositioning. Our study is a modest first step toward integrating microbiome insights into disease similarity assessment. Recent microbiome research has mainly focused on analyzing individual diseases to understand their unique characteristics, which by design excludes individuals with comorbidities. We analyzed shotgun metagenomic data from existing studies and identified previously unknown similarities between diseases. Our pipeline is an initial effort that uses both interpretable machine learning and differential abundance analysis to assess microbial similarity between diseases.

image
The overall design and data analysis workflow.
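To make the idea of microbial similarity between diseases concrete, here is a minimal sketch that compares differential abundance signatures with cosine similarity. It is not the pipeline from the study: the choice of cosine similarity, the disease and taxon placeholders, and all numeric values are hypothetical and serve only as an illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length effect-size vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero signature carries no direction to compare
    return dot / (norm_u * norm_v)


def pairwise_similarity(signatures):
    """Cosine similarity for every unordered pair of diseases."""
    return {
        (a, b): cosine_similarity(signatures[a], signatures[b])
        for a, b in combinations(sorted(signatures), 2)
    }


# Hypothetical log-fold changes (case vs. control) for the same four taxa,
# listed in the same order for every disease. These are made-up numbers.
signatures = {
    "disease_X": [1.0, -0.5, 0.0, 2.0],
    "disease_Y": [0.8, -0.4, 0.1, 1.5],
    "disease_Z": [-1.0, 0.5, 0.0, -2.0],
}

similarity = pairwise_similarity(signatures)
```

In this toy setup, two diseases whose taxa shift in the same direction score close to 1, and diseases with opposite shifts score close to -1.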

Profiling of the disruption of the gut-brain axis in autism spectrum disorder

The gut microbiome has been called the second genome of our body because of its significant impact on our health and well-being. Autism spectrum disorder (ASD) is a heterogeneous neurological condition that has been linked to disruptions in the gut-brain axis (GBA), although with limited reproducibility across studies.

I worked with Dr. James Morton, Dr. Gaspar Taroncher-Oldenburg, and Dr. Richard Bonneau on this collaborative project. The study proposed a framework based on a Bayesian differential ranking algorithm to leverage multi-omic datasets and investigate how the GBA affects ASD. Within this project, I led the data processing for the shotgun metagenomic samples and RNA expression data. We found that microbial profiles are predictive of ASD except in the sibling-matched cohorts. I also performed co-occurrence analysis to identify candidate virus-microbe interactions. We found that Prevotella copri and Bacteroides fragilis both co-occurred with phages enriched either in children with ASD or in neurotypical children.

image
Held-out gradient boosting ASD classifier performance measured by AUROC.
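For readers unfamiliar with the metric in the caption, AUROC can be read as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name and the toy labels and scores are illustrative assumptions, not code or data from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: classifier scores, higher meaning "more likely positive".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy held-out predictions (made-up values): 1 = case, 0 = control.
held_out_labels = [1, 0, 1, 0]
held_out_scores = [0.9, 0.8, 0.2, 0.1]
held_out_auroc = auroc(held_out_labels, held_out_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in sample count, which is fine for illustration but slower than rank-based implementations on large cohorts.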

Plastome structure variations of Malpighiales

Plastomes (plastid genomes) play essential roles in plant biology. Plastomes of heterotrophic plants are generally highly rearranged, while those of autotrophic angiosperms are relatively conserved. I worked with Dr. Tingshuang Yi and Dr. Jianjun Jin at the Kunming Institute of Botany, Chinese Academy of Sciences, to investigate structural variation of plastomes in autotrophic angiosperms. We used genome skimming data from species of the order Malpighiales, one of the largest and most diverse orders of flowering plants. Well-known members include cassava, willows, passionfruit, mangosteen, and poplars. We found novel plastid genome rearrangement events in families including Hypericaceae, Podostemaceae, Lophopyxidaceae, Putranjivaceae, Caryocaraceae, and Euphroniaceae.

assembly graph
The assembly graph of Caryocar glabrum viewed in Bandage. Two paths are supported by read mapping: A B C D E -C F -D -C -B and -A B C D E -C F -D -C -B. These two paths represent the isomers maintained by flip-flop recombination mediated by the inverted repeats.
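The relationship between the two paths in the caption can be checked with a few lines of code. The sketch below treats each path as a list of signed segments and inverts one region, reversing its order and flipping each strand. Reading B C D and -D -C -B as the two inverted-repeat arms, and A as a single-copy region between them on the circular molecule, is an interpretation of the path notation rather than something stated in the caption. The helper names are hypothetical.

```python
def revcomp(path):
    """Reverse-complement a path: reverse segment order and flip each sign."""
    return [seg[1:] if seg.startswith("-") else "-" + seg for seg in reversed(path)]


def invert_region(path, start, end):
    """Return a copy of path with path[start:end] reverse-complemented."""
    return path[:start] + revcomp(path[start:end]) + path[end:]


# First path from the caption, one signed segment per list element.
isomer_1 = "A B C D E -C F -D -C -B".split()

# The segments at positions 7..9 are the reverse complement of B C D,
# which is what makes these two stretches an inverted-repeat pair.
ir_arm = isomer_1[1:4]
other_arm = isomer_1[7:10]
arms_are_inverted_copies = revcomp(ir_arm) == other_arm

# Inverting the single segment A (position 0) yields the second path.
isomer_2 = invert_region(isomer_1, 0, 1)
isomer_2_text = " ".join(isomer_2)
```

Here `isomer_2_text` matches the second path in the caption, so under this reading the two isomers differ only in the orientation of segment A relative to the rest of the molecule.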