#: Joint first authors; *: Joint corresponding authors.
2024
-
Ruohan Wang#, Yumin Zheng#, Zijian Zhang#, Kailu Song, Erxi Wu, Xiaopeng Zhu, Tao P. Wu* & Jun Ding*.
MATES: a deep learning-based model for locus-specific quantification of transposable elements in single cell. Nature Communications 15 (2024): 8798.
Abstract:
Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often align multi-mapping reads to either ‘best-mapped’ or ‘random-mapped’ locations and categorize them at the subfamily level, overlooking the biological necessity for accurate, locus-specific TE quantification. Moreover, these existing methods are primarily designed for and focused on transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a deep-learning approach that accurately allocates multi-mapping reads to specific loci of TEs, utilizing context from adjacent read alignments flanking the TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding in the identification of marker TEs for identified cell populations. This development facilitates the exploration of single-cell heterogeneity and gene regulation through the lens of TEs, offering an effective transposon quantification tool for the single-cell genomics community.
-
Hao Li, Wan-Xing Xu, Jing-Cong Tan, Yue-Mei Hong, Jian He, Ben-Peng Zhao, Jin-An Zhou, Yu-Min Zheng, Ming Lei, Xiao-Qi Zheng, Jun Ding*, Ning-Ning Liu*, Jun-Jie Gao*, Chang-Qing Zhang*, Hui Wang*.
Single-cell multi-omics identify novel regulators required for osteoclastogenesis during ageing. iScience (2024).
Abstract:
Age-related osteoporosis manifests as a complex pathology that disrupts bone homeostasis and elevates fracture risk, yet the mechanisms facilitating age-related shifts in bone marrow macrophages/osteoclasts (BMMs/OCs) lineage are not fully understood. To decipher these mechanisms, we conducted an investigation into the determinants controlling BMMs/OCs differentiation. We performed single-cell multi-omics profiling on bone marrow samples from mice of different ages (1, 6, and 20 months) to gain a holistic understanding of cellular changes across time. Our analysis revealed that ageing significantly instigates OC differentiation. Importantly, we identified Cebpd as a vital gene for osteoclastogenesis and bone resorption during the ageing process. Counterbalancing the effects of Cebpd, we found Irf8, Sox4, and Klf4 to play crucial roles. By thoroughly examining the cellular dynamics underpinning bone ageing, our study unveils novel insights into the mechanisms of age-related osteoporosis and presents potential therapeutic targets for future exploration.
-
Yimin Fan, Yu Li, Jun Ding* & Yue Li*.
GFETM: Genome Foundation-Based Embedded Topic Model for scATAC-seq Modeling. International Conference on Research in Computational Molecular Biology. Cham: Springer Nature Switzerland, (2024): 314-319.
Abstract:
Single-cell Assay for Transposase-Accessible Chromatin with sequencing (scATAC-seq) has emerged as a powerful technique for investigating open chromatin landscapes at the single-cell level. Yet, scATAC-seq cell representation learning and its downstream tasks remain challenging due to the inherent high dimensional, sparse, and noisy properties of the data.
-
Yingfu Wu#, Zhenqi Shi#, Xiangfei Zhou#, Pengyu Zhang#, Xiuhui Yang#, Jun Ding* & Hao Wu*.
scHiCyclePred: a deep learning framework for predicting cell cycle phases from single-cell Hi-C data using multi-scale interaction information. Commun Biol 7, 923 (2024).
Abstract:
The emergence of single-cell Hi-C (scHi-C) technology has provided unprecedented opportunities for investigating the intricate relationship between cell cycle phases and the three-dimensional (3D) structure of chromatin. However, accurately predicting cell cycle phases based on scHi-C data remains a formidable challenge. Here, we present scHiCyclePred, a prediction model that integrates multiple feature sets to leverage scHi-C data for predicting cell cycle phases. scHiCyclePred extracts 3D chromatin structure features by incorporating multi-scale interaction information. The comparative analysis illustrates that scHiCyclePred surpasses existing methods such as Nagano_method and CIRCLET across various metrics including accuracy (ACC), F1 score, Precision, Recall, and balanced accuracy (BACC). In addition, we evaluate scHiCyclePred against the previously published CIRCLET using the dataset of complex tissues (Liu_dataset). Experimental results show that scHiCyclePred improves on CIRCLET by 0.39, 0.52, 0.52, and 0.39 in terms of ACC, F1 score, Precision, and Recall, respectively. Furthermore, we conduct analyses on three-dimensional chromatin dynamics and gene features during the cell cycle, providing a more comprehensive understanding of cell cycle dynamics through chromatin structure. scHiCyclePred not only offers insights into cell biology but also holds promise for catalyzing breakthroughs in disease research. Access scHiCyclePred on GitHub at https://github.com/HaoWuLab-Bioinformatics/scHiCyclePred.
-
Xiuhui Yang, Koren K. Mann, Hao Wu* & Jun Ding*.
scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in silico exploration. Genome Biology 25.1 (2024): 1-34.
Abstract:
Single-cell multi-omics data reveal complex cellular states, providing significant insights into cellular dynamics and disease. Yet, integration of multi-omics data presents challenges. Some modalities have not reached the robustness or clarity of established transcriptomics. Coupled with data scarcity for less established modalities and integration intricacies, these challenges limit our ability to maximize single-cell omics benefits. We introduce scCross, a tool leveraging variational autoencoders, generative adversarial networks, and the mutual nearest neighbors (MNN) technique for modality alignment. By enabling single-cell cross-modal data generation, multi-omics data simulation, and in silico cellular perturbations, scCross enhances the utility of single-cell multi-omics studies.
-
Jingtao Wang, Gregory J. Fonseca & Jun Ding.
scSemiProfiler: Advancing large-scale single-cell studies through semi-profiling with deep generative models and active learning. Nature Communications 15.1 (2024): 5989.
Abstract:
Single-cell sequencing is a crucial tool for dissecting the cellular intricacies of complex diseases. Its prohibitive cost, however, hampers its application in expansive biomedical studies. Traditional cellular deconvolution approaches can infer cell type proportions from more affordable bulk sequencing data, yet they fall short in providing the detailed resolution required for single-cell-level analyses. To overcome this challenge, we introduce “scSemiProfiler”, an innovative computational framework that marries deep generative models with active learning strategies. This method adeptly infers single-cell profiles across large cohorts by fusing bulk sequencing data with targeted single-cell sequencing from a few rigorously chosen representatives. Extensive validation across heterogeneous datasets verifies the precision of our semi-profiling approach, aligning closely with true single-cell profiling data and empowering refined cellular analyses. Originally developed for extensive disease cohorts, “scSemiProfiler” is adaptable for broad applications. It provides a scalable, cost-effective solution for single-cell profiling, facilitating in-depth cellular investigation in various biological domains.
-
Kim A. Tran, Erwan Pernet, Mina Sadeghi, Jeffrey Downey, Julia Chronopoulos, Elizabeth Lapshina, Oscar Tsai, Eva Kaufmann, Jun Ding & Maziar Divangahi.
BCG immunization induces CX3CR1hi effector memory T cells to provide cross-protection via IFN-γ-mediated trained immunity. Nature Immunology 25.3 (2024): 418-431.
Abstract:
After a century of using the Bacillus Calmette–Guérin (BCG) vaccine, our understanding of its ability to provide protection against homologous (Mycobacterium tuberculosis) or heterologous (for example, influenza virus) infections remains limited. Here we show that systemic (intravenous) BCG vaccination provides significant protection against subsequent influenza A virus infection in mice. We further demonstrate that the BCG-mediated cross-protection against influenza A virus is largely due to the enrichment of conventional CD4+ effector CX3CR1hi memory αβ T cells in the circulation and lung parenchyma. Importantly, pulmonary CX3CR1hi T cells limit early viral infection in an antigen-independent manner via potent interferon-γ production, which subsequently enhances long-term antimicrobial activity of alveolar macrophages. These results offer insight into the unknown mechanism by which BCG has persistently displayed broad protection against non-tuberculosis infections via cross-talk between adaptive and innate memory responses.
2023
-
Yue Li, Gregory Fonseca & Jun Ding.
Multimodal Methods for Knowledge Discovery from Bulk and Single-Cell Multi-Omics Data. Machine Learning Methods for Multi-Omics Data Integration. 2023 July; 39–74
Abstract:
Multi-omics measurements (bulk and single-cell) are essential for comprehensively depicting cellular states and can thus yield a deep understanding of the mechanisms underlying cellular state changes in many biological processes. Therefore, computational models that integrate omics data are often indispensable for discovering novel effective diagnostics and therapeutics. In this chapter, we will give an overview of the multimodal methods for a variety of data analysis tasks (dimensionality reduction, clustering, gene regulatory network inference, and biomarker discovery), leading to the discovery of cell populations, gene regulatory networks, and biomarkers. We will also cover the characteristics associated with each method to provide users with practical guidance on how to choose appropriate methods for their specific application scenarios.
-
Basil J Petrof, Tom Podolsky, Salyan Bhattarai, Jiahui Tan & Jun Ding.
Trained immunity as a potential target for therapeutic immunomodulation in Duchenne muscular dystrophy. Frontiers in Immunology. 2023 June; 14
Abstract:
Dysregulated inflammation involving innate immune cells, particularly of the monocyte/macrophage lineage, is a key contributor to the pathogenesis of Duchenne muscular dystrophy (DMD). Trained immunity is an evolutionarily ancient protective mechanism against infection, in which epigenetic and metabolic alterations confer non-specific hyperresponsiveness of innate immune cells to various stimuli. Recent work in an animal model of DMD (mdx mice) has shown that macrophages exhibit cardinal features of trained immunity, including the presence of innate immune system “memory”. The latter is reflected by epigenetic changes and durable transmissibility of the trained phenotype to healthy non-dystrophic mice by bone marrow transplantation. Mechanistically, it is suggested that a Toll-like receptor (TLR) 4-regulated, memory-like capacity of innate immunity is induced at the level of the bone marrow by factors released from the damaged muscles, leading to exaggerated upregulation of both pro- and anti-inflammatory genes. Here we propose a conceptual framework for the involvement of trained immunity in DMD pathogenesis and its potential to serve as a new therapeutic target.
-
Zijian Zhang, Yiyang Wang, Xinluo Luo, Xuwen Li, Xiaomei Zhan, Yumin Zheng, Jun Ding & Tao Wu.
Abstract P1-13-02: The Aberrant Activity of Retrotransposon Elements Mediates the Chemo-tolerant Persister Cells Relapse in TNBC. Cancer Research. 2023 Mar; 83(5_Supplement)
Abstract:
The emergence of acquired drug resistance through therapeutic treatment remains a critical threat to efficient chemotherapy, targeted therapy, or immunotherapy. These resistant cancer cells most often lead to relapse or metastasis. The development of drug resistance is a multi-step evolutionary adaptation for cancer cells. Tumor heterogeneity, cancer cells’ plasticity, and microenvironment contribute to the resistant clone’s formation. Therefore, a time-lapse adaptation model is critical to define the mechanism of drug resistance evolution. Recently, several studies have revealed that the initial acquired drug resistance might be conferred by transient events, such as drug-tolerant persisters (DTP) that might occur in a subpopulation of the cancer cells at the early stage of the treatment, followed by transcriptomic reprogramming and secondary-wave genetic mutations as resistance develops. In the clinic, chemotherapy is still the mainstream treatment for TNBC, and one of the primary chemo agents is doxorubicin. Although the initial response rate of doxorubicin-based chemotherapy is up to 70%, it is well recognized that TNBC cells usually generate an evolutionary adaptive response that can result in acquired drug-resistant and multi-drug-resistant phenotypes. To date, numerous different mechanisms of acquired chemo-resistance have been reported, but the vast majority of these results have been derived from continuous-high-dose-exposure acquired resistant cell line models. Since the chemo-treatment dosage in these artificial models is well above what is physiologically achievable in patients, few of them can mimic the actual situation of resistance development or improve clinical trial outcomes. Moreover, most of these studies only characterized the terminal resistant cells, which are difficult to resensitize because of their dominant genetic mutations.
In this study, we hypothesize that the TNBC chemo-resistant cells may derive from the early-stage reversible chemo-tolerant “DTP-like” (CTP) cells, and early-stage epigenetic landscape perturbation might determine the progression of chemo-resistance development. To test the hypothesis and overcome the previous model limitations, based on the clinical drug exposure kinetics for doxorubicin, we developed an in vitro “pulsing-treatment CTPs regrowth” model (referred to as CTP model), which could mimic the clinical treatment and provide therapeutically relevant insights into the initial drug-induced stress response and resistance development. Leveraging this CTP model, we were able to define the early event for drug response, in which the doxorubicin-treated cells showed a senescence-like phenotype, and the interferon alpha (type I) pathway was activated. Furthermore, we unexpectedly found that HERV expression was significantly activated, whereas LINE1 expression was not. To further explore TE reactivation, we performed single-cell RNA-seq on samples collected at 0 h, 2 h, and 4 days. With a novel bioinformatic workflow, we integrated TE expression information with coding-gene mRNA profiles from the same single-cell RNA-seq dataset and found that the IFN-enriched cluster had higher expression of HERVs. Herein, a subpopulation of HERVhigh cells with IFN activation was identified as a “hot-cluster” which might be the early determinant in the resistance evolution.
-
Erwan Pernet*, Sarah Sun, Nicole Sarden, Saideep Gona, Angela Nguyen, Nargis Khan, Martin Mawhinney, Kim A Tran, Julia Chronopoulos, Dnyandeo Amberkar, Mina Sadeghi, Alexandre Grant, Shradha Wali, Renaud Prevel, Jun Ding, James G Martin, Ajitha Thanabalasuriar, Bryan G Yipp, Luis B Barreiro & Maziar Divangahi*.
Neonatal imprinting of alveolar macrophages via neutrophil-derived 12-HETE. Nature. 2023 Feb; 614(7948)
Abstract:
Resident-tissue macrophages (RTMs) arise from embryonic precursors[1,2], yet the developmental signals that shape their longevity remain largely unknown. Here we demonstrate in mice genetically deficient in 12-lipoxygenase and 15-lipoxygenase (Alox15−/− mice) that neonatal neutrophil-derived 12-HETE is required for self-renewal and maintenance of alveolar macrophages (AMs) during lung development. Although the seeding and differentiation of AM progenitors remained intact, the absence of 12-HETE led to a significant reduction in AMs in adult lungs and enhanced senescence owing to increased prostaglandin E2 production. A compromised AM compartment resulted in increased susceptibility to acute lung injury induced by lipopolysaccharide and to pulmonary infections with influenza A virus or SARS-CoV-2. Our results highlight the complexity of prenatal RTM programming and reveal their dependency on in trans eicosanoid production by neutrophils for lifelong self-renewal.
-
Yiwei Xiong, Jingtao Wang, Xiaoxiao Shang, Tingting Chen, Gregory Fonseca, Simon Rousseau & Jun Ding.
RAMEN identifies effective indicators for severe COVID and Long COVID patients. bioRxiv. 2023 Jan 25
Abstract:
The outbreak of the COVID-19 pandemic caused catastrophic socioeconomic consequences and fundamentally reshaped the lives of billions across the globe. Our current understanding of the relationships between clinical variables (demographics, symptoms, follow-up symptoms, comorbidities, treatments, lab results, complications, and other clinical measurements) and COVID-19 outcomes remains obscure. Various computational approaches have been employed to elucidate the relationships between different COVID-19 clinical variables and their contributions to the disease outcomes. However, it is often challenging to capture the indirect relationships, as well as the direction of those relationships, with the conventional pairwise correlation methods. Graphical models (e.g., Bayesian networks) can address these limitations but are computationally expensive, which substantially limits their applications in reconstructing relationship networks of numerous clinical variables. In this study, we have developed a method named RAMEN, which employs a genetic algorithm and random walks to infer the Bayesian relationship network between clinical variables. We applied RAMEN to a comprehensive COVID-19 dataset, Biobanque Québécoise de la COVID-19 (BQC19). Most of the clinical variables in our reconstructed Bayesian network associated with COVID-19 severity, or long COVID, are supported by existing literature. We further computationally verified the effectiveness of the RAMEN method with statistical examinations of the multi-omics measurements (clinical variables, RNA-seq, and SomaScan) of the BQC19 data and simulations. The accurate inference of the relationships between clinical variables and disease outcomes powered by RAMEN will significantly advance the development of effective and early diagnostics of severe COVID-19 and long COVID, which can help save millions of lives.
2022
-
Dongshunyi Li, Jeremy J. Velazquez, Jun Ding, Joshua Hislop, Mo R. Ebrahimkhani & Ziv Bar-Joseph.
TraSig: inferring cell-cell interactions from pseudotime ordering of scRNA-Seq data. Genome Biol. 2022 Dec; 23(73)
Abstract:
A major advantage of single cell RNA-sequencing (scRNA-Seq) data is the ability to reconstruct continuous ordering and trajectories for cells. Here we present TraSig, a computational method for improving the inference of cell-cell interactions in scRNA-Seq studies that utilizes the dynamic information to identify significant ligand-receptor pairs with similar trajectories, which in turn are used to score interacting cell clusters. We applied TraSig to several scRNA-Seq datasets and obtained unique predictions that improve upon those identified by prior methods. Functional experiments validate the ability of TraSig to identify novel signaling interactions that impact vascular development in liver organoids.
-
Dongshunyi Li, Jun Ding, Ziv Bar-Joseph.
Unsupervised cell functional annotation for single-cell RNA-Seq. Springer, Cham. 2022; 13278
Abstract:
One of the first steps in the analysis of single cell RNA-Sequencing data (scRNA-Seq) is the assignment of cell types.
-
AtLee TD Watson, Aldo Carmona Baez, Dereje Jima, David Reif, Jun Ding, Reade Roberts, Seth W Kullman.
TCDD alters essential transcriptional regulators of osteogenic differentiation in multipotent mesenchymal stem cells. Toxicological Sciences. 2022 Nov 12; kfac120
Abstract:
Differentiation of multipotent mesenchymal stem cells (MSCs) into bone-forming osteoblasts requires strict coordination of transcriptional pathways. Aryl hydrocarbon receptor ligands, such as 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), have been shown to alter osteoblast differentiation in vitro and bone formation in multiple developmental in vivo models. The goal of the present study was to establish a global transcriptomic landscape during early, intermediate, and apical stages of osteogenic differentiation in vitro in response to TCDD exposure. Human bone-derived mesenchymal stem cells (hBMSCs) were cultured in growth media (GM), osteogenic differentiation media (ODM), or ODM containing 10 nM TCDD (ODM + TCDD), thus enabling a comparison of the transcriptomic profiles of undifferentiated, differentiated, and differentiated-TCDD-exposed hBMSCs, respectively. In this test system, exposure to TCDD attenuated the differentiation of hBMSCs into osteoblasts as evidenced by reduced alkaline phosphatase activity and mineralization. At various timepoints, we observed altered expression of genes that play a role in the Wnt, fibroblast growth factor, and bone morphogenetic protein/transforming growth factor beta developmental pathways, as well as pathways related to extracellular matrix organization and deposition. Reconstruction of gene regulatory networks with the interactive dynamic regulatory event miner (iDREM) analysis revealed modulation of transcription factors (TFs) including POLR3G, NR4A1, RDBP, GTF2B, POU2F2, and ZEB1, which may putatively influence osteoblast differentiation and the requisite deposition and mineralization of bone extracellular matrix. We demonstrate that combining RNA-Seq data with the iDREM regulatory model captures the transcriptional dynamics underlying MSC differentiation under different conditions in vitro. Model predictions are consistent with existing knowledge and provide a new tool to identify novel pathways and TFs that may facilitate a better understanding of the osteoblast differentiation process, perturbation by exogenous agents, and potential intervention strategies targeting those specific pathways.
-
Jun Ding, Nadav Sharon, Ziv Bar-Joseph.
Temporal modelling using single-cell transcriptomics. Nature Reviews Genetics. 2022 June; 23:355–368
Abstract:
Methods for profiling genes at the single-cell level have revolutionized our ability to study several biological processes and systems including development, differentiation, response programmes and disease progression. In many of these studies, cells are profiled over time in order to infer dynamic changes in cell states and types, sets of expressed genes, active pathways and key regulators. However, time-series single-cell RNA sequencing (scRNA-seq) also raises several new analysis and modelling issues. These issues range from determining when and how deep to profile cells, linking cells within and between time points, learning continuous trajectories, and integrating bulk and single-cell data for reconstructing models of dynamic networks. In this Review, we discuss several approaches for the analysis and modelling of time-series scRNA-seq, highlighting their steps, key assumptions, and the types of data and biological questions they are most appropriate for.
-
JC Schupp, G Clair, T Adams, JE Mcdonough, J Flint, N Kothapalli, L De Sadeleer, A Justet, J Melchior, V Paurus, H Olson, G Deluliis, F Ahangari, Z Bar-Joseph, X Yan, Jun Ding, WA Wuyts, B Vanaudenaerde, N Kaminski.
Single Nuclei RNA Sequencing of Differentially Affected Regions in IPF Lungs Suggests a Central Role of Aberrant Basaloid Cells in Disease Progression. American Thoracic Society. 2022 May
Abstract:
The understanding of the emergence and progression of idiopathic pulmonary fibrosis (IPF) in the human lung is lacking. Recent advances by scRNAseq have identified the depth of aberrant and ectopic cell populations but did not address tissue heterogeneity. Utilizing single nuclei RNA sequencing (snRNAseq) and microCT imaging we aimed to characterize differentially affected regions.
-
Jun Ding, Y Zheng, JC Schupp, T Adams, F Ahangari, X Yan, P Hasen, Z Bar-Joseph, L De Sadeleer, JE Mcdonough, BM Vanaudenaerde, WA Wuyts, N Kaminski.
A Probabilistic Graphical Model for Understanding Cellular Dynamics in Idiopathic Pulmonary Fibrosis Progression. American Thoracic Society. 2022 May
Abstract:
Increasing evidence suggests that different cell populations (cell types or subtypes) in the IPF lung tissue have varying impacts on the disease progression. However, the role of each cell population in driving IPF progression remains poorly understood. Here, we develop a probabilistic graphical model that reconstructs the cellular dynamics for each cell population along IPF progression.
-
Euxhen Hasanaj, Jingtao Wang, Arjun Sarathi, Jun Ding, Ziv Bar-Joseph.
Interactive single-cell data analysis using Cellar. Nature Communications. 2022 Apr 14; 13(1998)
Abstract:
Cell type assignment is a major challenge for all types of high throughput single cell data. In many cases such assignment requires the repeated manual use of external and complementary data sources. To improve the ability to uniformly assign cell types across large consortia, platforms and modalities, we developed Cellar, a software tool that provides interactive support to all the different steps involved in the assignment and dataset comparison process. We discuss the different methods implemented by Cellar, how these can be used with different data types, how to combine complementary data types and how to analyze and visualize spatial data. We demonstrate the advantages of Cellar by using it to annotate several HuBMAP datasets from multi-omics single-cell sequencing and spatial proteomics studies. Cellar is open-source and includes several annotated HuBMAP datasets.
-
Salyan Bhattarai, Qian Li, Jun Ding, Feng Liang, Ekaterina Gusev, Orsolya Lapohos, Gregory J Fonseca, Eva Kaufmann, Maziar Divangahi, Basil J Petrof.
TLR4 is a regulator of trained immunity in a murine model of Duchenne muscular dystrophy. Nature Communications. 2022 Feb 15; 13(879)
Abstract:
Dysregulation of the balance between pro-inflammatory and anti-inflammatory macrophages has a key function in the pathogenesis of Duchenne muscular dystrophy (DMD), a fatal genetic disease. We postulate that an evolutionarily ancient protective mechanism against infection, known as trained immunity, drives pathological inflammation in DMD. Here we show that bone marrow-derived macrophages from a murine model of DMD (mdx) exhibit cardinal features of trained immunity, consisting of transcriptional hyperresponsiveness associated with metabolic and epigenetic remodeling. The hyperresponsive phenotype is transmissible by bone marrow transplantation to previously healthy mice and persists for up to 11 weeks post-transplant. Mechanistically, training is induced by muscle extract in vitro. The functional and epigenetic changes in bone marrow-derived macrophages from dystrophic mice are TLR4-dependent. Adoptive transfer experiments further support the TLR4-dependence of trained macrophages homing to damaged muscles from the bone marrow. Collectively, this suggests that a TLR4-regulated, memory-like capacity of innate immunity induced at the level of the bone marrow promotes dysregulated inflammation in DMD.
-
Bowen Zhao, Dong-Qing Wei, Yi Xiong, Jun Ding.
scCobra: Contrastive cell embedding learning with domain-adaptation for single-cell data integration. bioRxiv. 2022 Jan 1
Abstract:
The ever-increasing availability of single-cell transcriptomic data offers unrivaled opportunities to profile cellular states in various biological processes at high resolution, which has brought substantial advancements in understanding complex mechanisms underlying a large variety of bioprocesses. Limited by protocols and technology, single-cell measurements in one study are often performed in batches, which unavoidably introduces biological and technical differences in the single-cell measurements of the same study. Consequently, it is challenging to analyze all single-cell data from different batches together, particularly if the measurements were assayed with different technologies. Several methods have recently been developed to remove these batch effects. However, those existing methods leave challenges unaddressed, including but not limited to the risk of over-correction, the need for assumptions about the gene expression distribution, and expensive computation. To mitigate those limitations, we develop a novel deep learning method called scCobra that combines contrastive learning, domain adaptation, and generative adversarial networks to remove batch effects in single-cell RNA-seq data. The contrastive learning network learns latent embeddings to represent the cells, domain adaptation batch-normalizes the latent embeddings of cells from distinct batches, and generative adversarial networks further optimize the blending effect. The proposed method does not require any prior assumption of gene expression distribution. We applied the scCobra method to one simulated and two real single-cell datasets with significant experimental differences. Our method outperforms other benchmarked methods in batch correction and biological conservation, and its running efficiency is also among the best.
2021
-
Jun Ding, Amir Alavi, Mo R Ebrahimkhani, Ziv Bar-Joseph.
Computational tools for analyzing single-cell data in pluripotent cell differentiation studies. Cell Reports Methods. 2021
Abstract:
Single-cell technologies are revolutionizing the ability of researchers to infer the causes and results of biological processes. Although several studies of pluripotent cell differentiation have recently utilized single-cell sequencing data, other aspects related to the optimization of differentiation protocols, their validation, robustness, and usage are still not taking full advantage of single-cell technologies. In this review, we focus on computational approaches for the analysis of single-cell omics and imaging data and discuss their use to address many of the major challenges involved in the development, validation, and use of cells obtained from pluripotent cell differentiation.
-
Ding J.
A versatile model for single-cell data analysis. Nature Computational Science. 2021 July 22;1:460-461
Abstract:
Making sense of single-cell data requires various computational efforts such as clustering, visualization and gene regulatory network inference, often addressed by different methods. DeepSEM provides an all-in-one solution.
-
Ding J, Hostallero DE, El Khili MR, Fonseca GJ, Milette S, Noorah N, Guay-Belzile M, Spicer J, Daneshtalab N, Sirois M, Tremblay K.
A network-informed analysis of SARS-CoV-2 and hemophagocytic lymphohistiocytosis genes’ interactions points to Neutrophil extracellular traps as mediators of thrombosis in COVID-19. PLoS Computational Biology. 2021 Mar 8;17(3):e1008810.
Abstract:
Abnormal coagulation and an increased risk of thrombosis are features of severe COVID-19, with parallels proposed with hemophagocytic lymphohistiocytosis (HLH), a life-threatening condition associated with hyperinflammation. The presence of HLH was described in severely ill patients during the H1N1 influenza epidemic, presenting with pulmonary vascular thrombosis. We tested the hypothesis that genes causing primary HLH regulate pathways linking pulmonary thromboembolism to the presence of SARS-CoV-2 using novel network-informed computational algorithms. This approach led to the identification of neutrophil extracellular traps (NETs) as plausible mediators of vascular thrombosis in severe COVID-19 in children and adults. Taken together, the network-informed analysis led us to propose the following model: NETs released in response to inflammatory signals, acting in concert with SARS-CoV-2, damage the endothelium and direct platelet activation, promoting abnormal coagulation that leads to serious complications of COVID-19. The underlying hypothesis is that genetic and/or environmental conditions that favor the release of NETs may predispose individuals to thrombotic complications of COVID-19 due to an increased risk of abnormal coagulation. This would be a common pathogenic mechanism in conditions including autoimmune/infectious diseases, hematologic and metabolic disorders.
link
-
Guerrina N, Traboulsi H, Souza A, Bossé Y, Thatcher TH, Robichaud A, Ding J, Li P, Simon L, Pareek S, Bourbeau J, Tan WC, Benedetti A, Obeidat M, Sin DD, Brandsma C, Nickle DC, Sime PJ, Phipps RP, Nair P, Zago M, Hamid Q, Smith BM, Eidelman DH, Baglole CJ.
Aryl hydrocarbon receptor deficiency causes the development of chronic obstructive pulmonary disease through the integration of multiple pathogenic mechanisms. FASEB Journal: official publication of the Federation of American Societies for Experimental Biology. 2021; 35(3): e21376.
Abstract:
Emphysema, a component of chronic obstructive pulmonary disease (COPD), is characterized by irreversible alveolar destruction that results in a progressive decline in lung function. This alveolar destruction is caused by cigarette smoke, the most important risk factor for COPD. Only 15%-20% of smokers develop COPD, suggesting that unknown factors contribute to disease pathogenesis. We postulate that the aryl hydrocarbon receptor (AHR), a receptor/transcription factor highly expressed in the lungs, may be a new susceptibility factor whose expression protects against COPD. Here, we report that Ahr-deficient mice chronically exposed to cigarette smoke develop airspace enlargement concomitant with a decline in lung function. Chronic cigarette smoke exposure also increased cleaved caspase-3, lowered SOD2 expression, and altered MMP9 and TIMP-1 levels in Ahr-deficient mice. We also show that people with COPD have reduced expression of pulmonary and systemic AHR, with systemic AHR mRNA levels positively correlating with lung function. Systemic AHR was also lower in never-smokers with COPD. Thus, AHR expression protects against the development of COPD by controlling interrelated mechanisms involved in the pathogenesis of this disease. This study identifies the AHR as a new, central player in the homeostatic maintenance of lung health, providing a foundation for the AHR as a novel therapeutic target and/or predictive biomarker in chronic lung disease.
link
-
Aloufi N, Traboulsi H, Ding J, Fonseca GJ, Nair P, Huang SK, Hussain SN, Eidelman DH, Baglole CJ.
Angiotensin-converting enzyme 2 expression in COPD and IPF fibroblasts: the forgotten cell in COVID-19. American Journal of Physiology-Lung Cellular and Molecular Physiology. 2021 Jan 1;320(1):L152-7.
Abstract:
The COVID-19 pandemic is associated with severe pneumonia and acute respiratory distress syndrome leading to death in susceptible individuals. For those who recover, post-COVID-19 complications may include development of pulmonary fibrosis. Factors contributing to disease severity or development of complications are not known. Using computational analysis with experimental data, we report that idiopathic pulmonary fibrosis (IPF)- and chronic obstructive pulmonary disease (COPD)-derived lung fibroblasts express higher levels of angiotensin-converting enzyme 2 (ACE2), the receptor for SARS-CoV-2 entry and part of the renin-angiotensin system that is antifibrotic and anti-inflammatory. In preclinical models, we found that chronic exposure to cigarette smoke, a risk factor for both COPD and IPF and potentially for SARS-CoV-2 infection, significantly increased pulmonary ACE2 protein expression. Further studies are needed to understand the functional implications of ACE2 on lung fibroblasts, a cell type that thus far has received relatively little attention in the context of COVID-19.
link
-
Li D, Ding J, Bar-Joseph Z.
Identifying signaling genes in spatial single cell expression data. Bioinformatics, 2021; 37(7), 968-975.
Abstract:
Motivation
Recent technological advances enable the profiling of spatial single-cell expression data. Such data present a unique opportunity to study cell–cell interactions and the signaling genes that mediate them. However, most current methods for the analysis of these data focus on unsupervised descriptive modeling, making it hard to identify key signaling genes and quantitatively assess their impact.
Results
We developed a Mixture of Experts for Spatial Signaling genes Identification (MESSI) method to identify active signaling genes within and between cells. The mixture of experts strategy enables MESSI to subdivide cells into subtypes. MESSI relies on multi-task learning using information from neighboring cells to improve the prediction of response genes within a cell. Applying the method to three spatial single-cell expression datasets, we show that MESSI accurately predicts the levels of response genes, improves upon prior methods, and provides useful biological insights about key signaling genes and subtypes of excitatory neuron cells.
link
2020
-
Ding J, Bar-Joseph Z.
Analysis of time series regulatory networks. Current Opinion in Systems Biology. 2020 Jun; 21:16-24.
Abstract:
The vast majority of biological processes are dynamic, changing over time. Several studies profile high-throughput time-series data and use it for analyzing and modeling various biological processes. In this review, we focus on data, methods, and analysis for reconstructing dynamic regulatory network models from high-throughput time-series data sets. We discuss methods focused on a single data type, methods that integrate several omics data types, methods that integrate static and time-series data, and methods that focus on single-cell data. For each of these categories, we present some of the top methods and discuss their underlying assumptions, advantages, and potential shortcomings. As the quantity and types of time-series omics data continue to increase, we expect that these methods, and additional methods extending and improving them, would play an increasingly important role in our ability to accurately model biological processes.
link
-
Lin C, Ding J, Bar-Joseph Z.
Inferring TF activation order in time series scRNA-Seq studies. PLoS Computational Biology. 2020 Feb 18;16(2):e1007644.
Abstract:
Methods for the analysis of time series single-cell expression data (scRNA-Seq) either do not utilize information about transcription factors (TFs) and their targets or only study these as a post-processing step. Using such information can both improve the accuracy of the reconstructed model and cell assignments and provide information on how and when the process is regulated.
We developed the Continuous-State Hidden Markov Models TF (CSHMM-TF) method, which integrates probabilistic modeling of scRNA-Seq data with the ability to assign TFs to specific activation points in the model. TFs are assumed to influence the emission probabilities for cells assigned to later time points, allowing us to identify not just the TFs controlling each path but also their order of activation. We tested CSHMM-TF on several mouse and human datasets. As we show, the method was able to identify known and novel TFs for all processes, the assigned activation times agree with both expression information and prior knowledge, and combinatorial predictions are supported by known interactions.
We also show that CSHMM-TF improves upon prior methods that do not utilize TF-gene interactions.
link
-
Hurley K, Ding J (co-first), Villacorta-Martin C, Herriges MJ, Jacob A, Vedaie M, Alysandratos KD, Sun YL, Lin C, Werder RB, Huang J, ..., Bar-Joseph Z, Kotton DN.
Reconstructed single-cell fate trajectories define lineage plasticity windows during differentiation of human PSC-derived distal lung progenitors.
Cell Stem Cell. 2020 Jan 30.
Abstract:
Alveolar epithelial type 2 cells (AEC2s) are the facultative progenitors responsible for maintaining lung alveoli throughout life but are difficult to isolate from patients. Here, we engineer AEC2s from human pluripotent stem cells (PSCs) in vitro and use time-series single-cell RNA sequencing with lentiviral barcoding to profile the kinetics of their differentiation in comparison to primary fetal and adult AEC2 benchmarks. We observe bifurcating cell-fate trajectories as primordial lung progenitors differentiate in vitro, with some progeny reaching their AEC2 fate target, while others diverge to alternative non-lung endodermal fates. We develop a Continuous State Hidden Markov model to identify the timing and type of signals, such as overexuberant Wnt responses, that induce some early multipotent NKX2-1+ progenitors to lose lung fate. Finally, we find that this initial developmental plasticity is regulatable and subsides over time, ultimately resulting in PSC-derived AEC2s that exhibit a stable phenotype and nearly limitless self-renewal capacity.
link
2019
-
McDonough JE, Ahangari F, ..., Ding J, Maes K, Sadeleer LD, Vos R, Neyrinck A, Benos PV, Bar-Joseph Z, Tantin D, Hogg JC, Vanaudenaerde BM, Wuyts WA, Kaminski N.
Transcriptional regulatory model of fibrosis progression in the human lung. JCI Insight. 2019 Nov 14;4(22).
Abstract:
To develop a systems biology model of fibrosis progression within the human lung we performed RNA sequencing and microRNA analysis on 95 samples
obtained from 10 idiopathic pulmonary fibrosis (IPF) and 6 control lungs. Extent of fibrosis in each sample was assessed by microCT-measured alveolar
surface density (ASD) and confirmed by histology. Regulatory gene expression networks were identified using linear mixed-effect models and the Dynamic Regulatory
Events Miner (DREM). Differential gene expression analysis identified a core set of genes increased or decreased before fibrosis was histologically evident
that continued to change with advanced fibrosis. DREM generated a systems biology model (www.sb.cs.cmu.edu/IPFReg) that identified progressively divergent
gene expression tracks with microRNAs and transcription factors that specifically regulate mild or advanced fibrosis.
We confirmed model predictions by demonstrating that expression of POU2AF1, previously unassociated with lung fibrosis but proposed
by the model as a regulator, is increased in B lymphocytes in IPF lungs and that POU2AF1-knockout mice were protected from bleomycin-induced lung fibrosis.
Our results reveal distinct regulation of gene expression changes in IPF tissue that remained structurally normal compared with moderate or advanced fibrosis
and suggest distinct regulatory mechanisms for each stage.
link
-
Ding J, Ahangari F, Espinoza CR, Chhabra D, Nicola T, Yan X, Lal CV, Hagood JS, Kaminski N, Bar-Joseph Z, Ambalavanan N.
Integrating multiomics longitudinal data to reconstruct networks underlying lung development.
American Journal of Physiology-Lung Cellular and Molecular Physiology. 2019 Nov 1;317(5):L556-68.
Abstract:
A comprehensive understanding of the dynamic regulatory networks that govern postnatal alveolar lung development is still lacking.
To construct such a model, we profiled mRNA, microRNA, DNA methylation, and proteomics of developing murine alveoli isolated by
laser capture microdissection at 14 predetermined time points.
We developed a detailed, comprehensive and interactive model that provides information about the major expression trajectories,
the regulators of specific key events, and the impact of epigenetic changes. Intersecting the model with single-cell RNA-Seq data led
to the identification of active pathways in multiple or individual cell types. We then constructed a similar model for human lung development
by profiling time-series human omics data sets. Several key pathways and regulators are shared between the reconstructed models.
We experimentally validated the activity of a number of predicted regulators, leading to new insights about the regulation of innate immunity during lung development.
link
-
Liu H, Zhang CH, Ammanamanchi N, Suresh S, ..., Ding J, Bar-Joseph Z, Wu Y, Yechoor V, Moulik M, Johnson J, Weinberg J, Reyes-Mugica M, Steinhauser ML, Kuhn B.
Control of cytokinesis by beta-adrenergic receptors indicates an approach for regulating cardiomyocyte endowment.
Sci Transl Med. 2019; 11(513).
Abstract:
One million patients with congenital heart disease (CHD) live in the United States.
They have a lifelong risk of developing heart failure. Current concepts do not sufficiently address mechanisms of heart failure development specifically for these patients.
Here, analysis of heart tissue from an infant with tetralogy of Fallot with pulmonary stenosis (ToF/PS) labeled with isotope-tagged thymidine demonstrated that
cardiomyocyte cytokinesis failure is increased in this common form of CHD.
We used single-cell transcriptional profiling to discover that the underlying mechanism of cytokinesis failure is repression of the cytokinesis gene ECT2,
downstream of beta-adrenergic receptors (beta-ARs). Inactivation of the beta-AR genes and administration of the beta-blocker propranolol increased cardiomyocyte division in neonatal mice, which increased the number of cardiomyocytes (endowment) and conferred benefit after myocardial infarction in adults. Propranolol enabled the division of ToF/PS cardiomyocytes in vitro.
These results suggest that beta-blockers could be evaluated for increasing cardiomyocyte division in patients with ToF/PS and other types of CHD.
link
- Ding J, Lin C, Bar-Joseph Z.
Cell lineage inference from SNP and scRNA-Seq data.
Nucleic Acids Research. 2019; 47(10): e56.
Abstract:
Several recent studies focus on the inference of developmental and response trajectories from single cell RNA-Seq (scRNA-Seq) data.
A number of computational methods, often referred to as pseudo-time ordering, have been developed for this task.
Recently, CRISPR has also been used to reconstruct lineage trees by inserting random mutations.
However, both approaches suffer from drawbacks that limit their use. Here we develop a method to detect significant, cell-type-specific sequence mutations from scRNA-Seq data. We show that only a few mutations are enough for reconstructing good branching models. Integrating these mutations with expression data further improves the accuracy of the reconstructed models. As we show, the majority of mutations we identify are likely RNA editing events, indicating that such information can be used to distinguish cell types.
link
2018
- Friedman C, Nguyen Q, Lukowski S, ..., Ding J, Wang Y, Hudson J, Ruohola-Baker H, Bar-Joseph Z, Tam P, Powell J, Palpant N.
Single-Cell Transcriptomic Analysis of Cardiac Differentiation from Human PSCs Reveals HOPX-Dependent Cardiomyocyte Maturation.
Cell Stem Cell. 2018; 23(4):586-598
Abstract:
Cardiac differentiation of human pluripotent stem cells (hPSCs) requires orchestration of dynamic gene regulatory
networks during stepwise fate transitions but often generates immature cell types that do not fully recapitulate
properties of their adult counterparts, suggesting incomplete activation of key transcriptional networks.
We performed extensive single-cell transcriptomic analyses to map fate choices and gene expression programs during
cardiac differentiation of hPSCs and identified strategies to improve in vitro cardiomyocyte differentiation.
Utilizing genetic gain- and loss-of-function approaches, we found that hypertrophic signaling is not effectively
activated during monolayer-based cardiac differentiation, thereby preventing expression of HOPX and its activation of
downstream genes that govern late stages of cardiomyocyte maturation. This study therefore provides a key
transcriptional roadmap of in vitro cardiac differentiation at single-cell resolution, revealing fundamental mechanisms underlying
heart development and differentiation of hPSC-derived cardiomyocytes.
link
-
Nguyen QH, Lukowski SW, Chiu HS, Friedman CE, Senabouth A, Crowhurst L, Bruxner TJ, Christ AN, Hudson J, Ding J, Bar-Joseph Z, Tam PP, Palpant NJ, Powell JE.
Genetic networks modulating cell fate specification and contributing to cardiac disease risk in hiPSC-derived cardiomyocytes at single cell resolution.
Human Genomics. 2018 Mar 9;12.
Abstract: With the huge amount of genome-wide mutational data generated by cancer genomic sequencing studies, distinguishing cancer drivers from the vast majority of passengers is important. Existing cancer driver prediction methods capture specific mutational aspects in discriminating potential cancer drivers. We explored an alternative way of doing this task.
We noted that mutational parameters (functional mutation ratio, mutation frequency and sample mutation recurrence) vary differently among mutant genes of different sizes. This led us to develop a novel algorithm, Mutant Gene Ranker (MuGeR), which compares multiple mutational parameters of a target gene against the corresponding background derived from a specific subset of genes using a sliding window approach, to estimate the likelihood that the target gene is a potential cancer driver. We applied our MuGeR algorithm to The Cancer Genome Atlas (TCGA) datasets.
Empirical data on the TCGA datasets and comparison with the prioritization results of 4 other existing tools (MuSiC, MuSig, TUSON explorer and DOTS-Finder) suggested satisfactory performance of our MuGeR algorithm. More importantly, we demonstrated the existence of specific patterns of mutational parameters across cancers. Empirical data verified the usefulness of our MuGeR algorithm in identifying potential cancer drivers. Moreover, our in-depth appraisal of TCGA liver hepatocellular carcinoma datasets further highlighted the frequent mutational dysregulation of ubiquitin-related proteasomal degradation in driving hepatocarcinogenesis.
link
- Ding J, Aronow B, Kaminski N, Kitzmiller J, Whitsett J, Bar-Joseph Z.
Reconstructing differentiation networks and their regulation from time series single cell expression data.
Genome Research. 2018 Jan 9:gr-225979.
Abstract: Generating detailed and accurate organogenesis models using single cell RNA-seq data remains a major challenge.
Current methods have relied primarily on the assumption that descendant cells are similar to their parents in terms of gene expression levels.
These assumptions do not always hold for in vivo studies, which often include infrequently sampled, unsynchronized and diverse cell populations.
Thus, additional information may be needed to determine the correct ordering and branching of progenitor cells and the set of transcription factors (TFs)
that are active during advancing stages of organogenesis.
To enable such modeling we have developed a method that learns a probabilistic model which integrates expression similarity with
regulatory information to reconstruct the dynamic developmental cell trajectories. When applied to mouse lung developmental data the method
accurately distinguished different cell types and lineages. Existing and new experimental data validated the ability of the method to identify key regulators of cell fate.
link
- Ding J, Hagood JS, Ambalavanan N, Kaminski N, Bar-Joseph Z.
iDREM: Interactive visualization of dynamic regulatory networks.
PLoS Computational Biology. 2018 Mar 14;14(3):e1006019.
Abstract: The Dynamic Regulatory Events Miner (DREM) software reconstructs dynamic regulatory networks
by integrating static protein-DNA interaction data with time series gene expression data.
In recent years, several additional types of high-throughput time series data have been profiled when studying biological processes
including time series miRNA expression, proteomics, epigenomics and single cell RNA-Seq.
Combining all available time series and static datasets in a unified model remains an important challenge and goal.
To address this challenge we have developed a new version of DREM termed interactive DREM (iDREM).
iDREM provides support for all data types mentioned above and combines them with existing interaction data to
reconstruct networks that can lead to novel hypotheses on the function and timing of regulators.
Users can interactively visualize and query the resulting model. We showcase the functionality of the new tool by applying it to microglia developmental data from multiple labs.
link
2017 and earlier
- Ding J, Bar-Joseph Z.
MethRaFo: MeDIP-seq methylation estimate using a Random Forest Regressor.
Bioinformatics. 2017 Jul 13;33(21):3477-9.
Abstract:
Profiling of genome wide DNA methylation is now routinely performed when studying development, cancer and several other biological processes.
Although whole-genome bisulfite sequencing provides high-quality methylation measurements at single-nucleotide resolution,
it is relatively costly and so several studies have used alternative methods for such profiling. One of the most widely used low cost alternatives is MeDIP-Seq.
However, MeDIP-Seq is biased toward CpG-enriched regions, and thus its results need to be corrected in order to determine accurate methylation levels.
Here we present a method for correcting MeDIP-Seq results based on Random Forest regression.
Applying the method to real data from several different tissues (brain, cortex, penis), we show that it achieves an almost 4-fold decrease in run time while
increasing accuracy by as much as 20% over prior methods developed for this task.
link
- Ding J, Li X, Hu H.
CCmiR: a computational approach for competitive and cooperative microRNA binding prediction.
Bioinformatics. 2017 Sep 25;34(2):198-206.
Abstract:
The identification of microRNA (miRNA) target sites is important. In the past decade, dozens of computational methods have been developed to predict miRNA target sites.
Despite their existence, rarely does a method consider the well-known competition and cooperation among miRNAs when attempting to discover target sites.
To fill this gap, we developed a new approach called CCmiR, which takes the cooperation and competition of multiple miRNAs into account in a statistical model to predict their target sites.
Tested on four different datasets, CCmiR predicted miRNA target sites with a high recall and a reasonable precision, and identified known and new cooperative and competitive miRNAs supported by literature.
Compared with three state-of-the-art computational methods, CCmiR had a higher recall and a higher precision.
link
- Roqueta-Rivera M, Esquejo RM, Phelan PE, Sandor K, Daniel B, Foufelle F, Ding J, Li X, Khorasanizadeh S, Osborne TF.
SETDB2 links glucocorticoid to lipid metabolism through Insig2a regulation.
Cell Metabolism. 2016 Sep 13;24(3):474-84.
Abstract:
Transcriptional and chromatin regulations mediate the liver response to nutrient availability.
The role of chromatin factors involved in hormonal regulation in response to fasting is not fully understood.
We have identified SETDB2, a glucocorticoid-induced putative epigenetic modifier, as a positive regulator of GR-mediated gene activation in liver.
Insig2a increases during fasting to limit lipid synthesis, but the mechanism of induction is unknown. We show Insig2a induction is GR-SETDB2 dependent.
SETDB2 facilitates GR chromatin enrichment and is key to glucocorticoid-dependent enhancer-promoter interactions.
INSIG2 is a negative regulator of SREBP, and acute glucocorticoid treatment decreased active SREBP during refeeding or in livers of ob/ob mice,
both systems of elevated SREBP-1c-driven lipogenesis. Knockdown of SETDB2 or INSIG2 reversed the inhibition of SREBP processing.
Overall, these studies identify a GR-SETDB2 regulatory axis of hepatic transcriptional reprogramming and identify SETDB2 as a potential target for metabolic disorders with aberrant glucocorticoid actions.
link
- Ding J, Li X, Hu H.
TarPmiR: a new approach for microRNA target site prediction.
Bioinformatics. 2016 May 20;32(18):2768-75.
Abstract:
The identification of microRNA (miRNA) target sites is fundamentally important for studying gene regulation.
There are dozens of computational methods available for miRNA target site prediction.
Despite their existence, we still cannot reliably identify miRNA target sites, partially due to our limited understanding of the characteristics of miRNA target sites.
The recently published CLASH (crosslinking ligation and sequencing of hybrids) data provide an unprecedented opportunity to study the characteristics of miRNA target sites and improve miRNA target site prediction methods.
Applying four different machine learning approaches to the CLASH data, we identified seven new features of miRNA target sites.
Combining these new features with those commonly used by existing miRNA target prediction algorithms,
we developed an approach called TarPmiR for miRNA target site prediction. Testing on two human and one mouse non-CLASH datasets,
we showed that TarPmiR predicted more than 74.2% of true miRNA target sites in each dataset. Compared with three existing approaches,
we demonstrated that TarPmiR is superior to these existing approaches in both recall and precision.
link
- Ding J, Li X, Hu H.
MicroRNA modules prefer to bind weak and unconventional target sites.
Bioinformatics. 2014; doi: 10.1093/bioinformatics/btu833.
Abstract:
MicroRNAs (miRNAs) play critical roles in gene regulation. Although it is well known that multiple miRNAs may work as miRNA modules to synergistically regulate common target mRNAs,
the understanding of miRNA modules is still in its infancy. We employed recently generated high-throughput experimental data to study miRNA modules.
We predicted 181 miRNA modules and 306 potential miRNA modules. We observed that the target sites of these predicted modules were in general weaker compared with those not bound by miRNA modules.
We also discovered that miRNAs in predicted modules preferred to bind unconventional target sites rather than canonical sites.
Surprisingly, contrary to a previous study, we found that most adjacent miRNA target sites from the same miRNA modules were not within the range of 10-130 nucleotides.
Interestingly, the distance of target sites bound by miRNAs in the same modules was shorter when miRNA modules bound unconventional instead of canonical sites.
Our study sheds new light on miRNA binding and miRNA target sites, which will likely advance our understanding of miRNA regulation.
link
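The distance criterion in the abstract above (whether adjacent target sites from the same module lie 10 to 130 nucleotides apart) is easy to make concrete. The Python sketch below is a hypothetical helper, not the authors' code: it assumes each site is represented by its start coordinate on one mRNA and that distance means the difference between consecutive start coordinates; the paper may define distance differently, for example as the gap between the end of one site and the start of the next.

```python
from typing import List, Optional


def adjacent_site_distances(site_starts: List[int]) -> List[int]:
    """Distances (nt) between consecutive target-site start positions on one mRNA."""
    ordered = sorted(site_starts)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def fraction_in_range(site_starts: List[int], low: int = 10, high: int = 130) -> Optional[float]:
    """Fraction of adjacent site pairs whose distance lies in [low, high]; None if fewer than 2 sites."""
    dists = adjacent_site_distances(site_starts)
    if not dists:
        return None
    return sum(low <= d <= high for d in dists) / len(dists)


# Three hypothetical sites: the adjacent distances are 20 and 180 nt, so one of the two pairs is in range.
print(adjacent_site_distances([300, 100, 120]))  # [20, 180]
print(fraction_in_range([300, 100, 120]))  # 0.5
```

Both bounds are treated as inclusive here; that choice is also an assumption.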
- Ding J, Dhillon V, Li X, Hu H.
Systematic discovery of cofactor motifs from ChIP-seq data by SIOMICS.
Methods. 2014; doi:10.1016/j.ymeth.2014.08.006
Abstract:
Understanding transcriptional regulatory elements and particularly the transcription factor binding sites represents a significant challenge in computational biology.
The chromatin immunoprecipitation followed by massive parallel sequencing (ChIP-seq) experiments provide an unprecedented opportunity to study transcription factor binding sites on the genome-wide scale.
Here we describe a recently developed tool, SIOMICS, to systematically discover motifs and binding sites of transcription factors and their cofactors from ChIP-seq data. Unlike other tools,
SIOMICS explores the co-binding properties of multiple transcription factors in short regions to predict motifs and binding sites.
We have previously shown that the original SIOMICS method predicts motifs and binding sites of more cofactors in more accurate and time-effective ways than two popular methods.
In this paper, we present the extended SIOMICS method, SIOMICS_Extension, and demonstrate its usage for systematic discovery of cofactor motifs and binding sites.
The SIOMICS tool, including SIOMICS and SIOMICS_Extension, is available at http://hulab.ucf.edu/research/projects/SIOMICS/SIOMICS.html.
link
- Ding J, Hu H, Li X.
SIOMICS: a Novel Approach for Systematic Identification of Motifs in ChIP-seq Data.
Nucleic Acids Research. 2014; 42(5): e35.
Abstract:
The identification of transcription factor binding motifs is important for the study of gene transcriptional regulation.
The chromatin immunoprecipitation (ChIP), followed by massive parallel sequencing (ChIP-seq) experiments, provides an unprecedented opportunity to discover binding motifs.
Computational methods have been developed to identify motifs from ChIP-seq data, but they encounter several problems.
For example, existing methods are often not scalable to the large number of sequences obtained from ChIP-seq peak regions.
Some methods heavily rely on well-annotated motifs even though the number of known motifs is limited.
To simplify the problem, de novo motif discovery methods often neglect underrepresented motifs in ChIP-seq peak regions.
To address these issues, we developed a novel approach called SIOMICS to de novo discover motifs from ChIP-seq data.
Tested on 13 ChIP-seq data sets, SIOMICS identified motifs of many known and new cofactors. Tested on 13 simulated random data sets,
SIOMICS discovered no motif in any data set. Compared with two recently developed methods for motif discovery, SIOMICS shows advantages in terms of speed,
the number of known cofactor motifs predicted in experimental data sets and the number of false motifs predicted in random data sets.
The SIOMICS software is freely available at http://eecs.ucf.edu/~xiaoman/SIOMICS/SIOMICS.html.
link
- Ding J, Hu H, Li X.
NIM: a novel computational method for predicting nuclear-encoded chloroplast proteins.
Journal of Medical and Bioengineering. 2013; 2(2): 115-119.
Abstract:
The identification of nuclear-encoded chloroplast
proteins is important for the understanding of their
functions and their interaction in chloroplasts. Despite
various endeavors in predicting these proteins, there is still
room for developing novel computational methods for
further improving the prediction accuracy. Here we
developed a novel computational method called NIM based
on interpolated Markov chains to predict nuclear-encoded
chloroplast proteins. By testing the method on real data, we
show NIM has an average sensitivity larger than 92% and
an average specificity larger than 97%. Compared with the
state-of-the-art methods, we demonstrate that NIM
performs better than, or at least comparably to, them. Our
study thus provides a novel and useful tool for the
prediction of nuclear-encoded chloroplast proteins.
link
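For readers unfamiliar with interpolated Markov chains, the Python sketch below shows the general idea behind a classifier of this kind: one model is trained on positive (for example, chloroplast-targeted) protein sequences and one on negative sequences, each predicting the next amino acid from contexts of length 0 to k, and a sequence is scored by the log-likelihood ratio between the two models. This is an illustration under our own assumptions, not NIM itself: the equal interpolation weights, the Laplace pseudocount, the class and function names, and the toy sequences are all hypothetical, and the published method may weight the orders differently (Glimmer-style interpolated Markov models, for instance, derive weights from context counts).

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


class InterpolatedMarkovChain:
    """Equal-weight interpolation of Markov chains of order 0..max_order (hypothetical sketch)."""

    def __init__(self, max_order: int = 2, pseudocount: float = 1.0):
        self.max_order = max_order
        self.pseudocount = pseudocount
        # counts[order][context][symbol] -> times `symbol` followed `context`
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order + 1)]

    def fit(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                # Only orders whose full context lies inside the sequence are counted.
                for order in range(min(i, self.max_order) + 1):
                    self.counts[order][seq[i - order:i]][sym] += 1
        return self

    def _prob(self, order, ctx, sym):
        # Laplace-smoothed P(sym | ctx) for a single order; .get() avoids inserting empty contexts.
        table = self.counts[order].get(ctx, {})
        total = sum(table.values())
        return (table.get(sym, 0) + self.pseudocount) / (total + self.pseudocount * len(ALPHABET))

    def log_likelihood(self, seq):
        ll = 0.0
        for i, sym in enumerate(seq):
            usable = min(i, self.max_order)
            # Average the per-order probabilities (equal weights are a simplifying assumption).
            p = sum(self._prob(o, seq[i - o:i], sym) for o in range(usable + 1)) / (usable + 1)
            ll += math.log(p)
        return ll


def llr_score(seq, positive, negative):
    """Log-likelihood ratio; a positive value favours the positive model."""
    return positive.log_likelihood(seq) - negative.log_likelihood(seq)


# Toy training sets (invented sequences, not real chloroplast proteins).
pos = InterpolatedMarkovChain().fit(["MASSLLRASA", "MAASLRSSAA", "MASLSRASSA"])
neg = InterpolatedMarkovChain().fit(["MKDEEKWEYG", "MEDKYWEEKD", "MKEWDEYKED"])
print(llr_score("MASSARLSAS", pos, neg) > 0)  # True
```

Equal weights keep the sketch short; a practical model would tune the weights per context so that sparsely observed high-order contexts contribute less.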
- Ding J, Cai X, Wang Y, Hu H, Li X.
ChIPModule: systematic discovery of transcription factors and their cofactors from ChIP-seq data.
Pac Symp Biocomput. 2013.
Abstract:
We have developed a novel approach called ChIPModule to systematically discover transcription factors and their cofactors from ChIP-seq data.
Given a ChIP-seq dataset and the binding patterns of a large number of transcription factors,
ChIPModule can efficiently identify groups of transcription factors,
whose binding sites significantly co-occur in the ChIP-seq peak regions.
By testing ChIPModule on simulated data and experimental data, we have shown that ChIPModule identifies known cofactors of transcription factors,
and predicts new cofactors that are supported by literature. ChIPModule provides a useful tool for studying gene transcriptional regulation.
link
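The ChIPModule abstract centers on finding transcription factors whose binding sites co-occur in ChIP-seq peak regions more often than expected by chance. A one-sided hypergeometric test on pairs of factors is one simple way to pose that question; the Python sketch below is our assumption of how such a test could look, not ChIPModule's actual statistic. Its input format (one set of TF names with predicted sites per peak), the function names, and the toy peaks are hypothetical, and ChIPModule itself reports groups that may contain more than two factors.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def pairwise_cooccurrence(peaks):
    """One-sided co-occurrence p-value for every pair of TFs.

    `peaks` is a list with one set per peak region, holding the TFs that have
    at least one predicted binding site in that region (hypothetical input).
    """
    N = len(peaks)
    tfs = sorted(set().union(*peaks))
    counts = {tf: sum(tf in p for p in peaks) for tf in tfs}  # peaks containing each TF
    pvals = {}
    for a, b in combinations(tfs, 2):
        both = sum(a in p and b in p for p in peaks)
        # Peaks with `a` are the "successes"; peaks with `b` are the "draws".
        pvals[(a, b)] = hypergeom_sf(both, N, counts[a], counts[b])
    return pvals


# Six toy peaks: every peak containing ESR1 also contains FOXA1.
peaks = [{"FOXA1", "ESR1"}, {"FOXA1", "ESR1"}, {"FOXA1", "ESR1", "GATA3"},
         {"GATA3"}, {"CTCF"}, {"CTCF", "GATA3"}]
pvals = pairwise_cooccurrence(peaks)
print(pvals[("ESR1", "FOXA1")])  # 0.05
```

With thousands of peaks and hundreds of factors the number of pairs grows quickly, so p-values like these would also need a multiple-testing correction.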
- Ding J, Li X, Hu H.
Systematic discovery of cis-regulatory elements in Chlamydomonas reinhardtii genome using comparative genomics.
Plant Physiology. 2012;160(2):613-23.
Abstract:
Chlamydomonas reinhardtii (C. reinhardtii) is one of the most important microalgae model organisms and has been widely
studied towards the understanding of chloroplast functions and various cellular processes.
Further exploitation of C. reinhardtii as a model system to elucidate various molecular mechanisms and pathways requires systematic study of gene regulation.
However, there is a general lack of genome-scale gene regulation study such as global cis-regulatory element (CRE) identification in C. reinhardtii.
Recently, large-scale genomic data in microalgae species have become available, which enable the development of efficient computational methods to systematically identify CREs and characterize
their roles in microalgae gene regulation. Here we performed in silico CRE identification at the whole-genome level in C. reinhardtii using a comparative-genomics-based method.
We identified a large number of CREs in C. reinhardtii that are consistent with experimentally verified CREs.
We also discovered that a large percentage of these CREs form combinations and have the potential to work together for coordinated gene regulation in C. reinhardtii.
Multiple lines of evidence from the literature, gene transcriptional profiles, and gene annotation resources support our discovery.
The discovered CREs will serve as the first large-scale collection of CREs in C. reinhardtii to facilitate further experimental study of microalgae gene regulation.
The accompanying software tool and the predictions in C. reinhardtii are also made available through a web-accessible database (http://hulab.ucf.edu/research/projects/Microalgae/sdcre/motifcomb.html).
link
- Wang Y, Ding J, Daniell H, Hu H, Li X.
Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins.
Plant Molecular Biology. 2012;80(2):177-87.
Abstract:
Chloroplasts play critical roles in land plant cells.
Despite their importance and the availability of at least 200 sequenced chloroplast genomes,
the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper,
we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in
seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional
regulation of chloroplast genes. We also concluded that motifs shared by the intergenic sequences of most chloroplast genes are unlikely to exist,
indicating that these genes are regulated differently. Finally and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic
sequences, are significantly shared by the promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses,
protein-protein interaction data, and gene expression data, we further found support for the functionality of these conserved motifs.
Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins,
which sheds light on the understanding of the transcriptional regulation of chloroplast genes.
link
- Ding J, Hu H, Li X.
Thousands of cis-regulatory sequences are shared by Arabidopsis and populus.
Plant Physiology. 2012;158(1):145-55. Epub 2011 Nov 4.
Abstract:
The identification of cis-regulatory modules can greatly advance our understanding of gene regulatory mechanisms.
Although a cis-regulatory module can contain binding sites of more than three transcription factors,
studies in plants often consider only the co-occurrence of binding sites of one or two transcription factors.
In addition, cis-regulatory module studies in plants are limited to combinations of only a few families of transcription factors.
It is thus not clear how widely plant transcription factors work together, which transcription factors work together to regulate plant genes,
and how the combinations of these transcription factors are shared by different plants. To fill these gaps,
we applied a frequent-pattern-mining-based approach to identify frequently used cis-regulatory sequence combinations in the promoter sequences of two plant species,
Arabidopsis thaliana and Populus trichocarpa. A cis-regulatory sequence here corresponds to a DNA motif bound by a transcription factor.
We identified 18638 combinations composed of 2 to 6 cis-regulatory sequences that are shared by the two plant species.
In addition, with known cis-regulatory sequence combinations, gene function annotation, gene expression data, and known functional gene sets,
we showed that the functionality of at least 96.8% and 65.2% of these shared combinations in Arabidopsis is partially supported, at false discovery rates of 0.1 and 0.05, respectively.
Finally, we discovered that 796 of the 18638 combinations might relate to functions that are important in bioenergy research. Our work will facilitate the study of gene transcriptional regulation in plants.
link
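The frequent pattern mining step can be pictured with a small Apriori-style sketch. Each promoter is reduced to the set of motifs it contains, combinations of 2 to 6 motifs are counted level by level, and only combinations that are frequent in both species are kept. Apriori is a stand-in chosen here for illustration. The paper's exact mining algorithm and support threshold are not given in the abstract, and the names `frequent_combinations` and `shared_combinations` and the motif IDs `m1` to `m4` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def frequent_combinations(
    promoters: List[Set[str]], min_support: int, max_size: int = 6
) -> Dict[Itemset, int]:
    """Apriori-style mining of motif combinations (sizes 2..max_size).

    Each promoter is the set of motif IDs found in it; a combination is
    frequent if it occurs in at least `min_support` promoters.
    """
    # Level 1: count individual motifs.
    counts: Dict[Itemset, int] = {}
    for motifs in promoters:
        for motif in motifs:
            key = frozenset([motif])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, n in counts.items() if n >= min_support}

    result: Dict[Itemset, int] = {}
    size = 1
    while frequent and size < max_size:
        size += 1
        # Join step: unions of two frequent (size-1)-sets with exactly `size` motifs.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
        }
        # Support counting: in how many promoters does the whole combination occur?
        support = {c: sum(1 for motifs in promoters if c <= motifs) for c in candidates}
        frequent = {c for c, n in support.items() if n >= min_support}
        result.update((c, support[c]) for c in frequent)
    return result


def shared_combinations(species_a, species_b, min_support, max_size=6):
    """Combinations that are frequent in both species' promoter sets."""
    a = frequent_combinations(species_a, min_support, max_size)
    b = frequent_combinations(species_b, min_support, max_size)
    return set(a) & set(b)


# Toy motif sets (hypothetical motif IDs m1..m4), one set per promoter.
species_a = [{"m1", "m2", "m3"}, {"m1", "m2"}, {"m1", "m2", "m4"}, {"m3", "m4"}]
species_b = [{"m1", "m2"}, {"m1", "m2", "m3"}, {"m2", "m3"}]
print(shared_combinations(species_a, species_b, min_support=2))
```

The prune step is what keeps this tractable. A combination of size k is counted only if all of its (k-1)-motif subsets were already frequent, so most of the exponentially many motif combinations are never examined.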
- Ding J, Liu F.
Novel Tag Anti-Collision Algorithm with Adaptive Grouping.
Wireless Sensor Network. 2009;1:475-481.
Abstract:
For RFID tags, a novel Tag Anti-collision Algorithm with Grouping (TAAG) is proposed.
It divides tags into groups and adopts a deterministic method to identify the tags within each group.
TAAG estimates the total number of tags in the system from the identification result of a group and then adjusts the grouping method accordingly.
The performance of the proposed TAAG algorithm is compared with that of conventional tag anti-collision algorithms through simulation experiments.
According to both the analysis and the simulation results, the proposed algorithm performs better in terms of throughput, total slots used for identification, and total cycles.
link
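The abstract does not specify TAAG's within-group method or its estimator, so the following is only a hypothetical sketch of the overall loop. The deterministic within-group method is stood in for by the standard query-tree protocol, where the reader splits on the next ID bit after each collision. Tags are assumed to choose a group uniformly at random, and the number of still-unread tags is assumed to be estimated as (tags read in group 0) × (groups − 1). The names `query_tree`, `taag_sketch`, and `target_group_size` are illustrative.

```python
import random
from typing import List, Tuple


def query_tree(ids: List[str]) -> Tuple[List[str], int]:
    """Deterministic query-tree identification of one group of tags.

    `ids` are unique, equal-length binary strings. Returns the IDs read
    (in query order) and the number of reader slots used.
    """
    identified: List[str] = []
    slots = 0
    stack = [""]  # prefixes still to be queried
    while stack:
        prefix = stack.pop()
        slots += 1
        responders = [t for t in ids if t.startswith(prefix)]
        if len(responders) == 1:
            identified.append(responders[0])  # single reply: tag read
        elif len(responders) > 1:
            # Collision: split on the next bit (the "0" branch is queried first).
            stack.extend([prefix + "1", prefix + "0"])
        # No responders: idle slot, nothing to do.
    return identified, slots


def taag_sketch(tag_ids: List[str], initial_estimate: int,
                target_group_size: int = 4, seed: int = 0) -> Tuple[List[str], int]:
    """Hypothetical grouping loop in the spirit of TAAG (not the paper's code)."""
    rng = random.Random(seed)
    remaining = list(tag_ids)
    estimate = initial_estimate
    identified: List[str] = []
    total_slots = 0
    # The simulation loops until every tag is read; a real reader knows it is
    # done after a round with a single group, which reads all remaining tags.
    while remaining:
        groups = max(1, round(estimate / target_group_size))
        # Each unread tag picks a group uniformly at random; only group 0 is read now.
        chosen = [t for t in remaining if rng.randrange(groups) == 0]
        read, slots = query_tree(chosen)
        identified += read
        total_slots += slots
        read_set = set(read)
        remaining = [t for t in remaining if t not in read_set]
        # Assumed estimator: group 0 held about 1/groups of the unread tags.
        estimate = len(read) * (groups - 1)
    return identified, total_slots


ids = [format(i, "08b") for i in random.Random(1).sample(range(256), 32)]
read, slots = taag_sketch(ids, initial_estimate=32)
print(len(read), "tags read in", slots, "slots")
```

Reading one small group per round keeps collisions within that group few, and the count read from the group gives the reader a cheap estimate for sizing the next round's grouping. Comparing throughput against conventional algorithms, as the paper does, would require a specific slot and cycle model that this sketch does not reproduce.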