Integrated Systems Biology and AI — Tailored Pharmacology and Precision Medicine: Hu Li

Software and data

Software available in the Integrated Systems Biology and AI — Tailored Pharmacology and Precision Medicine Lab includes:

ANNE

The artificial neural network (ANN) was initially created to model how the human brain works. Over the past few decades, ANN has evolved into numerous sophisticated algorithms with proven outstanding performance in various recognition tasks.

Artificial neural network encoder (ANNE) is a novel weight engineering deep machine learning method that harnesses the power of autoencoder models and demonstrates that it is possible to decode meaningful information encoded in ANN models trained for specific tasks. We applied ANNE on breast cancer gene expression data with known clinical properties as case studies.

Our work illustrates that the trained autoencoder models are information encoders and that meaningful gene-gene associations with supported evidence can be retrieved. ANNE opens a new avenue in machine intelligence. ANN models will no longer be perceived as tools to perform recognition tasks but rather as powerful tools to extract meaningful information embedded within a sea of high-dimensional data.

Reference

Zhang C, Correia C, Weiskittel TM, Tan SH, Meng-Lin K, Yu GT, Yao J, Yeo KS, Zhu S, Ung CY, Li H. A knowledge-based discovery approach couples artificial neural networks with weight engineering to uncover immune-related processes underpinning clinical traits of breast cancer. Frontiers in Immunology. 2022; doi:10.3389/fimmu.2022.920669.

Source code

The source code is available for public access.

ASTAR-seq

Assay for single-cell transcriptome and accessibility regions (ASTAR-seq) is an automated method with high sensitivity used to simultaneously measure whole-cell transcriptome and chromatin accessibility within the same single cell.

References

Xing QR, El Farran CA, Gautam P, Chuah YS, Warrier T, Toh CD, Kang NY, Sugii S, Chang YT, Xu J, Collins JJ, Daley GQ, Li H, Zhang LF, Loh YH. Diversification of reprogramming trajectories revealed by parallel single-cell transcriptome and chromatin accessibility sequencing. Science Advances. 2020; doi:10.1126/sciadv.aba1190.
Xing QR, Farran CAE, Zeng YY, Yi Y, Warrier T, Gautam P, Collins JJ, Xu J, Dröge P, Koh CG, Li H, Zhang LF, Loh YH. Parallel bimodal single-cell sequencing of transcriptome and chromatin accessibility. Genome Research. 2020; doi:10.1101/gr.257840.119.

Source code

The source code is available for public access.

CellNet

CellNet is a network biology-based computational platform that assesses the fidelity of cellular engineering more accurately than existing methodologies do and generates hypotheses for improving cell derivations.

References

Cahan P, Li H, Morris SA, Lummertz da Rocha E, Daley GQ, Collins JJ. CellNet: Network biology applied to stem cell engineering. Cell. 2014; doi:10.1016/j.cell.2014.07.020.
Morris SA, Cahan P, Li H, Zhao AM, San Roman AK, Shivdasani RA, Collins JJ, Daley GQ. Dissecting engineered cell types and enhancing cell fate conversion via CellNet. Cell. 2014; doi:10.1016/j.cell.2014.07.021.

Materials available

The web interface and other materials are on the website portal.

CLR

Context likelihood of relatedness (CLR) is a network biology algorithm for reverse engineering and inferring regulatory interactions between master regulators and their targets using a compendium of transcriptome profiles.
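The scoring step of CLR can be illustrated with a minimal sketch. It assumes a symmetric matrix of pairwise relatedness values (for example, mutual information between expression profiles) has already been computed. The function name `clr_scores`, the toy matrix and the choice to exclude each gene's self-comparison from its background are assumptions made for illustration, not the lab's released implementation. Each pair is scored by how far its relatedness stands above the background of both genes.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Convert a symmetric relatedness matrix into background-corrected scores.

    mi[i][j] is the relatedness (e.g. mutual information) between gene i and
    gene j. For each pair, the value is compared with the background
    distribution of each gene's relatedness to all other genes.
    """
    n = len(mi)
    mu, sigma = [], []
    for i in range(n):
        # Background for gene i: its relatedness to every other gene.
        row = [mi[i][j] for j in range(n) if j != i]
        mu.append(mean(row))
        sigma.append(pstdev(row))

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # z-scores against each gene's background, clipped at zero.
            zi = max(0.0, (mi[i][j] - mu[i]) / sigma[i]) if sigma[i] > 0 else 0.0
            zj = max(0.0, (mi[i][j] - mu[j]) / sigma[j]) if sigma[j] > 0 else 0.0
            scores[i][j] = scores[j][i] = math.sqrt(zi * zi + zj * zj)
    return scores


# Toy relatedness matrix for four hypothetical genes.
mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
scores = clr_scores(mi)
```

In this toy matrix, the pair of genes 0 and 1 stands out from both genes' backgrounds and receives the highest score, while pairs that do not exceed their background score zero.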

Reference

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biology. 2007; doi:10.1371/journal.pbio.0050008.

Source code

Download.

Computational drug discovery platform, machine learning, feature selection, AI drug discoveries

Artificial intelligence (AI) and machine learning methods, combined with feature selection approaches, can predict specific pharmacodynamic, pharmacokinetic or toxicological properties of pharmaceutical agents and so facilitate the discovery and development of new drugs. Candidate agents are developed and tested to confirm that they have desirable pharmacodynamics and pharmacokinetics and minimal toxicity.

Computational methods have been explored to predict these properties, with the aim of discovering promising leads and eliminating unsuitable leads in the early stages of drug development. AI and machine learning methods have shown considerable potential in predicting a variety of pharmacodynamic, pharmacokinetic and toxicological properties for structurally diverse sets of agents.

References

Li H, Yap CW, Ung CY, Xue Y, Li ZR, Han LY, Lin HH, Chen YZ. Machine learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins. Journal of Pharmaceutical Sciences. 2007; doi:10.1002/jps.20985.
Li H, Ung CY, Yap CW, Xue Y, Li ZR, Chen YZ. Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods. Journal of Molecular Graphics and Modelling. 2006; doi:10.1016/j.jmgm.2006.01.007.
Li H, Yap CW, Xue Y, Li ZR, Ung CY, Han LY, Chen YZ. Statistical learning approach for predicting specific pharmacodynamic, pharmacokinetic, or toxicological properties of pharmaceutical agents. Drug Development Research. 2006; doi:10.1002/ddr.20044.
Li H, Yap CW, Ung CY, Xue Y, Cao ZW, Chen YZ. Effect of selection of molecular descriptors on the prediction of blood-brain barrier penetrating and nonpenetrating agents by statistical learning methods. Journal of Chemical Information and Modeling. 2005; doi:10.1021/ci050135u.

Source code

Download.

DPYD-Varifier

The DPYD gene-specific variant classifier DPYD-Varifier is a highly accurate in silico classifier for predicting the functional impact of DPYD variants on dihydropyrimidine dehydrogenase (DPD) activity. DPYD-Varifier has great potential for systems pharmacology and individualized medicine and for improving the clinical decision-making process.

Reference

Shrestha S, Zhang C, Jerde CR, Nie Q, Li H, Offer SM, Diasio RB. Gene-specific variant classifier (DPYD-Varifier) to identify deleterious alleles of dihydropyrimidine dehydrogenase. Clinical Pharmacology & Therapeutics. 2018; doi:10.1002/cpt.1020.

EDDI

Expression Dosage Dependent Inferelator (EDDI) is a machine learning and systems biology approach used to characterize dosage-based gene dependencies.

Reference

Meng-Lin K, Yong Ung C, Weiskittel TM, Chen A, Zhang C, Correia C, Li H. Machine learning and systems biology approaches to characterize dosage-based gene dependencies in cancer cells. Journal of Bioinformatics and Systems Biology. 2021; doi:10.26502/jbsb.5107019.

Source code

The source code is available for public access.

GEDI

Gene Expression Dynamics Inspector (GEDI), developed in the lab of Donald E. Ingber, M.D., Ph.D., at the Wyss Institute for Biologically Inspired Engineering at Harvard University, is a computational program that opens a new perspective for analyzing transcriptome data. By treating each high-dimensional sample — such as one transcriptome experiment — as an object, it accentuates and visualizes the genomewide response of a tissue or a patient and treats it as an integrated biological entity.

GEDI honors the new spirit of a systems-level approach in biology and unites a novel holistic perspective with the traditional gene-centered approach in molecular biology.

Reference

Eichler GS, Huang S, Ingber DE. Gene Expression Dynamics Inspector (GEDI): For integrative analysis of expression profiles. Bioinformatics. 2003; doi:10.1093/bioinformatics/btg307.

Questions?

Contact Dr. Ingber or Dr. Li.

GUM

The Gene Utility Model (GUM) is a novel computational pipeline used to understand the importance of genes under specific cellular contexts. GUM states that it is the utility of genes that provides selective pressure for the survival and fitness of aberrant cells.

It is possible to use GUM to construct a "utility karyotype" by mapping differentially used genes to their respective chromosomal loci. Further, GUM predicts whether the resulting utility karyotype can recapitulate, to a certain extent, the chromosomal aberrancies observed in diseases.

Reference

Ung CY, Levee TM, Zhang C, Correia C, Yeo KS, Li H, Zhu S. Gene utility recapitulates chromosomal aberrancies in advanced stage neuroblastoma. Computational and Structural Biotechnology Journal. 2022; doi:10.1016/j.csbj.2022.06.024.

Hypothesis-driven AI

Hypothesis-driven AI is a new class of AI that has not been previously described. Unlike conventional AI, hypothesis-driven AI is guided by the underlying hypothesis that can explain how a system behaves. This new AI technology offers a way to test a hypothesis and make new discoveries using an AI approach.

Hypothesis-driven AI offers a targeted and informed approach to address many of the challenges in diseases. Hypothesis-driven AI can perform focused investigations by centering on specific hypotheses or research questions and thus uses prior knowledge to guide its exploration. This approach can generate more interpretable and explainable results, compared with those of conventional AI tools. That's because the underlying hypotheses provide a mechanistic framework in which to understand the logic behind certain predictions or outcomes.

Hypothesis-driven AI tends to use resources more efficiently. It encourages the integration of domain-specific knowledge to generate meaningful insights within a specific context. Hypothesis-driven AI allows researchers to test and validate hypotheses via AI-mediated gedankenexperiments, or thought experiments, a term popularized by Albert Einstein. This in turn guides future experimental designs.

Reference

Xianyu Z, Correia C, Ung CY, Zhu S, Billadeau DD, Li H. The rise of hypothesis-driven artificial intelligence in oncology. Cancers. 2024; doi:10.3390/cancers16040822.

LIFE

Learning-Based Invariant Feature Engineering (LIFE) is a novel feature engineering platform built on symmetry. Symmetry refers to properties that remain invariant under mathematical transformations, yet it remains unexplored in biology and medicine.

We set out to explore symmetry relationships in gene expression to distinguish between healthy and disease states. We hypothesize that there are relationships between gene expressions that remain invariant across people displaying the same biological phenotypes.

Our Gene Expression Symmetry Hypothesis (GESH) posits that a set of genes exhibiting specific symmetrical relationships defines the invariant nature of phenotypic traits in cells. We deployed a hybrid machine learning approach and implemented it with two symmetric invariant feature functions (IFFs) to identify invariant feature genes (IFGs). IFGs are gene pairs for which IFF single-value outputs remain invariant across individual samples in each phenotype.
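The idea of an invariant feature gene pair can be illustrated with a small sketch. The text above does not specify the two IFFs, so the symmetric function `iff`, the tolerance `tol`, the helper `invariant_pairs` and the toy expression values below are all hypothetical stand-ins chosen only to show the selection logic: a gene pair is kept when the function's single-value output stays nearly constant across all samples of one phenotype.

```python
from itertools import combinations


def iff(x, y):
    """Hypothetical symmetric feature function: iff(x, y) == iff(y, x)."""
    total = x + y
    return abs(x - y) / total if total > 0 else 0.0


def invariant_pairs(samples, tol=0.05):
    """Return gene pairs whose iff output varies by at most tol across samples.

    samples: list of dicts mapping gene name -> expression value, where all
    samples share one phenotype and the same set of genes.
    """
    genes = sorted(samples[0])
    kept = []
    for g1, g2 in combinations(genes, 2):
        values = [iff(s[g1], s[g2]) for s in samples]
        # Keep the pair only if the output is (nearly) constant across samples.
        if max(values) - min(values) <= tol:
            kept.append((g1, g2))
    return kept


# Toy data: genes A and B keep a fixed 1:3 ratio in every sample; C does not.
samples = [
    {"A": 10.0, "B": 30.0, "C": 5.0},
    {"A": 20.0, "B": 60.0, "C": 50.0},
    {"A": 5.0, "B": 15.0, "C": 1.0},
]
pairs = invariant_pairs(samples)
```

Here only the pair (A, B) survives, because its output is identical in all three samples even though the absolute expression levels differ. A real analysis would repeat this per phenotype and use the surviving pairs as classification features.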

Our multiclass classification identified unique fingerprints across the transcriptomes derived from 25 normal organs, 25 cancer types and blood samples from people with four types of neurodegenerative diseases. We constructed networks from the IFGs (IF-Nets) and found that cancer IF-Net hubs were enriched with approved and clinical trial drugs, highlighting symmetry breaking as a novel treatment approach.

Reference

Zhang C, Correia C, Weiskittel T, Tan SH, Zhang Z, Yeo KS, Zhu S, Ung CY, Li H. Symmetry as a fundamental principle in defining gene expression and phenotypic traits. bioRxiv. 2025; doi:10.1101/2025.01.27.634930.

MALANI

Machine Learning-Assisted Network Inference (MALANI) is a hybrid computational platform that harnesses the power of both machine learning and network biology methodologies to provide new insights and improve the understanding of complex biological systems. MALANI assesses all genes, regardless of expression or mutational status in the context of disease etiology, by building more than 2 million machine learning models for reconstructing gene regulatory networks.

MALANI has the power to uncover "dark" disease genes that are neither mutated nor differentially expressed but play important pathological roles in disease development.

Reference

Ghanat Bari M, Yong Ung C, Zhang C, Zhu S, Li H. Machine learning-assisted network inference approach to identify a new class of genes that coordinate the functionality of cancer networks. Scientific Reports. 2017; doi:10.1038/s41598-017-07481-5.

Source code

Download.

MNI

Mode-of-action by network inference (MNI) is a reverse engineering network biology algorithm that identifies the gene targets and key mediators of a biomedical phenotype based on transcriptome data.

References

Brock A, Krause S, Li H, Kowalski M, Goldberg MS, Collins JJ, Ingber DE. Silencing HoxA1 by intraductal injection of siRNA lipidoid nanoparticles prevents mammary tumor progression in mice. Science Translational Medicine. 2014; doi:10.1126/scitranslmed.3007048.
di Bernardo D, Thompson MJ, Gardner TS, Chobot SE, Eastwood EL, Wojtovich AP, Elliott SJ, Schaus SE, Collins JJ. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nature Biotechnology. 2005; doi:10.1038/nbt1075.

Modified RNA

Highly efficient reprogramming to pluripotency and directed differentiation of human cells with synthetic modified mRNA.

Reference

Warren L, Manos PD, Ahfeldt T, Loh YH, Li H, Lau F, Ebina W, Mandal PK, Smith ZD, Meissner A, Daley GQ, Brack AS, Collins JJ, Cowan C, Schlaeger TM, Rossi DJ. Highly efficient reprogramming to pluripotency and directed differentiation of human cells with synthetic modified mRNA. Cell Stem Cell. 2010; doi:10.1016/j.stem.2010.08.012.

Multiregional GBM imaging and genetics

Integrated molecular and multiparametric MRI mapping of high-grade glioma identifies regional biologic signatures.

Reference

Hu LS, D'Angelo F, Weiskittel TM, Caruso FP, Fortin Ensign SP, Blomquist MR, Flick MJ, Wang L, Sereduk CP, Meng-Lin K, De Leon G, Nespodzany A, Urcuyo JC, Gonzales AC, Curtin L, Lewis EM, Singleton KW, Dondlinger T, Anil A, Semmineh NB, Noviello T, Patel RA, Wang P, Wang J, Eschbacher JM, Hawkins-Daarud A, Jackson PR, Grunfeld IS, Elrod C, Mazza GL, McGee SC, Paulson L, Clark-Swanson K, Lassiter-Morris Y, Smith KA, Nakaji P, Bendok BR, Zimmerman RS, Krishna C, Patra DP, Patel NP, Lyons M, Neal M, Donev K, Mrugala MM, Porter AB, Beeman SC, Jensen TR, Schmainda KM, Zhou Y, Baxter LC, Plaisier CL, Li J, Li H, Lasorella A, Quarles CC, Swanson KR, Ceccarelli M, Iavarone A, Tran NL. Integrated molecular and multiparametric MRI mapping of high-grade glioma identifies regional biologic signatures. Nature Communications. 2023; doi:10.1038/s41467-023-41559-1.

Source code

The source code is available for public access.

NetDecoder

NetDecoder is a network biology computational platform used to dissect context-specific biological networks and gene activities. NetDecoder provides freely available source code and web portal resources for researchers to explore genomewide context-dependent information flow profiles and key genes using pairwise phenotypic comparative analyses. NetDecoder also allows researchers to prioritize drug targets for genes that affect pathological contexts.

Reference

Brock A, Krause S, Li H, Kowalski M, Goldberg MS, Collins JJ, Ingber DE. Silencing HoxA1 by intraductal injection of siRNA lipidoid nanoparticles prevents mammary tumor progression in mice. Science Translational Medicine. 2014; doi:10.1126/scitranslmed.3007048.

Source code

Download.

Pathway modeling and simulation

Ordinary differential equations (ODEs) are among the most commonly used approaches for modeling biological systems. In general, a differential equation describes how the concentration of a molecular species changes over time, with the reaction rate depending on the concentrations of the participating species. A set of coupled ODEs can capture the temporal dynamics of molecular species in a biological signaling pathway network.
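As a minimal sketch of this approach, the example below integrates a set of coupled ODEs for a single reversible binding reaction, A + B <-> C, under mass-action kinetics. The reaction, the rate constants `k_on` and `k_off`, the initial concentrations and the fixed-step fourth-order Runge-Kutta integrator are illustrative assumptions, not taken from the pathway models in the references below.

```python
def derivatives(state, k_on=1.0, k_off=0.5):
    """Mass-action rates of change for A + B <-> C (assumed toy reaction)."""
    a, b, c = state
    net_rate = k_on * a * b - k_off * c  # forward minus reverse reaction rate
    return (-net_rate, -net_rate, net_rate)


def rk4_step(f, state, dt):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    q1 = f(state)
    q2 = f(tuple(s + 0.5 * dt * q for s, q in zip(state, q1)))
    q3 = f(tuple(s + 0.5 * dt * q for s, q in zip(state, q2)))
    q4 = f(tuple(s + dt * q for s, q in zip(state, q3)))
    return tuple(
        s + dt / 6.0 * (w1 + 2.0 * w2 + 2.0 * w3 + w4)
        for s, w1, w2, w3, w4 in zip(state, q1, q2, q3, q4)
    )


def simulate(state, dt=0.01, steps=2000):
    """Return the list of states visited from the initial state onward."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(derivatives, state, dt)
        trajectory.append(state)
    return trajectory


# Start with A = B = 1 and no complex C; integrate to t = 20.
trajectory = simulate((1.0, 1.0, 0.0))
a_end, b_end, c_end = trajectory[-1]
```

With these assumed constants the system relaxes toward the equilibrium where `k_on * A * B` equals `k_off * C`, and the total of A plus C stays constant throughout. A pathway model couples many such equations, one per molecular species.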

References

Li H, Ung CY, Ma XH, Li BW, Low BC, Cao ZW, Chen YZ. Simulation of crosstalk between small GTPase RhoA and EGFR-ERK signaling pathway via MEKK1. Bioinformatics. 2009; doi:10.1093/bioinformatics/btn635.
Li H, Ung CY, Ma XH, Liu XH, Li BW, Low BC, Chen YZ. Pathway sensitivity analysis for detecting pro-proliferation activities of oncogenes and tumor suppressors of epidermal growth factor receptor-extracellular signal-regulated protein kinase pathway at altered protein levels. Cancer. 2009; doi:10.1002/cncr.24485.
Ung CY, Li H, Ma XH, Jia J, Li BW, Low BC, Chen YZ. Simulation of the regulation of EGFR endocytosis and EGFR-ERK signaling by endophilin-mediated RhoA-EGFR crosstalk. FEBS Letters. 2008; doi:10.1016/j.febslet.2008.05.026.

PERMUTOR

Personalized mutation evaluator (PERMUTOR) is a novel computational pipeline that collects potent disease gene cooperative pathways to envision individualized disease etiology and therapies. Our algorithm constructs individualized disease networks and modules de novo, which enables us to elucidate the importance of mutated genes in specific patients and understand the synthetic penetrance of these genes across patients.

Individualized module disruption enables us to devise customized singular and combinatorial target therapies that are highly varied across patients, demonstrating the need for precision therapeutics pipelines. With the first analysis of de novo individualized disease networks and modules, we illustrate the power of individualized disease modules for precision medicine by providing deep novel insights on the activity of diseased genes in people.

Reference

Hu LS, D'Angelo F, Weiskittel TM, Caruso FP, Fortin Ensign SP, Blomquist MR, Flick MJ, Wang L, Sereduk CP, Meng-Lin K, De Leon G, Nespodzany A, Urcuyo JC, Gonzales AC, Curtin L, Lewis EM, Singleton KW, Dondlinger T, Anil A, Semmineh NB, Noviello T, Patel RA, Wang P, Wang J, Eschbacher JM, Hawkins-Daarud A, Jackson PR, Grunfeld IS, Elrod C, Mazza GL, McGee SC, Paulson L, Clark-Swanson K, Lassiter-Morris Y, Smith KA, Nakaji P, Bendok BR, Zimmerman RS, Krishna C, Patra DP, Patel NP, Lyons M, Neal M, Donev K, Mrugala MM, Porter AB, Beeman SC, Jensen TR, Schmainda KM, Zhou Y, Baxter LC, Plaisier CL, Li J, Li H, Lasorella A, Quarles CC, Swanson KR, Ceccarelli M, Iavarone A, Tran NL. Integrated molecular and multiparametric MRI mapping of high-grade glioma identifies regional biologic signatures. Nature Communications. 2023; doi:10.1038/s41467-023-41559-1.

Source code

The source code is available for public access.

P-Map

Phenotype mapping (P-Map) is a network-based approach used to identify genes and regulatory networks that modulate drug response phenotypes.

Reference

Cairns J, Ung C, da Rocha E, et al. A network-based phenotype mapping approach to identify genes that modulate drug response phenotypes. Scientific Reports. 2016; doi:10.1038/srep37003.

Source code

Download.

RSI

Regulostat Inferelator (RSI) is a novel computational algorithm to decipher intrinsic molecular devices called regulostats. Regulostats predetermine cellular phenotypic responses.

Reference

Yong Ung C, Ghanat Bari M, Zhang C, Liang J, Correia C, Li H. Regulostat Inferelator: A novel network biology platform to uncover molecular devices that predetermine cellular response phenotypes. Nucleic Acids Research. 2019; doi:10.1093/nar/gkz417.

Web interface and source code

Download.

sn-m6A-CT data analysis

Single-nucleus m6A-CUT&Tag (sn-m6A-CT) is a method for simultaneously profiling m6A methylomes and transcriptomes within a single nucleus. sn-m6A-CT can enrich m6A-marked RNA molecules in situ without isolating RNA from cells. sn-m6A-CT profiling is sufficient to determine cell identity, and it allows the generation of cell type-specific m6A methylome landscapes from heterogeneous populations.

Reference

Hamashima K, Wong KW, Sam TW, Teo JHJ, Taneja R, Le MTN, Li QJ, Hanna JH, Li H, Loh YH. Single-nucleus multiomic mapping of m6A methylomes and transcriptomes in native populations of cells with sn-m6A-CT. Molecular Cell. 2023; doi:10.1016/j.molcel.2023.08.010.

Source code

The source code is available for public access.

SPIN-AI

Spatially resolved sequencing technologies help us dissect how cells are organized in space. Several available computational approaches focus on the identification of spatially variable genes (SVGs). These are genes with expression patterns that vary in space.

Detecting SVGs is analogous to identifying differentially expressed genes. It permits us to understand how genes and associated molecular processes are spatially distributed within cellular niches. However, the expression activities of SVGs fail to encode all information inherent to the spatial distribution of cells.

Here, we devised a deep learning model — Spatially Informed Artificial Intelligence (SPIN-AI) — to identify spatially predictive genes (SPGs). These are genes with expression that can predict how cells are organized in space without any prior assumptions of spatial distribution.

We used SPIN-AI on spatial transcriptomic data from squamous cell carcinoma as a proof of concept. Our results demonstrated that SPGs not only recapitulate the biology of squamous cell carcinoma but also identify genes distinct from SVGs. Moreover, we found a substantial number of ribosomal genes that are SPGs but not SVGs.

Since SPGs can predict spatial cellular organization, we reason that SPGs capture more biologically relevant information for a given cellular niche. Hence, SPIN-AI has broad applications for detecting SPGs and uncovering which biological processes play important roles in governing cellular organization.

Reference

Meng-Lin K, Ung CY, Zhang C, Weiskittel TM, Wisniewski P, Zhang Z, Tan SH, Yeo KS, Zhu S, Correia C, Li H. SPIN-AI: A deep learning model that identifies spatially predictive genes. Biomolecules. 2023; doi:10.3390/biom13060895.

Source code

The source code is available for public access.

StemSite

StemSite is a database network of transcriptional gene regulators for identifying and engineering the developmental origin of mouse hematopoietic stem cells.

Reference

McKinney-Freeman S, Cahan P, Li H, Lacadie SA, Huang HT, Curran M, Loewer S, Naveiras O, Kathrein KL, Konantz M, Langdon EM, Lengerke C, Zon LI, Collins JJ, Daley GQ. The transcriptional landscape of hematopoietic stem cell ontogeny. Cell Stem Cell. 2012; doi:10.1016/j.stem.2012.07.018.

Database

Access the database.
