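Associate Professor and doctoral supervisor; Huxiang Young Talent (湖湘青年英才). Has published more than 30 papers in journals and conferences including IJCAI, Bioinformatics, Briefings in Bioinformatics, and PLoS Computational Biology, and received the ACM SIGBIO China Rising Star Award. Serves as Associate Editor of the international SCI-indexed journal Journal of Translational Medicine (CAS Zone 2).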
Deep generative models are gaining attention in the field of de novo drug design. However, the rational design of ligand molecules for novel targets remains challenging, particularly in controlling the properties of the generated molecules. Here, inspired by the DNA-encoded compound library technique, we introduce DeepBlock, a deep learning approach for block-based ligand generation tailored to target protein sequences while enabling precise property control. DeepBlock divides the generation process into two steps, building-block generation and molecule reconstruction, which are handled by a neural network and by a rule-based reconstruction algorithm that we propose, respectively. Furthermore, DeepBlock combines optimization algorithms with deep learning to control the properties of the generated molecules. Experiments show that DeepBlock outperforms existing methods in the binding affinity, synthetic accessibility, and drug-likeness of the generated ligands. Moreover, when integrated with simulated annealing or Bayesian optimization using toxicity as the optimization objective, DeepBlock successfully generates ligands with low toxicity while preserving affinity with the target.
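To illustrate the property-control step, the following is a minimal, generic simulated annealing loop over a sequence of building-block indices. It is a sketch under stated assumptions, not DeepBlock's implementation: the block vocabulary, the single-position mutation move, the geometric cooling schedule, and the per-block toxicity table `TOX` are all hypothetical stand-ins for DeepBlock's learned generator and its toxicity predictor.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def simulated_annealing(
    init: List[int],
    vocab_size: int,
    score: Callable[[Sequence[int]], float],
    steps: int = 500,
    t0: float = 1.0,
    cooling: float = 0.99,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Minimize `score` over sequences of block indices via single-position mutations."""
    rng = random.Random(seed)
    current = list(init)
    current_score = score(current)
    best, best_score = list(current), current_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbour: replace one randomly chosen block with a random vocabulary entry.
        candidate = list(current)
        pos = rng.randrange(len(candidate))
        candidate[pos] = rng.randrange(vocab_size)
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, cand_score
            if current_score < best_score:
                best, best_score = list(current), current_score
        temp = max(temp * cooling, 1e-8)  # geometric cooling, floored to avoid division by zero
    return best, best_score


# Hypothetical per-block toxicity scores; a real objective would come from a trained predictor.
TOX = [0.9, 0.1, 0.5, 0.3]


def toy_toxicity(blocks: Sequence[int]) -> float:
    return sum(TOX[b] for b in blocks)


best, best_score = simulated_annealing([0, 0, 0], len(TOX), toy_toxicity)
print(best, round(best_score, 3))
```

Accepting occasional uphill moves early, while the temperature is high, lets the search escape local minima; as the temperature decays the loop behaves increasingly like greedy local search. Preserving affinity, as the abstract describes, would require adding an affinity term or constraint to the objective.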
A Foundation Model Identifies Broad-Spectrum Antimicrobial Peptides against Drug-Resistant Bacterial Infection
Tingting Li, Xuanbai Ren, Xiaoli Luo, Zhuole Wang, Zhenlu Li, Xiaoyan Luo, Jun Shen, Yun Li, Dan Yuan, Ruth Nussinov, Xiangxiang Zeng, Junfeng Shi & Feixiong Cheng
Developing potent, broad-spectrum antimicrobial peptides (AMPs) could help overcome the antimicrobial resistance crisis. We develop a peptide language-based deep generative framework (deepAMP) for identifying potent, broad-spectrum AMPs. Using deepAMP to reduce antimicrobial resistance and enhance the membrane-disrupting abilities of AMPs, we identify, synthesize, and experimentally test 18 T1-AMP (Tier 1) and 11 T2-AMP (Tier 2) candidates over a two-round design, employing cross-optimization-validation. More than 90% of the designed AMPs show better inhibition than penetratin against both Gram-positive (i.e., S. aureus) and Gram-negative bacteria (i.e., K. pneumoniae and P. aeruginosa). T2-9 shows the strongest antibacterial activity, comparable to FDA-approved antibiotics. We show that three AMPs (T1-2, T1-5 and T2-10) significantly reduce the development of resistance in S. aureus compared with ciprofloxacin and are effective against skin wound infection in a wound model in female mice infected with P. aeruginosa. In summary, deepAMP expedites the discovery of effective, broad-spectrum AMPs against drug-resistant bacteria.
Self-assembling peptides have numerous applications in medicine, food chemistry, and nanotechnology. However, their discovery has traditionally been serendipitous rather than driven by rational design. Here, we develop HydrogelFinder, a foundation model for the rational design of self-assembling peptides from scratch. The model learns self-assembly properties from molecular structure, leveraging 1,377 self-assembling non-peptidal small molecules to navigate chemical space and improve structural diversity. Using HydrogelFinder, we generate 111 peptide candidates, synthesize 17 of them, and experimentally validate the self-assembly and biophysical characteristics of nine peptides ranging from 1 to 10 amino acids, all within a 19-day workflow. Notably, two de novo-designed self-assembling peptides demonstrated low cytotoxicity and good biocompatibility, as confirmed by live/dead assays. This work highlights the capacity of HydrogelFinder to diversify the design of self-assembling peptides through non-peptidal small molecules, offering a powerful toolkit and paradigm for future peptide discovery.
The clinical efficacy and safety of a drug are determined by its molecular properties and its targets in humans. However, proteome-wide evaluation of all compounds in humans, or even in animal models, is challenging. In this study, we present an unsupervised pretraining deep learning framework, named ImageMol, pretrained on 10 million unlabelled drug-like, bioactive molecules, to predict the molecular targets of candidate compounds. ImageMol learns chemical representations directly from the pixels of unlabelled molecular images, capturing both local and global structural characteristics of molecules. We demonstrate high performance of ImageMol in evaluating molecular properties (that is, drug metabolism, brain penetration and toxicity) and molecular target profiles (that is, beta-secretase enzyme and kinases) across 51 benchmark datasets. ImageMol shows high accuracy in identifying anti-SARS-CoV-2 molecules across 13 high-throughput experimental datasets from the National Center for Advancing Translational Sciences. Using ImageMol, we identified candidate clinical 3C-like protease inhibitors for the potential treatment of COVID-19.
Accurate molecular representation of compounds is a fundamental challenge for the prediction of drug targets and molecular properties. In this study, we present a molecular video-based foundation model, named VideoMol, pretrained on 120 million frames of 2 million unlabeled drug-like and bioactive molecules. VideoMol renders each molecule as a 60-frame video and applies three self-supervised learning strategies to these videos to learn molecular representations. We show high performance of VideoMol in predicting molecular targets and properties across 43 drug discovery benchmark datasets. VideoMol achieves high accuracy in identifying antiviral molecules against diverse disease-specific drug targets (i.e., BACE1 and EP4). Drugs screened by VideoMol show better binding affinity than those screened by molecular docking, indicating that VideoMol effectively captures the three-dimensional structure of molecules. We further illustrate the interpretability of VideoMol using key chemical substructures.
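The idea of turning one molecule into a 60-frame video can be sketched as rotating a 3D conformer through a full turn and projecting each pose to 2D. This is only an assumption-laden illustration: the rotation axis (y), the 6-degree step, the orthographic projection, and the toy three-atom conformer are all hypothetical, and the sketch stops at 2D coordinates rather than rendering pixel images as VideoMol does.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def rotate_y(p: Point3D, angle: float) -> Point3D:
    """Rotate a 3D point about the y-axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def molecule_to_frames(coords: List[Point3D], n_frames: int = 60) -> List[List[Point2D]]:
    """Project a conformer rotated through a full turn into n_frames 2D frames."""
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    # Centre the conformer so it spins in place instead of orbiting the origin.
    centred = [tuple(p[i] - centroid[i] for i in range(3)) for p in coords]
    frames = []
    for k in range(n_frames):
        angle = 2 * math.pi * k / n_frames  # 6 degrees per frame when n_frames = 60
        rotated = [rotate_y(p, angle) for p in centred]
        # Orthographic projection: drop the depth (z) coordinate.
        frames.append([(x, y) for x, y, _ in rotated])
    return frames


# Hypothetical three-atom conformer (angstroms), roughly water-shaped.
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
frames = molecule_to_frames(coords)
print(len(frames), len(frames[0]))
```

Because consecutive frames show the same atoms from slightly different viewpoints, a model trained on such sequences is pushed to encode 3D shape rather than a single 2D drawing, which is consistent with the abstract's point about three-dimensional structure.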
Optimizing molecules with desired properties is a crucial step in de novo drug design. While translation-based methods have achieved initial success, they still face the “exposure bias” problem. Preventing exposure bias in molecule optimization through contrastive learning requires both positive and negative molecules, and both are hard to obtain: generating positive molecules through data augmentation requires domain-specific knowledge, and randomly sampled negative molecules are too easily distinguished from real molecules. Hence, in this work, we propose a molecule optimization method called GPMO, which uses gradient perturbation-based contrastive learning to prevent the “exposure bias” problem in translation-based molecule optimization. With the assistance of positive and negative molecules, GPMO can effectively handle both real and artificial molecules. GPMO is conditioned on matched molecule pairs for drug discovery. Our empirical studies show that GPMO outperforms state-of-the-art molecule optimization methods. Furthermore, the negative and positive perturbations improve the robustness of GPMO.
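The core idea of gradient perturbation can be shown on a toy embedding: stepping along the loss gradient yields a hard negative, stepping against it yields a positive, and an InfoNCE-style loss then contrasts the two. This is a simplified sketch, not GPMO's objective. The quadratic toy loss (whose gradient is `h - target`), the step size `eps`, the temperature `tau`, and the use of the target embedding as the anchor are all assumptions; in practice the gradient would come from automatic differentiation of the translation model's loss.

```python
import math
from typing import List

Vec = List[float]


def normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def gradient_perturb(h: Vec, grad: Vec, eps: float, ascend: bool) -> Vec:
    """Shift h by eps along the unit gradient; ascending the loss gives a hard negative,
    descending gives a positive."""
    sign = 1.0 if ascend else -1.0
    g = normalize(grad)
    return [hi + sign * eps * gi for hi, gi in zip(h, g)]


def info_nce(anchor: Vec, positive: Vec, negatives: List[Vec], tau: float = 0.1) -> float:
    """InfoNCE loss with cosine similarity: low when the anchor is closer to the positive."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy setup (hypothetical): loss = 0.5 * ||h - target||^2, so its gradient is h - target.
target = [1.0, 0.0]
h = [0.6, 0.8]
grad = [hi - ti for hi, ti in zip(h, target)]

positive = gradient_perturb(h, grad, eps=0.3, ascend=False)
negative = gradient_perturb(h, grad, eps=0.3, ascend=True)
print(round(cosine(target, positive), 3), round(cosine(target, negative), 3))
print(round(info_nce(target, positive, [negative]), 4))
```

Unlike randomly sampled negatives, a gradient-ascent perturbation stays close to the real representation while still increasing the loss, which is what makes it a harder, more informative negative.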
Inductive Knowledge Graph Completion (KGC) aims to infer missing facts involving newly emerged entities in knowledge graphs (KGs), which poses a significant challenge. While recent studies have shown promising results in inferring such entities through knowledge subgraph reasoning, they suffer from (i) semantic inconsistencies among similar relations and (ii) noisy interactions inherent in KGs, caused by unconvincing knowledge about emerging entities. To address these challenges, we propose a Semantic Structure-aware Denoising Network (S2DN) for inductive KGC. Our goal is to learn adaptable general semantics and reliable structures, distilling consistent semantic knowledge while preserving reliable interactions within KGs. Specifically, we introduce a semantic smoothing module over the enclosing subgraphs to retain the universal semantic knowledge of relations. We also incorporate a structure refining module that filters out unreliable interactions and offers additional knowledge, retaining a robust structure around target links. Extensive experiments on three benchmark KGs show that S2DN surpasses state-of-the-art models. These results demonstrate the effectiveness of S2DN in preserving semantic consistency and in robustly filtering out unreliable interactions in contaminated KGs. Code is available at https://github.com/xiaomingaaa/SDN
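The structure refining step, filtering out unreliable interactions around a target link, can be sketched as scoring each edge of an enclosing subgraph and dropping low-reliability edges. The abstract does not specify the scoring function, so this sketch assumes a DistMult triple score passed through a sigmoid and a fixed threshold; the entity and relation embeddings are hypothetical toy values, and S2DN's learned module is likely more involved.

```python
import math
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult plausibility: sum of element-wise products of head, relation, tail."""
    return sum(a * b * c for a, b, c in zip(h, r, t))


def refine_edges(
    edges: List[Triple],
    ent_emb: Dict[str, List[float]],
    rel_emb: Dict[str, List[float]],
    threshold: float = 0.6,
) -> List[Triple]:
    """Keep only edges whose sigmoid reliability score reaches the threshold."""
    kept = []
    for head, rel, tail in edges:
        logit = distmult_score(ent_emb[head], rel_emb[rel], ent_emb[tail])
        reliability = 1.0 / (1.0 + math.exp(-logit))  # sigmoid maps the score into (0, 1)
        if reliability >= threshold:
            kept.append((head, rel, tail))
    return kept


# Hypothetical embeddings for a tiny enclosing subgraph.
ent_emb = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
rel_emb = {"r1": [2.0, 0.0], "r2": [-2.0, 0.0]}
edges = [("a", "r1", "b"), ("a", "r1", "c"), ("a", "r2", "b")]
kept = refine_edges(edges, ent_emb, rel_emb)
print(kept)
```

A hard threshold keeps the example simple; a differentiable variant would instead weight each edge's message by its reliability score during subgraph reasoning.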
Molecular design inherently involves optimizing multiple conflicting objectives, such as enhancing bioactivity and ensuring synthesizability. Evaluating these objectives often requires resource-intensive computations or physical experiments. Current molecular design methods typically approximate the Pareto set using a limited number of molecules. In this paper, we present an approach called Multi-Objective Molecular Design through Learning Latent Pareto Set (MLPS). MLPS first uses an encoder-decoder model to map the discrete chemical space into a continuous latent space. We then employ local Bayesian optimization models to efficiently search for locally optimal solutions (i.e., molecules) within predefined trust regions. Using surrogate objective values derived from these local models, we train a global Pareto set learning model to learn the mapping between direction vectors in the objective space (called “preferences”) and the entire Pareto set in the continuous latent space. The global Pareto set learning model and the local Bayesian optimization models collaborate to discover high-quality solutions and adapt the trust regions dynamically. Our work is the first attempt to learn the Pareto set for multi-objective molecular design, allowing decision-makers to fine-tune their preferences and thoroughly explore the Pareto set. Experimental results demonstrate that MLPS achieves state-of-the-art performance across various multi-objective scenarios, encompassing diverse objective types and varying numbers of objectives.
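Two standard ingredients behind preference-conditioned Pareto set learning can be illustrated on explicit objective vectors: Pareto dominance, which defines the Pareto set, and Tchebycheff scalarization, which turns a preference vector into a single objective and is commonly used in the Pareto set learning literature. This sketch does not learn a model or operate in a latent space. It assumes all objectives are minimized and uses hypothetical objective values; whether MLPS uses exactly this scalarization is an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (minimization): no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def tchebycheff(f: Sequence[float], pref: Sequence[float], ideal: Sequence[float]) -> float:
    """Weighted Tchebycheff distance of objective vector f from the ideal point."""
    return max(w * abs(fi - zi) for w, fi, zi in zip(pref, f, ideal))


def pick_for_preference(front: List[Point], pref: Sequence[float]) -> Point:
    """Select the front member that best matches a preference vector."""
    ideal = tuple(min(p[i] for p in front) for i in range(len(front[0])))
    return min(front, key=lambda f: tchebycheff(f, pref, ideal))


# Hypothetical two-objective values for five candidate molecules (both minimized).
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
front = pareto_front(points)
print(front)
print(pick_for_preference(front, (0.5, 0.5)))
```

Sweeping the preference vector selects different trade-offs along the front; a Pareto set learning model generalizes this by mapping any preference directly to a solution, so decision-makers can move along the whole set rather than choose from a fixed handful of molecules.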