Lianming Du, Chaoyue Geng, Qianglin Zeng, Ting Huang, Jie Tang, Yiwen Chu, Kelei Zhao. Dockey: a modern integrated tool for large-scale molecular docking and virtual screening. Briefings in Bioinformatics, 2023, 24(2):bbad047.

Molecular docking is a structure-based and computer-aided drug design approach that plays a pivotal role in drug discovery and pharmaceutical research. AutoDock is the most widely used molecular docking tool for the study of protein–ligand interactions and virtual screening. Although many tools have been developed to streamline and automate the AutoDock docking pipeline, some of them still use outdated graphical user interfaces and have not been updated for a long time. Meanwhile, some of them lack cross-platform compatibility and evaluation metrics for screening lead compound candidates. To overcome these limitations, we have developed Dockey, a flexible and intuitive graphical interface tool with seamless integration of several useful tools, which implements a complete docking pipeline covering molecular sanitization, molecular preparation, parallel docking execution, interaction detection and conformation visualization. Specifically, Dockey can detect the non-covalent interactions between small molecules and proteins and perform cross-docking between multiple receptors and ligands. It has the capacity to automatically dock thousands of ligands to multiple receptors and analyze the corresponding docking results in parallel. All the generated data are kept in a project file that can be shared between any systems and computers on which Dockey is installed. We anticipate that these unique characteristics will make it attractive for researchers, particularly beginners, to conduct large-scale molecular docking without complicated operations. Dockey is implemented in Python and freely available at


Lianming Du, Qin Liu, Zhenxin Fan, Jie Tang, Xiuyue Zhang, Megan Price, Bisong Yue, Kelei Zhao. Pyfastx: a robust Python package for fast random access to sequences from plain and gzipped FASTA/Q files. Briefings in Bioinformatics, 2021, 22(4):bbaa368.

FASTA and FASTQ are the most widely used biological data formats and have become the de facto standard for exchanging sequence data between bioinformatics tools. With the avalanche of next-generation sequencing data, the amount of sequence data being deposited and accessed in FASTA/Q formats is increasing dramatically. However, the existing tools have very low efficiency at random retrieval of subsequences due to the requirement of loading the entire index into memory. In addition, most existing tools cannot build an index for large FASTA/Q files because of limited memory. Furthermore, the tools do not support random access to sequences in FASTA/Q files compressed by gzip, which is extensively adopted by most public databases to save storage. In this study, we developed pyfastx as a versatile Python package with commonly used command-line tools to overcome the above limitations. Compared to other tools, pyfastx yielded the highest performance in terms of building index and random access to sequences, particularly when dealing with large FASTA/Q files with hundreds of millions of sequences. A key advantage of pyfastx over other tools is that it offers an efficient way to randomly extract subsequences directly from gzip compressed FASTA/Q files without needing to uncompress beforehand. Pyfastx can easily be installed from PyPI, and the source code is freely available at
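
The idea of index-based random access described above can be sketched with the Python standard library alone. This is a minimal illustration, not pyfastx's implementation or API: the function names `build_index` and `fetch` are hypothetical, the index is a plain in-memory dictionary, and the sketch assumes an uncompressed FASTA file whose sequence lines all have the same width except the last line of each record. Random access into gzip-compressed files, which the abstract highlights, is not covered here.

```python
def build_index(path):
    """Scan a plain FASTA file once and record, for each sequence,
    [length, byte offset of first base, bases per line, bytes per line].
    Assumes well-formed records with uniform line width (hypothetical helper)."""
    index = {}
    with open(path, "rb") as fh:
        name = None
        offset = 0  # byte position of the start of the current line
        for line in fh:
            if line.startswith(b">"):
                # Sequence name is the first word of the header line.
                name = line[1:].split()[0].decode()
                index[name] = [0, offset + len(line), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    # First sequence line defines the line layout.
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
            offset += len(line)
    return index


def fetch(path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) by seeking to the
    computed byte range instead of reading the whole file."""
    length, seq_offset, line_bases, line_bytes = index[name]
    if not (1 <= start <= end <= length):
        raise ValueError("interval outside sequence")
    first, last = start - 1, end - 1
    # Convert base coordinates to byte positions, skipping line terminators.
    begin = seq_offset + (first // line_bases) * line_bytes + first % line_bases
    stop = seq_offset + (last // line_bases) * line_bytes + last % line_bases + 1
    with open(path, "rb") as fh:
        fh.seek(begin)
        chunk = fh.read(stop - begin)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

A caller would build the index once with `build_index("genome.fa")` (file name assumed) and then call `fetch` repeatedly; only the requested byte range is read from disk on each call.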


Lianming Du, Qin Liu, Kelei Zhao, Jie Tang, Xiuyue Zhang, Bisong Yue, Zhenxin Fan. PSMD: An extensive database for pan-species microsatellite investigation and marker development. Molecular Ecology Resources, 2020, 20(1):283-291.

Microsatellites are widely distributed throughout nearly all genomes and have been extensively exploited as powerful genetic markers for diverse applications due to their high polymorphism. Their length variations are involved in gene regulation and implicated in numerous genetic diseases, including cancers. Although much effort has been devoted to microsatellite database construction, the existing microsatellite databases still have some drawbacks, such as a limited number of species, unfriendly export formats, missing marker development, lack of compound microsatellites and absence of gene annotation, which seriously restrict researchers from performing downstream analysis. In order to overcome the above limitations, we developed PSMD (Pan-Species Microsatellite Database) as a web-based database to help researchers easily identify microsatellites, exploit reliable molecular markers and compare microsatellite distribution patterns on a genome-wide scale. In the current release, PSMD comprises 678,106,741 perfect microsatellites and 43,848,943 compound microsatellites from 18,408 organisms, which covers almost all species with available genomic data. In addition to an interactive browse interface, PSMD also offers a flexible filter function for users to quickly obtain desired microsatellites from large data sets. PSMD allows users to export GFF3 formatted files and CSV formatted statistical files for downstream analysis. We also implemented an online tool for analysing the occurrence of microsatellites with user-defined parameters. Furthermore, Primer3 was embedded to help users design high-quality primers with customizable settings. To our knowledge, PSMD is the most extensive resource of its kind and is likely to be adopted by scientists engaged in biological, medical, environmental and agricultural research.


Lianming Du, Tao Guo, Qin Liu, Jing Li, Xiuyue Zhang, Jinchuan Xing, Bisong Yue, Jing Li, Zhenxin Fan. MACSNVdb: a high-quality SNV database for interspecies genetic divergence investigation among macaques. Database, 2020, 2020:baaa027.

Macaques are the most widely used non-human primates in biomedical research. The genetic divergence between these animal models is responsible for their phenotypic differences in response to certain diseases. However, the macaque single nucleotide polymorphism resources mainly focus on rhesus macaque (Macaca mulatta), which hinders the broad research and biomedical application of other macaques. In order to overcome these limitations, we constructed a database named MACSNVdb that focuses on the interspecies genetic diversity among macaque genomes. MACSNVdb is a web-enabled database comprising ~74.51 million high-quality non-redundant single nucleotide variants (SNVs) identified among 20 macaque individuals from six species groups (mulatta, fascicularis, sinica, arctoides, silenus, sylvanus). In addition to individual SNVs, MACSNVdb also allows users to browse and retrieve groups of user-defined SNVs. In particular, users can retrieve non-synonymous SNVs that may have deleterious effects on protein structure or function within macaque orthologs of human disease and drug-target genes. Besides position, alleles and flanking sequences, MACSNVdb integrates additional genomic information including SNV annotations and gene functional annotations. MACSNVdb will help biomedical researchers discover molecular mechanisms of diverse responses to diseases and help primatologists perform population genetic studies. We will continue updating MACSNVdb with newly available sequencing data and annotation to keep the resource up to date.


Kelei Zhao, Ting Huang, Jiafu Lin, Chaochao Yan, Lianming Du, Tao Song, Jing Li, Yidong Guo, Yiwen Chu, Junfeng Deng, Xinrong Wang, Chaolan Liu, Yingshun Zhou. Genetic and Functional Diversity of Pseudomonas aeruginosa in Patients With Chronic Obstructive Pulmonary Disease. Frontiers in Microbiology, 2020, 11:598478.

Pseudomonas aeruginosa is the most relevant pathogen to the severe exacerbations of patients with chronic obstructive pulmonary disease (COPD). However, the genetic and functional characteristics of P. aeruginosa isolates from COPD airways remain poorly understood. In this study, the genetic, phylogenetic, phenotypic, and transcriptional features of P. aeruginosa isolates from COPD sputa were comprehensively explored by susceptibility testing, comparative-genomic analysis, phylogenetic analysis, phenotypic profiling, and comparative-transcriptomic analysis. We found that P. aeruginosa was prevalent in elderly COPD patients and highly resistant to many commonly used antibiotics. P. aeruginosa COPD isolates harbored a substantial number of variant sites that might influence the primary metabolism and substance transport system. These isolates were discretely distributed in the phylogenetic tree and clustered with internationally collected P. aeruginosa in two major groups, and could be classified into three groups according to their differences in virulence-related phenotypes. Furthermore, the transcriptional patterns of COPD isolates could be classified into a PAO1-like group with reduced protein secretion and motility and a PAO1-distinct group with decreased substance transport but enhanced primary metabolism. In conclusion, this study demonstrates that P. aeruginosa isolates from COPD patients have abundant genetic and phenotypic diversity, and provides an important reference for further exploring the survival strategy of P. aeruginosa in COPD airways and the development of anti-pseudomonal therapy.


Lianming Du, Qin Liu, Fujun Shen, Zhenxin Fan, Rong Hou, Bisong Yue, Xiuyue Zhang. Transcriptome analysis reveals immune-related gene expression changes with age in giant panda (Ailuropoda melanoleuca) blood. Aging, 2019, 11(1):249-262.

The giant panda (Ailuropoda melanoleuca), an endangered species endemic to western China, has long been threatened with extinction that is exacerbated by highly contagious and fatal diseases. Aging is the most well-defined risk factor for diseases and is associated with a decline in immune function leading to increased susceptibility to infection and reduced response to vaccination. Therefore, this study aimed to determine which genes and pathways show differential expression with age in blood tissues. We obtained 210 differentially expressed genes by RNA-seq, including 146 up-regulated and 64 down-regulated genes in old pandas (18-21 yrs) compared to young pandas (2-6 yrs). We identified ISG15, STAT1, IRF7 and DDX58 as the hub genes in the protein-protein interaction network. All of these genes were up-regulated with age and played important roles in response to pathogen invasion. Functional enrichment analysis indicated that up-regulated genes were mainly involved in innate immune response, while the down-regulated genes were mainly related to B cell activation. This may suggest that innate immunity is relatively well preserved to compensate for the decline in adaptive immune function. In conclusion, our findings will provide a foundation for future studies on the molecular mechanisms underlying immune changes associated with ageing.


Kelei Zhao, Linjie Liu, Xiaojie Chen, Ting Huang, Lianming Du, Jiafu Lin, Yang Yuan, Yingshun Zhou, Bisong Yue, Kun Wei, Yiwen Chu. Behavioral heterogeneity in quorum sensing can stabilize social cooperation in microbial populations. BMC Biology, 2019, 17(1):20.

Background: Microbial communities are susceptible to the public goods dilemma, whereby individuals can gain an advantage within a group by utilizing, but not sharing the cost of producing, public goods. In bacteria, the development of quorum sensing (QS) can establish a cooperation system in a population by coordinating the production of costly and sharable extracellular products (public goods). Cooperators with intact QS system and robust ability in producing public goods are vulnerable to being undermined by QS-deficient defectors that escape from QS but benefit from the cooperation of others. Although microorganisms have evolved several mechanisms to resist cheating invasion in the public goods game, it is not clear why cooperators frequently coexist with defectors and how they form a relatively stable equilibrium during evolution. Results: We show that in Pseudomonas aeruginosa, QS-directed social cooperation can select a conditional defection strategy prior to the emergence of QS-mutant defectors, depending on resource availability. Conditional defectors represent a QS-inactive state of wild type (cooperator) individual and can invade QS-activated cooperators by adopting a cheating strategy, and then revert to cooperating when there are abundant nutrient supplies irrespective of the exploitation of QS-mutant defector. Our mathematical modeling further demonstrates that the incorporation of conditional defection strategy into the framework of iterated public goods game with sound punishment mechanism can lead to the coexistence of cooperator, conditional defector, and defector in a rock-paper-scissors dynamics. Conclusions: These findings highlight the importance of behavioral heterogeneity in stabilizing the population structure and provide a potential reasonable explanation for the maintenance and evolution of cooperation in microbial communities.


Jie Tang, Lianming Du, Yuanmei Liang, Maurycy Daroch. Complete Genome Sequence and Comparative Analysis of Synechococcus sp. CS-601 (SynAce01), a Cold-Adapted Cyanobacterium from an Oligotrophic Antarctic Habitat. International Journal of Molecular Sciences, 2019, 20(1):152.

Marine picocyanobacteria belonging to Synechococcus are major contributors to the global carbon cycle; however, the genomic information of its cold-adapted members has been lacking to date. To fill this void, the genome of a cold-adapted planktonic cyanobacterium Synechococcus sp. CS-601 (SynAce01) has been sequenced. The genome of the strain contains a single chromosome of approximately 2.75 Mbp with a GC content of 63.92%. Gene prediction yielded 2984 protein coding sequences and 44 tRNA genes. The genome contained evidence of horizontal gene transfer events during its evolution. CS-601 appears as a transport generalist with some specific adaptation to an oligotrophic marine environment. It has a broad repertoire of transporters of both inorganic and organic nutrients to survive in inhospitable environments. The cold adaptation of the strain exhibited characteristics of a psychrotroph rather than psychrophile. Its salt adaptation strategy is likely to rely on the uptake and synthesis of osmolytes, like glycerol or glycine betaine. Overall, the genome reveals two distinct patterns of adaptation to the inhospitable environment of Antarctica. Adaptation to an oligotrophic marine environment is likely due to an abundance of genes, probably acquired horizontally, that are associated with increased transport of nutrients, osmolytes, and light harvesting. On the other hand, adaptations to low temperatures are likely due to prolonged evolutionary changes.


Kai Cui, Wujiao Li, Jake George James, Changjun Peng, Jiazheng Jin, Chaochao Yan, Zhenxin Fan, Lianming Du, Megan Price, Yongjie Wu, Bisong Yue. The first draft genome of Lophophorus: A step forward for Phasianidae genomic diversity and conservation. Genomics, 2019, 111(6):1209-1215.

The monal genus (Lophophorus) is a branch of Phasianidae and its species inhabit the high-altitude mountains of the Qinghai-Tibet Plateau. The Chinese monal, L. lhuysii, is a threatened endemic bird of China that possesses high-altitude adaptability, diversity of plumage color and potentially low reproductive life history. This is the first study to describe the monal genome using next generation sequencing technology. The Chinese monal genome size is 1.01 Gb, with 16,940 protein-coding genes. Gene annotation yielded 100.93 Mb (9.97%) repeat elements, 785 ncRNA, 5,465,549 bp (0.54%) SSR and 15,550 (92%) genes in public databases. Compared to other birds and mammals, the genome evolution analysis showed numerous expanded gene families and positively selected genes involved in high-altitude adaptation, especially related to the adaptation to low temperature and hypoxia. Consequently, these gene data can be used to investigate the molecular evolution of high-altitude adaptation in future bird research. Our first published genome of the genus Lophophorus will be integral for the study of monal population genetic diversity and conservation, genomic evolution and Galliformes species differentiation in the Qinghai-Tibetan Plateau.


Kelei Zhao, Lianming Du, Jiafu Lin, Yang Yuan, Xiwei Wang, Bisong Yue, Xinrong Wang, Yidong Guo, Yiwen Chu, Yingshun Zhou. Pseudomonas aeruginosa Quorum-Sensing and Type VI Secretion System Can Direct Interspecific Coexistence During Evolution. Frontiers in Microbiology, 2018, 9:2287.

It is reported that a wide range of bacterial infections are polymicrobial, and the members in a local microcommunity can influence the growth of neighbors through physical and chemical interactions. Pseudomonas aeruginosa is an important opportunistic pathogen that normally causes a variety of acute and chronic infections, and clinical evidence suggests that P. aeruginosa can be frequently coisolated with other pathogens from patients with chronic infections. However, the interspecific interaction and the coexisting mechanism of P. aeruginosa with coinfecting bacterial species during evolution still remain largely unclear. In this study, the relationships of P. aeruginosa with other Gram-positive (Staphylococcus aureus) and Gram-negative (Klebsiella pneumoniae) bacteria are investigated by using a series of on-plate proximity assays, in vitro coevolution assays, and RNA-sequencing. We find that although the development of a quorum-sensing system gives P. aeruginosa a significant growth advantage in competing with S. aureus and K. pneumoniae, the quorum-sensing regulation of P. aeruginosa decreases during evolution and thus provides a basis for the formation of interspecific coexistence. The results of comparative transcriptomic analyses suggest that the persistent survival of S. aureus in the microcommunity has no significant effect on the intracellular transcriptional pattern of P. aeruginosa, while a more detailed competition happens between P. aeruginosa and K. pneumoniae. Specifically, the population of P. aeruginosa with decreased quorum-sensing regulation can still restrict the proportion increase of K. pneumoniae by enhancing the type VI secretion system-elicited cell aggressivity during further coevolution. These findings provide a general explanation for the formation of a dynamic stable microcommunity consisting of more than two bacterial species, and may contribute to the development of population biology and clinical therapy.


Lianming Du, Chi Zhang, Qin Liu, Xiuyue Zhang, Bisong Yue. Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design. Bioinformatics, 2018, 34(4):681–683.

Summary: Microsatellites are found to be related to various diseases and are widely used in population genetics as genetic markers. However, it remains a challenge to identify microsatellites from large genomes and to screen microsatellites for primer design from a huge result dataset. Here, we present Krait, a robust and flexible tool for fast investigation of microsatellites in DNA sequences. Krait is designed to identify all types of perfect or imperfect microsatellites on a whole genomic sequence, and is also applicable to identification of compound microsatellites. Primer3 was seamlessly integrated into Krait so that users can design primers for microsatellite amplification in an efficient way. Additionally, Krait can export microsatellite results in FASTA or GFF3 format for further analysis and generate statistical reports as well as plots. Availability and implementation: Krait is freely available at under GPL2 License, implemented in C and Python, and supported on Windows, Linux and Mac operating systems. Supplementary information: Supplementary data are available at Bioinformatics online.
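
To make the notion of a perfect microsatellite concrete, the following is a minimal regular-expression sketch in Python. It is not Krait's algorithm or interface: the function name `find_perfect_ssrs` is hypothetical, the minimum repeat thresholds are assumed values chosen for illustration, and imperfect or compound microsatellites are not handled. Motifs that are themselves repeats of a shorter unit (for example "ATAT") are skipped so that each locus is reported under its shortest motif.

```python
import re

# Minimum number of tandem repeats required per motif length (1 to 6 bp).
# These thresholds are assumptions for illustration, not taken from the paper.
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def is_reducible(motif):
    """True if the motif is a tandem repeat of a shorter unit, e.g. 'ATAT'."""
    k = len(motif)
    return any(
        motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0
    )


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeats) tuples for perfect microsatellites.
    Coordinates are 1-based and inclusive; results are sorted by start."""
    seq = seq.upper()
    found = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # already covered by a shorter motif length
            repeats = (match.end() - match.start()) // k
            found.append((match.start() + 1, match.end(), motif, repeats))
    return sorted(found)
```

For a whole genome this per-motif-length scan would be slow compared with a dedicated C implementation such as the one the abstract describes; the sketch is meant only to show what is being searched for.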