FASTA and FASTQ are the most widely used biological data formats and have become the de facto standard for exchanging sequence data between bioinformatics tools. However, existing tools are very inefficient at random retrieval of subsequences because they must load the entire index into memory. In addition, most existing tools cannot build an index for large FASTA/Q files because of limited memory. We developed pyfastx, a versatile Python package with commonly used command-line tools, to overcome these limitations.
Source code: https://github.com/lmdu/pyfastx
Documentation: https://pyfastx.readthedocs.io/en/latest/
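
To make the idea of index-based random retrieval concrete, the following is a minimal sketch of a byte-offset index for a plain-text FASTA file, similar in spirit to the widely used faidx layout. It is not pyfastx's implementation or API: the function names `build_index` and `fetch` are hypothetical, the sketch assumes every line of a record except the last has the same length, and it keeps the index in an in-memory dict purely for brevity. A tool that wants to avoid loading the whole index would keep it on disk and look up one entry at a time.

```python
def build_index(path):
    """Scan a FASTA file once and record where each sequence's bases start.

    Hypothetical helper for illustration only; assumes uniform line length
    within each record (the last line may be shorter).
    """
    index = {}
    name = None
    with open(path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Sequence name is the first word after '>'.
                name = line[1:].split()[0].decode()
                index[name] = {
                    "offset": handle.tell(),  # byte position of the first base
                    "line_bases": 0,          # bases per full line
                    "line_bytes": 0,          # bytes per full line, incl. newline
                    "length": 0,              # total number of bases
                }
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry["line_bases"] == 0:
                    entry["line_bases"] = bases
                    entry["line_bytes"] = len(line)
                entry["length"] += bases
    return index


def fetch(path, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) without reading the whole file."""
    entry = index[name]
    end = min(end, entry["length"])
    if start < 0 or start >= end:
        return ""
    line_bases = entry["line_bases"]
    line_bytes = entry["line_bytes"]
    # Translate base coordinates into byte offsets, skipping over newlines.
    first = entry["offset"] + (start // line_bases) * line_bytes + start % line_bases
    last = entry["offset"] + (end // line_bases) * line_bytes + end % line_bases
    with open(path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first)
    # Drop the line breaks that fall inside the requested byte range.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

With such an index, retrieving a subsequence costs one seek and one short read, regardless of how large the file is, which is the property that makes random access to large FASTA files practical.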