Question: Using Rsubread with FASTA sequence files over 4 GB
13 months ago by
bryan.penning0 wrote:


I want to use Rsubread for RNA-seq analysis, but I am working with wheat. The FASTA file containing all chromosomes approaches 15 GB, well over the 4 GB limit, so I get the following error from Rsubread:

//================================= Running ==================================\\

|| ||

|| Check the integrity of provided reference sequences ... ||

|| No format issues were found ||

|| Scan uninformative subreads in reference sequences ... ||

|| 8%, 5 mins elapsed, rate=6749.5k bps/s, total=14581m ||

|| 16%, 10 mins elapsed, rate=5156.9k bps/s, total=14581m ||

|| 24%, 16 mins elapsed, rate=4356.8k bps/s, total=14581m

ERROR: The chromosome data contains too many bases. The size of the input FASTA files should be less than 4G Bytes. 

My statement:


quit(save = "no", status = 0, runLast = FALSE)

Is there a workaround to get the sequences indexed? Is there a way to index individual chromosomes and stitch them together later?


A second issue may also rear its head. Because the chromosome sequences are so long, it is recommended to use split chromosome files with mapping programs such as TopHat, STAR, and BWA, because the BAM files are otherwise too big to be indexed by samtools. Does anyone know if this will affect Subread?



12 months ago by
Wei Shi2.8k wrote:

Hi Bryan, as shown in the error message, Rsubread cannot build an index for a reference genome that includes more than 4 gigabases. You will have to split the genome into two or more pieces before you can build the index and perform mapping.
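
One way the splitting step might be done is sketched below in plain Python (standard library only). It is only an illustration, not part of Rsubread: the function names, the output naming scheme (`<prefix>.partN.fa`), and the 3.5-gigabase cap per piece are all assumptions chosen to stay under the limit quoted in the error message. Whole sequences are packed greedily in file order, so no chromosome is cut in the middle.

```python
from typing import Dict, List, Tuple

# Assumed cap per piece, in bases. The 4-gigabase limit comes from the error
# message; 3.5 billion is just a conservative margin, not a documented value.
MAX_BASES_PER_PIECE = 3_500_000_000


def record_name(header_line: str) -> str:
    """Return the first whitespace-delimited token after '>' in a FASTA header."""
    fields = header_line.strip()[1:].split()
    return fields[0] if fields else ""


def fasta_lengths(path: str) -> List[Tuple[str, int]]:
    """Return (sequence name, length in bases) for each record, in file order."""
    lengths: List[Tuple[str, int]] = []
    name, count = None, 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if name is not None:
                    lengths.append((name, count))
                name, count = record_name(line), 0
            elif name is not None:
                count += len(line.strip())
    if name is not None:
        lengths.append((name, count))
    return lengths


def assign_pieces(lengths: List[Tuple[str, int]], max_bases: int) -> Dict[str, int]:
    """Greedily pack whole sequences, in file order, into numbered pieces."""
    assignment: Dict[str, int] = {}
    piece, used = 0, 0
    for name, length in lengths:
        if length > max_bases:
            raise ValueError(f"{name} alone has {length} bases, above the cap")
        if used and used + length > max_bases:
            piece, used = piece + 1, 0  # start a new piece
        assignment[name] = piece
        used += length
    return assignment


def split_fasta(path: str, out_prefix: str,
                max_bases: int = MAX_BASES_PER_PIECE) -> List[str]:
    """Write each piece to '<out_prefix>.partN.fa' and return the file paths."""
    assignment = assign_pieces(fasta_lengths(path), max_bases)
    n_pieces = max(assignment.values(), default=-1) + 1
    out_paths = [f"{out_prefix}.part{i + 1}.fa" for i in range(n_pieces)]
    outputs = [open(p, "w") for p in out_paths]
    try:
        current = None
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    current = outputs[assignment[record_name(line)]]
                if current is not None:
                    current.write(line)  # copy header and sequence lines as-is
    finally:
        for out in outputs:
            out.close()
    return out_paths
```

Calling `split_fasta("wheat.fa", "wheat")` (a hypothetical file name) would return the list of piece files. With the 14581m bases reported in the log above and this assumed cap, that would be at least five pieces. Each piece would then be indexed separately, for example by passing it as the `reference` argument of Rsubread's `buildindex`, and the reads mapped against each index in turn.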
