Question: Rsubread for microbiome data analysis: genome >4GB
W. Evan Johnson (United States) wrote, 6 months ago:

Dear Rsubread developers and users:

My group and I appreciate the Rsubread aligner! We have found it very useful, and it's great being able to do all of our genome analysis, including alignment, entirely in R.

We are trying to use and optimize Rsubread for microbiome data analysis. Naturally, our microbial reference libraries are much larger than 4 GB, and we are getting the following error:

"ERROR: The chromosome data contains too many bases. The size of the input FASTA files should be less than 4G Bytes"

So I assume I need to break my reference genome library into chunks smaller than 4 GB, align the reads to each chunk, and then merge the BAM files? We have been using Bowtie2 and have to do the same thing there. I'm just asking whether anyone knows an easier or more concise way to do it. If not, we will move forward with the separate-libraries-and-merged-BAM approach.
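Merging the per-chunk BAM files is more than a concatenation: because each chunk is aligned independently, the same read can get a reported alignment in several chunks. One way to reconcile them is to keep, for each read, only its best-scoring hit across chunks. Below is a minimal Python sketch, assuming the hits have already been pulled out of each chunk's BAM into a hypothetical `Hit` record (read ID, chunk number, chromosome, position, and some alignment score where higher is better). Reading the BAM files themselves is left out, and none of these names come from Rsubread or Bowtie2.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical per-alignment record extracted from one chunk's BAM file."""
    read_id: str
    block: int   # which reference chunk the read was aligned against
    chrom: str
    pos: int
    score: int   # assumed alignment score; higher is better


def best_hit_per_read(per_block_hits: Iterable[Iterable[Hit]]) -> Dict[str, List[Hit]]:
    """Combine hits from independently aligned chunks, keeping the top-scoring hit(s) per read.

    Ties are all kept, because the read matches equally well in more than one place.
    """
    best: Dict[str, List[Hit]] = {}
    for hits in per_block_hits:
        for hit in hits:
            current = best.get(hit.read_id)
            if current is None or hit.score > current[0].score:
                # First hit for this read, or a strictly better one: replace.
                best[hit.read_id] = [hit]
            elif hit.score == current[0].score:
                # Equally good hit in another location: keep both.
                current.append(hit)
    return best
```

Keeping ties rather than picking one arbitrarily leaves the multi-mapping decision to downstream steps. Note that the merged BAM also needs a header listing the sequences from every chunk; the sketch only covers which alignments to keep.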

Answer: Rsubread for microbiome data analysis: genome >4GB
Wei Shi wrote, 6 months ago:

Dear Evan,

Yes, Rsubread cannot build an index for reference sequences containing more than 4 billion bases. You will have to break your reference libraries into blocks that each contain fewer than 4 billion bases and then map your reads to each block separately.

The buildindex() function adds a small padding sequence before the start and after the end of each chromosome. So if each of your reference blocks contains no more than 3.9 billion bases, buildindex() should run fine.
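The splitting step can be done as a standalone preprocessing script. Below is a minimal Python sketch, assuming a plain, uncompressed multi-FASTA input: it packs whole records greedily, in input order, into block files of at most `max_bases` bases, defaulting to the 3.9 billion suggested above to leave room for the padding. The function names and the `_blockN.fa` output naming are hypothetical, not part of Rsubread.

```python
from typing import Iterator, List, Tuple

# Per-block budget: at most 3.9 billion bases, leaving headroom for the
# padding that buildindex() adds around each chromosome.
MAX_BASES_PER_BLOCK = 3_900_000_000


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header_line, sequence) for each record in a FASTA file."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line, []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(in_path: str, out_prefix: str,
                max_bases: int = MAX_BASES_PER_BLOCK) -> List[Tuple[str, int]]:
    """Greedily pack whole FASTA records into block files of <= max_bases bases.

    Returns a list of (block_path, bases_in_block). A record is never split.
    """
    paths: List[str] = []
    counts: List[int] = []
    out = None
    try:
        for header, seq in read_fasta(in_path):
            n = len(seq)
            if n > max_bases:
                raise ValueError(f"{header} alone exceeds {max_bases} bases")
            # Open a new block if this record would push the current one over budget.
            if out is None or counts[-1] + n > max_bases:
                if out is not None:
                    out.close()
                paths.append(f"{out_prefix}_block{len(paths) + 1}.fa")
                counts.append(0)
                out = open(paths[-1], "w")
            out.write(header + "\n")
            for i in range(0, n, 80):  # wrap sequence lines at 80 columns
                out.write(seq[i:i + 80] + "\n")
            counts[-1] += n
    finally:
        if out is not None:
            out.close()
    return list(zip(paths, counts))
```

Each block file could then be indexed separately with `buildindex()` (one `basename` per block, with the block file as `reference`), and the reads aligned to every index. Greedy packing in input order does not guarantee the fewest blocks, but it never splits a chromosome across blocks, and with microbial genomes that are tiny relative to the budget it wastes little space.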

Best wishes,