Question

Using Rsubread with FASTA sequence files over 4 GB

bryan.penning • 0

@bryanpenning-13114

Hi,

I want to use Rsubread for RNA-seq analysis, but I am working with wheat. My FASTA file containing all chromosomes is approaching 15 GB, well over the 4 GB limit, so I get the following error from Rsubread:

//================================= Running ==================================\\
|| ||
|| Check the integrity of provided reference sequences ... ||
|| No format issues were found ||
|| Scan uninformative subreads in reference sequences ... ||
|| 8%, 5 mins elapsed, rate=6749.5k bps/s, total=14581m ||
|| 16%, 10 mins elapsed, rate=5156.9k bps/s, total=14581m ||
|| 24%, 16 mins elapsed, rate=4356.8k bps/s, total=14581m

ERROR: The chromosome data contains too many bases. The size of the input FASTA files should be less than 4G Bytes.

My statement:

library(Rsubread)

buildindex(basename = "Wheatfull_Ref_index", reference = "Chinese_Spring.fasta", memory = 30000)

quit(save = "no", status = 0, runLast = FALSE)

Is there a workaround to get the sequences indexed? Is there a way to do individual chromosomes and stitch them together later?

A second issue may also rear its head. Because the chromosome sequences are so long, it is recommended to use split chromosome files with mapping programs such as TopHat, STAR, and BWA, because the BAM files are otherwise too big to be indexed by samtools. Does anyone know whether this will affect Subread?

Thanks!

Bryan

Rsubread buildindex fastafile • 1.6k views

updated 6.9 years ago by Wei Shi ★ 3.6k • written 6.9 years ago by bryan.penning • 0

score 0 · Answer 1 · 2017-05-29

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 15 days ago

Australia/Melbourne/Olivia Newton-John …

Hi Bryan, as the error message shows, Rsubread cannot build an index for a reference genome that contains more than 4 gigabases. You will have to split the genome into two or more pieces before you can build the index and perform mapping.
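One possible way to do the split, sketched in Python as a preprocessing step outside R: stream the FASTA file and group whole chromosomes into numbered piece files so that each piece stays under a base limit. The function names, the `_partN.fasta` naming, and the default limit of 3.9 billion bases (a margin below the 4G stated in the error) are assumptions for illustration, not part of Rsubread.

```python
def record_lengths(fasta_path):
    """Return the number of bases in each FASTA record, in file order."""
    lengths = []
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                lengths.append(0)
            elif lengths:
                lengths[-1] += len(line.strip())
    return lengths


def split_fasta(fasta_path, out_prefix, max_bases=3_900_000_000):
    """Copy whole records into piece files holding at most max_bases each.

    max_bases is an assumed safety margin below the 4G limit in the error.
    Returns the list of piece file paths that were written.
    """
    # Pass 1: measure every record so grouping needs no sequence in memory.
    lengths = record_lengths(fasta_path)
    if any(n > max_bases for n in lengths):
        # This sketch never cuts inside a single sequence.
        raise ValueError("a single sequence exceeds max_bases")

    pieces = []   # paths of the piece files written so far
    out = None    # currently open piece file
    used = 0      # bases already assigned to the current piece
    index = -1    # index of the record being copied
    try:
        # Pass 2: stream the file again and copy lines to the right piece.
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    index += 1
                    # Start a new piece if this record would overflow it.
                    if out is None or used + lengths[index] > max_bases:
                        if out is not None:
                            out.close()
                        path = f"{out_prefix}_part{len(pieces) + 1}.fasta"
                        out = open(path, "w")
                        pieces.append(path)
                        used = 0
                    used += lengths[index]
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return pieces
```

Calling `split_fasta("Chinese_Spring.fasta", "Chinese_Spring")` would write files such as `Chinese_Spring_part1.fasta`, each of which could then be passed as `reference` to its own `buildindex` call like the one in the question. How the per-piece mapping results are best combined afterwards is not covered in this thread.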