Using Rsubread with FASTA sequence files over 4 GB
@bryanpenning-13114
United States

Hi,

I want to use Rsubread for RNA-seq analysis, but I am working with wheat. The FASTA file with all chromosomes is approaching 15 GB, well over the 4 GB limit, so I get the following error from Rsubread:

//================================= Running ==================================\\
|| ||
|| Check the integrity of provided reference sequences ... ||
|| No format issues were found ||
|| Scan uninformative subreads in reference sequences ... ||
|| 8%, 5 mins elapsed, rate=6749.5k bps/s, total=14581m ||
|| 16%, 10 mins elapsed, rate=5156.9k bps/s, total=14581m ||
|| 24%, 16 mins elapsed, rate=4356.8k bps/s, total=14581m

ERROR: The chromosome data contains too many bases. The size of the input FASTA files should be less than 4G Bytes. 

My statement:

buildindex(basename="Wheatfull_Ref_index",reference="Chinese_Spring.fasta",memory=30000)

quit(save = "no", status = 0, runLast = FALSE)

Is there a workaround to get the sequences indexed? Is there a way to index individual chromosomes and stitch the results together later?

 

A second issue may also rear its head. Because the chromosome sequences are so long, it is recommended to use split chromosome files with mapping programs such as TopHat, STAR, and BWA, since the resulting BAM files are otherwise too big to be indexed by samtools. Does anyone know if this will affect Subread?

Thanks!

Bryan

Rsubread buildindex fastafile • 1.8k views
Wei Shi ★ 3.6k
@wei-shi-2183
Australia/Melbourne

Hi Bryan, as shown in the error message, Rsubread cannot build an index for a reference genome that contains more than 4 gigabases. You will have to split the genome into two or more pieces, each under that limit, before you can build the indexes and perform mapping.
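
For readers who want to try the splitting step, here is a minimal sketch in base R. It is only an illustration under stated assumptions: the function names (fasta_lengths, assign_pieces, split_fasta), the output file naming, and the 3.5e9 default are all hypothetical choices and are not part of Rsubread. The default is simply a value picked to stay safely below the 4 gigabase limit. The sketch reads the FASTA file in chunks, keeps every chromosome whole, and starts a new output file whenever the next chromosome would push the current piece over the limit.

```r
# Pass 1: sequence lengths per FASTA record, read in chunks so that the
# whole file never has to be held in memory.
fasta_lengths <- function(fasta, chunk_lines = 100000L) {
  con <- file(fasta, open = "r")
  on.exit(close(con))
  ids <- character(0)
  lens <- numeric(0)
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    header <- startsWith(lines, ">")
    grp <- cumsum(header)            # 0 = continues the record from the previous chunk
    n <- nchar(lines)
    n[header] <- 0                   # header lines carry no bases
    sums <- tapply(n, grp, sum)
    if ("0" %in% names(sums)) {
      lens[length(lens)] <- lens[length(lens)] + sums[["0"]]
      sums <- sums[names(sums) != "0"]
    }
    ids <- c(ids, sub("^>([^[:space:]]+).*", "\\1", lines[header]))
    lens <- c(lens, as.numeric(sums))
  }
  setNames(lens, ids)
}

# Greedy grouping in file order: open a new piece when adding the next
# sequence would exceed max_bases. Sequences are never split.
assign_pieces <- function(lens, max_bases) {
  stopifnot(length(lens) > 0, all(lens <= max_bases))
  piece <- integer(length(lens))
  current <- 1L
  total <- 0
  for (i in seq_along(lens)) {
    if (total > 0 && total + lens[[i]] > max_bases) {
      current <- current + 1L
      total <- 0
    }
    total <- total + lens[[i]]
    piece[i] <- current
  }
  piece
}

# Pass 2: copy each record into the output file of its piece.
# max_bases = 3.5e9 is an assumed safety margin, not a documented value.
split_fasta <- function(fasta, out_prefix, max_bases = 3.5e9,
                        chunk_lines = 100000L) {
  piece <- assign_pieces(fasta_lengths(fasta, chunk_lines), max_bases)
  out_files <- sprintf("%s_part%d.fasta", out_prefix, seq_len(max(piece)))
  outs <- lapply(out_files, file, open = "w")
  on.exit(lapply(outs, close), add = TRUE)
  con <- file(fasta, open = "r")
  on.exit(close(con), add = TRUE)
  seen <- 0L                         # number of records seen in earlier chunks
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    rec <- seen + cumsum(startsWith(lines, ">"))
    seen <- rec[length(rec)]
    keep <- rec > 0                  # drop anything before the first header
    by_piece <- split(lines[keep], piece[rec[keep]])
    for (p in names(by_piece)) writeLines(by_piece[[p]], outs[[as.integer(p)]])
  }
  out_files
}

# Possible usage (not run here; requires Rsubread and the real file):
# parts <- split_fasta("Chinese_Spring.fasta", "Wheat_Ref")
# for (f in parts) {
#   Rsubread::buildindex(basename = sub("\\.fasta$", "_index", f), reference = f)
# }
```

Each piece then gets its own index, and reads would be mapped against each index separately. How best to combine the per-piece mapping results is not covered in this thread.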
