I want to use Rsubread for RNA-seq analysis, but I am working with wheat. The FASTA file containing all chromosomes is approaching 15 GB, well over the 4 GB limit, so I get the following error from Rsubread:
//================================= Running ==================================\\
|| Check the integrity of provided reference sequences ... ||
|| No format issues were found ||
|| Scan uninformative subreads in reference sequences ... ||
|| 8%, 5 mins elapsed, rate=6749.5k bps/s, total=14581m ||
|| 16%, 10 mins elapsed, rate=5156.9k bps/s, total=14581m ||
|| 24%, 16 mins elapsed, rate=4356.8k bps/s, total=14581m
ERROR: The chromosome data contains too many bases. The size of the input FASTA files should be less than 4G Bytes.
quit(save = "no", status = 0, runLast = FALSE)
Is there a workaround to get the sequences indexed? For example, can I index individual chromosomes and stitch the results together later?
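To make the per-chromosome idea concrete, here is a minimal sketch of the splitting step I have in mind, written in Python. The function name `split_fasta_by_record` and the output naming scheme are my own choices for illustration, not part of Rsubread. It assumes an uncompressed FASTA in which each chromosome is one record starting with a `>` header, and that the first word of each header is unique.

```python
from pathlib import Path


def split_fasta_by_record(fasta_path, out_dir):
    """Write each FASTA record (one per '>' header) to its own file.

    Returns the list of files written, in input order. The input is
    streamed line by line, so the genome is never held in memory.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    if out is not None:
                        out.close()
                    # Name the file after the first word of the header;
                    # fall back to a counter if the header is empty.
                    fields = line[1:].split()
                    name = fields[0] if fields else f"record{len(written) + 1}"
                    path = out_dir / f"{name.replace('/', '_')}.fa"
                    out = open(path, "w")
                    written.append(path)
                # Lines before the first header (if any) are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Assuming the reference consists mostly of whole chromosomes, each resulting file should be far below 4 GB, so each could presumably be indexed on its own. What I do not know is whether alignments made against separate indexes can be combined correctly afterwards, which is really the question.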
A second issue may also rear its head. Because the chromosomes are so long, it is recommended to split them before mapping with programs such as TopHat, STAR, and BWA; otherwise samtools cannot index the resulting BAM files. Does anyone know whether this will affect Subread as well?