Hello. I'm using Rsubread to align short, stranded Illumina RNA-seq reads to the yeast (S. cerevisiae) genome. The BAM file generated by subjunc shows quite a large number of reads that are erroneously mapped across very long distances, which are treated as introns. One of the mapped ends often corresponds to a stretch of "A"s. Is there an option that would limit the size of the detected introns, or filter out reads that were partially mapped to low-complexity genomic regions?
Thank you very much.
The test command I used (with an index built from the Ensembl yeast genome and the corresponding annotation file):
subjunc(index = "yeast110",
        readfile1 = "RAW/my.fastq.gz",
        output_file = "BAM/my.bam",
        output_format = "BAM",
        nthreads = 8,
        sortReadsByCoordinates = TRUE,
        annot.ext = "Saccharomyces_cerevisiae.R64-1-1.110.gtf.gz",
        isGTF = TRUE,
        useAnnotation = TRUE)
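If there is no such option, one workaround I can think of is to filter after alignment, using the length of the skipped regions ("N" operations) in each read's CIGAR string. A minimal sketch in base R, assuming the CIGAR strings have already been read from the BAM file (for example with Rsamtools::scanBam). The function name max_intron_length and the 1000 bp cutoff are my own illustrative choices, not part of Rsubread:

```r
# Longest skipped region ("N" operation, i.e. putative intron) per CIGAR string.
# Hypothetical helper, not an Rsubread function.
max_intron_length <- function(cigar) {
  vapply(cigar, function(cg) {
    if (is.na(cg)) return(NA_real_)            # unmapped read, no CIGAR
    # Extract every "<length>N" operation, e.g. "250000N"
    ops <- regmatches(cg, gregexpr("[0-9]+N", cg))[[1]]
    if (length(ops) == 0L) return(0)           # unspliced alignment
    max(as.numeric(sub("N$", "", ops)))
  }, numeric(1), USE.NAMES = FALSE)
}

# Toy CIGAR strings standing in for the values read from the BAM file
cigars <- c("50M", "20M300N30M", "10M250000N40M", "5S20M100N10M2000N15M")

max_intron <- 1000   # assumed cutoff in bp; choose one that suits the annotation
keep <- max_intron_length(cigars) <= max_intron
```

The logical vector keep could then be used to subset the alignments before writing a filtered BAM file. This only removes the suspicious reads after the fact; it does not stop the aligner from reporting them.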
EDIT: since STAR has a specific parameter for setting the maximum acceptable intron size, I'm switching back to it. I liked the idea of using only R for the whole workflow, but it's not a big problem to switch to a few bash commands.
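For reference, a rough sketch of how the STAR call could still be launched from R. The STAR parameter in question is --alignIntronMax. The index directory, output prefix and the 2000 bp limit below are assumptions for illustration, not values I have validated:

```r
# Hypothetical paths and cutoff; adjust to your own setup.
star_args <- c(
  "--runThreadN", "8",
  "--genomeDir", "STAR_index/yeast110",        # assumed STAR index location
  "--readFilesIn", "RAW/my.fastq.gz",
  "--readFilesCommand", "zcat",                # reads are gzip-compressed
  "--alignIntronMax", "2000",                  # assumed maximum intron size (bp)
  "--outSAMtype", "BAM", "SortedByCoordinate",
  "--outFileNamePrefix", "BAM/my_"             # assumed output prefix
)

# Print the command for inspection
cat("STAR", star_args, "\n")

# system2("STAR", args = star_args)  # uncomment to run; requires STAR on the PATH
```

The system2 line is left commented out so that the snippet only prints the command unless STAR is actually installed.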