I am analyzing a ribosome profiling and RNA-seq dataset, and for the analysis I am trying to reproduce a published pipeline. The paper states:
"Reads were aligned in genome and transcriptome coordinates with a splice-aware aligner (STAR, v2.5.3a, inserting annotations on the fly: STAR --quantMode TranscriptomeSAM --alignIntronMin 20 --alignIntronMax 100000 --outFilterMismatchNmax 1 --outFilterIntronMotifs RemoveNoncanonicalUnannotated --outFilterMismatchNoverLmax 0.04 --sjdbOverhang 50). We used the EnsEMBL mouse genome assembly GRCm38.p6, where all non-coding regions were excluded, and all fully contained shorter CDSs were collapsed: gffread -C -M -K."
Later they also write:
"Finally, only reads mapping uniquely to only one genomic position as well as to the transcriptome were kept for analysis."
If I understand correctly, they first modified the GTF file of GRCm38.p6 with gffread -C -M -K. Then they somehow mapped to both the genome and the transcriptome together. How is this done? Doesn't STAR accept only one reference to map the reads against? Why would someone do this rather than map only to the genome or only to the transcriptome? Or does it mean the mapping was done one step at a time, e.g. first mapping to the genome and then, separately, mapping to the transcriptome in a second run?
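To make my reading of the final filtering sentence concrete, here is a minimal sketch of what I imagine it means. It assumes I end up with two alignment outputs in SAM text form, one in genome coordinates and one in transcript coordinates (from one STAR run or from two, which is exactly what I am unsure about). It also assumes that uniquely mapped reads carry the tag NH:i:1 in the genome alignments. The function names `read_names` and `keep_unique_in_both` and the toy records are made up for illustration and are not from the paper.

```python
def read_names(sam_lines, require_unique=False):
    """Collect names of mapped reads from SAM text lines.

    If require_unique is True, only reads tagged NH:i:1 are kept
    (assumption: the aligner reports the number of hits in the NH tag).
    """
    names = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # SAM flag bit 0x4 means the read is unmapped
            continue
        # Optional tags start at the 12th column of a SAM record.
        if require_unique and "NH:i:1" not in fields[11:]:
            continue
        names.add(fields[0])
    return names


def keep_unique_in_both(genome_sam_lines, transcriptome_sam_lines):
    """Reads with exactly one genomic hit that also align to the transcriptome."""
    unique_in_genome = read_names(genome_sam_lines, require_unique=True)
    in_transcriptome = read_names(transcriptome_sam_lines)
    return unique_in_genome & in_transcriptome


# Toy records (hypothetical): r1 is unique in the genome and hits a transcript,
# r2 is a genomic multi-mapper, r3 is unique but has no transcript alignment.
genome = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t255\t30M\t*\t0\t0\t*\t*\tNH:i:1\tHI:i:1",
    "r2\t0\tchr1\t200\t3\t30M\t*\t0\t0\t*\t*\tNH:i:2\tHI:i:1",
    "r2\t256\tchr2\t500\t3\t30M\t*\t0\t0\t*\t*\tNH:i:2\tHI:i:2",
    "r3\t0\tchr3\t900\t255\t30M\t*\t0\t0\t*\t*\tNH:i:1\tHI:i:1",
]
transcriptome = [
    "r1\t0\ttx1\t40\t255\t30M\t*\t0\t0\t*\t*\tNH:i:1\tHI:i:1",
    "r2\t0\ttx2\t10\t3\t30M\t*\t0\t0\t*\t*\tNH:i:2\tHI:i:1",
]

kept = keep_unique_in_both(genome, transcriptome)
print(sorted(kept))
```

Is this roughly the logic they applied, or does the filter mean something else?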
Some suggestions would help a lot. Thank you.