I have single ended RNAseq reads from an allopolyploid organism. This means that I will have groups of 2,3, or even 4 genes with high sequence homology among them. I am sure some reads will map to more than one gene.
So far, my pipeline is
- Map reads to genome using tophat2, with default options (i.e., no -M or -g)
- count using htseq as
htseq-count -m intersection-strict --stranded=no --idattr=Name -f bam tophat/accepted_hits.bam annotation.gff3 > counts
- Use counts in DESeq2.
I am not confident that it is differentiating the homologs "correctly". First of all, tophat2 is built on top of bowtie2, and it seems bowtie2 does not map multi-reads uniformly by default. See: https://www.biostars.org/p/55237/
So once the counts matrix is formed, DESeq2 just uses that to get the p-values.
The other option I have is to use RSEMS, but I do not have knowledge of splice variants and DESeq can only take count data, which RSEMS does not provide.
What is the DESeq2 recommended pipeline that handles multi-mapped reads well (i.e., differentiates between genes with homology)?
If I use RSEMS, what should I use for differential expression analysis? They recommend ebseq but I am not sure how it handles batch effects and the like. Their vignette doesn't mention replicates in the design matrix.