Hello,
I am looking at differential expression (DE) across multiple species of mammals. In order to compare multiple species (and do some phylogenetic analyses), I am restricting the analysis to orthologous genes shared among the species. I want to use edgeR for some pairwise comparisons.
I aligned reads with Bowtie and quantified them with HTSeq to get raw counts. I am thinking of the following workflow, starting with the raw counts:
- filter lowly expressed genes in raw counts
- TMM normalize
- keep only orthologs shared between species
- maybe convert to TPM in order to normalize for different gene lengths between species, but likely not because this isn't recommended based on what I've read
- use limma-voom pipeline for DE; may also incorporate phytools
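To make the CPM and TPM steps concrete, here is a minimal sketch in plain Python. It is not the edgeR or limma implementation. The function names `cpm` and `tpm`, the toy counts, and the gene lengths are all made up for illustration.

```python
def cpm(counts):
    """Counts per million: scale raw counts by the library total."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by gene length first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical toy data: three genes in a single sample.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

cpm_values = cpm(counts)              # depends on counts only
tpm_values = tpm(counts, lengths_bp)  # depends on counts and gene lengths
```

In this toy example the counts are proportional to the gene lengths, so the three genes have different CPM values but the same TPM value. That is the length effect the TPM step is meant to remove when ortholog lengths differ between species.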
One paper [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4668955/] recommends finding the orthologs first and then running edgeR, because these are the transcripts of interest. However, TMM normalizes library sizes, so if the library sizes after subsetting to orthologs differ from the original library sizes, will TMM normalization still be as accurate? Thank you in advance!
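To illustrate the concern, here is a toy sketch with made-up counts. It assumes the orthologs have already been mapped to shared IDs (`g1` to `g3` are hypothetical), and it only shows how the library size changes when subsetting. It does not compute TMM factors.

```python
# Hypothetical raw counts per gene for two samples from different species.
sample_a = {"g1": 500, "g2": 300, "g3": 200, "g4": 1000}
sample_b = {"g1": 400, "g2": 350, "g3": 250, "g4": 3000}

# Assumed set of orthologs shared between the two species.
orthologs = {"g1", "g2", "g3"}


def library_size(counts, genes=None):
    """Total counts, optionally restricted to a subset of genes."""
    if genes is None:
        return sum(counts.values())
    return sum(v for g, v in counts.items() if g in genes)


def retained_fraction(counts, genes):
    """Share of the original library that survives the ortholog filter."""
    return library_size(counts, genes) / library_size(counts)


frac_a = retained_fraction(sample_a, orthologs)  # 1000 / 2000
frac_b = retained_fraction(sample_b, orthologs)  # 1000 / 4000
```

Here both samples end up with the same ortholog-only library size even though their original library sizes differ by a factor of two, which is the situation I am unsure about.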