Background: I would like to identify differentially expressed gene orthologs across multiple related organisms. I predicted a set of single-copy orthologs from de novo transcriptome assemblies and quantified the expression of these transcripts with RSEM.
library(tximport)
library(DESeq2)
# I imported the RSEM quantifications with tximport
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
# Made a DESeq2 dataset
dds <- DESeqDataSetFromTximport(txi.rsem, colData = samples, design = ~ condition)
# ...and ran DESeq2
dds <- DESeq(dds)
Question: The orthologs for which I want to perform differential expression analysis do not all have the same sequence length. Does the procedure above "normalize" the counts with respect to the different transcript lengths? I presume so, also given the documentation of DESeq2::plotCounts, which says that "the counts should be normalized by size factor (default is TRUE)", but I'm not entirely sure. Thanks!
But does the FASTA for each species use the same transcript names (just with different sequences)?
If so, then yes: tximport followed by DESeq2, using the code in the tximport vignette, will correct for the differences in gene length across samples (and so, here, across species).
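To make the idea behind that correction concrete, here is a minimal base-R sketch of a per-ortholog length offset. It is a simplified illustration, not DESeq2's actual code: the counts and lengths are made-up toy values, the names `counts`, `len`, `norm_mat` and `corrected` are hypothetical, and library size differences are deliberately ignored.

```r
# Toy data (hypothetical values): 3 orthologous groups x 2 species.
# Counts are constructed to be proportional to length, i.e. the
# per-base expression is identical in both species.
og      <- c("OG1", "OG2", "OG3")
species <- c("speciesA", "speciesB")

counts <- matrix(c(100, 200,
                   300, 300,
                    50,  25),
                 nrow = 3, byrow = TRUE, dimnames = list(og, species))

# Effective length of each ortholog in each species (hypothetical).
len <- matrix(c(1000, 2000,
                1500, 1500,
                2000, 1000),
              nrow = 3, byrow = TRUE, dimnames = list(og, species))

# Scale each row of lengths by its geometric mean, so that the
# offsets only encode *relative* length differences between samples.
norm_mat <- len / exp(rowMeans(log(len)))

# Dividing by the offsets removes the length-driven difference.
corrected <- counts / norm_mat
print(round(corrected, 2))
```

In this toy case the raw counts for OG1 and OG3 differ twofold between species, but the corrected values are equal, because the difference was entirely due to length. On a real object built with DESeqDataSetFromTximport, one way to check that length information was carried over would be to look for "avgTxLength" in assayNames(dds) and to inspect normalizationFactors(dds) after running DESeq().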
Yes, the orthologs always have the same name (the name of the orthologous group) across all samples (species).
Thanks for clarifying this for me.