Question: DE analysis of orthologs with varying sequence lengths using RSEM + tximport + DESeq2: are counts normalized for sequence length?
12 months ago, al-ash wrote:

Background: I would like to identify differentially expressed gene orthologs across multiple related organisms. I predicted a set of single-copy orthologs from de novo transcriptome assemblies and quantified the expression of these transcripts with RSEM.

library(tximport)
library(DESeq2)

# I imported the gene-level RSEM quantifications with tximport
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
# ...made a DESeq2 dataset
dds <- DESeqDataSetFromTximport(txi.rsem, colData = samples,
                                design = ~ condition)
# ...and ran DESeq2
dds <- DESeq(dds)

Question: The orthologs I want to test for differential expression do not all have the same sequence length. Does the procedure above "normalize" the counts with respect to the different transcript lengths? I presume it does, given that the documentation of DESeq2::plotCounts says that "the counts should be normalized by size factor (default is TRUE)", but I'm not entirely sure. Thanks!

 

modified 12 months ago by Michael Love • written 12 months ago by al-ash
Answer: DE analysis of orthologs with varying sequence lengths using RSEM+tximport+DEseq
12 months ago, Michael Love (United States) wrote:

How many orthologs do you have across organisms, and do you expect all of them to be DE, or only a subset?

How did you run RSEM to produce the gene.results files? Did you give each organism a different FASTA file?

written 12 months ago by Michael Love
  • I have 2500 orthologs in each organism.
  • I expect that only a relatively small portion will be DE. This is partly because I do not have biological replicates in the sense of multiple RNA-seq datasets from a single species; instead I use phylogenetic replicates, i.e. distinct species that share some characteristic. I therefore expect rather large variation in expression and relatively low power to detect DE. A DESeq2 analysis run as in my original post for several contrasts (different possible experimental designs) typically gave me a few tens of DE genes (padj < 0.1).
  • I ran RSEM separately for each species (i.e. one multi-FASTA file per species).


 

written 12 months ago by al-ash

But does the FASTA for each species use the same transcript names (just with different sequences)?

If so, then yes, tximport => DESeq2 using the code in the tximport vignette will correct for the differences in gene length across samples (and so here species).

written 12 months ago by Michael Love
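
To make the length correction described above concrete, here is a minimal base-R sketch on made-up numbers. It mimics, in simplified form, the idea of combining per-sample gene lengths with library size factors into one gene-by-sample matrix of normalization factors. It is an illustration under those assumptions, not DESeq2's actual code, and every object name in it (len, depth, size_factors, norm_factors, ...) is hypothetical.

```r
# Toy data, all values made up: 3 orthologous groups (rows) in 2 species (columns).
# Each ortholog has a different effective length in each species.
len <- matrix(c(1000, 2000,
                1500, 1500,
                3000, 1000),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("OG1", "OG2", "OG3"), c("spA", "spB")))

# Assumption: identical true expression in both species (reads per kb),
# and species B sequenced twice as deeply as species A.
expr_per_kb <- c(OG1 = 10, OG2 = 20, OG3 = 5)
depth <- c(spA = 1, spB = 2)
counts <- sweep(len / 1000 * expr_per_kb, 2, depth, "*")

# Median-of-ratios size factors (simplified: assumes no zero counts).
size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))
  apply(m, 2, function(x) exp(median(log(x) - log_geo_mean)))
}

# Step 1: centre the lengths per gene so they only encode between-sample differences.
len_centered <- len / exp(rowMeans(log(len)))
# Step 2: estimate library size factors on length-corrected counts.
sf <- size_factors(counts / len_centered)
# Step 3: combine both into gene-by-sample normalization factors,
# re-centred so each row has geometric mean 1.
norm_factors <- sweep(len_centered, 2, sf, "*")
norm_factors <- norm_factors / exp(rowMeans(log(norm_factors)))

norm_counts <- counts / norm_factors
print(round(counts, 1))       # raw counts differ between species
print(round(norm_counts, 1))  # normalized counts agree between species
```

In this toy setup, size factors alone would not be enough: OG1 and OG3 would still look different between the two species purely because their lengths differ, and it is the length term that removes that difference.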

Yes, the orthologs always have the same name (the name of the orthologous group) across all samples (species).

Thanks for clarifying this for me.
 

written 12 months ago by al-ash
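
Since the length correction relies on the orthologs carrying the same name in every sample, a quick sanity check before import can be sketched as follows. The data frames are hypothetical stand-ins for per-species gene-level RSEM tables; the column names gene_id and effective_length are assumed to match the RSEM gene-level output, and all values are made up.

```r
# Hypothetical stand-ins for two per-species RSEM gene-level tables;
# only the columns needed for the check are included.
res <- list(
  spA = data.frame(gene_id = c("OG1", "OG2", "OG3"),
                   effective_length = c(1000, 1500, 3000)),
  spB = data.frame(gene_id = c("OG1", "OG2", "OG3"),
                   effective_length = c(2000, 1500, 1000))
)

# TRUE only if every table lists the same IDs in the same order as the first one.
ids <- lapply(res, function(d) as.character(d$gene_id))
same_ids <- all(vapply(ids, function(x) identical(x, ids[[1]]), logical(1)))
print(same_ids)

# Gene-by-sample length matrix, only meaningful when the IDs line up.
if (same_ids) {
  len <- sapply(res, function(d) d$effective_length)
  rownames(len) <- ids[[1]]
  print(len)
}
```

If the check fails, the per-species tables would need to be renamed or reordered to the shared orthologous-group names before the lengths can be compared row by row.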