Question: DESeq2 standalone TE usage
snatanf0 wrote (16 days ago):

Hi,

I am a new DESeq2 user, interested in translation efficiency (TE) and the log2 fold change (L2FC) of TE. To the best of my understanding, before DESeq2 calculates TE (RiboSeq/RNASeq), the counts are normalized with the appropriate SizeFactor. The SizeFactors are computed relative to the geometric mean of each gene's counts across all samples, i.e. across all conditions.
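The normalization described above can be sketched in Python with numpy. This follows the median-of-ratios idea (Anders and Huber 2010) and is not DESeq2's own code; the function name and toy counts are illustrative assumptions. Each gene's pseudoreference is its geometric mean across every sample, so adding or removing samples can shift each sample's factor. That is the dependency the question is about.

```python
import numpy as np


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors for a genes x samples count matrix.

    The pseudoreference for each gene is its geometric mean across all
    samples; a sample's size factor is the median over genes of its
    count divided by that pseudoreference.
    """
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts.astype(float))
    # Row-wise mean of logs = log of the geometric mean (-inf if any count is 0).
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)  # keep only genes with no zero counts
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


# Toy data: 5 genes, 3 samples that differ only in sequencing depth.
base = np.array([10.0, 50.0, 200.0, 5.0, 80.0])
counts = base[:, None] * np.array([1.0, 2.0, 4.0])

sf = median_of_ratios_size_factors(counts)
normalized = counts / sf  # divide each column by its sample's size factor
print(sf)
```

Because these toy samples differ only in depth, the normalized columns come out identical.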

If the above is correct, then the standalone TE values for each condition depend on the counts in the other conditions, and would not stay consistent if the set of conditions in the analysis changed. Would TPM be a better measure for standalone TE values?
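For comparison, a "standalone" TE from TPM could be sketched as follows (Python/numpy). The gene lengths and counts are made up, and using the same length for both assays is an assumption; a CDS length might be more appropriate for Ribo-seq. Each TPM column uses only its own sample, so it does not depend on other conditions. The resulting TE is still the raw Ribo/RNA ratio times one sample-level constant, though, so its overall scale is also fixed by a per-sample normalization.

```python
import numpy as np


def tpm(counts: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Transcripts per million for a genes x samples count matrix."""
    rate = counts / lengths[:, None]        # reads per unit of gene length
    return rate / rate.sum(axis=0) * 1e6    # every column sums to 1e6


# Hypothetical toy data: 4 genes, one control RNA-seq and one Ribo-seq sample.
lengths = np.array([1000.0, 2000.0, 500.0, 1500.0])  # assumed identical for both assays
rna = np.array([[100.0], [400.0], [50.0], [300.0]])
ribo = np.array([[80.0], [100.0], [60.0], [150.0]])

te_tpm = (tpm(ribo, lengths) / tpm(rna, lengths)).ravel()

# The same values written as raw ratio times one sample-level constant,
# sum(rna / L) / sum(ribo / L).
scale = (rna[:, 0] / lengths).sum() / (ribo[:, 0] / lengths).sum()
raw_ratio = ribo[:, 0] / rna[:, 0]
print(te_tpm, raw_ratio * scale)
```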

Thanks! Nathan

Answer: DESeq2 standalone TE usage
Michael Love wrote (16 days ago):

I don’t see the dependency problem. It’s more like this: instead of scaling to one million (an arbitrary number), the size factor estimation picks a different arbitrary number, one that lies in the middle of the per-sample numbers of mapped reads.
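To illustrate the point about an arbitrary scale, here is a minimal Python sketch (numpy, toy data assumed) comparing counts-per-million with a median-of-ratios scaling in the spirit of Anders and Huber (2010); it is not DESeq2's code. When samples differ only in depth, both give the same relative profiles. CPM puts every sample at 1e6 reads, while median-of-ratios lands, in this toy case, on the geometric mean of the library sizes, a value in the middle of the per-sample read counts.

```python
import numpy as np


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """Median over genes of each sample's ratio to the per-gene geometric mean."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts.astype(float))
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)  # skip genes with any zero count
    return np.exp(np.median(log_counts[usable] - log_geo_means[usable, None], axis=0))


# Toy data: three samples that differ only in sequencing depth.
base = np.array([10.0, 50.0, 200.0, 5.0, 80.0])
counts = base[:, None] * np.array([1.0, 2.0, 4.0])
lib_sizes = counts.sum(axis=0)                 # mapped reads per sample

cpm = counts / lib_sizes * 1e6                 # every column rescaled to 1e6 reads
mor = counts / median_of_ratios_size_factors(counts)
mor_lib = mor.sum(axis=0)                      # common scale chosen by the method

geo_mean_lib = np.exp(np.log(lib_sizes).mean())
print(lib_sizes, mor_lib, geo_mean_lib)
```

The two normalized matrices differ only by one constant factor, which is the "arbitrary number" in the answer.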

Btw, here’s a related thread (not directly about this question, though):

https://support.bioconductor.org/p/61509/

I think I see your point that it doesn't matter whether the SizeFactors for the ctrlRNASeq counts (SFctrlRNA) and the ctrlRiboSeq counts (SFctrlRibo) normalize the ctrl counts to 1M or to some other number. However, it seems to me critical that the SizeFactor ratio R = (SFctrlRNA)/(SFctrlRibo) remains independent of the other treatments in the dataset. To the best of my understanding, this ratio can change if the count distributions of treat1RNA and treat1Ribo are different from those of treat2RNA and treat2Ribo. A 2-fold increase in R would lead to a 2-fold increase in the ctrl-TE of every gene.
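The concern can be made concrete. With normalized counts K/s, the control TE of gene g is (K_g^Ribo / s_Ribo) / (K_g^RNA / s_RNA) = (K_g^Ribo / K_g^RNA) · R with R = s_RNA / s_Ribo, so any change in R rescales every gene's ctrl-TE by the same factor. The sketch below (Python/numpy, simulated counts, hypothetical column layout, median-of-ratios estimator in the spirit of Anders and Huber 2010 rather than DESeq2's own code) recomputes R with and without the treat2 samples.

```python
import numpy as np


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts.astype(float))
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    return np.exp(np.median(log_counts[usable] - log_geo_means[usable, None], axis=0))


def ctrl_te(counts: np.ndarray):
    """Per-gene ctrl TE and R = SFctrlRNA / SFctrlRibo (columns 0 and 1)."""
    sf = median_of_ratios_size_factors(counts)
    rna = counts[:, 0] / sf[0]
    ribo = counts[:, 1] / sf[1]
    return ribo / rna, sf[0] / sf[1]


# Hypothetical columns: ctrlRNA, ctrlRibo, treat1RNA, treat1Ribo, treat2RNA, treat2Ribo.
rng = np.random.default_rng(0)
lam = rng.uniform(20, 500, size=(200, 1))
counts = rng.poisson(lam, size=(200, 6)).astype(float) + 1.0
counts[:, 4:] *= rng.uniform(0.2, 5.0, size=(200, 2))  # treat2 has a different composition

te_small, r_small = ctrl_te(counts[:, :4])  # ctrl + treat1 only
te_full, r_full = ctrl_te(counts)           # ctrl + treat1 + treat2
print(r_small, r_full)
```

Whatever values R takes, every gene's ctrl-TE moves by exactly the ratio of the two R values, i.e. a global shift.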

Should I just divide all the ctrl-TE values by the ctrl-TE of an "anchor gene" (e.g. ACTB)?
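The anchor-gene idea can be sketched in a few lines (Python/numpy; the TE values and the anchor's row index are hypothetical). Dividing by one gene cancels any global factor such as R, but any noise in that single gene is passed on to every other gene.

```python
import numpy as np


def anchor_normalize(te: np.ndarray, anchor_idx: int) -> np.ndarray:
    """Express every gene's TE relative to the TE of one anchor gene."""
    return te / te[anchor_idx]


te = np.array([0.8, 1.5, 0.3, 2.0])  # hypothetical ctrl-TE values
anchor = 1                           # hypothetical row of the anchor gene (e.g. ACTB)

rel = anchor_normalize(te, anchor)
# A global rescaling (e.g. R doubling) cancels out...
rel_scaled = anchor_normalize(2.0 * te, anchor)
# ...but a 20% error in the single anchor gene shifts every other gene with it.
te_noisy = te.copy()
te_noisy[anchor] *= 1.2
rel_noisy = anchor_normalize(te_noisy, anchor)
print(rel, rel_noisy)
```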

So you need to correct for library size somehow, because the RNA and Ribo experiments are all different sequencing experiments, and the library size is a technical artifact that tells you nothing about the ratios for individual genes. It's not robust to pick a single gene; this is why the median ratio method (Anders and Huber 2010) uses, for each sample, the center of the distribution of its ratios to a pseudoreference. If you want to pick a set of genes you believe are good for calculating the library size, you can pass this set of genes to controlGenes, but instead of a single gene I would pick hundreds of genes for which you have a prior that they are not changing much across the experiments.
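In DESeq2 the gene set goes to the controlGenes argument of estimateSizeFactors. The Python sketch below only mimics that idea with numpy: median-of-ratios factors are computed from a hypothetical set of stable rows and would then be applied to all genes. The data are simulated, and having half the genes change 10-fold is an exaggerated scenario chosen to make the effect visible.

```python
import numpy as np


def size_factors_from_controls(counts: np.ndarray, control_idx: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors estimated from the rows in control_idx only."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts[control_idx].astype(float))
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    return np.exp(np.median(log_counts[usable] - log_geo_means[usable, None], axis=0))


rng = np.random.default_rng(1)
base = rng.uniform(20, 500, size=300)
counts = base[:, None] * np.array([1.0, 2.0])  # sample 2 sequenced twice as deep
counts[:150, 1] *= 10.0                        # half the genes change strongly in sample 2
control_idx = np.arange(150, 300)              # hypothetical "stable" genes

sf_controls = size_factors_from_controls(counts, control_idx)
sf_all = size_factors_from_controls(counts, np.arange(300))
print(sf_controls, sf_all)
```

The control-gene factors recover the 2-fold depth difference, while using all genes here inflates it to about 6.3-fold.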