how to deal the effect of enormous gene expression?
Entering edit mode
hamaor • 0
Last seen 5.1 years ago

Hey everyone,

this is my first question in the forum. 

i'm analyzing RNA-Seq data of Tomato. 

some of the samples were treated with HeatShock, and comparing to the control groups it seems that 

a small group of genes presenting enormous values of expression. 

my question how does it effects other genes at the phase of normalization and comparisons (using DESeq2) ?

and what are the best way to handle it?



deseq2 rnaseq tomato geneexpression • 1.8k views
Entering edit mode
Last seen 3 hours ago
United States

DESeq2 size factor estimation is robust to large changes, because it uses the median ratio method described in Anders and Huber 2010.

If you expect extreme fold changes across condition for a small subset of genes, I would turn off the LFC shrinkage though: DESeq(dds, betaPrior=FALSE).

Entering edit mode

just to clarify,

we have 2 time point: T0 and T5 (right after heat shock, and 5 hours after heat shock respectively) , when the heat shock effect is much more significant in T0 when small subset of genes have great expression level.

 is it possible that genes that showed positive fold change in T5, will show a negative at T0 because of the over expressed genes of T0 ?

Entering edit mode
Certainly, anything is possible. I would discuss with biological collaborators if you want more intuition on what is expected in your system.
Entering edit mode

Hi Michael,
I am working with Maor on this project, first of all thank you very much for the reply.
We have already spoken with the researcher that performed the experiment, What we are afraid of is that the high expression of the heat-shock response genes will cause a global decrease in the expression levels of the rest of the genes due to pure technical reasons during the sequencing.

Even when looking at the size factors, we see that the samples after heat-shock received size factor of around 0.8-0.9 (with the total library size being the same or greater than other samples), meaning that we do observe a phenomena where most of the genes were under-represented compared to the other samples.

We looked at the histograms of expression levels in all samples and we don't see a global shift or different counts distribution between our different samples.

In you experience, do you have any suggestions on how to test the effect of this on the data. We thought to try and remove the most up regulated genes (convert their counts to 0) and perform the analysis, just to test how significantly will this affect the results. What is you opinion on that? do you have any other suggestions?

Entering edit mode

This has been discussed on the support site before, so you can try to search for similar posts, but I can summarize:

If you suspect global increase or decrease in expression, there is no way to estimate size factors computationally. you need spike-ins for example. And if you had many spike-ins to estimate the global changes, you would pass the spike-ins to the controlGenes argument of estimateSizeFactors. Spike-ins have their own problems -- they are hard to control, giving imprecise size factors which impairs proper inference -- as discussed in the SEQC papers.

Also you would want to follow the advice above, turning off betaPrior, because the assumption that the bulk of the LFCs are roughly centered around zero is not the case.

Performing differential expression null hypothesis testing when all the genes change expression will not give very interesting/informative results, because you know already that the null is not nearly the case for most or all genes. I would think about potentially finding other ways to present the results.

Entering edit mode
chris86 ▴ 420
Last seen 4.3 years ago
UCL, United Kingdom

I don't think it is a problem


Login before adding your answer.

Traffic: 316 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6