Question

How to deal with the effect of enormous gene expression?

hamaor • 0

@hamaor-9799

Hey everyone,

this is my first question in the forum.

I'm analyzing RNA-Seq data from tomato.

Some of the samples were treated with heat shock, and compared to the control groups it seems that a small group of genes show enormous expression values.

My question: how does this affect other genes at the normalization and comparison stages (using DESeq2)?

And what is the best way to handle it?

Thanks

deseq2 rnaseq tomato geneexpression • 2.5k views

ADD COMMENT • link updated 10.0 years ago by Michael Love 43k • written 10.0 years ago by hamaor • 0

score 1 · Answer 1 · 2016-02-25

Michael Love 43k

@mikelove

United States

DESeq2's size factor estimation is robust to large changes in a small subset of genes, because it uses the median ratio method described in Anders and Huber (2010).
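To make the robustness argument concrete, here is a minimal pure-Python sketch of the median-of-ratios idea. It is an illustration only, not DESeq2's actual code: the function name `size_factors` and the toy counts are made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count table.

    counts: list of rows, one per gene, each a list of per-sample counts.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's size factor is the median ratio over the retained genes.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (made up): nine unchanged genes and one "heat-shock" gene
# that is 1000x higher in the second sample.
counts = [[100, 100] for _ in range(9)] + [[100, 100000]]

sf = size_factors(counts)
library_sizes = [sum(gene[j] for gene in counts) for j in range(2)]

print("median-ratio size factors:", sf)
print("library sizes:", library_sizes)
```

In this toy case the library sizes differ roughly 100-fold because of the single extreme gene, while the median-ratio size factors stay equal for both samples, so the unchanged genes are not pushed down by normalization. The median only protects against this as long as the strongly changing genes are a minority.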

If you expect extreme fold changes across conditions for a small subset of genes, I would turn off the LFC shrinkage though: DESeq(dds, betaPrior=FALSE).

ADD COMMENT • link 10.0 years ago Michael Love 43k

Just to clarify,

we have 2 time points: T0 and T5 (right after heat shock and 5 hours after heat shock, respectively). The heat shock effect is much stronger at T0, when a small subset of genes have very high expression levels.

Is it possible that genes that show a positive fold change at T5 will show a negative one at T0 because of the overexpressed genes at T0?

ADD REPLY • link 9.9 years ago hamaor • 0

Certainly, anything is possible. I would discuss with biological collaborators if you want more intuition on what is expected in your system.

ADD REPLY • link 9.9 years ago Michael Love 43k

Hi Michael,
I am working with Maor on this project. First of all, thank you very much for the reply.
We have already spoken with the researcher who performed the experiment. What we are afraid of is that the high expression of the heat-shock response genes will cause a global decrease in the expression levels of the rest of the genes for purely technical reasons during sequencing.

Even when looking at the size factors, we see that the samples after heat shock received size factors of around 0.8-0.9 (with the total library size being the same as or greater than in other samples), meaning that we do observe a phenomenon where most of the genes are under-represented compared to the other samples.

We looked at the histograms of expression levels in all samples and we don't see a global shift or a different count distribution between our samples.

In your experience, do you have any suggestions on how to test the effect of this on the data? We thought of removing the most up-regulated genes (converting their counts to 0) and repeating the analysis, just to test how much this affects the results. What is your opinion on that? Do you have any other suggestions?

ADD REPLY • link 9.9 years ago solgakar@bi.technion.ac.il ▴ 90

This has been discussed on the support site before, so you can try to search for similar posts, but I can summarize:

If you suspect a global increase or decrease in expression, there is no way to estimate size factors computationally. You need spike-ins, for example. And if you had many spike-ins to estimate the global changes, you would pass the spike-ins to the controlGenes argument of estimateSizeFactors. Spike-ins have their own problems (they are hard to control, giving imprecise size factors, which impairs proper inference), as discussed in the SEQC papers.
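The difference between estimating size factors from all genes and from control genes only can be sketched in a few lines of pure Python. This is a hedged illustration of the idea behind restricting the estimate to spike-ins, not DESeq2's implementation: `size_factors`, `control_rows`, `normalize`, and the toy counts are all hypothetical names and data invented for the example.

```python
import math
from statistics import median


def size_factors(counts, control_rows=None):
    """Median-of-ratios size factors, optionally from a subset of rows.

    control_rows: indices of the rows (e.g. spike-ins) to use; None uses all.
    """
    rows = range(len(counts)) if control_rows is None else control_rows
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for i in rows:
        gene = counts[i]
        if min(gene) <= 0:
            continue  # geometric mean is zero, so the gene is uninformative
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(gene, sf):
    """Divide each sample's count by that sample's size factor."""
    return [c / s for c, s in zip(gene, sf)]


# Toy data (made up): eight endogenous genes that all halve in sample 2,
# followed by three spike-ins added at equal amounts to both samples.
counts = [[200, 100] for _ in range(8)] + [[50, 50] for _ in range(3)]
spike_in_rows = [8, 9, 10]

sf_all = size_factors(counts)
sf_spike = size_factors(counts, control_rows=spike_in_rows)

print("all genes:     ", normalize(counts[0], sf_all))
print("spike-ins only:", normalize(counts[0], sf_spike))
```

With size factors from all genes, the global halving is absorbed into the size factors and the normalized endogenous counts come out equal in both samples. With size factors from the spike-in rows alone, the twofold difference survives normalization. Three spike-ins are used here only to keep the example small; with so few controls the estimate would be very noisy in practice, which is the imprecision the answer warns about.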

Also, you would want to follow the advice above and turn off betaPrior, because the assumption that the bulk of the LFCs are roughly centered around zero would not hold.

Performing differential expression null hypothesis testing when all the genes change expression will not give very interesting or informative results, because you already know that the null is far from true for most or all genes. I would think about finding other ways to present the results.

ADD REPLY • link 9.9 years ago Michael Love 43k

score 0 · Answer 2 · 2016-02-25

chris86 ▴ 420

@chris86-8408

UCL, United Kingdom

I don't think it is a problem

ADD COMMENT • link 10.0 years ago chris86 ▴ 420