DESeq2: high number of differentially expressed genes with very high fold changes
ZheFrench (@zhefrench-11689), France

I am testing different conditions with the following design. For example, my contrasts are mant vs mesenchymal_stem_cells and late_treated vs mesenchymal_stem_cells.

sample_id condition
mantrep1 mant
mantrep2 mant
t6rep1 late_treated
t6rep2 late_treated
t6_2015 late_treated
HMSC_tot mesenchymal_stem_cells

I have a different number of samples per group (2 for mant, 3 for late_treated, 1 for mesenchymal_stem_cells), and I don't know whether this could have an influence.
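A quick way to see what this unbalanced design means for variance estimation: in a model with one coefficient per condition, each group contributes (group size - 1) residual degrees of freedom, so the single mesenchymal_stem_cells sample contributes none. The NumPy sketch below only counts them; it is an illustration of the design, not DESeq2 code, and the cell-means coding is an assumption made for simplicity.

```python
import numpy as np

# Condition labels in the order of the sample table above (2 + 3 + 1 samples).
conditions = ["mant", "mant", "late_treated", "late_treated",
              "late_treated", "mesenchymal_stem_cells"]
levels = sorted(set(conditions))

# Cell-means design for ~ condition: one indicator column per condition level.
X = np.array([[1.0 if c == lvl else 0.0 for lvl in levels] for c in conditions])

# Residual degrees of freedom = samples minus estimable coefficients.
residual_df = X.shape[0] - np.linalg.matrix_rank(X)

# Each group contributes (group size - 1) residual degrees of freedom.
per_group_df = {lvl: conditions.count(lvl) - 1 for lvl in levels}

print("residual df:", residual_df)
print("per-group df:", per_group_df)
```

All three residual degrees of freedom come from mant (1) and late_treated (2). DESeq2 fits one dispersion per gene across all samples, so the variability of the mesenchymal_stem_cells group is assumed to match the other groups rather than measured. Unequal group sizes are generally not a problem in themselves; the single unreplicated reference sample is the weaker point.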

The size factors are very different (see HMSC_tot and t6_2015).

Size factors:
 mantrep1  mantrep2    t6rep1    t6rep2   t6_2015  HMSC_tot 
1.2341148 0.8325101 1.0863274 0.9813990 2.0297889 0.4530482 
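For reference, DESeq2's default size factors (from estimateSizeFactors()) use the median-of-ratios method: each gene's counts are divided by that gene's geometric mean across all samples, and a sample's size factor is the median of those ratios over genes with no zero counts. Below is a minimal NumPy re-implementation for illustration only (in R you would simply inspect sizeFactors(dds)); the toy count matrix is made up.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors; counts has genes in rows, samples in columns."""
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with a nonzero count in every sample (log(0) is -inf).
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples: the "pseudo-reference" sample.
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of log ratios to the pseudo-reference, back-transformed.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

# Toy example: sample 2 looks like sample 1 sequenced twice as deeply.
toy = np.array([[10, 20],
                [100, 200],
                [50, 100],
                [0, 5]])      # this gene is ignored: zero count in sample 1
sf = median_of_ratios_size_factors(toy)
normalized = toy / sf         # counts on a common scale
print(sf)
print(normalized)
```

Applied to the posted values, t6_2015 (2.03) vs HMSC_tot (0.45) is roughly a 4.5-fold difference in effective depth, and dividing by the size factors removes it. What this global scaling cannot remove is a protocol effect (stranded vs unstranded, different library preparation) that shifts individual genes by different amounts; the method also relies on most genes not changing between samples.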

In the end, I retrieved 10,000 differentially expressed genes (|FC| > 1.5 and p-value < 0.05) when comparing mant vs mesenchymal_stem_cells or late_treated vs mesenchymal_stem_cells.

Isn't that too many?
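Two details of this filter are worth checking. DESeq2 reports fold changes on the log2 scale (the log2FoldChange column), so |FC| > 1.5 corresponds to |log2FoldChange| > log2(1.5), about 0.585. And with thousands of genes tested, the cutoff is normally applied to the Benjamini-Hochberg adjusted padj column rather than the raw pvalue. The sketch below uses a hand-written BH adjustment on five made-up genes; the values are hypothetical.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (same idea as R's p.adjust(method = "BH"))."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Hypothetical results for five genes: log2 fold changes and raw p-values.
log2fc = np.array([0.4, 1.2, -2.0, 0.7, 11.0])
pvals = np.array([0.001, 0.045, 0.0001, 0.035, 0.6])

fc_cutoff = np.log2(1.5)   # |FC| > 1.5 on the linear scale
padj = bh_adjust(pvals)

raw_significant = (np.abs(log2fc) > fc_cutoff) & (pvals < 0.05)
significant = (np.abs(log2fc) > fc_cutoff) & (padj < 0.05)
print(padj)
print(raw_significant, significant)
```

Here the raw p-value filter keeps three genes and the adjusted one keeps one. In DESeq2, results() also accepts an lfcThreshold argument, which tests against the fold-change threshold instead of filtering the estimates afterwards.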

Moreover, I found many genes with |FC| > 2000, so something must be going wrong here!
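One common source of such extreme ratios is a gene with zero or near-zero counts in one group. The sketch below is not DESeq2's estimator: it is a naive log2 ratio of mean size-factor-normalized counts with a pseudocount of 0.5 (an arbitrary assumption), applied to a hypothetical gene with made-up counts in the two mant replicates and zero reads in HMSC_tot, using the size factors posted above.

```python
import numpy as np

# Size factors posted in the question (only the samples used here).
size_factors = {"mantrep1": 1.2341148, "mantrep2": 0.8325101, "HMSC_tot": 0.4530482}

def naive_log2_fc(group_counts, group_sf, ref_counts, ref_sf, pseudo=0.5):
    """Naive log2 ratio of mean size-factor-normalized counts (not DESeq2's MLE)."""
    group_mean = np.mean(np.asarray(group_counts, dtype=float) / np.asarray(group_sf))
    ref_mean = np.mean(np.asarray(ref_counts, dtype=float) / np.asarray(ref_sf))
    # The pseudocount keeps the ratio finite when the reference mean is zero.
    return np.log2((group_mean + pseudo) / (ref_mean + pseudo))

# Hypothetical gene: 1200 and 900 reads in the mant replicates, 0 in HMSC_tot.
lfc = naive_log2_fc([1200, 900],
                    [size_factors["mantrep1"], size_factors["mantrep2"]],
                    [0], [size_factors["HMSC_tot"]])
print(f"log2 FC = {lfc:.2f}, linear FC = {2 ** lfc:.0f}")
```

With a single reference sample, nothing in the data constrains estimates like this. In DESeq2, lfcShrink() shrinks noisy log2 fold changes of low-count genes toward zero, but shrinkage does not correct a difference that is driven by batch.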

Can we compare samples with very different library sizes? The DESeq2 model does correct for library size internally...

Note that HMSC_tot comes from a different run, from 2010 (unstranded). The others all come from the same 2015 run (stranded), with a library preparation different from the 2010 one.

@mikelove, United States

Can you post your code for producing the comparisons?

"Note that HMSC_tot comes from a different run, from 2010 (unstranded). The others all come from the same 2015 run (stranded), with a library preparation different from the 2010 one."

You can't make any useful comparisons across batches because of confounding. Technical effects related to library preparation are often larger in magnitude than the true biological differences you want to find. When batch and biological condition are perfectly confounded, it's impossible to say what is real and what is technical. Here is a paper on the topic:

Tackling the widespread and critical impact of batch effects in high-throughput data.
https://www.ncbi.nlm.nih.gov/pubmed/20838408
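To make the confounding concrete: HMSC_tot is both the only 2010 sample and the only mesenchymal_stem_cells sample, so a batch indicator is exactly a linear combination of the intercept and condition columns. The NumPy sketch below builds a treatment-coded design with and without a batch term and checks its rank; the coding and column order are illustrative assumptions, not DESeq2's own model.matrix output.

```python
import numpy as np

conditions = ["mant", "mant", "late_treated", "late_treated",
              "late_treated", "mesenchymal_stem_cells"]
# HMSC_tot (last sample) is the only 2010 run; the others are from 2015.
batch = ["2015"] * 5 + ["2010"]

def design(conditions, batch=None, ref="mesenchymal_stem_cells"):
    """Treatment-coded design: intercept plus one column per non-reference level."""
    cond_levels = [lvl for lvl in sorted(set(conditions)) if lvl != ref]
    cols = [np.ones(len(conditions))]
    cols += [np.array([c == lvl for c in conditions], dtype=float) for lvl in cond_levels]
    if batch is not None:
        # One indicator column for the 2010 batch (2015 is the reference batch).
        cols.append(np.array([b == "2010" for b in batch], dtype=float))
    return np.column_stack(cols)

X_cond = design(conditions)
X_both = design(conditions, batch)
print("condition only:", X_cond.shape[1], "columns, rank", np.linalg.matrix_rank(X_cond))
print("batch + condition:", X_both.shape[1], "columns, rank", np.linalg.matrix_rank(X_both))
```

The combined design has four columns but rank three, so the batch effect and the mesenchymal_stem_cells effect cannot be separated; as far as I know, DESeq2 stops with an error that the model matrix is not full rank for such a design. Dropping the batch term makes the fit possible, but then every difference between 2010 and 2015 libraries is attributed to condition.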
