Question

DESEQ2: Can we analyse pair of technical replicates as biological replicates to get p-values ? What will be the effect ?

0

Entering edit mode

ZheFrench ▴ 60

@zhefrench-11689

Last seen 3 months ago

France

I have several samples , 3 biological replicates. (2015, 2016_1,2016_2 )

I want to compare each condition between each other.

My PCA on rlog transformed values (here) make me think I should analyse 2015 group separately. Maybe I'm wrong !

In 2015 group I have 3 conditions with 2 technical replicates each time. (here) So then can I analyse them as if they were biological replicates to get p-values ?The biological variability of each gene will be biaised...So I will found more genes with differential expression, no ?

Or the best idea is to analyse them together with severals factors (condition adjusted on biological group) ?

Thanks

deseq2 • 3.1k views

ADD COMMENT • link updated 8.1 years ago by Michael Love 43k • written 8.1 years ago by ZheFrench ▴ 60

0

Entering edit mode

I would recommend merging the technical replicates, rather than treating them as the same kind of replicate as the other biological replicates - you could either pool the reads, or (more straight-forwardly) average the read-counts. As you say, treating them as biological replicates would bias the variance estimate.

Regarding the batch effect due to year, I prefer to make the decision on how to deal with this based on evidence from positive control genes rather than the global view given by PCA. If genes known to respond to treatment show a significant batch effect, then I tend to keep a batch factor in the analysis. If they show a batch:treatment interaction, I may analyse the batches separately (either with a nested design, or subsetting the data - the latter in your case being problematic as you have only technical replication in the 2015 batch), and if they show no batch effect, then I have a model with no batch term in it. My intuition would say in your case to omit the 'batch' effect and take the penalty of increased noise, as you don't have many degrees of freedom to play with. Obviously, the appearance of positive controls in any resulting gene-list is near-tautological, so shouldn't be the basis of research claims!

PCA plots show what the major global changes are, but the biology might be focused in unexplored components, which is why I prefer to use an approach more targeted to the 'expected' biology, but when you have no positive controls, you are forced into more empirical approaches like PCA. It's always tempting to try out the multiple approaches, and see which set of results makes more biological sense, but again this uses up researcher-degrees-of-freedom, and this should be reflected in adjusted p-values (and careful avoidance of making circular claims of what 'biological sense' means!)

ADD REPLY • link 8.1 years ago Gavin Kelly ▴ 690

score 1 · Accepted Answer · 2016-10-20

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 6 hours ago

United States

Gavin has useful advice above which I agree with.

"In 2015 group I have 3 conditions with 2 technical replicates each time. (here) So then can I analyse them as if they were biological replicates to get p-values?"

You don't want to treat technical replicates as biological replicates. You are correct that this will "trick" the model into thinking there is little biological variability, when in fact, you can't assess the biological variability unless you have biological replicates.

Generally I'd recommend you add 'batch' to the design, e.g. ~batch + condition, especially when seeing that it separates samples in the PCA plot. Just because you see that one batch dominates PC1, doesn't mean that you can't recover genes DE across condition. Often just adding the 'batch' term to the design is sufficient. Make sure to take a look at plotCounts (see vignette) of the top genes, and you'll want to perhaps color the points by batch using ggplot2.

ADD COMMENT • link 8.1 years ago Michael Love 43k

0

Entering edit mode

Thanks for your clear answer. Let me ask two last questions.

First , I was just wondering If I sould separate batch factor (emt) in 3 groups (2015/2016_1/2016_2) or 2 groups only (2015/2016) regarding the pca plot. Because you see that 2016_1 and 2016_2 follow the same trend compare to 2015 where there is cleary a batch effect. Should I try ? Does it worth it ?

So for now, I'm doing something as follows :

design(dds) <- ~ emt + condition
dds <- DESeq(dds)
results(dds, contrast=c("condition","early_treated","unT"))

I tested with a model using only condition and try to compare how many genes with a fc > |1.5| with adj pval > 0.05 were outputed by the two model. (venn driagram here) It's quite similar , batch model giving some more genes differentially expressed.

Scond and last question, making reference to an another post(Which formula should I use to set the best model matrix for my analysis ?) , where people give me advice to use ~0 at the begining of the model .I think it was for edgeR... This is equivalent to set betaPrior=FALSE in deseq2. ( to put a zero-mean normal prior on the non-intercept coefficients )

But what does it mean ? What does it will change in my test ? The comparison and code will stay the same but what about interpretatation ? Do I need to care about that ?

ADD REPLY • link 8.1 years ago ZheFrench ▴ 60