Question

Samples without pair in paired design experiment

Vladimir Krasikov ▴ 90

@vladimir-krasikov-5097

Dear Experts!

I am analyzing RNA-seq data from 18 paired samples (before and after treatment) with the DESeq2 pipeline.

Some of the pairs in the set are incomplete for technical reasons.

I am using a straightforward design: model.matrix(~ Ind + Treatment, Exp.Design) ...

So my question is: should I omit the samples without a pair, or still include them?

DESeqDataSetFromMatrix() and the subsequent DESeq() call seem to run without warnings or errors when the unpaired samples are included.
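For reference, here is a minimal sketch of how the incomplete pairs could be identified and dropped before building the dataset. The column names `Ind` and `Treatment` follow the design formula above; the sample table contents, sample names, gene names and simulated counts are made up for illustration, and the DESeq2 step only runs if the package is installed.

```r
set.seed(42)

# Hypothetical sample table: 5 individuals, two of them (P4, P5) missing one sample
Exp.Design <- data.frame(
  Ind       = factor(c("P1", "P1", "P2", "P2", "P3", "P3", "P4", "P5")),
  Treatment = factor(c("Before", "After", "Before", "After",
                       "Before", "After", "Before", "After"),
                     levels = c("Before", "After"))
)
rownames(Exp.Design) <- paste0("S", seq_len(nrow(Exp.Design)))

# Toy count matrix (genes x samples), simulated only so that the sketch runs
counts <- matrix(rnbinom(200 * nrow(Exp.Design), mu = 100, size = 5),
                 nrow = 200,
                 dimnames = list(paste0("gene", 1:200), rownames(Exp.Design)))

# An individual is a complete pair if it has exactly one sample per treatment level
n_per_ind <- table(Exp.Design$Ind, Exp.Design$Treatment)
complete  <- rownames(n_per_ind)[rowSums(n_per_ind == 1) == 2]
keep      <- Exp.Design$Ind %in% complete

# Drop unpaired samples and the now-unused levels of Ind
design_paired <- droplevels(Exp.Design[keep, ])
counts_paired <- counts[, keep]

if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts_paired,
                                        colData   = design_paired,
                                        design    = ~ Ind + Treatment)
  # dds <- DESeq2::DESeq(dds) would be the usual next step (not run here)
}
```

Building a second dataset from `counts` and `Exp.Design` in the same way would give the "all samples included" version for comparison.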

paired samples deseq2 • 2.4k views

Updated 6.9 years ago by Gordon Smyth 50k • written 6.9 years ago by Vladimir Krasikov ▴ 90

score 3 · Answer 1 · 2017-05-31

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

If you use any of the negative binomial packages (like DESeq2 or edgeR) then you'll simply have to omit the samples without a pair. Actually it makes no difference whether you explicitly omit them or not, because the linear modelling approach implemented in those packages will have the effect of removing the unpaired samples automatically.
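The "removed automatically" point can be illustrated with an ordinary linear model in base R. This is only a sketch of the linear-algebra argument on made-up toy data, not of what DESeq2 or edgeR do internally: an individual with a single sample gets its own coefficient, is fitted exactly, and so contributes nothing to the treatment estimate or to the residual degrees of freedom.

```r
set.seed(1)

# Hypothetical toy data: individuals 1-4 are complete pairs, 5 and 6 are unpaired
d <- data.frame(
  Ind       = factor(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6)),
  Treatment = factor(c(rep(c("Before", "After"), 4), "Before", "Before"),
                     levels = c("Before", "After"))
)
d$y <- rnorm(nrow(d), mean = 5) + ifelse(d$Treatment == "After", 1, 0)

# Fit with all samples
fit_all <- lm(y ~ Ind + Treatment, data = d)

# Fit after explicitly dropping the unpaired individuals
is_paired  <- d$Ind %in% names(which(table(d$Ind) == 2))
fit_paired <- lm(y ~ Ind + Treatment, data = droplevels(d[is_paired, ]))

# Same treatment estimate and same residual degrees of freedom either way
coef(fit_all)["TreatmentAfter"]
coef(fit_paired)["TreatmentAfter"]
c(all = df.residual(fit_all), paired = df.residual(fit_paired))
```

In a full RNA-seq pipeline other steps, such as normalization and filtering, are computed across all samples, so the two runs may still differ slightly in practice.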

If you use limma to analyse your RNA-seq data (either voom or limma-trend), then you can use duplicateCorrelation() to estimate the correlation between samples from the same pair. This approach allows you to include all your samples, whether they have a pair or not. That's the way I approach problems of this type because I don't like to throw data out.
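A minimal sketch of that limma route is below, assuming a hypothetical `targets` table and simulated counts (all names and values here are made up for illustration). The individual is left out of the design matrix and passed as the `block` argument instead, so unpaired samples stay in the analysis. The limma part only runs if the package is installed.

```r
set.seed(7)

# Hypothetical targets: individuals P1-P4 are complete pairs, P5 and P6 are not
targets <- data.frame(
  Ind       = c("P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4", "P5", "P6"),
  Treatment = factor(c(rep(c("Before", "After"), 4), "Before", "After"),
                     levels = c("Before", "After"))
)

# Toy counts (genes x samples), simulated only so that the sketch runs
counts <- matrix(rnbinom(500 * nrow(targets), mu = 200, size = 10),
                 nrow = 500,
                 dimnames = list(paste0("gene", 1:500),
                                 paste0("S", seq_len(nrow(targets)))))

# Individual is NOT in the design; it enters as a blocking variable instead
design <- model.matrix(~ Treatment, data = targets)

if (requireNamespace("limma", quietly = TRUE)) {
  v      <- limma::voom(counts, design)
  # Estimate the correlation between samples from the same individual
  corfit <- limma::duplicateCorrelation(v, design, block = targets$Ind)
  fit    <- limma::lmFit(v, design, block = targets$Ind,
                         correlation = corfit$consensus.correlation)
  fit    <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "TreatmentAfter"))
}
```

Because the toy counts are simulated independently, the estimated correlation here will be close to zero; with real paired data it would normally be positive. `voom()` also accepts `block` and `correlation` arguments, so the voom and duplicateCorrelation steps can be repeated once; that refinement is omitted here.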

I usually do recommend duplicateCorrelation() in cases where there are sample relationships that cannot be included in a design matrix. Here I would argue that the variance of interest is, within a pair, how far the treated sample is from the value predicted by a shared treatment effect. This is somewhat analogous to testing on differences in a paired t-test. If the correlation between pairs is high, I don't think adding unpaired samples helps much to estimate this variance. If the correlation between pairs is low, and the number of unpaired samples is high, I can imagine how it would help. So I can see why you would recommend the duplicateCorrelation route either way.
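The paired t-test analogy can be checked in base R on made-up data with complete pairs: the treatment coefficient from a model with one term per individual has the same t-statistic as a t-test on the within-pair differences. The data and variable names below are hypothetical, and this is an ordinary linear model, not the negative binomial fit.

```r
set.seed(2)

# Hypothetical complete pairs: 8 individuals measured before and after treatment
n      <- 8
before <- rnorm(n, mean = 5)
after  <- before + rnorm(n, mean = 1, sd = 0.5)

d <- data.frame(
  Ind       = factor(rep(seq_len(n), times = 2)),
  Treatment = factor(rep(c("Before", "After"), each = n),
                     levels = c("Before", "After")),
  y         = c(before, after)
)

# Linear model with an individual effect plus a shared treatment effect
fit  <- lm(y ~ Ind + Treatment, data = d)
t_lm <- summary(fit)$coefficients["TreatmentAfter", "t value"]

# One-sample t-test on the within-pair differences
t_diff <- unname(t.test(after - before)$statistic)

c(lm = t_lm, paired_differences = t_diff)
```

Both tests use the same n - 1 residual degrees of freedom, which is why a sample without its partner has nothing to add to this particular variance estimate.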

Reply by Michael Love 41k, 6.9 years ago

Thanks a lot, Gordon, for clarifying this point.

Reply by Vladimir Krasikov ▴ 90, 6.9 years ago

score 1 · Answer 2 · 2017-05-31

Michael Love 41k

@mikelove

United States

Including the samples without a pair doesn't help you estimate the treatment effect, or the biological variability, if you have a single point per individual. I wouldn't include them.

Michael, thanks a lot for the explanation. However, the set of DE genes obtained by running DESeq() on the complete data set, where the unpaired samples should be discarded automatically (as Gordon Smyth said in his answer), differs slightly from the set I obtained after manually removing the unpaired samples from the experimental design. The difference is minor, though: 1-2 fewer genes in the second case.

Reply by Vladimir Krasikov ▴ 90, 6.9 years ago