Samples without pair in paired design experiment
@vladimir-krasikov-5097

Dear Experts!

I am analyzing RNA-seq data from 18 paired samples (before and after treatment) using the DESeq2 pipeline.

Some of the pairs in the set are incomplete for technical reasons.

I am using a straightforward design: model.matrix(~ Ind + Treatment, Exp.Design) ...

So the question is: should I omit the samples without a pair, or still include them?

It seems that DESeqDataSetFromMatrix() followed by DESeq() runs without warnings or errors when the samples without a pair are included.

 

@gordon-smyth
WEHI, Melbourne, Australia

If you use any of the negative binomial packages (like DESeq2 or edgeR) then you'll simply have to omit the samples without a pair. Actually it makes no difference whether you explicitly omit them or not, because the linear modelling approach implemented in those packages will have the effect of removing the unpaired samples automatically.
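A minimal sketch of why the unpaired samples drop out, assuming an ordinary linear model in base R on simulated data rather than a negative binomial fit. Every name and value below is made up for illustration; none of it comes from the original data.

```r
# Simulated log-expression values for a single gene (illustrative only).
set.seed(1)
n_pairs <- 6
ind <- rep(paste0("P", seq_len(n_pairs)), each = 2)
trt <- rep(c("before", "after"), times = n_pairs)
subject_effect <- rep(rnorm(n_pairs), each = 2)
y <- subject_effect + 0.8 * (trt == "after") + rnorm(2 * n_pairs, sd = 0.3)
paired <- data.frame(Ind = ind, Treatment = trt, y = y)

# Two hypothetical individuals with only one sample each.
unpaired <- data.frame(Ind = c("U1", "U2"),
                       Treatment = c("before", "after"),
                       y = rnorm(2))
full <- rbind(paired, unpaired)

# Same additive design as in the question: ~ Ind + Treatment.
fit_model <- function(d) {
  d$Treatment <- factor(d$Treatment, levels = c("before", "after"))
  lm(y ~ Ind + Treatment, data = d)
}

fit_paired <- fit_model(paired)
fit_full <- fit_model(full)

# Each unpaired individual gets its own coefficient, which fits its single
# observation exactly, so it adds nothing to the treatment estimate
# or to the residual degrees of freedom.
print(coef(fit_paired)["Treatmentafter"])
print(coef(fit_full)["Treatmentafter"])
print(c(df.residual(fit_paired), df.residual(fit_full)))
```

The two treatment coefficients agree up to numerical precision. In a full DESeq2 or edgeR run, steps that pool information across samples (size-factor estimation, for example) still see the extra samples, so small differences in the final gene list are plausible even though the per-gene treatment estimate is driven by the complete pairs.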

If you use limma to analyse your RNA-seq data (either voom or limma-trend), then you can use duplicateCorrelation() to estimate the correlation between samples from the same pair. This approach allows you to include all your samples, whether they have a pair or not. That's the way I approach problems of this type because I don't like to throw data out.
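A rough sketch of the voom plus duplicateCorrelation() route, assuming the limma and edgeR packages are installed. The count matrix and sample table are simulated stand-ins, and the object names (`counts`, `coldata`) are hypothetical; a real analysis would also filter low-count genes first.

```r
library(limma)
library(edgeR)

# Simulated stand-in data: 6 complete pairs plus 2 unpaired samples.
set.seed(1)
coldata <- data.frame(
  Ind = factor(c(rep(paste0("P", 1:6), each = 2), "U1", "U2")),
  Treatment = factor(c(rep(c("before", "after"), 6), "before", "after"),
                     levels = c("before", "after"))
)
counts <- matrix(rnbinom(200 * nrow(coldata), mu = 100, size = 5),
                 nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))

dge <- calcNormFactors(DGEList(counts))

# Treatment is the only fixed effect; the individual is handled as a
# blocking factor instead of a term in the design matrix.
design <- model.matrix(~ Treatment, data = coldata)

v <- voom(dge, design)

# Estimate the correlation between samples from the same individual.
corfit <- duplicateCorrelation(v, design, block = coldata$Ind)

fit <- lmFit(v, design, block = coldata$Ind,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
top <- topTable(fit, coef = "Treatmentafter", number = 10)
```

Because the individual is not in the design matrix, the unpaired samples keep contributing to the fit. Some workflows rerun voom() with the estimated correlation and then re-estimate it; that refinement is left out here for brevity.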


I usually do recommend duplicateCorrelation() in cases where there are sample relationships that cannot be included in a design matrix. Here I would argue that the variance of interest is, within a pair, how far is the treated sample from the value predicted by a shared treatment effect. Kind of analogous to testing on differences in a paired t-test. If the correlation between pairs is high, I don't think adding unpaired samples helps much to estimate this variance. If the correlation between pairs is low, and the number of unpaired samples is high, I can imagine how it would help. So I can see why you would recommend the duplicateCorrelation route either way.


Thanks a lot, Gordon, for clarifying this point.

@mikelove
United States

Including the samples without a pair doesn't help you estimate the treatment effect, or the biological variability, if you have a single point per individual. I wouldn't include them.
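One possible way to drop the unpaired samples before building the DESeq2 object, sketched in base R. The helper name `keep_complete_pairs` and the column names `Ind` and `Treatment` are assumptions chosen to match the design in the question, not part of any package.

```r
# Keep only individuals that have at least one sample in every treatment
# level, and drop factor levels that no longer occur.
keep_complete_pairs <- function(coldata, ind = "Ind", treatment = "Treatment") {
  tab <- table(coldata[[ind]], coldata[[treatment]])
  complete <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]
  out <- coldata[coldata[[ind]] %in% complete, , drop = FALSE]
  droplevels(out)
}

# Illustrative sample table: P1 and P2 are complete, U1 has no "after" sample.
coldata <- data.frame(
  Ind = factor(c("P1", "P1", "P2", "P2", "U1")),
  Treatment = factor(c("before", "after", "before", "after", "before"),
                     levels = c("before", "after")),
  row.names = paste0("sample", 1:5)
)

kept <- keep_complete_pairs(coldata)
print(kept)
```

If the row names of the sample table match the column names of the count matrix, the same row names can be used to subset the counts before calling DESeqDataSetFromMatrix().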


Michael, thanks a lot for the explanation. However, the DE gene set I get from running DESeq on the complete data set, where the unpaired samples should be discarded automatically (as Gordon Smyth said in his answer), differs slightly from the DE gene set I get after manually removing the unpaired samples from the experimental design. The difference is minor, though: 1-2 fewer genes in the second case.
