DESeq2 with paired samples - problem with the design
1
0
Entering edit mode
@estel-kitsune-8393
Last seen 5.8 years ago
Sweden

Dear all,

I am trying to use DESeq2 to analyze my data. I have 2 conditions (wild type and mutant), and 3 samples for each condition. My samples are paired: wt1 with mutant1, wt2 with mutant2, wt3 with mutant3.

I am using R version 3.2.0 (2015-04-16) and DESeq2_1.8.1.The first script I wrote starts as followed:

d <- read.delim(infiles)
m <- c(rep("MU",3),rep("WT",3))
cd <- data.frame(id=colnames(d), line=m, pair=c(1,2,3,1,2,3))
cd$line <- relevel(cd$line, ref="WT")
dds <- DESeqDataSetFromMatrix(countData = d, colData = cd, design = ~ pair + line)
dds <- DESeq(dds)
res <- results(dds)
sig <- res[which(res$padj<0.01),]

It ran and gave me a list of differentially expressed genes. However, reading the DESeq2 manual, I realized that "pair" should be a factor, rather than a numeric. So I reran the previous script, with a modified 3rd line:

cd <- data.frame(id=colnames(d), line=m, pair=factor(c(1,2,3,1,2,3)))

With this factor, about 150 genes which were previously DE are not DE anymore. I don't really understand what is causing this difference. Could you tell me if my design is correct?

 

deseq2 • 2.7k views
ADD COMMENT
3
Entering edit mode
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 3.7 years ago
Zentrum für Molekularbiologie, Universi…

Short answer: If you run a biostatistical tool with wrong parameters, you will get a wrong result. If you then redo it with the correct settings, you will get a correct result. That the two results differ should not surprise you.

Longer answer: In your first attempt, R misinterpreted pair as a numeric covariate, i.e., it was trying to fit a proportionality factor, assuming that a typical gene is twice as strongly expressed (on a log2 scale) in "pair-2" mice than in "pair-1" mice and three times as strongly expressed in "pair-3" mice. Obviously, this is nonsense, but fitting this made some genes (wrongly) appear differentially expressed.

BTW: 0.01 is unusually stringent FDR. Why have you chosen that threshold?
 

 

ADD COMMENT
0
Entering edit mode

Thank you Simon. I was not especially surprised to get different results, but I wanted to know the reason behind (and you explained it in your long answer).

For the threshold, we usually use 0.05 in our analyses. The script posted above is truncated, it creates several output files with different cut-offs.

ADD REPLY

Login before adding your answer.

Traffic: 571 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6