Question

DESeq2 with paired samples - problem with the design

0

Entering edit mode

Estel Kitsune ▴ 10

@estel-kitsune-8393

Last seen 5.8 years ago

Sweden

Dear all,

I am trying to use DESeq2 to analyze my data. I have 2 conditions (wild type and mutant), and 3 samples for each condition. My samples are paired: wt1 with mutant1, wt2 with mutant2, wt3 with mutant3.

I am using R version 3.2.0 (2015-04-16) and DESeq2_1.8.1.The first script I wrote starts as followed:

d <- read.delim(infiles)
m <- c(rep("MU",3),rep("WT",3))
cd <- data.frame(id=colnames(d), line=m, pair=c(1,2,3,1,2,3))
cd$line <- relevel(cd$line, ref="WT")
dds <- DESeqDataSetFromMatrix(countData = d, colData = cd, design = ~ pair + line)
dds <- DESeq(dds)
res <- results(dds)
sig <- res[which(res$padj<0.01),]

It ran and gave me a list of differentially expressed genes. However, reading the DESeq2 manual, I realized that "pair" should be a factor, rather than a numeric. So I reran the previous script, with a modified 3rd line:

cd <- data.frame(id=colnames(d), line=m, pair=factor(c(1,2,3,1,2,3)))

With this factor, about 150 genes which were previously DE are not DE anymore. I don't really understand what is causing this difference. Could you tell me if my design is correct?

deseq2 • 2.7k views

ADD COMMENT • link updated 8.8 years ago by Simon Anders ★ 3.7k • written 8.8 years ago by Estel Kitsune ▴ 10

score 3 · Accepted Answer · 2015-07-14

3

Entering edit mode

Simon Anders ★ 3.7k

@simon-anders-3855

Last seen 3.7 years ago

Zentrum für Molekularbiologie, Universi…

Short answer: If you run a biostatistical tool with wrong parameters, you will get a wrong result. If you then redo it with the correct settings, you will get a correct result. That the two results differ should not surprise you.

Longer answer: In your first attempt, R misinterpreted pair as a numeric covariate, i.e., it was trying to fit a proportionality factor, assuming that a typical gene is twice as strongly expressed (on a log2 scale) in "pair-2" mice than in "pair-1" mice and three times as strongly expressed in "pair-3" mice. Obviously, this is nonsense, but fitting this made some genes (wrongly) appear differentially expressed.

BTW: 0.01 is unusually stringent FDR. Why have you chosen that threshold?

ADD COMMENT • link 8.8 years ago Simon Anders ★ 3.7k

0

Entering edit mode

Thank you Simon. I was not especially surprised to get different results, but I wanted to know the reason behind (and you explained it in your long answer).

For the threshold, we usually use 0.05 in our analyses. The script posted above is truncated, it creates several output files with different cut-offs.

ADD REPLY • link 8.8 years ago Estel Kitsune ▴ 10