Why does samples order subtly influence the results of DESeq2?
1
0
Entering edit mode
Sam ▴ 10
@sam-21502
Last seen 1 day ago
Jerusalem

When changing the order of samples (ensuring that colData and countData are in the same order), this influences the DE results. Very subtly (something like in the 9th digit) - but why does it happen?

Using the pasilla dataset and the code from the vignette :

options(digits = 15)

library(DESeq2)
library(pasilla)

pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)

rownames(coldata) <- sub("fb", "", rownames(coldata))
cts <- cts[, rownames(coldata)]

dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)



dds <- DESeq(dds)
res <- results(dds)


print(head(res$log2FoldChange),digits=15)


[1] -1.02604541037965413 -0.00215142369260044
[3]  0.49673556850473838  1.88276170249200669
[5]  0.24002523000310516  0.10479911223675623

When I randomly shuffle coldata and ensure that the counts is in the same order as count data, the logfold changes (very slightly). Why?

coldata2 <- coldata[sample(1:nrow(coldata),replace=F),]
setequal(rownames(coldata2), colnames(cts))

cts2 <- cts[,rownames(coldata2)]


dds2 <- DESeqDataSetFromMatrix(countData = cts2,
                              colData = coldata2,
                              design = ~ condition)



dds2 <- DESeq(dds2)
res2 <- results(dds2)


print(head(res2$log2FoldChange),digits=15)

[1] -1.02604541037965524 -0.00215142640531776
[3]  0.49673554526085623  1.88276152384409690
[5]  0.24002523019401523  0.10479911227720901
DESeq2 • 658 views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 14 hours ago
United States

The calculations are not identical. IRLS is not a closed form solution, and we iterate until a convergence criterion.

ADD COMMENT

Login before adding your answer.

Traffic: 810 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6