Changing the order of the samples, while keeping colData and countData in the same order as each other, changes the DE results. The change is very subtle (something like the 9th digit), but why does it happen?
Using the pasilla dataset and the code from the vignette:
library("DESeq2")
options(digits = 15)
pasCts <- system.file("extdata",
package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)
rownames(coldata) <- sub("fb", "", rownames(coldata))
cts <- cts[, rownames(coldata)]
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)
res <- results(dds)
[1] -1.02604541037965413 -0.00215142369260044
[3] 0.49673556850473838 1.88276170249200669
[5] 0.24002523000310516 0.10479911223675623
When I randomly shuffle coldata and make sure the columns of the count matrix are in the same order as the rows of coldata, the log fold changes shift (very slightly). Why?
coldata2 <- coldata[sample(nrow(coldata)), ]
setequal(rownames(coldata2), colnames(cts))
cts2 <- cts[,rownames(coldata2)]
dds2 <- DESeqDataSetFromMatrix(countData = cts2,
                               colData = coldata2,
                               design = ~ condition)
dds2 <- DESeq(dds2)
res2 <- results(dds2)
[1] -1.02604541037965524 -0.00215142640531776
[3] 0.49673554526085623 1.88276152384409690
[5] 0.24002523019401523 0.10479911227720901
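To put a number on "very slightly", here is a minimal base-R sketch that compares the six values printed in the two outputs. The names lfc1, lfc2 and abs_diff are made up for illustration, and the numbers are copied by hand from the printed output rather than pulled from res and res2, so treat it as a rough check and not as part of the vignette code.

```r
# Six values printed for the original sample order (copied from the first output)
lfc1 <- c(-1.02604541037965413, -0.00215142369260044,
           0.49673556850473838,  1.88276170249200669,
           0.24002523000310516,  0.10479911223675623)

# The same six positions after shuffling the samples (copied from the second output)
lfc2 <- c(-1.02604541037965524, -0.00215142640531776,
           0.49673554526085623,  1.88276152384409690,
           0.24002523019401523,  0.10479911227720901)

# Absolute difference per position
abs_diff <- abs(lfc1 - lfc2)
print(signif(abs_diff, 3))

# Exact comparison versus comparison within a tolerance
print(identical(lfc1, lfc2))
print(isTRUE(all.equal(lfc1, lfc2, tolerance = 1e-6)))
```

For these six values the two vectors are not bit-for-bit identical, but every absolute difference stays below 1e-6, so a tolerance-based comparison such as all.equal with a loose tolerance treats them as equal.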