Changing group labels yields different results in DESeq2
1
0
Entering edit mode
@blobbyscience-15133
Last seen 6.6 years ago

Hello,

I am analyzing RNA-seq raw counts data from 5 conditions (vehicle, tx1, tx2, tx3, tx4). When I label the groups "A", "B", "C", "D", "E" and run DESeq2 on all 5 groups at once, I get a list of 7000+ differentially expressed genes (padj < 0.05). However, when I label the groups with their actual names ("vehicle", "name1", "name2", etc), my results are now ~3000 differentially expressed genes. I know that R automatically alphabetizes the groups, so that in my second run, samples labeled as "vehicle" are considered last (at least that's how it shows up when I pull out individual genes using plotCounts, as well as when I do PCA). Has anyone had this issue before? Shouldn't my results theoretically be the same regardless of how I named the groups, since I gave DESeq2 the same data? Am I doing something wrong/missing something? Or maybe I shouldn't be using DESeq2 to analyze multiple groups at once? I checked my results against the DESeq2 analysis that our genome center did, and the 7000+ list was exactly the same (which makes sense because the only info I gave them was which samples were "A", which ones were "B", etc). Any insight would be much appreciated!

In case anyone is wondering how I labeled the samples, here is how I did it:

SamplesABCDE <- data.frame(row.names=colnames(CountTable), condition=as.factor(c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3))))

SamplesGroupNames <- data.frame(row.names=colnames(CountTable), condition=as.factor(c(rep("Vehicle",3),rep("HLow",3),rep("HHigh",3),rep("MLow",3),rep("MHigh",3))))

ddsGroupNames <- DESeqDataSetFromMatrix(countData = CountTable, colData = SamplesGroupNames, design=~condition)
dds2_GroupNames <- DESeq(ddsGroupNames)

(For ABCDE, "ddsGroupNames" is replaced by "ddsABCDE", etc)

deseq2 multiple groups • 940 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

I'd say it's likely a bug in your code somewhere. E.g. here's a reproducible example of renaming the levels:

> dds <- makeExampleDESeqDataSet(m=15,betaSD=1)
> dds$condition <- factor(rep(1:5,each=3))
> dds <- DESeq(dds, quiet=TRUE)
> summary(results(dds, contrast=c("condition","5","1")))

out of 999 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 72, 7.2% 
LFC < 0 (down)   : 66, 6.6% 
outliers [1]     : 0, 0% 
low counts [2]   : 329, 33% 
(mean count < 8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> dds$condition <- factor(rep(5:1,each=3))
> dds <- DESeq(dds, quiet=TRUE)
> summary(results(dds, contrast=c("condition","1","5")))

out of 999 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 72, 7.2% 
LFC < 0 (down)   : 66, 6.6% 
outliers [1]     : 0, 0% 
low counts [2]   : 329, 33% 
(mean count < 8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
ADD COMMENT

Login before adding your answer.

Traffic: 356 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6