Question: 10 multiple comparisons in DESeq2
0
gravatar for kissmatee
3.4 years ago by
kissmatee0
kissmatee0 wrote:

Dear All, 

I have an RNA seq experiment with 5 different cell types (=my 5 conditions are the 5 cell types), and I would like to identify those genes that show differential expression between at least 2 cell types (so basically like performing an ANOVA). I am a medical doctor and new to R with very basic knowledge in statistics so I would like to be sure that my code is correct from a statistical perspective. 

Between 5 cell types there are 10 possible comparisons, so my idea is to run these 10 comparisons, combine the gene lists from the comparisons, and then delete the duplicates to get the list of genes that show statistical difference between at least 2 cell types. Using this gene list, I can create an expression profile containing the expression values of these genes in the different cell types that I can use for clustering the genes based on their expression profile in the cell types etc. I will also have the adjusted p-values and fold changes for every comparison in separate tables. Here is an example:

dds <- DESeqDataSetFromMatrix(countData = countData,

    colData = colData,
    design = ~ condition)

colData(dds)

dds$condition <- relevel(dds$condition, "cell_type_A")

Question 1: Is "relevel" required here? I do not have a reference or control cell type/condition. What difference does it make if I do not perform "relevel"?

Then the multiple comparisons:

dds <- DESeq(dds)

res <- results(dds, contrast=list("conditioncell_type_A","conditioncell_type_B"))
resOrdered <- res[order(res$padj),]
resSigTmp <- subset(resOrdered, padj < 0.01)
resSig <- subset(resSigTmp, log2FoldChange < -1 | log2FoldChange > 1)
DEtable_1<-as.data.frame(resSig[order(resSig$log2FoldChange,decreasing=TRUE),])


res <- results(dds, contrast=list("conditioncell_type_B","conditioncell_type_C"))
resOrdered <- res[order(res$padj),]
resSigTmp <- subset(resOrdered, padj < 0.01)
resSig <- subset(resSigTmp, log2FoldChange < -1 | log2FoldChange > 1)
DEtable_2<-as.data.frame(resSig[order(resSig$log2FoldChange,decreasing=TRUE),])

...and so on, until creating DEtable_10.

Collecting rownames:

allDEgenes <- c(rownames(DEtable_1), rownames(DEtable_2), ..... , rownames(DEtable_10))'

Removing duplicates:

allDEgenes_unique<-unique(allDEgenes)'

 

Question 2: Is this method statistically correct? Will my adjusted p-values be correct? Do they require further correction considering the 10 comparisons I perform? (I am sorry, these must be silly questions.)

Question 3: Is there a better way of performing this analysis using DEseq2 or some other package?

Question 4: My "coldata" table must have 2 variables, but I only have one: cell type. Since I will only use this variable in my contrast argument, I can put anything as a second variable just to have one, it does not really matter, right?

Thank you for your help!

Best,

Mate

 

rnaseq deseq2 statistics • 2.2k views
ADD COMMENTlink modified 3.4 years ago by Michael Love26k • written 3.4 years ago by kissmatee0
Answer: 10 multiple comparisons in DESeq2
0
gravatar for Ben Mansfeld
3.4 years ago by
Michigan State University
Ben Mansfeld0 wrote:

Q1: I think Relevel-ing is good so that you are at least certain what your base level is. Since you are using specific contrast that might not be an issue. If you were to just ask for results(dds), then making sure you knew what your base level was would be important. I may be wrong about this though. 

Q2: the pvals should be correct for that specific contrast. 

Q3: maybe some sort of k-means clustering?

Q4: why does coldata have to have 2 variables?

from DESeq help: 

colData    
for matrix input: a DataFrame or data.frame with at least a single column. Rows of colData correspond to columns of countData

Hope this helps a little....

-Ben

ADD COMMENTlink written 3.4 years ago by Ben Mansfeld0
Answer: 10 multiple comparisons in DESeq2
0
gravatar for Michael Love
3.4 years ago by
Michael Love26k
United States
Michael Love26k wrote:

You can perform a likelihood ratio test (as is done in ANOVA). See the DESeq2 vignette section on "likelihood ratio tests" and the section in ?results about likelihood ratio tests as well.

dds <- DESeq(dds, test="LRT", reduced=~1)
res <- results(dds)

Other Qs:

No you don't have to relevel. This is only if you plan to use results() with no arguments.

colData does not have to have 2 variables

ADD COMMENTlink written 3.4 years ago by Michael Love26k

Hi Michael,

Thank you for taking time to answer my questions.

Does that mean that the method described above is not correct and LRT should be used in this case? Will combining 10 comparisons yield too many false positives?

Thanks,

Mate

ADD REPLYlink written 3.4 years ago by kissmatee0
Yes. It's opportunistic you could say. You're not correcting for doing all these extra comparisons. And the LRT gives you the answer - - any differences among the five - - using a single test which is a cleaner approach.
ADD REPLYlink written 3.4 years ago by Michael Love26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 408 users visited in the last hour