Question

How to use q.value cutt off in "decideTests" function for RNAseq differential expression analysis

0

Entering edit mode

anver • 0

@anver-8131

Last seen 9.5 years ago

Germany

I am analyzing my RNAseq data with two treatments (mock and trt) with three time points each with three replicates using limma. After model fitting etc I calculated qvalues using qvalue package. Then calculated differentially expressed genes for each contrast. They are higher with qvalue cutoff of 0.01 compared p.val cutoff of 0.01 in decideTests (please see the code below).

My major question is how would you incorporate a qval cutoff in your decideTests. I would also like to know why qvalue gives more differentially expressed genes than p.val for the same cutoff since qval should be more stringent.

Appreciate your help.

Sha

#################################################

#My code########

################

#calculating qvals

library(qvalue)

q.val = qvalue(eb.fit$p.val)$qvalues

q.val_contrast1 = q.val [, 'trt_1h - mock_1h']

q.val_contrast2 = q.val [, 'trt_9h - mock_9h']

q.val_contrast3 = q.val [, 'trt_24h - mock_24h']

#Number of DEGs based on qvals

sum(q.val_contrast1 < 0.01)

[1] 6029

sum(q.val_contrast2 < 0.01)

[1] 6008

sum(q.val_contrast3 < 0.01)

[1] 3785

##Differentially expressed genes with decideTests

#####################################

results <- decideTests(eb.fit, p.value = 0.01)

summary (results)

# trt_1h - mock_1h trt_9h - mock_9h trt_24h - mock_24h

# -1 1827 2260 555

# 0 16172 16173 18382

# 1 2932 2498 1994

rnaseq decideTests limma qvalue • 3.5k views

ADD COMMENT • link 9.5 years ago anver • 0

score 1 · Answer 1 · 2015-06-12

I'm not the author of decideTests, so I'm not sure what it does. However, qvalue can be used to determine significance cut-offs once p-values are obtained using the qvalue functions as demonstrated in the qvalue package's vignette. With regards to q-values sometimes being smaller than p-values, this is expected and it is answered in detail in the FAQ in both the vignette and here:

https://github.com/jdstorey/qvalue/blob/master/README.md

This package produces "adjusted p-values", so how is it possible that my adjusted p-values are smaller than my original p-values?

The q-value is not an adjusted p-value, but rather a population quantity with an explicit definition. The package produces estimates of q-values and the local FDR, both of which are very different from p-values. The package does not perform a Bonferroni correction on p-values, which returns "adjusted p-values" that are larger than the original p-values. The maximum possible q-value is pi_0, the proportion of true null hypotheses. The maximum possible p-value is 1. When considering a large number of hypothesis tests where there is a nontrivial fraction of true alternative p-values, we will have both an estimate pi_0 < 1 and we will have some large p-values close to 1. Therefore, the maximal estimated q-value will be less than or equal to the estimated pi_0 but there will also be a number of p-values larger than the estimated pi_0. It must be the case then that at some point p-values become larger than estimated q-values.

score 1 · Answer 2 · 2015-06-12

The decideTests function uses the Benjamini-Hochberg correction to compute adjusted p-values. The p.value argument refers to the threshold for the adjusted p-value, not the raw p-values. From what I understand, the q-values reported by qvalue are usually less conservative than the BH-adjusted p-values as the former accounts for the proportion of true nulls in the data set. This may explain why you get more DE genes after correction with qvalue, compared to the default correction with the BH method.

On another note, when you have multiple comparisons, decideTests will apply the BH method separately to the p-values from each DE comparison at each time point. In contrast, the qvalue function expects a vector of p-values. I'm guessing that the q-values will be computed using the p-values from all of the DE comparisons at once. If you want to compare the methods, a more appropriate approach would be to supply each column of p-values separately to qvalue, or to set method="global" in decideTests.

score 0 · Answer 3 · 2015-06-13

0

Entering edit mode

anver • 0

@anver-8131

Last seen 9.5 years ago

Germany

Thank you very much Storey and Aaron.

I very appreciate your responses. That clarified my doubts and misconceptions. However I want to get the differentially expressed genes based on the qvalue cutoff along with associated qvalues and fdrs. I use the following code to get the differentially expressed genes' table.

q.val = qvalue(eb.fit$p.val)$qvalues

q.val_contrast1 = q.val [, 'trt_1h - mock_1h']

diffexp_1h = subset(eb.fit$coef, q.val_contrast1 < 0.01)

This gives only the genes row.name) and the logFC but no q.val or lfdr calculated based on the method suggested by Storey (above). Could you please let me know how could I get a table { "gene" "logFC" "q.val" "lfdr"}.

Thank you.

Sha

ADD COMMENT • link 9.5 years ago anver • 0

0

Entering edit mode

Something like the code below might work. Assuming that the first coefficient in your eb.fit object corresponds to the contrast between treatment and mock at 1 hour:

results <- topTable(eb.fit, coef=1, n=Inf)
qout <- qvalue(results$P.Value)
results$q.val <- qout$qvalues
results$lfdr <- qout$lfdr

Of course, you can just change coef according to the timepoint you're interested in, and you can subset results to remove columns that you don't want to see.

ADD REPLY • link 9.5 years ago Aaron Lun ★ 28k

score 0 · Answer 4 · 2015-06-14

0

Entering edit mode

anver • 0

@anver-8131

Last seen 9.5 years ago

Germany

Thank you very much Aaron. Very much appreciate it.

Sha

ADD COMMENT • link 9.5 years ago anver • 0