I have applied lmFit to RNAseq data following the code below:
## Fit the model with counts data
fit_counts = lmFit(eset_counts, design)
fit_counts = eBayes(fit_counts)
## Use the contrast matrix to make the comparisons
fit_counts_cont = contrasts.fit(fit_counts, contrast)
fit_counts_cont = eBayes(fit_counts_cont)
I have been able to decide the fit based on the p-value and the log fold change (see code below for a p-value of 0.01 and a lfc=1). But would also like to do this based on the B stats (lods). Is this possible? Is there any parameter that can be included on the decideTests function to do this?
The underlying idea of using also the B stats emerged from the fact that, even using an adjusted p-value of lower than 0.01, we still have ~7000 differentially expressed genes... which is a lot! So, we starting wondering how we could trim down our data using other statistical criteria and came up with the idea of applying different stats "one on top of the other":
1) adjusted p.value <= 0.01
2) log fold change >= 1
3) B stats >= 0
Following your suggestions, I will now do steps 2) and 3) ad hoc.
We don't recommend using statistics "one on top of the other".
As Aaron and I have said in our answers, we instead recommend substituting treat() in place of eBayes(). That will trim down the number of DE genes in the best possible way.
No, decideTests() only works on p-values. This is for two reasons. First, decideTests() is intended to collate statistical tests (hence the name), whereas the B-statistics are a Bayesian quantity. Second, the B-statistics are not on an absolute scale because the prior expected proportion of null hypothesis is by default put to an arbitrary value. See the 'proportion' argument of the eBayes function.
It is possible to estimate the proportion of null hypotheses using the propTrueNull() function, and that would allow the creation of an objective B-statistic cutoff. But for our analyses we just use the p-values.
As Aaron says, we no longer recommend use of lfc in decideTests. The help page for decideTests says:
Note
Although this function enables users to set p-value and lfc cutoffs simultaneously, this is not generally recommended. If the fold changes and p-values are not highly correlated, then the use of a fold change cutoff can increase the false discovery rate above the nominal level. Users wanting to use fold change thresholding are recommended to use treat instead of eBayes, and to leave lfc at the default value when using decideTests.
Firstly, you're working with the BH-adjusted p-value, not the raw p-value. Secondly, you shouldn't use a log-fold change threshold in conjunction with an (adjusted) p-value threshold. If you want to control the size of the log-FC, you should be using the treat function instead of eBayes. Finally, you'll have to filter on the log-odds manually:
This assumes that you've only got one contrast, otherwise you'll have to specify coef in topTable (as B-statistics aren't computed for multiple contrasts). Though, I'm not sure why you would ever want to filter on the B-statistic. Filtering on the adjusted p-value is more sensible, as you're controlling the false discovery rate. Combining filters across multiple statistics is generally ad hoc.
I have not same yet very similar question since I have already asked in stack I would like to share the link here may I request you to have a look and possible solution what and how to go ahead
Thank you for the explanations!
The underlying idea of using also the B stats emerged from the fact that, even using an adjusted p-value of lower than 0.01, we still have ~7000 differentially expressed genes... which is a lot! So, we starting wondering how we could trim down our data using other statistical criteria and came up with the idea of applying different stats "one on top of the other":
1) adjusted p.value <= 0.01
2) log fold change >= 1
3) B stats >= 0
Following your suggestions, I will now do steps 2) and 3) ad hoc.
More suggestions/comments are welcome! :)
Thanks!
We don't recommend using statistics "one on top of the other".
As Aaron and I have said in our answers, we instead recommend substituting treat() in place of eBayes(). That will trim down the number of DE genes in the best possible way.