I'll answer your second question first: if you want to get all genes, it should be as easy as specifying n=Inf in topTable. This will extract statistics for all genes, sorted by the B-statistic. Of course, you could also use the total number of genes, which can be determined by running nrow on the MArrayLM object you supply to topTable.
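As a rough sketch of the two options, here is a small self-contained run on simulated data. The matrix y, the two-group design and the object names are made up for illustration; only topTable, lmFit, eBayes and nrow are the real functions being discussed.

```r
library(limma)

# Simulated data (an assumption for illustration): 100 genes, 2 groups of 3 samples.
set.seed(1)
y <- matrix(rnorm(100 * 6), nrow = 100,
            dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~group)

# 'fit' is the MArrayLM object that gets passed to topTable.
fit <- eBayes(lmFit(y, design))

# Option 1: ask for every gene with n=Inf.
res.inf <- topTable(fit, coef = 2, n = Inf)

# Option 2: pass the gene count explicitly.
res.n <- topTable(fit, coef = 2, n = nrow(fit))

nrow(res.inf)  # same as nrow(fit)
```

Both calls should give the same table, so n=Inf is mostly a convenience.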
As for your first question, topTable only supports filtering on a minimum log-fold change. If you want something more complicated, you'll have to do it yourself. For example, if you assign the output of topTable into res, you could do:
keep <- res$logFC < 1 & res$logFC > -1
res[keep,]
... to get all genes with an estimated log-fold change between -1 and 1 (i.e., no more than 2-fold change in either direction). A more rigorous strategy for identifying genes with near-zero log-fold changes is to use confidence intervals:
res <- topTable(fit, coef=2, n=Inf, confint=0.95) # all genes, with 95% CIs for the log-fold changes
keep <- res$CI.L > -1 & res$CI.R < 1
res[keep,]
This will identify genes where the 95% confidence interval for the log-fold change lies entirely between -1 and 1. The use of confidence intervals accounts for any uncertainty in log-fold change estimation (e.g., when the data are variable).
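To see how the two filters differ, here is a minimal sketch on simulated data. Everything about the data (200 genes, 3 vs 3 samples, the last 20 genes made noisier) is an assumption for illustration, and keep.fc and keep.ci are just names picked for this example.

```r
library(limma)

# Simulated data (an assumption for illustration): no true DE,
# with the last 20 genes made much more variable than the rest.
set.seed(42)
ngenes <- 200
y <- matrix(rnorm(ngenes * 6, sd = 0.3), nrow = ngenes,
            dimnames = list(paste0("gene", seq_len(ngenes)), paste0("s", 1:6)))
y[181:200, ] <- y[181:200, ] * 5
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~group)
fit <- eBayes(lmFit(y, design))

# All genes, with 95% confidence intervals for the log-fold change.
res <- topTable(fit, coef = 2, n = Inf, confint = 0.95)

# Filter on the point estimate only.
keep.fc <- abs(res$logFC) < 1

# Filter on the whole confidence interval.
keep.ci <- res$CI.L > -1 & res$CI.R < 1

# Cross-tabulate the two filters.
table(point.estimate = keep.fc, conf.interval = keep.ci)
```

Because the estimate always sits inside its own interval, any gene kept by the interval filter is also kept by the point-estimate filter, but not the other way around. The genes that drop out are the ones whose log-fold change is near zero but poorly estimated.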
However, be warned that filtering on the log-fold change is generally incompatible with filtering on the (adjusted) p-value. Check out the "Note" in ?topTable:
If the fold changes and p-values are not highly correlated, then the use of a fold change cutoff can increase the false discovery rate above the nominal level.
This is because you can get fairly large log-fold changes when genes have highly variable expression between replicates. Selection for large log-fold changes would result in selection of these variable genes that are not significantly DE. Of course, if you're looking for non-DE genes with near-zero log-fold changes, this is less of a problem. Filtering for large p-values would be ad hoc anyway, as the significance statistics won't tell you whether or not the gene is non-DE (absence of evidence does not equal evidence of absence, and all that).
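If it helps, the point about variable genes can be illustrated with a small simulation. This is only a sketch under made-up assumptions: 1000 genes with no true DE, half of them with a standard deviation of 0.2 and half with 2, and 3 replicates per group. The names noisy, big and sig are hypothetical.

```r
library(limma)

# Simulated null data (an assumption for illustration): no gene is truly DE.
set.seed(100)
ngenes <- 1000
noisy <- rep(c(FALSE, TRUE), each = ngenes / 2)  # second half is highly variable
sds <- ifelse(noisy, 2, 0.2)

# rnorm recycles 'sds' down each column, so row i always uses sds[i].
y <- matrix(rnorm(ngenes * 6, sd = sds), nrow = ngenes,
            dimnames = list(paste0("gene", seq_len(ngenes)), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~group)
fit <- eBayes(lmFit(y, design))

# sort.by="none" keeps the original gene order, so rows line up with 'noisy'.
res <- topTable(fit, coef = 2, n = Inf, sort.by = "none")

big <- abs(res$logFC) > 1      # "large" log-fold change
sig <- res$adj.P.Val < 0.05    # significant after BH adjustment

table(noisy = noisy, large.logFC = big)
table(large.logFC = big, significant = sig)
```

In this setup the large log-fold changes come from the variable genes even though nothing is DE, which is why a fold change cutoff on its own tends to pick up noise.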
Thank you very much! It is really helpful. Actually, my goal is to find the non-DE genes, so I think I will try filtering on the confidence intervals as you suggested.