Dear Community,
I would like to ask an important question about interpreting the output of the topTable function. I know that this is not a general statistics forum, but after fitting my model with limma I checked the confint argument and set confint=0.95, so that topTable returns 95% confidence intervals for the logFCs.
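For reference, here is a sketch of how, as far as we understand limma's implementation, topTable(..., confint=0.95) builds these intervals. Each interval is symmetric around the logFC and uses the same moderated standard error as the moderated t-statistic. The notation is ours: $c = 0.95$, $\hat\beta_g$ is the reported logFC, $\tilde s_g^2$ is the posterior variance (s2.post from eBayes), $u_g$ is the unscaled standard deviation (stdev.unscaled) and $d_g$ is the total degrees of freedom (df.total):

$$
\begin{aligned}
\mathrm{SE}_g &= \tilde{s}_g\,u_g, \qquad \tilde{t}_g = \frac{\hat{\beta}_g}{\mathrm{SE}_g},\\
\left[\mathrm{CI.L}_g,\ \mathrm{CI.R}_g\right] &= \hat{\beta}_g \mp t_{(1+c)/2,\,d_g}\,\mathrm{SE}_g .
\end{aligned}
$$

The interval, the moderated t-statistic and therefore the p-value are all built from the same logFC and the same $\mathrm{SE}_g$.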
Here is part of the output for some selected genes (after subsetting my topTable results):
```
> head(significant, 20)
             GENE_SYMBOL     logFC   adj.P.Val      MAD      CI.L       CI.R
A_23_P114903       HSPA6  3.595423 0.048103409 3.691256  1.729956  5.4608897
A_24_P245379    SERPINB2  2.910139 0.027037437 2.955397  1.676020  4.1442576
A_23_P161698        MMP3  2.581726 0.022127328 2.692339  1.569174  3.5942793
A_23_P66241         MT1M  2.857147 0.030223818 2.517837  1.601280  4.1130140
A_23_P206724        MT1E  2.574222 0.023364607 2.464120  1.535268  3.6131756
A_32_P87013        CXCL8  3.531625 0.015656484 2.316576  2.304604  4.7586457
A_24_P125096        MT1X  2.528568 0.017441204 2.314788  1.622389  3.4347482
A_23_P37983         MT1B  2.421996 0.018654318 2.282815  1.531431  3.3125618
A_23_P206707        MT1G  2.395756 0.025898285 2.259628  1.395022  3.3964891
A_23_P71037          IL6  2.135955 0.037906167 2.179081  1.119542  3.1523685
A_23_P427703        MT1L  2.364336 0.017614832 2.136130  1.514988  3.2136838
A_23_P163782      MT1HL1  2.284161 0.020399185 2.112673  1.414799  3.1535226
A_23_P315364       CXCL2  2.426432 0.006127387 1.995657  1.900835  2.9520286
A_23_P414343        MT1H  2.380918 0.014319566 1.995214  1.598750  3.1630865
A_23_P365738         ARC  2.102056 0.030560036 1.993690  1.170771  3.0333421
A_23_P1691          MMP1  2.154745 0.020770068 1.982411  1.329721  2.9797693
A_23_P108842       DUSP2  1.972652 0.014964790 1.973428  1.306712  2.6385916
A_23_P54840         MT1A  1.974725 0.024914831 1.839428  1.158456  2.7909943
A_23_P15727       FKBP10 -1.908220 0.040797553 1.724086 -2.840906 -0.9755334
A_24_P251764       CXCL3  1.938549 0.007849170 1.667274  1.424668  2.4524294
```
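Each interval in this output should be the logFC plus or minus a margin of error. Its midpoint should therefore reproduce the logFC, and its half-width is the margin. A small check on a few rows copied from the output above (Python is used only for illustration; the names `rows` and `midpoint_and_margin` are hypothetical):

```python
# A few rows copied from the topTable output above: gene -> (logFC, CI.L, CI.R).
# Assumption: confint=0.95 reports logFC minus/plus one margin of error.
rows = {
    "HSPA6": (3.595423, 1.729956, 5.4608897),
    "CXCL8": (3.531625, 2.304604, 4.7586457),
    "CXCL2": (2.426432, 1.900835, 2.9520286),
    "FKBP10": (-1.908220, -2.840906, -0.9755334),
}


def midpoint_and_margin(ci_l, ci_r):
    """Return the centre and the half-width (margin of error) of an interval."""
    return (ci_l + ci_r) / 2, (ci_r - ci_l) / 2


for gene, (logfc, ci_l, ci_r) in rows.items():
    mid, margin = midpoint_and_margin(ci_l, ci_r)
    # The centre should match logFC up to the rounding of the printed table.
    print(f"{gene}: logFC={logfc:.4f} midpoint={mid:.4f} margin={margin:.4f}")
```

The HSPA6 interval is wide (margin about 1.87), even though its logFC is large. The CXCL2 interval is narrow (margin about 0.53).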
So how can I interpret and "evaluate" the returned confidence interval for a specific gene with a specific logFC? For instance, is the first gene, HSPA6, which has a relatively large fold change, more "significant" because both CI.L and CI.R are > 1? Or is it enough that only one of them is > 1, given that this gene is upregulated? Or is my approach to this completely wrong? As another example, suppose a gene has a significant adjusted p-value (FDR < 0.05), a logFC of -0.5, CI.L = -0.8 and CI.R = -0.4. How should that case be evaluated?
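One conventional reading of a 95% CI for a logFC is sketched below as an illustration, not as limma's own rule. The reference value for "no change" is 0, not 1. An interval that excludes 0 points to a change in that direction, and for a per-gene interval this matches an unadjusted p-value below 0.05 rather than the FDR. An interval lying entirely beyond ±1 suggests the true change is more than 2-fold. The function name `classify_interval` and the threshold of 1 are illustrative assumptions:

```python
def classify_interval(ci_l, ci_r, threshold=1.0):
    """Describe a logFC confidence interval relative to 0 and to +/-threshold.

    threshold=1.0 corresponds to a 2-fold change on the log2 scale; it is an
    illustrative choice, not a limma parameter.
    """
    # Direction: does the interval exclude 0 (no change)?
    if ci_l > 0:
        direction = "up"
    elif ci_r < 0:
        direction = "down"
    else:
        direction = "contains 0"

    # Size: is the whole interval beyond, inside, or across +/-threshold?
    if ci_l > threshold or ci_r < -threshold:
        size = "beyond"
    elif -threshold < ci_l and ci_r < threshold:
        size = "within"
    else:
        size = "straddles"
    return direction, size


print(classify_interval(1.729956, 5.4608897))    # HSPA6
print(classify_interval(-2.840906, -0.9755334))  # FKBP10
print(classify_interval(-0.8, -0.4))             # hypothetical gene from the question
```

With these numbers, HSPA6 comes out as up and beyond 1. FKBP10 comes out as down but straddling -1, because its CI.R is -0.98. The hypothetical gene with CI [-0.8, -0.4] comes out as down but within ±1: the change is confidently non-zero, but it is also confidently less than 2-fold.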
Thank you,
Konstantinos
Dear Aaron, thank you for your explanation! From your last explanation, I think I misinterpreted part of how the CIs should be read. So the size of the interval bounds (CI.L and/or CI.R) does not count for that much, provided that the adjusted p-value for the gene is below a threshold (e.g. < 0.05)? If my understanding is correct, would the "ideal scenario" for a gene be a significant p-value and a logFC different from zero, but with a narrower CI, like the example I gave above?
Well, "ideal" depends on what you want to do. If you're just interested in whether a gene is DE or not, then looking at the adjusted p-value would be sufficient. Genes with low p-values are often accompanied by non-zero log-fold changes (even more so if you use limma's treat function) and by CIs that do not contain zero. However, I don't think you need to explicitly select for narrow CIs, because that is already taken into account in the p-value calculation.

Yes, if you want a gene to be DE, then a small p-value, a large logFC and a narrow CI are the "ideal". However (as Aaron says), the latter two are already built into the p-value as far as statistical significance is concerned.
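The point that logFC and interval width are already built into the p-value can be checked on the table from the question. The sketch below assumes every gene shares the same df.total, which is typically the case when there are no missing values and eBayes is run without robust=TRUE. Under that assumption, |logFC| divided by the CI half-width is proportional to the absolute moderated t-statistic, so ranking genes by this ratio should reproduce the adjusted p-value ranking. The names `genes` and `signal_to_margin` are hypothetical:

```python
# Rows from the topTable output above: gene -> (logFC, CI.L, CI.R, adj.P.Val)
genes = {
    "HSPA6": (3.595423, 1.729956, 5.4608897, 0.048103409),
    "SERPINB2": (2.910139, 1.676020, 4.1442576, 0.027037437),
    "CXCL8": (3.531625, 2.304604, 4.7586457, 0.015656484),
    "IL6": (2.135955, 1.119542, 3.1523685, 0.037906167),
    "CXCL2": (2.426432, 1.900835, 2.9520286, 0.006127387),
    "FKBP10": (-1.908220, -2.840906, -0.9755334, 0.040797553),
    "CXCL3": (1.938549, 1.424668, 2.4524294, 0.007849170),
}


def signal_to_margin(logfc, ci_l, ci_r):
    """|logFC| divided by the CI half-width.

    Assumes every gene shares the same df.total, so the t quantile used for
    the interval is a common constant and this ratio is proportional to the
    absolute moderated t-statistic.
    """
    return abs(logfc) / ((ci_r - ci_l) / 2)


# Larger ratio = larger |t| = smaller p-value, if the assumption holds.
by_ratio = sorted(genes, key=lambda g: signal_to_margin(*genes[g][:3]), reverse=True)
by_adj_p = sorted(genes, key=lambda g: genes[g][3])
print(by_ratio)
print(by_adj_p)
```

On these seven genes the two orderings agree. HSPA6 has the largest logFC of the group but also the widest interval, and it ends up with the largest adjusted p-value.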
Dear Aaron, one last comment about the approach you mentioned for "detecting" non-significant genes. In this case, to identify non-DE genes with near-zero logFC, should I require both CI.L > -1 and CI.R < 1 in addition to an adjusted p-value > 0.05, so that the near-zero logFC is estimated as precisely as possible?
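The condition proposed here (adjusted p-value > 0.05 together with CI.L > -1 and CI.R < 1) can be written as a simple filter. The CI part is what separates a gene whose change is small from a gene whose interval is simply too wide to say anything; a large p-value alone cannot make that distinction. This is close in spirit to an equivalence test (TOST). Strictly, a 95% CI lying inside the bounds corresponds to TOST at the 2.5% level per gene, with no multiple-testing adjustment. The function name `looks_unchanged`, its `bound` and `alpha` defaults, and the example values are all hypothetical:

```python
def looks_unchanged(adj_p, ci_l, ci_r, bound=1.0, alpha=0.05):
    """Hypothetical filter for 'near-zero' genes.

    Requires the gene to be non-significant (adj_p > alpha) AND the whole
    logFC confidence interval to lie strictly inside (-bound, bound).
    """
    not_de = adj_p > alpha
    ci_inside = -bound < ci_l and ci_r < bound
    return not_de and ci_inside


# Made-up genes (not from the output above): name -> (adj.P.Val, CI.L, CI.R)
examples = {
    "tight_near_zero": (0.80, -0.30, 0.25),
    "wide_uninformative": (0.60, -1.70, 1.40),
    "significant": (0.01, 0.40, 1.20),
}
for name, (p, lo, hi) in examples.items():
    print(name, looks_unchanged(p, lo, hi))
```

Only the first gene passes. The second is not significant either, but its interval is compatible with changes larger than 2-fold in both directions.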