Hi, I was wondering if I could get confirmation of my understanding of logFC values. As I understand it, a negative logFC means the gene is underexpressed and a positive logFC means it is overexpressed (relative to the reference condition). But is there a cutoff value?
I read in an article that they used a cutoff value of 1.5: a gene has to be below -1.5 to be underexpressed and above 1.5 to be overexpressed, and anything in between is not differentially expressed. However, how is that value chosen? And is there a way to derive the cutoff?
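To make the rule described in the question concrete, here is a minimal Python sketch. It assumes the logFC values are log2 fold changes (the usual convention in tools such as limma, edgeR and DESeq2, but check your own pipeline); the gene names and values are hypothetical. Note that the rule looks only at the size of the change and ignores how variable each gene is.

```python
# Hypothetical log2 fold changes (treatment vs. reference), made up for illustration.
log_fcs = {"geneA": 2.1, "geneB": -0.4, "geneC": -1.8, "geneD": 1.5}

CUTOFF = 1.5  # the article's threshold, assumed here to be on the log2 scale


def classify(lfc, cutoff=CUTOFF):
    """Label a gene with a fixed logFC threshold, as described in the question."""
    if lfc > cutoff:
        return "up"
    if lfc < -cutoff:
        return "down"
    return "not DE"  # includes values exactly at the cutoff


labels = {gene: classify(lfc) for gene, lfc in log_fcs.items()}

# On the linear scale, a log2 cutoff of 1.5 is a fold change of 2**1.5.
fold_change_equivalent = 2 ** CUTOFF

print(labels)  # {'geneA': 'up', 'geneB': 'not DE', 'geneC': 'down', 'geneD': 'not DE'}
print(round(fold_change_equivalent, 2))  # 2.83
```

So a log2 cutoff of 1.5 is fairly stringent: it requires roughly a 2.8-fold change in either direction.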
No, there is no general objective justification for any particular log-fold change threshold. Mathematically speaking, it is possible to reject the null hypothesis at any non-zero log-fold change if the variability is low enough. One could argue that small log-fold changes are not biologically relevant, but the exact definition of "small" is open to interpretation. Larger log-fold changes are also more robustly detected across technologies (e.g., RNA-seq and qPCR), though selecting a threshold on this basis would depend on the sensitivities of the technologies involved. Somewhere between 1.1 and 1.5 on the fold-change scale (i.e., a log2-fold change of roughly 0.14 to 0.58) is a common choice for a "sensible" threshold.
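The point that any non-zero log-fold change can be significant when variability is low can be illustrated with a minimal Python sketch. It assumes a simple normal (Wald-type) approximation, z = logFC / SE, rather than the moderated t-statistics that limma actually uses, and the standard errors are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(lfc, se):
    """Two-sided p-value for H0: logFC == 0 under a normal (Wald-type) approximation."""
    z = abs(lfc) / se
    return 2 * STD_NORMAL.cdf(-z)


small_lfc = 0.2  # hypothetical log2 fold change, only about a 1.15-fold change

# Same observed effect, two hypothetical standard errors.
p_noisy = two_sided_p(small_lfc, se=0.5)     # highly variable gene
p_precise = two_sided_p(small_lfc, se=0.05)  # very consistent gene

print(f"noisy:   p = {p_noisy:.3f}")
print(f"precise: p = {p_precise:.1e}")
```

The same small effect gives a p-value of about 0.69 for the noisy gene but one well below 0.001 for the precise gene, which is why a log-fold change on its own says little about whether a change is real.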
But all this is getting away from the main point, which is the detection of DE genes. If you want to do this in a statistically rigorous manner, use the Benjamini-Hochberg (BH)-adjusted p-values to control the false discovery rate (FDR). This ensures that the expected proportion of false positives in your set of significant DE genes is at or below a chosen level (usually 5%). Now, you might say that this approach also involves the selection of an arbitrary threshold. However, with this approach, at least the choice of threshold is directly related to the probability of whether the genes are truly DE or not. A log-fold change threshold doesn't tell you much about the error rate, as it doesn't account for the variability of the expression values.
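For readers who want to see what the BH adjustment does, here is a minimal Python sketch of the standard step-up procedure; it is intended to give the same values as R's `p.adjust(p, method = "BH")`. The raw p-values and gene names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the same order as the input."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order of p-value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from a DE test.
raw = {"geneA": 0.01, "geneB": 0.045, "geneC": 0.02, "geneD": 0.30, "geneE": 0.001}
names = list(raw)
adj = dict(zip(names, bh_adjust([raw[n] for n in names])))

FDR = 0.05
de_genes = sorted(n for n, q in adj.items() if q <= FDR)

for n in names:
    print(f"{n}: raw p = {raw[n]:.3f}, BH-adjusted = {adj[n]:.4f}")
print("DE at 5% FDR:", de_genes)
```

Here geneB has a raw p-value below 0.05 but an adjusted value of about 0.056, so it is not called DE; only geneA, geneC and geneE pass at a 5% FDR.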
Finally, if you do need a log-fold change threshold, use the treat function (from limma) and select DE genes on the basis of its adjusted p-values, rather than filtering on the observed log-fold changes afterwards. This ensures that the FDR is controlled while only considering genes with log-fold changes above a minimum value.
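The idea behind testing against a threshold can be sketched in Python. This is not limma's implementation: it assumes a normal approximation instead of moderated t-statistics, and the estimates and standard errors are hypothetical. It tests the null hypothesis |logFC| <= tau, evaluating the p-value at the boundary of that null region.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return STD_NORMAL.cdf(-z)


def treat_like_p(lfc, se, tau):
    """P-value for H0: |logFC| <= tau (normal approximation of the TREAT idea).

    The null region is [-tau, tau]; the rejection probability is largest at its
    boundary, so the p-value is computed there. With tau = 0 this reduces to the
    ordinary two-sided test of logFC == 0.
    """
    a = abs(lfc)
    return upper_tail((a - tau) / se) + upper_tail((a + tau) / se)


TAU = math.log2(1.5)  # hypothetical threshold: 1.5-fold, about 0.585 on the log2 scale

# Hypothetical estimates: (log2 fold change, standard error).
genes = {"geneX": (0.3, 0.05), "geneY": (1.2, 0.1)}

results = {
    name: (treat_like_p(lfc, se, 0.0), treat_like_p(lfc, se, TAU))
    for name, (lfc, se) in genes.items()
}

for name, (p_zero, p_thresh) in results.items():
    print(f"{name}: p(logFC = 0) = {p_zero:.2e}, p(|logFC| <= {TAU:.3f}) = {p_thresh:.2e}")
```

geneX is highly significant against zero but not against the 1.5-fold threshold, while geneY passes both. In practice the per-gene threshold p-values would then be BH-adjusted across all genes; in R with limma this corresponds to something like `treat(fit, lfc = log2(1.5))` followed by `topTreat(...)`.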
Hi, just in case no DE genes are identified, how should I proceed? Is it always necessary to find DE genes? I would be happy to get some resources on presenting data where it is possible to show that "there are no DE genes". The findings could be "biologically" significant although statistically "not significant".