Hi, I am using lmFit in R. I am new to R, lmFit and the study context, and I would highly appreciate your input.
The study concerns which of 150 proteins show different mean values between 2 groups (cases and controls). Our data comprise 500 rows for 500 unique individuals and 151 columns: 150 proteins plus 1 two-level group variable (control=0, case=1).
I have used the R code from Hamel Patel (2021), Scientific Reports, published at: https://zenodo.org/record/3895886#.YJKg893RbDc I refrain from pasting the code here as I am not sure whether it is allowed, although the authors made it freely available. Since I am not familiar with every aspect of this work, I asked statisticians to verify my script, and they assured me that I am using it correctly for these data.
My question concerns the second part of our study. We have a set of 4 proteins (i.e., column variables) derived from another context. We would like to examine whether these 4 proteins differ between the 2 groups (case vs control). The sample size, 500, remains the same.
Given that lmFit can utilise variance information across dependent variables (i.e., the 4 proteins), I think lmFit is better than analysing each of the 4 proteins one by one with linear regressions. However, because 4 proteins is far fewer than in the usual application of lmFit, I would like to ask whether there are reasons that lmFit may not be suitable.
Thank you in advance for your time.
Is there a precise rule for the number of rows at which one must apply FDR correction?
If you really want to be precise, then you should apply multiple testing adjustment for any number of proteins that you test. But the adjustment only needs to be over the proteins you are interested in testing, e.g., 4 proteins in this case, not over all proteins given to lmFit.
A general rule is to subset the fit object for the proteins that you want to test, then apply topTable() just to the subsetted object.
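For readers who want to see the subsetting step concretely, here is a minimal sketch of what that might look like. It rests on assumptions that are not from the thread: the data are simulated stand-ins for the real 500 x 151 table, the protein names (`protein1` ... `protein150`) and the 4 names in `keep` are hypothetical, and the model is fitted to all 150 proteins before subsetting. Only `lmFit()`, `eBayes()` and `topTable()` from limma are used.

```r
library(limma)

set.seed(1)
n <- 500   # individuals
p <- 150   # proteins

# Simulated stand-in for the real data: individuals in rows, proteins in columns
dat <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
colnames(dat) <- paste0("protein", seq_len(p))
dat$group <- rep(c(0, 1), each = n / 2)   # control = 0, case = 1

# limma expects features (proteins) in rows and samples in columns, so transpose
expr <- t(as.matrix(dat[, paste0("protein", seq_len(p))]))
design <- model.matrix(~ group, data = dat)

# Fit and moderate using all 150 proteins (assumption: variance information
# is borrowed across the full set, then only 4 proteins are tested)
fit <- eBayes(lmFit(expr, design))

# Hypothetical names for the 4 proteins of interest
keep <- c("protein3", "protein17", "protein58", "protein101")
idx <- match(keep, rownames(fit$coefficients))
fit4 <- fit[idx, ]

# Adjustment is now over these 4 tests only; coef = 2 is the group effect
res <- topTable(fit4, coef = 2, adjust.method = "holm", number = Inf)
print(res)
```

The `adj.P.Val` column in `res` is then adjusted over 4 tests rather than 150. Holm is used here only to match the suggestion in this thread; `adjust.method = "BH"` would give FDR-adjusted values instead.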
With only 4 proteins, I would tend to apply Holm adjustment instead of BH (FDR). That's not a statistical rule, just an interpretation issue. With only 4 proteins, controlling the FDR at a low level is the same as controlling the familywise type I error rate.
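As a small illustration of how the two adjustments compare on a family of 4 tests, the sketch below applies base R's `p.adjust()` to four made-up p-values. The numbers are hypothetical and chosen only for illustration; they are not results from any study.

```r
# Hypothetical raw p-values for 4 proteins (illustrative only)
p_raw <- c(0.004, 0.020, 0.030, 0.400)

# Holm controls the familywise type I error rate
p_holm <- p.adjust(p_raw, method = "holm")

# Benjamini-Hochberg controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

print(data.frame(p_raw, p_holm, p_bh))
```

Holm-adjusted values are never smaller than the BH-adjusted ones, and with so few tests the two sets of values tend to lie close together.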
Dear Gordon, thank you, your reply was very helpful!