lmFit, number of proteins
AyHi • 0

Hi, I am using lmFit in R. I am new to R, to lmFit and to this study context, so I would highly appreciate your input.

The study asks which of 150 proteins differ in mean value between 2 groups (cases and controls). Our data consist of 500 rows, one per unique individual, and 151 columns: 150 proteins plus one 2-level group variable (control=0, case=1).

I have used the R code from Hamel Patel (2021), Scientific Reports, published at: https://zenodo.org/record/3895886#.YJKg893RbDc I refrain from pasting the code here as I am not sure whether that is allowed, although the authors made it freely available. Since I am not familiar with every aspect of this work, I asked statisticians to verify my script, and they assured me that I am using it correctly given the data.

My question concerns the second part of our study. We have a set of 4 proteins (i.e., column variables) derived from another context. We would like to examine whether these 4 proteins differ between the 2 groups (case vs control). The sample size, 500, remains the same.

Given that lmFit can utilise variance information across dependent variables (i.e., the 4 proteins), I think that using lmFit is better than analysing each of the 4 proteins one by one with linear regressions. However, because 4 proteins is far fewer than in the usual applications of lmFit, I would like to ask whether there are reasons why lmFit may not be suitable.

Thank you in advance for your time.

@gordon-smyth

No, there's no reason not to use lmFit. The only difference is that, with only 4 proteins of interest, you don't need to apply proteomewide FDR adjustment. So, when you examine the final topTable, use the p-values instead of the FDR values for the four proteins of interest.

Is there a precise rule for the number of rows at which one must apply FDR correction?

If you really want to be precise, then you should apply multiple testing adjustment for any number of proteins that you test. But the adjustment only needs to be over the proteins you are interested in testing, e.g., 4 proteins in this case, not over all proteins given to lmFit.

A general rule is to subset the fit object for the proteins that you want to test, then apply topTable() just to the subsetted object.
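
A minimal sketch of that subsetting workflow follows. It runs on simulated data laid out as in the question, and it assumes that the 4 proteins of interest are among the 150 that were fitted. The protein names, the `group` coding and all object names are made up for illustration, so treat this as one possible way to do it rather than a definitive script.

```r
library(limma)

# Simulated stand-in for the data described in the question:
# 500 individuals in rows, 150 proteins in columns, plus a 0/1 group variable.
set.seed(1)
n_ind <- 500
n_prot <- 150
group <- rep(c(0, 1), each = n_ind / 2)
dat <- matrix(rnorm(n_ind * n_prot), nrow = n_ind,
              dimnames = list(NULL, paste0("protein", seq_len(n_prot))))

# limma expects proteins in rows and samples in columns, so transpose.
expr <- t(dat)
design <- model.matrix(~ group)

# Fit all 150 proteins so that eBayes() can borrow variance
# information across them.
fit <- eBayes(lmFit(expr, design))

# Hypothetical names for the 4 proteins of interest.
proteins_of_interest <- c("protein3", "protein17", "protein42", "protein99")
idx <- match(proteins_of_interest, rownames(fit$coefficients))

# Subset the fit object, then call topTable() on the subset only,
# so the adjustment is over 4 tests rather than 150.
fit4 <- fit[idx, ]
tab <- topTable(fit4, coef = "group", number = Inf, adjust.method = "holm")
print(tab)
```

The `adj.P.Val` column of `tab` is then adjusted over the 4 selected proteins only, while the moderated t-statistics still use the variance prior estimated from all 150.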

With only 4 proteins, I would tend to apply Holm adjustment instead of BH (FDR). That's not a statistical rule, just an interpretation issue. With only 4 proteins, controlling the FDR at a low level is the same as controlling the familywise type I error rate.
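
To see how the two adjustments compare on a family of 4 tests, here is a small base R illustration using `p.adjust()`. The p-values are invented purely for the example and do not come from any real analysis.

```r
# Made-up raw p-values for 4 proteins (illustration only).
p_raw <- c(protein3 = 0.004, protein17 = 0.012,
           protein42 = 0.030, protein99 = 0.200)

# Holm controls the familywise error rate; BH controls the FDR.
p_holm <- p.adjust(p_raw, method = "holm")
p_bh <- p.adjust(p_raw, method = "BH")

print(cbind(raw = p_raw, holm = p_holm, BH = p_bh))
```

Holm-adjusted values are never smaller than the BH-adjusted ones, and with so few tests the two tend to lead to similar conclusions.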

Dear Gordon, thank you, your reply was very helpful!
