Question

lmFit, number of proteins

AyHi • 0

@3decdc93

Last seen 3.6 years ago

Sweden

Hi, I am using lmFit in R. I am new to R, to lmFit, and to this area of study, so I would greatly appreciate your input.

The study asks which of 150 proteins show different mean values between 2 groups (such as cases and controls). Our data are organised as 500 rows for 500 unique individuals and 151 columns: 150 proteins plus one 2-level group variable (control=0, case=1).

I have used the R code from Hamel Patel (2021), Scientific Reports, published at: https://zenodo.org/record/3895886#.YJKg893RbDc. I refrain from pasting the code here as I am not sure whether that is allowed, although the authors made it freely available. Since I am not familiar with every aspect of this work, I asked statisticians to verify my script, and they assured me that I am using it correctly given the data.

My question concerns the second part of our study. We have a set of 4 proteins (i.e., column variables) derived from another context. We would like to examine whether these 4 proteins differ between the 2 groups (case vs control). The sample size, 500, remains the same.

Given that lmFit can utilise variance information across dependent variables (i.e., the 4 proteins), I think that using lmFit is better than analysing each of the 4 proteins one by one with linear regressions. However, because 4 proteins is far fewer than in the usual applications of lmFit, I would like to ask whether there are reasons why lmFit may not be suitable.

Thank you in advance for your time.

lmFit • 1.6k views

ADD COMMENT • link 4.6 years ago • updated 4.5 years ago AyHi • 0

score 2 · Accepted Answer · 2021-05-05

Gordon Smyth 53k

@gordon-smyth

Last seen 4 hours ago

WEHI, Melbourne, Australia

No, there's no reason not to use lmFit. The only difference is that, with only 4 proteins of interest, you don't need to apply proteome-wide FDR adjustment. So, when you examine the final topTable, use the p-values instead of the FDR values for the four proteins of interest.

ADD COMMENT • link 4.6 years ago Gordon Smyth 53k

Is there a precise rule at which number of rows one must apply FDR correction?

ADD REPLY • link 4.6 years ago ATpoint ★ 5.0k

If you really want to be precise, then you should apply multiple testing adjustment for any number of proteins that you test. But the adjustment only needs to be over the proteins you are interested in testing, e.g., 4 proteins in this case, not over all proteins given to lmFit.

A general rule is to subset the fit object for the proteins that you want to test, then apply topTable() just to the subsetted object.
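As a rough sketch of that rule (not code from this thread or from the cited script), the example below simulates data with the layout described in the question, fits all 150 proteins so that eBayes() can borrow variance information across them, and only then subsets the fit to four proteins before calling topTable(). The simulated data, the object names, and the four protein names are all made up for illustration. It also assumes the four proteins of interest are among those passed to lmFit, and it uses adjust.method = "holm" as just one possible choice.

```r
library(limma)

set.seed(1)
n_samples  <- 500
n_proteins <- 150

# Simulated stand-in for the real data: individuals in rows, proteins in columns
dat <- matrix(rnorm(n_samples * n_proteins), nrow = n_samples,
              dimnames = list(NULL, paste0("protein", seq_len(n_proteins))))
group <- rep(c(0, 1), each = n_samples / 2)   # control = 0, case = 1

# limma expects features (proteins) in rows and samples in columns
expr <- t(dat)

# Numeric 0/1 group gives design columns "(Intercept)" and "group"
design <- model.matrix(~ group)

# Fit and moderate the variances using all 150 proteins
fit <- eBayes(lmFit(expr, design))

# Hypothetical names for the four proteins of interest
proteins_of_interest <- c("protein3", "protein17", "protein42", "protein99")

# Subset the fit first, so the adjustment is taken over four tests only
idx <- match(proteins_of_interest, rownames(fit$coefficients))
fit_sub <- fit[idx, ]

res <- topTable(fit_sub, coef = "group", number = Inf, adjust.method = "holm")
print(res)
```

Because the subsetting happens after eBayes(), the moderated t-statistics and raw p-values in `res` are the same as in the full 150-protein table; only the `adj.P.Val` column changes, since it is now computed over four tests.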

With only 4 proteins, I would tend to apply Holm adjustment instead of BH (FDR). That's not a statistical rule, just an interpretation issue. With only 4 proteins, controlling the FDR at a low level is the same as controlling the familywise type I error rate.
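To see how the two adjustments compare on a set of four tests, here is a small illustration using base R's p.adjust(). The four p-values and their labels are invented for the example and do not come from any real analysis.

```r
# Hypothetical raw p-values for four proteins (illustrative only)
p <- c(A = 0.004, B = 0.020, C = 0.030, D = 0.400)

# Holm controls the familywise error rate; BH controls the false discovery rate
adjusted <- cbind(raw  = p,
                  holm = p.adjust(p, method = "holm"),
                  BH   = p.adjust(p, method = "BH"))
print(adjusted)
```

Holm-adjusted p-values are never smaller than the BH-adjusted ones, so Holm is the more conservative of the two; with only four tests the gap is usually small.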