Question

Differential protein/biomarker expression using limma: is that possible?

1

r.i.s.alnuwaysir ▴ 10

@d1feef57

Last seen 18 months ago

Rijnsburg

Hey there! Hope everyone is doing well :)

I have a question about using the limma package for data that is neither RNA-seq nor microarray. I have a dataset of protein/biomarker quantifications (around 365 proteins) and I would like to get log-fold changes (i.e. differential protein expression) for my conditions of interest. However, the measurement technique used for my dataset (Proximity Extension Assay technology) does not provide absolute quantification, but normalized protein expression (NPX). NPX is an arbitrary unit on a log2 scale. These normalized expression values are normally distributed and are highly correlated with absolute protein quantification (Spearman's correlation can reach up to 0.85 for the same proteins).

My understanding is that, since my data are normalized expression values, similar to normalized microarray intensity values, they can be analyzed using the same pipelines as for microarrays. However, I could not find any mention in the limma user guide of whether limma can also be used for this kind of data.

My questions are :

1) is it possible to use limma to find differentially expressed proteins in my case? Also, is it a valid way for such analysis?

2) if yes, should I also set trend=T and robust=T, or just use the normal pipeline?

3) if that's not possible, any thoughts or suggestions to do differential protein expression?

Thank you very much in advance for your help!

NOTE: this question has already been posted in BIOSTARS, but reposted here after a suggestion from ATpoint

Proteomics NPX limma • 3.7k views

ADD COMMENT • link updated 3 months ago by briankleiboeker • 0 • written 3.1 years ago by r.i.s.alnuwaysir ▴ 10

score 1 · Answer 1 · 2021-03-16

1

Gordon Smyth 50k

@gordon-smyth

Last seen 6 hours ago

WEHI, Melbourne, Australia

I have no experience with NPX but, from the information you give here, limma should be able to analyse it using the same pipeline as for single-channel microarrays. It sounds analogous to PCR data, for which limma has been used successfully.

I don't know whether trend=TRUE will be necessary. Try it and see. Use plotSA(fit) to examine the trend. Same with robust=TRUE.
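
A minimal sketch of that workflow, assuming a numeric matrix of NPX values with proteins in rows and samples in columns. The data here are simulated, and the object names (`npx`, `condition`) and the two-group design are placeholders for illustration only, not taken from the original dataset.

```r
library(limma)

set.seed(1)
# Simulated stand-in for an NPX matrix: 365 proteins (rows) x 12 samples (columns),
# already on a log2 scale. Real data would be loaded from the assay export instead.
npx <- matrix(rnorm(365 * 12, mean = 5, sd = 1), nrow = 365,
              dimnames = list(paste0("protein", 1:365), paste0("sample", 1:12)))

# Hypothetical two-group design: 6 controls followed by 6 cases
condition <- factor(rep(c("control", "case"), each = 6),
                    levels = c("control", "case"))
design <- model.matrix(~ condition)  # 2nd coefficient = case vs control log2 fold change

# Same linear-model fit as for single-channel microarray data
fit <- lmFit(npx, design)

# Empirical Bayes moderation, with and without the trend / robust options
fit_plain  <- eBayes(fit)
fit_robust <- eBayes(fit, trend = TRUE, robust = TRUE)

# Residual standard deviation against average log-expression
plotSA(fit_robust)

# Full table of moderated statistics for the case-vs-control coefficient
results <- topTable(fit_robust, coef = "conditioncase", number = Inf)
head(results)
```

Comparing `plotSA(fit_plain)` with `plotSA(fit_robust)` is one way to judge whether the extra options change anything for a given dataset.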

ADD COMMENT • link 3.1 years ago Gordon Smyth 50k

0

Dear professor Gordon Smyth, Thank you very much for your answers, I really appreciate your help and time!

I have 3 follow-up questions if I may:

1) I ran the analysis with trend and robust both set to FALSE and both set to TRUE. However, I find it difficult to interpret the mean-variance relationship plotted by the plotSA function, and I don't know what to conclude from the plots. Would you please comment on the plots below? (I also included a density plot produced using the plotDensities function from the limma package, to double check whether anything might be wrong.)

2) Is a normal distribution a requirement for analyzing proteins using the limma pipeline? I understand that limma fits a linear model and that assumptions are made about the data. I am wondering whether these must be met to be able to use the package, which brings me to question 3.

3) Is it possible to combine proteins that were measured using different techniques and are in different units? I am worried that including them could influence the results. In my case, I have 365 proteins in NPX units measured by Olink. However, I also have other proteins measured using competitive ELISA and would like to include them in the analysis. I log2-transformed the ELISA measurements to put them on the same scale as NPX (log2 scale) and to make them normally distributed. Is this a valid approach for combining proteins measured differently? Or are there other requirements that should be met to be able to combine them?

Thank you very much in advance!

[Image: plotSA and plotDensities plots]

ADD REPLY • link 3.1 years ago r.i.s.alnuwaysir ▴ 10

2

The mean-variance plot shows no trend so you can set trend=FALSE. robust=TRUE does find a few outliers but will probably have only a small effect. Again you could set robust=FALSE.

A normal distribution is not a requirement in the way you think it is. From the evidence you present, limma should work fine.

I don't recommend combining proteins measured by different technologies because the different technologies may have different precisions. Better to analyse them separately. If you feel you must combine them, then run arrayWeights() with var.group=technology to allow for different precisions between the two groups.

ADD REPLY • link 3.1 years ago Gordon Smyth 50k

0

Hi Dr. Smyth,

Sorry to bump an old thread but my question is directly related to OP's (mods please let me know if it would be preferable that I start a new thread for this instead).

You said that it is logical to follow the limma single channel microarray pipeline for analyzing Proximity Extension Assay data. Do you think that it would be advisable to normalize data in this case?

As you can see from the result of both OP and my plotDensities functions, the data densities vary substantially. Thanks!

[Image: plotDensities plot]

My question may be better stated with boxplots: Do you think it is advisable to normalize the data like you show in Fig2 of this paper?

[Image: boxplots of log-intensities]

ADD REPLY • link 3 months ago briankleiboeker • 0

1

As I told OP, I have no experience with Proximity Extension Assays. I cannot advise you how to normalize or process data that I don't know anything about.

Normalization might well be a good idea but I wonder whether the zeros are real observations or a code for missing values. There are plenty of normalization methods for single-channel microarray data that could be applied but the zeros may need special treatment. You seem to have removed the zeros from your boxplots of log-intensities, so perhaps your intention is to treat them as NA. (BTW the "Intensity" in your first plot and the "LogIntensity" in your second plot seem to be the same thing, except for removing the zeros.) One approach would be to apply quantile normalization to the log-intensities with NAs. That approach is common for proteomics data but whether it is appropriate in your case I can't say.
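
A minimal sketch of that quantile-normalization approach, on simulated data. It assumes the zeros are a code for missing values, which would need to be confirmed for real data, and the matrix name `logint` is a placeholder. `normalizeBetweenArrays()` with `method = "quantile"` accepts a plain numeric matrix and tolerates NAs.

```r
library(limma)

set.seed(2)
# Simulated log2 intensities: 200 proteins (rows) x 6 samples (columns),
# with a different shift per sample to mimic unequal distributions
logint <- matrix(rnorm(200 * 6, mean = 10, sd = 2), nrow = 200)
logint <- sweep(logint, 2, c(0, 0.5, -0.5, 1, -1, 0.2), "+")
colnames(logint) <- paste0("sample", 1:6)

# Assumption: zeros encode "not detected", so they are recoded as NA
logint[sample(length(logint), 50)] <- 0
logint[logint == 0] <- NA

# Quantile normalization across samples; NA positions are left as NA
normalized <- normalizeBetweenArrays(logint, method = "quantile")

# Compare per-sample distributions before and after
boxplot(logint, main = "Before normalization")
boxplot(normalized, main = "After quantile normalization")
```

Whether forcing all samples onto the same distribution is reasonable depends on the assay and the experimental design, so the before/after boxplots are only a first check.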

You ask about a figure in one of my papers, but that paper is about RNA-seq and you certainly can't normalize proteomic data in the same way as is done for RNA-seq.

Your question is different from OP's because OP said their data was already normalized, so a fresh question rather than a comment would be better.

ADD REPLY • link 3 months ago Gordon Smyth 50k

0

Thank you for your response! I understand that you aren't familiar with this type of assay and that I will have to figure out the specifics myself. I was mostly seeking your general opinion on whether this might be a case where normalization is warranted. Your thoughts on this topic were helpful as always.

ADD REPLY • link 3 months ago briankleiboeker • 0