Question

significance of changes in TMT proteomics data using vsn2 and limma

lukas.burger • 0

@lukasburger-7892

Switzerland

I am working on TMT-labelled proteomics data (3 conditions, each in triplicate). On the peptide level, the mean-variance relationship of my data appears to agree well with an additive-multiplicative error model, so I used vsn2 to transform the data. Now that the variance is stabilized, I believe I could use limma directly to assess the significance of changes on the peptide level. However, I would like a model for significance on the protein level that takes into account the number of detected peptides per protein (which is quite variable) and assigns higher significance to proteins with several peptides showing consistent changes than to proteins with just a single peptide showing the same change. Is there a way to use limma (on the vsn-transformed data) to this end?

limma vsn2 • 2.9k views

ADD COMMENT • link updated 8.6 years ago by Ryan C. Thompson ★ 7.9k • written 8.6 years ago by lukas.burger • 0

I can't speak for what happens with proteomics data, but in general, a variance-stabilizing normalization is not a prerequisite for analyses with limma. Instead, you can model the mean-variance relationship by running eBayes with trend=TRUE.

Edit: To be clear, I'm referring to the VSN procedure done by method="vsn". Most analyses start off with log-transformed intensities, and the log transformation already stabilizes the variance a bit. My point is that we usually don't bother with more sophisticated stabilization procedures, and trust limma (or voom, for RNA-seq) to handle the modelling of the mean-variance relationship.

ADD REPLY • link 8.6 years ago Aaron Lun ★ 28k

Aaron, that's a surprising statement. Chapters 6 and 8 of the limma users guide recommend log-transformation and background correction, which together have an approximate variance-stabilising effect. Are you saying one shouldn't do this and should just feed untransformed intensities into limma (for microarrays, or any other technology)?

ADD REPLY • link 8.6 years ago Wolfgang Huber ★ 13k

You're right, my apologies; I was referring specifically to the method="vsn" option in normalizeBetweenArrays. Comment's been amended.

ADD REPLY • link 8.6 years ago Aaron Lun ★ 28k

score 1 · Answer 1 · 2015-09-28

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Scripps Research, La Jolla, CA

Limma fits a separate linear model to each "feature", which in your case is a peptide. However, you could use a self-contained gene set test like roast (also part of limma) to combine the results for proteins with multiple peptides. This gives one p-value per protein, testing the null hypothesis that none of its peptides are differentially expressed. You would treat each protein as a "gene set" consisting of all the peptides associated with it.

Also, if your proteomics data consists of discrete counts of peptides, you may want to try either voom or edgeR if their assumptions match your data, since both are designed precisely for analyzing count data. (roast is also available for these analyses.)

ADD COMMENT • link 8.6 years ago Ryan C. Thompson ★ 7.9k

Ryan, I think Lukas is looking for something more robust, and possibly more aware of technology-specific effects, than meta-analysis techniques from gene set enrichment analysis.

For example, in the MaxQuant paper (http://www.nature.com/nbt/journal/v26/n12/full/nbt.1511.html) they quantify the protein with the median of the peptide-level data; in http://www.mcponline.org/content/9/9/1885.long we (vaguely) recommended a trimmed mean. Others on this list probably know about more sophisticated summarisation methods.
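To make the two summaries concrete, here is a minimal Python sketch (standard library only) of collapsing peptide-level values to one value per protein with either a median or a trimmed mean. It is not the code from either paper; the function names, the 20% trim fraction, and the peptide and protein identifiers are all made up for illustration.

```python
from statistics import mean, median


def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of values.

    The default of 0.2 is an arbitrary illustrative choice.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped at each end
    return mean(ordered[k:len(ordered) - k])


def summarise_proteins(peptide_values, peptide_to_protein, summary=median):
    """Collapse the peptide-level values of one sample to one value per protein."""
    groups = {}
    for peptide, value in peptide_values.items():
        groups.setdefault(peptide_to_protein[peptide], []).append(value)
    return {protein: summary(vals) for protein, vals in groups.items()}


# Hypothetical transformed intensities for a single sample (made-up numbers).
peptide_values = {
    "pepA1": 10.1, "pepA2": 10.3, "pepA3": 14.0,
    "pepA4": 10.2, "pepA5": 9.9, "pepB1": 8.0,
}
peptide_to_protein = {
    "pepA1": "protA", "pepA2": "protA", "pepA3": "protA",
    "pepA4": "protA", "pepA5": "protA", "pepB1": "protB",
}

by_median = summarise_proteins(peptide_values, peptide_to_protein)
by_trimmed = summarise_proteins(peptide_values, peptide_to_protein, trimmed_mean)
print(by_median)
print(by_trimmed)
```

Both summaries damp the influence of the outlying peptide (14.0) on protA. Note that after summarisation the number of peptides behind each protein value is no longer visible to the downstream test, so a one-peptide protein and a five-peptide protein enter on equal footing.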

vsn2 is designed precisely for data where the variance v depends on the mean m through a relationship of the form v(m) = c*m^2 + b. One can easily check the fit of this assumption on real data. Statements of the form 'method X was designed for count data' on the other hand seem less verifiable.
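As a rough illustration of how one might check that assumption, the Python sketch below (standard library only) computes per-feature means and variances across replicates and fits v = c*m^2 + b by ordinary least squares on (m^2, v). This is not how vsn2 estimates its parameters; it is only a quick diagnostic under the assumption that replicate groups are available. The function names and the synthetic data are made up. The second function is the transform that follows from the model by the delta method, h(y) = arsinh(y*sqrt(c/b))/sqrt(c), whose derivative is 1/sqrt(v(y)).

```python
import math
from statistics import mean, variance


def fit_variance_model(replicates):
    """Least-squares fit of v = c*m^2 + b.

    `replicates` is a list of lists: untransformed intensities of one
    feature (e.g. peptide) across replicate measurements.
    """
    m2 = [mean(r) ** 2 for r in replicates]   # squared per-feature means
    v = [variance(r) for r in replicates]     # per-feature sample variances
    m2_bar, v_bar = mean(m2), mean(v)
    sxx = sum((x - m2_bar) ** 2 for x in m2)
    sxy = sum((x - m2_bar) * (y - v_bar) for x, y in zip(m2, v))
    c = sxy / sxx
    b = v_bar - c * m2_bar
    return c, b


def stabilise(y, c, b):
    """Variance-stabilising transform for v(m) = c*m^2 + b (needs c > 0, b > 0)."""
    return math.asinh(y * math.sqrt(c / b)) / math.sqrt(c)


# Synthetic triplicates constructed so that the sample variance is exactly
# 0.04*m^2 + 25 at each mean m (values m-d, m, m+d have variance d^2).
true_c, true_b = 0.04, 25.0
data = []
for m in (10.0, 100.0, 1000.0, 10000.0):
    d = math.sqrt(true_c * m * m + true_b)
    data.append([m - d, m, m + d])

c_hat, b_hat = fit_variance_model(data)
print(c_hat, b_hat)
```

On real data the points (m^2, v) will scatter around the line, and a plot of v against m^2, or of the standard deviation against the rank of the mean after transformation, is usually more informative than the two fitted numbers alone.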

ADD REPLY • link 8.6 years ago Wolfgang Huber ★ 13k