Question

Queries related to limma package based analysis of mass spectrometry data

manubhaikp ▴ 10

@manubhaikp-17876

I have a query about the voom function in the limma package. I am trying to analyze proteomics data with limma to identify differential expression between two groups: control (n=3) and tumor (n=6). There are only biological replicates, no technical ones. I am not sure whether my design matrix is correct. I would also like to know whether limma can be used to analyze proteomics data obtained from a mass spectrometry platform.

design <- model.matrix(~ -1 + factor(c(1,1,1,1,1,1,2,2,2)))

design <- model.matrix(~ -1 + factor(rep(1:2, times = c(6, 3))))

colnames(design) <- c("tumor","control")

contrast <- makeContrasts(tumor - control, levels = design)

voom(counts, design = NULL, lib.size = NULL, normalize.method = "none", span = 0.5, plot = FALSE, save.plot = FALSE)

limma • 1.3k views

ADD COMMENT • link updated 5.6 years ago by Gordon Smyth 50k • written 5.6 years ago by manubhaikp ▴ 10

score 1 · Answer 1 · 2018-10-18

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

I don't recommend or use voom myself for MS data because there is no concept of sequencing depth for MS data. I instead analyse it much like microarray data, with the linear models fitted to the log-intensities.
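A minimal sketch of what a microarray-style analysis of log-intensities might look like for the 6 tumor versus 3 control layout in the question. The intensity matrix is simulated and the object names (`intensity`, `log_intensity`, `results`) are illustrative assumptions, not part of the original posts; real data would be read in from the search engine output.

```r
library(limma)

set.seed(1)
# Simulated raw intensities (assumption): 200 proteins x 9 samples,
# ordered as 6 tumor samples followed by 3 control samples.
intensity <- matrix(2^rnorm(200 * 9, mean = 20, sd = 2), nrow = 200,
                    dimnames = list(paste0("protein", 1:200),
                                    c(paste0("tumor", 1:6),
                                      paste0("control", 1:3))))

# Work on the log2 scale, as for microarray intensities.
log_intensity <- log2(intensity)

# Group-means design: one column per group.
group <- factor(rep(c("tumor", "control"), times = c(6, 3)),
                levels = c("tumor", "control"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

contrast <- makeContrasts(tumor - control, levels = design)

# Linear model, contrast, and empirical Bayes moderation; no voom step.
fit <- lmFit(log_intensity, design)
fit <- contrasts.fit(fit, contrast)
fit <- eBayes(fit)

results <- topTable(fit, number = Inf)
head(results)
```

Normalization and missing-value handling are deliberately left out of this sketch; both would normally come before `lmFit`.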

A critical decision is how you will impute missing values, when a peptide or protein is not detected at all in a sample. One way is to impute a relatively low intensity value for the missing cases and then assign a low weight to these values in the limma analysis.
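One way this idea could be sketched: fill missing log-intensities with a low value and pass a matrix of observation weights to `lmFit`, with a small weight at the imputed positions. The data are simulated, and both the imputed value (the observed minimum) and the weight of 0.1 are arbitrary assumptions chosen for illustration, not recommended settings.

```r
library(limma)

set.seed(2)
# Simulated log2 intensities (assumption): 100 proteins x 9 samples,
# ordered as 6 tumor samples followed by 3 control samples.
log_intensity <- matrix(rnorm(100 * 9, mean = 20, sd = 2), nrow = 100)

# Knock out 90 values at random to mimic undetected proteins.
log_intensity[sample(length(log_intensity), 90)] <- NA
missing <- is.na(log_intensity)

# Impute a relatively low value (here: the lowest observed value).
imputed <- log_intensity
imputed[missing] <- min(log_intensity, na.rm = TRUE)

# Full weight for observed values, low weight for imputed ones.
weights <- matrix(1, nrow = nrow(imputed), ncol = ncol(imputed))
weights[missing] <- 0.1

group <- factor(rep(c("tumor", "control"), times = c(6, 3)),
                levels = c("control", "tumor"))
design <- model.matrix(~ group)

# The weights matrix must have the same dimensions as the data.
fit <- lmFit(imputed, design, weights = weights)
fit <- eBayes(fit)

# "grouptumor" is the tumor versus control log-fold change.
results <- topTable(fit, coef = "grouptumor", number = Inf)
```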

ADD COMMENT • link 5.6 years ago Gordon Smyth 50k

I agree with Gordon's comment: use limma but not voom, because a spectrometer does not count values but measures the area of a peak. Imputation of missing values is a major point. Values are not missing at random; they mainly occur because the intensity of the peak is too low to be captured by the spectrometer. Moreover, intensities are usually reported with a high multiplicative coefficient: an intensity of 1e5 is frequently one of the lowest values. I prefer dividing intensities by 1e5 or 1e6 before imputing missing values with random values around zero.
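A rough sketch of the rescale-then-impute idea in base R. It assumes that "around zero" refers to the log2 scale after dividing by 1e5, and it draws the imputed values from a narrow normal distribution; the scale, the distribution, and its standard deviation are all assumptions made for illustration. The intensities are simulated.

```r
set.seed(3)
# Simulated raw intensities (assumption) whose lowest values sit near 1e5.
intensity <- matrix(1e5 * 2^rexp(50 * 9, rate = 0.3), nrow = 50)

# Knock out 40 values at random to mimic undetected peaks.
intensity[sample(length(intensity), 40)] <- NA

# Divide by 1e5 so that the lowest detected values are close to 1,
# i.e. close to 0 on the log2 scale.
scaled_log <- log2(intensity / 1e5)
missing <- is.na(scaled_log)

# Replace missing entries with small random values centred on zero.
scaled_log[missing] <- rnorm(sum(missing), mean = 0, sd = 0.3)
```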

ADD REPLY • link 5.6 years ago SamGG ▴ 350

The MS data were generated on a Thermo Fisher Scientific Orbitrap platform, and the initial data processing was done in Proteome Discoverer (PD) 2.2. PD 2.2 has an option for missing value imputation; the details are listed below. I am unsure whether we should use the software-based missing value imputation or the approach you have suggested.

• PD software missing value imputation - Low Abundance Resampling: replaces missing values with random values sampled between the minimum and the lower 5 percent of all detected values.
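For comparison, the description above can be approximated in a few lines of base R. This is only a reading of that one-sentence description (uniform draws between the minimum and the 5th percentile of all detected values), not Proteome Discoverer's actual implementation, whose details are not given here. The intensities are simulated.

```r
set.seed(4)
# Simulated raw intensities (assumption): 50 proteins x 9 samples.
intensity <- matrix(2^rnorm(50 * 9, mean = 20, sd = 2), nrow = 50)
intensity[sample(length(intensity), 40)] <- NA
missing <- is.na(intensity)

# Bounds taken from all detected values across the whole matrix.
detected <- intensity[!missing]
lower <- min(detected)
upper <- unname(quantile(detected, probs = 0.05))

# Assumption: values are drawn uniformly between the two bounds.
resampled <- intensity
resampled[missing] <- runif(sum(missing), min = lower, max = upper)
```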

ADD REPLY • link 5.5 years ago manubhaikp ▴ 10

score 0 · Answer 2 · 2018-10-18

The formatting of your post is messed up, but otherwise the formulation of the design matrix is fine. Note that you should pass your design matrix via design= in the voom call; without knowledge of the group-specific means, the observation weights will not be properly estimated.
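A minimal sketch of a voom call that receives the design matrix. The count matrix is simulated from a negative binomial distribution purely so that the call runs; it is an assumption standing in for real count data.

```r
library(limma)

set.seed(5)
# Simulated counts (assumption): 300 features x 9 samples,
# ordered as 6 tumor samples followed by 3 control samples.
counts <- matrix(rnbinom(300 * 9, mu = 200, size = 5), nrow = 300)

design <- model.matrix(~ 0 + factor(rep(1:2, times = c(6, 3))))
colnames(design) <- c("tumor", "control")

# Passing design= lets voom fit the group means before it models the
# mean-variance trend and derives the observation weights.
v <- voom(counts, design = design, normalize.method = "none", plot = FALSE)

dim(v$weights)  # one weight per observation
```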

It's also likely that you'll need a more sophisticated normalization strategy than normalize.method="none". When this option is set and a count matrix is supplied to voom, only library size normalization is performed. This will not be able to remove composition biases (and indeed, the concept of a library size doesn't really make sense for mass spectrometry). I'll admit that I don't know what people usually do for mass spectrometry data, but you can play around with the options - you could set normalize.method="cyclicloess", for example.
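As an illustration of trying a different normalization, cyclic loess can also be applied directly to a matrix of log-intensities with limma's `normalizeBetweenArrays`, outside of voom. The data below are simulated with an artificial per-sample shift; whether cyclic loess suits a given MS data set is not something this sketch can answer.

```r
library(limma)

set.seed(6)
# Simulated log2 intensities (assumption): 200 proteins x 9 samples,
# with a different additive shift in each sample.
log_intensity <- matrix(rnorm(200 * 9, mean = 20, sd = 2), nrow = 200)
log_intensity <- sweep(log_intensity, 2, seq(-1, 1, length.out = 9), "+")

# Cyclic loess normalization between samples on the log scale.
normalized <- normalizeBetweenArrays(log_intensity, method = "cyclicloess")

# Sample medians before and after normalization.
round(apply(log_intensity, 2, median), 2)
round(apply(normalized, 2, median), 2)
```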

More generally, limma is used widely for analyzing mass spectrometry experiments, so I wouldn't worry about that. It's just a weighted linear model at the end of the day, so if you're willing to make normality assumptions about log-abundances, then you're mostly good to go. The only extra thing is the empirical Bayes shrinkage, for which the distributional assumptions are fairly mild - I would be surprised if it was not widely applicable to a wide variety of 'omics data modalities.