VOOM vs VSN vs log2 to transform proteome data before LIMMA analysis
2
2
Entering edit mode
@sebastian-hesse-18351
Last seen 3 months ago
Germany / Munich / Dr.von Hauner Childr…

I am analysing an proteome dataset, derived from DIA mass spec with quantified values from Biognosys Spectronaut 11, and I would like to use LIMMA for differential expression analysis. Im just a beginner and not very experienced yet so please forgive me if my question isn't perfectly stated.

According to the LIMMA manual, one should test for heteroscedasticity and correct for it if present.

I tested my data using a linear model testing I found online. The test, output below, showed heteroscedasticity to be present in the raw data. The strange thing is that LIMMA did not change that as these after transformation still shows heteroscedasticity (though with a higher but still very low Breusch-Pagan test p-value)

I started playing with the data, testing also log2 transformation and VSN stabilisation. All matrixes continue to show heteroscedasticity if tested but after VSN transformation the meanSdPlot(vsnMatrix) shows the "straightest" line. (compared with the meanSdPlot of raw, voom or log2 matrix)

After reviewing the results below, what would you recommend me to do? Simply log2 transformation, VOOM or SVN before LIMMA? How badly does it affect results if only log2 transformed data are used?

PS: if you know how to resize pictures postet here let me know :)

Analysis: - Data: raw expression matrix with optionally: voom, log2, SVN transformation. - Analysis (shown for rawMatrix, others simply with matrix after transformation)

variance_rawMatrix <- apply(raw_matrix, 1, var)
mean_rawMatrix <- apply(raw_matrix, 1, mean)
var_and_mean_rawMatrix <- data.frame("variance" = variance_noVoom,
"mean" = mean_noVoom,
row.names = names(variance_noVoom))

lmMod_rawMatrix <- lm(variance ~ mean, data=var_and_mean_rawMatrix)
par(mfrow=c(2,2))
plot(lmMod_rawMatrix)
mtext("Linear model quality testing of raw data", side = 3, line = -1.5, outer = TRUE)

lmtest::bptest(lmMod_rawMatrix)


Transformation

 voomMatrix <- voom(rawMatrix, design = design, plot = T)
log2Matrix <- log2(rawMatrix)
vsnMatrix <- normalizeVSN(rawMatrix)


Results:

Raw matrix: Breusch-Pagan: BP = 2264.5, df = 1, p-value < 2.2e-16

VOOM matrix BP = 20.044, df = 1, p-value = 7.567e-06

log2 matrix BP = 25.474, df = 1, p-value = 4.483e-07

SVN matrix BP = 42.718, df = 1, p-value = 6.323e-11

r limma voom vsn • 1.9k views
0
Entering edit mode

Is it maybe an option to use the log2 matrix and then use either trend = T and/or robust = T ?

3
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

The short answer is that you almost certainly should just log2 transform the mass spec values, after adding a constant to avoid zeros, and apply limma and eBayes with trend=TRUE and robust=TRUE. There is no possible advantage in using voom or vsn.

But I make no guarantees. I do not know what a Biognosys Spectronaut 11 is and I don't know how you have quantified your data. Analysis of proteomics data is very difficult because of non-detection issues and the best way to handle that has not been worked out. It is clear from your separate question about eBayes that your data indeed has a serious issue with NA values that likely arise from non-detection.

The limma User's Guide does not advise you to test for heteroscedasticity. The Guide instead advises you to check for between-sample variability using MDS plots and to view the mean-variance relationship in the data using plotSA.

The mean-variance modelling you have done is far too simple to tell you anything about your data. It is not appropriate to compute means and variances as if all the samples belonged to the same treatment group. There's no need for an ad hoc analysis like this anyway and a Breusch-Pagan test is just about the last thing you need in my opinion. The limma plotSA and eBayes functions are much more relevant.

1
Entering edit mode
@wolfgang-huber-3550
Last seen 4 months ago
EMBL European Molecular Biology Laborat…

A thoughtful approach to differential detection for label-free (LFQ) mass spec data is Constantin Ahlmann-Eltze's and Simon Anders' proDD. It explicitly takes into account and models the non-detections. The current state of this work is here: https://github.com/const-ae/proDD and I understand this is going to be submitted to Bioconductor and a paper is being written (a thesis describing it already exists).

The usefulness of variance stabilizing transformation for isobaric tag (iTRAQ) mass spec data has been explored, e.g, here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2938101 . At the very least, it effectively chooses for you, in a data-driven manner, the added constant $c$ in the $\log(x+c)$ transformation(*) that Gordon advises in his above post - perhaps in a more satisfactory manner than eye-balling or pontification. It's often also helpful for visualizing the data - which IMHO should always precede and succeed any model fitting such as that done by limma.

(*) The precise form of the vsn transformation is $\log(y+\sqrt{y^2+1}$ with $y=(x+a)/b$ and two real-valued parameters $a$ and $b$ that are fitted from the data, and is based on statistical error modeling. In effect, the function often looks very similar to $\log(x+c)$ within the data range of $x$.