Question

limma: vooma mean-variance trend and data filtration

Lily Cheang ▴ 10

@c3c2f95c

Last seen 25 days ago

Australia

Hi, I am working on a GEO microarray dataset. Five groups were defined and assigned to the ExpressionSet.

Before vooma, I used exprs(gset) <- normalizeBetweenArrays(exprs(gset)) after log2 transformation,

and used gset <- gset[complete.cases(exprs(gset)), ] to remove missing values, as suggested in the manual.

When I run vooma, I see this mean-variance trend: [image: vooma mean-variance trend plot]

I understand this graph does not reflect data quality, but I wonder if

a) is vooma appropriate for weighted distribution? or should I use voomaByGroup instead?

b) is there any further data filtering I need to add before vooma and lmFit in this case?

or in other words, how can I ensure that my data is properly processed?

Many thanks! Lily C

MicroarrayData limma • 457 views

ADD COMMENT • link 7 weeks ago • updated 4 weeks ago Lily Cheang ▴ 10

score 0 · Answer 1 · 2024-03-06

Gordon Smyth 50k

@gordon-smyth

Last seen 26 minutes ago

WEHI, Melbourne, Australia

Microarray data processing and filtering is very much platform-specific and it is hard to give any advice without knowing the details of the dataset you are working on.

In my opinion, microarray data should never contain missing values when properly processed. So, if I were you, I would be revisiting the preprocessing of the GEO dataset.
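Before revisiting the preprocessing, it can help to see where the missing values actually sit. The following is only a minimal sketch in base R, assuming a small simulated matrix in place of exprs(gset); the probe and sample names are invented for illustration.

```r
# Simulated log2 expression matrix standing in for exprs(gset).
# Probe and sample names are hypothetical.
set.seed(1)
expr <- matrix(rnorm(40, mean = 8), nrow = 10, ncol = 4,
               dimnames = list(paste0("probe", 1:10), paste0("GSM", 1:4)))
expr[c(2, 7), 3] <- NA   # two missing values in one sample
expr[5, ] <- NA          # one probe missing in every sample

# Total number of missing values in the matrix
n_missing <- sum(is.na(expr))

# Missing values per sample (column) and per probe (row)
na_per_sample <- colSums(is.na(expr))
na_per_probe  <- rowSums(is.na(expr))

print(n_missing)
print(na_per_sample)
print(names(na_per_probe)[na_per_probe > 0])
```

If the missing values cluster in a few samples or a few probes, that usually points to a specific preprocessing step rather than to random dropout.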

In general, vooma has much the same requirements as any limma microarray analysis, and you would just do the same filtering and normalization as usual. I don't understand what you mean by a "weighted distribution" or why you would need voomaByGroup.

Added Later

I was a bit puzzled by your question because you say you're following "the manual" but actually the approach and code you show is not recommended anywhere in the limma documentation or User's Guide. After some Google searching, it seems to me that you are following code from this paper:

Islam, S., Kitagawa, T., Baron, B., Abiko, Y., Chiba, I., & Kuramitsu, Y. (2021). ITGA2, LAMB3, and LAMC2 may be the potential therapeutic targets in pancreatic ductal adenocarcinoma: an integrated bioinformatics analysis. Scientific Reports, 11(1), 10563. https://www.nature.com/articles/s41598-021-90077-x

This paper isn't official limma documentation. Usually we recommend a limma-trend approach for microarray data rather than the vooma approach in this paper, not because vooma is wrong but just because limma-trend achieves the same end without requiring weights.

ADD COMMENT • link 7 weeks ago Gordon Smyth 50k

Hi Gordon, Thank you very much for your clarification to my question. I must admit that my wording in describing my issues was not precise enough, and I apologize for the confusion.

I was first frustrated with the mean-variance trend that started from 2 instead of 0, therefore wondered if I needed to further filter my data. The "manual" I mentioned refers to the pdf manual on Bioconductor, limma package. It is also where I found the voomaByGroup() from.

I adapted the normalizeBetweenArrays() and complete.cases() steps from the GEO2R R script, before I could adjust and analyze the GEO dataset locally in R.

Originally, once I set more than one contrast, I was stuck on obtaining the full statistics for each contrast after lmFit() and topTable(), hence the idea of exporting each set of statistics one by one through multiple voomaByGroup() calls. After your reply, I went back and was able to extract them using the coef argument of topTable().
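For readers with the same problem, here is a rough sketch of extracting the full table for each contrast with coef. It assumes simulated data, three hypothetical groups (A, B, C) and made-up contrast names; it is not the actual analysis of this dataset.

```r
library(limma)

# Simulated log2 expression data; group labels and names are hypothetical.
set.seed(7)
group <- factor(rep(c("A", "B", "C"), each = 4))
expr <- matrix(rnorm(200 * 12, mean = 8), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("S", 1:12)))

# One column per group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Two contrasts defined in a single contrast matrix
cont.matrix <- makeContrasts(BvsA = B - A, CvsA = C - A, levels = design)

fit  <- lmFit(expr, design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, trend = TRUE)

# coef selects the contrast; number = Inf returns every probe, not just the top 10
res_BvsA <- topTable(fit2, coef = "BvsA", number = Inf)
res_CvsA <- topTable(fit2, coef = "CvsA", number = Inf)
head(res_BvsA)
```

A single fit holds all contrasts, so there is no need to refit the model once per comparison.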

After all, thank you so much for your patience when I was poorly describing my question. Thank you and thank you !

Lily C

ADD REPLY • link 6 weeks ago Lily Cheang ▴ 10

Hi Lily,

The voom methods all require some filtering of low expression genes so, yes, almost certainly you should do some filtering.

In my answer, I hinted that you should let us know which GEO dataset you are working on so that we know which type of microarray platform it is and hence could give more specific advice. You haven't quite taken my hint yet!

Regards Gordon

ADD REPLY • link 6 weeks ago Gordon Smyth 50k

Hi Gordon,

Apologies that I did not get your hint before! It is the GSE45291 dataset. Thanks again for the reminder and patience!

Lily C

ADD REPLY • link 5 weeks ago Lily Cheang ▴ 10

OK, this is Affymetrix data that is already gcrma normalized. Most of the steps you are using are unnecessary! There's no need for complete.cases or normalizeBetweenArrays or vooma. See the Affymetrix case study in the limma User's Guide.

I would suggest a little bit of filtering:

keep <- rowMeans(exprs(gset)) > 3
gset <- gset[keep,]

Other than that, you can go straight into linear modelling:

fit <- lmFit(gset, design)
fit <- eBayes(fit, trend=TRUE)
plotSA(fit)

No need for weights.
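The code above assumes that a design matrix already exists. As a self-contained sketch only, the same steps can be tried on simulated data; the five group labels (G1 to G5), the sample names and the simulated expression values below are assumptions for illustration and not taken from GSE45291.

```r
library(limma)

# Simulated log2 expression for five hypothetical groups, three samples each.
set.seed(42)
n_genes <- 1000
group <- factor(rep(paste0("G", 1:5), each = 3))

# Per-probe baseline between 2 and 12; the matrix is filled column-wise,
# so row i is centred on baseline[i] in every sample.
baseline <- runif(n_genes, 2, 12)
expr <- matrix(rnorm(n_genes * length(group), mean = baseline, sd = 0.3),
               nrow = n_genes,
               dimnames = list(paste0("probe", seq_len(n_genes)),
                               paste0("S", seq_along(group))))

# Design matrix with one column per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Same filtering rule as above: keep probes with mean log2 expression > 3
keep <- rowMeans(expr) > 3
expr_filt <- expr[keep, ]

# limma-trend: the prior variance is allowed to depend on average expression
fit <- lmFit(expr_filt, design)
fit <- eBayes(fit, trend = TRUE)
plotSA(fit)
```

The cutoff of 3 is the value suggested above for this gcrma-normalized dataset; other platforms or normalizations may need a different threshold.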

ADD REPLY • link 5 weeks ago Gordon Smyth 50k

Thank you SO MUCH for the help, Gordon! I will give it a try later!

ADD REPLY • link 4 weeks ago Lily Cheang ▴ 10