What type of data normalization for multiple microarrays?
@ewelina-dratkiewicz-11421

Hello,

I'm new to R and I have a small problem with data analysis. I was asked to create a correlation plot for 2 genes expressed in melanoma cells. I downloaded data from GEO (14 data sets, 2 types of similar microarrays), made Expression Sets, normalized with RMA, substracted data for 2 genes and compiled it into one matrix. To every sample I assigned two traits: cell type (normal, primary, metastasis and so on) and the number of the data set it was substracted from. Then I created a simple plot to see what my data look like (without normalization across data sets, Spearman's correlation coefficient is above 0.5 with a really low p-value). Now I would like to remove any differences between data sets. If I understood correctly, I should remove the batch effect with e.g. ComBat. And here's my question: should I assume one batch equals one data set, or can one data set contain several batches (differences in data collection dates and so on)? Is ComBat or SVA the best method for this particular case? And should I perform this normalization on the whole data matrices (how?) and then extract the data for my 2 genes of interest?

I'm sorry if my post is a little chaotic, but I'm still learning how to use R. I will be really grateful for your advice.

Tags: normalization, microarray, combat
@manimaran_1975-8939
Hi,

Please check out the new Shiny app R package called BatchQC, which will let you easily do what you want. You can adjust for batch using ComBat or SVA and compare the results, all with a few clicks.

Please see the application note that we just published in Bioinformatics:

"BatchQC: interactive software for evaluating sample and batch effects in genomic data"
Solaiappan Manimaran, Heather Marie Selby, Kwame Okrah, Claire Ruberman, Jeffrey T. Leek, John Quackenbush, Benjamin Haibe-Kains, Hector Corrada Bravo and W. Evan Johnson
http://bioinformatics.oxfordjournals.org/content/early/2016/08/30/bioinformatics.btw538

BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. BatchQC can also apply existing adjustment tools and allow users to evaluate their benefits interactively.

BatchQC is available from Bioconductor at the following link: http://bioconductor.org/packages/BatchQC

Best,

Mani (Solaiappan Manimaran)

Thank you very much for this great suggestion! I will try to apply it to my data.

Best,

Ewelina

@polemiraza-11428

Hello,

What counts as a "batch" is a judgment call that depends on what you are able to do and how cautious you want to be: it can be the chip scan date*, the data set, or the microarray platform. Batch adjustment may significantly bias group (phenotype) differences if your study design is unbalanced (groups are not evenly distributed across batches). From your description I deduce that you will have a highly unbalanced design. There are many combinations of steps you could take, and none of them guarantees success.

It will be necessary for you to monitor your data (by using plots) after each important step.
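
As a rough illustration of this kind of monitoring, here is a minimal sketch in base R. It cross-tabulates phenotype against data set to show how unbalanced the design is, then draws a PCA plot of the samples coloured by data set. The matrix `expr` and the factors `batch` and `cell_type` are simulated placeholders, not the real GEO data, and all names are assumptions made for the example.

```r
# Simulated stand-in data: 200 probes x 24 samples (placeholder, not real GEO data)
set.seed(1)
expr <- matrix(rnorm(200 * 24), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("s", 1:24)))

# Hypothetical sample annotation: data set of origin and cell type
batch <- factor(rep(c("set1", "set2", "set3"), times = c(6, 8, 10)))
cell_type <- factor(c(rep("normal", 6),
                      rep(c("primary", "metastasis"), each = 4),
                      rep(c("primary", "metastasis"), times = c(3, 7))))

# Cross-tabulate phenotype against batch; empty cells signal an unbalanced design
design_table <- table(cell_type, batch)
print(design_table)

# PCA on samples (prcomp expects samples in rows, hence t())
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(batch), pch = as.integer(cell_type),
     xlab = "PC1", ylab = "PC2", main = "Samples coloured by data set")
legend("topright", legend = levels(batch),
       col = seq_along(levels(batch)), pch = 1)
```

Repeating the same plot after each step (normalization, merging, batch adjustment) makes it easier to see whether samples still cluster by data set rather than by phenotype.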

Assuming that you have an unbalanced design and two platforms, I would suggest one of the following approaches:

- normalize each platform (with RMA); use BrainArray mappings

- use ComBat** to merge data from both platforms (I assume that the phenotypes are represented in roughly equal numbers on both platforms) - here the platform is the batch.

or

- normalize each platform (with RMA); use BrainArray mappings

- apply ComBat to each platform separately (define "batch" as "data set"; again, be aware of the phenotype composition of your "batches")

- perform scaling between platforms to merge the data

or

You may try my suggestions without batch correction (e.g. normalize and then scale). Sometimes batch correction does more harm to the data than leaving it out.
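
A minimal sketch of the "normalize and then scale" route in base R, assuming two already RMA-normalized matrices (probes in rows, samples in columns). The matrices `u133a` and `plus2`, the probe names, and the helper `scale_rows` are hypothetical and simulated here. Gene-wise z-scoring within each platform is only one possible way to scale, and it carries the same caveat as above: it assumes the phenotype composition is similar on both platforms.

```r
set.seed(2)
# Hypothetical RMA-normalized matrices (probes x samples), one per platform
u133a <- matrix(rnorm(5 * 4, mean = 7), nrow = 5,
                dimnames = list(paste0("probe", 1:5), paste0("a", 1:4)))
plus2 <- matrix(rnorm(8 * 6, mean = 9), nrow = 8,
                dimnames = list(paste0("probe", 1:8), paste0("p", 1:6)))

# Keep only the probe sets present on both platforms
common <- intersect(rownames(u133a), rownames(plus2))

# Gene-wise z-score within one platform; scale() works on columns,
# so transpose before and after
scale_rows <- function(m) t(scale(t(m)))

# Scale each platform separately, then bind the samples side by side
merged <- cbind(scale_rows(u133a[common, ]), scale_rows(plus2[common, ]))

# Remember which platform each column came from
platform <- factor(rep(c("U133A", "Plus2"),
                       times = c(ncol(u133a), ncol(plus2))))
```

After this step every probe has mean 0 and standard deviation 1 within each platform, so the platform offset is gone, but so is any real difference in average expression between the platforms.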

Cheers,

Pawel

*You can extract the scan date from each .CEL file.

**Perform batch correction on the whole matrices. It is easier to get a batch-corrected matrix from ComBat (which is what you want for correlation analysis). SVA works well in a differential expression setting, but it is considered more "hard-core" on the data than ComBat, which makes it dangerous in the hands of an inexperienced user.
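
A minimal sketch of that order of operations, assuming the `sva` package from Bioconductor is installed: run `ComBat()` on the whole matrix with the data set as the batch, and only afterwards pull out the two genes for the Spearman correlation. The data are simulated, and the probe names `probe1` and `probe2` are placeholders for the real probe sets of interest.

```r
library(sva)  # Bioconductor package that provides ComBat()

set.seed(3)
n_probes <- 100
# Hypothetical annotation: 3 data sets of 8 samples, balanced cell types
batch <- factor(rep(c("set1", "set2", "set3"), each = 8))
cell_type <- factor(rep(rep(c("primary", "metastasis"), each = 4), times = 3))

# Simulated log2 expression (stand-in for the merged RMA matrix)
expr <- matrix(rnorm(n_probes * 24, mean = 8), nrow = n_probes,
               dimnames = list(paste0("probe", seq_len(n_probes)),
                               paste0("s", 1:24)))
# Add an artificial shift per data set to mimic a batch effect
expr <- expr + rep(c(0, 1.5, -1), each = 8)[col(expr)]

# Keep the phenotype in the model so ComBat does not remove it
mod <- model.matrix(~ cell_type)
adjusted <- ComBat(dat = expr, batch = batch, mod = mod)

# Only now extract the two probe sets of interest (placeholder names)
genes <- c("probe1", "probe2")
res <- cor.test(adjusted[genes[1], ], adjusted[genes[2], ],
                method = "spearman")
res$estimate
```

With a strongly unbalanced design, supplying `mod` may fail or give misleading results because phenotype and batch are confounded, so it is worth comparing the correlation before and after adjustment.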

 

 


Thank you! I will try to apply your advice to my data and see what happens.

Best regards,

Ewelina


Can I ask another question? I tried to find some tools to merge my data, but VirtualArray is outdated and I'm unable to install the package, and InsilicoDB is taking ages to process even one data set. Can you recommend anything?

@polemiraza-11428

Hi Ewelina,

Please clarify:

Does "substracted" mean "extracted"?

Does "14 data sets" mean 14 independent experiments (each consisting of some number of samples)?

Which platforms do they represent?

Best,

Pawel


Hello,

Yes, sorry for the bad word choice - I meant extracted (according to the names of the probes corresponding to my genes of interest). And yes, these are 14 independent experiments (maybe only 2 of them were performed by the same group of scientists), each containing from 6 to about 80 samples. All experiments were performed using the Affymetrix Human Genome U133A Array (GPL96) or its derivatives (Plus, 2.0).

Hope this clarifies it a little bit,

Ewelina
