Question

Normalizing expression across complex array designs


shaybenelazar • 0

@shaybenelazar-20397


Hi, I am relatively new to both limma and R, so excuse me if this sounds trivial to some. I am trying to do a meta-analysis on miRNA data from 4 different studies, each with its own caveats. To name a few:

Only 2 of the 4 studies use the same microarray technology (and chip design).
All studies have additional clinical labels (e.g. a measured tumor grade). Some labels are available for all studies, but some are not.
One study is two-channel (common reference); the rest are one-channel.
One study has technical replicates.

My current goal is to have a single normalized expression matrix. Here is what I currently do:

1. For each dataset:
   1.1 Load as single channel, filtering control & flagged genes
   1.2 Apply backgroundCorrect(method = "normexp")
   1.3 Take median per probeid
   1.4 Map probeids to miRNA accessions
   1.5 Take median per miRNA
2. Join datasets on miRNA accession
3. Quantile normalize across datasets
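To make the mechanics of steps 2 and 3 concrete, here is a minimal sketch, written in Python with NumPy purely for illustration (in R, limma's normalizeBetweenArrays with method = "quantile" is the usual route for step 3). The function names, accession labels and numbers are all hypothetical. It assumes each dataset has already been reduced to one row per miRNA and has no missing values. It only shows what the steps compute, not whether pooling the studies this way is appropriate.

```python
import numpy as np


def join_on_accession(datasets):
    """Keep only miRNAs present in every dataset and stack samples side by side.

    datasets: list of (accessions, matrix) pairs, where matrix has shape
    (n_mirna, n_samples) and accessions are unique within each dataset.
    """
    shared = sorted(set.intersection(*(set(acc) for acc, _ in datasets)))
    blocks = []
    for acc, mat in datasets:
        row_of = {a: i for i, a in enumerate(acc)}
        rows = [row_of[a] for a in shared]
        blocks.append(np.asarray(mat, dtype=float)[rows, :])
    return shared, np.hstack(blocks)


def quantile_normalize(matrix):
    """Force every column (sample) to share the same empirical distribution.

    Simplification: ties are broken by order of appearance rather than averaged.
    """
    matrix = np.asarray(matrix, dtype=float)
    order = np.argsort(matrix, axis=0, kind="stable")   # per-column sort order
    sorted_vals = np.take_along_axis(matrix, order, axis=0)
    reference = sorted_vals.mean(axis=1)                 # mean value at each rank
    normalized = np.empty_like(matrix)
    # Write the rank-wise mean back to each value's original position.
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized


# Hypothetical toy data: two studies with partly overlapping miRNAs.
study_a = (["miR-1", "miR-2", "miR-3"], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
study_b = (["miR-3", "miR-1", "miR-9"], [[10.0], [20.0], [30.0]])

shared, combined = join_on_accession([study_a, study_b])
normalized = quantile_normalize(combined)
```

After this step every sample has an identical set of values, so any study-level difference in overall distribution is removed, whether it was technical or biological.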

Open questions:

1. Is it better to Quantile normalize each study separately before joining them?
2. Do you recommend using duplicateCorrelation separately on the study with replicates (before/after joining)? How do I apply the resulting coefficients to get a corrected dataset?
3. Should I be using an lmFit model to normalize the combined dataset somehow? If yes - how?
4. Should I be using clinical labels for normalization (blocking)?
5. Any recommendations on best practices I seem to be skipping?

Appreciated!

limma • 357 views

ADD COMMENT • link updated 5.5 years ago by James W. MacDonald 67k • written 5.5 years ago by shaybenelazar • 0

score 1 · Answer 1 · 2019-04-05


James W. MacDonald 67k

@james-w-macdonald-5106


United States

That's not really how meta-analysis works. To magically normalize out all the technical differences between the four data sets yet keep all the real biological differences would be some arcane magic indeed. Instead what one usually does is to make comparisons within each data set, and then combine the statistics as part of a meta-analysis to see if you have evidence for consistent results. You could use the GeneMeta package for that, or maybe OMICsPCA, or made4.
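As a rough illustration of what "combine the statistics" can mean, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis for a single miRNA, written in Python with NumPy. It is one simple way to combine per-study results and is not a description of what GeneMeta or the other packages do internally. The function name and the example numbers are hypothetical. It assumes each study has already produced an effect estimate (e.g. a log fold change) and its standard error for the same comparison.

```python
import math

import numpy as np


def fixed_effect_meta(effects, std_errors):
    """Combine per-study effect estimates by inverse-variance weighting.

    Returns the pooled effect, its standard error, the z statistic and a
    two-sided p-value under a normal approximation.
    """
    effects = np.asarray(effects, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    weights = 1.0 / std_errors**2                 # more precise studies count more
    pooled = float(np.sum(weights * effects) / np.sum(weights))
    pooled_se = math.sqrt(1.0 / float(np.sum(weights)))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled, pooled_se, z, p_value


# Hypothetical per-study log fold changes and standard errors for one miRNA.
effects = [0.8, 1.1, 0.6, 0.9]
std_errors = [0.30, 0.45, 0.25, 0.40]
pooled, pooled_se, z, p_value = fixed_effect_meta(effects, std_errors)
```

Because each estimate comes from a comparison made within one study, platform and batch differences between studies largely cancel out of the per-study effects instead of having to be normalized away.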

I would also note that this might be a bit advanced for someone who is 'relatively new to both limma and R', and unless you have an abiding interest in knowing how to do this sort of thing (and the wherewithal to learn, probably on your own), you might consider finding somebody local who can either provide structured advice or do it for you.

ADD COMMENT • link 5.5 years ago James W. MacDonald 67k


Thanks, James, I appreciate your advice!

If I understand correctly, you are saying it's impossible to pool multiple datasets and benefit from the added statistical power of increased sample size? Intuitively, I feel this should be possible to some extent, and I am surprised this isn't considered a common practice.

Unfortunately, I know of no one locally with this precise expertise, and I have no choice but to pursue this problem myself. I do have a strong CS background, so I expect to be able to make up for what I lack with some general guidance.

ADD REPLY • link 5.5 years ago shaybenelazar • 0