Hi, I am relatively new to both limma and R, so excuse me if this sounds trivial to some. I am trying to do a meta-analysis on miRNA data from 4 different studies, each with its own caveats. To name a few:
- Only 2 of the 4 studies use the same microarray technology (and chip design)
- All studies have additional clinical labels (e.g. a measured tumor grade). Some labels are available for all studies, but some are not.
- One study is two-channel (common reference), the rest are one-channel.
- One study has technical replicates
My current goal is to have a single normalized expression matrix. Here is what I currently do:
1. For each dataset:
   1.1 Load as single channel, filtering control & flagged genes
   1.2 Apply backgroundCorrect(normexp)
   1.3 Take median per probeid
   1.4 Map probeids to miRNA accessions
   1.5 Take median per miRNA
2. Join datasets on miRNA accession
3. Quantile normalize across datasets
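To make steps 1.3 through 3 concrete, here is a minimal base-R sketch on made-up toy values. It assumes each study is already loaded and background-corrected as a numeric matrix (probes in rows, samples in columns). All object names, probe ids, and accession labels (`study_a`, `p1`, `mirA`, and so on) are hypothetical, and the hand-rolled `quantile_normalize` is a simplified stand-in for limma's `normalizeBetweenArrays(x, method = "quantile")`, which handles ties and missing values more carefully. It illustrates the pipeline as described and is not a recommendation that this is the right way to combine studies.

```r
# Collapse the rows of a numeric matrix to one row per id, using the median.
median_by_id <- function(x, ids) {
  groups <- split(seq_len(nrow(x)), ids)
  out <- do.call(rbind, lapply(groups, function(i) {
    apply(x[i, , drop = FALSE], 2, median)
  }))
  colnames(out) <- colnames(x)
  out
}

# Steps 1.3 to 1.5 for one study: probe medians, probe -> miRNA, miRNA medians.
collapse_study <- function(x, probe_ids, probe_to_mirna) {
  by_probe <- median_by_id(x, probe_ids)                     # step 1.3
  mirna <- probe_to_mirna[rownames(by_probe)]                # step 1.4
  keep <- !is.na(mirna)                                      # drop unmapped probes
  median_by_id(by_probe[keep, , drop = FALSE], mirna[keep])  # step 1.5
}

# Simplified quantile normalization: every column gets the same distribution,
# namely the mean of the sorted columns. Assumes no missing values.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical toy data: study A has replicate probes, study B does not.
study_a <- matrix(c(10, 14, 20, 30, 40,
                    12, 16, 22, 28, 44),
                  ncol = 2, dimnames = list(NULL, c("A1", "A2")))
probes_a <- c("p1", "p1", "p2", "p2", "p3")
map_a <- c(p1 = "mirA", p2 = "mirB", p3 = "mirB")

study_b <- matrix(c(100, 300, 500,
                    120, 280, 520),
                  ncol = 2, dimnames = list(NULL, c("B1", "B2")))
probes_b <- c("q1", "q2", "q3")
map_b <- c(q1 = "mirA", q2 = "mirB", q3 = "mirC")

studies <- list(
  collapse_study(study_a, probes_a, map_a),
  collapse_study(study_b, probes_b, map_b)
)

# Step 2: keep only miRNAs present in every study, then bind samples together.
common <- Reduce(intersect, lapply(studies, rownames))
combined <- do.call(cbind, lapply(studies, function(m) m[common, , drop = FALSE]))

# Step 3: quantile normalize across all samples of all studies.
normalized <- quantile_normalize(combined)
print(normalized)
```

Note that the inner join in step 2 silently drops any miRNA missing from one platform (`mirC` here), and that forcing all samples onto one distribution in step 3 does not by itself remove study-specific effects.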
Open questions:
1. Is it better to Quantile normalize each study separately before joining them?
2. Do you recommend using duplicateCorrelation separately on the study with replicates (before/after joining)? How do I apply the resulting coefficients to get a corrected dataset?
3. Should I be using an lmFit model to normalize the combined dataset somehow? If yes - how?
4. Should I be using clinical labels for normalization (blocking)?
5. Any recommendations on best practices I seem to be skipping?
Appreciated!
Thanks, James, I appreciate your advice!
If I understand correctly, you are saying it's impossible to pool multiple datasets and benefit from the added statistical power of the increased sample size? Intuitively, I feel this should be possible to some extent, and I am surprised this isn't considered common practice.
Unfortunately, I know of no one locally with this precise expertise, so I have no choice but to pursue this problem myself. I do have a strong CS background, so I expect to be able to make up for what I lack with some general guidance.