Dear all,
I would like to appeal to the collective wisdom of this group on how
best to solve this problem of normalization and batch correction.
We are a service unit for an academic institute and we run several
projects simultaneously. We use Illumina HT12-v4 microarrays, which can
take up to 12 different samples per chip. As we QC the data from one
project, the RNA from failed samples can be rerun on chips from another
project; this avoids the wastage of running partial chips. Sometimes we
include samples from other projects as well. Here is a simple
illustration:
Chip No   ScanDate     Contents
1         1st July     *12 samples from project A*
2         1st July     *8 samples from project A* + 4 from project B
3         1st August   12 samples from project B
4         1st August   *1 sample from project A* + 5 samples from project B + 6 from project C
...
What is the best way to prepare the final data for *project A*? One
option is to do the following:
1. Pool chips 1, 2 and 4 together.
2. Remove failed samples.
3. Remove samples from other projects.
4. Normalize using neqc() from limma.
5. Correct for scan date using ComBat() from sva.
The other option we considered is to omit step 3 (i.e. also use the
samples from other projects for normalization and ComBat) and subset to
project A at the end.
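To make the two options concrete, here is a rough R sketch of what I
have in mind. It is only an outline under assumptions: `raw` is taken to
be an EListRaw from limma's read.ilmn() covering chips 1, 2 and 4
(regular and control probes), and `targets` is a hypothetical data frame
with one row per array, in the same order as the columns of `raw`, with
columns Project, ScanDate and Failed. The function names select_arrays()
and prepare_project() are made up for illustration.

```r
# Sketch only: `raw`, `targets` and its column names are assumptions.

# Steps 2 and 3: decide which arrays enter normalization and ComBat.
select_arrays <- function(targets, project, subset_first = TRUE) {
  keep <- !targets$Failed                       # step 2: drop failed samples
  if (subset_first) {
    keep <- keep & targets$Project == project   # step 3: drop other projects
  }
  keep
}

prepare_project <- function(raw, targets, project, subset_first = TRUE) {
  keep    <- select_arrays(targets, project, subset_first)
  raw     <- raw[, keep]
  targets <- targets[keep, , drop = FALSE]

  # Step 4: neqc() background-corrects using the negative control probes,
  # quantile-normalizes, log2-transforms and drops the control probes.
  y <- limma::neqc(raw)

  # Step 5: adjust for scan date, treated as the batch variable.
  adj <- sva::ComBat(dat = y$E, batch = as.character(targets$ScanDate))

  # Second option: subset to the project of interest at the end.
  # (With subset_first = TRUE this leaves the matrix unchanged.)
  adj[, targets$Project == project, drop = FALSE]
}

# First option:  prepare_project(raw, targets, "A", subset_first = TRUE)
# Second option: prepare_project(raw, targets, "A", subset_first = FALSE)
```

With the first option and the layout above, the 1st August batch would
contain a single project A array, so I would expect ComBat to have very
little to estimate that batch from.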
I feel this second option allows for better estimation of batch effects
(especially for chip 4, which carries the only project A sample scanned
on 1st August). However, sometimes projects A and B can be quite
different (e.g. samples derived from different tissues), which might
distort the normalization, especially if we want to compare project A
to B directly. We also considered nec() followed by
normalizeBetweenArrays() with "Tquantile", but I felt it was too
complicated. Anything else to try?
Thank you.
--
Adaikalavan Ramasamy
Senior Leadership Fellow in Bioinformatics
Head of the Transcriptomics Core Facility
Email: adaikalavan.ramasamy at ndm.ox.ac.uk
Office: 01865 287 710
Mob: 07906 308 465
http://www.jenner.ac.uk/transcriptomics-facility