Question

Two colour arrays - advice on unconnected design, batch effect

Gordon Smyth 50k

WEHI, Melbourne, Australia

Hi Katrina,

Hello from down the road.

> Date: Mon, 26 Oct 2009 13:42:30 +1100
> From: "Katrina Bell" <katrina.bell at mcri.edu.au>
> Subject: [BioC] Two colour arrays - advice on unconnected design,
>          batch effect
> To: <bioconductor at stat.math.ethz.ch>
>
> Dear All,
>
> I have limited experience with analysis of two colour arrays and would
> appreciate your thoughts on the following design matrix I have constructed
> and ways to deal with batch effects. It is an unconnected design, of 5
> conditions each with their own reference. I should say, these are
> Agilent 44k mouse arrays.
>
> There are 39 arrays in total. There are technical dye swaps (which I
> know aren't the best, but it's what I have got) and these arrays have
> been performed in 3 lots (so 3 batches). I have attempted to take care
> of the technical replicates (same mouse/RNA, just labelled in reverse
> for a dye swap) using the block argument in lmFit.
>
> Targets
> SlideNumber  Cy3       Cy5       Batch  Bioreps
> 1            WTCL      WTCR      1      1
> 2            CoffeeAL  CoffeeAR  1      2
> 3            CoffeeCL  CoffeeCR  1      3
> 4            WTBL      WTBR      1      4
> 5            WTBR      WTBL      1      4
> 6            CoffeeAR  CoffeeAL  1      2
> 7            WTCR      WTCL      1      1
> 8            CoffeeCL  CoffeeCR  1      5
> 9            WTBL      WTBR      1      6
> 10           WTAL      WTAR      1      7
> 11           CoffeeAL  CoffeeAR  1      8
> 12           CoffeeAR  CoffeeAL  1      8
> 13           WTAR      WTAL      1      7
> 14           WTBR      WTBL      1      6
> 15           CoffeeCR  CoffeeCL  1      5
> 16           WTAL      WTAR      1      9
> 17           WTCR      WTCL      1      10
> 18           CoffeeCR  CoffeeCL  1      3
> 19           WTAR      WTAL      1      9
> 20           WTBR      WTBL      2      11
> 21           WTBL      WTBR      2      11
> 22           WTCR      WTCL      2      12
> 23           WTCL      WTCR      2      12
> 24           WTAR      WTAL      2      13
> 25           WTAL      WTAR      2      13
> 26           CoffeeCR  CoffeeCL  2      14
> 27           CoffeeCL  CoffeeCR  2      14
> 28           CoffeeAR  CoffeeAL  2      15
> 29           CoffeeAL  CoffeeAR  2      15
> 30           WTBR      WTBL      3      16
> 31           WTBL      WTBR      3      16
> 32           WTCR      WTCL      3      17
> 33           WTCL      WTCR      3      17
> 34           WTAR      WTAL      3      18
> 35           WTAL      WTAR      3      18
> 36           CoffeeCR  CoffeeCL  3      19
> 37           CoffeeCL  CoffeeCR  3      19
> 38           CoffeeAR  CoffeeAL  3      20
> 39           CoffeeAL  CoffeeAR  3      20
>
> RG <- read.maimages(target, source="agilent", path="ArrayFiles")
> RG <- backgroundCorrect(RG, method="subtract")
>
> I also tried using normexp, offset 50, but a couple of my arrays' M
> values really constricted after this...
>
> RG$genes$Status <- controlStatus(spottypes, RG)
> Matching patterns for: ControlType GeneName
> Found 43379 probe
> Found 604 DarkCorner
> Found 14 GE_BrightCorner
> Found 1486 controls
> Setting attributes: values Color
>
>> w <- modifyWeights(array(1,dim(RG)), RG$genes$Status, c("BrightCorner", "DarkCorner"), c(0,0))
>
> bioreps <- c(1,2,3,4,4,2,1,5,6,7,8,8,7,6,5,9,10,3,9,11,11,12,12,13,13,
>              14,14,15,15,16,16,17,17,18,18,19,19,20,20)
> MA <- normalizeWithinArrays(RG, weights=w, method='loess')
> MA <- normalizeBetweenArrays(MA, method="Aquantile")
>
> MA.avg <- avereps(MA, ID=MA$genes$ProbeName)
>
> corfit <- duplicateCorrelation(MA.avg, block=bioreps)

Need to add the design matrix to the duplicateCorrelation() call.

>> corfit$consensus
> [1] -0.812968
>
> As this is an unconnected design, I followed Gordon's advice in another
> posting and made my own design matrix.
>
>> design
>       Dye WTAR WTBR WTCR CoffeeAR CoffeeCr
>  [1,]   1    0    0   -1        0        0
>  [2,]   1    0    0    0       -1        0
>  [3,]   1    0    0    0        0       -1
>  [4,]   1    0   -1    0        0        0
>  [5,]   1    0    1    0        0        0
>  [6,]   1    0    0    0        1        0
>  [7,]   1    0    0    1        0        0
>  [8,]   1    0    0    0        0       -1
>  [9,]   1    0   -1    0        0        0
> [10,]   1   -1    0    0        0        0
> [11,]   1    0    0    0       -1        0
> [12,]   1    0    0    0        1        0
> [13,]   1    1    0    0        0        0
> [14,]   1    0    1    0        0        0
> [15,]   1    0    0    0        0        1
> [16,]   1   -1    0    0        0        0
> [17,]   1    0    0    1        0        0
> [18,]   1    0    0    0        0       -1
> [19,]   1    1    0    0        0        0
> [20,]   1    0    1    0        0        0
> [21,]   1    0   -1    0        0        0
> [22,]   1    0    0    1        0        0
> [23,]   1    0    0   -1        0        0
> [24,]   1    1    0    0        0        0
> [25,]   1   -1    0    0        0        0
> [26,]   1    0    0    0        0        1
> [27,]   1    0    0    0        0       -1
> [28,]   1    0    0    0        1        0
> [29,]   1    0    0    0       -1        0
> [30,]   1    0    1    0        0        0
> [31,]   1    0   -1    0        0        0
> [32,]   1    0    0    1        0        0
> [33,]   1    0    0   -1        0        0
> [34,]   1    1    0    0        0        0
> [35,]   1   -1    0    0        0        0
> [36,]   1    0    0    0        0        1
> [37,]   1    0    0    0        0       -1
> [38,]   1    0    0    0        1        0
> [39,]   1    0    0    0       -1        0
>
> fit <- lmFit(MA.avg, design, block=bioreps, cor=corfit$consensus)
> fit2 <- eBayes(fit)
> WTAR <- topTable(fit2, coef=2, adjust="BH")
>
> Is it sensible to make a
> coefficient for each of the batches in my design
> with my set of arrays? So three extra columns? I am unsure if I have
> enough information in my arrays for this, and I would appreciate your
> advice/suggestions. I am especially concerned about how to treat the
> batch effect as the second batch has some background hybridisation
> issues from looking at the FE array images. Although they look OK on
> the QC in limma (just more constricted M values than the other arrays), I
> am concerned about them. I did remove the whole batch and ran the
> analysis with the remaining 29 arrays to gauge what effect they were
> having on the analysis and found that I got even fewer statistically
> significant genes.
>
> So, my questions are:
> 1. Is the design matrix I constructed OK?

Looks ok, although haven't checked every line. Each coefficient is measuring the R vs L effect for a specific combination of treatments. Assume that's what you want.

> 2. How can I deal with the batch effect in my set of arrays?

Add it to your design matrix, i.e., an indicator column for batch 2 and one for batch 3.

Best wishes
Gordon

> 3. Any other comments welcome!
>
> Thanks for any help you are able to give.
> Cheers
> Katrina
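
A minimal sketch of the two suggestions in the reply (passing the design matrix to duplicateCorrelation() and adding indicator columns for batches 2 and 3). It is an illustration under stated assumptions, not the code used in the thread: the names batch, batch.cols and design2 are made up here, the batch labels are read off the Targets table, and the limma steps run only if design, MA.avg and bioreps from the original session are available.

```r
# Batch label for each of the 39 arrays, read off the Targets table:
# slides 1-19 are batch 1, slides 20-29 batch 2, slides 30-39 batch 3.
batch <- factor(rep(1:3, times = c(19, 10, 10)))

# model.matrix(~ batch) gives an intercept plus indicators for batches 2 and 3.
# Drop the intercept, since the existing design already has a Dye column of ones.
batch.cols <- model.matrix(~ batch)[, -1, drop = FALSE]
colnames(batch.cols) <- c("Batch2", "Batch3")  # hypothetical column names

# The remaining steps need objects from the original session and the limma package.
have.session <- is.matrix(get0("design")) && !is.null(get0("MA.avg")) &&
  !is.null(get0("bioreps")) && requireNamespace("limma", quietly = TRUE)

if (have.session) {
  # Append the two batch indicators to the hand-built design matrix
  design2 <- cbind(design, batch.cols)

  # Estimate the within-biorep correlation with the design supplied
  corfit <- limma::duplicateCorrelation(MA.avg, design2, block = bioreps)

  # Refit using the same design and the consensus correlation
  fit <- limma::lmFit(MA.avg, design2, block = bioreps,
                      correlation = corfit$consensus)
  fit2 <- limma::eBayes(fit)
  print(limma::topTable(fit2, coef = "WTAR", adjust.method = "BH"))
}
```

With the batch columns included, the Dye column acts as the baseline for batch 1, and Batch2 and Batch3 absorb how the later batches differ from it.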

14.5 years ago Gordon Smyth 50k