Two colour arrays - advice on unconnected design, batch effect
0
0
Entering edit mode
@gordon-smyth
Last seen just now
WEHI, Melbourne, Australia
Hi Katrina, Hello from down the road. > Date: Mon, 26 Oct 2009 13:42:30 +1100 > From: "Katrina Bell" <katrina.bell at="" mcri.edu.au=""> > Subject: [BioC] Two colour arrays - advice on unconnected design, > batch effect > To: <bioconductor at="" stat.math.ethz.ch=""> > > > Dear All, > > I have limited experience with analysis of two colour arrays and would > appreciate your thoughts on the follow design matrix I have constructed > and ways to deal with batch effects. It is an unconnected design, of 5 > conditions each with their own reference. I should say, these are > agilent 44k mouse arrays. > > There are 39 arrays in total. There are technical dye swaps (which I > know aren't the best- but its what I have got) and these arrays have > been performed in 3 lots (so 3 batches). I have attempted to take care > of the technical replicates (same mouse/RNA, just labelled in reverse > for a dye swap) using the block function in lmFit > > Targets > SlideNumber Cy3 Cy5 Batch Bioreps > 1 WTCL WTCR 1 1 > 2 CoffeeAL CoffeeAR 1 2 > 3 CoffeeCL CoffeeCR 1 3 > 4 WTBL WTBR 1 4 > 5 WTBR WTBL 1 4 > 6 CoffeeAR CoffeeAL 1 2 > 7 WTCR WTCL 1 1 > 8 CoffeeCL CoffeeCR 1 5 > 9 WTBL WTBR 1 6 > 10 WTAL WTAR 1 7 > 11 CoffeeAL CoffeeAR 1 8 > 12 CoffeeAR CoffeeAL 1 8 > 13 WTAR WTAL 1 7 > 14 WTBR WTBL 1 6 > 15 CoffeeCR CoffeeCL 1 5 > 16 WTAL WTAR 1 9 > 17 WTCR WTCL 1 10 > 18 CoffeeCR CoffeeCL 1 3 > 19 WTAR WTAL 1 9 > 20 WTBR WTBL 2 11 > 21 WTBL WTBR 2 11 > 22 WTCR WTCL 2 12 > 23 WTCL WTCR 2 12 > 24 WTAR WTAL 2 13 > 25 WTAL WTAR 2 13 > 26 CoffeeCR CoffeeCL 2 14 > 27 CoffeeCL CoffeeCR 2 14 > 28 CoffeeAR CoffeeAL 2 15 > 29 CoffeeAL CoffeeAR 2 15 > 30 WTBR WTBL 3 16 > 31 WTBL WTBR 3 16 > 32 WTCR WTCL 3 17 > 33 WTCL WTCR 3 17 > 34 WTAR WTAL 3 18 > 35 WTAL WTAR 3 18 > 36 CoffeeCR CoffeeCL 3 19 > 37 CoffeeCL CoffeeCR 3 19 > 38 CoffeeAR CoffeeAL 3 20 > 39 CoffeeAL CoffeeAR 3 20 > > > > RG <- read.maimages(target, source= "agilent", path="ArrayFiles") > RG <- backgroundCorrect(RG, method="subtract") > > I also tried using normexp, offset 50, but a couple of my arrays M > values really constricted after this... > > RG$genes$Status <-controlStatus(spottypes, RG) > Matching patterns for: ControlType GeneName > Found 43379 probe > Found 604 DarkCorner > Found 14 GE_BrightCorner > Found 1486 controls > Setting attributes: values Color >> w <-modifyWeights(array(1,dim(RG)), RG$genes$Status, c("BrightCorner", "DarkCorner"), c(0,0)) > > bioreps<-c(1,2,3,4,4,2,1,5,6,7,8,8,7,6,5,9,10,3,9,11,11,12,12,13,13, 14,14,15,15,16,16,17,17,18,18,19,19,20,20 ) > MA <-normalizeWithinArrays(RG, weights=w, method='loess') > MA<-normalizeBetweenArrays(MA, method="Aquantile") > > MA.avg <-avereps(MA, ID=MA$genes$ProbeName) > > corfit<-duplicateCorrelation(MA.avg, block=biorep) Need to add the design matrix to the duplicateCorrelation() call. >> corfit$consensus > [1] -0.812968 > > As this is an unconnected design, I followed Gordon's advice in another > posting and made my own design matrix. > > >> design > Dye WTAR WTBR WTCR CoffeeAR CoffeeCr > [1,] 1 0 0 -1 0 0 > [2,] 1 0 0 0 -1 0 > [3,] 1 0 0 0 0 -1 > [4,] 1 0 -1 0 0 0 > [5,] 1 0 1 0 0 0 > [6,] 1 0 0 0 1 0 > [7,] 1 0 0 1 0 0 > [8,] 1 0 0 0 0 -1 > [9,] 1 0 -1 0 0 0 > [10,] 1 -1 0 0 0 0 > [11,] 1 0 0 0 -1 0 > [12,] 1 0 0 0 1 0 > [13,] 1 1 0 0 0 0 > [14,] 1 0 1 0 0 0 > [15,] 1 0 0 0 0 1 > [16,] 1 -1 0 0 0 0 > [17,] 1 0 0 1 0 0 > [18,] 1 0 0 0 0 -1 > [19,] 1 1 0 0 0 0 > [20,] 1 0 1 0 0 0 > [21,] 1 0 -1 0 0 0 > [22,] 1 0 0 1 0 0 > [23,] 1 0 0 -1 0 0 > [24,] 1 1 0 0 0 0 > [25,] 1 -1 0 0 0 0 > [26,] 1 0 0 0 0 1 > [27,] 1 0 0 0 0 -1 > [28,] 1 0 0 0 1 0 > [29,] 1 0 0 0 -1 0 > [30,] 1 0 1 0 0 0 > [31,] 1 0 -1 0 0 0 > [32,] 1 0 0 1 0 0 > [33,] 1 0 0 -1 0 0 > [34,] 1 1 0 0 0 0 > [35,] 1 -1 0 0 0 0 > [36,] 1 0 0 0 0 1 > [37,] 1 0 0 0 0 -1 > [38,] 1 0 0 0 1 0 > [39,] 1 0 0 0 -1 0 > > > fit<- lmFit(MA.avg,design, block=bioreps, cor=corfit$consensus) > fit2 <-eBayes(fit) > WTAR<- topTable(fit2, coef=2, adjust="BH") > > > Is it sensible to make a coefficent for each of the batches in my design > with my set of arrays? So three extra columns? I am unsure if I have > enough information in my arrays for this, and I would appreciated your > advice/ suggestions. I am especially concerned about how to treat the > batch effect as the second batch has some background hybridisation > issues from looking at the FE array images. Although they look OK on > the QC in limma- just more constricted M values than the other arrays, I > am concerned about them. I did remove the whole batch and ran the > analysis with the remaining 29 arrays to gauge what effect they were > having on the analysis and found that I got even less statistically > significant genes. > > So, my questions are; > 1. is the design matrix I constructed OK ? Looks ok, although haven't checked every line. Each coefficient is measuring the R vs L effect for a specific combination of treatments. Assume that's what you want. > 2. How can I deal with the batch effect in my set off arrays. Add it to your design matrix, i.e., an indicator column for batch 2 and one for batch 3. Best wishes Gordon > 3. Any other comments welcome! > > Thanks for any help you are able to give. > Cheers > Katrina
• 753 views
ADD COMMENT

Login before adding your answer.

Traffic: 680 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6