FW: duplicates, technical and biological replicates + dividing a microarray into two parts


Staninska, Ana, Dr. ▴ 40

@staninska-ana-dr-3914

Last seen 9.6 years ago

Dear BioConductor team,

I am working on a statistical analysis of two-color GenePix data, using the limma package. I am trying to find differentially expressed genes in the treated samples vs the non-treated samples, where each sample is a leaf from a particular tree. I have 5 biological replicates (5 treated and 5 untreated), and for each biological replicate 2 technical replicates, which are dye-swapped arrays (10 microarrays total). The spots on each microarray are duplicated: once on the left side and once on the right side.

So the experimental design involves 3 kinds of replicates: within-array replicates (duplicate spots), technical replicates (dye swaps), and biological replicates. I know that at the moment it is not possible to treat this kind of experiment with limma, but I have an idea for avoiding the duplicate spots (within-array replicates), if that is possible. This is where I need your help.

There are 8x4 (8 rows, 4 columns) print-tip groups on each microarray, and each print-tip group is of size 12x8 (but I think that is not relevant for now). The array is designed such that the left-hand side and the right-hand side are identical: the duplicate spots are spotted on the left and on the right of the array (if the blocks are numbered 1 through 32, then 1 and 3 are the same, as are 2 and 4, 5 and 7, 6 and 8, etc.). So if I can somehow divide my microarray into two pieces and treat the pieces as two separate microarrays, I will be able to avoid the duplicate spots and deal only with technical and biological replicates.

So if my original microarray consists of 32 blocks (print-tip groups), I would like two new microarrays, called Left_microarray and Right_microarray, each containing 16 blocks, such that blocks 1,2,5,6,9,10,13,14,17,18,21,22,25,26,29,30 are in Left_microarray and the remaining blocks 3,4,7,8,11,12,15,16,19,20,23,24,27,28,31,32 are in Right_microarray.

Is this possible? If it is, could you please help me and tell me how to do this?

Just in case, I am also sending my R code for the experiment.

Thank you very much in advance,
Ana Staninska
Institute of Biomathematics and Biometry
Helmholtz-Zentrum München
München, Deutschland

The R code of the experiment:

I tried all the possible ways of dealing with the experiment: averaging the within-array replicates, treating biological as technical replicates, or treating technical as biological replicates. After I ran the R code, I compared the results with the qRT-PCR results previously obtained for the experiments. For the comparison, I took the sum of the absolute values of the differences between the logFC from qRT-PCR and the logFC from my analysis. It turned out that treating technical as biological replicates was the worst option, and treating biological as technical replicates was the best.

> targets <- readTargets("Lysi_270706.txt")
>
> myfun<-function(x) {
+ nored<-abs(x[,"F635 Median"] + x[,"F635 Mean"]) !=0
+ nogreen<-abs(x[, "F532 Median"]+x[,"F532 Mean"]) !=0
+ as.numeric(nogreen & nored)
+ }
>
> RGa <- read.maimages(targets, source="genepix", wt.fun=myfun, other.columns=c("F635 SD","B635 SD","F532 SD","B532 SD","B532 Mean","B635 Mean","F Pixels","B Pixels"))
Read Met270706_1_60308.gpr
Read Met270706_dw1_110308.gpr
Read Met270706_2_060308.gpr
Read Met270706_dw2_110308.gpr
Read Met270706_3_060308.gpr
Read Met270706_dw3_120308.gpr
Read Met270706_4_060308.gpr
Read Met270706_dw4_120308.gpr
Read Met270706_5_060308.gpr
Read Met270706_dw5_120308.gpr
Read Met270706_6_060308.gpr
Read Met270706_dw6_120308.gpr
Read Met270706_7_110308.gpr
Read Met270706_dw7_120308.gpr
Read Met270706_8_220408.gpr
Read Met270706_dw8_120308.gpr
Read Met270706_9_110308.gpr
Read Met270706_dw9_120308.gpr
Read Met270706_10_110308.gpr
Read Met270706_dw10_120308.gpr
>
> RG.ne10b <-backgroundCorrect(RGa, method="normexp", , normexp.method="mle", offset=10)
Green channel
Corrected array 1
Corrected array 2
Corrected array 3
Corrected array 4
Corrected array 5
Corrected array 6
Corrected array 7
Corrected array 8
Corrected array 9
Corrected array 10
Corrected array 11
Corrected array 12
Corrected array 13
Corrected array 14
Corrected array 15
Corrected array 16
Corrected array 17
Corrected array 18
Corrected array 19
Corrected array 20
Red channel
Corrected array 1
Corrected array 2
Corrected array 3
Corrected array 4
Corrected array 5
Corrected array 6
Corrected array 7
Corrected array 8
Corrected array 9
Corrected array 10
Corrected array 11
Corrected array 12
Corrected array 13
Corrected array 14
Corrected array 15
Corrected array 16
Corrected array 17
Corrected array 18
Corrected array 19
Corrected array 20
>
> MA_l.ne10b <- normalizeWithinArrays(RG.ne10b, method="loess")
>
> #################################################################
> ### Average of the Duplicate Spots ###
> #################################################################
>
> MAa_l.ne10b <- avedups(MA_l.ne10b, ndups=2, spacing=192)
> design <- modelMatrix(targets, ref="wt")
Found unique target names:
 mu wt
> biolrep<-c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10)
>
> corfita_l.ne10b<-duplicateCorrelation(MAa_l.ne10b, design, block=biolrep)
>
> fita_l.ne10b<-lmFit(MAa_l.ne10b, design, block=biolrep, cor=corfita_l.ne10b$consensus)
>
> fita_l.ne10b<-eBayes(fita_l.ne10b)
>
> TTa_l.ne10b<-topTable(fita_l.ne10b,coef=1, number=1600, adjust="BH")
> write.csv(TTa_l.ne10b, file="BC_Lysi_270706a_TTa_l_ne10b.csv")
>
> ################################################################
> ### BIOLOGICAL AS TECHNICAL ######
> ################################################################
>
> corfit_l.ne10b<-duplicateCorrelation(MA_l.ne10b, ndups=2, spacing=192)
>
> fitbt_l.ne10b<-lmFit(MA_l.ne10b, design, ndups=2, spacing=192, cor=corfit_l.ne10b$consensus)
>
> fitbt_l.ne10b<-eBayes(fitbt_l.ne10b)
>
> TTbt_l.ne10b<-topTable(fitbt_l.ne10b,coef=1, number=1600, adjust="BH")
> write.csv(TTbt_l.ne10b, file="BC_Lysi_270706a_TTbt_l_ne10b.csv")
>
> ###############################################################
> #### TECHNICAL AS BIOLOGICAL ####
> ###############################################################
>
> design1<-cbind(
+ nt1=c( 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
+ tr1=c(0, -1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
+ nt2=c(0,0, 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
+ tr2=c(0,0,0, -1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
+ nt3=c(0,0,0,0, 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
+ tr3=c(0,0,0,0,0, -1,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
+ nt4=c(0,0,0,0,0,0, 1,0,0,0,0,0,0,0,0,0,0,0,0,0),
+ tr4=c(0,0,0,0,0,0,0, -1,0,0,0,0,0,0,0,0,0,0,0,0),
+ nt5=c(0,0,0,0,0,0,0,0, 1,0,0,0,0,0,0,0,0,0,0,0),
+ tr5=c(0,0,0,0,0,0,0,0,0, -1,0,0,0,0,0,0,0,0,0,0),
+ nt6=c(0,0,0,0,0,0,0,0,0,0, 1,0,0,0,0,0,0,0,0,0),
+ tr6=c(0,0,0,0,0,0,0,0,0,0,0, -1,0,0,0,0,0,0,0,0),
+ nt7=c(0,0,0,0,0,0,0,0,0,0,0,0, 1,0,0,0,0,0,0,0),
+ tr7=c(0,0,0,0,0,0,0,0,0,0,0,0,0, -1,0,0,0,0,0,0),
+ nt8=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0, 1,0,0,0,0,0),
+ tr8=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, -1,0,0,0,0),
+ nt9=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 1,0,0,0),
+ tr9=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, -1,0,0),
+ nt10=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 1,0),
+ tr10=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, -1))
>
> fittb_l.ne10b<-lmFit(MA_l.ne10b, design1, ndups=2, spacing=192,cor=corfit_l.ne10b$consensus)
Warning message:
Partial NA coefficients for 160 probe(s)
>
> fittb_l.ne10b<-eBayes(fittb_l.ne10b)
>
> TTtb_l.ne10b<-topTable(fittb_l.ne10b,coef=1, number=1600, adjust="BH")
> write.csv(TTtb_l.ne10b, file="BC_Lysi_270706a_TTtb_l_ne10b.csv")
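The left/right split asked about above can be pictured as row selection by block number. The following is a minimal sketch in base R on a toy matrix, not a limma workflow: the layout (32 blocks numbered row-wise in 4 columns, 96 spots per block) is taken from the post, while the variable names, the simulated values, and the assumption that spots are stored block by block are illustrative only.

```r
# Layout taken from the post: 32 blocks in 8 rows x 4 columns, 12 x 8 = 96 spots per block.
n_blocks <- 32
block_cols <- 4
spots_per_block <- 12 * 8

# Block number of every spot, assuming spots are stored block by block.
block <- rep(seq_len(n_blocks), each = spots_per_block)

# Column position (1 to 4) of each block when blocks are numbered row-wise.
block_col <- (block - 1) %% block_cols + 1
is_left <- block_col <= 2

left_blocks <- unique(block[is_left])    # 1, 2, 5, 6, ..., 29, 30
right_blocks <- unique(block[!is_left])  # 3, 4, 7, 8, ..., 31, 32

# Toy log-ratio matrix standing in for real data: one row per spot, 2 arrays.
set.seed(1)
M <- matrix(rnorm(length(block) * 2), ncol = 2,
            dimnames = list(NULL, c("array1", "array2")))

# Each half becomes its own set of columns, so 2 arrays turn into 4 "half-arrays".
M_left <- M[is_left, , drop = FALSE]
M_right <- M[!is_left, , drop = FALSE]
colnames(M_left) <- paste0(colnames(M), "_left")
colnames(M_right) <- paste0(colnames(M), "_right")
M_split <- cbind(M_left, M_right)

# Row i of each half refers to the same probe: partners sit 192 spots apart.
stopifnot(all(which(!is_left) - which(is_left) == 2 * spots_per_block))
dim(M_split)  # 1536 rows, 4 columns
```

The 192-spot offset checked at the end matches the `spacing=192` used in the R code above. Whether the resulting half-arrays should be analysed as if they were separate arrays is a statistical question that this sketch does not settle.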

Microarray limma • 1.5k views

Updated 14.2 years ago by Naomi Altman • written 14.2 years ago by Staninska, Ana, Dr.


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

It is probably better to average the 2 spots on the same array. The within array variability is less than the technical replication variability but will be treated as if it were the same if you follow your proposal. The analysis of spot averages is more statistically valid.

Regards
Naomi Altman

At 10:50 AM 2/1/2010, Staninska, Ana, Dr. wrote:
> [...]

Naomi S. Altman
Associate Professor, Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)
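To make the suggestion of averaging the two spots concrete, here is a minimal base R sketch on a toy matrix. It only illustrates the idea and is not limma's `avedups` implementation; the helper name `average_duplicates` and the toy values are hypothetical, and the sketch assumes exactly two copies of each probe stored `spacing` rows apart, with no missing values.

```r
# Average pairs of duplicate spots that sit `spacing` rows apart.
# Assumes exactly 2 copies per probe and a row count divisible by 2 * spacing.
average_duplicates <- function(M, spacing) {
  n <- nrow(M)
  stopifnot(n %% (2 * spacing) == 0)
  # Rows holding the first copy of each probe.
  first <- which(((seq_len(n) - 1) %/% spacing) %% 2 == 0)
  (M[first, , drop = FALSE] + M[first + spacing, , drop = FALSE]) / 2
}

# Toy log-ratios: 3 probes, each spotted twice (rows 1-3 and rows 4-6), on 2 arrays.
M <- matrix(c(1, 2, 3, 3, 4, 5,
              10, 20, 30, 30, 40, 50),
            ncol = 2, dimnames = list(NULL, c("array1", "array2")))

M_avg <- average_duplicates(M, spacing = 3)
print(M_avg)
```

After averaging there is one value per probe per array, so only the technical and biological replication remains to be modelled.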

Written 14.2 years ago by Naomi Altman


Dear Naomi Altman,

Thank you very much for your answer. From your answer, I now have more questions, and I hope you can help me answer them. I apologize in advance if the questions are basic, but I am new to statistics and I have never seen a microarray in my life.

1. Now that you mention it, I can see that the within-array variability should be smaller than the technical variability, but I cannot understand why treating them as the same should be less statistically valid than averaging the duplicate spots. How can one judge what is more statistically valid? Could you tell me where I could read more about this, so that I will know more and won't make the same mistakes again?

2. I should probably have mentioned before that the correlation between my duplicate spots (calculated with the duplicateCorrelation function in limma) is in the range (0.5, 0.6), and the correlation between my technical replicates is in the range (-0.3, -0.2). So I think the duplicate spots are not well correlated, and by averaging them we would lose valuable information. If I do average the spots, should I do it before or after normalization?

3. I already have qRT-PCR results for several genes, and I compared them with the results I got from the different methods (averaging the duplicate spots, treating biological as technical replicates, or treating technical as biological replicates). Every time, I got the worst results when I treated technical as biological replicates, and the best when I treated biological as technical replicates (in the latter case, I used the duplicateCorrelation function on the duplicate spots, but I did not use blocks for the biological replicates in the lmFit function). I thought that by finding the results (from the statistical analysis) closest to the qRT-PCR results, I would find the most statistically valid method. Shouldn't this be true?

4. Now back to my original question. Is it possible to split a microarray into two pieces and treat each piece as a separate microarray for the sake of the analysis? If it is, how could I do it?

Thank you very much in advance,
Best regards,
Ana

________________________________________
From: Naomi Altman [naomi@stat.psu.edu]
Sent: Monday, February 01, 2010 10:11 PM
To: Staninska, Ana, Dr.; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] FW: duplicates, technical and biological replicates + dividing a microarray into two parts

[...]
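The comparison against qRT-PCR described in point 3, the sum of absolute differences between the two sets of log fold changes, could be sketched as follows. All numbers and names here (`logfc_qpcr`, `logfc_method1`, `logfc_method2`, `sum_abs_diff`) are made up for illustration and are not taken from the experiment.

```r
# Hypothetical qRT-PCR log fold changes for four genes (made-up numbers).
logfc_qpcr <- c(geneA = 1.8, geneB = -0.9, geneC = 2.4, geneD = 0.3)

# Hypothetical array-based estimates from two competing analyses (made-up numbers).
logfc_method1 <- c(geneA = 1.5, geneB = -0.7, geneC = 2.0, geneD = 0.1)
logfc_method2 <- c(geneA = 0.9, geneB = -0.2, geneC = 1.1, geneD = 0.6)

# Sum of absolute differences over the genes present in both vectors.
sum_abs_diff <- function(reference, estimate) {
  common <- intersect(names(reference), names(estimate))
  sum(abs(reference[common] - estimate[common]))
}

scores <- c(method1 = sum_abs_diff(logfc_qpcr, logfc_method1),
            method2 = sum_abs_diff(logfc_qpcr, logfc_method2))
print(scores)
```

Note that a score of this kind uses only the estimated log fold changes; it does not involve the standard errors or p-values that each analysis attaches to them.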

Written 14.2 years ago by Staninska, Ana, Dr.


Dear Ana,

To give Naomi some rest, perhaps I can help with some answers:

> 1. Now that you mention, I can see that the within array variability
> should be smaller than the technical variability,
> but I cannot understand why treating them as the same, should be less
> statistically valid than averaging the duplicate spots.
> How could one judge what is statistically more valid? Could you maybe tell
> me where I could read more about this,
> so I will know more and I won't make the same mistakes again?

If you treat within-array variability the same as between-array variability, you give every replicated spot on the array the same importance as you would give an extra reading from another array. Now imagine the ideal case of very low within-array variability: the replicates on the same array will give you more or less identical results, i.e. these spots do not add any information, but you would treat them as if they did.

To give an example, imagine you have three replicated spots on each of 2 arrays. Array 1 gives you the values 2.9, 3.0, 3.1; Array 2 gives you 6.9, 7.0, 7.1. If you average, you reduce the data to 3 and 7, with an overall average of 5 and a standard error (the number that measures how well you estimate the overall average) of 2. If you take all 6 values and treat them as independent replicates, you end up with the same mean of 5, but the standard error reduces to 0.9. This means that by neglecting the different types of variation you create the false impression of a more precise result.

> 2, I should have probably mentioned before, the correlation between my
> duplicate spots (calculated with duplicateCorrelation function in Limma)
> is in the range (0.5,0.6), and the correlation between my technical
> replicates is in the range (-0.3, -0.2).
> So I think the duplicates spots are not well correlated, and averaging
> them we will lose valuable information.

The first question to answer is: why is the correlation so poor? It most likely indicates poor array quality. The information you get from the duplicates is not of biological interest; it only tells you something about the within-array variability. In that sense it is valuable information, but not as far as the quantity of biological interest (gene expression) is concerned.

> If I do averaging of the spots, should I do it before or after
> normalization?

I would always do that after normalization.

Hope that helps a bit to understand it all.

Claus
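The numbers in the example above can be checked with a few lines of base R. The helper name `std_error` is an assumption for illustration; the six readings are the ones given in the example.

```r
# Readings from the example: three duplicate spots on each of two arrays.
array1 <- c(2.9, 3.0, 3.1)
array2 <- c(6.9, 7.0, 7.1)

# Standard error of the mean: sample standard deviation divided by sqrt(n).
std_error <- function(x) sd(x) / sqrt(length(x))

# Option 1: average within each array first, leaving two values (3 and 7).
array_means <- c(mean(array1), mean(array2))
se_averaged <- std_error(array_means)

# Option 2: treat all six spots as if they were independent replicates.
se_pooled <- std_error(c(array1, array2))

# Overall mean is 5 either way; the standard errors are about 2 and about 0.9.
round(c(mean = mean(array_means), se_averaged = se_averaged, se_pooled = se_pooled), 2)
```

Both options give the same mean, but pooling the spots more than halves the reported standard error even though only two arrays were measured.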

Written 14.2 years ago by Claus Mayer


Thank you very much,
Best,
Ana

________________________________________
From: Claus Mayer [claus@bioss.ac.uk]
Sent: Wednesday, February 03, 2010 4:09 PM
To: Staninska, Ana, Dr.; Naomi Altman; bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] FW: duplicates, technical and biological replicates + dividing a microarray into two parts

[...]

Written 14.2 years ago by Staninska, Ana, Dr.
