Normalization between arrays for common reference, time course and direct two color designs

Vinoy Kumar Ramachandran ▴ 50

@vinoy-kumar-ramachandran-1966

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061207/2277a20a/attachment.pl

• 900 views

updated 19.1 years ago by Weiyin Zhou ▴ 220 • written 19.1 years ago by Vinoy Kumar Ramachandran ▴ 50

Weiyin Zhou ▴ 220

@weiyin-zhou-1970

Hi Jenny,

I have a related problem with an Agilent two-color array. All of the spots are duplicated twice (they have the same "ProbeName"), except the positive and negative controls, which are duplicated multiple times. The "ControlType" column identifies their type. I use the limma package to read in the data (ProcessedSignal, which is already background corrected and loess normalized), and then I did a between-array quantile normalization.

Before I do lmFit and the differential expression analysis, I think I should remove the control spots and also average the duplicated spots, so that I have a p-value for each unique ProbeName. I just tried your code, but I get an error message:

    > MA.norm <- MA.norm[order(MA.norm$genes$ProbeName),]
    Error: object "MA.norm" not found

Could you give me some advice?

Thanks in advance,

Weiyin Zhou
Statistics and Data Analyst
ExonHit Therapeutics, Inc.
217 Perry Parkway, Building # 5
Gaithersburg, MD 20877

email: Weiyin.zhou at exonhit-usa.com
phone: 240.404.0184
fax: 240.683.7060

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny Drnevich
Sent: Thursday, December 07, 2006 12:17 PM
To: Vinoy Kumar Ramachandran
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Normalization between arrays for common reference, time course and direct two color designs

Hi Vinoy,

It's better to keep the discussions on the list for other users that may have the same question. If the replicate spots are not evenly spaced, after the normalizations you can rearrange the MA object so that they are evenly spaced, at least the 90% that are spotted twice. The ones that are spotted 26 times are likely some sort of control spots, and you can probably safely ignore them. Why are some spotted three times? If you want to keep these genes in, a quick-and-dirty solution would be to just pick two of the three spots.

The following code *should* work to rearrange the order of the genes, then pick out the first two spots for each unique ID.

    MA.norm <- MA.norm[order(MA.norm$genes$ID),]
    x <- unique(MA.norm$genes$ID)
    MA.norm$genes$spotrep <- NULL
    # I'm sure there's a better, faster way to do the following,
    # but this is the only way I know how:
    for (i in 1:length(x)) {
        y <- which( MA.norm$genes$ID == x[i] )
        MA.norm$genes$spotrep[y] <- 1:length(y)
    }
    MA.norm.2spot <- MA.norm[MA.norm$genes$spotrep <= 2 , ]
    # now your spacing=1 and ndups=2

HTH,
Jenny

At 10:36 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
>Hi Jenny,
>
>Thanks a lot for the valuable information. I will try to do loess first and then do a scale normalization if necessary. Regarding the correlation in lmFit, the spots on the array are not evenly spaced and not evenly replicated: 90% of the spots are spotted twice, 8% three times, and 2% are spotted 26 times. I found this code in a posting in the limma user forum and tried to adapt it to my data. Is there a more elegant way to deal with this kind of replication?
>
>Once again, thanks for the information.
>
>with regards,
>vinoy
>
>On 12/7/06, Jenny Drnevich <drnevich at uiuc.edu> wrote:
>Hi Vinoy,
>
>Using the 'Gquantile' between-array normalization is not appropriate in your case because your reference is not always in the Green channel. The values you are using for Exp3 and Exp6 in the linear model are actually from the reference, so it's no wonder your gene lists don't make sense. To clarify, the discussion we were having recently on the mailing list about using Gquantile is when your experimental samples are expected to be VERY different from the reference, such that the assumption of a within-array normalization may not be met. In your case (and in most reference designs) you probably meet the assumption of most genes not changing, and so should first do a within-array loess-type normalization to help remove dye bias.
>
>Then check to see if the resulting distributions of M values are similar between arrays. If they are very different, and you would expect them not to be very different, do a between-array normalization on the M values - the scale method of 'normalizeBetweenArrays' is my favorite. The design matrix you have below will correctly adjust for dye swaps, assuming that the 'dye swaps' are all biological replicates and not technical replicates.
>
>I'm a little confused about the way you're calling the 'lmFit' function. Your arrays appear to have duplicate spots, but you have the correlation as zero. Something is very wrong with your arrays if there is zero correlation between the duplicate spots! I suggest you read the limma vignette very closely, especially the sections on common reference designs and within-array replicate spots.
>
>Good luck,
>Jenny
>
>At 12:58 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
> > Dear Limma users,
> >
> >I am working on custom spotted 70mer oligo arrays, and use Bluefuse to analyse the images. With the help of the excellent user guide and the Bioconductor user forum (GMANE), I have analysed my direct comparison experiments. I also have common reference, time course and direct two color design type experiments to analyse. I have read the recent posting in the list about using Rquantile or Gquantile for normalizing between arrays in common reference experiments. I tried to do a common reference analysis using the discussed code, but the resulting gene list is different from the expected list. I am also wondering how to account for dye swaps. I have pasted the code which I used for the common reference analysis.
> >
> >It would also be very useful if anyone could tell me how to use normalization between arrays for direct two color designs.
> >
> >My experiment design is
> >
> >        Cy3     Cy5
> >____________________
> >Exp1    Ref     CpdA
> >Exp2    Ref     CpdA
> >Exp3    CpdA    Ref
> >
> >Exp4    Ref     CpdB
> >Exp5    Ref     CpdB
> >Exp6    CpdB    Ref
> >
> >Code which I used for analysing the common reference design:
> >----------------------------------------------------------------------
> >library(limma)
> >targets <- readTargets("commonref.txt", row.names= "Name")
> >RG <- read.maimages(targets$FileName, source="bluefuse")
> >RG$genes <- readGAL()
> >RG$printer <- getLayout(RG$genes)
> >spottypes <- readSpotTypes()
> >RG$genes$Status <- controlStatus(spottypes, RG)
> >isGene <- RG$genes$Status == "oligos"
> >MA.Gquantile <- normalizeBetweenArrays(RG[isGene,], method="Gquantile")
> >RG.Gquantile <- RG.MA(MA.Gquantile)
> >MA.dummy <- MA.Gquantile
> >MA.dummy$M <- log2(RG.Gquantile$R)
> >o <- order(MA.dummy$genes$ID)
> >MA.sorted <- MA.dummy[o,]
> >design <- modelMatrix(targets, ref="Ref")
> >fit <- lmFit(MA.sorted, design, ndups=2, spacing=1, correlation=0)
> >fit.eb <- eBayes(fit)
> >write.fit(fit.eb, file="data/commonref.xls", adjust="BH")
> >----------------------------------------------------------------------
> >
> >thanks in advance
> >
> >with regards,
> >Vinoy......
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>Jenny Drnevich, Ph.D.
>
>Functional Genomics Bioinformatics Specialist
>W.M. Keck Center for Comparative and Functional Genomics
>Roy J. Carver Biotechnology Center
>University of Illinois, Urbana-Champaign
>
>330 ERML
>1201 W. Gregory Dr.
>Urbana, IL 61801
>USA
>
>ph: 217-244-7355
>fax: 217-265-5066
>e-mail: drnevich at uiuc.edu
>
>--
>Vinoy......
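
The dye-swap point in the quoted discussion can be illustrated in base R without limma. The sketch below builds by hand the design matrix for the six arrays listed in the thread, with M = log2(Cy5/Cy3), and fits it to one gene by least squares. It assumes that `modelMatrix(targets, ref="Ref")` produces this same +1/-1 coding for that targets table; the M values are invented for illustration.

```r
# Hand-built design for the six arrays in the thread, with M = log2(Cy5/Cy3).
# +1: compound in Cy5 and Ref in Cy3; -1: dye swap; 0: compound not on the array.
design <- cbind(
  CpdA = c(1, 1, -1, 0, 0, 0),
  CpdB = c(0, 0, 0, 1, 1, -1)
)
rownames(design) <- paste0("Exp", 1:6)

# Invented, noise-free log-ratios for one gene:
# CpdA vs Ref = 2 and CpdB vs Ref = -1, so the swapped arrays flip sign.
M <- c(2, 2, -2, -1, -1, 1)

# Least squares gives one coefficient per compound; the swapped arrays
# (Exp3, Exp6) enter with the opposite sign instead of being treated as
# if the reference were still in the green channel.
est <- qr.solve(design, M)
print(est)
```

Because the swapped arrays carry a -1, each coefficient is the compound-versus-reference log-ratio regardless of which channel held the reference. A green-channel-only normalization has no such correction, which is consistent with the objection to 'Gquantile' raised in the thread.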

written 19.1 years ago by Weiyin Zhou ▴ 220

Weiyin Zhou ▴ 220

@weiyin-zhou-1970

Hi Jenny,

Thanks a lot for your help. I used the following code:

    > MA <- MA[order(MA$genes$ProbeName),]
    > x <- unique(MA$genes$ProbeName)
    > MA.norm$genes$spotrep <- NULL
    > for (i in 1:length(x)) {
          y <- which( MA$genes$ProbeName == x[i] )
          MA$genes$spotrep[y] <- 1:length(y)
      }
    Error in `$<-.data.frame`(`*tmp*`, "spotrep", value = c(1, 2, 3, 4, 5, :
      replacement has 314 rows, data has 44202

"44202" is my total number of rows. "314" is the total number of duplicated negative control probes (they all have the same name); they occupy the first 314 rows after the probes are ordered by ProbeName. I checked the order of MA and the contents of x, and they are correct.

Could you explain the function of the "MA$genes$spotrep <- NULL" line here?

Thanks a lot,
Weiyin

-----Original Message-----
From: Jenny Drnevich [mailto:drnevich@uiuc.edu]
Sent: Thursday, December 07, 2006 3:52 PM
To: Weiyin Zhou; Vinoy Kumar Ramachandran
Cc: bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] Normalization between arrays for common reference, time course and direct two color designs

Hi Weiyin,

Sorry - the object name in the code is arbitrary, so 'MA.norm' is a MAList object with your data in it. Besides changing $ID to $ProbeName as you did below, you need to change 'MA.norm' to the name of your MAList. I probably should have specifically said something like: "if your normalized data is in a MAList object named 'MA.norm', and your spot ID names are found in MA.norm$genes$ID, then this code should work."

Note that this code does not average duplicate spots. Instead, it arranges them with spacing=1 so you can use the 'duplicateCorrelation' function before lmFit, which is better than averaging the spots. See the Within-Array replicate spot section of the limma vignette for an example of how to do this.

Cheers,
Jenny
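
On the error reported above: in current versions of R, assigning into part of a data frame column that does not yet exist builds a vector only as long as the largest index used (314 here), and `$<-.data.frame` then rejects it because the data frame has 44202 rows. A minimal sketch that avoids the loop is shown below, using base R's `ave()` on a toy data frame standing in for `MA$genes`. The probe names and the `ControlType` coding (0 for ordinary probes, non-zero for controls) are assumptions for illustration and are not taken from the poster's files.

```r
# Toy stand-in for MA$genes. Probe names and values are invented; the
# ControlType coding (0 = ordinary probe, non-zero = control) is an assumption.
genes <- data.frame(
  ProbeName   = c("neg", "p1", "neg", "p2", "p1", "neg", "p2"),
  ControlType = c(-1, 0, -1, 0, 0, -1, 0),
  stringsAsFactors = FALSE
)

# Sort so replicate spots of the same probe are adjacent (spacing = 1).
genes <- genes[order(genes$ProbeName), ]

# Number the replicates within each probe in one vectorised step. The column
# is created at full length, so nothing is assigned into a missing column.
genes$spotrep <- ave(seq_len(nrow(genes)), genes$ProbeName, FUN = seq_along)

# Drop control spots, then keep the first two spots of each remaining probe.
keep <- genes$ControlType == 0 & genes$spotrep <= 2
genes2 <- genes[keep, ]
print(genes2)
```

With an MAList, the same index logic would presumably carry over as `MA <- MA[order(MA$genes$ProbeName), ]` followed by subsetting the rows with the logical vector. The result has two adjacent spots per probe, so `duplicateCorrelation(MA, design, ndups=2, spacing=1)` can be run before `lmFit`, as suggested in the reply above. Initialising the column to full length first (for example with `NA`) and keeping the original loop should also work.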

written 19.1 years ago by Weiyin Zhou ▴ 220
