duplicate correlation on Agilent 4x44 arrays


Mitch Levesque ▴ 20

@mitch-levesque-2102


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070409/71d34a3d/attachment.pl


updated 18.8 years ago by Weiyin Zhou ▴ 220 • written 18.9 years ago by Mitch Levesque ▴ 20


Gordon Smyth 53k

@gordon-smyth


WEHI, Melbourne, Australia

Dear Mitch,

The data.frame RG$genes must have the same number of rows as the intensity data RG$R, RG$G etc. because it is intended to provide probe annotation information corresponding to each row of the data. Your GAL file doesn't satisfy this requirement, because it has massively more rows than the data.

I am not familiar with the 4 x 44 technology, so I don't know what format the data files are written to. From the information you give, I am guessing that read.maimages() has read the data from only one of the 4 blocks of data, or else that the 4 blocks of the 4x44 format have been read into separate columns of the data. I am guessing that the GAL file has the probes for all four blocks.

Furthermore, some of the spots, perhaps empty probes, have been omitted from your data files. This seems to have been done in an uneven way, because your data doesn't have an even number of rows. Considering this, you can hardly expect to use your data with duplicateCorrelation().

If you have hybridised different RNA samples to the four different blocks (you don't say), then read.maimages() looks correct and your GAL file is incorrect. I'm only guessing because I'm not familiar with the 4 x 44 format. You need to do some trouble-shooting of these issues at your end.

Best wishes
Gordon

On Tue, April 10, 2007 10:07 pm, Mitch Levesque wrote:
> Gordon,
>
> Thanks for the reply. I am not using any particular instruction set, just
> what I have put together from the User Guide.
>
> You were right about the file dimensions, they are different:
>
> > dim(RG)
> [1] 44407 4
> > gal <- readGAL()
> > dim(gal)
> [1] 180880 10
>
> Is it possible to read the duplicate positions directly off of the gal file?
> I tried:
>
> layout <- getLayout(gal, guessdups=TRUE)
>
> and I get the following:
>
> $ngrid.r
> [1] 1
>
> $ngrid.c
> [1] 4
>
> $nspot.r
> [1] 170
>
> $nspot.c
> [1] 266
>
> $ndups
> [1] 8
>
> $spacing
> [1] NA
>
> attr(,"class")
> [1] "PrintLayout"
>
> I haven't tried without the normexp, but I will test it. Thanks again.
>
> Mitch
>
> -----Original Message-----
> From: Gordon Smyth [mailto:smyth at wehi.EDU.AU]
> Sent: Tuesday, April 10, 2007 1:03 PM
> To: Mitch Levesque
> Cc: bioconductor at stat.math.ethz.ch
> Subject: [BioC] duplicate correlation on Agilent 4x44 arrays
>
> Dear Mitch,
>
> You don't say what instructions you are trying to follow here. I
> think you may be trying to use code which was intended for other data
> sets. I suspect that there may be more than one problem.
>
> Firstly, why do you need to use readGAL()? This is only needed with
> SPOT data. Your RG object from read.maimages() will already contain
> annotation information from the Agilent output files. Look at
>
> names(RG$genes)
>
> to see what you have.
>
> Secondly, does your GAL file match your data files? Type
>
> dim(RG)
>
> and
>
> gal <- readGAL()
> dim(gal)
>
> Do the row numbers agree? I am guessing they may have different
> numbers of rows.
>
> BTW, do you need to use "normexp"? I've found the AgilentFE
> background estimator is already pretty good, and doesn't produce
> negative intensities anyway.
>
> Best wishes
> Gordon
>
>> Date: Mon, 9 Apr 2007 12:21:57 +0200
>> From: "Mitch Levesque" <mitch.levesque at tuebingen.mpg.de>
>> Subject: [BioC] duplicate correlation on Agilent 4x44 arrays
>> To: <bioconductor at stat.math.ethz.ch>
>>
>> Hi Bioconductors,
>>
>> I am using R 2.4.1 and limma to analyze the new Agilent 4x44 array design
>> and am having trouble with the duplicate correlation function using the
>> following script:
>>
>> library(limma)
>> targets <- readTargets("Targets.txt")
>> RG <- read.maimages(targets$FileName, source="agilent")
>> RG$genes <- readGAL()
>> RG$printer <- getLayout(RG$genes)
>> RG <- backgroundCorrect(RG, method="normexp", offset=50)
>> MA <- normalizeWithinArrays(RG, method="loess")
>> MA <- MA[order(RG$genes[,"ID"]),]
>>
>> I get the following error:
>>
>> Error in `[.MAList`(MA, order(RG$genes[, "ID"]), ) :
>>   subscript out of bounds
>>
>> I would like to treat the duplicate probes on each array as a technical
>> replicate, but since the spacing is not consistent for each gene, I must
>> first order the list by reference number. Are there any suggestions about
>> how I may do this?
>>
>> Mitch
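The two checks described in this reply (annotation rows must match data rows, and every probe must occur the same number of times before a fixed ndups makes sense) can be tried on toy data first. The following is only a sketch in base R: `intensities` and `annot` are hypothetical stand-ins for RG$R and RG$genes, and the probe names are made up.

```r
# Hypothetical toy data: 'intensities' stands in for RG$R (rows = spots,
# columns = arrays) and 'annot' for RG$genes. Probe names are invented.
intensities <- matrix(rnorm(7 * 2), nrow = 7, ncol = 2)
annot <- data.frame(
  ProbeName = c("A_1", "A_2", "A_1", "A_3", "A_2", "A_4", "A_1"),
  stringsAsFactors = FALSE
)

# Check 1: the annotation needs exactly one row per row of intensity data.
rows_match <- nrow(annot) == nrow(intensities)

# Check 2: count how often each probe occurs on the array.
reps <- table(annot$ProbeName)

# Distribution of replicate counts (how many probes occur 1x, 2x, 3x, ...).
rep_counts <- table(reps)

# Probes that occur exactly twice; only these fit an ndups = 2 analysis.
balanced <- names(reps)[reps == 2]

print(rows_match)
print(rep_counts)
print(balanced)
```

On real data the same `table()` call on the probe identifier column would show whether the replication is even. An odd total row count, as with the 44407 rows reported above, already rules out every spot being one of a pair.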

written 18.8 years ago by Gordon Smyth 53k


Weiyin Zhou ▴ 220

@weiyin-zhou-1970


Hi Mitch,

Try reading Agilent's ProcessedSignal columns into your R session. rProcessedSignal and gProcessedSignal are the spatially detrended, global-background-adjusted, lowess dye-normalized signals, with no negative values. For example,

> RG <- read.maimages(targets$FileName, columns=list(R="rProcessedSignal", G="gProcessedSignal"), annotation=c("ProbeUID","ControlType","ProbeName","GeneName","SystematicName"))

Because ProcessedSignal is already corrected for intensity dye-bias, you don't need to do any within-array normalization. You can verify this with:

> plotMA(RG)

You may do between-array quantile normalization:

> MA <- normalizeBetweenArrays(RG, method="quantile")

If you have duplicated spots for each gene on each array, you can re-order them first and then estimate the correlation between them:

> MA$genes$Status <- controlStatus(spottypes, MA)
> i <- MA$genes$Status == "gene"  # MA[i,] will only contain probes defined as "gene", no control probes
> MA2 <- MA[i,][order(MA[i,]$genes$ProbeName),]
> corfit <- duplicateCorrelation(MA2, design, ndups=2)
...

Hope this helps,

Best wishes,
Weiyin Zhou
Senior Research Associate
ExonHit Therapeutics, Inc.
217 Perry Parkway, Building # 5
Gaithersburg, MD 20877
email: Weiyin.zhou at exonhit-usa.com
phone: 240.404.0184
fax: 240.683.7060
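The re-ordering step above only gives a valid ndups=2 layout if every retained probe occurs exactly twice, which Gordon's reply suggests may not hold for these files. Below is a sketch in base R of one way to filter and order so that replicate pairs end up adjacent. The names `probe`, `status` and `M` are hypothetical stand-ins for MA$genes$ProbeName, MA$genes$Status and MA$M, and dropping probes without exactly two copies is an assumption, not something either reply prescribes.

```r
set.seed(1)

# Hypothetical toy data standing in for MA$genes$ProbeName,
# MA$genes$Status and MA$M (rows = spots, columns = arrays).
probe  <- c("P3", "P1", "ctrl", "P2", "P1", "P3", "P2", "P4")
status <- ifelse(probe == "ctrl", "control", "gene")
M <- matrix(rnorm(length(probe) * 3), ncol = 3,
            dimnames = list(probe, paste0("array", 1:3)))

# Number of copies of each spot's probe on the array.
n_copies <- as.vector(table(probe)[probe])

# Assumption: keep only non-control probes that occur exactly twice.
keep <- status == "gene" & n_copies == 2

# Order the kept rows by probe name so the two copies sit next to each other.
idx <- which(keep)
idx <- idx[order(probe[idx])]
M2  <- M[idx, , drop = FALSE]

print(rownames(M2))
```

With the rows arranged this way, the pairs match what limma's duplicateCorrelation() expects for ndups=2 with spacing=1. The same index vector would have to be applied to the whole MAList (for example `MA[idx, ]`) so that the data and MA$genes stay in step.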

written 18.8 years ago by Weiyin Zhou ▴ 220
