Question: Question on afy/gcrma probe indexes
15.3 years ago by
Rich Haney20
Rich Haney20 wrote:
I am using gcrma with the HG-U133A dataset. When I ask for the location (index) of the first probe I get:

(1.) Probe = 1007_s_at1
     Index = 129340   [ The probe is at (x,y) = (467,181) ]

As I understand it, the probe position is found using the affy routine 'xy2i'. There, the logic for finding a position from x and y is 0-based for y and 1-based for x. So:

(2.) Index = x + nrows * ( y - 1 ), with nrows = 712 and, as above, x = 467 and y = 181

     Index = 467 + 712 * ( 181 - 1 ) = 128627   (that is, 712 + 1 less than the answer given above, 129340).

So the question is: in affy, is the Index of probes stored with 1-based (not 0-based) y-coordinates, while xy2i assumes 0-based coordinates?

Thanks for your help!

Notes:

(a.) I believe that this is why my background adjustment is then not correct:

    bg.adjust.optical <- function(abatch,minimum=1,verbose=TRUE){
      Index <- unlist(indexProbes(abatch,"both"))

      if(verbose) cat("Adjusting for optical effect")
      for(i in 1:length(abatch)){
        if(verbose) cat(".")
        exprs(abatch)[Index,i] <- exprs(abatch)[Index,i] -
          min(exprs(abatch)[Index,i],na.rm=TRUE) + minimum
      }

(b.) The probe index is created using the following lines of gcrma:

    ##put it in an affybatch
    tmp <- get("xy2i",paste("package:",cdfpackagename,sep=""))
    affinity.info <- new("AffyBatch",cdfName=cdfname)
    pmIndex <- unlist(indexProbes(affinity.info,"pm"))
    mmIndex <- unlist(indexProbes(affinity.info,"mm"))
    subIndex <- match(tmp(p$x,p$y),pmIndex)
modified 15.3 years ago by Park, Richard220 • written 15.3 years ago by Rich Haney20
Answer: Question on afy/gcrma probe indexes
15.3 years ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:
Hi Rich,

Affymetrix counts x- and y-coordinates starting at 0, and so do the probe packages and the functions xy2i and i2xy from the CDF packages. For historical reasons, there is code in the affy package that uses coordinates incremented by 1.

In AffyBatch objects, x- and y-coordinates are not stored at all: the data are stored in a matrix whose columns correspond to different arrays and whose rows correspond to all probes within one array. The x- and y-coordinates can be reconstructed from the row index, e.g. with the function i2xy.

Otherwise, can you please be more specific? Which commands do you use to get (1.), which to get (2.), and what do you mean by "in affy" (which function, or which object)?

Hope that helps,
Wolfgang

Rich Haney wrote:
> [...]

--
-------------------------------------
Wolfgang Huber
Division of Molecular Genome Analysis
German Cancer Research Center
Heidelberg, Germany
Phone: +49 6221 424709
Fax: +49 6221 42524709
Http: www.dkfz.de/abt0840/whuber
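To make the 0-based convention concrete, here is a minimal sketch that applies it to the numbers from the question. It assumes a grid 712 columns wide (the value given in the question) and the formula `y * ncol + x + 1` that is quoted later in this thread for hgu133plus2. The names `xy2i_u133a` and `i2xy_u133a` are hypothetical stand-ins, not the functions shipped in the CDF package.

```r
# Hypothetical stand-ins for the CDF package's xy2i / i2xy, assuming a
# 712-column grid (value taken from the question) and 0-based x and y.
xy2i_u133a <- function(x, y, ncol = 712) y * ncol + x + 1

i2xy_u133a <- function(i, ncol = 712) {
  r <- cbind((i - 1) %% ncol, (i - 1) %/% ncol)
  colnames(r) <- c("x", "y")
  r
}

# 0-based coordinates of probe 1007_s_at1 from the question
idx <- xy2i_u133a(467, 181)
idx                      # 129340, the value reported by indexProbes

# The formula used in the question treats y as 1-based and drops the "+ 1"
idx_question <- 467 + 712 * (181 - 1)
idx - idx_question       # 713, i.e. 712 + 1

# Going back from the row index to (x, y)
xy <- i2xy_u133a(idx)
xy
```

Under these assumptions the two numbers in the question differ only by the choice of counting convention: one full row (712) for y and one position (1) for the 1-based row index.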
Answer: Question on afy/gcrma probe indexes
15.3 years ago by
Park, Richard220
Park, Richard220 wrote:
Hello,

I have tried to access the x and y coordinates using the xy2i() and i2xy() functions, and I would be very cautious about the values you get from them. I tried creating a fake .cel file using these functions and the result was never fully correct.

I eventually had to download a library file from the Affymetrix site that had a full list of the x and y values for each probe set. I am unsure where these files are on the Affymetrix site now, since the site has undergone a significant revision. On average, those functions probably gave me 30-40 percent correct x and y positions. The only way I was able to get a functional fake .cel file was to use the x and y positions given out by Affymetrix.

hth,
richard Park

-----Original Message-----
From: Wolfgang Huber [mailto:w.huber@dkfz-heidelberg.de]
Sent: Wednesday, April 28, 2004 7:22 AM
To: rphaney@bigfoot.com
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] Question on afy/gcrma probe indexes

> [...]
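One way to test a concern like this is a round-trip check over the whole grid. The sketch below is only an illustration: it assumes a 712 x 712 layout and re-implements the two mappings under the hypothetical names `xy2i_sketch` and `i2xy_sketch`, so it says nothing about the actual functions in any particular CDF package version. The same check could be run against the real `xy2i` and `i2xy`.

```r
# Assumed chip layout (712 x 712); adjust for the chip at hand.
ncol_chip <- 712L
nrow_chip <- 712L

# Hypothetical re-implementations, using 0-based x and y.
xy2i_sketch <- function(x, y) y * ncol_chip + x + 1
i2xy_sketch <- function(i) {
  r <- cbind((i - 1) %% ncol_chip, (i - 1) %/% ncol_chip)
  colnames(r) <- c("x", "y")
  r
}

# Every 0-based (x, y) position on the grid, x varying fastest
grid <- expand.grid(x = 0:(ncol_chip - 1L), y = 0:(nrow_chip - 1L))

idx  <- xy2i_sketch(grid$x, grid$y)
back <- i2xy_sketch(idx)

# TRUE if every position survives the round trip unchanged
all_ok <- all(back[, "x"] == grid$x & back[, "y"] == grid$y)

# TRUE if the indices are exactly 1..(ncol * nrow), with no gaps or repeats
covers <- identical(as.integer(idx), seq_len(ncol_chip * nrow_chip))
```

If both flags are TRUE for the real functions too, a mismatch in a generated file is more likely to come from mixing 0-based and 1-based coordinates, or from using the wrong grid width, than from the mappings themselves.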
Thanks very much for the notes. Here is one simple way to reproduce what I see:

    library(gcrma)              # load the library
    H133 = ReadAffy()           # could my H133 data possibly be "old"?
    debug(bg.adjust.optical)    # to be able to step through bg.adjust.optical
    gcrma(H133)                 # runs gcrma
    ...                         # runs until it enters bg.adjust.optical
    n                           # go to next line
    Index[1]                    # ask for data from the first probe

It is entering Index[1] that gives the surprising answer I showed in my first note. "Index[1]" gives data from "indexProbes":

(1.) Probe at: 1007_s_at1
     129340

Probe 1007_s_at1 is at (x,y) = (467,181) according to the CDF file I have. The data from the actual CDF file are as follows:

    [Unit1100_Block1]
    Name=1007_s_at
    ...
    CellHeader=X Y PROBE FEAT QUAL EXPOS POS CBASE PBASE TBASE ATOM INDEX CODONIND CODON REGIONTYPE REGION
    Cell1=467 181 N control 1007_s_at 0 13 G C G 0 129339 -1 -1 99
    Cell2=467 182 N control 1007_s_at 0 13 G G G 0 130051 -1 -1 99
    ...

This CDF file is from https://www.affymetrix.com/support/technical/libraryfilesmain.affx, using the link for the 8th catalog array, that is, http://www.affymetrix.com/Auth/support/downloads/library_files/hgu133_libraryfile.zip.

But, according to xy2i:

(2.) Index = x + nrows * ( y - 1 ), with nrows = 712 and, as above, x = 467 and y = 181

So, Index = 467 + 712 * ( 181 - 1 ) = 128627 (that is, 712 + 1 less than the answer given above, 129340).

Simple answer(?)
---------------

So I think the simple answer is just that "xy2i"-type indexing is not used to get the data found in "indexProbes" and (in bg.adjust.optical) Index. However, somehow(?) xy2i still can work to match up probe positions, as is done in compute.affinities:

    ##put it in an affybatch
    tmp <- get("xy2i",paste("package:",cdfpackagename,sep=""))
    affinity.info <- new("AffyBatch",cdfName=cdfname)
    pmIndex <- unlist(indexProbes(affinity.info,"pm"))
    mmIndex <- unlist(indexProbes(affinity.info,"mm"))
    subIndex <- match(tmp(p$x,p$y),pmIndex)
    tmp.exprs=matrix(NA,nrow=max(cbind(pmIndex,mmIndex)),ncol=1)
    tmp.exprs[pmIndex[subIndex]]=apm
    if(!is.null(amm)){
      tmp.exprs[mmIndex[subIndex]]=amm
    }
    exprs(affinity.info)=tmp.exprs
    return(affinity.info)

It is the "somehow(?) xy2i still can work" that I haven't yet seen the answer to.

Best regards,
Rich

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Wolfgang Huber
Sent: Wednesday, April 28, 2004 5:10 PM
Cc: rphaney@bigfoot.com; bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] Question on afy/gcrma probe indexes

Hi Richard,

can you please provide details? An example of when a call to these functions produces a wrong result? Then, if you are right, we can repair them.

Note that these functions are really simple - e.g. for a hgu133plus2 chip, they are

    > xy2i
    function (x, y)
    {
        y * 1164 + x + 1
    }
    > i2xy
    function (i)
    {
        r = cbind((i - 1)%%1164, (i - 1)%/%1164)
        colnames(r) = c("x", "y")
        return(r)
    }

Best wishes
Wolfgang

Park, Richard wrote:
> [...]
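The CDF excerpt and the quoted xy2i definition can be reconciled numerically. The sketch below assumes that the HG-U133A version of xy2i has the same form as the hgu133plus2 one quoted above, with 712 in place of 1164; that is an assumption, not something confirmed in the thread. `xy2i_sketch` and `pm_index_toy` are hypothetical names, and the values 1000 and 2000 in the toy vector are placeholders.

```r
# Assumed HG-U133A analogue of the quoted hgu133plus2 xy2i (712 columns).
xy2i_sketch <- function(x, y) y * 712 + x + 1

# The two cells from the CDF excerpt: 0-based X, Y and the CDF INDEX column
cells <- data.frame(
  x         = c(467, 467),
  y         = c(181, 182),
  cdf_index = c(129339, 130051)
)

# 1-based row indices under the assumed formula
r_index <- xy2i_sketch(cells$x, cells$y)
r_index                                   # 129340 130052

# The CDF INDEX column is the same quantity counted from 0
index_matches <- all(r_index == cells$cdf_index + 1)

# Toy stand-in for pmIndex: placeholder values plus the reported 129340
pm_index_toy <- c(1000, 129340, 2000)

# Same pattern as match(tmp(p$x, p$y), pmIndex) in compute.affinities
sub_index <- match(xy2i_sketch(467, 181), pm_index_toy)
sub_index                                 # 2
```

Read this way, the 129340 returned by indexProbes and the value from xy2i follow the same convention (0-based x and y, 1-based row index), which would explain why the `match()` call in compute.affinities lines up. The 128627 figure comes from treating y as 1-based and omitting the `+ 1`.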