converting Affy indices to x,y coordinates
Todd Allen ▴ 20
@todd-allen-4484
Last seen 10.2 years ago
Hello all,

I have been reading the documentation of a package called "affxparser." The documentation describes the formulas needed to seamlessly convert between Affymetrix probe indices and the corresponding (x,y) coordinates of individual probes. Copying from the package documentation, the following information is most relevant:

1. index = K * y + x + 1, where K is the number of columns on the chip
2. y = floor((index - 1) / K)
3. x = (index - 1) - K * y

In my own work, I am processing an HGU133 Plus 2 CDF file. The array dimensions are (1164, 1164), and if I take the index of a specific probe listed as 1354890, I calculate the coordinates as x = 1157 and y = 1163 using the formulas above.

The (x,y) coordinate reported in Affy's own CDF file for this probe is actually x = 1158 (not 1157) and y = 1163.

I am struggling to understand this discrepancy between the affxparser documentation and the verbatim output from Affy's own CDF file. Has anyone run into this situation before? Do you see any obvious problem or explanation for what is happening?

Thank you!
Todd A
genesplicer28 at yahoo.com
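The calculation described in the question can be reproduced with a short sketch. This is plain Python rather than affxparser code, and the function names `xy_to_cell_index` and `cell_index_to_xy` are hypothetical; it only assumes the three formulas quoted above, with a one-based index and zero-based (x,y).

```python
def xy_to_cell_index(x: int, y: int, ncols: int) -> int:
    """One-based index from zero-based (x, y): index = K * y + x + 1."""
    return ncols * y + x + 1


def cell_index_to_xy(index: int, ncols: int) -> tuple:
    """Zero-based (x, y) from a one-based index, per formulas 2 and 3."""
    y = (index - 1) // ncols        # y = floor((index - 1) / K)
    x = (index - 1) - ncols * y     # x = (index - 1) - K * y
    return x, y


K = 1164  # number of columns on the chip, as stated in the question
print(cell_index_to_xy(1354890, K))      # (1157, 1163)
print(xy_to_cell_index(1157, 1163, K))   # 1354890
```

Treating 1354890 as a one-based index therefore gives x = 1157, which is exactly the off-by-one result reported in the question.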
cdf probe convert • 1.8k views
@kasper-daniel-hansen-2979
Last seen 17 months ago
United States
On Mon, Feb 14, 2011 at 11:18 AM, Todd Allen <genesplicer28 at yahoo.com> wrote:
> In my own work, I am processing a HGU133Plus 2 CDF file. The array dimensions are (1164, 1164) and if I take the index of a specific probe listed as 1354890, I

What exactly do you mean by "I take the index of a specific probe listed as 1354890"? Listed where? Where do you get this number, and how do you know which line in the CDF file corresponds to this probe?

Kasper
@kasper-daniel-hansen-2979
Last seen 17 months ago
United States
Todd,

This may be a bit hard to explain. There are essentially two index numbers: the one used by Bioconductor/affy/affxparser etc. and the one stored in the CDF file. The one stored in the CDF file (which you will never see used in any Bioconductor documentation) is zero-based, whereas the one "we" use is one-based.

Why the discrepancy? Well, I cannot speak for Affymetrix (but I guess this is caused by C using zero-based indexes), but in Bioconductor we use one-based indexing because if we read an entire CEL file into a vector we want to be able to do vector[INDEX], and indexing is one-based in R.

It is pretty clear that the affxparser documentation is a bit unclear here. If you are really trying to understand the internals, you will, aside from reading the affxparser docs, also have to do a fair amount of experimentation and reading of the Affymetrix file format specs.

Kasper

On Mon, Feb 14, 2011 at 12:52 PM, Todd Allen <genesplicer28 at yahoo.com> wrote:
> Kasper,
>
> Let me clarify. I have opened the HGU133 Plus 2 CDF file inside Microsoft Notepad, and I can visually see lists of data underneath header information. I randomly chose the value 1354890, which I am confident is an authentic Affymetrix index for a single, specific Affymetrix probe on the chip because of the descriptive header information that is present.
>
> Assuming this value is an authentic index, I was hoping to use the formulas in the affxparser documentation to manually calculate the x and y coordinates of the probe on the Affy chip. As mentioned below, the y coordinate is coming out correctly, but the x coordinate is off by 1.
>
> So, I am trying to understand whether the problem is something I am doing wrong, or whether the documented formulas in affxparser are somehow off.
>
> Todd
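To illustrate the two conventions described above, here is a minimal Python sketch (not affxparser code). It assumes that the value 1354890 read from the CDF file is zero-based, as suggested in this reply; the variable names are made up for the example.

```python
K = 1164             # columns on the chip, from the original question
cdf_index = 1354890  # value read from the CDF file, assumed zero-based

# Zero-based convention: index = K * y + x, so divmod recovers (y, x).
y, x = divmod(cdf_index, K)
print(x, y)  # 1158 1163

# One-based convention (R-style): the same cell is numbered one higher,
# and the documented formulas then give the same coordinates.
cell_index = cdf_index + 1
assert ((cell_index - 1) % K, (cell_index - 1) // K) == (x, y)
```

Under that assumption the coordinates come out as x = 1158 and y = 1163, matching what the CDF file itself lists for the probe.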
@mounts-william-4485
Last seen 10.2 years ago
Todd,

It would appear that there is an error in affxparser. Testing a number of CDF files, it appears that index = K * y + x.

Bill
On Mon, Feb 14, 2011 at 10:24 AM, Mounts, William <bill.mounts at pfizer.com> wrote:
> Todd,
>
> It would appear that there is an error in affyxparser. Testing a number of cdf files, it appears that index = K * y + x.

I doubt that. Could you please provide complete examples illustrating the problem? Unless proven wrong, I stand firm on the claim that both the implementation and the documentation are correct. As Kasper pointed out, it may be that the documentation is confusing or ambiguous, but that is not to say it's wrong. I am happy to take suggestions on how to improve the documentation.

CLARIFICATIONS:

1. The spatial (x,y) cell coordinates are zero-based [1]. This is at least the case if you access them via the Affymetrix Fusion SDK, that is, via affxparser. I cannot claim that all CDF files in history have had zero-based (x,y) coordinates, but it does not matter because through the Fusion SDK they are returned as such. (Anecdotal evidence: browsing through several of my ASCII and binary CDFs, the (x,y) coordinates are indeed zero-based.)

2. A CDF file references the cells (probes) by their (x,y) coordinates only [2].

3. It is more convenient to access cells by their linear indices, which is why they are provided.

4. BTW, note also the last comment on that help page [1]: if you use the affxparser methods, you don't have to worry about (x,y) indices; everything is by default done using cell (probe) indices.

5. In R it is more convenient to use one-based indices instead of zero-based indices. This is taken care of by affxparser.

6. The affxparser documentation [1] clearly says that spatial (x,y) cell coordinates are zero-based and that the linear cell indices are one-based.

7. Do not confuse (Bioconductor) CDF annotation packages/environments with (Affymetrix) CDF *files*; affxparser deals with the latter only.

I think Clarification (4) is one of the most important ones. If you stick with affxparser, you are given well-defined, self-contained and consistent access to the content of CEL and CDF files (and some other Affymetrix file types too).

REFERENCES:

[1] help("2. Cell coordinates and cell indices", package="affxparser")

[2] Section 'Affymetrix CDF Data File Format' of 'File Formats Documentation', Affymetrix, October 2009 (http://www.affymetrix.com/partners_programs/programs/developer/fusion/index.affx?terms=no)

/Henrik
(wrote most of [1])
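As a rough illustration of clarifications 1, 3 and 6, the sketch below enumerates a tiny made-up 4 x 3 chip in Python (not affxparser code). It assumes only the documented formulas and shows that zero-based (x,y) coordinates map onto the one-based linear indices 1..12 without gaps, and that the mapping can be inverted.

```python
ncols, nrows = 4, 3  # tiny hypothetical chip, for illustration only

# One-based linear index for every zero-based (x, y), row by row.
indices = [ncols * y + x + 1 for y in range(nrows) for x in range(ncols)]
print(indices)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Inverting the mapping recovers the original coordinates.
recovered = [((i - 1) % ncols, (i - 1) // ncols) for i in indices]
expected = [(x, y) for y in range(nrows) for x in range(ncols)]
assert recovered == expected
```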
From the Affymetrix documentation, the following are available for each cell (probe) in the CDF file.

Cell information, repeated for each cell in the block:

- Atom number - integer
- X coordinate - unsigned short
- Y coordinate - unsigned short
- Index position (relative to sequence for CustomSeq, Genotyping, Copy Number, Polymorphic Marker, and Multichannel Marker units; for Expression units this value is the atom number) - integer
- Base of probe at substitution position - char
- Base of target at interrogation position - char
- Length of probe sequence - unsigned short (only available in version 2 and 3)
- Physical grouping of probe - unsigned short (only available in version 2 and 3)

Index position is provided, and examination of various CDF files shows that index = K*y + x.

Below, in point 5, you mention that "In R it is more convenient to use one-based indices instead of zero-based indices. This is taken care of by affxparser." Is this where the 1 comes from in the implementation, in order to move the index values from 0-based to 1-based?
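One way to examine CDF files as described here is to compare each cell's stored index with K*y + x. The Python sketch below is only an illustration: `CdfCell` and `index_offset` are hypothetical names, the record holds just three of the listed fields, and the single example uses the values quoted in this thread rather than data parsed from a file.

```python
from typing import NamedTuple


class CdfCell(NamedTuple):
    """Hypothetical container for three of the per-cell values."""
    x: int      # X coordinate (zero-based)
    y: int      # Y coordinate (zero-based)
    index: int  # per-cell "index" value as it appears in the file


def index_offset(cell: CdfCell, ncols: int) -> int:
    """Return index - (K * y + x): 0 means zero-based, 1 means one-based."""
    return cell.index - (ncols * cell.y + cell.x)


# Values reported in this thread for one probe on a 1164-column chip.
cell = CdfCell(x=1158, y=1163, index=1354890)
print(index_offset(cell, ncols=1164))  # 0
```

An offset of 0 for every cell would be consistent with index = K*y + x, while an offset of 1 would correspond to the one-based formula in the affxparser documentation.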
Hi.

On Tue, Feb 15, 2011 at 6:21 PM, Mounts, William <bill.mounts at pfizer.com> wrote:
> Index position is provided and examination of various cdf files shows that index = K*y + x.

Thanks for pointing this out. You are correct that CDF files (only) also contain an "index" field. You are also correct that this redundant CDF "index" field seems to be zero-based (at least in the ASCII CDF files I've checked). I've checked the code, and affxparser completely ignores this field (because it is redundant) and operates only via the (x,y) coordinates. Indeed, none of the methods in affxparser for reading CDF files allows you to read the "index" values.

Since it is tedious to address cells by spatial (x,y) coordinates, linear indices are used instead. The convention in affxparser is to use one-based indices, which we call "cell indices" as described in [1]. All affxparser methods reading CDF files return the one-based "cell indices" as calculated from the (x,y) coordinates (never the above internal CDF "index" field).

FYI, this made me go back to old email communication I had with other affxparser authors back in 2006. I had forgotten, but we actually discussed the above then and eventually decided that the convention should be one-based. Early versions of affxparser did indeed use zero-based indices (still calculated from (x,y) though). Using zero-based indices would be much(!) more error prone in R. From affxparser's NEWS file:

    Version: 1.3.2 [2006-03-28]
    o All cell and unit indices are now starting from one and not
      from zero. This change requires that all code that have
      been using a previous version of this package have to be
      updated!

> Below, in point 5, you mention that "In R it is more convenient to use one-based indices instead of zero-based indices. This is taken care of by affxparser." Is this where the 1 comes from in the implementation in order to move the index values from 0-based to 1-based?

Correct.

In order to improve the affxparser documentation, I have added the following section to the end of [1]:

    \section{Note on the zero-based "index" field of Affymetrix CDF files}{
      An Affymetrix CDF file provides information on which cells should be
      grouped together. To identify these groups of cells, the cells
      are specified by their (x,y) coordinates, which are stored as
      zero-based coordinates in the CDF file.

      All methods of the \pkg{affxparser} package make use of these
      (x,y) coordinates, and some methods make it possible to read
      them as well. However, it is much more common that the methods
      return cell indices \emph{calculated} from the (x,y) coordinates
      as explained above.

      In order to conveniently work with cell indices in \R, the
      convention in \emph{affxparser} is to use \emph{one-based}
      indices.
      Hence the addition (and subtraction) of 1:s in the above equations.
      This is all taken care of by \pkg{affxparser}.

      Note that, in addition to (x,y) coordinates, a CDF file also contains
      a zero-based "index" for each cell. This "index" is redundant to
      the (x,y) coordinate and can be calculated analogously to the
      above \emph{cell index} while leaving out the addition (subtraction)
      of 1:s.
      Importantly, since this "index" is redundant (and exists only in
      CDF files), we have decided to treat this field as an internal field.
      Methods of \pkg{affxparser} do neither provide access to nor make
      use of this internal field.
    }

Note that the other paragraphs on this help page should not need to be updated. Note that nowhere else on this page are we talking about the content of a CDF.

I have also, where applicable, made it explicit in the help pages of methods reading CDF files that the "cell indices" are one-based. To those help pages I have also added a short section:

    \section{Cell indices are one-based}{
      Note that in \pkg{affxparser} all \emph{cell indices} are by
      convention \emph{one-based}, which is more convenient to work
      with in \R. For more details on one-based indices, see
      \code{\link{2. Cell coordinates and cell indices}}.
    }

I hope this will clarify things. Any further feedback is appreciated.

Thanks for your help

Henrik
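The relationship between the internal CDF "index" field and the one-based cell index can be sketched as follows. This is illustrative Python, not part of affxparser; the function names are hypothetical, and it assumes the CDF "index" field is zero-based as observed in this thread.

```python
def cdf_index_to_cell_index(cdf_index: int) -> int:
    """Zero-based CDF "index" value -> one-based cell index."""
    return cdf_index + 1


def cell_index_to_xy(cell_index: int, ncols: int, nrows: int) -> tuple:
    """One-based cell index -> zero-based (x, y), with a range check."""
    if not 1 <= cell_index <= ncols * nrows:
        raise ValueError(f"cell index {cell_index} outside 1..{ncols * nrows}")
    y, x = divmod(cell_index - 1, ncols)
    return x, y


K = R = 1164  # chip dimensions from the original question
cell_index = cdf_index_to_cell_index(1354890)
print(cell_index)                          # 1354891
print(cell_index_to_xy(cell_index, K, R))  # (1158, 1163)
```

With the extra 1 added first, the documented formulas return the (x,y) pair that the CDF file lists for the probe in the original question.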
Hi everyone, Thank you for an "enlightening" conversation. Let me describe my own motivation for asking the original question, and that may reveal the cause of confusion. I am currently writing a tutorial on how to process Affymetrix data using Mathematica. Part of my tutorial covers Affy CDF files, but discovering the conversion between Affy x,y coordinates and indices in Affy CDF files has been rather cumbersome. When I came across the excellent affxparser package, I thought I had discovered what I needed, only to then be confused by the zero or one-based index "issue". Sorry for the "stress", but I do believe there is real value in describing how Affymetrix handles the data verses how an independent package like affxparser handles the data. As a teacher myself, I like to error on the side of "explain too much." Thank you to all and I'm sorry if I caused undue concern. Todd --- On Wed, 2/16/11, Henrik Bengtsson <hb at="" biostat.ucsf.edu=""> wrote: > From: Henrik Bengtsson <hb at="" biostat.ucsf.edu=""> > Subject: Re: [BioC] converting Affy indices to x,y coordinates > To: "Mounts, William" <bill.mounts at="" pfizer.com=""> > Cc: "Todd Allen" <genesplicer28 at="" yahoo.com="">, "bioconductor" <bioconductor at="" r-project.org=""> > Date: Wednesday, February 16, 2011, 2:57 AM > Hi. > > On Tue, Feb 15, 2011 at 6:21 PM, Mounts, William <bill.mounts at="" pfizer.com=""> > wrote: > > From the Affymetrix documentation, the following are > available for each cell (probe) in the cdf file. 
> > > > Cell information, repeated for each cell in the > block: > > > > Atom number - integer > > X coordinate - unsigned short > > Y coordinate - unsigned short > > Index position (relative to sequence for CustomSeq, > Genotyping, Copy Number, Polymorphic Marker, and > Multichannel Marker units, for Expression units this value > is the atom number) - integer > > Base of probe at substitution position - char > > Base of target at interrogation position - char > > Length of probe sequence - unsigned short (only > available in version 2 and 3) > > Physical grouping of probe - unsigned short (only > available in version 2 and 3) > > > > Index position is provided and examination of various > cdf files shows that index = K*y + x. > > Thanks for pointing this out.? You are correct that > CDF files (only) > also contain and "index" field.? You are also correct > that this > redundant CDF "index" field seems to be zero-based (at > least the ASCII > CDF files I've checked).? I've check the code, and it > is the case that > affxparser completely ignores this (because it is > redundant) and > operates only via the (x,y) coordinates.? Indeed, none > of the methods > in affxparser for reading CDF files allows you to read the > "index" > values. > > Since it is tedious to address cells by spatial (x,y) > coordinates, > linear indices are used instead. The convention in > affxparser is to > use one-based indices, which we call "cell indices" as > described in > [1].? All affxparser methods reading CDF files returns > the one-based > "cell indices" as calculated from the (x,y) coordinates > (never the > above internal CDF "index" field). > > FYI, this made me go back to old email communication I had > with other > affxparser authors back in 2006.? I forgot, but we > then actually > discussed the above and eventually decided that the > convention should > be one-based.? Early versions of affxparser did indeed > use zero-based > indices (still calculated from (x,y) though).? 
Using > zero-based > indices would be much(!) more error prone in R.? From > affxparser's > NEWS file: > > Version: 1.3.2 [2006-03-28] > o All cell and unit indices are now starting from one and > not > ? from zero.? This change requires that all code > that have > ? been using a previous version of this package have > to be > ? updated! > > > Below, in point 5, you mention that "In R it is more > convenient to use one-based indices instead of zero-based > indices. ?This is taken care of by affxparser." ?Is this > where the 1 comes from in the implementation in order to > move the index values from 0-based to 1-based? > > Correct. > > In order to improve the affxparser documentation, I have > added the > following section to the end of [1]: > > \section{Note on the zero-based "index" field of > Affymetrix CDF files}{ > ???An Affymetrix CDF file provides > information on which cells should be > ???grouped together.? To identify these > groups of cells, the cells > ???are specified by their (x,y) coordinates, > which are stored as > ???zero-based coordinates in the CDF file. > > ???All methods of the \pkg{affxparser} > package make use of these > ???(x,y) coordinates, and some methods makes > it possible to read > ???them as well.? However, it is much > more common that the methods > ???return cell indices \emph{calculated} > from the (x,y) coordinates > ???as explained above. > > ???In order to conveniently work with cell > indices in \R, the > ???convention in \emph{affxparser} is to use > \emph{one-based} > ???indices. > ???Hence the addition (and subtraction) of > 1:s in the above equations. > ???This is all taken care of by > \pkg{affxparser}. > > ???Note that, in addition to (x,y) > coordinates, a CDF file also contains > ???a one-based "index" for each cell.? > This "index" is redundant to > ???the (x,y) coordinate and can be > calculated analogously to the > ???above \emph{cell index} while leaving out > the addition (subtration) > ???of 1:s. 
> ???Importantly, since this "index" is > redundant (and exists only in > ???CDF files), we have decided to treat this > field as an internal field. > ???Methods of \pkg{affxparser} do neither > provide access to nor make > ???use of this internal field. > } > > Note that the other paragraphs on this help page should not > need to be > updated.? Note that nowhere else in this page are we > talking about the > content of a CDF. > > I have also, where applicable, made it explicit in the help > pages of > methods reading CDF files that the "cell indices" are > one-based.? To > those help pages I have also added a short section: > > \section{Cell indices are one-based}{ > ???Note that in \pkg{affxparser} all > \emph{cell indices} are by > ???convention \emph{one-based}, which is > more convenient to work > ???with in \R.? For more details on > one-based indices, see > ???\code{\link{2. Cell coordinates and cell > indices}}. > } > > I hope this will clarify things.? Any further feedback > is appreciated. > > > Thanks for you help > > Henrik > > > > > > On Mon, Feb 14, 2011 at 10:24 AM, Mounts, William > <bill.mounts at="" pfizer.com=""> > wrote: > >> Todd, > >> > >> It would appear that there is an error in > affyxparser. ?Testing a > >> number of cdf files, it appears that index = K * y > + x. > > > > I doubt that. ?Could you please provide complete > examples illustrating the problem? ?Unless proven wrong, I > stand firm on the claim that both the implementation and > documentation to be correct. ?As Kasper pointed out, it may > be that the documentation is confusing or ambiguous, but > that is not to say it's wrong. ?I am happy to take > suggestions on how to improve the documentation. > > > > > > CLARIFICATIONS: > > > > 1. The spatial (x,y) cell coordinates are zero-based > [1]. ?This is at least the case if you access them via > Affymetrix Fusion SDK, that is, > > via affxparser. ? 
> > I cannot claim that all CDF files in history have had zero-based
> > (x,y) coordinates, but it does not matter because through the
> > Fusion SDK they are returned as such. (Anecdotal evidence:
> > Browsing through several of my (ASCII and binary) CDFs, they are
> > indeed zero-based (x,y):s.)
> >
> > 2. A CDF file references the cells (probes) by their (x,y)
> > coordinates only [2].
> >
> > 3. It is more convenient to access cells by their linear indices,
> > which is why they are provided.
> >
> > 4. BTW, note also the last comment on that help page [1]: If you
> > use the affxparser methods, you don't have to worry about (x,y)
> > indices; everything is by default done using cell (probe) indices.
> >
> > 5. In R it is more convenient to use one-based indices instead of
> > zero-based indices. This is taken care of by affxparser.
> >
> > 6. The affxparser documentation [1] clearly says that spatial (x,y)
> > cell coordinates are zero-based indices and the linear cell indices
> > are one-based.
> >
> > 7. Do not confuse (Bioconductor) CDF annotation
> > packages/environments with (Affymetrix) CDF *files*; affxparser
> > deals with the latter only.
> >
> > I think Clarification (4) is one of the most important ones. If you
> > stick with affxparser, you are given well-defined, self-contained
> > and consistent access to the content of CEL and CDF files (and some
> > other Affymetrix file types too).
> >
> > REFERENCES:
> > [1] help("2. Cell coordinates and cell indices", package="affxparser")
> >
> > [2] Section 'Affymetrix CDF Data File Format' part of 'File Formats
> > Documentation', Affymetrix, October 2009
> > (http://www.affymetrix.com/partners_programs/programs/developer/fusion/index.affx?terms=no)
> >
> > /Henrik
> > (wrote most of [1])
> >
> >>
> >> Bill
> >>
> >> -----Original Message-----
> >> From: bioconductor-bounces at r-project.org
> >> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Todd Allen
> >> Sent: Monday, February 14, 2011 11:19 AM
> >> To: bioconductor at r-project.org
> >> Subject: [BioC] converting Affy indices to x,y coordinates
> >>
> >> Hello all,
> >>
> >> I have been reading the documentation portion of a package called
> >> "affxparser." In the documentation there is a description of the
> >> formulas needed to seamlessly convert between Affymetrix probe
> >> indices and the corresponding (x,y) coordinates of individual probes.
> >>
> >> Copying from the package documentation, the following information
> >> is most relevant:
> >>
> >> 1. index = K * y + x + 1; where K is the number of columns on the chip
> >> 2. y = floor((index - 1) / K)
> >> 3. x = (index - 1) - K * y
> >>
> >> In my own work, I am processing a HGU133Plus 2 CDF file. The array
> >> dimensions are (1164, 1164) and if I take the index of a specific
> >> probe listed as 1354890, I calculate the coordinates as x = 1157
> >> and y = 1163 using the formulas above.
> >>
> >> The (x,y) coordinate reported from Affy's own CDF file for this
> >> probe is actually x = 1158 (not 1157) and y = 1163.
> >>
> >> I am struggling to understand this discrepancy between the
> >> affxparser documentation and the verbatim output from Affy's own
> >> CDF file. Has anyone run into this situation before? Do you see any
> >> obvious problem or explanation as to what is happening?
> >>
> >> Thank you!
> >> Todd A
> >> genesplicer28 at yahoo.com
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
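
The off-by-one discussed in this thread can be checked numerically. What follows is only a minimal sketch, written in Python rather than R and not taken from affxparser; the function names are hypothetical. It assumes, as the replies above suggest, that the value 1354890 read from the CDF file is the file's zero-based "index" field, and it uses the formulas quoted from the affxparser documentation for the one-based cell index.

```python
# Hypothetical helpers, not part of affxparser or any Affymetrix SDK.
# K is the number of columns on the chip; (x, y) are zero-based.

def xy_to_cell_index(x, y, K):
    """One-based cell index, per the documented formula K * y + x + 1."""
    return K * y + x + 1

def cell_index_to_xy(index, K):
    """Invert the one-based cell index back to zero-based (x, y)."""
    y = (index - 1) // K           # floor((index - 1) / K)
    x = (index - 1) - K * y
    return x, y

def cdf_index_to_xy(cdf_index, K):
    """Assumed convention for the CDF file's own field: zero-based,
    i.e. the same formulas with the +1 / -1 left out."""
    y = cdf_index // K
    x = cdf_index - K * y
    return x, y

K = 1164            # HG-U133 Plus 2 dimensions quoted in the question
value = 1354890     # number read from the CDF file

# Treating the value as a one-based cell index gives x = 1157, y = 1163.
print(cell_index_to_xy(value, K))

# Treating it as the zero-based CDF field gives x = 1158, y = 1163,
# which matches the (x, y) reported in the CDF file.
print(cdf_index_to_xy(value, K))

# The one-based cell index for that same probe is therefore one larger.
print(xy_to_cell_index(1158, 1163, K))
```

Under these assumptions the two conventions differ by exactly one for every cell, which accounts for the x coordinate being off by one while y comes out the same.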