Question

Constructing an AffyBatch object from raw data

m_punisher6 • 0

Hi all.

I have a sample of probe sets from the rat230_2 Affymetrix chip. I have the intensities of the probes in those probe sets and all of the information about them: the position of each probe on the chip (x, y), the names of their probe sets, and their indexes.

I have this information for two samples (rats). How can I convert it to an AffyBatch object for further analysis such as normalization and summarization?

Thanks, all.

Mansoor.

affy intensity probes • 1.2k views

updated 9.3 years ago by James W. MacDonald 68k • written 9.3 years ago by m_punisher6 • 0

score 0 · Answer 1 · 2016-08-22

The short answer is that you don't want to do this. Instead you just want to get the CEL files and read them in normally. Unless you have no way to get the CEL files, there is absolutely no profit to doing this sort of thing by hand. However, if you really can't get the CEL files (really?), here is some info that may be helpful.

When the Affy array is read and processed into a CEL file, the data are put into the CEL file in row-major form. In other words, starting from the top left of the array, the probe data from the first row are read into the CEL file, then the next row, etc. So if you want to build an AffyBatch by hand, that's how you would do so. But do note that there will be some expectations by the affy package of what will be contained in your AffyBatch.
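As a small illustration of what row-major order means in practice, here is a base R sketch on a made-up 3 x 4 grid (the grid and its values are hypothetical, chosen only so the row and column of each cell are easy to read). The thing to watch is that R itself stores matrices column by column, so a plain `as.vector()` does not give the CEL-style ordering.

```r
# Toy 3-row by 4-column "array"; each value encodes (row, column) for readability.
grid <- matrix(c(11, 12, 13, 14,
                 21, 22, 23, 24,
                 31, 32, 33, 34),
               nrow = 3, byrow = TRUE)

# R flattens matrices column by column, so transpose first to get row-major order.
row_major <- as.vector(t(grid))

# For comparison: R's default column-major flattening of the same grid.
col_major <- as.vector(grid)
```

If your intensities are held in a matrix laid out like the chip, the transposed version is the one that matches the ordering described above.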

As noted above, each probe is read into the CEL file in row-major format. But not all of the data are read in. A number of probes, e.g. those used for aligning the scanner, are not actually read in. If you look at the cdf package:

> library(affy)
> library(rat2302cdf)
> z <- as.list(rat2302cdf)
> z[[1]]
          pm     mm
 [1,] 126651 127485
 [2,] 304684 305518
 [3,] 221345 222179
 [4,] 293236 294070
 [5,] 368296 369130
 [6,] 342906 343740
 [7,] 533212 534046
 [8,] 591694 592528
 [9,]  44978  45812
[10,] 256673 257507
[11,] 128378 129212
> zz <- do.call(rbind, z)
> apply(zz, 2, range)
         pm     mm
[1,]    840   1674
[2,] 693547 694381

These are the index positions, and the first one is at position 840! There are also gaps in the middle, so the indices don't run contiguously from 840 to 694381. The indices are converted to (x,y) positions by the indices2xy function:

> indices2xy(840, cdf = "rat2302cdf")
     x y
[1,] 5 1
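The arithmetic behind that result can be checked in base R without the affy package. The sketch below assumes the array is 834 cells wide; that width is not stated anywhere above but is inferred from the constant difference between the mm and pm indices of the first probe set. `index_to_xy` is a hypothetical helper written for illustration, not the affy implementation.

```r
n_col <- 834L  # assumed array width, inferred from the mm - pm difference below

# pm and mm indices of the first probe set, copied from the output above.
pm <- c(126651, 304684, 221345, 293236, 368296, 342906,
        533212, 591694, 44978, 256673, 128378)
mm <- c(127485, 305518, 222179, 294070, 369130, 343740,
        534046, 592528, 45812, 257507, 129212)

# Every mm index sits exactly one array width after its pm index,
# i.e. one row further along in row-major order.
offsets <- mm - pm

# One-based linear index -> zero-based (x, y), with x as column and y as row.
index_to_xy <- function(i, n_col) {
  cbind(x = (i - 1) %% n_col, y = (i - 1) %/% n_col)
}

xy_840 <- index_to_xy(840, n_col)
```

Under that assumption `xy_840` agrees with the indices2xy output shown above, which supports reading x as the column and y as the row, both counted from zero.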

So you would have to convert all your (x,y) positions to indices. And there have been lots of arguments over the years as to whether or not the position for index 840 above is in the 6th position on the second row (the counting is zero based, so the first (x,y) position is actually (0,0)), or in the second position of the sixth row. So you would have to figure that out for yourself. And if you don't have all the probes, you will have to modify the rat2302cdf environment to only contain the probes that you do have.
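For what it is worth, a minimal base R sketch of that conversion step might look like the following. It assumes an 834 x 834 array, zero-based coordinates, and that x is the column and y is the row (the orientation that matches the indices2xy output above; verify it against your own data). `xy_to_index` and the `probes` table are hypothetical, and the intensity values are made up purely for illustration.

```r
n_col <- 834L  # assumed array width (number of columns)
n_row <- 834L  # assumed array height (number of rows)

# Zero-based (x, y) -> one-based linear index, assuming x = column and y = row.
xy_to_index <- function(x, y, n_col) {
  y * n_col + x + 1
}

# Hypothetical probe table for two samples; the intensities are invented.
probes <- data.frame(
  x    = c(5, 6),
  y    = c(1, 1),
  rat1 = c(120.5, 98.0),
  rat2 = c(133.1, 101.7)
)
probes$index <- xy_to_index(probes$x, probes$y, n_col)

# One row per cell on the array, one column per sample; unknown cells stay NA.
intensities <- matrix(NA_real_, nrow = n_row * n_col, ncol = 2,
                      dimnames = list(NULL, c("rat1", "rat2")))
intensities[probes$index, ] <- as.matrix(probes[, c("rat1", "rat2")])
```

The affy package also provides xy2indices for this direction of the conversion. A matrix shaped like `intensities` is the kind of thing that would go into the exprs of an AffyBatch, perhaps via something like `new("AffyBatch", exprs = intensities, cdfName = "Rat230_2", nrow = 834, ncol = 834)`, but that call is an untested assumption here, and the cdf environment would still have to be trimmed to the probes you actually have.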

So like I said, there is absolutely no profit in trying to do this by hand. This will be orders of magnitude more difficult than just reading in the CEL files and using the already existing infrastructure of the affy package.