Understanding the output of the exprs() function
neverstop • 0
@neverstop-7227

I'm trying to understand what the exprs() function does to an AffyBatch object. I expected to get a matrix with a row for every gene/probeset and a column for every sample.

 

> library(affydata) 
> data(Dilution)
> Dilution
AffyBatch object
size of arrays=640x640 features (35221 kb)
cdf=HG_U95Av2 (12625 affyids)
number of samples=4
number of genes=12625
annotation=hgu95av2
notes=
> head(exprs(Dilution))
     20A   20B    10A   10B
1  149.0 112.0  129.0  60.0
2 1153.5 575.3 1262.3 564.8
3  142.0  98.0  128.0  56.0
4 1051.0 597.0 1269.0 570.0
5   91.0  77.0   90.0  46.0
6  136.0 133.0  117.0  62.0
> dim(exprs(Dilution))
[1] 409600      4

As you can see, I get a matrix of 409600 rows, instead of 12625 (the number of genes).

I don't understand what the rows of this matrix represent. Do they represent probe cells?

Thank you.

EDIT: I've just noticed that 640*640=409600. Anyway, I still don't understand what these numbers represent.

affymetrix microarrays • 1.7k views

Someone asked what looks like the same question on Stack Overflow: http://stackoverflow.com/questions/6410601/extract-raw-data-from-affybatch-object. I think these are probes that map back to the genes.


What do the rows of the matrix look like? Could you give an example?


I've edited the main post

@james-w-macdonald-5106

Affymetrix arrays all have multiple probes (25-mers) that interrogate a single transcript. The array you are looking at uses primarily 16 probes per probeset (i.e., there are 16 25-mers that are intended to interrogate a particular transcript). Each 25-mer is called a probe, and the set of probes for a given transcript is a probeset.
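
To see the probe/probeset relationship in the data itself, here is a minimal sketch. It assumes the affy, affydata and hgu95av2cdf packages are installed; the variable names (ids, pm_first, idx) are only illustrative. It pulls out the perfect-match (PM) intensities for one probeset and shows that they are simply a subset of the rows of exprs(Dilution):

```r
library(affy)
library(affydata)
data(Dilution)

# Probeset IDs defined by the CDF for this array type
ids <- geneNames(Dilution)
length(ids)

# PM intensities for the first probeset:
# one row per probe in the set, one column per sample
pm_first <- pm(Dilution, ids[1])
dim(pm_first)

# Row indices into exprs(Dilution) for the same PM probes
idx <- indexProbes(Dilution, which = "pm", genenames = ids[1])[[1]]

# The same intensities, taken directly from the full cell-level matrix
exprs(Dilution)[idx, ]
```

So each row of exprs(Dilution) is one cell on the array, and the CDF records which cells belong to which probeset.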

The numbers represent the partially processed intensity values for each probe, and aren't particularly useful by themselves. If you run rma() on the AffyBatch, you will get background-corrected, normalized, summarized values for each probeset, which you can then use to compare gene expression between samples.
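
As a rough illustration of that step (again assuming affy, affydata and hgu95av2cdf are installed; eset is just an illustrative name), the sketch below runs rma() on the Dilution data and compares the dimensions before and after summarization:

```r
library(affy)
library(affydata)
data(Dilution)

# Probe-level data: one row per cell on the 640 x 640 array
dim(exprs(Dilution))

# Background-correct, normalize and summarize to probeset level
eset <- rma(Dilution)

# Probeset-level data: one row per probeset, one column per sample
dim(exprs(eset))
head(featureNames(eset))
```

The matrix returned by exprs(eset) is the probeset-by-sample matrix the original question was expecting, with values on the log2 scale.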

There are any number of articles, white papers, vignettes, etc. that describe the Affy platform and how to analyze the data. One useful paper you could read is the citation for the affy package:

 Gautier, L., Cope, L., Bolstad, B. M., and Irizarry, R. A. 2004.
  affy---analysis of Affymetrix GeneChip data at the probe level.
  Bioinformatics 20, 3 (Feb. 2004), 307-315.


Thank you. May I ask another question? Do all probesets on an Affymetrix chip have the same number of probes? I ask because I divided 409600 by 32 (16 perfect-match probes plus 16 mismatch probes) and got 12800, while the number of genes is 12625.


No. As I mentioned, most are 16, but not all.

> table(sapply(as.list(hgu95av2cdf), nrow))

    6     7     8     9    10    11    12    13    14    15    16    20    69
    8     3     3     4     1     4    11    53    45    39 12387    66     1
> sum(table(sapply(as.list(hgu95av2cdf), nrow)))
[1] 12625
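
The table above also accounts for most of the 409600 rows. Here is a rough check, assuming each row of a CDF entry is one PM/MM pair (so each pair occupies two cells on the array). The counts are copied from the output above and the variable names are only illustrative:

```r
# Probeset sizes (rows per CDF entry) and how many probesets have each size
sizes  <- c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 69)
counts <- c(8, 3, 3, 4, 1, 4, 11, 53, 45, 39, 12387, 66, 1)

n_probesets <- sum(counts)          # 12625 probesets
n_pairs     <- sum(sizes * counts)  # 201800 PM/MM pairs
n_pm_mm     <- 2 * n_pairs          # 403600 cells holding a PM or MM probe
n_cells     <- 640 * 640            # 409600 cells on the array
n_other     <- n_cells - n_pm_mm    # 6000 cells not in any probeset

c(probesets = n_probesets, pairs = n_pairs,
  pm_mm_cells = n_pm_mm, all_cells = n_cells, other_cells = n_other)
```

The 6000 cells left over are not assigned to any probeset by the CDF; presumably they are control and QC features.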

Thank you a lot!
