Question

Understanding the output of the exprs() function

neverstop • 0

I'm trying to understand what the exprs() function returns for an AffyBatch object. I would expect to get a matrix with a row for every gene/probeset and a column for every sample.

> library(affydata) 
> data(Dilution)
> Dilution
AffyBatch object
size of arrays=640x640 features (35221 kb)
cdf=HG_U95Av2 (12625 affyids)
number of samples=4
number of genes=12625
annotation=hgu95av2
notes=
> head(exprs(Dilution))
     20A   20B    10A   10B
1  149.0 112.0  129.0  60.0
2 1153.5 575.3 1262.3 564.8
3  142.0  98.0  128.0  56.0
4 1051.0 597.0 1269.0 570.0
5   91.0  77.0   90.0  46.0
6  136.0 133.0  117.0  62.0
> dim(exprs(Dilution))
[1] 409600      4

As you can see, I get a matrix of 409600 rows, instead of 12625 (the number of genes).

I don't understand what the rows of this matrix represent. Do they represent probe cells?

Thank you.

EDIT: I've just noticed that 640*640=409600. Anyway, I still don't understand what these numbers represent.

affymetrix microarrays • 1.7k views

written 8.0 years ago by neverstop

Someone asked what looks like the same question on Stack Overflow: http://stackoverflow.com/questions/6410601/extract-raw-data-from-affybatch-object. I think these are probes that map back to the genes.

chris86 ▴ 420 • 8.0 years ago

What are the rows of the matrix? Can you give an example?

chris86 ▴ 420 • 8.0 years ago

I've edited the main post

neverstop • 8.0 years ago

score 2 · Accepted Answer · 2016-05-09

James W. MacDonald 65k

Affymetrix arrays all have multiple probes (25-mers) that interrogate a single transcript. The array you are looking at uses primarily 16 probes per probeset; that is, there are usually 16 25-mers intended to interrogate a particular transcript. Each 25-mer is called a probe, and the set of probes for a given transcript is a probeset.

The numbers represent the partially processed intensity values for each probe, and aren't particularly useful by themselves. If you run rma on those values, you will get the background corrected, normalized, summarized values for each probeset, which you can then use to compare gene expression between samples.
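As a rough sketch of the difference between the two levels, assuming the Bioconductor packages affy, affydata and hgu95av2cdf are installed (the block is skipped otherwise), something like the following compares the probe-level matrix with the probeset-level matrix from rma(). The variable names (have_pkgs, probe_dim, pm_values, pm_sets, eset, probeset_dim) are just illustrative.

```r
# Assumption: affy, affydata and hgu95av2cdf are installed from Bioconductor.
pkgs <- c("affy", "affydata", "hgu95av2cdf")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(affy)
  data(Dilution, package = "affydata")

  # Probe level: one row per cell on the array, one column per sample
  probe_dim <- dim(exprs(Dilution))

  # Perfect-match intensities only, plus the probeset each row belongs to
  pm_values <- pm(Dilution)
  pm_sets   <- probeNames(Dilution)

  # Probeset level: rma() background-corrects, normalizes and summarizes
  eset <- rma(Dilution)
  probeset_dim <- dim(exprs(eset))
}
```

Here probe_dim should match the 409600 x 4 matrix in the question, while probeset_dim should have one row per probeset (12625 for this chip), which is the matrix shape the question was expecting.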

There are any number of articles, white papers, vignettes, etc. that describe the Affy platform and how to analyze the data. One useful paper to read is the citation for the affy package:

Gautier, L., Cope, L., Bolstad, B. M., and Irizarry, R. A. 2004.
affy---analysis of Affymetrix GeneChip data at the probe level.
Bioinformatics 20, 3 (Feb. 2004), 307-315.

James W. MacDonald 65k • 8.0 years ago

Thank you. May I ask another question? Do all probesets on an Affymetrix chip have the same number of probes? I tried dividing 409600 by 32 (16 perfect-match probes and 16 mismatch probes) and got 12800, while the number of genes is 12625.

neverstop • 8.0 years ago

No. As I mentioned, most probesets have 16, but not all.

> library(hgu95av2cdf)
> table(sapply(as.list(hgu95av2cdf), nrow))

    6     7     8     9    10    11    12    13    14    15    16    20    69
    8     3     3     4     1     4    11    53    45    39 12387    66     1
> sum(table(sapply(as.list(hgu95av2cdf), nrow)))
[1] 12625
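
To tie this back to the 409600 rows, here is a small base-R check using only the numbers from the table above. It assumes each row of a probeset's CDF entry is one perfect-match/mismatch pair, so each row accounts for two cells on the array; the variable names are made up for the example.

```r
# Values copied from the table() output above
pairs_per_set <- c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 69)
n_sets        <- c(8, 3, 3, 4, 1, 4, 11, 53, 45, 39, 12387, 66, 1)

total_sets  <- sum(n_sets)                  # number of probesets
total_pairs <- sum(pairs_per_set * n_sets)  # PM/MM pairs across all probesets

# Assumption: one PM cell and one MM cell per pair
probeset_cells <- 2 * total_pairs

total_cells <- 640 * 640                    # rows of exprs(Dilution)
other_cells <- total_cells - probeset_cells # cells not in any probeset

c(sets = total_sets, pairs = total_pairs,
  in_probesets = probeset_cells, other = other_cells)
```

Under that assumption the probesets account for 403600 of the 409600 cells, and the remaining 6000 cells do not belong to any probeset (presumably control and QC features), which is why dividing 409600 by 32 does not give 12625.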