Limma: Don't have .CEL Affymetrix Files.
1
0
CantExitVIM ▴ 10
@cantexitvim-15274
Last seen 2.9 years ago

I am working with Affymetrix (U133A 2.0 chip) data for the first time. The data have been background corrected and normalized and are intended to be used to compare different groups. From examining the E. coli Lrp example, I believe this kind of data typically exists in the .CEL format, with mean expression levels and standard deviations.

Problem is, I've been given data that has been parsed; each row represents a probe and each column represents a mean expression value for a particular subject. There is no standard deviation measurement included in this data. To highlight this, I posted a very small portion of the data below (there are many more subjects and probes).

 probe_1    subject_1  subject_2  subject_3
 1598_g_at  1.28409    1.34388    1.34706
 160020_at  2.88587    2.84006    2.78932

From what I understand, the affy package typically reads .CEL files [e.g. read.affybatch(), ReadAffy()], which limma can then work with. But as I don't have the files, I am a bit perplexed about how to approach this problem. Initially, I thought I could reconstruct the .CEL files by using the E. coli Lrp example, but noticed that the standard deviations were different in each file (I am a grad student... my initial thought was that the standard deviation was calculated from the different samples' expression levels, but this doesn't seem to be the case). Thus, without the STDV value, I feel like I may be missing something crucial.

Thank you!

limma affymetrix • 420 views
2
@gordon-smyth
Last seen just now
WEHI, Melbourne, Australia

When you have CEL files, you have to background correct, normalize and summarize to get a matrix of log2 expression values for each sample and each probe-set. In your case, it seems that this has already been done for you and you already have the expression matrix directly. So there's no problem, you just read the matrix into R and use it in limma as usual. For example, you might use:

library(limma)
e_data <- read.delim("file.txt", row.names=1)
e_data <- as.matrix(e_data)
fit <- lmFit(e_data, design)  # design: your design matrix, e.g. from model.matrix()


It's as easy as that.
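
To spell out the step that the snippet above leaves implicit (where the design matrix comes from), here is a minimal sketch. Everything about the data is an assumption for illustration: the expression values are simulated, the probe and subject names are placeholders, and the control/treated grouping is hypothetical. The limma calls are guarded so the base R part still runs if limma is not installed.

```r
# Simulated stand-in for the real file: 50 probe-sets x 6 subjects, log2 scale.
set.seed(1)
toy <- matrix(rnorm(50 * 6, mean = 8, sd = 1), nrow = 50,
              dimnames = list(paste0("probe_", 1:50), paste0("subject_", 1:6)))
path <- tempfile(fileext = ".txt")
write.table(data.frame(probe = rownames(toy), toy), path,
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read the tab-delimited matrix; the first column becomes the row names.
e_data <- as.matrix(read.delim(path, row.names = 1))

# Hypothetical grouping: first three subjects are controls, last three treated.
group <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

# Fit the linear model and moderate the statistics, if limma is available.
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(e_data, design)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "grouptreated", number = 5))
}
```

With real data, only the `read.delim()` line and the `group` factor need to change; the factor must list one group label per column of the matrix, in column order.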

I don't follow your comments about STDV. Processing CEL files does not produce a STDV value, nor is such a value required by limma, so I don't follow what the problem is.

0

Thank you for the quick reply,

So there's no problem, you just read the matrix into R and use it in limma as usual.

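library(Biobase)  # ExpressionSet(), exprs()
library(limma)    # lmFit()
e_data <- read.table(file.path("file.txt"), skip=0, header=TRUE,sep="\t", row.names=1)
e_data <- as.matrix(e_data)
eSet_data <- ExpressionSet(e_data)
# log2-transform (only appropriate if the values are not already on the log2 scale)
exprs(eSet_data) <- log2(exprs(eSet_data))
fit <- lmFit(eSet_data, design_matrix)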

I was able to load the data using ExpressionSet(). As this is a Biobase function, I am not sure if this is the "usual" way to load data. But lmFit seems to handle the data just fine (coercing that object with getEAWP() doesn't seem to change anything). Is there anything else I should be aware of?

I don't follow your comments about STDV. Processing CEL files does not produce a STDV value, nor is such a value required by limma, so I don't follow what the problem is.

As I was unaware of what affy data looked like, I followed the "Lrp Mutant E. Coli Strain with Affymetrix Arrays" tutorial in the limma user guide (Section 17.1). After downloading the .CEL files (http://bioinf.wehi.edu.au/limma/data/ecoli-lrp.zip), I noticed they have a STDV column. Below is an excerpt of one of the .CEL files.

[INTENSITY]
NumberCells=295936
0      0    46192.0    400.8     36
1      0    5677.8    1202.7     36
2      0    46192.0    1.3     30
3      0    4620.8    3509.0     36
4      0    1823.8    531.2     30
5      0    46192.0    3350.9     30

Thank you for all your help!

1

It's even easier than that. As I said in my answer, you can just read the file and give it to limma. See the edits to my answer above.

Modern CEL files are all binary and you will not see any entries like those for the Lrp data.
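
For anyone puzzled by the same excerpt, the sketch below parses the quoted [INTENSITY] rows in base R. The column names X, Y, MEAN, STDV and NPIXELS are an assumption taken from the header line that old text-format CEL files normally carry; that line is not part of the excerpt quoted above. On that reading, STDV describes the spread of pixel intensities within a single cell on the scanned image, so it is not a between-subject standard deviation and is not something limma needs.

```r
# The rows quoted in the thread, copied verbatim from the text-format CEL excerpt.
cel_lines <- c(
  "[INTENSITY]",
  "NumberCells=295936",
  "0      0    46192.0    400.8     36",
  "1      0    5677.8    1202.7     36",
  "2      0    46192.0    1.3     30",
  "3      0    4620.8    3509.0     36",
  "4      0    1823.8    531.2     30",
  "5      0    46192.0    3350.9     30"
)

# Keep only the numeric data rows (skip the section tag and key=value lines).
rows <- cel_lines[grepl("^[0-9]", trimws(cel_lines))]

# Assumed column names, following the text CEL layout: one row per cell.
cells <- read.table(text = rows,
                    col.names = c("X", "Y", "MEAN", "STDV", "NPIXELS"))
print(cells)
```

Each row is one cell (one probe feature) on one array, which is why the STDV values differ from file to file and cannot be reconstructed from a summarized expression matrix.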