Hi there,
I have a processed microarray dataset (HG-U133A) downloaded from the ArrayExpress. The authors have processed the data using RMA in affy package and only the processed data is available. What I have now is a spreadsheet containing a summary of normalized expression values. I know that for using limma, it is assumed that we have "eset" of class "exprSet", however, the dataset I downloaded is not definitely of that class. I need to do DE analysis on that dataset... how can I use limma for this purpose?
I checked these links as well but I did not find an answer for my case:
A: Importing RMA processed data into limma for eBayes
Importing quantile normalized .txt file to limma for differential expression
Thanks in advance for any help with this.
Regards,
Sepideh

Note that you can download data directly from ArrayExpress, into convenient containers (ExpressionSets or AffyBatches, depending on the data) using the ArrayExpress package. As an example, say we want the E-GEOD-24068 dataset.
> library(ArrayExpress) > library(affy) > dat <- ArrayExpress("E-GEOD-24068") trying URL 'http://www.ebi.ac.uk/arrayexpress/files/A-AFFY-44/A-AFFY-44.adf.txt' Content type 'text/plain' length 4517088 bytes (4.3 Mb) opened URL ================================================== downloaded 4.3 Mb trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-24068/E-GEOD-24068.sdrf.txt' Content type 'text/plain' length 10304 bytes (10 Kb) opened URL ================================================== downloaded 10 Kb trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-24068/E-GEOD-24068.idf.txt' Content type 'text/plain' length 3742 bytes opened URL ================================================== downloaded 3742 bytes Copying raw data files trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-24068/E-GEOD-24068.raw.1.zip' Content type 'application/zip' length 52828214 bytes (50.4 Mb) opened URL ================================================= downloaded 50.4 Mb Unpacking data files ArrayExpress: Reading pheno data from SDRF ArrayExpress: Reading data files Read 28 items E-GEOD-24068 was successfully loaded into AffyBatch > eset <- rma(dat) Background correcting Normalizing Calculating ExpressionAnd this comes with all sorts of information about the samples:
Which you can use to do the analysis
> trt <- pData(phenoData(eset))[,38] > trt [1] "parathyroid hormone" "parathyroid hormone" "parathyroid hormone" [4] "parathyroid hormone" "parathyroid hormone" "saline" [7] "saline" "saline" "saline" [10] "saline" > library(limma) > fit <- lmFit(eset, model.matrix(~trt)) > fit2 <- eBayes(fit) > library(hgu133plus2.db) > gns <- select(hgu133plus2.db, row.names(fit2$coef), c("ENTREZID","SYMBOL","GENENAME")) Warning message: In .generateExtraRows(tab, keys, jointype) : 'select' resulted in 1:many mapping between keys and return rows > gns <- gns[!duplicated(gns[,1]),] > fit2$genes <- gns > topTable(fit2,2) PROBEID ENTREZID SYMBOL 218002_s_at 218002_s_at 9547 CXCL14 215506_s_at 215506_s_at 9077 DIRAS3 217821_s_at 217821_s_at 51729 WBP11 202388_at 202388_at 5997 RGS2 205870_at 205870_at 624 BDKRB2 209959_at 209959_at 8013 NR4A3 205100_at 205100_at 9945 GFPT2 222899_at 222899_at 22801 ITGA11 218935_at 218935_at 30845 EHD3 213004_at 213004_at 23452 ANGPTL2 GENENAME logFC AveExpr 218002_s_at chemokine (C-X-C motif) ligand 14 -2.0147181 7.161729 215506_s_at DIRAS family, GTP-binding RAS-like 3 -1.9505608 5.607836 217821_s_at WW domain binding protein 11 0.8534752 6.899359 202388_at regulator of G-protein signaling 2 -2.0633228 9.550311 205870_at bradykinin receptor B2 -1.3754981 6.035307 209959_at nuclear receptor subfamily 4, group A, member 3 -1.5975650 7.487489 205100_at glutamine-fructose-6-phosphate transaminase 2 -0.8129612 6.008675 222899_at integrin, alpha 11 -1.3640798 7.776450 218935_at EH-domain containing 3 -1.9003569 7.110979 213004_at angiopoietin-like 2 -1.3629536 7.081965 t P.Value adj.P.Val B 218002_s_at -7.928451 9.362982e-06 0.5119211 -0.1054202 215506_s_at -7.253964 2.087725e-05 0.5707317 -0.3215610 217821_s_at 6.443773 5.874638e-05 0.6018951 -0.6369279 202388_at -6.362932 6.543297e-05 0.6018951 -0.6722592 205870_at -6.200199 8.150267e-05 0.6018951 -0.7457188 209959_at -6.167721 8.519036e-05 0.6018951 -0.7607632 205100_at -6.064561 9.813941e-05 0.6018951 -0.8094168 222899_at -6.020962 1.042326e-04 0.6018951 -0.8303826 218935_at -5.883415 1.262624e-04 0.6018951 -0.8981350 213004_at -5.881011 1.266892e-04 0.6018951 -0.8993415Thanks James for that all,
I downloaded the data in the same way you showed here, however, the data is RMA normalized and I cannot obtain eset by doing rma() on the dataset which is already normalized.
Regards,
Sepideh
I doubt you downloaded the data the same way, else you would have already had an ExpressionSet. And you cannot use ArrayExpress() if there are no raw data, so you could not have downloaded the way I showed. You would have to do something like this
> ae <- getAE("E-GEOD-56177", type = "processed") trying URL 'http://www.ebi.ac.uk/arrayexpress/files/A-AFFY-44/A-AFFY-44.adf.txt' Content type 'text/plain' length 4517088 bytes (4.3 Mb) opened URL ================================================== downloaded 4.3 Mb trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-56177/E-GEOD-56177.sdrf.txt' Content type 'text/plain' length 6155 bytes opened URL ================================================== downloaded 6155 bytes trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-56177/E-GEOD-56177.idf.txt' Content type 'text/plain' length 3303 bytes opened URL ================================================== downloaded 3303 bytes Copying processed data files trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-56177/E-GEOD-56177.processed.1.zip' Content type 'application/zip' length 3444865 bytes (3.3 Mb) opened URL ================================================== downloaded 3.3 Mb Unpacking data files > aeset <- procset(ae, getcolproc(ae)[2]) Read 2 items ArrayExpress: Reading pheno data from SDRF Reading processed File: GSM1357190_sample_table.txt Reading processed File: GSM1357189_sample_table.txt Reading processed File: GSM1357188_sample_table.txt Reading processed File: GSM1357187_sample_table.txt Reading processed File: GSM1357186_sample_table.txt Reading processed File: GSM1357185_sample_table.txt Reading processed File: GSM1357184_sample_table.txt Reading processed File: GSM1357183_sample_table.txt ArrayExpress: Reading feature metadata from ADF Read 25 items E-GEOD-56177 processed data was successfully loaded into ExpressionSet > aeset ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 8 samples element names: exprs protocolData: none phenoData rowNames: GSM1357190_sample_table GSM1357189_sample_table ... GSM1357183_sample_table (8 total) varLabels: Source.Name Comment..Sample_source_name. ... Term.Accession.Number.4 (43 total) varMetadata: labelDescription featureData featureNames: AFFX-BioB-5_at AFFX-BioB-M_at ... 1570653_at (54675 total) fvarLabels: Composite.Element.Name Composite.Element.Database.Entry.affymetrix_netaffx. ... Composite.Element.Database.Entry.cp450. (15 total) fvarMetadata: labelDescription experimentData: use 'experimentData(object)' Annotation:So my point remains. Just use the ArrayExpress package as intended, without exporting data and converting to a spreadsheet, and you can use limma directly.