FW: GEOquery package

Ochsner, Scott A ▴ 60

@ochsner-scott-a-4334

Jing,

Here is where you have to be very careful. The metadata does seem to indicate that the data are log2 values and that RMA has been used. As this dataset is from Affymetrix, I would expect log2 values to be in the range of 2 to 16, and from what little you have shown us this appears to be the case. The safest bet is to import the .CEL files, if available, and normalize them yourself. I've come across a few datasets archived in GEO in which the journal article describes a normalization procedure that is not consistent with what is described in the GEO metadata, which in turn is not consistent with the actual data. I have truly found that with GEO data, it is buyer beware.

Scott

Scott A. Ochsner, PhD
One Baylor Plaza BCM130, Houston, TX 77030
Voice: (713) 798-6227 Fax: (713) 790-1275

-----Original Message-----
From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of Jing Huang
Sent: Tuesday, August 30, 2011 10:36 AM
To: 'bioconductor at r-project.org'
Subject: [BioC] GEOquery package

Dear Sean and all members,

I am trying to extract GSE data from GEO and analyze it. I am wondering whether the GSE data have been normalized and log2 transformed. The R script and output are copied below. Can somebody help me with this?
> Table(GSMList(gse)[[1]])[1:5, ]
     ID_REF       VALUE
1 1007_s_at 7.693888187
2   1053_at 8.571408272
3    117_at 5.179812431
4    121_at 7.468027592
5 1255_g_at 3.118550777
> Columns(GSMList(gse)[[1]])[1:5, ]
     Column                Description
1    ID_REF
2     VALUE log2 signal intensity, RMA   <<<<< Does this mean that the values are log2 transformed and the data were normalized by RMA?
NA     <NA>                       <NA>
NA.1   <NA>                       <NA>
NA.2   <NA>                       <NA>

According to the GEOquery package, I should do the following steps in order to get the eset:

> probesets <- Table(GPLList(gse)[[1]])$ID
> data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
+     tab <- Table(x)
+     mymatch <- match(probesets, tab$ID_REF)
+     return(tab$VALUE[mymatch])
+ }))
> data.matrix <- apply(data.matrix, 2, function(x) {
+     as.numeric(as.character(x))
+ })
> data.matrix <- log2(data.matrix)
> data.matrix[1:5, ]
     GSM424759 GSM424760 GSM424761 GSM424762 GSM424763 GSM424764 GSM424765
[1,]  2.943713  2.917086  2.926155  2.983485  2.973219  2.962445  2.926030
[2,]  3.099532  3.136898  3.152696  3.217172  3.206948  3.198448  3.135146
[3,]  2.372900  2.309177  2.354380  2.373350  2.368464  2.381139  2.314555
[4,]  2.900727  2.873853  2.863911  2.879232  2.927384  2.913594  2.852870
[5,]  1.640876  1.645330  1.494274  1.792643  1.719597  1.648126  1.605055

Is the log2 transformation necessary for this dataset?

Many thanks,
Jing
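
The range check described in the reply (log2 Affymetrix values are expected to fall roughly between 2 and 16) can be sketched in base R. This is only an illustration: `looks_log2` and its `upper` cutoff of 20 are hypothetical and not part of GEOquery, the input is just the five VALUE entries quoted in the question, and a check like this cannot confirm that RMA was actually used. It only flags values that are clearly on the raw intensity scale.

```r
# First five VALUE entries copied from the GSM table in the question.
vals <- c(7.693888187, 8.571408272, 5.179812431, 7.468027592, 3.118550777)

# Hypothetical helper (not a GEOquery function): guess whether expression
# values are already on a log2 scale.
# Assumption: log2 Affymetrix intensities lie roughly between 2 and 16, so
# a 99th percentile above `upper` suggests raw, untransformed intensities.
looks_log2 <- function(x, upper = 20) {
  x <- as.numeric(x)
  x <- x[is.finite(x)]
  if (length(x) == 0) stop("no finite values to inspect")
  unname(quantile(x, 0.99)) <= upper
}

already_logged <- looks_log2(vals)

# Apply log2 only when the values do not already look logged.
expr <- if (already_logged) vals else log2(vals)

# For comparison: taking log2 of already-logged values, as the script in
# the question does, compresses them into a much narrower range.
double_logged <- log2(vals)

print(already_logged)
print(range(expr))
print(range(double_logged))
```

On these five values the check reports that the data already look logged, so `expr` is left unchanged. The second `log2()` call reproduces the first column of the `data.matrix` output in the question (about 1.64 to 3.10), which is below the range expected for log2 Affymetrix data. Note that the helper only separates raw from logged values; it would not flag data that had been logged twice, so inspecting the range directly is still worthwhile.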

Normalization GEOquery • 1.2k views

14.5 years ago Ochsner, Scott A ▴ 60
