GEOquery package
Jing Huang ▴ 380
@jing-huang-4737
Dear Sean and all members,

I am trying to extract GSE data from GEO and analyze it. I am wondering whether the GSE data have already been normalized and log2 transformed. The R commands and output are copied below. Can somebody help me with this?

    > Table(GSMList(gse)[[1]])[1:5, ]
         ID_REF       VALUE
    1 1007_s_at 7.693888187
    2   1053_at 8.571408272
    3    117_at 5.179812431
    4    121_at 7.468027592
    5 1255_g_at 3.118550777
    > Columns(GSMList(gse)[[1]])[1:5, ]
         Column                Description
    1    ID_REF
    2     VALUE log2 signal intensity, RMA
    NA     <NA>                       <NA>
    NA.1   <NA>                       <NA>
    NA.2   <NA>                       <NA>

Does the description of the VALUE column ("log2 signal intensity, RMA") mean that the values are log2 transformed and that the data were normalized by RMA?

According to the GEOquery package, I should do the following steps to get the eset:

    > probesets <- Table(GPLList(gse)[[1]])$ID
    > data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
    +     tab <- Table(x)
    +     mymatch <- match(probesets, tab$ID_REF)
    +     return(tab$VALUE[mymatch])
    + }))
    > data.matrix <- apply(data.matrix, 2, function(x) {
    +     as.numeric(as.character(x))
    + })
    > data.matrix <- log2(data.matrix)
    > data.matrix[1:5, ]
         GSM424759 GSM424760 GSM424761 GSM424762 GSM424763 GSM424764 GSM424765
    [1,]  2.943713  2.917086  2.926155  2.983485  2.973219  2.962445  2.926030
    [2,]  3.099532  3.136898  3.152696  3.217172  3.206948  3.198448  3.135146
    [3,]  2.372900  2.309177  2.354380  2.373350  2.368464  2.381139  2.314555
    [4,]  2.900727  2.873853  2.863911  2.879232  2.927384  2.913594  2.852870
    [5,]  1.640876  1.645330  1.494274  1.792643  1.719597  1.648126  1.605055

Is the log2 transformation necessary for this dataset?

Many thanks
Jing

@freudenberg-johannes-nihniehs-e-4789

Hi Jing,

The values you show certainly look like they are already on the log scale.
But just to be sure, you can quickly check the GEO website:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM424764

About halfway down the page it says something like:

    "Data table header descriptions
    ID_REF
    VALUE    log2 signal intensity, RMA"

So you probably don't want to log these again in this case ...

--Johannes
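As a rough complement to reading the GEO record, the range of the values themselves can be inspected. The following is only a sketch: `looks_log2` is a hypothetical helper (not part of GEOquery), and the cutoff of 100 is an assumption based on the fact that log2 intensities from this kind of array usually stay well below that, while raw intensities often reach the thousands. It uses the five VALUE entries shown in the question.

```r
# Hypothetical helper: guess from the value range whether expression values
# are already on a log2 scale. The cutoff of 100 is an assumption, not a rule.
looks_log2 <- function(x, max_cutoff = 100) {
  x <- as.numeric(x)
  x <- x[is.finite(x)]          # ignore NA / Inf entries
  length(x) > 0 && max(x) < max_cutoff
}

# The five VALUE entries printed in the question for the first sample.
values <- c(7.693888187, 8.571408272, 5.179812431, 7.468027592, 3.118550777)

already_log2 <- looks_log2(values)

# Apply log2() only when the values do not already look log-scaled.
expr <- if (already_log2) values else log2(values)
```

In practice the check would be run on the whole expression matrix rather than five values, and it does not replace the description on the GEO record: a heuristic like this can only flag obviously unlogged data.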
@sean-davis-490
On Tue, Aug 30, 2011 at 11:36 AM, Jing Huang <huangji at ohsu.edu> wrote:
> According to GEOquery package I should do following steps in order to get the eset:

Hi, Jing. In general, you can simply use:

    gse = getGEO('GSEXXXXX')

Then, gse will be a list of ExpressionSets. There is no longer a need in the vast majority of settings to do the steps below. This is pointed out in the vignette.

As for the data and log2 transformation, it appears that these data are log2 transformed. However, there is no standard at GEO, so you will need to read the details from the GEO website, read the paper, or contact the original submitters to be sure.

Sean

> > probesets <- Table(GPLList(gse)[[1]])$ID
> > data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
> +     tab <- Table(x)
> +     mymatch <- match(probesets, tab$ID_REF)
> +     return(tab$VALUE[mymatch])
> + }))
> > data.matrix <- apply(data.matrix, 2, function(x) {
> +     as.numeric(as.character(x))
> + })
> > data.matrix <- log2(data.matrix)
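The shorter route Sean describes might look roughly like the sketch below. It assumes the GEOquery and Biobase packages are installed and that network access is available; `fetch_expression` is a hypothetical wrapper name introduced here for illustration, and the accession is left as the same placeholder used in the reply.

```r
# Sketch only: needs the GEOquery and Biobase packages plus network access.
# `fetch_expression` is a hypothetical helper name, not a GEOquery function.
fetch_expression <- function(accession) {
  if (!requireNamespace("GEOquery", quietly = TRUE) ||
      !requireNamespace("Biobase", quietly = TRUE)) {
    stop("Please install GEOquery and Biobase from Bioconductor first.")
  }
  # With the default GSEMatrix = TRUE, getGEO() returns a list of
  # ExpressionSets, one per platform in the series.
  gse <- GEOquery::getGEO(accession)
  eset <- gse[[1]]
  # exprs() returns the probes-by-samples matrix as submitted to GEO,
  # so no log2() is applied here.
  Biobase::exprs(eset)
}

# Usage (replace the placeholder with a real series accession):
# expr <- fetch_expression("GSEXXXXX")
# summary(as.vector(expr))  # look at the range before deciding on log2()
```

Taking only the first list element assumes the series uses a single platform; a series with several platforms would return several ExpressionSets that need to be handled separately.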