topTable (fit) annotation

0

Entering edit mode

Jing Huang ▴ 380

@jing-huang-4737

Last seen 9.6 years ago

Dear All members, I have been extracting data from GEO (GEO package) and do some analysis on them by using limma package. What I discover is the components of topTable(fit) are different from the dataset GDS and GSE. If the data is from GDS, then the colnames of topTable (fit) looks like this. > colnames(topTable(fit)) [1] "ID" "Gene.title" "Gene.symbol" [4] "Gene.ID" "UniGene.title" "UniGene.symbol" [7] "UniGene.ID" "Nucleotide.Title" "GI" [10] "GenBank.Accession" "Platform_CLONEID" "Platform_ORF" [13] "Platform_SPOTID" "Chromosome.location" "Chromosome.annotation" [16] "GO.Function" "GO.Process" "GO.Component" [19] "GO.Function.1" "GO.Process.1" "GO.Component.1" [22] "CTRL" "HIF1a" "HIF2a" [25] "HIF1a2a" "AveExpr" "F" [28] "P.Value" "adj.P.Val" If the data is from GSE, then the colnames of topTable(fit) looks like this: >colnames(topTable(fit) [1] "ID" "mir210" "CTRL2" "AveExpr" "F" "P.Value" "adj.P.Val" I am trying to add some term into this table by doing following one by one: the data is generated by Affymetrix human U133 platform: >Library(hgu133plus2.db) >x=hgu133plus2SYMBOL >y=topTable(fit) >y$SYMBOL=unlist(as.list(x[y$ID])) It works but I need to add ENTREZID,SYMBOL,CHR, CHRloc, and GO annotations as well. I like to have the topTable more like the topTable(fit) generated at top by data GEO GDS data I am wondering if there is an easy way to annotate all once. In addition, I am having a trouble to annotate GO term. Many Thanks Jing [[alternative HTML version deleted]]

GO annotate limma GO annotate limma • 1.6k views

ADD COMMENT • link updated 12.7 years ago by Sean Davis 21k • written 12.7 years ago by Jing Huang ▴ 380

0

Entering edit mode

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

Hi, Jing. NCBI GEO maintains two types of GPL records. The normal variant is just supplied by the submitter. However, when a GEO Series is curated by NCBI GEO into a GEO DataSet (GDS), they create a so-called "Annotation GPL". These have a relatively standard set of columns. I have not made the change to GEOquery yet to grab this annotation GPL when getting Series Matrix files. But, you can get them yourself by specifying: gplannot = getGEO("GPL96", AnnotGPL=TRUE) You can always replace the feature data of the ExpressionSets with the information in the retrieved Annotation GPL. I hope that is clear. Sean On Tue, Aug 30, 2011 at 5:01 PM, Jing Huang <huangji at="" ohsu.edu=""> wrote: > Dear All members, > > I have been extracting data from GEO (GEO package) and do some analysis on them by using limma package. What I discover is the components of topTable(fit) are different from the dataset GDS and GSE. > > If the data is from GDS, then the colnames of topTable (fit) looks like this. > >> colnames(topTable(fit)) > [1] "ID" ? ? ? ? ? ? ? ? ? ?"Gene.title" ? ? ? ? ? ?"Gene.symbol" > ?[4] "Gene.ID" ? ? ? ? ? ? ? "UniGene.title" ? ? ? ? "UniGene.symbol" > ?[7] "UniGene.ID" ? ? ? ? ? ?"Nucleotide.Title" ? ? ?"GI" > [10] "GenBank.Accession" ? ? "Platform_CLONEID" ? ? ?"Platform_ORF" > [13] "Platform_SPOTID" ? ? ? "Chromosome.location" ? "Chromosome.annotation" > [16] "GO.Function" ? ? ? ? ? "GO.Process" ? ? ? ? ? ?"GO.Component" > [19] "GO.Function.1" ? ? ? ? "GO.Process.1" ? ? ? ? ?"GO.Component.1" > [22] "CTRL" ? ? ? ? ? ? ? ? ?"HIF1a" ? ? ? ? ? ? ? ? "HIF2a" > [25] "HIF1a2a" ? ? ? ? ? ? ? "AveExpr" ? ? ? ? ? ? ? "F" > [28] "P.Value" ? ? ? ? ? ? ? "adj.P.Val" > > If the data is from GSE, then the ? colnames of topTable(fit) looks like this: > >>colnames(topTable(fit) > > [1] "ID" ? ? ? ?"mir210" ? ?"CTRL2" ? ? "AveExpr" ? "F" ? ? ? ? "P.Value" ? "adj.P.Val" > > I am trying to add some term into this table by doing following one by one: the data is generated by Affymetrix human U133 platform: > >>Library(hgu133plus2.db) >>x=hgu133plus2SYMBOL >>y=topTable(fit) >>y$SYMBOL=unlist(as.list(x[y$ID])) > > It works but I need to add ENTREZID,SYMBOL,CHR, CHRloc, and GO annotations as well. ?I like to have the topTable more like the topTable(fit) generated at top by data GEO GDS data > > I am wondering if there is an easy way to annotate all once. > > In addition, I am having a trouble to annotate GO term. > > Many Thanks > > Jing > > > > > ? ? ? ?[[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 12.7 years ago Sean Davis 21k

0

Entering edit mode

Thank YOU Sean for responding my question. I am not sure where I should add the platform annotation in. Here are what I observed: If I extract "GDS2162" by typing in > gds=getGEO("GDS2162") > eset=GDS2eSet(gds,do.log2=T) >eset ExpressionSet (storageMode: lockedEnvironment) assayData: 45101 features, 16 samples element names: exprs protocolData: none phenoData sampleNames: GSM67339 GSM67343 ... GSM67352 (16 total) varLabels: sample genotype/variation agent description varMetadata: labelDescription featureData featureNames: 1415670_at 1415671_at ... AFFX-TrpnX-M_at (45101 total) fvarLabels: ID Gene.title ... GO.Component.1 (21 total) fvarMetadata: Column labelDescription experimentData: use 'experimentData(object)' pubMedIds: 16237459 Annotation: Most of annotation is in. If I extract "GSE16962" by typing in >gse=getGEO("GSE16962") > gse $GSE16962_series_matrix.txt.gz ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 12 samples element names: exprs protocolData: none phenoData sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total) varLabels: title geo_accession ... data_row_count (34 total) varMetadata: labelDescription featureData featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total) fvarLabels: ID GB_ACC ... Gene.Ontology.Molecular.Function (16 total) fvarMetadata: Column Description labelDescription experimentData: use 'experimentData(object)' Annotation: GPL570 It looks to me it includes annotation package such as GO term.... But I can't use this eset data to do further analysis (such as fit table) what I need. If I extract ("GSE16962") by typing in >gse=getGEO("GSE16962", GSEMatrix=F) Then following GEOquery package, I can generate eset2, which I can use to do analysis what I need. But the eset2 looks like this: > eset2 ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 12 samples element names: exprs protocolData: none phenoData sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total) varLabels: samples varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: Here are the scripts that I use to generate eset2: >probesets <- Table(GPLList(gse)[[1]])$ID > data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) { + tab <- Table(x) + mymatch <- match(probesets,tab$ID_REF) + return(tab$VALUE[mymatch]) + })) > data.matrix <- apply(data.matrix, 2, function(x) { + as.numeric(as.character(x)) + }) > require(Biobase) > rownames(data.matrix) <- probesets > colnames(data.matrix) <- names(GSMList(gse)) > pdata <- data.frame(samples=names(GSMList(gse))) > rownames(pdata) <- names(GSMList(gse)) > pheno <- as(pdata,"AnnotatedDataFrame") > eset2 <- new('ExpressionSet',exprs=data.matrix,phenoData=pheno) At which step, I should add >gplannot = getGEO("GPL96", AnnotGPL=TRUE) Many Many thanks Jing On 8/30/11 6:01 PM, "Sean Davis" <sdavis2 at="" mail.nih.gov=""> wrote: >Hi, Jing. > >NCBI GEO maintains two types of GPL records. The normal variant is >just supplied by the submitter. However, when a GEO Series is curated >by NCBI GEO into a GEO DataSet (GDS), they create a so-called >"Annotation GPL". These have a relatively standard set of columns. I >have not made the change to GEOquery yet to grab this annotation GPL >when getting Series Matrix files. But, you can get them yourself by >specifying: > >gplannot = getGEO("GPL96", AnnotGPL=TRUE) > >You can always replace the feature data of the ExpressionSets with the >information in the retrieved Annotation GPL. > >I hope that is clear. > >Sean > > >On Tue, Aug 30, 2011 at 5:01 PM, Jing Huang <huangji at="" ohsu.edu=""> wrote: >> Dear All members, >> >> I have been extracting data from GEO (GEO package) and do some analysis >>on them by using limma package. What I discover is the components of >>topTable(fit) are different from the dataset GDS and GSE. >> >> If the data is from GDS, then the colnames of topTable (fit) looks like >>this. >> >>> colnames(topTable(fit)) >> [1] "ID" "Gene.title" "Gene.symbol" >> [4] "Gene.ID" "UniGene.title" "UniGene.symbol" >> [7] "UniGene.ID" "Nucleotide.Title" "GI" >> [10] "GenBank.Accession" "Platform_CLONEID" "Platform_ORF" >> [13] "Platform_SPOTID" "Chromosome.location" >>"Chromosome.annotation" >> [16] "GO.Function" "GO.Process" "GO.Component" >> [19] "GO.Function.1" "GO.Process.1" "GO.Component.1" >> [22] "CTRL" "HIF1a" "HIF2a" >> [25] "HIF1a2a" "AveExpr" "F" >> [28] "P.Value" "adj.P.Val" >> >> If the data is from GSE, then the colnames of topTable(fit) looks >>like this: >> >>>colnames(topTable(fit) >> >> [1] "ID" "mir210" "CTRL2" "AveExpr" "F" >>"P.Value" "adj.P.Val" >> >> I am trying to add some term into this table by doing following one by >>one: the data is generated by Affymetrix human U133 platform: >> >>>Library(hgu133plus2.db) >>>x=hgu133plus2SYMBOL >>>y=topTable(fit) >>>y$SYMBOL=unlist(as.list(x[y$ID])) >> >> It works but I need to add ENTREZID,SYMBOL,CHR, CHRloc, and GO >>annotations as well. I like to have the topTable more like the >>topTable(fit) generated at top by data GEO GDS data >> >> I am wondering if there is an easy way to annotate all once. >> >> In addition, I am having a trouble to annotate GO term. >> >> Many Thanks >> >> Jing >> >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >>http://news.gmane.org/gmane.science.biology.informatics.conductor >>

ADD REPLY • link 12.7 years ago Jing Huang ▴ 380

0

Entering edit mode

Hi Jing, I'm not quite sure I understand your question. Why can't you use the data for further analysis? Maybe the issue is that getGEO() returns a list? Have you tried something like: > gse16962 <- getGEO("GSE16962") > eset <- gse16962[[1]] > head(fData(eset)) ID GB_ACC SPOT_ID Species.Scientific.Name Annotation.Date 1007_s_at 1007_s_at U48705 <na> Homo sapiens Mar 11, 2009 1053_at 1053_at M87338 <na> Homo sapiens Mar 11, 2009 117_at 117_at X51757 <na> Homo sapiens Mar 11, 2009 121_at 121_at X69699 <na> Homo sapiens Mar 11, 2009 1255_g_at 1255_g_at L36861 <na> Homo sapiens Mar 11, 2009 1294_at 1294_at L13852 <na> Homo sapiens Mar 11, 2009 Sequence.Type Sequence.Source 1007_s_at Exemplar sequence Affymetrix Proprietary Database 1053_at Exemplar sequence GenBank 117_at Exemplar sequence Affymetrix Proprietary Database 121_at Exemplar sequence GenBank 1255_g_at Exemplar sequence Affymetrix Proprietary Database 1294_at Exemplar sequence GenBank Etc. --Johannes -----Original Message----- From: Jing Huang [mailto:huangji@ohsu.edu] Sent: Wednesday, August 31, 2011 12:21 PM To: Davis, Sean (NIH/NCI) [E] Cc: bioconductor at r-project.org Subject: Re: [BioC] topTable (fit) annotation Thank YOU Sean for responding my question. I am not sure where I should add the platform annotation in. Here are what I observed: If I extract "GDS2162" by typing in > gds=getGEO("GDS2162") > eset=GDS2eSet(gds,do.log2=T) >eset ExpressionSet (storageMode: lockedEnvironment) assayData: 45101 features, 16 samples element names: exprs protocolData: none phenoData sampleNames: GSM67339 GSM67343 ... GSM67352 (16 total) varLabels: sample genotype/variation agent description varMetadata: labelDescription featureData featureNames: 1415670_at 1415671_at ... AFFX-TrpnX-M_at (45101 total) fvarLabels: ID Gene.title ... GO.Component.1 (21 total) fvarMetadata: Column labelDescription experimentData: use 'experimentData(object)' pubMedIds: 16237459 Annotation: Most of annotation is in. If I extract "GSE16962" by typing in >gse=getGEO("GSE16962") > gse $GSE16962_series_matrix.txt.gz ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 12 samples element names: exprs protocolData: none phenoData sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total) varLabels: title geo_accession ... data_row_count (34 total) varMetadata: labelDescription featureData featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total) fvarLabels: ID GB_ACC ... Gene.Ontology.Molecular.Function (16 total) fvarMetadata: Column Description labelDescription experimentData: use 'experimentData(object)' Annotation: GPL570 It looks to me it includes annotation package such as GO term.... But I can't use this eset data to do further analysis (such as fit table) what I need. If I extract ("GSE16962") by typing in >gse=getGEO("GSE16962", GSEMatrix=F) Then following GEOquery package, I can generate eset2, which I can use to do analysis what I need. But the eset2 looks like this: > eset2 ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 12 samples element names: exprs protocolData: none phenoData sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total) varLabels: samples varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: Here are the scripts that I use to generate eset2: >probesets <- Table(GPLList(gse)[[1]])$ID data.matrix <- >do.call("cbind", lapply(GSMList(gse), function(x) { + tab <- Table(x) + mymatch <- match(probesets,tab$ID_REF) + return(tab$VALUE[mymatch]) + })) > data.matrix <- apply(data.matrix, 2, function(x) { + as.numeric(as.character(x)) + }) > require(Biobase) > rownames(data.matrix) <- probesets > colnames(data.matrix) <- names(GSMList(gse)) pdata <- > data.frame(samples=names(GSMList(gse))) > rownames(pdata) <- names(GSMList(gse)) pheno <- > as(pdata,"AnnotatedDataFrame") > eset2 <- new('ExpressionSet',exprs=data.matrix,phenoData=pheno) At which step, I should add >gplannot = getGEO("GPL96", AnnotGPL=TRUE) Many Many thanks Jing On 8/30/11 6:01 PM, "Sean Davis" <sdavis2 at="" mail.nih.gov=""> wrote: >Hi, Jing. > >NCBI GEO maintains two types of GPL records. The normal variant is >just supplied by the submitter. However, when a GEO Series is curated >by NCBI GEO into a GEO DataSet (GDS), they create a so-called >"Annotation GPL". These have a relatively standard set of columns. I >have not made the change to GEOquery yet to grab this annotation GPL >when getting Series Matrix files. But, you can get them yourself by >specifying: > >gplannot = getGEO("GPL96", AnnotGPL=TRUE) > >You can always replace the feature data of the ExpressionSets with the >information in the retrieved Annotation GPL. > >I hope that is clear. > >Sean > > >On Tue, Aug 30, 2011 at 5:01 PM, Jing Huang <huangji at="" ohsu.edu=""> wrote: >> Dear All members, >> >> I have been extracting data from GEO (GEO package) and do some >>analysis on them by using limma package. What I discover is the >>components of >>topTable(fit) are different from the dataset GDS and GSE. >> >> If the data is from GDS, then the colnames of topTable (fit) looks >>like this. >> >>> colnames(topTable(fit)) >> [1] "ID" "Gene.title" "Gene.symbol" >> [4] "Gene.ID" "UniGene.title" "UniGene.symbol" >> [7] "UniGene.ID" "Nucleotide.Title" "GI" >> [10] "GenBank.Accession" "Platform_CLONEID" "Platform_ORF" >> [13] "Platform_SPOTID" "Chromosome.location" >>"Chromosome.annotation" >> [16] "GO.Function" "GO.Process" "GO.Component" >> [19] "GO.Function.1" "GO.Process.1" "GO.Component.1" >> [22] "CTRL" "HIF1a" "HIF2a" >> [25] "HIF1a2a" "AveExpr" "F" >> [28] "P.Value" "adj.P.Val" >> >> If the data is from GSE, then the colnames of topTable(fit) looks >>like this: >> >>>colnames(topTable(fit) >> >> [1] "ID" "mir210" "CTRL2" "AveExpr" "F" >>"P.Value" "adj.P.Val" >> >> I am trying to add some term into this table by doing following one >>by >>one: the data is generated by Affymetrix human U133 platform: >> >>>Library(hgu133plus2.db) >>>x=hgu133plus2SYMBOL >>>y=topTable(fit) >>>y$SYMBOL=unlist(as.list(x[y$ID])) >> >> It works but I need to add ENTREZID,SYMBOL,CHR, CHRloc, and GO >>annotations as well. I like to have the topTable more like the >>topTable(fit) generated at top by data GEO GDS data >> >> I am wondering if there is an easy way to annotate all once. >> >> In addition, I am having a trouble to annotate GO term. >> >> Many Thanks >> >> Jing >> >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >>http://news.gmane.org/gmane.science.biology.informatics.conductor >> _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.7 years ago Freudenberg, Johannes NIH/NIEHS [E] ▴ 150

0

Entering edit mode

On Wed, Aug 31, 2011 at 12:21 PM, Jing Huang <huangji at="" ohsu.edu=""> wrote: > Thank YOU Sean for responding my question. I am not sure where I should > add the platform annotation in. > > Here are what I observed: > > If I extract "GDS2162" by typing in > >> gds=getGEO("GDS2162") >> eset=GDS2eSet(gds,do.log2=T) > >>eset > > ExpressionSet (storageMode: lockedEnvironment) > assayData: 45101 features, 16 samples > ?element names: exprs > protocolData: none > phenoData > ?sampleNames: GSM67339 GSM67343 ... GSM67352 (16 total) > ?varLabels: sample genotype/variation agent description > ?varMetadata: labelDescription > featureData > ?featureNames: 1415670_at 1415671_at ... AFFX-TrpnX-M_at (45101 total) > ?fvarLabels: ID Gene.title ... GO.Component.1 (21 total) > > ?fvarMetadata: Column labelDescription > experimentData: use 'experimentData(object)' > ?pubMedIds: 16237459 > Annotation: > > Most of annotation is in. > > > > If I extract "GSE16962" by typing in > >>gse=getGEO("GSE16962") > >> gse > $GSE16962_series_matrix.txt.gz > ExpressionSet (storageMode: lockedEnvironment) > assayData: 54675 features, 12 samples > ?element names: exprs > protocolData: none > phenoData > ?sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total) > ?varLabels: title geo_accession ... data_row_count (34 total) > ?varMetadata: labelDescription > featureData > ?featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total) > ?fvarLabels: ID GB_ACC ... Gene.Ontology.Molecular.Function (16 total) > ?fvarMetadata: Column Description labelDescription > experimentData: use 'experimentData(object)' > Annotation: GPL570 Hi, Jing. gse above is a list. You can use gse[[1]] to get back an ExpressionSet, in this case the ExpressionSet that was generated using GPL570 platform. There is no need to do any of the stuff you note below in the VAST majority of cases. Sean > It looks to me it includes annotation package such as GO term.... > But I can't use this eset data to do further analysis (such as fit table) > what I need. > > If I extract ("GSE16962") by typing in > >>gse=getGEO("GSE16962", GSEMatrix=F) > > Then following GEOquery package, I can generate eset2, which I can use to > do analysis what I need. But the eset2 looks like this: > >> eset2 > ExpressionSet (storageMode: lockedEnvironment) > assayData: 54675 features, 12 samples > ?element names: exprs > protocolData: none > phenoData > ?sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total) > ?varLabels: samples > ?varMetadata: labelDescription > featureData: none > experimentData: use 'experimentData(object)' > Annotation: > > > > Here are the scripts that I use to generate eset2: > >>probesets <- Table(GPLList(gse)[[1]])$ID >> data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) { > + tab <- Table(x) > + mymatch <- match(probesets,tab$ID_REF) > + return(tab$VALUE[mymatch]) > + })) >> data.matrix <- apply(data.matrix, 2, function(x) { > + as.numeric(as.character(x)) > + }) >> require(Biobase) >> rownames(data.matrix) <- probesets >> colnames(data.matrix) <- names(GSMList(gse)) >> pdata <- data.frame(samples=names(GSMList(gse))) >> rownames(pdata) <- names(GSMList(gse)) >> pheno <- as(pdata,"AnnotatedDataFrame") >> eset2 <- new('ExpressionSet',exprs=data.matrix,phenoData=pheno) > > > At which step, I should add > >>gplannot = getGEO("GPL96", AnnotGPL=TRUE) > > > Many Many thanks > > Jing > > > > > > > > > > > > > > > > On 8/30/11 6:01 PM, "Sean Davis" <sdavis2 at="" mail.nih.gov=""> wrote: > >>Hi, Jing. >> >>NCBI GEO maintains two types of GPL records. ?The normal variant is >>just supplied by the submitter. ?However, when a GEO Series is curated >>by NCBI GEO into a GEO DataSet (GDS), they create a so-called >>"Annotation GPL". ?These have a relatively standard set of columns. ?I >>have not made the change to GEOquery yet to grab this annotation GPL >>when getting Series Matrix files. ?But, you can get them yourself by >>specifying: >> >>gplannot = getGEO("GPL96", AnnotGPL=TRUE) >> >>You can always replace the feature data of the ExpressionSets with the >>information in the retrieved Annotation GPL. >> >>I hope that is clear. >> >>Sean >> >> >>On Tue, Aug 30, 2011 at 5:01 PM, Jing Huang <huangji at="" ohsu.edu=""> wrote: >>> Dear All members, >>> >>> I have been extracting data from GEO (GEO package) and do some analysis >>>on them by using limma package. What I discover is the components of >>>topTable(fit) are different from the dataset GDS and GSE. >>> >>> If the data is from GDS, then the colnames of topTable (fit) looks like >>>this. >>> >>>> colnames(topTable(fit)) >>> [1] "ID" ? ? ? ? ? ? ? ? ? ?"Gene.title" ? ? ? ? ? ?"Gene.symbol" >>> ?[4] "Gene.ID" ? ? ? ? ? ? ? "UniGene.title" ? ? ? ? "UniGene.symbol" >>> ?[7] "UniGene.ID" ? ? ? ? ? ?"Nucleotide.Title" ? ? ?"GI" >>> [10] "GenBank.Accession" ? ? "Platform_CLONEID" ? ? ?"Platform_ORF" >>> [13] "Platform_SPOTID" ? ? ? "Chromosome.location" >>>"Chromosome.annotation" >>> [16] "GO.Function" ? ? ? ? ? "GO.Process" ? ? ? ? ? ?"GO.Component" >>> [19] "GO.Function.1" ? ? ? ? "GO.Process.1" ? ? ? ? ?"GO.Component.1" >>> [22] "CTRL" ? ? ? ? ? ? ? ? ?"HIF1a" ? ? ? ? ? ? ? ? "HIF2a" >>> [25] "HIF1a2a" ? ? ? ? ? ? ? "AveExpr" ? ? ? ? ? ? ? "F" >>> [28] "P.Value" ? ? ? ? ? ? ? "adj.P.Val" >>> >>> If the data is from GSE, then the ? colnames of topTable(fit) looks >>>like this: >>> >>>>colnames(topTable(fit) >>> >>> [1] "ID" ? ? ? ?"mir210" ? ?"CTRL2" ? ? "AveExpr" ? "F" >>>"P.Value" ? "adj.P.Val" >>> >>> I am trying to add some term into this table by doing following one by >>>one: the data is generated by Affymetrix human U133 platform: >>> >>>>Library(hgu133plus2.db) >>>>x=hgu133plus2SYMBOL >>>>y=topTable(fit) >>>>y$SYMBOL=unlist(as.list(x[y$ID])) >>> >>> It works but I need to add ENTREZID,SYMBOL,CHR, CHRloc, and GO >>>annotations as well. ?I like to have the topTable more like the >>>topTable(fit) generated at top by data GEO GDS data >>> >>> I am wondering if there is an easy way to annotate all once. >>> >>> In addition, I am having a trouble to annotate GO term. >>> >>> Many Thanks >>> >>> Jing >>> >>> >>> >>> >>> ? ? ? ?[[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>>http://news.gmane.org/gmane.science.biology.informatics.conductor >>> > >

ADD REPLY • link 12.7 years ago Sean Davis 21k

0

Entering edit mode

THANK YOU Sean. It is so much easier!! Jing -----Original Message----- From: seandavi@gmail.com [mailto:seandavi@gmail.com] On Behalf Of Sean Davis Sent: Wednesday, August 31, 2011 10:09 AM To: Jing Huang Cc: bioconductor at r-project.org Subject: Re: [BioC] topTable (fit) annotation On Wed, Aug 31, 2011 at 12:21 PM, Jing Huang <huangji at="" ohsu.edu=""> wrote: > Thank YOU Sean for responding my question. I am not sure where I should > add the platform annotation in. > > Here are what I observed: > > If I extract "GDS2162" by typing in > >> gds=getGEO("GDS2162") >> eset=GDS2eSet(gds,do.log2=T) > >>eset > > ExpressionSet (storageMode: lockedEnvironment) > assayData: 45101 features, 16 samples > ?element names: exprs > protocolData: none > phenoData > ?sampleNames: GSM67339 GSM67343 ... GSM67352 (16 total) > ?varLabels: sample genotype/variation agent description > ?varMetadata: labelDescription > featureData > ?featureNames: 1415670_at 1415671_at ... AFFX-TrpnX-M_at (45101 total) > ?fvarLabels: ID Gene.title ... GO.Component.1 (21 total) > > ?fvarMetadata: Column labelDescription > experimentData: use 'experimentData(object)' > ?pubMedIds: 16237459 > Annotation: > > Most of annotation is in. > > > > If I extract "GSE16962" by typing in > >>gse=getGEO("GSE16962") > >> gse > $GSE16962_series_matrix.txt.gz > ExpressionSet (storageMode: lockedEnvironment) > assayData: 54675 features, 12 samples > ?element names: exprs > protocolData: none > phenoData > ?sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total) > ?varLabels: title geo_accession ... data_row_count (34 total) > ?varMetadata: labelDescription > featureData > ?featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total) > ?fvarLabels: ID GB_ACC ... Gene.Ontology.Molecular.Function (16 total) > ?fvarMetadata: Column Description labelDescription > experimentData: use 'experimentData(object)' > Annotation: GPL570 Hi, Jing. gse above is a list. You can use gse[[1]] to get back an ExpressionSet, in this case the ExpressionSet that was generated using GPL570 platform. There is no need to do any of the stuff you note below in the VAST majority of cases. Sean > It looks to me it includes annotation package such as GO term.... > But I can't use this eset data to do further analysis (such as fit table) > what I need. > > If I extract ("GSE16962") by typing in > >>gse=getGEO("GSE16962", GSEMatrix=F) > > Then following GEOquery package, I can generate eset2, which I can use to > do analysis what I need. But the eset2 looks like this: > >> eset2 > ExpressionSet (storageMode: lockedEnvironment) > assayData: 54675 features, 12 samples > ?element names: exprs > protocolData: none > phenoData > ?sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total) > ?varLabels: samples > ?varMetadata: labelDescription > featureData: none > experimentData: use 'experimentData(object)' > Annotation: > > > > Here are the scripts that I use to generate eset2: > >>probesets <- Table(GPLList(gse)[[1]])$ID >> data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) { > + tab <- Table(x) > + mymatch <- match(probesets,tab$ID_REF) > + return(tab$VALUE[mymatch]) > + })) >> data.matrix <- apply(data.matrix, 2, function(x) { > + as.numeric(as.character(x)) > + }) >> require(Biobase) >> rownames(data.matrix) <- probesets >> colnames(data.matrix) <- names(GSMList(gse)) >> pdata <- data.frame(samples=names(GSMList(gse))) >> rownames(pdata) <- names(GSMList(gse)) >> pheno <- as(pdata,"AnnotatedDataFrame") >> eset2 <- new('ExpressionSet',exprs=data.matrix,phenoData=pheno) > > > At which step, I should add > >>gplannot = getGEO("GPL96", AnnotGPL=TRUE) > > > Many Many thanks > > Jing > > > > > > > > > > > > > > > > On 8/30/11 6:01 PM, "Sean Davis" <sdavis2 at="" mail.nih.gov=""> wrote: > >>Hi, Jing. >> >>NCBI GEO maintains two types of GPL records. ?The normal variant is >>just supplied by the submitter. ?However, when a GEO Series is curated >>by NCBI GEO into a GEO DataSet (GDS), they create a so-called >>"Annotation GPL". ?These have a relatively standard set of columns. ?I >>have not made the change to GEOquery yet to grab this annotation GPL >>when getting Series Matrix files. ?But, you can get them yourself by >>specifying: >> >>gplannot = getGEO("GPL96", AnnotGPL=TRUE) >> >>You can always replace the feature data of the ExpressionSets with the >>information in the retrieved Annotation GPL. >> >>I hope that is clear. >> >>Sean >> >> >>On Tue, Aug 30, 2011 at 5:01 PM, Jing Huang <huangji at="" ohsu.edu=""> wrote: >>> Dear All members, >>> >>> I have been extracting data from GEO (GEO package) and do some analysis >>>on them by using limma package. What I discover is the components of >>>topTable(fit) are different from the dataset GDS and GSE. >>> >>> If the data is from GDS, then the colnames of topTable (fit) looks like >>>this. >>> >>>> colnames(topTable(fit)) >>> [1] "ID" ? ? ? ? ? ? ? ? ? ?"Gene.title" ? ? ? ? ? ?"Gene.symbol" >>> ?[4] "Gene.ID" ? ? ? ? ? ? ? "UniGene.title" ? ? ? ? "UniGene.symbol" >>> ?[7] "UniGene.ID" ? ? ? ? ? ?"Nucleotide.Title" ? ? ?"GI" >>> [10] "GenBank.Accession" ? ? "Platform_CLONEID" ? ? ?"Platform_ORF" >>> [13] "Platform_SPOTID" ? ? ? "Chromosome.location" >>>"Chromosome.annotation" >>> [16] "GO.Function" ? ? ? ? ? "GO.Process" ? ? ? ? ? ?"GO.Component" >>> [19] "GO.Function.1" ? ? ? ? "GO.Process.1" ? ? ? ? ?"GO.Component.1" >>> [22] "CTRL" ? ? ? ? ? ? ? ? ?"HIF1a" ? ? ? ? ? ? ? ? "HIF2a" >>> [25] "HIF1a2a" ? ? ? ? ? ? ? "AveExpr" ? ? ? ? ? ? ? "F" >>> [28] "P.Value" ? ? ? ? ? ? ? "adj.P.Val" >>> >>> If the data is from GSE, then the ? colnames of topTable(fit) looks >>>like this: >>> >>>>colnames(topTable(fit) >>> >>> [1] "ID" ? ? ? ?"mir210" ? ?"CTRL2" ? ? "AveExpr" ? "F" >>>"P.Value" ? "adj.P.Val" >>> >>> I am trying to add some term into this table by doing following one by >>>one: the data is generated by Affymetrix human U133 platform: >>> >>>>Library(hgu133plus2.db) >>>>x=hgu133plus2SYMBOL >>>>y=topTable(fit) >>>>y$SYMBOL=unlist(as.list(x[y$ID])) >>> >>> It works but I need to add ENTREZID,SYMBOL,CHR, CHRloc, and GO >>>annotations as well. ?I like to have the topTable more like the >>>topTable(fit) generated at top by data GEO GDS data >>> >>> I am wondering if there is an easy way to annotate all once. >>> >>> In addition, I am having a trouble to annotate GO term. >>> >>> Many Thanks >>> >>> Jing >>> >>> >>> >>> >>> ? ? ? ?[[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>>http://news.gmane.org/gmane.science.biology.informatics.conductor >>> > >

ADD REPLY • link 12.7 years ago Jing Huang ▴ 380

Login before adding your answer.