GSVA: using Entrez IDs as identifiers

Wendy Qiao ▴ 360

@wendy-qiao-4501

Last seen 11.4 years ago

Hi all,

I am using the GSVA package for some analysis. I found that the package only takes a gene expression matrix annotated with Affymetrix probe IDs, although the gene set collection is made of Entrez IDs. I imagine there is a step in the package for converting the Affymetrix probe IDs to Entrez IDs. As my data are from the Illumina platform, I am wondering whether an expression matrix annotated with Entrez IDs can be used directly.

Thank you,
Wendy

probe GSVA • 3.7k views

ADD COMMENT • link updated 14.2 years ago by Robert Castelo ★ 3.4k • written 14.2 years ago by Wendy Qiao ▴ 360

Robert Castelo ★ 3.4k

@rcastelo

Last seen 4 months ago

Barcelona/Universitat Pompeu Fabra

hi Wendy,

sorry for my late answer. in principle there is no problem for the gsva() function to take Entrez IDs in your expression data matrix.

if the expression data comes as a matrix whose rows are annotated with Entrez IDs, and the gene sets are also annotated with Entrez IDs, there should be no problem at all.

if the expression data comes as an ExpressionSet object where the 'features' are not Affy probe IDs but Entrez IDs, just make sure that the annotation slot names the corresponding organism-level package. for instance, in the case of human:

    annotation(eset) <- "org.Hs.eg.db"

let me know if you have any problem with this.

cheers,
robert.
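For the plain-matrix case, a quick way to check that the two sets of identifiers line up before calling gsva() might look like the base-R sketch below. The matrix values, the gene set names and the Entrez IDs are all made up for illustration; none of them come from the data discussed in this thread.

```r
# Toy expression matrix: rows are genes labelled with Entrez IDs (stored as
# character strings), columns are samples. All values are invented.
expr <- matrix(c(5.1, 6.2, 7.3, 8.4,
                 4.0, 4.5, 9.1, 2.2,
                 3.3, 3.9, 6.6, 7.0),
               nrow = 4,
               dimnames = list(c("672", "675", "7157", "1956"),
                               c("sample1", "sample2", "sample3")))

# Toy gene sets, also given as Entrez IDs stored as character vectors.
gene_sets <- list(setA = c("672", "675", "5888"),
                  setB = c("7157", "1956"),
                  setC = c("1", "2"))

# Fraction of each gene set that is present among the row names.
coverage <- vapply(gene_sets,
                   function(ids) mean(ids %in% rownames(expr)),
                   numeric(1))
print(coverage)
```

If every gene set had a coverage of zero, that would suggest the rows and the gene sets use different identifier types (for example probe IDs on one side and Entrez IDs on the other).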

ADD COMMENT • link 14.2 years ago Robert Castelo ★ 3.4k

Hi Robert,

Thank you for your reply. I had converted all the genes to hgu95a probe IDs, as I found that this is the only platform that works with ExpressionSet. It would be great if we could make Entrez IDs work. Below is the error I got with your code.

Thank you.
Wendy

    > BcellSet
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 12148 features, 7 samples
      element names: exprs
    protocolData: none
    phenoData
      sampleNames: Illumi_PREBCEL_1 Illumi_PREBCEL_2 ... Affy_PREBCEL_4 (7 total)
      varLabels: CellType Platform Replicates
      varMetadata: labelDescription
    featureData: none
    experimentData: use 'experimentData(object)'
    Annotation: org.Hs.eg.db
    > preBcell.KEGG<-gsva(BcellSet,KEGGc2BroadSets,abs.ranking=FALSE)$es.obs
    Mapping identifiers between gene sets and feature names
    Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ..., verbose = verbose)) :
      error in evaluating the argument 'object' in selecting a method for function 'GeneSetCollection': Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
      object 'org.Hs.egENTREZID' not found

ADD REPLY • link 14.2 years ago Wendy Qiao ▴ 360

hi Wendy,

i'm afraid you need to get a little bit acquainted with the way in which annotations are handled in BioC. a good starting point could be looking at the vignette "AnnotationDbi: How to use the .db annotation packages" from the AnnotationDbi package.

the short answer to your problem is that hgu95a is not the only platform for which annotations exist in BioC. basically there is an annotation package for each platform supported by BioC (you can look all of them up by going to http://www.bioconductor.org/packages/release/BiocViews.html and clicking on "AnnotationData"), but in order to use one of these annotation packages you need to:

1. install it once in your system via source() and biocLite(), just as with every software package.
2. load it via the library() function.

in order to use the human organism-level package i mentioned in my previous email, you need to install it first and then load it before doing anything else with it.

let me know if this still does not solve your problem.

cheers,
robert.
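The two steps above can be checked from the R prompt with a small sketch like the one below. The helper name annotation_available() is hypothetical (it is not part of any BioC package); it only wraps base R's requireNamespace(), and the installation step itself is left to the procedure described in the reply.

```r
# Hypothetical helper: TRUE if a package is installed and its namespace can be
# loaded, FALSE otherwise. 'pkg' is a package name such as "org.Hs.eg.db".
annotation_available <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

pkg <- "org.Hs.eg.db"
if (annotation_available(pkg)) {
  # Step 2: attach the already installed package for this session.
  library(pkg, character.only = TRUE)
  message(pkg, " is loaded")
} else {
  # Step 1 has not been done yet, so the package must be installed first.
  message(pkg, " is not installed; install it before setting annotation(eset)")
}
```

Running this before the gsva() call makes it clearer whether a "not found" error points to a missing or unloaded annotation package or to something else.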

ADD REPLY • link 14.2 years ago Robert Castelo ★ 3.4k

Greetings,

The annotation for the miRNA chip does not seem to have the same amount of information as the hgu95 db. Is there some help available for mapping miRNA probes to their target genes?

thanks

Thomas (Tom) Keller, PhD
kellert at ohsu.edu
503.494.2442
6588 R Jones Hall (BSc/CROET)
MMI DNA Services
Member of OHSU Shared Resources

ADD REPLY • link 14.2 years ago Tom Keller ▴ 70

hi Tom,

i'm a bit unsure what you are asking in relation to this thread, but i guess you're interested in creating a custom annotation package. for that purpose i'd recommend reading through the vignettes of the AnnotationDbi package. i'm not an expert in creating custom annotation packages, so if you run into problems i think you should start a new thread with the specific question or problem you want to solve.

cheers,
robert.

ADD REPLY • link 14.2 years ago Robert Castelo ★ 3.4k

Hi Tom,

At least one reason why the miRNA packages have less information is that there is simply less annotation in existence for miRNA work. People have been studying genes for many decades, and miRNAs became popular much more recently. That said, there are a few annotation packages for miRNA work that you can find on our website here:

http://www.bioconductor.org/packages/release/data/annotation/

And we are always interested in adding more resources if you have the inclination. For miRNAs, the instructions for making a new package that you will find in AnnotationDbi will probably be of somewhat limited utility. But we have been working to make adding new annotation resources easier for people, so please contact me if you are interested in doing that and I might be able to help you create something new.

OTOH, if you just want to get the miRNAs mapped to target genes, you might want to look at Jim Reid's targetscan packages (found on the same page listed above).

Marc

ADD REPLY • link 14.2 years ago Marc Carlson ★ 7.2k

Robert Castelo ★ 3.4k

@rcastelo

Last seen 4 months ago

Barcelona/Universitat Pompeu Fabra

hi Som,

i'm cc'ing the BioC mailing list. please remember to do so when you answer, since the list works as a knowledge base for everyone else.

i'd need two bits of information from you to find out what might be happening:

1. right after the error pops up, please type traceback() in the R shell and paste the output of this function here.
2. please also paste here the output of sessionInfo().

both steps are in fact recommended by the BioC mailing list posting guide: http://www.bioconductor.org/help/mailing-list/posting-guide

robert.

On 11/16/11 9:04 PM, somnath bandyopadhyay wrote:
> Hi Robert,
>
> I am trying to use GSVA on a microarray dataset and I am trying to use one of the Broad gene set collections for the enrichment purposes.
>
> library(GSEABase)
> library(Biobase)
> library(genefilter)
> library(limma)
> library(RColorBrewer)
> library(graph)
> library(GSVA)
>
> c3gsc2 <- getGmt("c2.cp.kegg.v3.0.entrez.gmt", collectionType=BroadCollection(category="c3"), geneIdType=EntrezIdentifier())
> class(c3gsc2)
> c3gsc2
>
> data <- read.table("gsva_infliximab_data.txt", header=T, row.names=1, sep="\t") # the data matrix is filtered for low expressors etc. and I am using Entrez Gene ID as row identifiers.
> class(data)
> data.m <- as.matrix(data)
>
> new <- gsva(data.m, c3gsc2, abs.ranking=TRUE, min.sz=1, max.sz=Inf, no.bootstraps=0, bootstrap.percent = .632, parallel.sz=0, parallel.type="SOCK", verbose=TRUE, mx.diff=TRUE)
>
> I keep getting the following error at this step
> Error in match(x, y) : 'match' requires vector arguments
>
> Could you please tell me what I am doing wrong?
>
> Thanks so much,
> Som.

ADD COMMENT • link 14.2 years ago Robert Castelo ★ 3.4k

0

Entering edit mode

Hi Robert,

Following is the information you asked for:

> traceback()
7: match(x, y)
6: na.omit(match(x, y))
5: FUN(X[[1L]], ...)
4: lapply(gset.idx.list, function(x, y) na.omit(match(x, y)), rownames(expr))
3: .local(expr, gset.idx.list, ...)
2: gsva(data.m, c3gsc2, abs.ranking = TRUE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", verbose = TRUE, mx.diff = TRUE)
1: gsva(data.m, c3gsc2, abs.ranking = TRUE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", verbose = TRUE, mx.diff = TRUE)

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] GSVA_1.0.1 RColorBrewer_1.0-5 limma_3.8.3 genefilter_1.34.0 GSEABase_1.14.0 graph_1.30.0 annotate_1.30.1
[8] AnnotationDbi_1.14.1 Biobase_2.12.2

loaded via a namespace (and not attached):
[1] DBI_0.2-5 RSQLite_0.10.0 splines_2.13.1 survival_2.36-10 tools_2.13.1 XML_3.4-2.2 xtable_1.6-0

Looking forward to your reply.

Best Regards,
Som.

ADD REPLY • link 14.2 years ago fire1976 wyoming ▴ 380

0

Entering edit mode

hi Som,

thanks for the information.

even though you are not running the latest version of R with the latest version of the package, the problem does not seem to be related to that fact, and i'm able to reproduce it with the latest version of both. in any case, it is always good to try to work with the latest version of all the software.

the short answer to your question is: try to call the gsva() function this way (i'm using below the command line you put in your first email):

c3gsc2 <- geneIds(c3gsc2)

new <- gsva(data.m, c3gsc2, abs.ranking=TRUE, min.sz=1, max.sz=Inf, no.bootstraps=0, bootstrap.percent=.632, parallel.sz=0, parallel.type="SOCK", verbose=TRUE, mx.diff=TRUE)

the long answer is that, in principle, there is no gsva() method that accepts the expression data as a matrix and the gene sets as a GeneSetCollection object. when the expression data comes as a matrix, the gene sets should come as a list, which is what we achieve in the line i've added before the call to gsva() by transforming the GeneSetCollection object into a list object.

under such a circumstance you should have got a more informative error, saying something like there is no gsva() method that accepts "matrix" and "GeneSetCollection", which is something that we should definitely add at some point to the package. instead, the method that accepts "matrix" and "list" is triggered, but it breaks at the first line of code that tries to operate on the list of gene sets, because what it receives is not a plain list of gene identifiers. this strange behavior comes from the (strange for me) fact that a GeneSetCollection object also qualifies as a list, for instance:

library(GSVAdata)
data(c2BroadSets)
class(c2BroadSets)
[1] "GeneSetCollection"
attr(,"package")
[1] "GSEABase"
is.list(c2BroadSets)
[1] TRUE

anyway, let me know if the fix i propose above does not work.

thanks,
robert.
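The mechanism described in this answer can be sketched in base R without any Bioconductor package. This is only an illustration under stated assumptions: ToySet and ToyCollection are hypothetical stand-ins, not the GSEABase classes, and the manual slot extraction below merely plays the role that geneIds() plays in the thread. It shows an S4 collection that extends list passing is.list(), the lapply()/match() step from the traceback failing on it, and the same step succeeding once the collection is turned into a plain list of character vectors.

```r
library(methods)

# Hypothetical stand-ins (NOT the GSEABase classes): a "set" holding gene IDs
# in a slot, and a "collection" that extends list.
setClass("ToySet", representation(ids = "character"))
setClass("ToyCollection", contains = "list")

coll <- new("ToyCollection",
            list(new("ToySet", ids = c("10", "100", "1000")),
                 new("ToySet", ids = c("100", "2000"))))

# Expression matrix with Entrez-like IDs as row names (made-up values).
expr <- matrix(seq_len(12), nrow = 4,
               dimnames = list(c("10", "100", "1000", "2000"),
                               c("s1", "s2", "s3")))

# The collection passes is.list(), so code written for plain lists accepts it...
coll.is.list <- is.list(coll)

# ...but each element is an S4 object rather than a vector, so match() fails.
res <- tryCatch(
  lapply(coll, function(x, y) na.omit(match(x, y)), rownames(expr)),
  error = function(e) e)
failed <- inherits(res, "error")

# Extracting the IDs into a plain list of character vectors (the job done by
# geneIds() in the thread) lets the same lapply()/match() step run.
gset.list <- lapply(coll, function(s) s@ids)
names(gset.list) <- c("SET_A", "SET_B")   # set names are made up
idx <- lapply(gset.list, function(x, y) na.omit(match(x, y)), rownames(expr))
```

With a real GeneSetCollection the conversion is the single geneIds() call shown in the answer; the point of the toy version is only that "is a list" and "is a list of ID vectors" are different conditions.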

ADD REPLY • link 14.2 years ago Robert Castelo ★ 3.4k

0

Entering edit mode

Hi Robert,

Thanks so much for the explanation and the solution. The enrichment analysis ran perfectly well.

I was wondering if there is a way to generate FDR-corrected p-values for the KS statistics; that way, for each sample, we could assess the probability of its association with a gene set.

Thanks once again.

Best Regards,
Som.
> > > > > > >> > > > > > > >> if the expression data comes as a matrix, and rows are > > > > > > >> annotated with > > > > > > >> Entrez IDs and the gene sets are also annotated with Entrez > > > > > > >> IDs, there > > > > > > >> should be absolutely no problem. > > > > > > >> > > > > > > >> if the expression data comes as an ExpressionSet object > > where > > > > > > >> the > > > > > > >> 'features' are not Affy probe IDs but just EntrezIDs. just > > > > > > >> make sure > > > > > > >> that the annotation slot has the corresponding > > organism-level > > > > > > >> package. > > > > > > >> for instance, in the case of human: > > > > > > >> > > > > > > >> annotation(eset) <- "org.Hs.eg.db" > > > > > > >> > > > > > > >> let me know if you have any problem with this. > > > > > > >> > > > > > > >> cheers, > > > > > > >> robert. > > > > > > >> > > > > > > >> On Fri, 2011-11-11 at 14:44 -0500, Wendy Qiao wrote: > > > > > > >>> Hi all, > > > > > > >>> > > > > > > >>> I am using the GSVA package for some analysis. I found > > that > > > > > > >> the package > > > > > > >>> only takes the gene expression matrix annotated with > > > > > > >> affymetrix probe IDs, > > > > > > >>> although the gene set collection is made of Entrez IDs. I > > > > > > >> imagine there a > > > > > > >>> step in the package for converting the Affymetrix probe > > IDs > > > > > > >> to Entrez IDs. > > > > > > >>> As my data are from the Illumina platform, I am wondering > > if > > > > > > >> an expression > > > > > > >>> matrix annotated with Entrez IDs can be used directly. 
> > > > > > >>> > > > > > > >>> Thank you, > > > > > > >>> Wendy > > > > > > >>> > > > > > > >> > > > > > > >>> [[alternative HTML version deleted]] > > > > > > >>> > > > > > > >>> _______________________________________________ > > > > > > >>> Bioconductor mailing list > > > > > > >>> Bioconductor@r-project.org > > > > > > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > >>> Search the archives: > > > > > > >> > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > >>> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Bioconductor mailing list > > > > > > > Bioconductor@r-project.org > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > Search the archives: > > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Bioconductor mailing list > > > > > Bioconductor@r-project.org > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > Search the archives: > > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > [[alternative HTML version deleted]]
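For readers following the `match(x, y)` error quoted above, here is a minimal base R sketch of the lookup step. It assumes the expression matrix has Entrez IDs as row names and the gene sets are a plain named list of character vectors. The objects `expr`, `gene_sets` and `gset_idx` are made up for illustration, and this is not GSVA's actual code; it only mirrors the `lapply(..., match ...)` pattern that appears in the traceback posted in this thread.

```r
# Toy expression matrix: rows are Entrez Gene IDs (stored as character),
# columns are samples. The values are random and purely illustrative.
expr <- matrix(rnorm(5 * 3), nrow = 5,
               dimnames = list(c("10", "100", "1000", "10000", "10001"),
                               c("s1", "s2", "s3")))

# Gene sets as a plain named list of character vectors of Entrez IDs.
gene_sets <- list(setA = c("10", "1000", "99999"),  # "99999" is not in expr
                  setB = c("100", "10001"))

# Sanity check: every element must be an ordinary character vector,
# otherwise match() has nothing vector-like to work on.
stopifnot(is.list(gene_sets),
          all(vapply(gene_sets, is.character, logical(1))))

# Map each gene set to row indices of the matrix, dropping absent IDs.
gset_idx <- lapply(gene_sets,
                   function(ids, rows) as.vector(na.omit(match(ids, rows))),
                   rownames(expr))

print(gset_idx)
```

If the elements of the list are S4 gene set objects rather than character vectors, the `match()` call has no plain vector to compare, which is consistent with the error reported above.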

ADD REPLY • link 14.2 years ago fire1976 wyoming ▴ 380


Hi Robert,

Based on the solution you provided, the code ran fine. But when I tried to bootstrap it, it failed. Following is the output of traceback() and sessionInfo():

> traceback()
9: FUN(X[[1L]], ...)
8: lapply(X, FUN, ...)
7: sapply(gset.idx.list, ks_test_m, rank.scores, sort.sgn.idxs,
       mx.diff = mx.diff, verbose = verbose)
6: t(sapply(gset.idx.list, ks_test_m, rank.scores, sort.sgn.idxs,
       mx.diff = mx.diff, verbose = verbose))
5: compute.geneset.es(expr, gset.idx.list, sample(n.samples, bootstrap.nsamples,
       replace = T), abs.ranking)
4: GSVA:::.gsva(expr, mapped.gset.idx.list, abs.ranking, no.bootstraps,
       bootstrap.percent, parallel.sz, parallel.type, verbose, mx.diff)
3: .local(expr, gset.idx.list, ...)
2: gsva(data.m, c3gsc2, abs.ranking = TRUE, min.sz = 1, max.sz = Inf,
       no.bootstraps = 1000, bootstrap.percent = 0.632, verbose = TRUE,
       mx.diff = TRUE)
1: gsva(data.m, c3gsc2, abs.ranking = TRUE, min.sz = 1, max.sz = Inf,
       no.bootstraps = 1000, bootstrap.percent = 0.632, verbose = TRUE,
       mx.diff = TRUE)
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GSVA_1.0.1 GSEABase_1.14.0 graph_1.30.0 annotate_1.30.1 AnnotationDbi_1.14.1
[6] Biobase_2.12.2
loaded via a namespace (and not attached):
[1] DBI_0.2-5 RSQLite_0.10.0 tools_2.13.1 XML_3.4-2.2 xtable_1.6-0

Moreover, I was wondering if we could get the FDR-adjusted p-values for the ks scores.

Thanks,
Som.

> Subject: RE: [BioC] GSVA: using Entrez ID's as identifiers
> From: robert.castelo@upf.edu
> To: genome1976@hotmail.com
> CC: bioconductor@r-project.org
> Date: Thu, 17 Nov 2011 18:31:11 +0100
>
> hi Som, thanks for the information.
>
> even though you are not running the latest version of R with the latest
> version of the package, the problem does not seem to be related to that
> fact and i'm able to reproduce it with the latest version of both things.
> in any case, it is always good to try to work with the latest version of
> all the software.
>
> the short answer to your question is: try to call the gsva() function
> this way (i'm using below the command line you put in your first email):
>
> c3gsc2 <- geneIds(c3gsc2)
>
> new <- gsva(data.m, c3gsc2, abs.ranking=TRUE, min.sz=1, max.sz=Inf,
>             no.bootstraps=0, bootstrap.percent = .632, parallel.sz=0,
>             parallel.type="SOCK", verbose=TRUE, mx.diff=TRUE)
>
> the long answer is that, in principle, there is no method on gsva() to
> accept a call with the expression data as a matrix and the gene sets as a
> GeneSetCollection object. When the expression data comes as a matrix the
> gene sets should come as a list, which is what we achieve in the line
> i've added before the call to gsva() by transforming the
> GeneSetCollection object into a list object. Under such a circumstance
> you should have got a more informative error saying something like there
> is no such gsva() method that accepts "matrix" and "GeneSetCollection",
> which is something that we should definitely add at some point to the
> package. Instead the method that accepts "matrix" and "list" is
> triggered but it breaks at the first line of code that tries to operate
> on the list of gene sets because that is not a list of gene identifiers.
> This strange behavior comes from the (strange for me) fact that a
> GeneSetCollection object qualifies also as a list, for instance:
>
> library(GSVAdata)
> data(c2BroadSets)
> class(c2BroadSets)
> [1] "GeneSetCollection"
> attr(,"package")
> [1] "GSEABase"
> is.list(c2BroadSets)
> [1] TRUE
>
> anyway, let me know if the fix i propose above does not work.
>
> thanks,
> robert.
>
> On Thu, 2011-11-17 at 09:38 -0500, somnath bandyopadhyay wrote:
> > Hi Robert,
> >
> > Following is the information you asked for:
> >
> > > traceback()
> > 7: match(x, y)
> > 6: na.omit(match(x, y))
> > 5: FUN(X[[1L]], ...)
> > 4: lapply(gset.idx.list, function(x, y) na.omit(match(x, y)),
> >        rownames(expr))
> > 3: .local(expr, gset.idx.list, ...)
> > 2: gsva(data.m, c3gsc2, abs.ranking = TRUE, min.sz = 1, max.sz = Inf,
> >        no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0,
> >        parallel.type = "SOCK", verbose = TRUE, mx.diff = TRUE)
> > 1: gsva(data.m, c3gsc2, abs.ranking = TRUE, min.sz = 1, max.sz = Inf,
> >        no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0,
> >        parallel.type = "SOCK", verbose = TRUE, mx.diff = TRUE)
> > > sessionInfo()
> > R version 2.13.1 (2011-07-08)
> > Platform: i386-pc-mingw32/i386 (32-bit)
> > locale:
> > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> > other attached packages:
> > [1] GSVA_1.0.1 RColorBrewer_1.0-5 limma_3.8.3 genefilter_1.34.0 GSEABase_1.14.0 graph_1.30.0 annotate_1.30.1
> > [8] AnnotationDbi_1.14.1 Biobase_2.12.2
> > loaded via a namespace (and not attached):
> > [1] DBI_0.2-5 RSQLite_0.10.0 splines_2.13.1 survival_2.36-10 tools_2.13.1 XML_3.4-2.2 xtable_1.6-0
> >
> > Looking forward to your reply.
> >
> > Best Regards,
> > Som.
> >
> > > Date: Wed, 16 Nov 2011 22:04:06 +0100
> > > From: robert.castelo@upf.edu
> > > To: genome1976@hotmail.com
> > > CC: bioconductor@r-project.org
> > > Subject: Re: [BioC] GSVA: using Entrez ID's as identifiers
> > >
> > > hi Som,
> > >
> > > i'm cc'ing the BioC mailing list, please remember to do it when you
> > > answer since this works as a knowledge base for everyone else.
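On the FDR question raised in this post: if a nominal p-value per gene set is available (from a bootstrap or from any other test), Benjamini-Hochberg adjustment can be done in base R with `p.adjust()`. The sketch below uses invented p-values and hypothetical gene set names. How to extract p-values from a gsva() result is deliberately left out, since that depends on the package version and is not described in this thread.

```r
# Hypothetical nominal p-values, one per gene set (values are made up).
pvals <- c(setA = 0.001, setB = 0.01, setC = 0.02, setD = 0.04, setE = 0.5)

# Benjamini-Hochberg adjustment controls the false discovery rate.
padj <- p.adjust(pvals, method = "BH")

# Gene sets that pass a 5% FDR threshold.
significant <- names(padj)[padj <= 0.05]

print(round(padj, 4))
print(significant)
```

`p.adjust()` keeps the names of its input, so the adjusted values stay labelled by gene set.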

ADD REPLY • link 14.2 years ago fire1976 wyoming ▴ 380
