Gene Ontology Annotations from Gene Names


Guest User ★ 13k

@guest-user-4897

Last seen 11.5 years ago

I am hoping to get appropriate GO mappings for a list of genes used in a microarray experiment, with a view to identifying significantly regulated processes.

I was planning on using the Bioconductor package GOstats to identify these processes; however, the organism under study is not a supported organism. I have attempted to use the blast2GO software to generate the gene-to-GO mapping, but this approach seems to be very time consuming (after generating the corresponding .fasta files, it took over 1 hour to BLAST just 10 genes).

Currently, the gene identifiers I am using are simply the gene names, but it shouldn't be too difficult to derive a list of corresponding alternative identifiers (assuming they are publicly available) should that be advantageous to the GO mapping process.

Is there any faster way to achieve this gene-to-GO mapping (either through Bioconductor packages or otherwise)?

Any assistance is appreciated.

Joseph

Microarray GO Organism PROcess GOstats • 2.7k views

ADD COMMENT • link updated 12.1 years ago by James W. MacDonald 68k • written 12.1 years ago by Guest User ★ 13k


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Hi Joseph,

What's the organism? You might be able to create an org-level package using makeOrgPackageFromNCBI() in the AnnotationForge package.

Best,

Jim

ADD COMMENT • link 12.1 years ago James W. MacDonald 68k


Hi Jim,

Thanks for your reply!

The organism is Campylobacter jejuni (strain: NCTC11168). How can I check if this is a viable option?

According to the reference manual for AnnotationForge, the makeOrgPackageFromNCBI() function makes an organism package from annotations available from NCBI, but according to the function arguments, an author and maintainer are required; I'm not sure exactly what these refer to.

Also, the function returns nothing; if this is the case, how can you access the created organism package?

Joseph

ADD REPLY • link 12.1 years ago Joseph Shaw ▴ 100


Hi Joseph,

You can check to see if it is a viable option by just giving it a shot. Note that the author and maintainer should in general be you, so you would replace my oh so very droll versions with your name and email. Also note that if you are on Windows, you need to add type = "source" to the call to install.packages().

> makeOrgPackageFromNCBI(version = "0.0.1", author = "me <me at mine.com>", maintainer = "me <me at mine.com>", outputDir = ".", tax_id = "192222", genus = "Campylobacter", species = "jejuni")
Loading required package: GO.db
Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
discarding data from other organisms
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
discarding data from other organisms
Populating gene2accession table:
table gene2accession filled
Getting data for gene2refseq.gz
discarding data from other organisms
Populating gene2refseq table:
table gene2refseq filled
Getting data for gene2unigene
discarding data from other organisms
Populating gene2unigene table:
table gene2unigene filled
Getting data for gene_info.gz
discarding data from other organisms
Populating gene_info table:
table gene_info filled
Getting data for gene2go.gz
discarding data from other organisms
Populating gene2go table:
Getting blast2GO data as a substitute for gene2go
table metadata filled
table map_metadata filled
table gene2go filled
table metadata filled
table map_metadata filled
Populating genes table:
genes table filled
Populating gene_info_temp table:
gene_info_temp table filled
Populating alias table:
alias table filled
Populating chromosomes table:
chromosomes table filled
Populating pubmed table:
pubmed table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating unigene table:
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Populating go_bp table:
go_bp table filled
Populating go_mf table:
go_mf table filled
Populating go_cc table:
go_cc table filled
Populating go_bp_all table:
go_bp_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_cc_all table:
go_cc_all table filled
dropping table gene2pubmed
dropping table gene2accession
dropping table gene2refseq
dropping table gene2unigene
dropping table gene_info
dropping table gene2go
Making GO views
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.gene_name NOT NULL
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g WHERE t._id=g._id AND t.chromosome NOT NULL
SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE t._id=g._id AND t.alias_symbol NOT NULL
table map_counts filled
Creating package in ./org.Cjejuni.eg.db
[1] TRUE
Warning message:
In .makeSimpleTable(ug, table = "unigene", con) :
  no values found for table unigene in this data chunk.

So that built the package, but now we need to install it:

> install.packages("org.Cjejuni.eg.db", repos = NULL)
* installing *source* package 'org.Cjejuni.eg.db' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (org.Cjejuni.eg.db)

> library(org.Cjejuni.eg.db)
> head(toTable(org.Cjejuni.egGO))
  gene_id      go_id Evidence Ontology
1  904332 GO:0006281      IEA       BP
2  904332 GO:0030420      IEA       BP
3  904333 GO:0006935      IEA       BP
4  904333 GO:0007165      IEA       BP
5  904334 GO:0006401      IEA       BP
6  904335 GO:0006549      IEA       BP

Best,

Jim
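For readers following along, the data frame returned by toTable() can be reshaped into a per-gene list of GO terms, which is often the form needed downstream. The following is a minimal sketch in base R and is not part of Jim's reply: the small `go_tab` data frame simply re-types the six rows printed in his output, and the helper name `gene_to_go()` is hypothetical.

```r
# Re-typed from the head(toTable(org.Cjejuni.egGO)) output in the thread.
go_tab <- data.frame(
  gene_id  = c("904332", "904332", "904333", "904333", "904334", "904335"),
  go_id    = c("GO:0006281", "GO:0030420", "GO:0006935",
               "GO:0007165", "GO:0006401", "GO:0006549"),
  Evidence = "IEA",
  Ontology = "BP",
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one ontology and collect unique GO IDs per gene.
gene_to_go <- function(tab, ontology = "BP") {
  tab <- tab[tab$Ontology == ontology, , drop = FALSE]
  lapply(split(tab$go_id, tab$gene_id), unique)
}

gene2go <- gene_to_go(go_tab)
print(gene2go[["904332"]])

# With the real package installed, the full table would be used instead
# (guarded so the sketch still runs without it).
if (requireNamespace("org.Cjejuni.eg.db", quietly = TRUE)) {
  library(org.Cjejuni.eg.db)
  gene2go_all <- gene_to_go(toTable(org.Cjejuni.egGO))
}
```

Note that the keys here are Entrez Gene IDs rather than gene names, so a list of gene names would still need to be translated to these IDs before lookup.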

ADD REPLY • link 12.1 years ago James W. MacDonald 68k


Hi Jim,

> You can check to see if it is a viable option by just giving it a shot.

I have attempted to call makeOrgPackageFromNCBI() as described in your previous mail (having provided my details for the author and maintainer arguments); however, the function call doesn't fully complete. In particular, the steps outlined below are completed, but it appears to make it no further.

> Loading required package: GO.db
>
> Getting data for gene2pubmed.gz
> Loading required package: RCurl
> Loading required package: bitops
> discarding data from other organisms
> Populating gene2pubmed table:
> table gene2pubmed filled
> Getting data for gene2accession.gz

I'm not sure if the function has failed or if it is still running. Could you tell me, approximately, how long the function should take to complete? For reference, I'm currently running OS X with a 1.8 GHz processor and 4 GB of memory.

Joseph

ADD REPLY • link 12.1 years ago Joseph Shaw ▴ 100


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Hi Joseph,

Please don't take conversations off-list.

On Friday, February 07, 2014 9:00:06 PM, Joseph Shaw wrote:
> Hi Jim,
>
> Thanks for all your assistance. I really appreciate it!
>
> Unfortunately, when I attempt to run
>
>> install.packages("org.Cjejuni_0.0.1.tar.gz", repos = NULL, type = "source")
>
> I get the error warning
>
>> Error : package 'AnnotationDbi' 1.24.0 was found, but >= 1.25.2 is required by 'org.Cjejuni.eg.db'
>
> I have since attempted to reinstall and update the AnnotationDbi package on my system to a compatible iteration, but the process results in the same error.

Hmm. Weird. I seem to have one iteration of a devel AnnotationDbi package in my release BioC install.

You could probably just untar and ungzip that file and then manually change the DESCRIPTION file to require AnnotationDbi >= 1.24.0 and then install using

install.packages("org.Cjejuni.eg.db", type = "source", repos = NULL)

> On a separate but related note, is it possible to restrict the list of gene annotations from org.Cjejuni.eg.db used in the GO analysis (i.e. the GSEAGOHyperGParams()* function) to simply include the probes used in the experiment (i.e. create two subsets: a gene universe and a collection of genes identified as differentially expressed)?
>
> (*The GSEAGOHyperGParams() function is used in the unsupported model organisms vignette, but the author simply uses the entire gene mapping as the gene universe and selects the first 500 genes as differentially expressed; ideally, I would like to include genes in the universe based on gene IDs, but this might not be the most efficient way.)

You are reading the wrong vignette. While this is technically an 'unsupported organism', since you have an org package, you can just use the regular infrastructure:

> univ <- Lkeys(org.Cjejuni.egACCNUM)
> gns <- univ[sample(1:1670, 100)] ## here I am just selecting genes at random
> p <- new("GOHyperGParams", geneIds = gns, universeGeneIds = univ, ontology = "BP", annotation = "org.Cjejuni.eg.db", conditional = TRUE)
> hyp <- hyperGTest(p)
> summary(hyp)
      GOBPID      Pvalue OddsRatio  ExpCount Count Size                  Term
1 GO:0012501 0.003677779       Inf 0.1221239     2    2 programmed cell death
2 GO:0016265 0.003677779       Inf 0.1221239     2    2                 death

I get an infinite odds ratio here because I randomly selected the only two apoptosis genes on the array. Yay for me!

Best,

Jim

> Relevant Vignette:
> http://www.bioconductor.org/packages/devel/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf
>
> Joseph
>
> On Fri, Feb 7, 2014 at 7:03 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> See attached.
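On the question of restricting the analysis to the probes used in the experiment, one possible approach is to build the universe and the selected set explicitly before constructing the parameter object. The sketch below is an illustration under stated assumptions rather than Jim's code: `array_ids`, `de_ids`, and the helper `make_gene_sets()` are hypothetical, the toy IDs are taken from the toTable() output earlier in the thread plus one made-up ID, and all IDs are assumed to be Entrez Gene IDs matching the keys of the org package.

```r
# Hypothetical helper: restrict the universe to genes that are both on the
# array and annotated, then keep only DE genes that fall inside that universe.
make_gene_sets <- function(array_ids, de_ids, annotated_ids) {
  universe <- intersect(unique(as.character(array_ids)),
                        as.character(annotated_ids))
  selected <- intersect(unique(as.character(de_ids)), universe)
  list(universe = universe, selected = selected)
}

# Toy IDs for illustration only.
annotated_ids <- c("904332", "904333", "904334", "904335")
array_ids     <- c("904332", "904333", "904335", "999999")  # 999999: made up, not annotated
de_ids        <- c("904333", "904334")                      # 904334: not on the toy array

sets <- make_gene_sets(array_ids, de_ids, annotated_ids)
print(sets)

# With GOstats and the org package installed, the sets would be passed on
# in the same way as in Jim's call (guarded so the sketch runs without them).
if (requireNamespace("GOstats", quietly = TRUE) &&
    requireNamespace("org.Cjejuni.eg.db", quietly = TRUE)) {
  library(GOstats)
  library(org.Cjejuni.eg.db)
  real <- make_gene_sets(array_ids     = array_ids,  # replace with real array IDs
                         de_ids        = de_ids,     # replace with real DE IDs
                         annotated_ids = Lkeys(org.Cjejuni.egGO))
  p <- new("GOHyperGParams", geneIds = real$selected,
           universeGeneIds = real$universe, ontology = "BP",
           annotation = "org.Cjejuni.eg.db", conditional = TRUE)
  # hyp <- hyperGTest(p); summary(hyp)
}
```

Intersecting with the annotated keys first keeps genes without any GO annotation out of the universe, so they cannot inflate the background in the hypergeometric test.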

ADD COMMENT • link 12.1 years ago James W. MacDonald 68k


Hi Joseph,

Here is a newer tarball, built with all release packages so that it should install properly for you without modification.

And Jim is right, you should be able to proceed from here now that you have an org package. Basically, we didn't used to have as many tools for making org packages, so not being supported used to be a much more serious problem than it is today.

Hope this helps,

Marc

A non-text attachment was scrubbed...
Name: org.Cjejuni.eg.db_0.1.tar.gz
Type: application/x-gzip
Size: 5874006 bytes
Desc: not available
URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20140210/823b90c3/attachment-0001.gz
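The install error reported earlier in the thread comes down to a version comparison, which can be checked before attempting an install. The following is a small sketch that is not taken from the thread: the helper `meets_requirement()` is hypothetical, and the tarball name is the one given in the attachment notice and is assumed to have been downloaded to the working directory.

```r
# Hypothetical helper: TRUE when the installed version satisfies ">= required".
meets_requirement <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

# The two versions from the error message in the thread.
print(meets_requirement("1.24.0", "1.25.2"))

tarball <- "org.Cjejuni.eg.db_0.1.tar.gz"  # assumed to be in the working directory

# Only attempt the install when the tarball is present and AnnotationDbi is
# available; report the installed version first so a mismatch is visible.
if (file.exists(tarball) && requireNamespace("AnnotationDbi", quietly = TRUE)) {
  message("AnnotationDbi version: ",
          as.character(packageVersion("AnnotationDbi")))
  install.packages(tarball, repos = NULL, type = "source")
}
```

Comparing with package_version() rather than as plain strings matters because "1.9.0" and "1.24.0" would sort incorrectly as text.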

ADD REPLY • link 12.1 years ago Marc Carlson ★ 7.2k


Hi all,

Thank you so much. Your assistance has been invaluable!

Joseph

ADD REPLY • link 12.1 years ago Joseph Shaw ▴ 100
