Question: goseq/nullp with non-native identifiers
Ravi Karra wrote (6.1 years ago):
Hello,

I am trying to use goseq to find enriched GO terms for zebrafish RNA-seq data and am looking for advice on manually providing gene length information and GO annotation to goseq. My RNA-seq data is mapped to danRer7 Ensembl gene IDs. Unfortunately, danRer7 does not appear to be supported by goseq's built-ins for Ensembl gene IDs.

> supportedGenomes()[68,]
   db      species   date      name                 AvailableGeneIDs
68 danRer7 Zebrafish Jul. 2010 Sanger Institute Zv9

> pwf = nullp(gene.vector, "danRer7", "ensGene")
Error in getlength(names(DEgenes), genome, id) :
  Length information for genome danRer7 and gene ID ensGene is not in the geneLenDataBase database. You will have to specify bias.data manually.

I would like to manually supply the gene length information by:

> zv9txs = makeTranscriptDbFromBiomart(biomart = "ensembl", dataset = "drerio_gene_ensembl")
> txsByGene = transcriptsBy(zv9txs, "gene")
> lengthData = median(width(txsByGene))

and GO data (using biomaRt):

> zv9 = useDataset("drerio_gene_ensembl", mart = useMart("ensembl"))
> GOmap = getBM(filters = "ensembl_gene_id", attributes = c("ensembl_gene_id", "go_id"), values = gene.universe, mart = zv9)

How can I input this GO data and gene length data into the nullp function of goseq to calculate a probability weighting function?

Thanks and sessionInfo() below,
Ravi

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] GenomicFeatures_1.8.3  AnnotationDbi_1.18.1   Biobase_2.16.0         GenomicRanges_1.8.13
 [5] IRanges_1.14.4         BiocGenerics_0.2.0     goseq_1.8.0            geneLenDataBase_0.99.9
 [9] BiasedUrn_1.04         biomaRt_2.12.0

loaded via a namespace (and not attached):
 [1] Biostrings_2.24.1  bitops_1.0-4.1     BSgenome_1.24.0    DBI_0.2-5          grid_2.15.1
 [6] hwriter_1.3        lattice_0.20-10    Matrix_1.0-6       mgcv_1.7-20        nlme_3.1-104
[11] RCurl_1.91-1       Rsamtools_1.8.6    RSQLite_0.11.1     rtracklayer_1.16.3 ShortRead_1.14.4
[16] stats4_2.15.1      tools_2.15.1       XML_3.9-4          zlibbioc_1.2.0

Alicia Oshlack wrote (6.1 years ago):
Hi Ravi,

You can use your own length data and GO categories by:

    pwf = nullp(gene.vector, bias.data = lengthData)
    go = goseq(pwf, gene2cat = GOmap)

Cheers,
Alicia
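
For the two calls in the answer to work on real data, the length vector has to line up element by element with gene.vector, and the GO table should not carry the empty go_id rows that biomaRt returns for unannotated genes. The following is a minimal sketch of that preparation in base R, not the package authors' recommended workflow. The gene IDs, lengths and GO rows are made-up toy values standing in for the objects in the thread, and run_goseq is a hypothetical helper name. Dropping genes that have no length is one conservative choice among several.

```r
# Toy stand-ins for the objects in the thread (all values are made up).
gene.vector <- c(ENSDARG01 = 1, ENSDARG02 = 0, ENSDARG03 = 1, ENSDARG05 = 0)
lengthData  <- c(ENSDARG03 = 2100, ENSDARG01 = 1500,
                 ENSDARG04 = 900,  ENSDARG02 = 3200)
GOmap <- data.frame(
  ensembl_gene_id = c("ENSDARG01", "ENSDARG01", "ENSDARG02", "ENSDARG03"),
  go_id           = c("GO:0008150", "GO:0003674", "", "GO:0008150"),
  stringsAsFactors = FALSE
)

# Reorder the lengths to match gene.vector; genes without a length become NA.
bias.data  <- lengthData[names(gene.vector)]
has.length <- !is.na(bias.data)

# Assumption: genes with no length information are simply dropped.
gene.vector <- gene.vector[has.length]
bias.data   <- bias.data[has.length]

# Remove rows with a missing or empty GO identifier.
GOmap <- GOmap[!is.na(GOmap$go_id) & GOmap$go_id != "", ]

# goseq also accepts a named list (gene -> vector of categories) as gene2cat.
gene2cat <- split(GOmap$go_id, GOmap$ensembl_gene_id)

# Hypothetical wrapper around the two calls from the answer.
# It needs the goseq package and a realistically sized gene set,
# so it is defined here but not run on the toy data.
run_goseq <- function(de, bias, g2c) {
  pwf <- goseq::nullp(de, bias.data = bias, plot.fit = FALSE)
  goseq::goseq(pwf, gene2cat = g2c)
}
```

With the real objects, the call would be run_goseq(gene.vector, bias.data, gene2cat). Passing the two-column GOmap data frame directly as gene2cat, as in the answer, should also work once the empty rows are removed.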