Getting GO IDs for gene names in Plasmodium falciparum

Guest User ★ 13k

@guest-user-4897

Last seen 10.1 years ago

I have a list of Plasmodium falciparum gene names obtained from the PlasmoDB website. I am trying to get the associated GO IDs in order to bin the genes into housekeeping versus non-housekeeping genes, and also in terms of function and process.

I have installed org.Pf.plasmo.db using biocLite and tried this example:

    x <- org.Pf.plasmoGO
    # Get the ORF identifiers that are mapped to a GO ID
    mapped_genes <- mappedkeys(x)
    # Convert to a list
    xx <- as.list(x[mapped_genes])
    if(length(xx) > 0) {
      # Try the first one
      got <- xx[[1]]
      got[[1]][["GOID"]]
      got[[1]][["Ontology"]]
      got[[1]][["Evidence"]]
    }

It doesn't give me a way to create a column and enter my own gene names; it appears to work on a premapped set of gene names. As a result, I decided to use the example to get all mappings in the list xx.

Unfortunately, I am unable to iterate through the list and turn it into a data frame so that I can divide up the data meaningfully.

Secondly, is there a way to query a database directly from R to get the associated GO ID, where the input is a gene name?

Sorry to sound confused. I am pretty new to R and Bioconductor.

-- output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] org.Pf.plasmo.db_2.9.0 BiocInstaller_1.10.3 GO.db_2.9.0 hgu95av2.db_2.9.0 org.Hs.eg.db_2.9.0
[6] RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.1 BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] digest_0.6.3 grid_3.0.1 gtable_0.1.2 IRanges_1.18.4 plyr_1.8 proto_0.3-10
[7] RColorBrewer_1.0-5 reshape2_1.2.2 stats4_3.0.1 stringr_0.6.2 tools_3.0.1

--
Sent via the guest posting facility at bioconductor.org.

GO Plasmodium falciparum PROcess convert • 2.3k views

updated 11.0 years ago by Martin Morgan 25k • written 11.0 years ago by Guest User ★ 13k

Martin Morgan 25k

@martin-morgan-1513

Last seen 10 weeks ago

United States

On 10/08/2013 01:09 AM, Maintainer wrote:

> I have a list of genenames - plasmodium falciparum gotten from the plasmodb website.
>
> I am trying to get the associated GO:IDs in order to bin the genes into housekeeping versus non-housekeeping genes.
> And also in terms of functional and process.
>
> I have installed the org.Pf.plasmo.db using biocLite.

I'm guessing you have keys like

    > ids <- head(keys(org.Pf.plasmo.db, "SYMBOL"))
    > ids
    [1] "PF3D7_0100100" "PF3D7_0100200" "PF3D7_0100300" "PF3D7_0100400"
    [5] "PF3D7_0100500" "PF3D7_0100600"

and what you want to do is create your own vector 'ids' and then

    select(org.Pf.plasmo.db, ids, "GO", keytype="SYMBOL")

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
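For a list like xx that has already been built with as.list(), one possible way to get a data frame is to flatten it by hand. The following is only a minimal base R sketch: it uses a small mock list with the same GOID / Evidence / Ontology fields as the example in the question (the annotations are made up for illustration), and the names per_gene and go_table are hypothetical. With the real package, select() should return a data frame directly, so this step would not be needed.

```r
# Mock of the nested list produced by as.list(x[mapped_genes]).
# Assumption: each gene holds one sub-list per GO ID with the fields
# GOID, Evidence and Ontology. The annotations below are invented.
xx <- list(
  PF3D7_0100100 = list(
    "GO:0016020" = list(GOID = "GO:0016020", Evidence = "IDA", Ontology = "CC"),
    "GO:0020002" = list(GOID = "GO:0020002", Evidence = "IEA", Ontology = "CC")
  ),
  PF3D7_0100200 = list(
    "GO:0016020" = list(GOID = "GO:0016020", Evidence = "IEA", Ontology = "CC")
  )
)

# Build one small data frame per gene; the gene name is recycled
# across that gene's annotations.
per_gene <- lapply(names(xx), function(gene) {
  annots <- xx[[gene]]
  field <- function(name) {
    vapply(annots, function(a) a[[name]], character(1), USE.NAMES = FALSE)
  }
  data.frame(
    SYMBOL   = gene,
    GO       = field("GOID"),
    EVIDENCE = field("Evidence"),
    ONTOLOGY = field("Ontology"),
    stringsAsFactors = FALSE
  )
})

# Stack the per-gene pieces into a single table.
go_table <- do.call(rbind, per_gene)
rownames(go_table) <- NULL
print(go_table)
```

The result has one row per gene and GO ID combination, which is easier to subset or tabulate than the nested list.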

11.0 years ago • Martin Morgan 25k

Thanks for the tip. It seems to be working.

I now have the nested list of GO IDs for Plasmodium falciparum. I am now looking at summarizing them and have two questions.

Is there a GO slim database for Plasmodium falciparum? There doesn't appear to be one.

There are multiple evidence levels for each GO ID for each gene, and at this point it is difficult to divide them. I am trying to get the CC part of the GO database and bin the genes based on location, and I am thinking the GO slim will reduce the granularity?

Thank you for your help. Again, sorry for the confusion. It's a steep learning curve for me, as I have only been looking at bioinformatics in general for the past month.

Ipsita
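One possible way to deal with several evidence codes for the same gene and GO ID is to collapse them into a single row per pair. The sketch below assumes a data frame with SYMBOL, GO, EVIDENCE and ONTOLOGY columns, which is what select() with the "GO" column is expected to return; the rows are mock data, and collapsed is a hypothetical name.

```r
# Mock annotation table. Assumption: columns named as in the select()
# result for "GO"; the values are invented for illustration.
mapped <- data.frame(
  SYMBOL   = c("PF3D7_0100100", "PF3D7_0100100", "PF3D7_0100200"),
  GO       = c("GO:0016020", "GO:0016020", "GO:0016020"),
  EVIDENCE = c("IEA", "IDA", "IEA"),
  ONTOLOGY = c("CC", "CC", "CC"),
  stringsAsFactors = FALSE
)

# One row per gene / GO ID / ontology, with all distinct evidence codes
# joined into a single ";"-separated string.
collapsed <- aggregate(
  EVIDENCE ~ SYMBOL + GO + ONTOLOGY,
  data = mapped,
  FUN  = function(ev) paste(sort(unique(ev)), collapse = ";")
)
print(collapsed)
```

Keeping the codes as a joined string preserves the information while leaving one row per gene and term; filtering on specific codes is better done before collapsing.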

11.0 years ago • Ipsita Sinha ▴ 10

On 10/08/2013 08:21 PM, Ipsita Sinha wrote:

> Is there a go slim database for plasmodium falciparum? Doesn't appear to have one.
> There are multiple evidence levels for each GoID for each gene, and at this
> point it is difficult to divide them. I am trying to get the CC part of the Go
> database and bin the genes based on location and I am thinking the GO slim will
> reduce the granularity?

I'm not sure which evidence codes would be useful (they are enumerated here: http://www.geneontology.org/GO.evidence.shtml). In general, after having mapped your ids to GO

    mapped = select(org.Pf.plasmo.db, ids, "GO", keytype="SYMBOL")

you can subset the map to keep the things you're interested in with something like

    codesILike = c("EXP", "IDA", "IPI", "IMP", "IGI", "IEP")
    good = subset(mapped, (ONTOLOGY %in% "CC") & (EVIDENCE %in% codesILike))

I'm not sure what your 'nested list' looks like or what you're aiming for, but

    with(good, split(SYMBOL, GO))

would give you a list of GO ids, each with its corresponding SYMBOLs.

I don't have any special insight into P. falciparum GO slims.

Martin
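To see what the subset and split steps produce without the annotation package installed, here is a minimal sketch on mock data. The column names follow the code in this reply; the rows are invented, and genes_by_go and genes_per_term are hypothetical names.

```r
# Mock stand-in for the data frame returned by select(); values are
# invented for illustration only.
mapped <- data.frame(
  SYMBOL   = c("PF3D7_0100100", "PF3D7_0100100", "PF3D7_0100200", "PF3D7_0100300"),
  GO       = c("GO:0016020", "GO:0016020", "GO:0016020", "GO:0005634"),
  EVIDENCE = c("IDA", "IEA", "IDA", "IEA"),
  ONTOLOGY = c("CC", "CC", "CC", "CC"),
  stringsAsFactors = FALSE
)

# Keep cellular component terms backed by experimental evidence codes.
codesILike <- c("EXP", "IDA", "IPI", "IMP", "IGI", "IEP")
good <- subset(mapped, (ONTOLOGY %in% "CC") & (EVIDENCE %in% codesILike))

# Drop repeated gene / term pairs so each gene is counted once per term.
good <- unique(good[, c("SYMBOL", "GO")])

# List of GO IDs, each holding the genes annotated to it.
genes_by_go <- with(good, split(SYMBOL, GO))

# Number of genes per GO term, a simple summary for binning.
genes_per_term <- lengths(genes_by_go)
print(genes_per_term)
```

In this mock table the IEA-only annotations are filtered out, so genes whose only support is electronic inference do not appear in the result; whether that is desirable depends on how much experimental annotation the organism has.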

11.0 years ago • Martin Morgan 25k
