Can my problem be addressed with Bioconductor?
Guest User ★ 12k
@guest-user-4897
Dear All,

I am very new to Bioconductor and to Gene Ontology analyses, so please forgive me if my question is trivial.

My "universe" is a list of SNPs (not all of them) from the Affymetrix 6.0 SNP chip. After some population genetics analyses I defined a subset of particular interest to me (i.e. SNPs showing a signal of selection). I would like to analyze this subset of SNPs (or, better, the associated genes) to test for enrichment of Gene Ontology categories.

My first question is: are GOstats and topGO the right tools to perform this analysis on the kind of data I have (lists of genes as text files)?

And if yes... I started "playing around" with Bioconductor and got stuck on the association: I could not find a way to tell the program that I used the Affymetrix 6.0 SNP chip. Could you point me towards a link or document that walks through all the steps needed for this analysis?

Thanks a lot for your help,

Michela Leonardi

-- output of sessionInfo():

    R version 3.1.0 (2014-04-10)
    Platform: x86_64-apple-darwin13.1.0 (64-bit)

    locale:
    [1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    [1] tools_3.1.0

-- Sent via the guest posting facility at bioconductor.org.
@james-w-macdonald-5106
United States
Hi Michela,

On 6/17/2014 2:14 PM, Michela Leonardi [guest] wrote:
> My first question is: are GOstats and topGO the right tools to
> perform this analysis on the kind of data I have (lists of genes as
> text files)?
>
> And if yes... I started "playing around" with Bioconductor and I got
> stuck with the association: I could not find the way to tell to the
> program that I used the Affymetrix 6.0 SNPchip. Could you point me
> towards some link or document helping me going through all passages
> needed to do the analyses I need?

You are doing something unconventional, so you are unlikely to find anything that shows what to do.

But note that GOstats (at least) is based on Gene IDs, so you need to map your SNPs to their 'associated' genes, and then get the Gene IDs (what used to be known as Entrez Gene IDs).

Your universe will be the set of Gene IDs with which your universe of SNPs is associated. I have no idea how you are associating SNPs with genes, but the org.Hs.eg.db package is your friend. Say you have gene symbols (you shouldn't be relying on such things, but bear with me).

    symbols <- <some code to get symbols goes here>
    library(org.Hs.eg.db)
    # All Gene IDs in the annotation package. keys() is used here because
    # Lkeys() works on Bimap objects, not on the org.Hs.eg.db object itself.
    univ <- unique(keys(org.Hs.eg.db, keytype = "ENTREZID"))
    # Map symbols (treated as aliases) to Gene IDs
    egs <- select(org.Hs.eg.db, symbols, "ENTREZID", "ALIAS")

You may get a warning that you have one or more one-to-many mappings, which you may or may not decide to resolve.
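As a rough illustration of what resolving one-to-many mappings could look like, here is a minimal base R sketch. It uses a made-up data frame with the same two columns that select() returns; the aliases and IDs are invented, and dropping ambiguous aliases is only one possible policy, not something prescribed in this thread.

```r
# Toy stand-in for the data frame returned by select(); all values are invented.
egs <- data.frame(
  ALIAS    = c("GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  ENTREZID = c("101", "202", "203", NA),
  stringsAsFactors = FALSE
)

# Count distinct Gene IDs per alias; more than one means a one-to-many mapping.
n_ids <- tapply(egs$ENTREZID, egs$ALIAS,
                function(x) length(unique(na.omit(x))))
ambiguous <- names(n_ids)[n_ids > 1]

# One conservative option (an assumption, not the only choice):
# drop ambiguous aliases and aliases with no Gene ID at all.
keep <- !(egs$ALIAS %in% ambiguous) & !is.na(egs$ENTREZID)
gene_ids <- unique(egs$ENTREZID[keep])
```

Whether to drop ambiguous aliases, keep every mapped ID, or curate them by hand depends on how much you trust the symbols in the first place.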
Then you just do the 'usual':

    library(GOstats)
    p <- new("GOHyperGParams",
             geneIds = unique(as.character(egs$ENTREZID)),
             universeGeneIds = univ,
             ontology = "BP",
             annotation = "org.Hs.eg.db")
    hyp <- hyperGTest(p)

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

Dear Jim,

thanks a lot for your quick and very useful answer.

I finally used the following code:

    library(org.Hs.eg.db)
    library(GOstats)

    # All the genes in my dataset (SNP-to-gene conversion done with WebGestalt).
    # The file names are placeholders: one gene symbol per line.
    allGenes <- readLines("all_genes.txt")
    univ <- select(org.Hs.eg.db, allGenes, "ENTREZID", "ALIAS")

    # The "interesting" genes
    subSet <- readLines("interesting_genes.txt")
    set <- select(org.Hs.eg.db, subSet, "ENTREZID", "ALIAS")

    # universeGeneIds needs a vector of Gene IDs, not the whole data frame
    p <- new("GOHyperGParams",
             geneIds = unique(as.character(set$ENTREZID)),
             universeGeneIds = unique(as.character(univ$ENTREZID)),
             ontology = "BP",
             annotation = "org.Hs.eg.db")

since I want to test the interesting genes versus all the genes in my set, and not the genes in my set versus all the human genes.

Thanks a lot again,

Michela
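To see why the choice of universe matters, here is a minimal base R sketch of a hypergeometric over-representation p-value for a single GO term. It is not the GOstats implementation, only the same kind of test written out by hand; the helper name `enrich_p` and all counts are invented for illustration.

```r
# P(X >= hits) when `selected` genes are drawn from a universe of size
# `universe`, of which `in_term` are annotated to the GO term.
enrich_p <- function(hits, selected, in_term, universe) {
  phyper(hits - 1, in_term, universe - in_term, selected, lower.tail = FALSE)
}

# Invented counts: 8 of the 100 selected genes carry the term.
# Universe = genes reachable from the SNP set (40 of 2000 carry the term).
p_chip <- enrich_p(hits = 8, selected = 100, in_term = 40, universe = 2000)

# Universe = all annotated human genes (200 of 20000 carry the term).
p_genome <- enrich_p(hits = 8, selected = 100, in_term = 200, universe = 20000)

c(chip = p_chip, genome = p_genome)
```

With these made-up numbers the same 8 hits look more surprising against the larger background, which is the reason for restricting the universe to the genes that could actually have been selected.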