conversion of geneset species ID
@iain-gallagher-2532
Last seen 9.3 years ago
United Kingdom
Dear List

I wonder if someone could help me re-annotate the Broad c2 genesets from human to bovine IDs. Here's what I have so far:

    rm(list=ls())
    library(biomaRt)
    library(GSEABase)

    setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')

    cowGenes <- read.table('cowGenesENID.csv', header=F, sep='\t')

    cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
    orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id", values = cowGenes[,1], mart = cow)

    # drop those with no human ortholog
    orth2 <- orth[which(orth[,2]!=''), ]

    # get only unique mappings, i.e. one cow ID to one human ID
    # (logical indexing also works when there are no duplicates)
    orth3 <- orth2[!duplicated(orth2[,1]), ]

    head(orth3)

This gets me a data frame of bovine ENSEMBL gene IDs and the human ortholog of each (again an ENSEMBL ID).

    broadSets <- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt', geneIdType = EntrezIdentifier('org.Hs.eg.db'))

    broadSetsENS <- mapIdentifiers(broadSets, ENSEMBLIdentifier())

I now have the c2 Broad genesets with gene IDs converted to human ENSEMBL IDs. I would like to find the position of each of these ENSEMBL IDs in my data frame (orth3), substitute in the bovine ID, and then clean up any NAs.

I am at rather a loss as to how to do this and wondered if someone with more familiarity with GSEABase would be able to help (or perhaps suggest a different strategy!)?

Thanks

Iain

    > sessionInfo()
    R version 2.13.1 (2011-07-08)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
     [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
     [7] LC_PAPER=en_GB.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] GSEABase_1.14.0      graph_1.30.0         annotate_1.30.0
     [4] org.Hs.eg.db_2.5.0   org.Bt.eg.db_2.5.0   RSQLite_0.9-4
     [7] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2
    [10] biomaRt_2.8.1

    loaded via a namespace (and not attached):
    [1] RCurl_1.6-9  tools_2.13.1 XML_3.4-2    xtable_1.5-6
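The two filtering steps in the question (drop cow genes with no human ortholog, then keep one row per cow ID) can be tried out on a small made-up table. This is only a sketch in base R: `orth_toy` is a hypothetical stand-in for the `getBM()` result, and the IDs are invented placeholders rather than real Ensembl identifiers.

```r
# Toy stand-in for the getBM() result; all IDs are made up for illustration.
orth_toy <- data.frame(
  cowids = c("COW1", "COW2", "COW2", "COW3", "COW4"),
  humids = c("HUM1", "HUM2", "HUM3", "",     "HUM4"),
  stringsAsFactors = FALSE
)

# Step 1: drop cow genes that have no human ortholog (empty string).
has_ortho <- orth_toy[orth_toy$humids != "", ]

# Step 2: keep only the first row seen for each cow ID.
one_per_cow <- has_ortho[!duplicated(has_ortho$cowids), ]

print(one_per_cow)
```

Keeping the first row per cow ID is an arbitrary choice when a cow gene has several human orthologs, so it may be worth checking how many rows the second step discards before relying on the result.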
@martin-morgan-1513
Last seen 3 months ago
United States
Hi Iain --

On 09/05/2011 07:57 AM, Iain Gallagher wrote:
> I am at rather a loss as to how to do this and wondered if someone with more familiarity with the GSEABase would be able to help (or perhaps suggest a different strategy!)?

Not sure that I follow entirely, but along the lines of

    lst = lapply(broadSetsENS, function(gs, map) {
        huids = geneIds(gs)
        ## map, not sure what the columns are?
        geneIds(gs) = map[map$huids %in% huids, "cowids"]
        geneIdType(gs), ENSEMBLIdentifier()
        gs
    }, ortho3)
    GeneSetCollection(lst)

This is a bit of a guess, could be more specific if you provided a reproducible example.

Hope that helps,

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
Dear Martin

Thanks for your suggestion. I cannot get your function to work for me:

    > lst = lapply(broadSetsENS, function(gs, map) {
    + huids = geneIds(gs)
    + ## map, not sure what the columns are?
    + geneIds(gs) = map[map$humids %in% humids, "cowids"]
    + geneIdType(gs), ENSEMBLIdentifier()
    Error: unexpected ',' in:
    " geneIds(gs) = map[map$humids %in% huids, "cowids"]
    geneIdType(gs),"
    > gs
    Error: object 'gs' not found

Below is a toy example of what I want to achieve (apologies for not including this before):

    # create a reproducible example for the GSEA problem
    library(biomaRt)
    library(GSEABase)

    # cow genes!
    cowGenes <- c('ENSBTAG00000003825', 'ENSBTAG00000015185', 'ENSBTAG00000001068',
                  'ENSBTAG00000017500', 'ENSBTAG00000012288', 'ENSBTAG00000031901',
                  'ENSBTAG00000006103', 'ENSBTAG00000003882', 'ENSBTAG00000026829',
                  'ENSBTAG00000037404')

    # set up mart
    cow = useMart("ensembl",dataset="btaurus_gene_ensembl")

    # get ortho genes
    orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id", values = cowGenes, mart = cow)

    # drop those with no human ortho
    orth <- orth[which(orth[,2]!=''), ]
    colnames(orth) <- c('cowids', 'humids')

    # create a couple of genesets from human genes
    set1 <- GeneSet(orth$humids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'set1')
    set2 <- GeneSet(orth$humids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'set2')
    gsc <- GeneSetCollection(set1, set2)

    # create a couple of genesets from the same cow genes to illustrate hopeful outcome
    cowSet1 <- GeneSet(orth$cowids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset1')
    cowSet2 <- GeneSet(orth$cowids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset2')
    cowGsc <- GeneSetCollection(cowSet1, cowSet2)

Basically I'd like to go from gsc to cowGsc using the data frame that maps the orthologs.

I'll continue playing with this but any further guidance would be appreciated.

Best

Iain
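The strategy described in the original question (find the position of each human ID in the ortholog table, substitute the cow ID, then drop the NAs) can be sketched in base R with `match()`. Plain character vectors stand in for gene sets here; `map_toy`, `human_set`, and the IDs are made up for illustration, and nothing below uses GSEABase.

```r
# Toy ortholog table; all IDs are invented placeholders.
map_toy <- data.frame(
  cowids = c("COW1", "COW2", "COW3"),
  humids = c("HUM1", "HUM2", "HUM3"),
  stringsAsFactors = FALSE
)

# A human "gene set"; HUM9 has no cow ortholog in map_toy.
human_set <- c("HUM3", "HUM9", "HUM1")

# Position of each human ID in the table (NA where there is no match).
pos <- match(human_set, map_toy$humids)

# Substitute the cow ID at each position, then drop the unmatched entries.
cow_set <- map_toy$cowids[pos]
cow_set <- cow_set[!is.na(cow_set)]

print(cow_set)
```

Because `match()` returns only the first matching row, this variant keeps the order of the original set but picks a single cow ID for any human gene that appears more than once in the table.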
On 09/06/2011 03:18 AM, Iain Gallagher wrote:
> Dear Martin
>
> Thanks for your suggestion. I cannot get your function to work for me:
>
>> lst = lapply(broadSetsENS, function(gs, map) {
> + huids = geneIds(gs)
> + ## map, not sure what the columns are?
> + geneIds(gs) = map[map$humids %in% humids, "cowids"]
> + geneIdType(gs), ENSEMBLIdentifier()

That line was a typo on my part. For your example below, taking the same approach:

    lst = lapply(gsc, function(gs, map) {
        ## replace the human IDs with the cow IDs of their orthologs
        geneIds(gs) = with(map, cowids[humids %in% geneIds(gs)])
        gs
    }, orth)
    cowGsc = GeneSetCollection(lst)

Martin
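To see what the `%in%` lookup in the reply above does without a GSEABase installation, here is a minimal base-R sketch of the same pattern. A named list of character vectors stands in for the gene set collection; `map_toy`, `human_sets`, and the IDs are hypothetical and chosen so that one human gene has two cow orthologs and one has none.

```r
# Toy ortholog table; HUM2 deliberately maps to two cow IDs.
map_toy <- data.frame(
  cowids = c("COW1", "COW2", "COW3", "COW4"),
  humids = c("HUM1", "HUM2", "HUM2", "HUM3"),
  stringsAsFactors = FALSE
)

# Named list standing in for a collection of human gene sets.
human_sets <- list(
  set1 = c("HUM2", "HUM1"),
  set2 = c("HUM3", "HUM9")  # HUM9 is absent from map_toy
)

# For each set, keep the cow IDs whose human ortholog is in the set.
cow_sets <- lapply(human_sets, function(ids, map) {
  with(map, cowids[humids %in% ids])
}, map_toy)

print(cow_sets)
```

In this sketch the cow IDs come back in the row order of the table rather than the order of the original set, every cow ortholog of a human gene is kept, and human IDs with no ortholog simply drop out, so no separate NA clean-up is needed.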
Thank you Martin. This is just what I wanted.

Best

Iain