conversion of geneset species ID


Iain Gallagher ▴ 930

@iain-gallagher-2532


United Kingdom

Dear List

I wonder if someone could help me re-annotate the Broad c2 gene sets from human to bovine IDs. Here's what I have so far:

    rm(list=ls())
    library(biomaRt)
    library(GSEABase)

    setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')

    cowGenes <- read.table('cowGenesENID.csv', header=F, sep='\t')

    cow = useMart("ensembl", dataset="btaurus_gene_ensembl")
    orth = getBM(c("ensembl_gene_id","human_ensembl_gene"),
                 filters="ensembl_gene_id", values = cowGenes[,1], mart = cow)
    orth2 <- orth[which(orth[,2]!=''), ]  # drop those with no human ortholog

    orth3 <- orth2[-which(duplicated(orth2[,1]) == TRUE),]  # keep only unique mappings, i.e. one cow ID to one human ID

    head(orth3)

This gets me a data frame of bovine Ensembl gene IDs and their human orthologs (again Ensembl IDs).

    broadSets <- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt',
                        geneIdType = EntrezIdentifier('org.Hs.eg.db'))

    broadSetsENS <- mapIdentifiers(broadSets, ENSEMBLIdentifier())

I now have the c2 Broad gene sets with gene IDs converted to human Ensembl IDs. I would like to find the position of each of these Ensembl IDs in my data frame (orth3), substitute in the bovine ID, and then clean up any NAs.

I am at rather a loss as to how to do this and wondered if someone with more familiarity with GSEABase would be able to help (or perhaps suggest a different strategy!)?

Thanks

Iain

    > sessionInfo()
    R version 2.13.1 (2011-07-08)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
     [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
     [7] LC_PAPER=en_GB.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] GSEABase_1.14.0      graph_1.30.0         annotate_1.30.0
     [4] org.Hs.eg.db_2.5.0   org.Bt.eg.db_2.5.0   RSQLite_0.9-4
     [7] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2
    [10] biomaRt_2.8.1

    loaded via a namespace (and not attached):
    [1] RCurl_1.6-9  tools_2.13.1 XML_3.4-2    xtable_1.5-6
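The strategy described in the question (find each human ID's position in the ortholog table, substitute the bovine ID, then drop the NAs) could be sketched in base R with match(). This is only an illustration under stated assumptions: the IDs are hypothetical placeholders rather than real Ensembl records, and the column names `cowids` and `humids` are assumed. It also shows why `!duplicated(...)` is a safer filter than `-which(duplicated(...))`, which drops every row when there are no duplicates at all.

```r
# Hypothetical ortholog table; the IDs are placeholders, not real Ensembl records.
orth3 <- data.frame(
  cowids = c("cowA", "cowB", "cowC", "cowD"),
  humids = c("humA", "humB", "humC", "humD"),
  stringsAsFactors = FALSE
)

# Hypothetical human gene set; "humX" has no bovine ortholog in the table.
human_ids <- c("humC", "humX", "humA")

# Position of each human ID in the table (NA where there is no match).
# match() returns the first hit only, so this assumes one row per human ID.
pos <- match(human_ids, orth3$humids)

# Substitute the bovine ID found at each position, then drop the NAs.
cow_ids <- orth3$cowids[pos]
cow_ids <- cow_ids[!is.na(cow_ids)]
print(cow_ids)

# De-duplication pitfall: with no duplicates, which() returns integer(0),
# and negative indexing with an empty vector selects zero rows.
rows_minus_which <- nrow(orth3[-which(duplicated(orth3$cowids)), ])
rows_not_dup     <- nrow(orth3[!duplicated(orth3$cowids), ])
print(c(rows_minus_which, rows_not_dup))
```

Note that de-duplicating on the cow ID column leaves one row per cow gene, but the same human ID can still appear in several rows, so the mapping is not necessarily one-to-one in the human-to-cow direction.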

GSEABase GSEABase • 2.8k views

ADD COMMENT • link updated 14.4 years ago by Martin Morgan 25k • written 14.4 years ago by Iain Gallagher ▴ 930


Martin Morgan 25k

@martin-morgan-1513


United States

Hi Iain --

On 09/05/2011 07:57 AM, Iain Gallagher wrote:
> I now have the c2 Broad geneset with gene IDs converted to human ENSEMBL ids. I would like to map the position of each of the ENSEMBL Ids in my dataframe (orth3) and then substitute in the bovine id and then clean up any NA's.
>
> I am at rather a loss as to how to do this and wondered if someone with more familiarity with the GSEABase would be able to help (or perhaps suggest a different strategy!)?

Not sure that I follow entirely, but along the lines of

    lst = lapply(broadSetsENS, function(gs, map) {
        huids = geneIds(gs)
        ## map, not sure what the columns are?
        geneIds(gs) = map[map$huids %in% huids, "cowids"]
        geneIdType(gs), ENSEMBLIdentifier()
        gs
    }, ortho3)
    GeneSetCollection(lst)

This is a bit of a guess; I could be more specific if you provided a reproducible example.

Hope that helps,

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

ADD COMMENT • link 14.4 years ago Martin Morgan 25k


Dear Martin

Thanks for your suggestion. I cannot get your function to work for me:

    > lst = lapply(broadSetsENS, function(gs, map) {
    + huids = geneIds(gs)
    + ## map, not sure what the columns are?
    + geneIds(gs) = map[map$humids %in% humids, "cowids"]
    + geneIdType(gs), ENSEMBLIdentifier()
    Error: unexpected ',' in:
    " geneIds(gs) = map[map$humids %in% huids, "cowids"]
     geneIdType(gs),"
    > gs
    Error: object 'gs' not found

Below is a toy example of what I want to achieve (apologies for not including this before):

    # create a reproducible example for the GSEA problem
    library(biomaRt)
    library(GSEABase)

    # cow genes!
    cowGenes <- c('ENSBTAG00000003825', 'ENSBTAG00000015185', 'ENSBTAG00000001068',
                  'ENSBTAG00000017500', 'ENSBTAG00000012288', 'ENSBTAG00000031901',
                  'ENSBTAG00000006103', 'ENSBTAG00000003882', 'ENSBTAG00000026829',
                  'ENSBTAG00000037404')

    # set up mart
    cow = useMart("ensembl", dataset="btaurus_gene_ensembl")

    # get ortho genes
    orth = getBM(c("ensembl_gene_id","human_ensembl_gene"),
                 filters="ensembl_gene_id", values = cowGenes, mart = cow)

    # drop those with no human ortholog
    orth <- orth[which(orth[,2]!=''), ]
    colnames(orth) <- c('cowids', 'humids')

    # create a couple of gene sets from human genes
    set1 <- GeneSet(orth$humids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'set1')
    set2 <- GeneSet(orth$humids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'set2')
    gsc <- GeneSetCollection(set1, set2)

    # create a couple of gene sets from the same cow genes to illustrate the hoped-for outcome
    cowSet1 <- GeneSet(orth$cowids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset1')
    cowSet2 <- GeneSet(orth$cowids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset2')
    cowGsc <- GeneSetCollection(cowSet1, cowSet2)

Basically I'd like to go from gsc to cowGsc using the data frame that maps the orthologs.

I'll continue playing with this but any further guidance would be appreciated.

Best

Iain

ADD REPLY • link 14.4 years ago Iain Gallagher ▴ 930


On 09/06/2011 03:18 AM, Iain Gallagher wrote:
> Thanks for your suggestion. I cannot get your function to work for me:
>
>> lst = lapply(broadSetsENS, function(gs, map) {
> + huids = geneIds(gs)
> + ## map, not sure what the columns are?
> + geneIds(gs) = map[map$humids %in% humids, "cowids"]
> + geneIdType(gs), ENSEMBLIdentifier()

That was a typo, but for your toy example, taking the same approach:

    lst = lapply(gsc, function(gs, map) {
        ## keep the cow IDs whose human ortholog is in this gene set
        geneIds(gs) = with(map, cowids[humids %in% geneIds(gs)])
        gs
    }, orth)
    cowGsc = GeneSetCollection(lst)

Martin
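The subsetting idea in the reply, `cowids[humids %in% geneIds(gs)]`, can be tried without GSEABase or a biomaRt query. The sketch below is an illustration only: a plain named list stands in for the GeneSetCollection, the IDs are hypothetical placeholders, and the unique() call is an added guard, not part of the original answer.

```r
# Hypothetical ortholog map with placeholder IDs;
# human gene "humB" has two bovine orthologs.
orth <- data.frame(
  cowids = c("cowA", "cowB1", "cowB2", "cowC"),
  humids = c("humA", "humB", "humB", "humC"),
  stringsAsFactors = FALSE
)

# A plain named list standing in for a GeneSetCollection of human IDs.
# "humZ" has no entry in the ortholog map.
human_sets <- list(set1 = c("humB", "humA"), set2 = c("humC", "humZ"))

# Keep the cow IDs whose human ortholog is a member of each set.
# unique() guards against a cow ID being returned more than once when
# several of its human orthologs fall in the same set.
cow_sets <- lapply(human_sets, function(ids, map) {
  unique(with(map, cowids[humids %in% ids]))
}, orth)

print(cow_sets)
```

Two behaviours are worth noting. The result follows the row order of the ortholog table rather than the order of IDs in the gene set, and human IDs without an ortholog simply disappear, so a set can shrink (or grow, with one-to-many orthologs) after conversion.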

ADD REPLY • link 14.4 years ago Martin Morgan 25k


Thank you Martin. This is just what I wanted.

Best

Iain

ADD REPLY • link 14.4 years ago Iain Gallagher ▴ 930
