help with biomart
Tereza Roca
I found something wrong with biomart. If I request an Illumina ID from an Ensembl gene ID, I obtain the following:

> getBM(attributes = c("ensembl_gene_id","illumina_humanwg_6_v1"), filters="ensembl_gene_id", values = "ENSG00000165891", mart = ensembl)
  ensembl_gene_id illumina_humanwg_6_v1
1 ENSG00000165891                    NA
2 ENSG00000165891               2350152

This is fine (although why is there an NA?).

But if I request the reverse (from Illumina ID to gene ID), I don't obtain anything:

> getBM(attributes = c("illumina_humanwg_6_v1","ensembl_gene_id"), filters="illumina_humanwg_6_v1", values = c("2350152"), mart = ensembl)
[1] illumina_humanwg_6_v1 ensembl_gene_id
<0 rows> (or 0-length row.names)

Is this an error, or am I making a mistake in the way I request it? Please advise.

Thank you,

Tereza
Hi Tereza,

Well, you aren't doing the correct query, but I don't know if I would call it a mistake (or a weird 'feature' of how Illumina IDs are coded in the Biomart database). I figured this out by doing your first query at the Biomart server, which returned 0002350152 for the Illumina ID. Filtering on that zero-padded ID returns the gene:

> getBM(attributes = c("illumina_humanwg_6_v1","ensembl_gene_id"), filters="illumina_humanwg_6_v1", "0002350152", mart)
  illumina_humanwg_6_v1 ensembl_gene_id
1               2350152 ENSG00000165891

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Thank you. This is bizarre... but now it works. Shouldn't this be fixed so that the output of a query agrees with the input you have to provide for the query?

________________________________
From: James W. MacDonald <jmacdon@med.umich.edu>
Cc: bioconductor@stat.math.ethz.ch
Sent: Wed, 2 December, 2009 14:41:50
Subject: Re: [BioC] help with biomart

> I figured this out by doing your first query at the Biomart server,
> which returned 0002350152 for the Illumina ID.
Tereza Roca wrote:
> thank you. this is bizarre... but now it works...
> shouldn't this be fixed so that output of your query agrees with the
> input you have to provide for the query?

Theoretically, yes. However, it would be more difficult than you would think. Note that R drops leading zeros when it parses a number:

> 0002350152
[1] 2350152

So when the Biomart server returns the Illumina ID and it is read as a number, R drops the leading zeros. One could use sprintf() to restore the leading zeros, but you have to know a priori the width of the field:

> sprintf("%010d", 0002350152)
[1] "0002350152"

There may be a trick to check the number of digits for a number prior to R dropping the leading zeros, but if there is one, I don't know what it would be. If not, you would have to hard code the above sprintf() call to use 10 digits, assuming that all Illumina IDs have 10 digits. But what if not all Illumina IDs are 10 digits? What if they change to 7 digits in a future chip?

As you can imagine, hard coding things like this can lead to a nightmare for the maintainer. So although this isn't an ideal situation, I can't imagine things will change.

Best,

Jim
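For readers who want to apply the padding workaround to a whole column of returned IDs, here is a minimal sketch in base R. The helper name pad_id and the default width of 10 are assumptions made for illustration (the width matches the single ID shown in this thread, nothing more), and the NA handling reflects the NA row seen in the original query output.

```r
# Illustrative values: the ID comes from the thread, the NA mimics the
# unmapped row in the original result.
ids <- c(2350152, NA)

# pad_id is a hypothetical helper, not part of biomaRt.
# It zero-pads whole numbers to a fixed width and leaves NA untouched.
# The width of 10 is an assumption, not a documented property of the IDs.
pad_id <- function(x, width = 10) {
  out <- rep(NA_character_, length(x))
  ok <- !is.na(x)
  out[ok] <- formatC(x[ok], width = width, format = "d", flag = "0")
  out
}

padded <- pad_id(ids)
print(padded)
```

The non-NA elements of padded could then be passed as the values argument of the reverse getBM() query. As Jim notes, this only works while the assumed width is right.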
I would expect identifiers to be treated as strings, not integers (even if an identifier happens to look like an integer most of the time, that is not guaranteed to be true forever), no? We don't do any math on the identifiers, so why cast them to integers?

-Aaron

On Wed, Dec 2, 2009 at 10:38 AM, James W. MacDonald <jmacdon@med.umich.edu> wrote:
> So when the Biomart server returns the Illumina ID (as a number), R
> automatically truncates the leading zeros.
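The difference between the two treatments can be seen with base R alone. The sketch below assumes a tab-separated result shaped like the table in this thread; the text in txt is made up for illustration, and nothing here claims to show how biomaRt itself parses server responses.

```r
# A made-up tab-separated result with the zero-padded ID as sent by a server.
txt <- "ensembl_gene_id\tillumina_humanwg_6_v1\nENSG00000165891\t0002350152\n"

# Default type guessing: the ID column is converted to a number,
# so the leading zeros are lost.
as_guessed <- read.delim(text = txt, stringsAsFactors = FALSE)

# All columns forced to character: the ID is kept exactly as written.
as_text <- read.delim(text = txt, colClasses = "character")

print(as_guessed$illumina_humanwg_6_v1)
print(as_text$illumina_humanwg_6_v1)
```

With the character version, the returned ID can be fed straight back into a filter without knowing the field width in advance.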
Aaron Mackey wrote:
> I would expect that identifiers be treated as strings, not integers
> (even if an identifier happened to look like an integer most of the time
> -- it's not guaranteed to be true forever), no? We don't do any math on
> the identifiers, so why cast them to integers?

I would think this is another maintenance issue, although I don't presume to speak for Steffen on why things work as they do. But think for a moment about what you are asking.

The getBM() function is an all-purpose query device intended to get stuff from a database. Some of this stuff is numeric, and some of it is character. Some of the stuff that might be better treated as character looks like it is integer.

There are multiple different Biomart servers that can be queried using getBM():

> listMarts()
    biomart
1   ensembl
2   snp
3   functional_genomics
4   vega
5   msd
6   bacterial_mart_3
7   fungal_mart_3
8   metazoa_mart_3
9   plant_mart_3
10  protist_mart_3
11  htgt
12  REACTOME
13  wormbase_current
14  dicty
15  rgd__mart
16  ipi_rat__mart
17  SSLP__mart
18  g4public
19  pride
20  intermart-1
21  uniprot_mart
22  ensembl_expressionmart_48
23  biomartDB
24  Eurexpress Biomart
25  pepseekerGOLD_mart06
26  Potato_01
27  Sweetpotato_01
28  Pancreatic_Expression
29  ENSEMBL_MART_ENSEMBL
30  GRAMENE_MARKER_30
31  GRAMENE_MAP_30
32  QTL_MART

not to mention the archived Biomart servers that can be queried.

So what you are asking is for Steffen to go through all those databases, look for all the data that appear to be integer but should be kept as character, and then set up some functionality to make sure this happens.

In addition, he would then need to maintain this over time as different data are added to the various Biomart servers that can be queried (and as new Biomart servers come on line).

Sounds like fun to me!

Best,

Jim
Agreed. I didn't realize that all these fields were being auto-discovered. Maybe we should instead be able to instruct getBM() about field types, just as we already do with scan() and friends.

-Aaron

On Thu, Dec 3, 2009 at 9:29 AM, James W. MacDonald <jmacdon@med.umich.edu> wrote:
> So what you are asking is for Steffen to go through all those databases,
> look for all the data that appear to be integer, but should be kept as
> character, and then set up some functionality to make sure this happens.
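As a reminder of what "instructing about field types" looks like with scan(), here is a small base R sketch. The input string and the field names gene and probe are made up for illustration; whether getBM() could accept a comparable argument is only the suggestion made above, not an existing feature shown here.

```r
# One made-up record: a gene ID followed by a zero-padded probe ID.
txt <- "ENSG00000165891 0002350152"

# With an explicit 'what' template, scan() returns both fields as character.
fields <- scan(text = txt,
               what = list(gene = character(), probe = character()),
               quiet = TRUE)

# Without a template, scan() expects numbers and the zeros disappear.
as_number <- scan(text = "0002350152", quiet = TRUE)

print(fields$probe)
print(as_number)
```

The caller, who knows which fields are identifiers, supplies the types, so nothing has to be hard coded per database.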
Hi Tereza,

The Illumina ID that is returned to you via the Bioconductor package has had three zeros stripped from the front of the ID. If you carry out this search on the Ensembl mart interface you get:

Ensembl Gene ID    Illumina HumanWG 6 v1
ENSG00000165891    0002350152

If I then filter on ID 0002350152, I get the Ensembl gene ID in my result set. Can you re-try your query using the ID with the zeros and see if you get the correct result?

Regards,

Rhoda

On 2 Dec 2009, at 14:15, Tereza Roca wrote:
> is this an error? or am I making some mistakes in the way I request
> it? Please advice

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK.