biomaRt: using a list as values. confused...
@jdelasherasedacuk-1189
Last seen 8.7 years ago
United Kingdom
I'm trying to obtain information about genes within a number of regions, each defined by a chromosome name plus start and end coordinates.

I understand that the way to specify multiple filters to be used together (a set of chr+start+end) is to use a list for 'values'.

This seems to work OK when I have more than one region (I tested it using two regions first, before doing the proper search for >1000), but if I specify just one region, it does not work... and I'm wondering how I would do it in that case.

Example:

    library("biomaRt")
    ensembl = useMart("ENSEMBL_MART_ENSEMBL",
                      dataset="hsapiens_gene_ensembl",
                      host="www.ensembl.org")

    chrom <- c("1", "2")
    chr.start <- c(11401198, 86460656)
    chr.stop <- c(11694590, 86663869)

    attributes <- c("hgnc_symbol", "entrezgene", "chromosome_name",
                    "start_position", "end_position", "strand", "band")

    # extract both regions at once:
    getBM(attributes=attributes,
          filters=c("chromosome_name","start","end"),
          values=list(chrom,chr.start,chr.stop), mart=ensembl)
    # this works, returning 1939 rows of data, the first 1198 with chr1
    # corresponding to the first region, and the rest with chr2 to the second.
    # Good.

    # but how does one retrieve the data for just ONE region?
    # try this:
    getBM(attributes=attributes,
          filters=c("chromosome_name","start","end"),
          values=list(chrom[1],chr.start[1],chr.stop[1]), mart=ensembl)
    # it only returns one gene!!! (in two rows)

So, when I just want to do a single search with multiple filters, how would I specify the values?

Jose

--
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6507090
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
@steffen-durinck-4465
Last seen 9.6 years ago
Hi Jose,

The combination of the chromosome_name, start and end filters is a special case: it is interpreted as "give me everything in between". It is probably not well documented, but this filter combination works only for a single region at a time. So your second example is correct: there are only a few genes in your region on chr1.

An alternative that does work for multiple regions is the chromosomal_region filter, like this:

    regions <- c("1:11401198:11694590", "2:86460656:86663869")
    attributes <- c("hgnc_symbol", "entrezgene", "chromosome_name",
                    "start_position", "end_position", "strand", "band")
    getBM(attributes=attributes, filters="chromosomal_region",
          values=regions, mart=ensembl)

Cheers,
Steffen

On Thu, Jun 9, 2011 at 8:49 AM, <j.delasheras at ed.ac.uk> wrote:
> so, when I just want to do a single search with multiple filters, how would
> I specify the values?
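For a long list of regions (such as the >1000 mentioned in the question), the strings for the chromosomal_region filter can be built from the three vectors rather than typed by hand. The following is a minimal sketch, assuming the "chr:start:end" format shown in the reply above; make_regions is a hypothetical helper name and is not part of biomaRt.

```r
# Hypothetical helper (not part of biomaRt): build "chr:start:end" strings
# from parallel vectors of chromosome names and coordinates.
make_regions <- function(chrom, start, end) {
  stopifnot(length(chrom) == length(start), length(start) == length(end))
  # "%.0f" keeps round coordinates such as 100000 from turning into "1e+05"
  sprintf("%s:%.0f:%.0f", chrom, start, end)
}

chrom     <- c("1", "2")
chr.start <- c(11401198, 86460656)
chr.stop  <- c(11694590, 86663869)

regions <- make_regions(chrom, chr.start, chr.stop)
print(regions)

# The vector can then be passed as in the reply above (needs a mart connection):
# getBM(attributes = attributes, filters = "chromosomal_region",
#       values = regions, mart = ensembl)
```

sprintf() is used instead of paste() because paste() converts numbers with as.character(), which can switch to scientific notation for some round values.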
Hi Steffen,

Many thanks for that. I was looking at the previous results that I said were OK and realised the ranges were wrong, and that confused me even more!

Thanks for the tip about the chromosomal_region filter, that's just what I needed!

Jose

Quoting Steffen Durinck <durinck.steffen at gene.com> on Thu, 9 Jun 2011 09:12:59 -0700:
> An alternative which does work for multiple regions is to use the
> chromosomal_region filter
Hi Jose,

I'll make biomaRt throw an error when someone tries the query you attempted.

Cheers,
Steffen

On Thu, Jun 9, 2011 at 9:15 AM, <j.delasheras at ed.ac.uk> wrote:
> I was looking at the previous results that I said were
> ok and realised the ranges were wrong and that confused me even more!
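Until such an error exists in the installed version, a similar check can be run on the user's side before calling getBM. This is only a sketch of the idea and not the check biomaRt itself performs; check_region_filters is a hypothetical name, and it assumes that values is a list given in the same order as filters.

```r
# Hypothetical guard (not part of biomaRt): stop when the
# chromosome_name + start + end combination is given more than one region.
check_region_filters <- function(filters, values) {
  combo <- c("chromosome_name", "start", "end")
  if (all(combo %in% filters) && is.list(values)) {
    # Assumes values[[i]] belongs to filters[i]
    lens <- lengths(values[match(combo, filters)])
    if (any(lens > 1)) {
      stop("chromosome_name + start + end describes a single region; ",
           "use the chromosomal_region filter for several regions")
    }
  }
  invisible(TRUE)
}

filters <- c("chromosome_name", "start", "end")

# A single region passes the check
check_region_filters(filters, list("1", 11401198, 11694590))

# Two regions are rejected; tryCatch only keeps the example from halting
msg <- tryCatch(
  check_region_filters(filters, list(c("1", "2"),
                                     c(11401198, 86460656),
                                     c(11694590, 86663869))),
  error = function(e) conditionMessage(e)
)
print(msg)
```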
That's probably a good idea. Most people would realise the result is not the expected one, but it is safer to get an error.

Thank you!

Jose

Quoting Steffen Durinck <durinck.steffen at gene.com> on Thu, 9 Jun 2011 09:40:41 -0700:
> I'll make biomaRt throw an error when someone tries the query you attempted.