Question: How do I use biomaRt to get upstreamFlank Genomic Sequence for many Genomes?
6.4 years ago, Noah Dowell wrote:
Hello All,

Problem: I would like to obtain the genomic sequence upstream (~500 bp) of a specific bacterial gene, for every bacterial genome that has the gene. On EcoCyc I see that many (> 100) bacteria have the gene, but I do not know how to get all of the sequences in a high-throughput manner, so I was going to use biomaRt to get the sequences and send them to alignment programs later.

I have read through the vignette and tried to get the function to work with a non-Ensembl Mart, to no avail. I was also presented with an error (see below) that suggested I report it to the mailing list. It also looks like I will have to query each of the 249 bacterial genomes in the "bacterial_mart_7" Mart individually (with getLDS or getBM), which does not seem high-throughput at all. Are there any other suggestions that will allow me to take advantage of the large amount of bacterial genomic data for homology studies?

Thank you for your help.

Noah

Attempted Solution (for a single genome):

> bacGenome = useMart("bacterial_mart_7", dataset = "esc_20_gene")
Checking attributes ... ok
Checking filters ... ok
>
> filters = c("external_gene_id")
>
> attributes = c("external_gene_id","upstream_flank")
>
> values = list(external_gene_id = c("fis"), 500)
> seq = getBM(attributes=attributes, filters = filters, values = values, mart= bacGenome,
+ checkFilters= FALSE)
   V1
1 fis
Error in getBM(attributes = attributes, filters = filters, values = values, :
  The query to the BioMart webservice returned an invalid result: the number of columns in the result table does not equal the number of attributes in the query. Please report this to the mailing list.
> sessionInfo()
R version 2.11.0 (2010-04-22)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.8.1 RCurl_1.3-1 bitops_1.0-4.1 biomaRt_2.4.0

loaded via a namespace (and not attached):
[1] Biobase_2.8.0 Biostrings_2.16.0 BSgenome_1.16.0 GenomicRanges_1.0.1 IRanges_1.6.0
[6] tools_2.11.0 XML_2.8-1
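
A rough sketch of how the per-genome query might be looped over many datasets. In Ensembl marts, biomaRt passes the flank length as a second filter ("upstream_flank") and requests the sequence through the "gene_flank" attribute, so that filters and values have the same length. Whether "bacterial_mart_7" exposes those two names is an assumption here and should be checked with listFilters() and listAttributes(). The helper names query_flank and collect_flanks are hypothetical, not part of biomaRt.

```r
# Query one dataset for the upstream flank of a gene.
# ASSUMPTION: the mart offers a "gene_flank" attribute and an
# "upstream_flank" filter, as Ensembl marts do.
query_flank <- function(dataset, gene, flank, biomart = "bacterial_mart_7") {
  mart <- biomaRt::useMart(biomart, dataset = dataset)
  biomaRt::getBM(
    attributes   = c("external_gene_id", "gene_flank"),
    filters      = c("external_gene_id", "upstream_flank"),
    values       = list(gene, flank),   # one value per filter, same order
    mart         = mart,
    checkFilters = FALSE
  )
}

# Hypothetical helper: run the query on every dataset, skip datasets
# that fail or return no rows, and stack the rest into one data frame.
collect_flanks <- function(datasets, gene, flank = 500, query = query_flank) {
  results <- lapply(datasets, function(ds) {
    res <- tryCatch(query(ds, gene, flank), error = function(e) NULL)
    if (is.null(res) || nrow(res) == 0) return(NULL)
    res$dataset <- ds   # remember which genome the row came from
    res
  })
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(NULL)
  do.call(rbind, results)
}

# Possible usage (needs network access and a mart that still exists):
# datasets <- biomaRt::listDatasets(biomaRt::useMart("bacterial_mart_7"))$dataset
# seqs <- collect_flanks(datasets, gene = "fis", flank = 500)
```

The query function is passed in as an argument so the loop can be tried with a stand-in function before hitting the web service; genomes that lack the gene or raise an error are simply dropped from the result.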