Question

How can I use biomaRt on previous databases?

kris.petrini • 0

@krispetrini-7786

Last seen 5.7 years ago

Italy

Hi, I'm trying to use biomaRt on a previous (archived) mouse database, but I receive an error message explaining that I can't use this biomaRt function on this database. How can I fix this problem?

This is the code that I'm trying to use:

library("biomaRt")
oldmouse = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="may2012.archive.ensembl.org", path="/biomart/martservice")
listMarts(oldmouse)
listDatasets(oldmouse)
oldmouse = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="may2012.archive.ensembl.org", path="/biomart/martservice",dataset ="mmusculus_gene_ensembl")
listFilters(oldmouse)
browseVignettes("biomaRt")
gene=getGene(id="A_51_P100052", type="with_efg_agilent_sureprint_g3_ge_8x60k", mart = oldmouse)
show(gene)

And this is the error message:

> gene=getGene(id="A_51_P100052", type="with_efg_agilent_sureprint_g3_ge_8x60k", mart = oldmouse)
Error in martCheck(mart, "ensembl") :
  This function only works when used with the ensembl BioMart.

Thanks.

biomart previous database retrieve sequence • 2.5k views

ADD COMMENT • link 10.7 years ago kris.petrini • 0

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 6 hours ago

United States

Please note that getGene() is just a simplified way to call getBM(), which does still work:

annot <- c("mgi_symbol","efg_agilent_sureprint_g3_ge_8x60k",
           "description", "chromosome_name", "band", "strand",
           "start_position", "end_position", "ensembl_gene_id")

filt <- "efg_agilent_sureprint_g3_ge_8x60k"

id <- "A_55_P2136348"

getBM(annot, filt, id, oldmouse)

  mgi_symbol efg_agilent_sureprint_g3_ge_8x60k
1       Ccr8                     A_55_P2136348
                                                           description
1 chemokine (C-C motif) receptor 8 [Source:MGI Symbol;Acc:MGI:1201402]
  chromosome_name band strand start_position end_position    ensembl_gene_id
1               9   F4      1      120001251    120004024 ENSMUSG00000042262

If you have a vector of Agilent IDs, you can get them all at once by passing the whole vector as the values argument in place of id:

ids <- c(<agilent IDs go here>)
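As a rough sketch of that multi-ID query: the wrapper name `annotate_agilent` is made up for illustration and is not part of biomaRt, the second probe ID is simply the one from the question, and `oldmouse` is assumed to be the mart object created there.

```r
# Hypothetical helper (not part of biomaRt): annotate several Agilent probe
# IDs with a single getBM() call, using the same attributes and filter as above.
annotate_agilent <- function(ids, mart) {
  annot <- c("mgi_symbol", "efg_agilent_sureprint_g3_ge_8x60k",
             "description", "chromosome_name", "band", "strand",
             "start_position", "end_position", "ensembl_gene_id")
  biomaRt::getBM(attributes = annot,
                 filters = "efg_agilent_sureprint_g3_ge_8x60k",
                 values = ids,
                 mart = mart)
}

# Both probe IDs appear earlier in this thread.
ids <- c("A_55_P2136348", "A_51_P100052")

# Needs network access to the archive server, so it is left commented out:
# genedat <- annotate_agilent(ids, oldmouse)
```

Note that getBM() does not promise to return rows in the order of `ids`, and a probe can match several genes or none, so merge on the probe ID column rather than relying on row position.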

ADD COMMENT • link 10.7 years ago James W. MacDonald 68k

Steffen Durinck ▴ 90

@steffen-durinck-4894

Last seen 11.4 years ago

Thanks for this answer Jim,

I just wanted to add that the getGene() function will be phased out very soon to avoid these errors.

(It should have been removed a long time ago)

Cheers,

Steffen

ADD COMMENT • link 10.7 years ago Steffen Durinck ▴ 90

kris.petrini • 0

Ok, thank you very much!

ADD COMMENT • link 10.7 years ago kris.petrini • 0

score 1 · Accepted Answer · 2015-05-22

kris.petrini • 0

Thanks! I've tried it and everything works! Just one more question: with this approach, how can I retrieve genomic sequences of interest, such as UTRs, exons, introns, the complete gene sequence and flanking regions?

ADD COMMENT • link 10.7 years ago kris.petrini • 0

This is the beauty of using Open Source tools - you can figure this stuff out for yourself. It's also the downside, because you often have to figure this stuff out for yourself.

Since Steffen indicated that getGene() is on the chopping block, let's assume the same for getSequence(). Also please note that IIRC, EVERYTHING in biomaRt ends up getting processed through getBM(), and all these helper functions just do the busy work for you. So let's look at getSequence. Here's the top part:

getSequence <- function (chromosome, start, end, id, type, seqType, upstream,
    downstream, mart, verbose = FALSE)
{
    martCheck(mart, c("ensembl", "ENSEMBL_MART_ENSEMBL"))
    if (missing(seqType) || !seqType %in% c("cdna", "peptide",
        "3utr", "5utr", "gene_exon", "transcript_exon", "transcript_exon_intron",
        "gene_exon_intron", "coding", "coding_transcript_flank",
        "coding_gene_flank", "transcript_flank", "gene_flank"))

So we need the chromosome, start, end, and some other stuff. If you look at ?getSequence, you will see that the id is some sort of identifier, type tells getSequence what sort of ID it is, and seqType is what you want to get. And if you look at the rest of getSequence(), it is always some variation on

sequence = getBM(c(seqType, type), filters = c("chromosome_name",
                "start", "end"), values = list(chromosome, start,
                end), mart = mart, checkFilters = FALSE, verbose = verbose)

So let's try to figure this out. From an earlier post we have pretty much everything we need.

> genedat <- getBM(annot, filt, id, oldmouse)
> genedat
  mgi_symbol efg_agilent_sureprint_g3_ge_8x60k
1       Ccr8                     A_55_P2136348
                                                           description
1 chemokine (C-C motif) receptor 8 [Source:MGI Symbol;Acc:MGI:1201402]
  chromosome_name band strand start_position end_position    ensembl_gene_id
1               9   F4      1      120001251    120004024 ENSMUSG00000042262

Attempt #1

> getBM(c("gene_exon", "ensembl_gene_id"), c("chromosome_name","start","end"), genedat[,c(4,7,8)], oldmouse)
Error in getBM(c("gene_exon", "ensembl_gene_id"), c("chromosome_name",  :
  If using multiple filters, the 'value' has to be a list.
For example, a valid list for 'value' could be: list(affyid=c('1939_at','1000_at'), chromosome= '16')
Here we select on Affymetrix identifier and chromosome, only results that pass both filters will be returned

OK, whatever. We'll use a list.

> getBM(c("gene_exon", "ensembl_gene_id"), c("chromosome_name","start","end"), as.list(genedat[,c(4,7,8)]), oldmouse)
  gene_exon
1 GTCTTCCTGCCTCGATGGATTACACGATGGAGCCCAACGTCACGATGACCGACTACTACCCTGATTTCTTCACCGCCCCCTGTGACGCAGAGTTCCTCCTCAGGGGCAGCATGCTGTATCTGGCCATCTTGTACTGCGTCTTGTTTGTGCTGGGCCTTCTGGGGAACAGCCTGGTCATCTTAGTCCTCGTGGGCTGCAAGAAACTGAGGAGCATCACAGATATCTACCTCCTGAACCTGGCCGCATCCGACCTGCTCTTTGTCCTCTCTATTCCTTTTCAGACCCACAACCTGCTGGACCAGTGGGTGTTTGGGACTGCGATGTGTAAGGTGGTCTCTGGCCTTTATTACATTGGTTTTTTCAGCAGTATGTTCTTCATCACCCTAATGAGTGTGGACAGGTATCTGGCTATTGTCCACGCTGTCTATGCCATCAAGGTGAGGACGGCCAGCGTGGGCACAGCCCTGAGTCTGACAGTGTGGCTGGCTGCTGTCACAGCCACCATCCCCTTGATGGTTTTTTACCAAGTGGCCTCTGAAGACGGCATGCTACAATGTTTCCAGTTTTATGAAGAGCAGTCTTTGAGGTGGAAGCTCTTTACCCACTTTGAAATCAACGCCTTGGGTCTGCTGCTCCCCTTTGCCATCCTCCTGTTCTGCTATGTCAGGATCCTGCAGCAGCTGCGGGGCTGCCTGAACCACAACAGGACCAGAGCCATCAAGCTGGTGCTCACCGTAGTCATTGTGTCTTTACTCTTCTGGGTCCCATTCAACGTGGCCCTTTTCCTCACGTCCCTGCACGACCTGCACATCTTGGATGGATGTGCCACGAGGCAGAGGCTGGCTCTGGCCATCCATGTCACAGAGGTCATCTCTTTTACCCACTGCTGCGTGAACCCCGTCATCTACGCGTTCATAGGAGAGAAGTTTAAGAAACACCTCATGGATGTGTTTCAAAAGAGCTGCAGCCACATCTTCCTCTACTTAGGGAGACAAATGCCCGTGGGGGCGTTGGAAAGGCAGCTGTCCTCGAACCAGCGATCTTCCCATTCTTCCACCCTGGATGACATCTTGTAAGGGGAGTGTGCAGGGCAGGCAGAC
2                                                                        TGGCAGAGGAGTGGGCAGCTCTGAAACCTCAGAAGAAAGGCTCGCTCAGATAATTG
     ensembl_gene_id
1 ENSMUSG00000042262
2 ENSMUSG00000042262
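The difference between the two attempts comes down to the class of the third argument: a subset of a data.frame is still a data.frame, while as.list() turns it into a plain list with one element per column, which is what the error message asks for. A small offline illustration, with the values typed in by hand from the output above instead of fetched from the server:

```r
# Values copied by hand from the genedat output above (no server call needed).
genedat <- data.frame(chromosome_name = "9",
                      start_position  = 120001251,
                      end_position    = 120004024,
                      stringsAsFactors = FALSE)

cols <- c("chromosome_name", "start_position", "end_position")

class(genedat[, cols])            # still a data.frame, which getBM() rejected
vals <- as.list(genedat[, cols])  # plain list, one element per column
class(vals)
str(vals)
```

Selecting the columns by name rather than by position (4, 7, 8) also protects against the column order changing if annot is edited. As far as I can tell, getBM() pairs the filters with the list elements by position, so the order of cols has to match the order of the filter names.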

So there's the idea, and it's up to you to extend it to whatever you want to get. If you are planning to do much with these sequences, then you would be better off using more sophisticated infrastructure than simple lists or data.frames.
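As one possible extension, here is a sketch that asks for a different sequence type by Ensembl gene ID and then moves the result into a Biostrings container. The function names `fetch_gene_sequence` and `as_dna_set` are invented for this example. It assumes that "ensembl_gene_id" is available as a filter on the archived mart (check listFilters(oldmouse)), that the Biostrings package is installed, and that missing sequences come back as the text "Sequence unavailable", which is what I have seen from BioMart but have not verified for this archive.

```r
# Hypothetical helper: fetch one sequence type (e.g. "3utr", "5utr", "cdna",
# "gene_exon") for one or more Ensembl gene IDs via getBM().
fetch_gene_sequence <- function(seq_type, gene_ids, mart) {
  biomaRt::getBM(attributes = c(seq_type, "ensembl_gene_id"),
                 filters = "ensembl_gene_id",
                 values = gene_ids,
                 mart = mart)
}

# Hypothetical helper: convert the data.frame from fetch_gene_sequence() into
# a DNAStringSet. Nucleotide sequences only, so not for seq_type = "peptide".
as_dna_set <- function(seqdat, seq_type) {
  # Assumption: rows without a sequence carry this placeholder text.
  keep <- seqdat[[seq_type]] != "Sequence unavailable"
  seqs <- Biostrings::DNAStringSet(seqdat[[seq_type]][keep])
  names(seqs) <- seqdat$ensembl_gene_id[keep]
  seqs
}

# Usage, needs network access and the oldmouse mart from the question:
# utr3 <- fetch_gene_sequence("3utr", "ENSMUSG00000042262", oldmouse)
# as_dna_set(utr3, "3utr")
```

The flank types (gene_flank and friends) additionally need an upstream or downstream distance (see the upstream and downstream arguments in the getSequence signature above), so they won't work with this helper as written.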

This will take some effort on your part, and lots of reading. A good place to start is by perusing the workflows.

ADD REPLY • link 10.7 years ago James W. MacDonald 68k