getSequence from biomaRt but Error in martCheck
@gabriela-santos-9210
Last seen 6.2 years ago
Mexico, Irapuato, LANGEBIO CINVESTAV

Hi! I am trying to retrieve certain 3'UTR sequences from H. sapiens using biomaRt.

The first problem I ran into was getting the ensembl_gene_id values, because of a problem with the mart option, but I was able to work around it by using:

ensembl = useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "www.ensembl.org")

The problem now is that when I try to retrieve the 3'UTR sequences with that mart, this error appears:

Seqs = getSequence(id = IDS[,1], type= "3utr", mart = ensembl)

Error in martCheck(mart, "ensembl") :
  This function only works when used with the ensembl BioMart.

 

Does anyone have any idea how to retrieve the sequences?

 

Thanks!

@james-w-macdonald-5106
Last seen 5 days ago
United States

While there is indeed a problem with misusing the type argument, the error itself comes from biomaRt:::martCheck, which in previous versions would error out for marts created from www.ensembl.org. To illustrate, here is the top part of getSequence() in the current version of BioC:

> getSequence
function (chromosome, start, end, id, type, seqType, upstream,
    downstream, mart, verbose = FALSE)
{
    martCheck(mart, c("ensembl", "ENSEMBL_MART_ENSEMBL"))

And here is what it looked like a couple of releases ago:

> getSequence
function (chromosome, start, end, id, type, seqType, upstream,
    downstream, mart, verbose = FALSE)
{
    martCheck(mart, "ensembl")

And calling martCheck directly with each set of accepted names shows the difference:

> biomaRt:::martCheck(mart, "ensembl")
Error in biomaRt:::martCheck(mart, "ensembl") :
  This function only works when used with the ensembl BioMart.
> biomaRt:::martCheck(mart, c("ensembl","ENSEMBL_MART_ENSEMBL"))
>

So an additional update to the current version of R/BioC is necessary as well.
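A quick way to check which situation applies is to look at the installed biomaRt version and at the martCheck() call at the top of the installed getSequence(). This is only a sketch: the update line assumes the BiocManager package is available (older installations used biocLite() instead), and it is left commented out so nothing is installed by accident.

```r
# Report the installed biomaRt version.
installed_version <- packageVersion("biomaRt")
print(installed_version)

# Print the first lines of the installed getSequence() body;
# the martCheck() call shows which mart names are accepted.
print(head(deparse(body(biomaRt::getSequence)), 3))

# Update biomaRt to the release matching the current R installation
# (assumption: BiocManager is installed).
# BiocManager::install("biomaRt")
```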

Thomas Maurel
@thomas-maurel-5295
Last seen 6 months ago
United Kingdom

Dear Gabriela,

I believe the issue comes from your getSequence call: "3utr" should be passed as "seqType", not "type". The "type" argument should be the type of identifier used in id (e.g. refseq, ensembl, ...). You can find more information about getSequence on page 9 of the following PDF: https://bioconductor.org/packages/release/bioc/manuals/biomaRt/man/biomaRt.pdf
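A minimal sketch of the corrected call, assuming the identifiers in IDS[,1] are Ensembl gene IDs (the original post does not say which identifier type they are; adjust "type" if they are something else). The helper name fetch_three_utrs is hypothetical and only there to keep the example self-contained:

```r
library(biomaRt)

# Hypothetical helper: fetch 3'UTR sequences for a vector of gene IDs.
# `type` names the kind of identifier passed in `id` (assumed here to be
# Ensembl gene IDs); `seqType` names the kind of sequence to return.
fetch_three_utrs <- function(ids, mart) {
  getSequence(id = ids,
              type = "ensembl_gene_id",
              seqType = "3utr",
              mart = mart)
}

# Usage (requires network access to the Ensembl BioMart):
# ensembl <- useMart("ENSEMBL_MART_ENSEMBL",
#                    dataset = "hsapiens_gene_ensembl",
#                    host = "www.ensembl.org")
# Seqs <- fetch_three_utrs(IDS[, 1], ensembl)
```

The result is a data frame with one column holding the sequence and one holding the identifier it belongs to.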

Hope this helps,

Best Regards,

Thomas

@herve-pages-1542
Last seen 3 hours ago
Seattle, WA, United States

Hi Gabriela,

Alternatively, you can get the 3'UTR sequences for Human by using a combination of a TxDb object and the corresponding BSgenome package. For example:

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
library(BSgenome.Hsapiens.UCSC.hg38)
genome <- BSgenome.Hsapiens.UCSC.hg38

three_utrs <- threeUTRsByTranscript(txdb, use.names=TRUE)
three_utr_seqs <- extractTranscriptSeqs(genome, three_utrs)
three_utr_seqs
#   A DNAStringSet instance of length 65478
#         width seq                                     names
#     [1]   770 TGCCCGTTGGAGAAAACA...GCACACTGTTGGTTTCTG uc010nxq.1
#     [2]     1 G                                       uc001abv.1
#     [3]   428 GGTTGCCGGGGGTAGGGG...AATAAAGCCTGTCCCGTG uc001abw.1
#     [4]     1 A                                       uc031pjn.1
#     [5]   428 GGTTGCCGGGGGTAGGGG...AATAAAGCCTGTCCCGTG uc001abx.2
#     ...   ... ...
# [65474]     2 CA                                      uc011mfn.2
# [65475]   415 GGCAGCGGCTGTAATGGT...AAACAAACAAACAAACAA uc022brb.1
# [65476] 12925 TCGATGTGGTGACGTCGT...GGGGTCCCGGCCCTCGCG uc033dng.1
# [65477]   192 CTGTGAGGCCATTTCCAG...CGAAGCTCCTGCCTTTCG uc033dny.1
# [65478]   291 CTGTGAGGCCATTTCCAG...CCAAGCCCCGCTTTTGAC uc033dob.1

The names on the DNAStringSet object are the UCSC transcript ids. See ?threeUTRsByTranscript and ?extractTranscriptSeqs in the GenomicFeatures package for more information. Note that you can use other TxDb objects (see http://bioconductor.org/packages/release/BiocViews.html#___TxDb for a list of all the available TxDb packages), or you can make your own with the makeTxDbFromUCSC(), makeTxDbFromBiomart(), or makeTxDbFromGFF() functions in GenomicFeatures. For example, to make a TxDb object from Ensembl:

library(GenomicFeatures)
txdb <- makeTxDbFromBiomart(biomart="ENSEMBL_MART_ENSEMBL",
                            dataset="hsapiens_gene_ensembl",
                            host="www.ensembl.org")
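Whichever TxDb is used, the resulting DNAStringSet can be filtered and written out with standard Biostrings calls. Here is a small sketch on a toy DNAStringSet standing in for three_utr_seqs; the sequences, names, length cutoff, and output file are all made up for illustration:

```r
library(Biostrings)

# Toy stand-in for three_utr_seqs; the names play the role of transcript ids.
utrs <- DNAStringSet(c(tx_a = "TGCCCGTTGGAGAAAACA",
                       tx_b = "G",
                       tx_c = "GGTTGCCGGGGGTAGGGG"))

# Drop very short UTRs (the cutoff of 10 nt is arbitrary).
utrs_kept <- utrs[width(utrs) >= 10]

# Select specific transcripts by name.
print(utrs_kept["tx_c"])

# Write the remaining sequences to a FASTA file.
out_file <- tempfile(fileext = ".fa")
writeXStringSet(utrs_kept, out_file)
```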

Cheers,

H.
