biomaRt code/getBM function no longer importing 'coding' sequence via seqType argument, "Invalid attribute(s): coding" error
@2a94d2b9
Last seen 7 months ago
United States

I am trying to import coding sequences from biomaRt using the following code (simplified example):

cds_seq = getSequence(id = "NM_004974", 
                      type = "refseq_mrna", 
                      seqType = "coding", 
                      mart = ensembl)

...and getting this output...

Error in biomaRt::getBM(attributes = c("hgnc_symbol", "ensembl_gene_id",  : 
  Invalid attribute(s): coding 
Please use the function 'listAttributes' to get valid attribute names

I ran the same code ~2 years ago and it worked fine. I've explored the other attributes via the "listAttributes()" function to see whether the attribute name for the coding sequence has been updated, but I can't seem to find anything close.
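For anyone repeating that search, here is a minimal sketch of how the attribute table could be scanned by pattern rather than by eye. The helper `find_attributes` is hypothetical, and the small table is a made-up stand-in for the data frame that `listAttributes(ensembl)` returns, so the names in it should not be read as what the archive actually exposes.

```r
# Hypothetical helper: keep rows of an attribute table whose name or
# description matches a regular expression (case-insensitive).
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
    grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, , drop = FALSE]
}

# Toy table standing in for listAttributes(ensembl); contents are illustrative.
attrs <- data.frame(
  name = c("ensembl_gene_id", "hgnc_symbol", "cdna", "peptide"),
  description = c("Ensembl Gene ID", "HGNC symbol", "cDNA sequences", "Protein"),
  page = c("feature_page", "feature_page", "sequences", "sequences"),
  stringsAsFactors = FALSE
)

# Look for anything that mentions sequences or coding regions.
hits <- find_attributes(attrs, "sequence|coding")
print(hits)
```

With a live mart, the same helper can be given `listAttributes(ensembl)` directly; biomaRt also provides `searchAttributes(mart = ensembl, pattern = "coding")` for this kind of lookup.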

Note that I'm using an archived version of ensembl and wish to keep it that way so the code output doesn't change if any of the sequences of interest have been updated in newer versions. Not sure if this is relevant. Here's the command I used to import ensembl:

ensembl <- useEnsembl(biomart="ensembl", 
                      dataset="hsapiens_gene_ensembl", 
                      host="https://feb2014.archive.ensembl.org")

Please advise, and thank you!

@james-w-macdonald-5106
Last seen 1 day ago
United States

It's my understanding that the archived Biomart data are not complete, which is likely why you cannot get sequences. But you apparently just want sequences based on GRCh37, for which there is a complete archive.

> ensembl <- useEnsembl(biomart="ensembl", 
                      dataset="hsapiens_gene_ensembl", 
                      host="https://grch37.ensembl.org")
> cds_seq = getSequence(id = "NM_004974", 
                      type = "refseq_mrna", 
                      seqType = "coding", 
                      mart = ensembl)
> cds_seq
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        coding
1 ATGACAGTGGCCACCGGAGACCCAGCAGACGAGGCTGCTGCCCTCCCTGGGCACCCACAGGACACCTATGACCCAGAGGCAGACCACGAGTGCTGTGAGAGGGTGGTGATCAACATCTCAGGGCTGCGGTTTGAGACCCAGCTAAAGACCTTAGCCCAGTTTCCAGAGACCCTCTTAGGGGACCCAAAGAAACGAATGAGGTACTTTGACCCCCTCCGAAATGAGTACTTTTTCGATCGGAACCGCCCTAGCTTTGATGCCATTTTGTACTACTACCAGTCAGGGGGCCGATTGAGGCGACCTGTGAATGTGCCCTTAGATATATTCTCTGAAGAAATTCGGTTTTATGAGCTGGGAGAAGAAGCGATGGAGATGTTTCGGGAAGATGAAGGCTACATCAAGGAGGAAGAGCGTCCTCTGCCTGAAAATGAGTTTCAGAGACAAGTGTGGCTTCTCTTTGAATACCCAGAGAGCTCAGGGCCTGCCAGGATTATAGCTATTGTGTCTGTCATGGTGATTCTGATCTCAATTGTCAGCTTCTGTCTGGAAACATTGCCCATCTTCCGGGATGAGAATGAAGACATGCATGGTAGTGGGGTGACCTTCCACACCTATTCCAACAGCACCATCGGGTACCAGCAGTCCACTTCCTTCACAGACCCTTTCTTCATTGTAGAGACACTCTGCATCATCTGGTTCTCCTTTGAATTCTTGGTGAGGTTCTTTGCCTGTCCCAGCAAAGCCGGCTTCTTCACCAACATCATGAACATCATTGACATTGTGGCCATCATCCCCTACTTCATCACCCTGGGGACAGAGTTGGCTGAGAAGCCAGAGGACGCTCAGCAAGGCCAGCAGGCCATGTCACTGGCCATCCTCCGTGTCATCCGGTTGGTAAGAGTCTTTAGGATTTTCAAGTTGTCCAGACACTCCAAAGGTCTCCAGATTCTAGGTCAGACCCTCAAAGCCAGCATGAGAGAATTGGGCCTCCTGATATTCTTTCTCTTCATAGGGGTCATCCTTTTCTCTAGTGCTGTGTATTTTGCAGAGGCCGATGAGCGAGAGTCCCAGTTCCCCAGCATCCCAGATGCCTTCTGGTGGGCAGTCGTCTCCATGACAACTGTAGGCTATGGAGACATGGTTCCGACTACCATTGGGGGAAAGATAGTGGGTTCCCTATGTGCGATTGCAGGTGTGTTAACTATTGCCTTACCGGTCCCTGTCATTGTGTCCAATTTCAACTACTTCTACCACCGGGAGACAGAGGGAGAGGAACAGGCCCAATACTTGCAAGTGACAAGCTGTCCAAAGATCCCATCCTCCCCTGACCTAAAGAAAAGTAGAAGTGCCTCTACCATTAGTAAGTCTGATTACATGGAGATCCAGGAGGGTGTAAATAACAGTAATGAGGACTTTAGAGAGGAAAACTTGAAAACAGCCAACTGTACCTTGGCTAACACAAACTATGTGAATATTACCAAAATGTTAACTGATGTCTGA
2                                                                                                                                                                                                                                                                                                                                                                                                                                              ATGACAGTGGCCACCGGAGACCCAGCAGACGAGGCTGCTGCCCTCCCTGGGCACCCACAGGACACCTATGACCCAGAGGCAGACCACGAGTGCTGTGAGAGGGTGGTGATCAACATCTCAGGGCTGCGGTTTGAGACCCAGCTAAAGACCTTAGCCCAGTTTCCAGAGACCCTCTTAGGGGACCCAAAGAAACGAATGAGGTACTTTGACCCCCTCCGAAATGAGTACTTTTTCGATCGGAACCGCCCTAGCTTTGATGCCATTTTGTACTACTACCAGTCAGGGGGCCGATTGAGGCGACCTGTGAATGTGCCCTTAGATATATTCTCTGAAGAAATTCGGTTTTATGAGCTGGGAGAAGAAGCGATGGAGATGTTTCGGGAAGATGAAGGCTACATCAAGGAGGAAGAGCGTCCTCTGCCTGAAAATGAGTTTCAGAGACAAGTGTGGCTTCTCTTTGAATACCCAGAGAGCTCAGGGCCTGCCAGGATTATAGCTATTGTGTCTGTCATGGTGATTCTGATCTCAATTGTCAGCTTCTGTCTGGAAACATTGCCCATCTTCCGGGATGAGAATGAAGACATGCATGGTAGTGGGGTGACCTTCCACACCTATTCCAACAGCACCATCGGGTACCAGCAGTCCACTTCCTTCACAGACCCTTTCTTCATTGTAGAGACACTCTGCATCATCTGGTTCTCCTTTGAATTCTTGGTGAGGTTCTTTGCCTGTCCCAGCAAAGCCGGCTTCTTCACCAACATCATGAACATCATTGACATTGTGGCCATCATCCCCTACTTCATCACCCTGGGGACAGAGTTGGCTGAGAAGCCAGAGGACGCTCAGCAAGGCCAGCAGGCCATGTCACTGGCCATCCTCCGTGTCATCCGGTTGGAACGCAGACCTCTGCAAAGCCAGAAGAGTAAGCGGGGAAGGCAGCATCTGAACACCTCACATGACTGCACCTTAGGAATTAACCTAGTCGCGGGCATGACTGTACAGTGGACCAGGGCATCTGGTCCTGATGACAGGCAGACACCAGCTGTAACTACATTGCACAGGATGTATTGA
3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Sequence unavailable
  refseq_mrna
1   NM_004974
2   NM_004974
3   NM_004974
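
Note that the third row comes back as "Sequence unavailable", so the result probably needs filtering before use. A minimal sketch of one way to do that follows. The data frame is a toy stand-in with shortened placeholder sequences, not real biomaRt output, and the `length` column is added only for illustration.

```r
# Toy stand-in for the data frame returned by getSequence(); the sequences
# are shortened placeholders, not the real NM_004974 coding sequences.
cds_seq <- data.frame(
  coding = c("ATGACAGTGGCC", "ATGACAGTGTGA", "Sequence unavailable"),
  refseq_mrna = rep("NM_004974", 3),
  stringsAsFactors = FALSE
)

# Keep only the rows that carry an actual sequence.
cds_ok <- cds_seq[cds_seq$coding != "Sequence unavailable", , drop = FALSE]

# Record sequence lengths; a complete CDS should be a multiple of 3.
cds_ok$length <- nchar(cds_ok$coding)
print(cds_ok)
```

The same two lines of filtering should apply unchanged to the real `cds_seq` object, assuming the placeholder text for missing sequences is exactly "Sequence unavailable" as in the output above.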

This worked perfectly! I was able to retrieve the sequences of interest using host="https://grch37.ensembl.org" within the useEnsembl call. Thanks to James and Mike (below) for your help, I really appreciate it.


I should also point out that you can find this out yourself by going to the Ensembl site, clicking on the BioMart link, then scrolling to the bottom and clicking on 'View in archive site', which will bring up all the archives. There's one at the top that says

Ensembl GRCh37: Full Feb 2014 archive with BLAST, VEP and BioMart

And hovering over the link gave me the host URI that I used.

Mike Smith ★ 6.5k
@mike-smith
Last seen 1 hour ago
EMBL Heidelberg

I'll echo James' suggestion to use the GRCh37 mirror. Most mirrors only have a guaranteed lifetime of 5 years from Ensembl, although it seems some are kept around for longer.

I also see the same error if I create the query in the BioMart web interface, meaning this isn't something that you'll be able to work around with biomaRt. I can't say why this might have changed, but I'd recommend contacting the Ensembl team directly if you think there's an issue with the service.

[Image: biomart error]
