How to retrieve all attributes from biomart?
@bioinformatics-10931
Last seen 2.8 years ago
United States

Basically, I want to extract all attributes for several genes.

When I use the following as an example, I get an error. Would anyone know why?

hsapiens_inf <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','ensembl_peptide_id','ensembl_exon_id',
+                                    'description','chromosome_name','start_position','end_position','strand','band',
+                                    'transcript_start','transcript_end','external_gene_id','external_transcript_id',
+                                    'external_gene_db','transcript_db_name','transcript_count',
+                                    'percentage_gc_content','gene_biotype','transcript_biotype','source',
+                                    'transcript_source','status,transcript_status','phenotype_description',
+                                    'source_name','study_external_id','go_id','name_1006','definition_1006',
+                                    'go_linkage_type','namespace_1003','goslim_goa_accession','goslim_goa_description',
+                                    'arrayexpress','chembl'),mart = mart)
Error in getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id",  : 
  Invalid attribute(s): external_gene_id, external_transcript_id, external_gene_db, transcript_db_name, status,transcript_status 
Please use the function 'listAttributes' to get valid attribute names

biomart • 2.4k views
@james-w-macdonald-5106
Last seen 1 day ago
United States

You get this error:

Error in getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id",  : 
  Invalid attribute(s): external_gene_id, external_transcript_id, external_gene_db, transcript_db_name, status,transcript_status 
Please use the function 'listAttributes' to get valid attribute names

Can you say why that isn't sufficient/descriptive enough for you to diagnose this yourself?
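
For anyone hitting the same message, here is a minimal sketch of how the offending names could be found before calling getBM. The helper name `find_invalid_attributes` and the short `valid` vector are made up for illustration. With a real mart, the valid names would come from `listAttributes(mart)$name`. Note that 'status,transcript_status' shows up as one invalid name because a single pair of quotes encloses both words in the original call.

```r
# Return the requested attribute names that are not in the set of valid names.
# `find_invalid_attributes` is a hypothetical helper, not part of biomaRt.
find_invalid_attributes <- function(requested, valid) {
  setdiff(requested, valid)
}

# Illustrative stand-in for listAttributes(mart)$name (assumption: with a live
# connection you would use the real list instead of this toy vector).
valid <- c("ensembl_gene_id", "ensembl_transcript_id",
           "description", "chromosome_name")

requested <- c("ensembl_gene_id", "external_gene_id",
               "status,transcript_status")  # one string, comma inside the quotes

invalid <- find_invalid_attributes(requested, valid)
print(invalid)
```

Anything this returns has to be corrected or dropped before the query will be accepted.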


@James W. MacDonald

I reduced the query to a smaller one to show that I have tried a lot but cannot figure out what the problem is. For example:

hsapiens_inf <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id',
+                                    'ensembl_peptide_id','ensembl_exon_id'),mart = hsapiens6)
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection timed out after 10003 milliseconds

Or this one:

hsapiens_inf <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id",
+                                    "ensembl_peptide_id","ensembl_exon_id"),mart = hsapiens6)
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection timed out after 10003 milliseconds

I have checked listAttributes and the attribute names seem to be OK. Would you please tell me what the problem is?

 


Sure. You are asking for a metric ton of data, with an arbitrarily large amount of replication, and evidently the Biomart server is taking longer than curl_fetch_memory is willing to wait. I did get it to go, and I wonder what you plan to do with a data.frame with almost 1.4 million rows?

> dim(hsapiens_inf)
[1] 1383187       4
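
Since the original goal was information for several genes rather than the whole genome, one possible way to keep each request small is to restrict it to those genes and send them in batches. The following is only a sketch: `chunk_ids`, `fetch_in_batches` and `fake_fetch` are hypothetical names, the gene IDs are made up, and the real server call (shown in a comment, using getBM's `filters` and `values` arguments) is replaced by an offline stand-in so the code runs without a connection.

```r
# Split a vector of IDs into batches of at most `size` elements.
chunk_ids <- function(ids, size = 50) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Apply `fetch` to each batch and stack the resulting data.frames.
# With biomaRt, `fetch` could be something like (assumption, needs a network):
#   function(batch) getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id"),
#                         filters = "ensembl_gene_id", values = batch, mart = mart)
fetch_in_batches <- function(ids, fetch, size = 50) {
  do.call(rbind, lapply(chunk_ids(ids, size), fetch))
}

# Offline stand-in for the server call, returning one row per requested ID.
fake_fetch <- function(batch) {
  data.frame(ensembl_gene_id = batch, id_length = nchar(batch))
}

gene_ids <- sprintf("ENSG%011d", 1:120)  # made-up IDs, for illustration only
result <- fetch_in_batches(gene_ids, fake_fetch, size = 50)
dim(result)
```

Whether batching is enough to avoid the timeout depends on the server load and on how many attributes are requested per gene.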

Do note that the Biomart server is going to return a fully normalized table that is joined across each of the attributes you are requesting. So if a gene has two transcripts, you get two rows. And if the transcripts have say three exons each, you now get six rows. And if there are different proteins in there, you get more rows still. All for one gene.
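
The row multiplication described above can be reproduced on a toy example. This is a sketch with hypothetical IDs (G1, T1, E1, ...), and it uses base R's merge as a stand-in for the join the server performs. It is not the server's actual implementation.

```r
# One gene with two transcripts (hypothetical IDs).
transcripts <- data.frame(
  gene = "G1",
  transcript = c("T1", "T2")
)

# Three exons per transcript.
exons <- data.frame(
  transcript = rep(c("T1", "T2"), each = 3),
  exon = paste0("E", 1:6)
)

# Joining on the transcript gives one row per gene/transcript/exon combination.
joined <- merge(transcripts, exons, by = "transcript")
nrow(joined)  # 2 transcripts x 3 exons = 6 rows for a single gene
```

Each additional one-to-many attribute (for example peptide or GO term) multiplies the row count again in the same way.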

Here's the worst of it:

> tail(table(table(hsapiens_inf[,1])), 30)

 742  745  758  777  790  810  832  857  863  916  924  931  936  939  974  981
   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
 986  996 1050 1095 1115 1139 1151 1220 1266 1380 1454 2012 2045 2136
   1    1    1    1    2    1    1    1    1    1    1    1    1    1

So you have one gene that takes up 2136 rows of the data.frame! That's legit. But of what use is that?

Perhaps it would be better for you to say what you are trying to do, and then maybe somebody could offer a suggestion.


@James W. MacDonald Basically, I am trying to do gene onthology. There are 100 packages, but I prefer to retrieve data from UniProt or Ensembl.


What is gene onthology?


@James W. MacDonald Gene ontology means you can find various information about any given gene. See http://www.geneontology.org

 


Oh, you mean gene ontology, not onthology. Fair enough. And if you prefer to do it your own way, rather than using the existing packages to do so, I guess you should have at it. But do note that doing things 'your own way' implies that you A) know what you are doing, and B) have it handled. So good luck with that!
