Question

How to retrieve all attributes from BioMart?

0

Bioinformatics ▴ 30

@bioinformatics-10931

Last seen 3.9 years ago

United States

Basically, I want to extract all attributes for several genes.

When I use the following as an example, I get an error. Would anyone know why?

hsapiens_inf <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','ensembl_peptide_id','ensembl_exon_id',
+ 'description','chromosome_name','start_position','end_position','strand','band',
+ 'transcript_start','transcript_end','external_gene_id','external_transcript_id',
+ 'external_gene_db','transcript_db_name','transcript_count',
+ 'percentage_gc_content','gene_biotype','transcript_biotype','source',
+ 'transcript_source','status,transcript_status','phenotype_description',
+ 'source_name','study_external_id','go_id','name_1006','definition_1006',
+ 'go_linkage_type','namespace_1003','goslim_goa_accession','goslim_goa_description',
+ 'arrayexpress','chembl'),mart = mart)
Error in getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", :
Invalid attribute(s): external_gene_id, external_transcript_id, external_gene_db, transcript_db_name, status,transcript_status
Please use the function 'listAttributes' to get valid attribute names

biomart • 3.1k views

ADD COMMENT • link updated 7.0 years ago by James W. MacDonald 68k • written 7.0 years ago by Bioinformatics ▴ 30

score 0 · Answer 1 · 2018-12-06

0

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

You get this error:

Error in getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id",  : 
  Invalid attribute(s): external_gene_id, external_transcript_id, external_gene_db, transcript_db_name, status,transcript_status 
Please use the function 'listAttributes' to get valid attribute names

Can you say why that isn't sufficient/descriptive enough for you to diagnose this yourself?
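To make the hint in the error message concrete: the rejected names can be found by comparing the request against the names that `listAttributes` reports. Note that `status,transcript_status` is rejected as one name because both names sit inside a single pair of quotes in the original call. The sketch below is only an illustration; `find_invalid` is a hypothetical helper, and the `valid` vector is a made-up stand-in for `listAttributes(mart)$name`, not the real attribute list.

```r
# Hypothetical helper: return the requested names that are not in the valid set.
find_invalid <- function(wanted, valid) {
  setdiff(wanted, valid)
}

# Made-up stand-in for the real list. With a live mart object you would use:
#   valid <- biomaRt::listAttributes(mart)$name
valid <- c("ensembl_gene_id", "ensembl_transcript_id", "chromosome_name")

# Part of the original request, including the two names fused by a missing quote.
wanted <- c("ensembl_gene_id", "external_gene_id", "status,transcript_status")

invalid <- find_invalid(wanted, valid)
print(invalid)
```

Attribute names can change between Ensembl releases, so a name copied from an older script may no longer exist. Searching the real list, for example with `grep("external", valid, value = TRUE)`, is one way to look for the current name.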

ADD COMMENT • link 7.0 years ago James W. MacDonald 68k

0

@James W. MacDonald

I shortened the query to show that I have tried a lot, but I still cannot figure out what the problem is. For example:

hsapiens_inf <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id',
+ 'ensembl_peptide_id','ensembl_exon_id'),mart = hsapiens6)
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: Connection timed out after 10003 milliseconds

Or this one:

hsapiens_inf <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id",
+ "ensembl_peptide_id","ensembl_exon_id"),mart = hsapiens6)
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: Connection timed out after 10003 milliseconds

I have checked the output of listAttributes and the names seem to be OK. Would you please tell me what the problem is?

ADD REPLY • link 7.0 years ago Bioinformatics ▴ 30

1

Sure. You are asking for a metric ton of data, with an arbitrarily large amount of replication, and evidently the Biomart server is taking longer than curl_fetch_memory is willing to wait. I did get it to go, and I wonder what you plan to do with a data.frame with almost 1.4 million rows?

> dim(hsapiens_inf)
[1] 1383187       4

Do note that the Biomart server is going to return a fully joined (denormalized) table, with one row for every combination of the attributes you are requesting. So if a gene has two transcripts, you get two rows. And if the transcripts have, say, three exons each, you now get six rows. And if there are different proteins in there, you get more rows still. All for one gene.
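The row multiplication described here can be reproduced in base R without contacting the server. This is a toy sketch: the gene, transcript, and exon identifiers are invented, and `merge` merely stands in for whatever join the server performs.

```r
# Toy tables: one gene with two transcripts, three exons per transcript.
transcripts <- data.frame(
  gene = "geneA",
  transcript = c("tx1", "tx2")
)
exons <- data.frame(
  transcript = rep(c("tx1", "tx2"), each = 3),
  exon = paste0("exon", 1:6)
)

# Joining on the shared column repeats the gene and transcript once per exon.
joined <- merge(transcripts, exons, by = "transcript")
print(joined)
print(nrow(joined))
```

Two transcripts times three exons gives six rows for a single gene, which is the pattern that grows to thousands of rows once peptides are added as well.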

Here's the worst of it:

> tail(table(table(hsapiens_inf[,1])), 30)

 742  745  758  777  790  810  832  857  863  916  924  931  936  939  974  981
   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
 986  996 1050 1095 1115 1139 1151 1220 1266 1380 1454 2012 2045 2136
   1    1    1    1    2    1    1    1    1    1    1    1    1    1

So you have one gene that takes up 2136 rows of the data.frame! That's legit. But of what use is that?
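For readers unfamiliar with the `table(table(...))` idiom: the inner call counts rows per gene, and the outer call counts how many genes share each row count. A small sketch with invented IDs:

```r
# Invented gene IDs, one entry per row of a hypothetical result.
ids <- c("g1", "g1", "g1", "g2", "g2", "g3", "g3")

# Inner table: number of rows for each gene.
rows_per_gene <- table(ids)

# Outer table: number of genes that have a given row count.
genes_per_count <- table(rows_per_gene)
print(genes_per_count)

# The gene with the most rows.
largest <- names(which.max(rows_per_gene))
print(largest)
```

In the output above, the top line of each pair is the row count and the line beneath it is the number of genes with that count, so the final column means one gene occupies 2136 rows.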

Perhaps it would be better for you to say what you are trying to do, and then maybe somebody could offer a suggestion.
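If the aim is information for several genes rather than the whole genome, one option is to restrict the query with the `filters` and `values` arguments of `getBM`, and to request only a few related attributes at a time. The sketch below makes several assumptions: `fetch_go` is a hypothetical wrapper, the dataset name is the usual human one, the two Ensembl IDs (BRCA2 and TP53) are arbitrary examples, and the attribute names are taken from the original question and may differ between Ensembl releases.

```r
# Attribute names taken from the original question; check them with listAttributes().
go_attributes <- c("ensembl_gene_id", "go_id", "name_1006", "namespace_1003")

# Hypothetical wrapper: fetch GO annotation for the given Ensembl gene IDs only.
fetch_go <- function(genes, mart) {
  biomaRt::getBM(
    attributes = go_attributes,
    filters = "ensembl_gene_id",  # restrict the query to the listed genes
    values = genes,
    mart = mart
  )
}

# Arbitrary example genes: BRCA2 and TP53.
genes <- c("ENSG00000139618", "ENSG00000141510")

# Usage (needs the biomaRt package and network access):
#   mart <- biomaRt::useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
#   go <- fetch_go(genes, mart)
#   head(go)
```

A filtered request like this returns far fewer rows than an unfiltered one, which should also make a timeout less likely.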

ADD REPLY • link 7.0 years ago James W. MacDonald 68k

0

@James W. MacDonald Basically, I am trying to do gene onthology. There are 100 packages, but I prefer to retrieve the data from UniProt or Ensembl.

ADD REPLY • link 7.0 years ago Bioinformatics ▴ 30

0

What is gene onthology?

ADD REPLY • link 7.0 years ago James W. MacDonald 68k

0

@James W. MacDonald Gene ontology means you can find various information about any given gene. Look at http://www.geneontology.org

ADD REPLY • link 7.0 years ago Bioinformatics ▴ 30

0

Oh, you mean gene ontology, not onthology. Fair enough. And if you prefer to do it your own way, rather than using the existing packages to do so, I guess you should have at it. But do note that doing things 'your own way' implies that you A) know what you are doing, and B) have it handled. So good luck with that!

ADD REPLY • link 7.0 years ago James W. MacDonald 68k