Question: How to retrieve all attributes from biomart?
11 days ago, Bioinformatics wrote:

Basically, I want to extract all attributes for several genes.

When I use the following as an example, I get an error. Would anyone know why?

hsapiens_inf <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','ensembl_peptide_id','ensembl_exon_id',
+                                    'description','chromosome_name','start_position','end_position','strand','band',
+                                    'transcript_start','transcript_end','external_gene_id','external_transcript_id',
+                                    'external_gene_db','transcript_db_name','transcript_count',
+                                    'percentage_gc_content','gene_biotype','transcript_biotype','source',
+                                    'transcript_source','status,transcript_status','phenotype_description',
+                                    'source_name','study_external_id','go_id','name_1006','definition_1006',
+                                    'go_linkage_type','namespace_1003','goslim_goa_accession','goslim_goa_description',
+                                    'arrayexpress','chembl'),mart = mart)
Error in getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id",  : 
  Invalid attribute(s): external_gene_id, external_transcript_id, external_gene_db, transcript_db_name, status,transcript_status 
Please use the function 'listAttributes' to get valid attribute names

modified 11 days ago by James W. MacDonald • written 11 days ago by Bioinformatics
11 days ago, James W. MacDonald (United States) wrote:

You get this error:

Error in getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id",  : 
  Invalid attribute(s): external_gene_id, external_transcript_id, external_gene_db, transcript_db_name, status,transcript_status 
Please use the function 'listAttributes' to get valid attribute names

Can you say why that isn't sufficient/descriptive enough for you to diagnose this yourself?
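
For readers following along, one way to act on the hint in the error message is to compare the requested names against the `name` column that `listAttributes(mart)` returns. The sketch below is an illustration and not part of the original reply: `invalid_attributes` is a hypothetical helper, and the `valid` vector is only a small stand-in for the real attribute list, so that the snippet runs without a server connection.

```r
# Hypothetical helper: return the requested attribute names that are not
# among the names the mart offers.
invalid_attributes <- function(requested, valid) {
  setdiff(requested, valid)
}

# Stand-in for the real list (an assumption for illustration only).
# With a live connection one would use: valid <- listAttributes(mart)$name
valid <- c("ensembl_gene_id", "ensembl_transcript_id",
           "description", "chromosome_name")

# A shortened version of the request from the question.
requested <- c("ensembl_gene_id", "external_gene_id",
               "status,transcript_status")

bad <- invalid_attributes(requested, valid)
print(bad)
```

Note that 'status,transcript_status' is reported as a single name: in the original call the quotes around the comma are missing, so two attribute names were fused into one string that cannot match anything.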


@James W. MacDonald

I reduced the query to a smaller one to show that I have tried a lot but cannot figure out what the problem is. For example:

hsapiens_inf <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id',
+                                    'ensembl_peptide_id','ensembl_exon_id'),mart = hsapiens6)
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection timed out after 10003 milliseconds

Or this one:

hsapiens_inf <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id",
+                                    "ensembl_peptide_id","ensembl_exon_id"),mart = hsapiens6)
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection timed out after 10003 milliseconds

I have checked listAttributes and the names seem to be OK. Would you please tell me what the problem is?

 

modified 11 days ago • written 11 days ago by Bioinformatics

Sure. You are asking for a metric ton of data, with an arbitrarily large amount of replication, and evidently the Biomart server is taking longer than curl_fetch_memory is willing to wait (about 10 seconds, going by your error message). I did get the query to run, and I wonder what you plan to do with a data.frame with almost 1.4 million rows?

> dim(hsapiens_inf)
[1] 1383187       4

Do note that the Biomart server is going to return a single flat (denormalized) table that is joined across each of the attributes you are requesting. So if a gene has two transcripts, you get two rows. And if the transcripts have, say, three exons each, you now get six rows. And if there are different proteins in there, you get more rows still. All for one gene.
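
The row multiplication described above can be reproduced locally with base R. This is a toy sketch with made-up identifiers (G1, T1, E1 and so on are hypothetical), using merge() as a stand-in for the join the server performs:

```r
# One gene with two transcripts (made-up identifiers).
transcripts <- data.frame(gene = "G1",
                          transcript = c("T1", "T2"))

# Three exons per transcript.
exons <- data.frame(transcript = rep(c("T1", "T2"), each = 3),
                    exon = paste0("E", 1:6))

# Joining on the transcript repeats the gene once per exon.
joined <- merge(transcripts, exons, by = "transcript")
print(nrow(joined))  # 6 rows, all for the single gene G1
```

Each additional one-to-many attribute multiplies the row count again, which is how a single gene can end up occupying thousands of rows.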

Here's the worst of it:

> tail(table(table(hsapiens_inf[,1])), 30)

 742  745  758  777  790  810  832  857  863  916  924  931  936  939  974  981
   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
 986  996 1050 1095 1115 1139 1151 1220 1266 1380 1454 2012 2045 2136
   1    1    1    1    2    1    1    1    1    1    1    1    1    1

So you have one gene that takes up 2136 rows of the data.frame! That's legit. But of what use is that?

Perhaps it would be better for you to say what you are trying to do, and then maybe somebody could offer a suggestion.
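
Since the question mentions "several genes", one possible way to keep the request small is to restrict it with the filters and values arguments of getBM rather than pulling every gene. The following is only a sketch under stated assumptions: `fetch_go_for_genes` is a hypothetical wrapper, the biomaRt package is assumed to be installed, and `mart` is assumed to be an Ensembl gene mart. The attribute names are taken from the original request and should still be checked against listAttributes.

```r
# Hypothetical wrapper (not from the thread): fetch GO annotation for a
# handful of genes instead of the whole genome.
# Assumes `mart` was created with something like
#   useMart("ensembl", dataset = "hsapiens_gene_ensembl")
fetch_go_for_genes <- function(gene_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "go_id", "name_1006", "namespace_1003"),
    filters    = "ensembl_gene_id",  # restrict the query to the listed genes
    values     = gene_ids,
    mart       = mart
  )
}

# Usage (needs a network connection, so it is left commented out):
# go_tbl <- fetch_go_for_genes(c("ENSG00000139618"), mart)
```

Requesting only gene-level and GO attributes also avoids joining against transcripts, exons, and peptides, which is where most of the replication comes from.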

written 10 days ago by James W. MacDonald

@James W. MacDonald Basically, I am trying to do gene onthology. There are 100 packages, but I prefer to retrieve the data from UniProt or Ensembl.

written 10 days ago by Bioinformatics

What is gene onthology?

written 10 days ago by James W. MacDonald

@James W. MacDonald gene ontology means you can find various information from any given gene. Look at http://www.geneontology.org 

 

written 10 days ago by Bioinformatics

Oh, you mean gene ontology, not onthology. Fair enough. And if you prefer to do it your own way, rather than using the existing packages to do so, I guess you should have at it. But do note that doing things 'your own way' implies that you A) know what you are doing, and B) have it handled. So good luck with that!

written 10 days ago by James W. MacDonald