Basically, I want to extract all attributes for several genes.
When I use the following as an example, I get an error. Would anyone know why?
hsapiens_inf <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','ensembl_peptide_id','ensembl_exon_id',
+ 'description','chromosome_name','start_position','end_position','strand','band',
+ 'transcript_start','transcript_end','external_gene_id','external_transcript_id',
+ 'external_gene_db','transcript_db_name','transcript_count',
+ 'percentage_gc_content','gene_biotype','transcript_biotype','source',
+ 'transcript_source','status,transcript_status','phenotype_description',
+ 'source_name','study_external_id','go_id','name_1006','definition_1006',
+ 'go_linkage_type','namespace_1003','goslim_goa_accession','goslim_goa_description',
+ 'arrayexpress','chembl'),mart = mart)
Error in getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", :
Invalid attribute(s): external_gene_id, external_transcript_id, external_gene_db, transcript_db_name, status,transcript_status
Please use the function 'listAttributes' to get valid attribute names
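One quick way to see which names getBM is rejecting is to compare the requested vector against the name column that listAttributes returns. The block below is only a sketch: find_invalid_attributes is a hypothetical helper, and the `available` vector is a small hand-written stand-in so the code runs without a Biomart connection. With a real mart object you would pass listAttributes(mart)$name instead. Note that 'status,transcript_status' in the call above is a single string containing a comma, so it is looked up as one attribute name rather than two.

```r
# Hypothetical helper: return the requested attributes that are not available.
find_invalid_attributes <- function(wanted, available) {
  setdiff(wanted, available)
}

# Stand-in for listAttributes(mart)$name (assumed subset, illustration only).
available <- c("ensembl_gene_id", "ensembl_transcript_id",
               "status", "transcript_status")

# Requested names, including one fused string copied from the call above.
wanted <- c("ensembl_gene_id", "external_gene_id", "status,transcript_status")

invalid <- find_invalid_attributes(wanted, available)
print(invalid)
```

Anything that comes back from the helper is a name the mart does not recognise as written, whether because it is absent from that mart or because of a typo such as a misplaced quote.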
@James W. MacDonald
I just made the query smaller to show that I have tried a lot, but I cannot figure out what the problem is. For example:
hsapiens_inf <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id',
+ 'ensembl_peptide_id','ensembl_exon_id'),mart = hsapiens6)
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: Connection timed out after 10003 milliseconds
Or this one
hsapiens_inf <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id",
+ "ensembl_peptide_id","ensembl_exon_id"),mart = hsapiens6)
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: Connection timed out after 10003 milliseconds
I have checked listAttributes and it seems to be OK. Would you please tell me what the problem is?
Sure. You are asking for a metric ton of data, with an arbitrarily large amount of replication, and evidently the Biomart server is taking longer than curl_fetch_memory is willing to wait. I did get it to go, and I wonder what you plan to do with a data.frame with almost 1.4 million rows?
Do note that the Biomart server is going to return a fully normalized table that is joined across each of the attributes you are requesting. So if a gene has two transcripts, you get two rows. And if the transcripts have, say, three exons each, you now get six rows. And if there are different proteins in there, you get more rows still. All for one gene.
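The row multiplication described above can be reproduced with a toy join in base R. This is a minimal sketch with made-up IDs (geneA, tx1, exon1 and so on are not real Ensembl identifiers), and it uses merge as a stand-in for whatever join the server performs internally.

```r
# Toy tables with made-up IDs (not real Ensembl data).
transcripts <- data.frame(
  gene       = "geneA",
  transcript = c("tx1", "tx2")
)
exons <- data.frame(
  transcript = rep(c("tx1", "tx2"), each = 3),
  exon       = paste0("exon", 1:6)
)

# Joining gene -> transcript -> exon yields one row per combination.
joined <- merge(transcripts, exons, by = "transcript")
print(nrow(joined))  # 6: two transcripts times three exons each
```

Every additional one-to-many attribute (peptides, GO terms, phenotypes) multiplies the count again in the same way.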
Here's the worst of it:
So you have one gene that takes up 2136 rows of the data.frame! That's legit. But of what use is that?
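If only several genes are of interest, as the original question says, one option is to restrict the query with the filters and values arguments of getBM and to send the IDs in small batches, so that each request returns far fewer rows. This is only a sketch: split_into_batches is a hypothetical helper, the gene IDs and the batch size are placeholders, and the getBM call is left commented out because it needs a live mart object.

```r
# Hypothetical helper: split a vector of IDs into batches of at most `size`.
split_into_batches <- function(ids, size) {
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Placeholder IDs for illustration only (not real Ensembl gene IDs).
gene_ids <- paste0("ENSG_placeholder_", 1:5)

batches <- split_into_batches(gene_ids, size = 2)
print(lengths(batches))

# With a live connection, each batch could be queried separately and the
# results stacked (assumes `mart` was created with useMart or useEnsembl):
# results <- lapply(batches, function(b) {
#   getBM(attributes = c("ensembl_gene_id", "go_id", "name_1006"),
#         filters = "ensembl_gene_id", values = b, mart = mart)
# })
# hsapiens_inf <- do.call(rbind, results)
```

Asking for only the attributes that are actually needed keeps the joined table small as well.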
Perhaps it would be better for you to say what you are trying to do, and then maybe somebody could offer a suggestion.
@James W. MacDonald basically I am trying to do gene onthology. There are 100 packages, but I prefer to retrieve data from UniProt or Ensembl.
What is gene onthology?
@James W. MacDonald gene ontology means you can find various information from any given gene. Look at http://www.geneontology.org
Oh, you mean gene ontology, not onthology. Fair enough. And if you prefer to do it your own way, rather than using the existing packages to do so, I guess you should have at it. But do note that doing things 'your own way' implies that you A) know what you are doing, and B) have it handled. So good luck with that!