Blastp - through R
2
0
Entering edit mode
@maahpishanu-9664
Last seen 8.2 years ago

Hi,

I'm supposed to do reciprocal best hit using blastp for a long list of proteins for which I'm given only the UniProtKB/TrEMBL identifier. I have to find the ortholog proteins for each of the given proteins. I also have to find orthologs in a given list of species.

Previously, when I had only one Protein, I would do it through online tool in NCBI, there I can give the identifier and also limit the species. However, with more proteins in hand, this is not feasible anymore.

I downloaded Blast, but I cannot download any databases (memory problem, and I need refseq_protein database).

I'm familiar with R and would like to do it in R, but sending online queries to the NCBI blastp.

I searched a lot and was not able to find hint.

I installed orthologr, biomaRt and some other packages, but they only used local databases.

I would appreciate it a lot if somebody could help me how I can send online query through R. I have an R code and somewhere in the code I need to call blastp to find orthologs of my proteins among the given list of species.

 

Thanks and looking forward.

Maah

 

 

blast R • 5.5k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 days ago
United States

blastSequences in the annotate package might be a partial solution. For some protein sequence query,

library(annotate)
res = blastSequences(query, "nr", program="blastp", as="XML")

The return value is XML, and would need to be processed using xpath syntax. For instance, with

> query
[1] "QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHPFLFLIKHNPTNTIVYFGRYWSP"

we have

> sapply(res["//Hit_id"], xmlValue)
 [1] "gi|129295|sp|P01013.1|OVALX_CHICK" "gi|448824824|ref|NP_001263315.1|" 
 [3] "gi|971382511|ref|XP_015137661.1|"  "gi|971382509|ref|XP_015137660.1|" 
 [5] "gi|971382507|ref|XP_015137659.1|"  "gi|733881413|ref|XP_010706721.1|" 
 [7] "gi|385145541|dbj|BAM13279.1|"      "gi|71897377|ref|NP_001026172.1|"  
 [9] "gi|733881415|ref|XP_010706722.1|"  "gi|971382505|ref|XP_015137657.1|" 
> sapply(res["//Hit_len"], xmlValue)
 [1] "232" "388" "397" "402" "456" "402" "388" "388" "388" "388"
> 

 

ADD COMMENT
0
Entering edit mode

Thanks a lot. Is there anyway to give NCBI accession number (eg: ''O48946") as an input instead of the sequence?

ADD REPLY
0
Entering edit mode

Use the accession number(s) as the query

> res = blastSequences("O48946", "nr", program="blastp", as="XML")
> sapply(res["//Hit_id"], xmlValue)
 [1] "gi|15236786|ref|NP_194967.1|"     "gi|297798722|ref|XP_002867245.1|"
 [3] "gi|565449881|ref|XP_006285557.1|" "gi|727512843|ref|XP_010432725.1|"
 [5] "gi|727549104|ref|XP_010447394.1|" "gi|567217696|ref|XP_006412477.1|"
 [7] "gi|674237085|gb|KFK29850.1|"      "gi|922532521|ref|XP_013597875.1|"
 [9] "gi|674960897|emb|CDX72357.1|"     "gi|674919795|emb|CDY13484.1|"    
> res[["//BlastOutput_query-ID/text()"]]
gi|73917709|sp|O48946.1|CESA1_ARATH 
> res[["//BlastOutput_query-ID/text()"]]
gi|73917709|sp|O48946.1|CESA1_ARATH 
ADD REPLY
0
Entering edit mode
@maahpishanu-9664
Last seen 8.2 years ago

I found that I can send my query like the following through R:

 

system2('blastp',c('-remote', "-best_hit_overhang", "0.1", "-best_hit_score_edge", "0.1",'-db', "'refseq_protein'" ,'-entrez_query', "prf||O48946"),stdout=TRUE)

However, I get the following error:

 

[1] "Error: (301.23) [CONN_Read(blast4/HTTP; http://www.ncbi.nlm.nih.gov/Service/dispd.cgi?service=blast4&address=W7DELL902035.mpimp-golm.mpg.de&platform=i386-pc-win64)]  Unable to read data: Timeout[30.000000]"
[2] "Warning: Error initializing remote BLAST database data loader: Protein BLAST database ''refseq_protein'' does not exist in the NCBI servers"                                                                  
[3] "Command line argument error: Query is Empty!"                                                                                                                                                                 
attr(,"status")
[1] 1
Warning message:
running command '"blastp" -remote -best_hit_overhang 0.1 -best_hit_score_edge 0.1 -db 'refseq_protein' -entrez_query prf||O48946' had status 1 

Could anybody help me with this?

ADD COMMENT
0
Entering edit mode

Don't double-quote the database name, "'refseq_protein'", just "refseq_protein".

ADD REPLY

Login before adding your answer.

Traffic: 950 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6