Search
Question: Blast analysis of two sequences in R
0
gravatar for prabhakar ghorpade
3.6 years ago by
prabhakar ghorpade100 wrote:
Hello,       I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >90% identity? Sequences >1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >2 CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences.  Thanks. Dr. Ghorpade Prabhakar B. PhD Scholar ( Veterinary Biochemistry), IVRI, Izatnagar, Bareilly, U.P., India [[alternative HTML version deleted]]
ADD COMMENTlink modified 3.6 years ago by Martin Morgan ♦♦ 20k • written 3.6 years ago by prabhakar ghorpade100
1
gravatar for Martin Morgan
3.6 years ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:
On 04/21/2014 03:51 AM, prabhakar ghorpade wrote: > Hello, > I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >90% identity? > > Sequences >> 1 > CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >> 2 > CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA The 'annotate' package has 'blastSequences'; I'm not sure that it's useful enough for your purposes. In the 'devel' branch (see http://bioconductor.org/developers/how-to/useDevel/ it has been updated to be more responsive and to return richer data, e.g., df = blastSequences("CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC", timeout=40, as="data.frame") > head(df, 1) Hit_num Hit_id 1 1 gi|380719094|gb|JQ281544.1| Hit_def Hit_accession Hit_len Hsp_num 1 Expression vector pAV-UCSF, complete sequence JQ281544 11534 1 Hsp_bit-score Hsp_score Hsp_evalue Hsp_query-from Hsp_query-to Hsp_hit-from 1 82.4379 90 1.2063e-13 1 45 2126 Hsp_hit-to Hsp_query-frame Hsp_hit-frame Hsp_identity Hsp_positive Hsp_gaps 1 2170 1 1 45 45 0 Hsp_align-len Hsp_qseq 1 45 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC Hsp_hseq 1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC Hsp_midline 1 ||||||||||||||||||||||||||||||||||||||||||||| > Hope that helps; would be happy to hear of other R solutions. Martin > > > Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences. > Thanks. > > > Dr. Ghorpade Prabhakar B. > PhD Scholar ( Veterinary Biochemistry), > IVRI, > Izatnagar, > Bareilly, U.P., > India > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENTlink written 3.6 years ago by Martin Morgan ♦♦ 20k
Hi, I tend in cases like this to shirk wrappers and stay closer to the command line usage of tools such as blast. Below is a generic example of run NCBI blastn (part of blast+ package) under R, and post filter the results. The approach should work with minor changes if you have not upgraded to NCBI's blast+. ~Malcolm s<- ## The sequences to be blasted and their fasta 'deflines' c('>s1','CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC' ,'>s2','CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA' ) blast.f6<- ## The fields you want back from blast. c.f. `blastn -help` c('qseqid', 'sseqid', 'pident', 'qcovs') blast.out<- ## The system call to blastn system2('blastn',c('-db',"'nt'" ,'-outfmt',sprintf('"6 %s"',paste(collapse=' ',blast.f6)) , '-perc_identity',"'.90'" ,'-num_threads', 15 # use 'em if you got 'em ! ) ,input=s ,stdout=TRUE ) blast.out.df<- ## parse blast.out as a table and assign names to it `names<-`(read.table(sep='\t',textConnection(b)),blast.f6) # Query the data frame head(blast.out.df[with(b.df,pident>=90 & qcovs>95),],3) qseqid sseqid pident qcovs 1 s1 gi|380719094|gb|JQ281544.1| 100 100 2 s1 gi|161015434|gb|EU143287.2| 100 100 3 s1 gi|161015430|gb|EU143282.2| 100 100 ~ Malcolm >-----Original Message----- >From: bioconductor-bounces at r-project.org [mailto:bioconductor- bounces at r-project.org] On Behalf Of Martin Morgan >Sent: Wednesday, April 23, 2014 12:12 PM >To: prabhakar ghorpade; bioconductor at r-project.org >Subject: Re: [BioC] Blast analysis of two sequences in R > >On 04/21/2014 03:51 AM, prabhakar ghorpade wrote: >> Hello, >> I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >>90% identity? >> >> Sequences >>> 1 >> CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >>> 2 >> CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA > >The 'annotate' package has 'blastSequences'; I'm not sure that it's useful >enough for your purposes. In the 'devel' branch (see > > http://bioconductor.org/developers/how-to/useDevel/ > >it has been updated to be more responsive and to return richer data, e.g., > > df = blastSequences("CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC", > timeout=40, as="data.frame") > > > head(df, 1) > Hit_num Hit_id >1 1 gi|380719094|gb|JQ281544.1| > Hit_def Hit_accession Hit_len Hsp_num >1 Expression vector pAV-UCSF, complete sequence JQ281544 11534 1 > Hsp_bit-score Hsp_score Hsp_evalue Hsp_query-from Hsp_query-to Hsp_hit-from >1 82.4379 90 1.2063e-13 1 45 2126 > Hsp_hit-to Hsp_query-frame Hsp_hit-frame Hsp_identity Hsp_positive Hsp_gaps >1 2170 1 1 45 45 0 > Hsp_align-len Hsp_qseq >1 45 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC > Hsp_hseq >1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC > Hsp_midline >1 ||||||||||||||||||||||||||||||||||||||||||||| > > > >Hope that helps; would be happy to hear of other R solutions. > >Martin > >> >> >> Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% >identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences. >> Thanks. >> >> >> Dr. Ghorpade Prabhakar B. >> PhD Scholar ( Veterinary Biochemistry), >> IVRI, >> Izatnagar, >> Bareilly, U.P., >> India >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > >-- >Computational Biology / Fred Hutchinson Cancer Research Center >1100 Fairview Ave. N. >PO Box 19024 Seattle, WA 98109 > >Location: Arnold Building M1 B861 >Phone: (206) 667-2793 > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 3.6 years ago by Malcolm Cook1.4k

Hi Malcolm,

 

Would you mind please give me a reference where I could learn the NCBI query in R? I'm really interested to learn but unfortunately couldn't find the good reference yet.

 

Thanks a lot and looking forward.

Maah

ADD REPLYlink written 21 months ago by maahpishanu0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 232 users visited in the last hour