Blast analysis of two sequences in R

0

Entering edit mode

prabhakar ghorpade ▴ 100

@prabhakar-ghorpade-6464

Last seen 11.5 years ago

Hello, I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >90% identity? Sequences >1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >2 CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences. Thanks. Dr. Ghorpade Prabhakar B. PhD Scholar ( Veterinary Biochemistry), IVRI, Izatnagar, Bareilly, U.P., India [[alternative HTML version deleted]]

Coverage Coverage • 4.8k views

ADD COMMENT • link updated 11.8 years ago by Martin Morgan 25k • written 11.8 years ago by prabhakar ghorpade ▴ 100

1

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 23 days ago

United States

On 04/21/2014 03:51 AM, prabhakar ghorpade wrote: > Hello, > I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >90% identity? > > Sequences >> 1 > CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >> 2 > CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA The 'annotate' package has 'blastSequences'; I'm not sure that it's useful enough for your purposes. In the 'devel' branch (see http://bioconductor.org/developers/how-to/useDevel/ it has been updated to be more responsive and to return richer data, e.g., df = blastSequences("CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC", timeout=40, as="data.frame") > head(df, 1) Hit_num Hit_id 1 1 gi|380719094|gb|JQ281544.1| Hit_def Hit_accession Hit_len Hsp_num 1 Expression vector pAV-UCSF, complete sequence JQ281544 11534 1 Hsp_bit-score Hsp_score Hsp_evalue Hsp_query-from Hsp_query-to Hsp_hit-from 1 82.4379 90 1.2063e-13 1 45 2126 Hsp_hit-to Hsp_query-frame Hsp_hit-frame Hsp_identity Hsp_positive Hsp_gaps 1 2170 1 1 45 45 0 Hsp_align-len Hsp_qseq 1 45 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC Hsp_hseq 1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC Hsp_midline 1 ||||||||||||||||||||||||||||||||||||||||||||| > Hope that helps; would be happy to hear of other R solutions. Martin > > > Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences. > Thanks. > > > Dr. Ghorpade Prabhakar B. > PhD Scholar ( Veterinary Biochemistry), > IVRI, > Izatnagar, > Bareilly, U.P., > India > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

ADD COMMENT • link 11.8 years ago Martin Morgan 25k

0

Entering edit mode

Hi, I tend in cases like this to shirk wrappers and stay closer to the command line usage of tools such as blast. Below is a generic example of run NCBI blastn (part of blast+ package) under R, and post filter the results. The approach should work with minor changes if you have not upgraded to NCBI's blast+. ~Malcolm s<- ## The sequences to be blasted and their fasta 'deflines' c('>s1','CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC' ,'>s2','CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA' ) blast.f6<- ## The fields you want back from blast. c.f. `blastn -help` c('qseqid', 'sseqid', 'pident', 'qcovs') blast.out<- ## The system call to blastn system2('blastn',c('-db',"'nt'" ,'-outfmt',sprintf('"6 %s"',paste(collapse=' ',blast.f6)) , '-perc_identity',"'.90'" ,'-num_threads', 15 # use 'em if you got 'em ! ) ,input=s ,stdout=TRUE ) blast.out.df<- ## parse blast.out as a table and assign names to it `names<-`(read.table(sep='\t',textConnection(b)),blast.f6) # Query the data frame head(blast.out.df[with(b.df,pident>=90 & qcovs>95),],3) qseqid sseqid pident qcovs 1 s1 gi|380719094|gb|JQ281544.1| 100 100 2 s1 gi|161015434|gb|EU143287.2| 100 100 3 s1 gi|161015430|gb|EU143282.2| 100 100 ~ Malcolm >-----Original Message----- >From: bioconductor-bounces at r-project.org [mailto:bioconductor- bounces at r-project.org] On Behalf Of Martin Morgan >Sent: Wednesday, April 23, 2014 12:12 PM >To: prabhakar ghorpade; bioconductor at r-project.org >Subject: Re: [BioC] Blast analysis of two sequences in R > >On 04/21/2014 03:51 AM, prabhakar ghorpade wrote: >> Hello, >> I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >>90% identity? >> >> Sequences >>> 1 >> CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >>> 2 >> CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA > >The 'annotate' package has 'blastSequences'; I'm not sure that it's useful >enough for your purposes. In the 'devel' branch (see > > http://bioconductor.org/developers/how-to/useDevel/ > >it has been updated to be more responsive and to return richer data, e.g., > > df = blastSequences("CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC", > timeout=40, as="data.frame") > > > head(df, 1) > Hit_num Hit_id >1 1 gi|380719094|gb|JQ281544.1| > Hit_def Hit_accession Hit_len Hsp_num >1 Expression vector pAV-UCSF, complete sequence JQ281544 11534 1 > Hsp_bit-score Hsp_score Hsp_evalue Hsp_query-from Hsp_query-to Hsp_hit-from >1 82.4379 90 1.2063e-13 1 45 2126 > Hsp_hit-to Hsp_query-frame Hsp_hit-frame Hsp_identity Hsp_positive Hsp_gaps >1 2170 1 1 45 45 0 > Hsp_align-len Hsp_qseq >1 45 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC > Hsp_hseq >1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC > Hsp_midline >1 ||||||||||||||||||||||||||||||||||||||||||||| > > > >Hope that helps; would be happy to hear of other R solutions. > >Martin > >> >> >> Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% >identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences. >> Thanks. >> >> >> Dr. Ghorpade Prabhakar B. >> PhD Scholar ( Veterinary Biochemistry), >> IVRI, >> Izatnagar, >> Bareilly, U.P., >> India >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > >-- >Computational Biology / Fred Hutchinson Cancer Research Center >1100 Fairview Ave. N. >PO Box 19024 Seattle, WA 98109 > >Location: Arnold Building M1 B861 >Phone: (206) 667-2793 > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.8 years ago Malcolm Cook ★ 1.6k

0

Entering edit mode

Hi Malcolm,

Would you mind please give me a reference where I could learn the NCBI query in R? I'm really interested to learn but unfortunately couldn't find the good reference yet.

Thanks a lot and looking forward.

Maah

ADD REPLY • link 10.0 years ago maahpishanu • 0

Login before adding your answer.