Automated blasting of short nucleotide sequences against each other
Ken Termiso
@ken-termiso-1087
Hi all,

This may be slightly off-topic, but I'd like to be able to BLAST a large set of about 500 nucleotide sequences against itself (i.e. sequence #1 gets blasted against the other 499 sequences and so on, for a total of 500x500 or 250,000 blasts). Unbelievably, the one thing I cannot find by googling is a script to do it. Rather than writing one, I was hoping that someone could point me to a link for this. I found tons of scripts for doing it against a database, but nothing for the all-against-all matrix I need.

My sequences are in plain text. I've got the standalone BLAST, but just need a script.

Presumably this would be very useful for analyzing pseudo-homologous probe sequences, so maybe it isn't completely off-topic.

Thanks in advance,
Ken
@sean-davis-490
Ken,

Actually, if you think about how blast (or blat, or other alignment programs) works, you just need to blast the fasta file against a blast database built from the same sequences. The output will include each sequence blasted against all the others, with the obvious caveat that not all sequences are going to align, so some comparisons will be missing; there is no way around that. Then you just need to put the results into some useful form. Consider using bioperl if you have access to it.

Better yet, if these are sequences from the same organism, just use blat: its output is tab-delimited text, which you can load directly into R. If you use blat, you can just do:

    blat db.fasta db.fasta outfile.psl

This should take just a few seconds on a modern machine, depending on the length of the sequences.

Sean

On Feb 18, 2005, at 2:41 PM, Ken Termiso wrote:
> Hi all,
> This may be slightly off-topic, but I'd like to be able to BLAST a large set of about 500 nucleotide sequences against itself [...]
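As a rough illustration of "putting the output into some useful form", here is a minimal Python sketch that reads the tab-delimited PSL text blat writes and collects the best match count for every query/target pair. It assumes the standard 21-column PSL layout (match count in column 1, query name in column 10, target name in column 14). The function names `parse_psl`, `best_hit_matrix` and `missing_pairs` are made up for this example and are not part of blat or any library.

```python
from collections import defaultdict

# Column positions in a PSL line (0-based), assuming the standard 21-column layout.
MATCHES, Q_NAME, T_NAME = 0, 9, 13


def parse_psl(lines):
    """Yield (query, target, matches) for each alignment line in PSL text."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header block and anything malformed: data lines have
        # at least 21 columns and start with an integer match count.
        if len(fields) < 21 or not fields[MATCHES].isdigit():
            continue
        yield fields[Q_NAME], fields[T_NAME], int(fields[MATCHES])


def best_hit_matrix(lines, skip_self=True):
    """Return {query: {target: best match count}} over all reported pairs."""
    matrix = defaultdict(dict)
    for query, target, matches in parse_psl(lines):
        if skip_self and query == target:
            continue
        # Keep the strongest alignment when a pair is reported more than once.
        if matches > matrix[query].get(target, -1):
            matrix[query][target] = matches
    return {q: dict(t) for q, t in matrix.items()}


def missing_pairs(matrix, names):
    """List ordered (query, target) pairs for which no alignment was reported."""
    return [(q, t) for q in names for t in names
            if q != t and t not in matrix.get(q, {})]
```

Something like `best_hit_matrix(open("outfile.psl"))` would then give the sparse all-against-all table, and `missing_pairs` makes the "missing comparisons" explicit instead of leaving them as silent gaps.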
@david-lapointe-225
Well, you only need one run: a query of the 500 short seqs against a database of the same 500 short seqs. See http://www.oreillynet.com/pub/a/oreilly/bio/news/BLAST.html for a great example. (Blasting the Y. pestis proteome against the E. coli proteome, ~4000 seqs each, takes about 3-4 min on my laptop.)

BLASTCLUST might also do what you need.

David

On Friday 18 February 2005 02:41 pm, Ken Termiso wrote:
> Hi all,
> This may be slightly off-topic, but I'd like to be able to BLAST a large set of about 500 nucleotide sequences against itself [...]
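The "one run" idea can be sketched in Python roughly as follows. Note the assumptions: this uses the newer BLAST+ command-line tools (`makeblastdb` and `blastn`) rather than the standalone BLAST of the time, the `blastn-short` task is a guess that the sequences are short probes, and the file names and the helper names `all_vs_all_commands` and `run_all_vs_all` are hypothetical.

```python
import shutil
import subprocess


def all_vs_all_commands(fasta, out_path="hits.tsv"):
    """Build the two BLAST+ command lines for a single all-vs-all run."""
    # Step 1: turn the FASTA file into a nucleotide BLAST database.
    make_db = ["makeblastdb", "-in", fasta, "-dbtype", "nucl"]
    # Step 2: search the same FASTA file against that database.
    search = [
        "blastn",
        "-task", "blastn-short",  # assumption: queries are short probes
        "-query", fasta,
        "-db", fasta,
        "-outfmt", "6",           # tab-delimited, one line per hit
        "-out", out_path,
    ]
    return [make_db, search]


def run_all_vs_all(fasta, out_path="hits.tsv"):
    """Run both steps, failing early if BLAST+ is not on the PATH."""
    commands = all_vs_all_commands(fasta, out_path)
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return out_path
```

Calling `run_all_vs_all("probes.fasta")` would then replace the 250,000 separate blasts with one database build and one search. Self-hits (each sequence against itself) will appear in the output and have to be filtered afterwards.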