Batch sequence retrieval
Daniel Brewer
@daniel-brewer-1791
Hi all,

I am in a situation where I would like to download all the sequences associated with human IMAGE clones and then BLAST them against a range of other sequences (~3.4 million). I have the accession numbers for all of them. I have tried a number of ways to do this:

1) Search for "IMAGE: homo sapiens" and download the FASTA sequences. This fails after a while for no apparent reason.
2) A script using getSeq from the annotate library. This is very slow, but is chugging away.
3) The Batch Entrez utility. There seems to be a problem with the link at the moment.

Does anyone have suggestions for a better way to do this? Does GenBank allow SQL access?

Many thanks,
Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Email: daniel.brewer at icr.ac.uk
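Options 2 and 3 both come down to fetching records by accession in bulk. One common way to cut the number of round trips is to send many IDs per request to NCBI's E-utilities `efetch` service instead of one request per accession. The sketch below makes several assumptions. It uses the current `eutils.ncbi.nlm.nih.gov` endpoint with `db=nuccore` and `rettype=fasta`. It sends batches of 200 IDs by HTTP POST. It pauses between calls to stay under roughly 3 requests per second, which is NCBI's stated limit without an API key. The function names are hypothetical, and batch size and delay should be checked against NCBI's current usage guidelines.

```python
import time
import urllib.parse
import urllib.request

# Assumed endpoint: NCBI E-utilities efetch (current URL).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(ids, size=200):
    """Yield successive lists of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_params(id_batch):
    """Build the form parameters for one FASTA efetch request."""
    return {
        "db": "nuccore",           # nucleotide records (ESTs included)
        "id": ",".join(id_batch),  # comma-separated accessions or GIs
        "rettype": "fasta",
        "retmode": "text",
    }


def fetch_fasta(ids, out_path, size=200, delay=0.34):
    """Download FASTA for all `ids`, one POST request per batch, into out_path."""
    with open(out_path, "w") as out:
        for id_batch in batches(ids, size):
            # Supplying `data` makes urlopen issue a POST rather than a GET.
            data = urllib.parse.urlencode(efetch_params(id_batch)).encode("ascii")
            with urllib.request.urlopen(EFETCH_URL, data=data) as resp:
                out.write(resp.read().decode("utf-8"))
            time.sleep(delay)  # stay under ~3 requests/second
```

Usage would look like `fetch_fasta(accessions, "image_clones.fa")`, where `accessions` is a list of accession strings read from your own file. The run is network-bound, so for millions of IDs it is still slow, and it may need retry logic that this sketch omits.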
@sean-davis-490
Daniel Brewer wrote:
> Has anyone got any suggestions of a better way to do this? Does GenBank
> allow SQL access?

GenBank is not stored in an SQL database. The closest NCBI gets to programmatic access is E-utilities (Eutils).

Have you considered downloading the appropriate BLAST database and then limiting the search by GI number? This technique is made for doing exactly what you are describing. You simply need a file of the GI numbers associated with your sequences, and you can then use formatdb to create a custom BLAST database.

Sean
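The GI-list idea is to keep only the records you care about from a database you already have locally, instead of downloading them one by one. In practice the BLAST tools (formatdb and its successors) do this restriction themselves. The minimal Python sketch below only illustrates the same filtering step on a FASTA file. It assumes legacy NCBI-style headers of the form `>gi|<number>|gb|<accession>|...` and a GI list with one number per line. The function names are hypothetical. The filtered FASTA could then be formatted into a custom BLAST database.

```python
def read_gi_list(lines):
    """Collect GI numbers, one per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


def header_gi(header):
    """Return the GI from a legacy '>gi|12345|gb|ACC|...' header, or None."""
    fields = header.lstrip(">").split("|")
    if len(fields) >= 2 and fields[0] == "gi":
        return fields[1]
    return None


def filter_fasta(fasta_lines, wanted):
    """Yield only the FASTA lines belonging to records whose GI is in `wanted`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # A new record starts here: decide whether to keep it.
            keep = header_gi(line) in wanted
        if keep:
            yield line
```

Because `filter_fasta` is a generator over lines, it can stream a multi-gigabyte FASTA file opened with `open(...)` without loading it into memory. Only the GI set is held in memory.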