locate a target species in Refseq ftp directory
heyi xiao ▴ 360
@heyi-xiao-3308
Hi all,

I am trying to extract the RNA sequences for sheep (Ovis aries) from the RefSeq FTP site. The right directory should be vertebrate_mammalian: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/vertebrate_mammalian/

But there are so many *rna* files there, all named with numbers, like vertebrate_mammalian.154.rna.fna.gz, and I am not sure which one holds my target species. The README files don't really help with this. Does anyone know how to locate the right file for a target species there?

Heyi
@james-w-macdonald-5106
Hi Heyi,

ftp://ftp.ncbi.nih.gov/refseq/release/release-notes/RefSeq-release61.txt

And NCBI says 'Ha ha on you - it's not by species!' For example:

    zcat vertebrate_mammalian.1.1.genomic.fna.gz | grep \> | head
    >gi|62867015|ref|NT_112066.2|NT_112066 Callithrix jacchus genomic sequence, ENCODE region ENr231
    >gi|62871432|ref|NT_108597.2|NT_108597 Papio anubis genomic sequence, ENCODE region ENm002
    >gi|62903504|ref|NT_086517.2|NT_086517 Callithrix jacchus genomic sequence, ENCODE region ENm014
    >gi|62903506|ref|NT_113343.1|NT_113343 Dasypus novemcinctus genomic sequence, ENCODE region ENr231
    >gi|62946791|ref|NT_113349.1|NT_113349 Papio anubis genomic sequence, ENCODE region ENr323, part 2 of 2
    >gi|63025534|ref|NT_091694.3|NT_091694 Otolemur garnettii genomic sequence, ENCODE region ENm010
    >gi|63145882|ref|NT_106990.3|NT_106990 Otolemur garnettii genomic sequence, ENCODE region ENr322
    >gi|64724026|ref|NT_107822.2|NT_107822 Bos taurus genomic sequence, ENCODE region ENm002
    >gi|64724078|ref|NT_107825.2|NT_107825 Bos taurus genomic sequence, ENCODE region ENm003
    >gi|64724166|ref|NT_107827.2|NT_107827 Bos taurus genomic sequence, ENCODE region ENm004

Best,
Jim

James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
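Since the release files mix species, one option is to scan every *rna* file and keep only the records whose header names the species. Below is a minimal Python sketch of that idea. It assumes the species name appears verbatim in each FASTA header, as in the zcat output above; the function names `fasta_records` and `records_for_species` are made up for illustration and are not part of any NCBI or Bioconductor tool.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def fasta_records(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def records_for_species(
    paths: Iterable[str], species: str = "Ovis aries"
) -> Iterator[Tuple[str, str]]:
    """Yield records whose header mentions the species name (assumption:
    the binomial name is spelled out in the header text)."""
    for path in paths:
        for header, seq in fasta_records(path):
            if species in header:
                yield header, seq
```

To run it over a local copy of the directory, something like `records_for_species(glob.glob("vertebrate_mammalian.*.rna.fna.gz"))` should work. Note that a plain substring match also picks up headers that merely contain the name, such as a subspecies, so the hits are worth a quick check.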
Thanks Jim, for the hint. That's even worse; I will have to download and work on all the files now.

Heyi
Hi Heyi,

You could try the following link to the sheep sequencing consortium web site. You'll find links to GFF files there with known and predicted mRNAs, together with the latest draft assembly of the sheep genome sequence (plus thousands of unmapped scaffolds and contigs):

http://www.livestockgenomics.csiro.au/sheep/oar3.1.php

Hope this helps.

Dr David Iles
Visiting Fellow
School of Biology
University of Leeds
Leeds LS2 9JT
UK
d.e.iles at leeds.ac.uk
Hi Dr. Iles,

Thanks for the link. It is very helpful!

Heyi
Thanks Darin, that's very useful too!
@darin-takemoto-6175
Hi Heyi,

One way to get the RefSeq RNA sequences for a given species is to look up the species in the NCBI Taxonomy database and click through to the details page for the species of interest (for Ovis aries it is Taxonomy ID: 9940). Once there, look for the Entrez records table on the right and click on the Direct links entry for the Nucleotide database. Then use the Advanced link to filter to only include RefSeq sequences (choose Filter and select "nuccore pubmed refseq"). When you do this, here are the results:

http://www.ncbi.nlm.nih.gov/nuccore?term=%28txid9940[Organism%3Anoexp]%29%20AND%20%22nuccore%20pubmed%20refseq%22[Filter]

To get the sequences, click on "Send to:", select File, select FASTA (or whatever else you want) as Format, and click Create File.

Darin
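The same search could probably be scripted through the NCBI E-utilities instead of the web page. The Python sketch below only builds the request URLs and does not contact NCBI. It assumes the query term from the link above is still accepted by the Nucleotide database, which has not been checked here; the helper names `refseq_term`, `esearch_url` and `efetch_url` are hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def refseq_term(taxid: int) -> str:
    """Entrez query term copied from the link above (assumed still valid)."""
    return f'(txid{taxid}[Organism:noexp]) AND "nuccore pubmed refseq"[Filter]'


def esearch_url(taxid: int) -> str:
    """URL for an ESearch call that stores its hits on the history server."""
    params = {"db": "nuccore", "term": refseq_term(taxid), "usehistory": "y"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(web_env: str, query_key: str,
               retstart: int = 0, retmax: int = 500) -> str:
    """URL for an EFetch call that returns one batch of FASTA records.

    web_env and query_key are the WebEnv and QueryKey values reported in
    the ESearch response; retmax=500 is an arbitrary batch size.
    """
    params = {
        "db": "nuccore",
        "query_key": query_key,
        "WebEnv": web_env,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

For sheep, `esearch_url(9940)` gives the search request; the FASTA batches can then be fetched by stepping `retstart` through the hit count with `efetch_url`. The result set will contain whatever the filter matches, so it may need trimming to RNA records afterwards.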