Bioconductor Digest, Vol 95, Issue 7
Jack Zhu (@jack-zhu-3338)
Hi Mark and Sean,

As Sean mentioned, the NCBI SRA group has removed the fastq data files from their ftp site and now supplies sra or sra-lite data files for download instead. To deal with this significant change, I have modified the SRAdb package (in both the Bioconductor 2.7 release and the devel version):

1. Removed the functions listFastq, getFastqInfo and getFastq.
2. Added the functions listSRAfile, getSRAinfo and getSRAfile.
3. Modified the corresponding files to reflect the change.

Examples of the new functions:

library(SRAdb)
getSRAdbFile()
sra_dbname <- 'SRAmetadb.sqlite'
sra_con <- dbConnect(dbDriver("SQLite"), sra_dbname)

List the sra-lite data file names, including their ftp addresses, associated with "SRX000122":

> rs <- listSRAfile("SRX000122", sra_con = sra_con, sraType = "litesra")
> rs[1:2,]
  experiment sra
1 SRX000122 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra
2 SRX000122 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra

listSRAfile does not check the availability, size or date of the sra or sra-lite data files on the server. getSRAinfo does, which is useful to know if you are preparing to download them:

> rs <- getSRAinfo(in_acc = c("SRX000122"), sra_con = sra_con)
> rs[1:2, ]
  sra experiment size(KB)
1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra SRX000122 104
2 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra SRX000122 50536

Next, you might want to download the sra or sra-lite data files from the ftp site.
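The URL layout and the size(KB) column above are enough to plan a download before calling getSRAfile. The following is a minimal base-R sketch under stated assumptions: build_litesra_url() and plan_downloads() are hypothetical helpers, not SRAdb functions; the directory layout (first three, then first six characters of the experiment accession) is inferred only from the two example URLs; and the table is typed in by hand from the getSRAinfo output rather than queried from SRAmetadb.sqlite.

```r
# Hypothetical helpers, NOT part of SRAdb. They only mimic the
# listSRAfile()/getSRAinfo() output shown above so it can be checked offline.

# Rebuild an sra-lite FTP URL from experiment and run accessions, assuming the
# .../ByExp/litesra/<first 3 chars>/<first 6 chars>/<experiment>/<run>/ layout
# seen in the two example URLs.
build_litesra_url <- function(experiment, run,
    base = "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra") {
  paste(base,
        substr(experiment, 1, 3),   # e.g. "SRX"
        substr(experiment, 1, 6),   # e.g. "SRX000"
        experiment, run, paste0(run, ".lite.sra"), sep = "/")
}

# Given a getSRAinfo()-like table (URL in column "sra", size in "size(KB)"),
# report which files are not yet present in destdir and their total size in MB.
plan_downloads <- function(info, destdir = getwd()) {
  local_files <- file.path(destdir, basename(info$sra))
  todo <- info[!file.exists(local_files), , drop = FALSE]
  list(files = basename(todo$sra),
       total_mb = sum(todo[["size(KB)"]]) / 1024)
}

# Example table rebuilt from the two rows printed above for SRX000122.
info <- data.frame(
  sra = build_litesra_url("SRX000122", c("SRR000648", "SRR000649")),
  experiment = "SRX000122",
  "size(KB)" = c(104, 50536),
  check.names = FALSE, stringsAsFactors = FALSE
)

plan <- plan_downloads(info, destdir = tempfile())  # a fresh, empty location
plan$files     # "SRR000648.lite.sra" "SRR000649.lite.sra"
plan$total_mb  # roughly 49.45
```

With a real session you would pass the actual getSRAinfo() result as `info`, assuming its columns are named as printed above.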
The getSRAfile function will download all available sra or sra-lite data files associated with "SRR000648" and "SRR000657" from the NCBI SRA ftp site to a new folder in the current directory:

> getSRAfile(in_acc = c("SRR000648", "SRR000657"), sra_con = sra_con, destdir = getwd(), sraType = "litesra", method = 'curl')
Files are saved to: '/Users/zhujack/Documents/R'
100  103k  100  103k    0     0  57382      0  0:00:01  0:00:01 --:--:--  124k
100  154k  100  154k    0     0   132k      0  0:00:01  0:00:01 --:--:--  217k

Your suggestions will be greatly appreciated.

Jack

> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 5 Jan 2011 11:29:10 +0000
> From: Mark Dunning <mark.dunning at gmail.com>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] SRAdb listFastq error
> Message-ID: <aanlktikvwcofvccvgft4m_fy5wtu4gdp8te1au9vyclc at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> I am hoping to download some Fastq files from the Short Read Archive
> and am following the vignette for SRAdb. However, I get an error when
> trying the example of using listFastq
>
>> listFastq("SRA011804", sra_con = sra_con)
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>   Server denied you to change to the given directory
>
> I think I have set up the sra_con object correctly
>
>> sra_con
> <SQLiteConnection: DBI CON (2626, 2)>
>> dbListFields(sra_con, "study")
>  [1] "study_ID"             "study_alias"          "study_accession"
>  [4] "study_title"          "study_type"           "study_abstract"
>  [7] "center_name"          "center_project_name"  "project_id"
> [10] "study_description"    "study_url_link"       "study_entrez_link"
> [13] "study_attribute"      "submission_accession" "sradb_updated"
>>
>
> Cheers,
>
> Mark
>
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] SRAdb_1.4.0   graph_1.28.0  RSQLite_0.9-3 DBI_0.2-5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0  GEOquery_2.16.3 RCurl_1.4-3     tools_2.12.0
> [5] XML_3.2-0
>
> ------------------------------
>
> Message: 2
> Date: Wed, 5 Jan 2011 06:54:33 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> To: Mark Dunning <mark.dunning at gmail.com>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] SRAdb listFastq error
> Message-ID: <aanlktintg+cgh35_5zutbkqwnaobvwmlp-absq0b6cti at mail.gmail.com>
> Content-Type: text/plain
>
> Hi, Mark. Unfortunately, we are going to have to remove fastq access tools
> as NCBI has removed the fastq files:
>
> http://www.ncbi.nlm.nih.gov/books/NBK49286/#SRA_Usability_Chang.2_Static_fastq_dumps
>
> We are looking at workarounds for this, but for the time being, the
> functionality is broken and will not likely be retained in the same form.
>
> Sean
>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> ------------------------------
>
> Message: 3
> Date: Wed, 5 Jan 2011 08:13:28 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> To: Mark Dunning <mark.dunning at gmail.com>
> Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] SRAdb listFastq error
> Message-ID: <aanlktin7lzrfrndltpec4c-zvbnkhazcaugqtphrg+s0 at mail.gmail.com>
> Content-Type: text/plain
>
> On Wed, Jan 5, 2011 at 7:47 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
>
>> Hi Sean,
>>
>> That's a shame. Thanks for the information. I think I can get the
>> fastqs I need by another means though.
>>
>
> You don't have to work too hard at it. The process is described here:
>
> http://www.ncbi.nlm.nih.gov/books/NBK50846/#UsingToolKit_BK.3_Converting_SRA_format
>
> In short, you'll need the SRA SDK to do so.
>
> http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software
>
> Binaries are available for several architectures and OSes.
>
> Sean
>
>> Regards,
>>
>> Mark
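Sean's pointer to the SRA SDK covers the step of turning downloaded .sra or .lite.sra files into FASTQ. As a rough sketch, that conversion can be driven from R with system2(), assuming the SDK's fastq-dump executable is installed and on the PATH. fastq_dump_args() and run_fastq_dump() are hypothetical helpers, and only fastq-dump's -O (output directory) option is used; check `fastq-dump --help` for the options your SDK version supports.

```r
# Hypothetical helpers (not part of SRAdb): call the SRA Toolkit's fastq-dump
# on downloaded .sra / .lite.sra files. Assumes fastq-dump is installed and on
# the PATH; only the -O (output directory) option is used.

# Build the argument vector for a single input file.
fastq_dump_args <- function(sra_file, outdir = ".") {
  c("-O", outdir, sra_file)
}

# Convert each file in turn and return the exit status of every call.
run_fastq_dump <- function(sra_files, outdir = ".") {
  exe <- Sys.which("fastq-dump")
  if (!nzchar(exe)) {
    stop("fastq-dump not found on the PATH; install the SRA Toolkit first")
  }
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  status <- vapply(sra_files, function(f) {
    # system2() does not quote arguments, so shQuote() protects paths with spaces
    as.integer(system2(exe, shQuote(fastq_dump_args(f, outdir))))
  }, integer(1))
  if (any(status != 0)) {
    warning("fastq-dump failed for: ",
            paste(sra_files[status != 0], collapse = ", "))
  }
  status
}

# Usage (only works where fastq-dump is installed):
# run_fastq_dump(c("SRR000648.lite.sra", "SRR000657.lite.sra"), outdir = "fastq")
```

Keeping the argument construction in its own function makes it easy to inspect the exact command line before running anything.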