Jack Zhu
Hi Mark and Sean,
As Sean mentioned, the NCBI SRA group has removed fastq data files from
their ftp site and now supplies sra or sra-lite data files for
download instead. To deal with this significant change, I have
modified the SRAdb package (in both the 2.7 release and the dev version):
1. Removed the functions listFastq, getFastqInfo and getFastq
2. Added the functions listSRAfile, getSRAinfo and getSRAfile
3. Modified the corresponding files to reflect the change.
Examples of new functions:
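library(SRAdb)
# Download and unpack the SRAmetadb.sqlite file into the current directory
getSRAdbFile()
sra_dbname <- 'SRAmetadb.sqlite'
# Open a connection to the local SQLite metadata database
sra_con <- dbConnect(dbDriver("SQLite"), sra_dbname)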
List sra-lite data file names, including ftp addresses, associated with
"SRX000122":
> rs <- listSRAfile("SRX000122", sra_con = sra_con, sraType = "litesra")
> rs[1:2, ]
  experiment sra
1 SRX000122  ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra
2 SRX000122  ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra
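The two addresses in the output above appear to follow a regular
pattern: accession prefix, then prefix plus the first three digits, then
the experiment, then the run. Below is a minimal base R sketch of that
pattern. The helper name sra_lite_url is hypothetical and not part of
SRAdb, and the layout is only inferred from the two example URLs, so
treat it as an illustration rather than a description of how listSRAfile
works internally.

```r
# Hypothetical helper, not part of SRAdb: rebuild the sra-lite ftp URL
# from an experiment and run accession, following the pattern seen in
# the listSRAfile() output (an assumption based on two examples only).
sra_lite_url <- function(experiment, run,
                         base = "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra") {
  paste(base,
        substr(experiment, 1, 3),   # accession prefix, e.g. "SRX"
        substr(experiment, 1, 6),   # prefix plus first three digits, e.g. "SRX000"
        experiment,
        run,
        paste0(run, ".lite.sra"),   # file name is the run accession plus extension
        sep = "/")
}

# Vectorised over runs: returns one URL per run accession
urls <- sra_lite_url("SRX000122", c("SRR000648", "SRR000649"))
print(urls)
```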
The above function does not check the availability, size and date of
the sra or sra-lite data files on the server, but the function
getSRAinfo does, which is good to know if you are preparing to
download them:
> rs <- getSRAinfo(in_acc = c("SRX000122"), sra_con = sra_con)
> rs[1:2, ]
  sra experiment size(KB)
1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra SRX000122 104
2 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra SRX000122 50536
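One way to use the size information before downloading is to filter on
it. The sketch below works on a hand-built data frame that copies the
two rows printed above, so it runs without a database connection. The
1024 KB threshold is an arbitrary value chosen for illustration, and
the as.numeric() call is a precaution in case the size column comes
back as text (I have not checked which type getSRAinfo returns).

```r
# Hand-built stand-in for the getSRAinfo() result shown above
# (values copied from the printed output; not fetched from the server).
base_url <- "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122"
info <- data.frame(
  sra = c(paste(base_url, "SRR000648/SRR000648.lite.sra", sep = "/"),
          paste(base_url, "SRR000649/SRR000649.lite.sra", sep = "/")),
  experiment = c("SRX000122", "SRX000122"),
  `size(KB)` = c(104, 50536),
  check.names = FALSE,          # keep the "size(KB)" column name as is
  stringsAsFactors = FALSE
)

# Convert defensively in case the size column is character
size_kb <- as.numeric(info[["size(KB)"]])

# Hypothetical threshold: keep only files of at most 1 MB
max_kb <- 1024
small <- info[size_kb <= max_kb, ]

# File names that pass the filter, and the total size of everything in MB
print(basename(small$sra))
print(sum(size_kb) / 1024)
```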
Next, you might want to download the sra or sra-lite data files from the
ftp site. The getSRAfile function downloads all available sra or
sra-lite data files associated with "SRR000648" and "SRR000657" from
the NCBI SRA ftp site to a new folder in the current directory:
> getSRAfile(in_acc = c("SRR000648", "SRR000657"), sra_con = sra_con,
    destdir = getwd(), sraType = "litesra", method = 'curl')
Files are saved to: '/Users/zhujack/Documents/R'
100  103k  100  103k    0     0  57382      0  0:00:01  0:00:01 --:--:--  124k
100  154k  100  154k    0     0   132k      0  0:00:01  0:00:01 --:--:--  217k
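If you would rather drive the downloads yourself from a vector of ftp
addresses (for example the sra column returned by listSRAfile), a rough
base R sketch could look like the following. The names plan_downloads
and fetch_all are hypothetical and not part of SRAdb, and this is not
how getSRAfile is implemented. It only relies on download.file from the
utils package, and it skips files that already exist locally.

```r
# Hypothetical helper: pair each URL with a local destination path.
plan_downloads <- function(urls, destdir = getwd()) {
  data.frame(url = urls,
             destfile = file.path(destdir, basename(urls)),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: download every planned file that is not yet on disk.
fetch_all <- function(plan, method = "curl") {
  for (i in seq_len(nrow(plan))) {
    if (!file.exists(plan$destfile[i])) {
      # mode = "wb" because sra files are binary
      download.file(plan$url[i], plan$destfile[i], method = method, mode = "wb")
    }
  }
  invisible(plan$destfile)
}

# Build the plan for the first run listed earlier (no network access here)
plan <- plan_downloads(
  "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra",
  destdir = tempdir()
)
print(plan$destfile)

# fetch_all(plan)  # uncomment to actually download
```

Separating the plan from the fetch makes it easy to inspect the
destination paths before any transfer starts.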
Your suggestions will be greatly appreciated.
Jack
>
----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 5 Jan 2011 11:29:10 +0000
> From: Mark Dunning <mark.dunning at gmail.com>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] SRAdb listFastq error
> Message-ID:
>        <aanlktikvwcofvccvgft4m_fy5wtu4gdp8te1au9vyclc at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> I am hoping to download some Fastq files from the Short Read Archive
> and am following the vignette for SRAdb. However, I get an error when
> trying the example of using listFastq
>
> > listFastq("SRA011804", sra_con = sra_con)
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>   Server denied you to change to the given directory
>
> I think I have setup the sra_con object correctly
>
> > sra_con
> <SQLiteConnection: DBI CON (2626, 2)>
> > dbListFields(sra_con, "study")
>  [1] "study_ID"             "study_alias"          "study_accession"
>  [4] "study_title"          "study_type"           "study_abstract"
>  [7] "center_name"          "center_project_name"  "project_id"
> [10] "study_description"    "study_url_link"       "study_entrez_link"
> [13] "study_attribute"      "submission_accession" "sradb_updated"
>>
>
> Cheers,
>
> Mark
>
>
> > sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] SRAdb_1.4.0   graph_1.28.0  RSQLite_0.9-3 DBI_0.2-5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0  GEOquery_2.16.3 RCurl_1.4-3     tools_2.12.0
> [5] XML_3.2-0
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 5 Jan 2011 06:54:33 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> To: Mark Dunning <mark.dunning at gmail.com>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] SRAdb listFastq error
> Message-ID:
>        <aanlktintg+cgh35_5zutbkqwnaobvwmlp-absq0b6cti at mail.gmail.com>
> Content-Type: text/plain
>
> On Wed, Jan 5, 2011 at 6:29 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
>
>> Hi,
>>
>> I am hoping to download some Fastq files from the Short Read Archive
>> and am following the vignette for SRAdb. However, I get an error when
>> trying the example of using listFastq
>>
>> > listFastq("SRA011804", sra_con = sra_con)
>> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>>   Server denied you to change to the given directory
>>
>>
> Hi, Mark. Unfortunately, we are going to have to remove fastq access tools
> as NCBI has removed the fastq files:
>
> http://www.ncbi.nlm.nih.gov/books/NBK49286/#SRA_Usability_Chang.2_Static_fastq_dumps
>
> We are looking at workarounds for this, but for the time being, the
> functionality is broken and will not likely be retained in the same form.
>
> Sean
>
>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 5 Jan 2011 08:13:28 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> To: Mark Dunning <mark.dunning at gmail.com>
> Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] SRAdb listFastq error
> Message-ID:
>        <aanlktin7lzrfrndltpec4c-zvbnkhazcaugqtphrg+s0 at mail.gmail.com>
> Content-Type: text/plain
>
> On Wed, Jan 5, 2011 at 7:47 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
>
>> Hi Sean,
>>
>> That's a shame. Thanks for the information. I think I can get the
>> fastqs I need by another means though.
>>
>>
> You don't have to work too hard at it. The process is described here:
>
> http://www.ncbi.nlm.nih.gov/books/NBK50846/#UsingToolKit_BK.3_Converting_SRA_format
>
> In short, you'll need the SRA SDK to do so.
>
> http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software
>
> Binaries are available for several architectures and OSes.
>
> Sean
>
>
>
>> Regards,
>>
>> Mark
>>
>
>
>