I'm trying to download some ChIP-seq datasets from the NCBI Gene Expression Omnibus (GEO) using the fastq-dump command, like this:
fastq-dump SRR499718.lite.sra
and I got the following error message:
2016-09-27T16:33:36 fastq-dump.2.7 err: item not found while constructing within virtual database module - the path 'SRR499718.lite.sra' cannot be opened as database or table.
However, running the command with the bare accession, as follows,
fastq-dump SRR499718
worked fine.
When checking the GEO and SRA websites, I couldn't find any files with the extension .lite.sra. My questions are: 1) what is the difference between an SRR*.lite.sra file and an SRR*.sra file? 2) where can I download files with the extension .lite.sra?
Any suggestions will be highly appreciated.
By the way, the problem arose when I was practising the following ChIP-seq workflow:
source("http://bioconductor.org/workflows.R")
workflowInstall("chipseqDB")
sra.numbers <- c("SRR499718", "SRR499719", "SRR499720", "SRR499721",
    "SRR499734", "SRR499735", "SRR499736", "SRR499737", "SRR499738")
grouping <- c("proB-8113", "proB-8113", "proB-8108", "proB-8108",
    "matureB-8059", "matureB-8059", "matureB-8059", "matureB-8059", "matureB-8086")
all.sra <- paste0(sra.numbers, ".lite.sra")
data.frame(SRA=all.sra, Condition=grouping)
for (sra in all.sra) {
    code <- system(paste("fastq-dump", sra))
    stopifnot(code==0L)
}
After the for loop, I got the same error message as shown above.
Thank you so much. This explains everything.
I would also point out that downloading with Aspera is really the way to go. Running fastq-dump directly on an accession is probably much slower than running prefetch first (which will use Aspera if it is installed correctly) and then using fastq-dump to convert the downloaded run.
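For what it's worth, here is a rough sketch of that prefetch-then-convert approach. It assumes the SRA Toolkit (prefetch and fastq-dump) is on your PATH and that you pass bare accessions such as SRR499718 rather than .lite.sra file names. The function name fetch_and_dump and the DRY_RUN switch are made up for this example and are not part of the toolkit. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: download each run with prefetch, then convert it with fastq-dump.
# Assumes the SRA Toolkit (prefetch, fastq-dump) is installed and on PATH.
# DRY_RUN is a hypothetical switch for this example, not an SRA Toolkit option:
# it defaults to 1, so the commands are only printed. Set DRY_RUN=0 to run them.

fetch_and_dump() {
    for acc in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Show what would be executed without touching the network.
            echo "prefetch $acc"
            echo "fastq-dump $acc"
        else
            # Use the bare accession, not a .lite.sra file name.
            # Stop at the first accession that fails to download or convert.
            prefetch "$acc" && fastq-dump "$acc" || return 1
        fi
    done
}

# Two of the accessions from the question, as a dry run.
fetch_and_dump SRR499718 SRR499719
```

Running it with DRY_RUN=0 set in the environment would actually download and convert the runs. Whether prefetch uses Aspera depends on how your installation is configured, so treat any speed-up as something to check on your own setup.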