When I try to run the Rsubread commands subjunc or align in RStudio, I get the error "Input files have different amounts of reads".
subjunc (index="~/Documents/HarvardLincs/myindedx", readfile1="~/Documents/HarvardLincs/SRR120607501.fastq.gz", readfile2="~/Documents/HarvardLincs/SRR120607502.fastq.gz", outputfile="BT549.bam",nthreads = 8)
ERROR: two input files have different amounts of reads! The program has to terminate and no alignment results were generated!
Error in .load.delete.summary(output_file[i]) : Summary file BT549.bam.summary was not generated! The program terminated wrongly!
sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rsubread_2.2.2

loaded via a namespace (and not attached):
[1] compiler_4.0.1 Matrix_1.2-18 tools_4.0.1 tinytex_0.24 grid_4.0.1
[6] xfun_0.15 lattice_0.20-41
I get the same error message for several fastq files that I downloaded from GEO. Is it true that all these files have a data problem, or is this some other issue?
Thanks AB
Looking at the file names, it appears that you might have used fastq-dump to get these data? Or did you download the original-format files from Google or AWS and rename them? If the former, did you check that the conversion worked correctly? For example, did you run zcat <file> | wc -l on each of the two files and ensure that you get the same number of reads?
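The check suggested above can be sketched as a small shell helper. This is only an illustration, assuming each FASTQ record occupies exactly four lines; the function names count_reads and same_read_count are hypothetical and not part of Rsubread or the SRA toolkit. It uses gzip -cd rather than zcat because zcat on macOS expects a .Z suffix.

```shell
#!/bin/sh
# count_reads FILE
# Print the number of reads in a gzipped FASTQ file, assuming the usual
# four lines per record (header, sequence, "+", qualities).
count_reads() {
    lines=$(gzip -cd "$1" | wc -l)
    echo $((lines / 4))
}

# same_read_count FILE1 FILE2
# Print both read counts and return success only if they are equal.
same_read_count() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads"
    echo "$2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

After sourcing these functions, calling same_read_count with the two paired-end files as arguments prints both counts and returns a non-zero status when they differ.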
Yes, I used the SRA toolkit to download the data on a cluster. Then I copied it to my laptop to process it in RStudio.
I just ran the commands that you provided and this is what I get
[abano@sabine Harvard]$ zcat SRR120607501.fastq.gz | wc -l
101032420
[abano@sabine Harvard]$ zcat SRR120607502.fastq.gz | wc -l
101029168
So the two source files have different numbers of reads. Is there a way to fix this? What are my options?
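To put the two line counts above in terms of reads, here is a short arithmetic sketch. It assumes the standard four-line FASTQ record layout; the variable names are just for illustration.

```shell
#!/bin/sh
# Line counts reported by wc -l in the session above.
lines_1=101032420
lines_2=101029168

# Assumption: each read occupies four lines, so reads = lines / 4.
reads_1=$((lines_1 / 4))
reads_2=$((lines_2 / 4))

echo "reads in file 1: $reads_1"
echo "reads in file 2: $reads_2"
echo "difference:      $((reads_1 - reads_2))"

# A non-zero remainder would suggest a file cut off mid-record.
echo "leftover lines:  $((lines_1 % 4)) and $((lines_2 % 4))"
```

Under that assumption, the first file holds 25258105 reads and the second 25257292, a difference of 813 reads. Both line counts are exact multiples of four, so neither file appears to end in the middle of a record.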