I have RNA-seq libraries from 28 samples, and I received 2-4 fastq.gz files for each sample. I am trying to figure out how to merge them. I have already done what I feel is a due-diligence search of the Bioconductor tutorials and Google, and so far it appears that there is no easy way in R on Windows to merge multiple fastq.gz files from the same sample into one fastq file containing all of the reads for that sample. I don't think there should be any quality differences between the reads in these files, because they are not technical replicates: the sequencing center simply starts a new fastq file whenever the previous one reaches 603.5 MB. But I need to merge the files belonging to a single sample, because the downstream tools all seem to assume that 1 fastq.gz file = 1 sample. Is this true?
I suppose I could map each file individually to my reference genome and then add together the mapped counts from the fastq.gz files of the same sample. Is that what I should do? Or should I spend a day installing a Linux VirtualBox VM just so I can use its cat command?
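For readers who want to stay in base R on Windows, merging one sample's files can be sketched as follows. This is not the ShortRead-based answer the thread refers to; `concat_fastq_gz()` and its `chunk` argument are hypothetical names, and the sketch assumes the inputs are ordinary gzip-compressed FASTQ files. It decompresses each input and writes its lines into a single gzip output, reading a bounded number of lines at a time so that whole files are never held in memory.

```r
# Hypothetical helper (not part of ShortRead or base R): merge several
# .fastq.gz files into one by streaming their decompressed lines into a
# single gzip-compressed output file.
concat_fastq_gz <- function(files, out, chunk = 400000L) {
  con_out <- gzfile(out, open = "w")   # gzip writer, text mode
  on.exit(close(con_out))               # flush and close even on error
  for (f in files) {
    con_in <- gzfile(f, open = "r")     # gzip reader, text mode
    repeat {
      # read at most `chunk` lines (about chunk / 4 FASTQ records)
      lines <- readLines(con_in, n = chunk)
      if (length(lines) == 0L) break
      writeLines(lines, con_out)
    }
    close(con_in)
  }
  invisible(out)
}
```

Usage would look like `concat_fastq_gz(c("Sample1_R1_001.fastq.gz", "Sample1_R1_002.fastq.gz"), "Sample1_R1.fastq.gz")` (file names hypothetical). Because a gzip file may consist of several concatenated members (RFC 1952), appending the compressed files byte for byte, which is what `cat` does on Linux, also yields a valid .fastq.gz; base R's `file.append()` can do the same on Windows without decompressing anything.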
Thank you! This worked great. I had looked at the ShortRead package, but I had only seen it used for sampling. Much appreciated!
I would like to apply this function (it works for me!). However, I have ~20 folders inside a directory that I need to apply it to. Can anybody tell me how to apply this function to all of my folders? Thanks!!
Do you mean 20 directories, where in each directory you want to concatenate all of the fastq files? Write a function for one directory.
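A one-directory function of this kind can be sketched in base R as follows. It is not the function from this thread (which appears to have been built on ShortRead); the name `concat_directory()` matches the call used in this thread, but the `pattern` and `out_name` arguments are assumptions. Rather than re-reading the reads, it appends the compressed bytes of each file, relying on gzip's support for concatenated members.

```r
# Hypothetical base-R sketch of concat_directory(): merge every matching
# .fastq.gz file in one directory by appending their raw gzip bytes.
concat_directory <- function(path, pattern = "\\.fastq\\.gz$",
                             out_name = "merged.fastq.gz") {
  # full.names = TRUE returns "path/file" rather than bare file names
  fls <- dir(path, pattern = pattern, full.names = TRUE)
  fls <- fls[basename(fls) != out_name]  # skip output left by an earlier run
  if (length(fls) == 0L) stop("no matching files in ", path)
  fout <- file.path(path, out_name)
  # create (or truncate) the output, then append each input in order
  file.create(fout)
  ok <- file.append(fout, fls)
  if (!all(ok)) stop("could not append: ", paste(fls[!ok], collapse = ", "))
  invisible(fout)
}
```

For paired-end data (file names containing _R1_ and _R2_), read 1 and read 2 should go into separate merged files, e.g. `concat_directory(d, pattern = "_R1_.*\\.fastq\\.gz$", out_name = "merged_R1.fastq.gz")` and the same call with `_R2_`.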
Make sure that it works on one directory:
concat_directory(<PATH TO ONE DIRECTORY>)
where you'll have to substitute the desired path on your system for <PATH TO ONE DIRECTORY>. Then get a vector of all the directories. Are they nested like main_dir/dir1, main_dir/dir2? If so, list the subdirectories of main_dir.
Then apply concat_directory to each path.

Thank you so much for your help! The function worked for one directory when I tested it (once I added the missing closing parenthesis in the "fout" line).
On a single directory it works as expected, but when I use lapply() I get this error message:
Error: Input/Output no input files found dirPath: Sample1_S18_L001_R1_001.fastq.gz pattern: character(0)
Any idea how to solve it?
Maybe you could provide a sketch of what your directory structure looks like. I'm thinking it is something like
Yes, my directory structure is exactly as you show
I think I made a small mistake in the function: I should have added full.names = TRUE to the dir() call, so that it returns full paths instead of bare file names, which only resolve relative to the working directory.

Great!! Now it works! Thank you very much!
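The "no input files found" error came from passing bare file names onward. A small base-R check makes the difference visible; `show_full_names()` is a hypothetical helper, and the test file name simply reuses the one from the error message.

```r
# Hypothetical demo: dir() without full.names = TRUE returns bare file
# names, which file functions then look up in the working directory.
show_full_names <- function(path) {
  bare <- dir(path, pattern = "\\.fastq\\.gz$")
  full <- dir(path, pattern = "\\.fastq\\.gz$", full.names = TRUE)
  data.frame(bare = bare, full = full,
             bare_found = file.exists(bare),  # checked relative to getwd()
             full_found = file.exists(full))  # checked at the real location
}
```

Run on any directory other than the working directory, `bare_found` is FALSE and `full_found` is TRUE, which mirrors why the function worked when tested in place but failed under lapply().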