8 months ago by
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
I don't know what analysis you are conducting or what sort of sequencing you are doing, but I would be horrified to see anyone doing what you propose to do. It would be far better to improve your analysis methods so that the analysis can handle unequal sequencing depths without skewing the results. Generally speaking, such analysis methods do exist.
Having said that, if you have a matrix of read counts and want to reduce the library size for one or more of the samples, it is easy and quick to do that using the thinCounts() function of the edgeR package. That is statistically equivalent to randomly selecting reads from the raw FastQ file, but very much more efficient.
For example, if `counts` is a matrix of read counts, then
counts2 <- thinCounts(counts, target.size=min(colSums(counts)))
will create a new matrix for which all the columns have the same total count. The thinning is done in such a way as to simulate random selection of reads.
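To make "simulating random selection of reads" concrete, here is a minimal base-R sketch of one way to thin a count matrix: it draws reads without replacement, one gene at a time, using hypergeometric sampling. The helper `thin_column()` and the toy matrix are hypothetical and only for illustration. This is not edgeR's code and may differ from how thinCounts() works internally; in practice, use thinCounts() itself.

```r
# Hypothetical helper (not part of edgeR): downsample one column of counts
# to `target` reads by sampling reads without replacement.
thin_column <- function(x, target) {
  stopifnot(target <= sum(x))
  out <- numeric(length(x))
  remaining_reads <- sum(x)   # reads belonging to genes not yet processed
  remaining_draws <- target   # reads that still have to be kept
  for (i in seq_along(x)) {
    remaining_reads <- remaining_reads - x[i]
    # Reads kept for gene i: draw `remaining_draws` reads from
    # x[i] reads of this gene plus `remaining_reads` reads of later genes.
    out[i] <- rhyper(1, m = x[i], n = remaining_reads, k = remaining_draws)
    remaining_draws <- remaining_draws - out[i]
  }
  out
}

# Toy data (made up): two samples with very different library sizes.
set.seed(1)
counts <- cbind(A = c(10, 200, 35, 0, 55),
                B = c(40, 900, 120, 5, 235))

target  <- min(colSums(counts))   # smallest library size
counts2 <- apply(counts, 2, thin_column, target = target)

colSums(counts2)   # every column now sums to `target`
```

Because the sampling is without replacement, no thinned count can exceed the original count, and a sample that is already at the target size is returned unchanged.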