I would like to ask about using the voom-limma workflow on RNA-seq data. Usually, when I run the workflow, the samples contain a similar number of reads (translating into similar numbers of counts across samples). However, I have received data for 60 samples with 4-5.5M reads per sample, except for 2 samples that have approximately 26M reads each.
My question is whether the voom-limma workflow can handle such different library sizes, or whether this would skew the results. If the latter, what would you suggest so that these 2 samples can still be used?
Below is the general code I use to prepare the data for the limma differential expression analysis (originally taken from the limma user's guide). I am not providing code specific to the 60-sample dataset, since I see my question as a general one.
Thank you! Any advice will be greatly appreciated!
```r
library(limma)
library(edgeR)

# dataset: a matrix of counts with genes in rows and samples in columns
dge <- DGEList(counts = dataset)

# An example design: 4 groups with 3 replicates each (12 samples)
group <- factor(c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))
design <- model.matrix(~0 + group)

# Keep genes with sufficient counts, then recompute the library sizes
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]

# TMM normalization factors
dge <- calcNormFactors(dge)

# voom transformation, plotting the mean-variance trend
va <- voom(dge, design, plot = TRUE)

sessionInfo()
```

```
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
LC_TIME=English_Canada.1252

attached base packages:
stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
edgeR_3.24.3  limma_3.38.3

loaded via a namespace (and not attached):
compiler_3.5.2  Rcpp_1.0.1  grid_3.5.2  locfit_1.5-9.1  lattice_0.20-38
```
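To look at the library-size imbalance concretely, a minimal sketch follows. It is hypothetical throughout: the counts are simulated Poisson data (not the real dataset), the split into 58 samples at 4-5.5M reads plus 2 samples at about 26M is an assumption about how the 60 samples break down, and the intercept-only `design` is a placeholder. It prints the ratio of the largest to the smallest library size and the average voom precision weight for a few samples. voom predicts the variance of each log-cpm value from its predicted count, which depends on library size, so the weights are where the deeper libraries show up.

```r
library(limma)
library(edgeR)

set.seed(1)
# Hypothetical library sizes (assumption): 58 samples with 4-5.5M reads
# plus 2 samples with ~26M reads, i.e. the deep samples are among the 60.
target.size <- c(runif(58, 4e6, 5.5e6), rep(26e6, 2))

# Simulated Poisson counts for 2000 genes with shared expression proportions
n.genes <- 2000
props <- rgamma(n.genes, shape = 0.5)
props <- props / sum(props)
counts <- sapply(target.size, function(N) rpois(n.genes, lambda = N * props))
rownames(counts) <- paste0("gene", seq_len(n.genes))
colnames(counts) <- paste0("sample", seq_along(target.size))

dge <- DGEList(counts = counts)
design <- matrix(1, nrow = ncol(dge), ncol = 1)  # placeholder intercept-only design
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# Spread of library sizes: largest divided by smallest
lib.ratio <- max(dge$samples$lib.size) / min(dge$samples$lib.size)
print(lib.ratio)

# voom precision weights, averaged over genes for each sample
v <- voom(dge, design)
mean.weights <- colMeans(v$weights)
print(round(mean.weights[c(1:3, 59:60)], 2))
```

In this simulated setting the two deep samples receive larger average weights than the rest. That illustrates how voom's weights respond to library size; it does not by itself settle whether the real data are affected in other ways.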