I'm analyzing a small number of samples with RNA-seq and ATAC-seq, which produce reads mapping to genes and open chromatin regions, respectively. I use DESeq2 to estimate a size factor (a per-sample scaling factor) for each sample, so that I can compare, e.g., the number of reads for region r in sample A to the number for region r in sample B. However, several samples failed ATAC-seq QC and were excluded from downstream analysis. Now I have 2 questions:
1. Is it valid to include the samples that failed QC when estimating size factors for the ATAC-seq data? Reading note S1 of doi:10.1101/gr.133744.111, I'd expect this to have a modest effect on the counts for all bins in the virtual reference, yielding a slightly better estimate of each size factor (because more samples contribute) but a slight rescaling of all size factors (because the virtual reference is shifted). Should adding these samples give me a better estimate of the size factors?
2. Is it valid to include the samples that failed ATAC-seq QC when estimating size factors for the RNA-seq data? These samples all have fine RNA-seq descriptive statistics, so I don't see a problem with using them to estimate the virtual reference. But I'd still like to confirm that it's a good idea, given that I'm not using them in downstream analysis.
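
For concreteness, here is a minimal sketch of the median-of-ratios estimate as I understand it, written in NumPy rather than calling DESeq2 itself. The function name `size_factors` and the toy counts are made up for illustration (they are not my real data), and sample C just stands in for a sample that failed QC.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors; `counts` is a regions x samples matrix.

    Illustrative sketch only, not DESeq2's own code.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep only regions with nonzero counts in every sample, so all logs are finite.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Virtual reference: per-region geometric mean across samples (in log space).
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor per sample: median ratio of its counts to the reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))

# Toy counts (made up): 5 regions x 3 samples (A, B, C).
# B is exactly 2 x A; C is a noisy stand-in for a failed-QC sample.
counts = np.array([
    [10,  20,   5],
    [20,  40,   0],
    [30,  60,  90],
    [40,  80,  10],
    [50, 100, 200],
])

sf_all = size_factors(counts)          # reference built from A, B and C
sf_pass = size_factors(counts[:, :2])  # reference built from A and B only

print("with C:   ", sf_all[:2])
print("without C:", sf_pass)
```

In this toy case B is an exact multiple of A, so the B:A ratio of size factors stays at 2 either way and only the overall scale moves; real counts won't be that clean. Note also that a zero in C removes that region from the reference entirely, which is part of what I'm unsure about for low-quality ATAC-seq samples.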