Hi All,
I calculated size factors for my RNA-seq data using estimateSizeFactors(). They were:
       1a        1c        2b        2c        1b        2c        3c
1.7260367 1.8360566 0.7666819 0.7158603 1.3751402 0.7572188 0.5521436
As you can see, there is a lot of variation between the samples, which suggests outliers. What should I do in such a case? Should I proceed with the DE analysis, or is there a way to make the data look better?
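For context, DESeq2's default size factors come from the median-of-ratios method. Each gene gets a geometric mean across samples, and a sample's size factor is the median, over genes, of its count divided by that geometric mean. Genes with a zero in any sample are left out. The Python sketch below illustrates the idea on made-up toy counts, not the poster's data. The helper name median_of_ratios is hypothetical and is not a DESeq2 function.

```python
import math
import statistics


def median_of_ratios(counts):
    """Per-sample size factors via the median-of-ratios idea.

    counts: one row per gene, one column per sample (raw counts).
    Genes with a zero in any sample are skipped, since their
    geometric mean across samples would be zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has a zero count in some sample")

    # Log geometric mean of each usable gene across all samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]

    factors = []
    for j in range(n_samples):
        # Log ratio of sample j's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        # Size factor = median ratio, back-transformed from the log scale.
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


# Hypothetical toy counts: sample 2 is sequenced twice as deep as sample 1.
toy_counts = [
    [10, 20, 15],
    [100, 200, 150],
    [50, 100, 75],
    [0, 5, 3],  # skipped: zero count in sample 1
    [30, 60, 45],
]
print([round(f, 3) for f in median_of_ratios(toy_counts)])  # → [0.693, 1.387, 1.04]
```

The factors only rescale each sample's counts. Factors above and below 1 simply mean some libraries are deeper than others.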
This just means the samples were not sequenced to the same depth, which is a normal part of a DESeq2 analysis. Ideally, the variation in sequencing depth should not be correlated with the covariate of interest; such a correlation is referred to as technical confounding (see another recent post). "Ideally" here means you should try to avoid it when you sequence the samples.
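One rough way to see whether depth tracks the covariate of interest is to compare size factors across the groups being contrasted. The sketch below uses made-up size factors and a hypothetical two-group design, because the post does not say which samples belong to which condition. The helper name group_depth_summary is illustrative and is not part of DESeq2.

```python
import math
from collections import defaultdict


def group_depth_summary(size_factors, groups):
    """Geometric-mean size factor per group (hypothetical helper).

    A large gap between groups hints that sequencing depth is tied to
    the condition, i.e. possible technical confounding.
    """
    if len(size_factors) != len(groups):
        raise ValueError("need one group label per sample")
    logs = defaultdict(list)
    for sf, group in zip(size_factors, groups):
        logs[group].append(math.log(sf))
    # Average on the log scale, then back-transform.
    return {g: math.exp(sum(v) / len(v)) for g, v in logs.items()}


# Made-up size factors and a hypothetical two-group design.
sfs = [1.2, 0.9, 1.1, 0.8, 1.0, 1.05]
design = ["control", "control", "control", "treated", "treated", "treated"]
summary = group_depth_summary(sfs, design)
print(summary["control"] / summary["treated"])
```

A ratio near 1 is reassuring. A ratio far from 1 is a reason to look more closely at how the libraries were prepared, not proof of confounding on its own.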
So, if I inferred correctly, I can proceed with the data. However, for future sequencing experiments, I should try to avoid confounding factors.
Just a follow-up question: would running sva help?
The people loading the samples onto the instrument should already be aiming for equal numbers of reads from each sample. Sometimes, though, the nature of the samples prevents that. For example, if one sample undergoes a highly toxic treatment, it may yield very little RNA. So there can be unavoidable confounding with library size.
You might be able to ask for reruns of the samples with fewer reads, but a roughly three-fold difference is okay, and you can proceed with it. If a sample had a tenth of the average number of reads, that would be a problem.
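The rule of thumb above can be sketched as a quick check on the posted size factors. The helper name depth_check and the 0.1 cutoff are illustrative. The cutoff echoes the "tenth of the average" remark; it is not a DESeq2 setting. Size factors also reflect library composition, not just raw read counts, so treat this as a rough screen.

```python
def depth_check(samples, min_fraction=0.1):
    """Spread of size factors and samples far below the average.

    samples: list of (name, size_factor) pairs; a list rather than a
    dict because the post's labels contain a repeated name ("2c").
    min_fraction: illustrative cutoff echoing "a tenth of the average".
    """
    values = [sf for _, sf in samples]
    mean_sf = sum(values) / len(values)
    spread = max(values) / min(values)  # largest / smallest size factor
    low = [name for name, sf in samples if sf < min_fraction * mean_sf]
    return spread, low


# Size factors as reported in the original post.
posted = [
    ("1a", 1.7260367), ("1c", 1.8360566), ("2b", 0.7666819), ("2c", 0.7158603),
    ("1b", 1.3751402), ("2c", 0.7572188), ("3c", 0.5521436),
]
spread, low = depth_check(posted)
print(f"spread = {spread:.2f}, low = {low}")  # → spread = 3.33, low = []
```

The posted values span about 3.3-fold, and no sample falls below a tenth of the mean.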
I didn't say there was confounding in your data; I was describing what to avoid if there were confounding.