This is a generic question, not really dependent on code. I tagged DESeq2 because that is what I am using and it would be ideal to reach a solution with it, but the main question is about independent filtering.

I have a dataset with two treatments and 4 biological replicates each. This is an amplicon sequencing experiment where all amplicons are the same size, and the total number of amplicons is ~200K. As in a CRISPR screen, within each biological replicate I have internal replicates: amplicons with different DNA sequences that I would expect to give the same phenotype. There are 3 of these internal replicates per amplicon group, so 3 internal replicates x 66K amplicon groups = ~200K amplicons in total.

The problem I am facing is very strong dispersion between the biological replicates of the treated samples (the control samples correlate well, similar to other datasets I am working with). I have looked at the internal replicates and seen that there is a lot of variation between them as well, so I am assuming that the experimental system is very noisy. A comparison of data between independent experiments suggests that I have many false positives.
I have been thinking about the independent filtering approach described in DESeq2 and the corresponding filtering paper, so I would like your opinion about this:
- Would it be OK to discard the amplicons with high variation between internal replicates, and repeat the analysis with only the low-variation ones? I understand that such a filter would be independent of the null hypothesis (OK according to the paper), but not correlated with the alternative (not OK according to the paper).
- Is there a way to include this internal replication in the DESeq2 model? If not, could you point me to literature on how to approach this? The difference between my experiment and a CRISPR one is that in CRISPR the guides target different parts of a gene, so they are genuinely different. In my case, the DNA sequences of the internal replicates differ, but they all encode the same peptide, so apart from codon differences I would expect low variability in the phenotype across the internal replicates within each biological replicate.
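
To make the first option concrete, this is roughly the filter I have in mind. It is only a sketch in plain Python on made-up numbers, not something I have run on the real data: the function names `internal_cv` and `filter_groups`, the toy counts, and the 0.5 cutoff are all hypothetical. The statistic is the coefficient of variation across the 3 internal replicates, computed within each sample and averaged over all samples, so it never looks at the treatment labels.

```python
from statistics import mean, stdev


def internal_cv(group_counts):
    """Mean coefficient of variation across internal replicates.

    group_counts: one row per internal replicate (amplicon), each row
    holding normalized counts for the same ordered list of samples.
    The CV is computed within each sample, then averaged over samples,
    without using treatment labels.
    """
    cvs = []
    for sample_values in zip(*group_counts):  # iterate over samples (columns)
        m = mean(sample_values)
        if m > 0:  # skip samples where every internal replicate is zero
            cvs.append(stdev(sample_values) / m)
    # Groups with no usable sample are treated as maximally variable
    return mean(cvs) if cvs else float("inf")


def filter_groups(counts_by_group, max_cv):
    """Keep only amplicon groups whose internal CV is at most max_cv."""
    return {
        group: rows
        for group, rows in counts_by_group.items()
        if internal_cv(rows) <= max_cv
    }


# Toy data (hypothetical): 2 amplicon groups x 3 internal replicates x 4 samples
toy = {
    "pepA": [[100, 110, 90, 105], [95, 115, 92, 100], [105, 108, 88, 110]],
    "pepB": [[10, 200, 50, 5], [300, 20, 400, 90], [60, 500, 10, 250]],
}

kept = filter_groups(toy, max_cv=0.5)  # 0.5 is an arbitrary cutoff
print(sorted(kept))
```

With these toy numbers only the consistent group (`pepA`) survives the filter. My question is whether a filter of this kind is statistically legitimate before running the differential analysis on the remaining amplicons.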
Thanks a lot, Ruben