I am having trouble figuring out why 2 paired samples are affecting my results so severely. I have ~100 samples, each paired (so ~200 count files in total). I aligned my fastq files with STAR, counted reads per gene with featureCounts, and am using DESeq2 for differential expression analysis. My design is ~sex + sex:nested + sex:condition, where nested is the pairing factor.
On the first pass, I get 716 significant genes (padj < 0.05), yet many of them have count plots like this: http://i.imgur.com/2Yrd410.png
The seemingly outlier sample is present in many of the genes, so I removed the pre and post counts for that sample.
I then got 18 significant genes, most of which look like this: http://i.imgur.com/wGE53DH.png
Again, a single apparent outlier sample is present in all 18 genes, so I removed the pre and post counts for that sample as well. Now I get no significant genes. It seems strange that 2 samples would have such a large effect on an otherwise large data set. I would appreciate any ideas and guidance on how best to proceed! Thank you!
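To make the pattern in the plots concrete, here is a minimal sketch of one way to screen a single gene for a pair whose pre/post change is far from that of the other pairs. It is only an illustration in plain Python, not part of the DESeq2 workflow described above: the function name flag_outlier_pairs, the pseudocount, the 3.5 cutoff, and the example counts are all hypothetical. It uses a median/MAD-based modified z-score on the per-pair log2 ratio, which is a generic robust screen and not the outlier handling that DESeq2 itself applies.

```python
import math
from statistics import median


def flag_outlier_pairs(pre, post, threshold=3.5, pseudocount=1.0):
    """Return indices of pairs whose log2(post/pre) is a robust outlier.

    pre, post: normalized counts for one gene, in the same pair order.
    threshold and pseudocount are illustrative defaults (assumptions).
    """
    # Per-pair log2 fold change; the pseudocount avoids division by zero.
    lfc = [
        math.log2((b + pseudocount) / (a + pseudocount))
        for a, b in zip(pre, post)
    ]
    med = median(lfc)
    # Median absolute deviation: a spread estimate that is itself
    # not inflated by the outlier being looked for.
    mad = median(abs(x - med) for x in lfc)
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as extreme
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # under approximate normality.
    return [
        i for i, x in enumerate(lfc)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


# Hypothetical normalized counts for one gene across six pairs;
# the fifth pair (index 4) has an extreme post value.
pre = [100, 120, 95, 110, 105, 98]
post = [105, 118, 99, 108, 2500, 101]
outliers = flag_outlier_pairs(pre, post)
print(outliers)
```

Running a screen like this across all genes and counting how often each pair is flagged would show whether the same one or two pairs account for most of the significant calls.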