I'm hoping for some help understanding which read normalization approach is appropriate for the biologically complex samples on which I want to run a differential expression (DE) analysis.
I'm interested in DE of a bacterial endosymbiont of an insect; specifically, I'm trying to establish which genes are differentially expressed in the bacterium when the host insect is infected with a eukaryotic parasite. So, in one condition (4 replicates) I have host + symbiont (plus other gut bacteria, etc.) and in the other condition (also 4 replicates), I have host + symbiont + parasite.
I've taken the approach of mapping reads to a preliminary genome sequence of the symbiont, summarizing the reads with featureCounts, and doing DE with DESeq2. I've followed the basic protocol that I've seen outlined in various vignettes. I've found a handful of DE genes (~10 upregulated, ~50 downregulated), but I'm concerned that I may not be normalizing my data in the most sensible way.
The read depth is fairly even across replicates, ranging from ~42-57 million 2 x 125 bp reads each. The number of reads mapping to the endosymbiont is fairly even, too, ranging from 0.6-0.9% of the reads (unfortunately low, but I have to live with this). I haven't quantified the proportion of reads that derive from the parasite, but it's not huge (maybe 10%); most of the reads seem to come from the insect host and its gut flora.
As I understand it, DESeq2 is probably normalizing to the number of mapped endosymbiont reads/fragments for the replicates in the approach I've taken (?). But the presence of the parasite in one set of samples is going to affect the normalization in complicated ways. Is there some better method that might be applicable in this case?
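
To make the question concrete, here is a minimal sketch of what I believe the default median-of-ratios size-factor estimation does when it is given only the symbiont gene counts. This is my own NumPy re-implementation for illustration, not DESeq2's code: the function name `median_of_ratios_size_factors` is hypothetical and the toy count matrix is made up, not my data.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Estimate per-sample size factors by the median-of-ratios method.

    counts: 2-D array-like, rows = genes, columns = samples (raw counts).
    Hypothetical helper for illustration; not part of any library.
    """
    counts = np.asarray(counts, dtype=float)

    # Genes with a zero in any sample have a geometric mean of zero,
    # so they are left out of the size-factor calculation.
    keep = (counts > 0).all(axis=1)
    if not keep.any():
        raise ValueError("every gene has at least one zero count")

    log_counts = np.log(counts[keep])

    # Per-gene log geometric mean across samples (a pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1, keepdims=True)

    # Size factor of a sample = median over genes of count / geometric mean.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


if __name__ == "__main__":
    # Made-up toy matrix: 4 symbiont genes x 2 samples.
    # Sample 2 has exactly twice the counts of sample 1.
    toy_counts = [
        [10, 20],
        [100, 200],
        [5, 10],
        [0, 3],  # excluded from the calculation because of the zero
    ]
    print(median_of_ratios_size_factors(toy_counts))
```

If this is right, the size factors depend only on the rows of the count matrix that is passed in (here, symbiont genes), so host and parasite reads would not enter the calculation directly. They would matter only through how much of each library is left over for the symbiont.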