Working with a workflow that uses fastp -> Salmon -> DESeq2
Is it generally considered good practice to control for fastp's read duplication rate and/or Salmon's percent mapped (from meta_info.json) when doing a DESeq2 DE analysis? I have noticed a fair amount of variability across a set of samples from the same prep batch and sequencing run (duplication rate, 22-52%; percent mapped, 82-92%). Duplication rate in particular seems relevant: I hadn't expected it to be that variable, and in previous STAR-based workflows I removed duplicate reads altogether.
It would be easy enough to use ~ read_dups + trt or ~ read_dups + map_rate + trt as the design, but are there arguments against doing this (e.g., overfitting or removing true variation)?
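For concreteness, this is roughly what I have in mind, a minimal sketch in R assuming a hypothetical sample sheet with columns read_dups, map_rate, and trt filled in from the fastp/Salmon QC output (sample names, paths, and tx2gene are placeholders):

```r
library(DESeq2)
library(tximport)

## Hypothetical sample sheet: one row per sample, with QC metrics taken
## from fastp (duplication rate) and Salmon's meta_info.json (percent mapped).
coldata <- data.frame(
  sample    = c("s1", "s2", "s3", "s4"),
  trt       = factor(c("ctrl", "ctrl", "treat", "treat")),
  read_dups = c(0.22, 0.35, 0.48, 0.52),   # fastp duplication rate
  map_rate  = c(0.92, 0.88, 0.85, 0.82)    # Salmon percent mapped / 100
)

## Center and scale continuous covariates before adding them to the design,
## which is generally recommended for numeric variables in DESeq2.
coldata$read_dups_c <- scale(coldata$read_dups)[, 1]
coldata$map_rate_c  <- scale(coldata$map_rate)[, 1]

## Import Salmon quantifications (paths and tx2gene are placeholders).
files <- file.path("salmon", coldata$sample, "quant.sf")
txi   <- tximport(files, type = "salmon", tx2gene = tx2gene)

## Covariates go before the variable of interest; the last term (trt)
## is what results() reports by default.
dds <- DESeqDataSetFromTximport(txi, colData = coldata,
                                design = ~ read_dups_c + trt)
dds <- DESeq(dds)
res <- results(dds)
```

Adding map_rate_c to the formula (~ read_dups_c + map_rate_c + trt) would be the second option I mentioned.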