Hello,
I've been using DESeq2 for some small RNA-seq data I'm working on, but our data are unusual because of what we're sampling the small RNA from (sorry to be cryptic, I'm bound by confidentiality here). As a result, my data has characteristics that I suspect may make it unsuitable for analysis with DESeq2 (and other standard differential-expression packages). However, I'm no statistical expert, and my attempts to understand the statistics under the hood of DESeq2, and how they might interact with my data, are raising more questions than answers, so I thought I'd see what people here think.
Essentially, I have two major concerns with my data:
- Most expressed genes are what we would probably consider "lowly expressed"
- There's a fair bit of variation in expression values between samples within the same experimental group
These characteristics are somewhat expected given the experiments we're running (it's not single-cell RNA-seq), so we're not concerned that anything is wrong with the data or the experimental setup itself.
However, my understanding is that DESeq2 removes lowly expressed genes, assumes that samples (replicates) within the same group do not show large variation, and also assumes that most genes are not differentially expressed.
As my data violates two of these three assumptions, and perhaps the third as well (I can't tell), I'm guessing this is a problem. I'd be very grateful if someone with more statistical expertise than me could share some insight.
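To make the three assumptions above a bit more concrete, here is a minimal, stdlib-only Python sketch of the relevant ideas as they are described for DESeq2: median-of-ratios size factors (where the "most genes are not differentially expressed" assumption enters), the negative binomial variance mu + alpha * mu^2 (where alpha is the per-gene dispersion estimated from within-group replicates), and an optional low-count pre-filter. This is an illustration, not DESeq2's implementation. The function names and the threshold of 10 reads are hypothetical choices for the sketch.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero in any sample are skipped: their geometric mean
    is zero, so the ratio to the pseudo-reference is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue  # excluded from the pseudo-reference sample
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        # With many low counts, every gene may contain a zero.
        raise ValueError("every gene contains at least one zero")
    # Taking the median assumes most genes are NOT differentially
    # expressed, so their ratios reflect sequencing depth only.
    return [math.exp(median(r)) for r in log_ratios]


def nb_variance(mu, alpha):
    """Negative binomial variance: mean plus dispersion times mean squared."""
    return mu + alpha * mu ** 2


def prefilter(counts, min_total=10):
    """Hypothetical pre-filter: keep genes with at least min_total reads overall."""
    return [gene for gene in counts if sum(gene) >= min_total]


if __name__ == "__main__":
    toy = [[10, 20, 5], [100, 200, 50], [30, 60, 15], [0, 4, 1]]
    print(size_factors(toy))   # approximately [1.0, 2.0, 0.5]
    print(nb_variance(10, 0.1))
    print(len(prefilter(toy)))
```

In the toy matrix the last gene has a zero, so it plays no part in the size factors, and it is also the only gene the illustrative pre-filter drops. With data where most genes have low counts, both effects apply to many more genes. A larger alpha means more within-group variability and therefore less power to call differences, rather than a model that outright breaks.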
Best wishes,
Gill