Question

Use of DESeq given unusual experimental data

GLFrey • 0

@glfreynolds-21471

United Kingdom

Hello,

I've been using DESeq2 for some small RNA-seq data I'm working on. Because of what we're sampling from (sorry to be cryptic, I'm bound by confidentiality here), the data has some unusual characteristics that I suspect may make it unsuitable for analysis with DESeq2 (and other standard differential-expression packages). I'm no statistical expert, and my attempts to understand the statistics under the hood of DESeq2, and how they might interact with my data, have left me with more questions, so I thought I'd see what people here think.

Essentially, I have two major concerns with my data:

1. Most expressed genes are what we would probably consider "lowly expressed".
2. There's a fair bit of variation in gene expression values between samples within the same experimental group.

These characteristics are somewhat expected given the experiments that we're running (it's not single-cell RNA-seq), so we're not concerned that something is wrong with the data or the experimental setup itself.

However, I'm under the impression that the DESeq2 process removes lowly expressed genes, that it assumes samples (or replicates) within the same group should not show massive variation, and that it also assumes most genes aren't differentially expressed.

As my data violates two of these three assumptions, and perhaps the third too (I don't know), I'm guessing this is a problem. I'd be very grateful if someone with more statistical expertise than me would be kind enough to share some insight.

Best wishes,

Gill

DESeq2 • 448 views

updated 13 months ago by Michael Love 41k • written 13 months ago by GLFrey • 0

score 0 · Answer 1 · 2023-03-14

Sorry for the delay.

Independent filtering doesn't remove lowly expressed genes unless they have low power. For even better performance, you can pass filterFun=ihw (the ihw function from the IHW package) to results().
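A minimal sketch of the two filtering options, assuming the DESeq2 and IHW packages are installed. The simulated dataset from makeExampleDESeqDataSet() and the 0.1 cutoff are stand-ins for illustration only, not part of the original question or answer.

```r
library(DESeq2)
library(IHW)

# Simulated counts stand in for the real data (assumption for illustration).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6, betaSD = 1)
dds <- DESeq(dds)

# Default: independent filtering on the mean of normalized counts.
res_default <- results(dds)

# Alternative: independent hypothesis weighting.
# Note that the ihw function itself is passed, not a string.
res_ihw <- results(dds, filterFun = ihw)

# Compare how many genes pass an adjusted p-value cutoff under each approach.
n_default <- sum(res_default$padj < 0.1, na.rm = TRUE)
n_ihw <- sum(res_ihw$padj < 0.1, na.rm = TRUE)
c(default = n_default, ihw = n_ihw)
```

Either way, every gene stays in the results table; genes that are filtered out simply get an NA adjusted p-value.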

High variance within group is fine. What violates the model is bimodality. Do you believe you have bimodal data within groups?
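One informal way to look for bimodality is to inspect the normalized counts of individual genes within each group. The sketch below is only an illustration on simulated data: the choice of gene is hypothetical, and with few replicates per group any judgement about bimodality will be rough.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption for illustration).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds <- estimateSizeFactors(dds)

# Hypothetical choice of gene: the one with the largest mean normalized count.
gene <- rownames(dds)[which.max(rowMeans(counts(dds, normalized = TRUE)))]

# returnData = TRUE returns the per-sample values instead of drawing the plot.
d <- plotCounts(dds, gene = gene, intgroup = "condition", returnData = TRUE)

# Sorted counts per group: two well-separated clusters inside a single
# group would be a hint of bimodality for this gene.
by_group <- lapply(split(d$count, d$condition), sort)
by_group
```

Calling plotCounts() without returnData = TRUE draws the same values as a plot, which is often quicker to scan across a handful of genes.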

It doesn't assume that most genes are not DE, only that some genes have a log fold change (LFC) near 0. If you can provide these (via the controlGenes argument of estimateSizeFactors()), even better.
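A minimal sketch of supplying control genes for size factor estimation, assuming DESeq2 is installed. The dataset is simulated and the control set (the first 100 rows) is purely hypothetical; in practice it would be genes you have prior reason to believe do not change between groups.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption for illustration).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Hypothetical control set: row indices of genes believed to have LFC near 0.
ctrl <- seq_len(100)

# Size factors estimated from the control genes only.
dds_ctrl <- estimateSizeFactors(dds, controlGenes = ctrl)

# Size factors estimated from all genes, for comparison.
dds_all <- estimateSizeFactors(dds)

cbind(control = sizeFactors(dds_ctrl), all_genes = sizeFactors(dds_all))

# DESeq() reuses size factors that are already present on the object.
dds_ctrl <- DESeq(dds_ctrl)
```

If the two sets of size factors differ substantially on real data, that suggests the choice of reference genes matters for the normalization.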