Question

DESeq filtering specific to contrasts

Emma • 0

@621e0847

Last seen 9 weeks ago

United States

I'm struggling to find the "right" way to filter my RNAseq dataset. My whole dataset contains 12 treatment samples and 12 controls, but split across 4 different timepoints. So there are 4 contrasts to assess - the difference between treatment and control samples at each timepoint.

The issue that I'm running into is that for many features/genes the count levels are extremely varied between timepoints. Timepoint A may have an average of near 0 normalized counts for a particular feature, but Timepoint B could have an average of >50 normalized counts for the same feature. Because of this, features that I believe "should" be filtered out from the contrast at Timepoint A are slipping through because the overall baseMean values are skewed by Timepoint B.

Obviously, I could circumvent this issue by adjusting my prefiltering to require a certain number of raw counts in every sample (as opposed to my current filter of >=3), but then I would lose the ability to assess features like the example above at the timepoints where they are more highly expressed (Timepoint B).

Suggestions? My current thought is to subset the dataset and create a unique dds object for each timepoint that I could then filter independently. I'm just unsure whether that is the best way to go about this, or whether there are issues that I haven't thought of yet with using that approach.
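
A minimal Python sketch of the difference between one global prefilter and a separate prefilter per timepoint. This is not DESeq2 code: the counts are made up, `passes_prefilter` is a hypothetical helper, and the rule "at least 3 samples with a raw count >= 3" is an assumption, since the post only states the count threshold.

```python
# Toy illustration only: made-up raw counts for a single gene.
# 6 samples per timepoint (3 treatment + 3 control), matching the 24-sample design.

def passes_prefilter(counts, min_count=3, min_samples=3):
    """Keep a gene if at least `min_samples` samples have a count >= `min_count`.

    `min_samples` is an assumed parameter; the post only gives the count threshold.
    """
    return sum(c >= min_count for c in counts) >= min_samples

gene_counts = {
    "A": [0, 1, 0, 0, 2, 0],        # essentially unexpressed
    "B": [48, 55, 61, 52, 47, 58],  # clearly expressed
    "C": [5, 3, 4, 6, 2, 4],
    "D": [0, 0, 1, 0, 0, 1],
}

# Global filter: all 24 samples are pooled, so Timepoint B alone rescues the gene.
all_samples = [c for counts in gene_counts.values() for c in counts]
global_keep = passes_prefilter(all_samples)

# Per-timepoint filter: each subset is judged on its own 6 samples.
per_timepoint_keep = {tp: passes_prefilter(c) for tp, c in gene_counts.items()}

print("kept by global filter:", global_keep)
print("kept per timepoint:", per_timepoint_keep)
```

With these toy values the gene survives the pooled filter but is dropped from the Timepoint A and D subsets, which is the behaviour the per-timepoint objects are meant to achieve.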

DESeq2 • 391 views

ADD COMMENT • link updated 7 days ago by Michael Love 41k • written 10 weeks ago by Emma • 0

score 0 · Answer 1 · 2024-02-17

Michael Love 41k

@mikelove

Last seen 16 hours ago

United States

Subsetting the data sounds reasonable.

ADD COMMENT • link 9 weeks ago Michael Love 41k

But wouldn't this result in a different gene set for each analysis stratification? How can you then compare results if, say, gene x comes up in Timepoint B but is filtered out in Timepoint A because of low counts? Maybe there is a biological reason for it not showing up in Timepoint A.

On the other hand, combining them all may introduce the noise Emma described. I don't know how one judges the best approach in such a situation.

ADD REPLY • link 9 days ago Carlin95 • 0

If the counts are too low in one group, then you can't make the comparison either way (whether they are filtered out or just under-powered). The analysis choice doesn't seem to change that fact.

ADD REPLY • link 7 days ago Michael Love 41k

Fair enough, I guess it depends on the comparison being made, right? If one was interested in comparing a feature between Timepoint A and Timepoint B, then it might be biologically valid that this feature is missing at A for whatever reason. On the other hand, if you have more complex contrasts (like different treatments within A and different treatments within B), it might be more prudent to split. Hopefully, though, that feature would still be removed by independent filtering for any contrast where it has low counts in both groups being compared. At least, that's how I understand it. Please correct me if I am wrong.

ADD REPLY • link 7 days ago Carlin95 • 0

Right, I was suggesting subsetting as a valid choice for "treatment and control samples at each timepoint".

ADD REPLY • link 7 days ago Michael Love 41k

Thanks for clarifying. Could I ask one more thing for my sanity? Let's say you had a multi-factor experiment with contrasts including, e.g., (treatedTimepointA - treatedTimepointB) as one and (treatedTimepointA - controlTimepointA) as another:

If one feature (gene x) was very lowly expressed in Timepoint A but high in B, then the filtering might not kick it out. In this scenario, A vs B is a valid comparison (biologically), as I mentioned above. But for treated A vs control A, is it likely that this gene will just not show up due to independent filtering/LFC shrinkage? If so, does it even matter whether you split the dataset, since the gene might be filtered out in one contrast but not another? Or does DESeq2 handle this differently from how I am imagining it?

Apologies if this is obvious/repetitive, but I am curious how such a thing works.

FYI, I have no idea of the Original Poster's study design; I just got curious from this post about this particular scenario.

ADD REPLY • link 7 days ago Carlin95 • 0

It will likely not filter it out, but really the per-dataset details would affect the results.

ADD REPLY • link 7 days ago Michael Love 41k
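
To make the scenario in this exchange concrete, here is a small Python sketch (again not DESeq2 code, with made-up normalized counts). It relies on the fact that DESeq2's default independent filtering uses the mean of normalized counts over all samples in the object (baseMean) as its filter statistic, so that statistic is the same for every contrast drawn from one combined object. The sketch ignores that size factors would be re-estimated on a subset, and it does not model how the filter threshold itself is chosen.

```python
from statistics import mean

# Made-up normalized counts for one gene, 6 samples per timepoint.
normalized = {
    "A": [0.0, 1.0, 0.0, 0.0, 2.0, 0.0],
    "B": [48.0, 55.0, 61.0, 52.0, 47.0, 58.0],
    "C": [50.0, 49.0, 57.0, 53.0, 60.0, 46.0],
    "D": [51.0, 44.0, 59.0, 56.0, 62.0, 45.0],
}

# Combined object: one mean over all 24 samples, shared by every contrast,
# including a treatment-vs-control contrast within Timepoint A.
base_mean_all = mean(v for values in normalized.values() for v in values)

# Per-timepoint objects: the mean is computed from that timepoint's samples only.
base_mean_by_tp = {tp: mean(values) for tp, values in normalized.items()}

print("mean over all samples:", round(base_mean_all, 2))
print("mean within Timepoint A:", base_mean_by_tp["A"])
```

In the combined object this gene carries a high mean into the Timepoint A contrast even though it is nearly absent there, which is consistent with the reply that it will likely not be filtered out. In a Timepoint A subset its mean is close to zero, so a mean-based filter is far more likely to remove it.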