DESeq filtering specific to contrasts
Emma • 0

I'm struggling to find the "right" way to filter my RNA-seq dataset. The whole dataset contains 12 treatment samples and 12 controls, split across 4 different timepoints. So there are 4 contrasts to assess: the difference between treatment and control samples at each timepoint.

The issue that I'm running into is that for many features/genes the count levels are extremely varied between timepoints. Timepoint A may have an average of near 0 normalized counts for a particular feature, but Timepoint B could have an average of >50 normalized counts for the same feature. Because of this, features that I believe "should" be filtered out from the contrast at Timepoint A are slipping through because the overall baseMean values are skewed by Timepoint B.

Obviously, I could circumvent this issue by adjusting my prefiltering to require a certain number of raw counts in every sample (as opposed to my current filter of >=3), but then I would lose the ability to assess features like the example above at the timepoints where they are more highly expressed (Timepoint B).
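To make the trade-off concrete, here is a minimal sketch of the arithmetic for a single gene. It is plain Python rather than DESeq2 code, and the counts, the MIN_MEAN cutoff, and the variable names are all invented for illustration. It compares a mean pooled over all samples (which is what baseMean summarizes) with per-timepoint means and with a strict "every sample" filter.

```python
from statistics import mean

# Hypothetical normalized counts for one gene, 6 samples per timepoint.
# All values are invented to mirror the situation described in the post.
gene_counts = {
    "A": [0, 1, 0, 0, 2, 0],
    "B": [48, 55, 61, 50, 47, 58],
    "C": [5, 7, 4, 6, 8, 5],
    "D": [3, 2, 4, 1, 3, 2],
}

MIN_MEAN = 5   # assumed cutoff on a mean of normalized counts (illustrative only)
MIN_COUNT = 3  # assumed per-sample cutoff, borrowed from the ">=3" in the post

all_counts = [c for counts in gene_counts.values() for c in counts]

# Mean over every sample, analogous to a baseMean-style summary.
pooled_mean = mean(all_counts)

# Mean restricted to the samples of each timepoint.
per_timepoint_mean = {tp: mean(counts) for tp, counts in gene_counts.items()}

passes_pooled = pooled_mean >= MIN_MEAN
passes_per_timepoint = {tp: m >= MIN_MEAN for tp, m in per_timepoint_mean.items()}

# Strict alternative: require the per-sample cutoff in every single sample.
passes_every_sample = all(c >= MIN_COUNT for c in all_counts)

print("pooled mean:", pooled_mean, "passes:", passes_pooled)
for tp in gene_counts:
    print(tp, "mean:", round(per_timepoint_mean[tp], 2), "passes:", passes_per_timepoint[tp])
print("passes every-sample filter:", passes_every_sample)
```

With these made-up numbers the pooled mean clears the cutoff because Timepoint B dominates it, the Timepoint A mean does not, and the every-sample filter removes the gene outright, including at Timepoint B where it is clearly expressed.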

Suggestions? My current thought is to subset the dataset and create a unique dds object for each timepoint that I could then filter independently. I'm just unsure whether that is the best way to go about this, or whether there are issues that I haven't thought of yet with using that approach.
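The subsetting idea can be sketched as follows. This is an illustration of the logic only, in plain Python with invented counts: keep_genes is a hypothetical helper, and the "at least 3 reads in at least 2 samples" rule is an assumed filter, not necessarily the one used in the post. In an actual analysis the equivalent step would be to subset the samples by timepoint before filtering and fitting each dataset object.

```python
from collections import defaultdict

# Hypothetical raw counts: one column per sample, 4 samples per timepoint.
timepoints = ["A", "A", "A", "A", "B", "B", "B", "B"]
raw_counts = {
    "gene_x": [0, 1, 0, 0, 52, 47, 60, 55],    # near zero at A, expressed at B
    "gene_y": [20, 25, 18, 22, 19, 24, 21, 23],  # expressed everywhere
    "gene_z": [0, 0, 1, 0, 2, 0, 1, 0],          # low everywhere
}


def keep_genes(counts, columns, min_count=3, min_samples=2):
    """Return genes with >= min_count reads in >= min_samples of `columns`."""
    kept = set()
    for gene, row in counts.items():
        n_pass = sum(row[i] >= min_count for i in columns)
        if n_pass >= min_samples:
            kept.add(gene)
    return kept


# Group sample (column) indices by timepoint.
columns_by_timepoint = defaultdict(list)
for i, tp in enumerate(timepoints):
    columns_by_timepoint[tp].append(i)

# Filter each timepoint subset independently.
kept_by_timepoint = {
    tp: keep_genes(raw_counts, cols) for tp, cols in columns_by_timepoint.items()
}

# For comparison: the same filter applied once to the pooled dataset.
kept_pooled = keep_genes(raw_counts, range(len(timepoints)))

print("per timepoint:", {tp: sorted(g) for tp, g in kept_by_timepoint.items()})
print("pooled:", sorted(kept_pooled))
```

In this toy example gene_x survives the pooled filter because of its Timepoint B counts, but it is dropped from the Timepoint A subset and kept in the Timepoint B subset, which is the behavior the post is asking for.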

DESeq2 • 466 views
@mikelove

Subsetting the data sounds reasonable.

But wouldn't this result in a different gene set for each stratified analysis? How can you then compare results if, say, gene x comes up in Timepoint B but is filtered out in Timepoint A because of low counts? Maybe there is a biological reason for it not showing up in Timepoint A.

On the other hand, combining them all may introduce the noise that Emma described. I don't know how one judges the best approach in such a situation.
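One way to keep per-timepoint gene sets comparable is to record, for every gene, whether it was tested or filtered at each timepoint, so that a gene absent from one result table is not mistaken for "not significant". A minimal sketch in plain Python, with hypothetical gene sets standing in for the output of per-timepoint filtering:

```python
# Hypothetical sets of genes that passed filtering at each timepoint.
kept_by_timepoint = {
    "A": {"gene_y"},
    "B": {"gene_x", "gene_y"},
}

# Every gene that was tested in at least one timepoint.
all_genes = set().union(*kept_by_timepoint.values())

# Per-gene status at each timepoint: "tested" or "filtered".
status = {
    gene: {
        tp: ("tested" if gene in kept else "filtered")
        for tp, kept in kept_by_timepoint.items()
    }
    for gene in sorted(all_genes)
}

# Genes whose absence from some timepoint is due to filtering.
partially_tested = [g for g, s in status.items() if "filtered" in s.values()]

for gene, s in status.items():
    print(gene, s)
print("tested at some but not all timepoints:", partially_tested)
```

Reporting this table alongside the results makes explicit that gene_x was never tested at Timepoint A, rather than tested and found unchanged.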

If the counts are too low in one group, then you can't make the comparison either way (whether the genes are filtered out or just under-powered). The analysis choice doesn't seem to change that fact.

Fair enough. I guess it depends on the comparison being made, right? If one is interested in comparing a feature between Timepoint A and Timepoint B, then it might be biologically valid that the feature is missing at A for whatever reason. On the other hand, if you have more complex contrasts (like different treatments within A and different treatments within B), it might be more prudent to split. Hopefully, though, that feature would still be removed by independent filtering for any specific contrast in which it has low counts in both groups being compared. At least, that's how I understand it. Please correct me if I am wrong.

Right, I was suggesting subsetting as a valid choice for the case of "treatment and control samples at each timepoint".

Thanks for clarifying. Could I ask one more thing for my sanity? Let's say you had a multi-factor experiment with contrasts including, e.g., (treatedTimepointA - treatedTimepointB) as the first and (treatedTimepointA - controlTimepointA) as another.

If one feature (gene x) was very lowly expressed in Timepoint A but highly expressed in B, then the filtering might not remove it. In this scenario, A vs B is a valid comparison (biologically), as I mentioned above. But for treated A vs control A, is it likely that this gene will just not show up due to independent filtering or LFC shrinkage? In that case, does it even matter whether one splits the dataset, given that the gene might be filtered out in one contrast but not another? Or does DESeq2 handle this differently from how I am imagining it?

Apologies if this is obvious or repetitive, but I am curious how such a thing works.

FYI, I have no idea of the original poster's study design; I just got curious about this particular scenario from this post.

It will likely not filter it out, but really the per-dataset details would affect the results.
