Question: skewed differentially expressed gene results - DESeq2
CE (United States) wrote, 16 months ago:

Hi, 

I'm not sure if this is a question of outliers, since it is happening with more than one sample in a group, but I am seeing genes come back as differentially expressed when they are only obviously different in 3-4 of the 16 samples in a group, compared with 18 samples in a control group. I am using DESeq2 with default settings. Is there a way to change the settings in DESeq2 to prioritize genes that show similar expression within a group? I want to find genes that differ for most samples within a group, rather than for only about 1/4 of them.

Thanks!

(Modified 16 months ago by Ryan C. Thompson)
Answer: skewed differentially expressed gene results - DESeq2
Michael Love (United States) wrote, 16 months ago:

Can you include a plotCounts() plot for one of these genes that you are less interested in?

Also, I'd suggest trying lfcShrink() to provide better LFCs for ranking. You can combine the two: subset to genes with a small adjusted p-value, then rank those by the absolute value of the shrunken LFC.
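A minimal sketch of that suggestion, not code from the thread. It assumes simulated data from makeExampleDESeqDataSet() in place of the real experiment, a padj cutoff of 0.05, and type = "normal" for lfcShrink() (picked here only because it needs no additional packages; other shrinkage types exist).

```r
library(DESeq2)

# Simulated data stand in for the real experiment (assumption).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
dds <- DESeq(dds)
res <- results(dds)

# Normalized counts per group for one gene (here: the smallest padj).
plotCounts(dds, gene = which.min(res$padj), intgroup = "condition")

# Shrunken log2 fold changes for the condition effect.
resLFC <- lfcShrink(dds, coef = "condition_B_vs_A", type = "normal")

# Keep genes with a small adjusted p-value, then rank by |shrunken LFC|.
keep <- which(res$padj < 0.05)
ranked <- resLFC[keep, ]
ranked <- ranked[order(abs(ranked$log2FoldChange), decreasing = TRUE), ]
head(ranked)
```

With real data, the coefficient name should be taken from resultsNames(dds) rather than assumed.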

Thanks for the quick reply!

Here is an example:

[Image not preserved: count plot for the example gene, "yes" vs. "no" groups]

Three samples from the 'yes' group are clearly skewing the results. This gene has a padj of 0.001 and an LFC of 1.22 before shrinking and 1.17 after shrinking.

Most of our DE genes do not have very large fold changes and we are dealing with very noisy human data.

(Reply written 16 months ago by CE)

I don't think these results are obviously being skewed by the 3 highest samples in the "yes" group. Even if you ignore these, the "yes" group still has a higher average normalized count than the "no" group. It might not be as significant without the 3 highest samples, but I wouldn't say that this gene is an unambiguous false positive.

(Reply written 16 months ago by Ryan C. Thompson)
Answer: skewed differentially expressed gene results - DESeq2
Ryan C. Thompson (Scripps Research, La Jolla, CA) wrote, 16 months ago:

Is it possible that the same 3 or 4 samples are outliers in many genes? If so, it might help if you redo your analysis with limma using voomWithQualityWeights. This will hopefully identify and down-weight the samples that are consistently outliers across many genes. You can also inspect the weights to determine which samples the method believes to be outliers - these will be the samples with the lowest weights.
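A rough sketch of what that limma route might look like, assuming simulated negative binomial counts and a two-level group factor (both hypothetical stand-ins for the real data). The per-sample quality weights are stored on the returned object; where exactly has varied between limma versions, so the sketch does not hard-code a field name.

```r
library(limma)
library(edgeR)

# Simulated counts (assumption): 1000 genes, 8 "no" and 8 "yes" samples.
set.seed(1)
counts <- matrix(rnbinom(1000 * 16, mu = 100, size = 5), nrow = 1000)
rownames(counts) <- paste0("gene", seq_len(1000))
colnames(counts) <- paste0("s", seq_len(16))
group <- factor(rep(c("no", "yes"), each = 8))
design <- model.matrix(~ group)

# TMM normalization, then voom with sample-level quality weights.
dge <- calcNormFactors(DGEList(counts))
v <- voomWithQualityWeights(dge, design, plot = FALSE)

# Inspect the returned object to locate the per-sample weights.
names(v$targets)

# Standard limma fit on the weighted data.
fit <- eBayes(lmFit(v, design))
tt <- topTable(fit, coef = "groupyes", number = Inf)
head(tt)
```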


To tack on to Ryan's answer: you can check for outliers in a PCA plot; see the vignette.

Of course, if 3 samples always have higher counts, that would be picked up and corrected by the size factors. And if they are only "outliers" on some genes, I'm not sure I'd want to downweight them. How to approach this definitely depends on the analyst, but looking at the above plot, I would say it's a good example of DE, the LFC seems reasonable, and I wouldn't downweight the top 3 samples in "yes".
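For the PCA check, something along these lines may be a starting point. This is a sketch on simulated data; the choice of varianceStabilizingTransformation() and of "condition" as the grouping column are assumptions made for the example.

```r
library(DESeq2)

# Simulated data stand in for the real experiment (assumption).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Variance-stabilize the counts before PCA.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Coordinates as a data frame, to see which samples sit far from the rest.
pca <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
pca[order(abs(pca$PC1), decreasing = TRUE), c("name", "condition", "PC1", "PC2")]

# The plot itself.
plotPCA(vsd, intgroup = "condition")
```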

(Reply written 16 months ago by Michael Love)

You are exactly right. When I plot a heatmap of the top differentially expressed genes, most of them appear to be significantly different in these same samples, and they show a very similar pattern to this gene when I run plotCounts(). I have almost 600 genes with padj < 0.05, which is a lot to sift through to see which genes are showing up mostly because of these same samples.

Maybe it would make sense to filter for low within-group variance to narrow down my genes of interest?

Thanks for the advice to try limma voomWithQualityWeights. I'll give it a try and see how things look.

(Reply written 16 months ago by CE)

There is no point in filtering for low within-group variance. DESeq2 already accounts for within-group variance when it assesses the significance of each gene and computes a p-value and adjusted p-value. If these outlier samples are to blame for what you believe to be false positive genes, then the problem is not within-group variance. If the effect appears systematic across many genes, another option (which is compatible with DESeq2) is to use surrogate variable analysis (sva) to estimate the systematic effects and include them in the design.

Of course, the "nuclear option" is to discard the samples entirely, but I don't think that is likely to be justified in this case. And even then you need to be wary of the potential for bias if you are discarding samples until you get the result you want to see.
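The surrogate variable option mentioned above could be sketched roughly as follows. The data are simulated, and the number of surrogate variables (n.sv = 2), the low-count filter, and the column names SV1 and SV2 are all illustrative assumptions, not values from this thread.

```r
library(DESeq2)
library(sva)

# Simulated data stand in for the real experiment (assumption).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
dds <- estimateSizeFactors(dds)

# Normalized counts, dropping very low-count genes before sva.
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model (with condition) and null model (intercept only).
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the surrogate variables to the design and refit.
ddssva <- dds
ddssva$SV1 <- svseq$sv[, 1]
ddssva$SV2 <- svseq$sv[, 2]
design(ddssva) <- ~ SV1 + SV2 + condition
ddssva <- DESeq(ddssva)
res_sva <- results(ddssva)
```

Keeping condition last in the design means results() reports the condition effect by default.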

(Reply written 16 months ago by Ryan C. Thompson)