Question: [DESeq2] Automatic outlier detection and replacement with continuous variables
cajawe wrote (2.7 years ago):

Hello,

I am running DESeq2_1.10.1 on a data set with a continuous predictor term (either with or without a categorical blocking term).  

My understanding is that in such cases outlier detection and replacement is not applied automatically, and that it is instead necessary to inspect Cook's distances manually. I base this on section 3.6 of the Nov 30 2016 version of the DESeq2 vignette (www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf).

However, when I run the DESeq function, I see the following message: 

  • fitting model and testing
  • -- replacing outliers and refitting for 361 genes
  • -- DESeq argument 'minReplicatesForReplace' = 7 
  • -- original counts are preserved in counts(dds)

 

My question, then, is what is actually happening here? Am I looking at a copy of the vignette that is out of date? Is my analysis carrying out the outlier replacement procedure even though it's not optimal for continuous predictors? Or am I misinterpreting the message entirely?

Thank you in advance!  I can provide more details about my DESeqDataSet object if helpful.
Cameron

 


The covariate is a disease phenotype: the proportion of afflicted individuals per inbred strain, with each RNAseq sample corresponding to a strain.  Many samples are zero—in light of your comment, I suppose that this may be the source of the problem.  I've pasted the covariate below (after a log(x+c) transformation).  

I've also carried out the analysis with the data downgraded to binary (zero vs nonzero).  Should I stick with the downgraded data and avoid using the continuous data given its unusual distribution?

Thanks!

-4.094345, -1.78271, -0.2889348, -4.094345, -4.094345, -0.7524469, -1.99021, 
-4.094345, -4.094345, -4.094345, -1.954278, -0.9287555, -4.094345, -4.094345, 
-4.094345, -1.487357, -4.094345, -0.2750636, -4.094345, -4.094345, -3.401197, 
-0.2830146, -4.094345, -4.094345, -4.094345, -2.669336, -4.094345, -4.094345, 
-4.094345, -4.094345, -1.287623, -4.094345, -4.094345, -4.094345, -4.094345, 
-2.744418, -2.148434, -2.148434, -4.094345, -3.220816, -4.094345, -4.094345, 
-4.094345, -2.870569
written 2.7 years ago by cajawe • modified 2.7 years ago by Michael Love

There's not necessarily a problem then. The outlier replacement procedure can run on this dataset, because there is repetition in the continuous values.
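To see the repetition concretely, here is a rough base-R sketch using the covariate values posted above. It is only an illustration, not DESeq2's internal code: grouping samples by identical design-matrix rows is an assumption about what "replication" means here, the threshold of 7 is taken from the message in the question, and the variable names are made up for the example.

```r
# Covariate values exactly as posted above (44 samples, after log(x + c)).
covariate <- c(
  -4.094345, -1.78271, -0.2889348, -4.094345, -4.094345, -0.7524469, -1.99021,
  -4.094345, -4.094345, -4.094345, -1.954278, -0.9287555, -4.094345, -4.094345,
  -4.094345, -1.487357, -4.094345, -0.2750636, -4.094345, -4.094345, -3.401197,
  -0.2830146, -4.094345, -4.094345, -4.094345, -2.669336, -4.094345, -4.094345,
  -4.094345, -4.094345, -1.287623, -4.094345, -4.094345, -4.094345, -4.094345,
  -2.744418, -2.148434, -2.148434, -4.094345, -3.220816, -4.094345, -4.094345,
  -4.094345, -2.870569
)

# Design matrix for a model with only the continuous term (~ covariate).
design_matrix <- model.matrix(~ covariate)

# Assumption: samples with identical design-matrix rows count as replicates.
row_key <- apply(design_matrix, 1, paste, collapse = "_")
n_identical <- as.vector(table(row_key)[row_key])

# Threshold taken from the message in the question.
min_replicates <- 7

# Number of samples that sit in a group of at least `min_replicates` samples.
n_eligible <- sum(n_identical >= min_replicates)

print(table(covariate))
print(n_eligible)
```

For the values posted here, 27 of the 44 samples share the minimum value, so a group of at least 7 identical samples exists.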

You may choose to turn it off if you feel it's not helpful, by setting minReplicatesForReplace=Inf. 
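A minimal sketch of that setting on simulated data, for illustration only. The dataset comes from `makeExampleDESeqDataSet`, and the covariate `x` and the dataset sizes are hypothetical stand-ins, not the data from this thread.

```r
library(DESeq2)

set.seed(1)

# Simulated counts: 500 genes x 12 samples (illustrative only).
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical continuous covariate with many repeated zeros.
dds$x <- c(rep(0, 8), 0.5, 1, 1.5, 2)
design(dds) <- ~ x

# With Inf, no group of samples can reach the threshold,
# so the outlier replacement step is skipped.
dds <- DESeq(dds, minReplicatesForReplace = Inf)

res <- results(dds)
summary(res)
```

With this setting the "replacing outliers and refitting" lines should no longer appear in the messages printed by `DESeq`.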

I wouldn't make modeling choices (continuous vs binary) based on this outlier procedure. It is usually just picking up a number of genes that have all 0's except for one or two samples with technical artifacts.

written 2.7 years ago by Michael Love

Okay, great—thanks kindly for your help!

written 2.7 years ago by cajawe
Answer: [DESeq2] Automatic outlier detection and replacement with continuous variables
Michael Love wrote (2.7 years ago):

Can you show what your continuous covariate looks like? While it's not described in that section, DESeq2 actually checks whether it can still do outlier replacement, which is the case when the continuous covariate has replication similar to a categorical covariate.
