Question: DESeq2_multiple correction_no significance
gravatar for tg369
20 months ago by
United Kingdom
tg36910 wrote:

I have total 6 samples (3 controls and 3 treated) and have a total of 576149 genes to be compared between control vs treatment. However, I found a sub set of 320336 genes for which five samples have 0 count and randomly any one of the sample has count greater than zero. Keeping these 320336 genes in my data set for is not giving any adjusted p-value significant (even I increase the cut off threshold 0.1) after DESeq2 run; however, removing those genes helped to identify differentially expressed genes with adjusted p-value <0.05 and we are able to capture relevant biology. By the way, independentFiltering option was kept TRUE by default. Do you think that I can remove those genes (counts for all 5 samples=0 and randomly one sample>0) before I run DEseq2. 

Thank you in advance.

Best wishes,



ADD COMMENTlink modified 20 months ago • written 20 months ago by tg36910
gravatar for Michael Love
20 months ago by
Michael Love13k
United States
Michael Love13k wrote:

As Wolfgang mentioned previously, that's a lot of "genes"... I'm curious what you're looking at.

As you know DESeq2 automatically filters on the mean of normalized counts: rowMeans(counts(dds, normalized=TRUE)), which you can run after:

dds <- estimateSizeFactors(dds)

to avoid first running the whole pipeline, DESeq().

So I would generally recommend to pre-filter using this quantity we call "baseMean" before running DESeq().

Filtering on unnormalized counts can be problematic, if there are extreme differences between library sizes and if the libraries are confounded with condition. IIRC there have been a number of posts on this topic on the support forum in the past 1-2 months, which you might try to find and read over for more detail.

But if you have some extreme counts, e.g. 0,0,0 vs 0,0,10000, then I could see this creating a problem for filtering. You could try the trimmed mean of normalized counts: apply(counts(dds, normalized=TRUE), 1, mean, trim=1/6)

ADD COMMENTlink written 20 months ago by Michael Love13k

Hi Michael,

I refer to our discussion for using trim mean filtering. My specific question is that running apply(counts(dds, normalized=TRUE), 1, mean, trim=1/6) after dds <- estimateSizeFactors(dds) and then doing dds<-DESeq(dds) would automatically filter genes?  or I am really missing some steps in between.

I already wrote the steps below if you want to see.

(I am analysing transposable elements expression difference. I took .fa.out file (mouse) from repeat masker web site. That is why this big list. however I have already removing some repeat elements which are not retroposons.)

my data:

> dim(D)
[1] 576149      6
> head(D)
                   co2a co3b co4a ms2a ms3r ms4a
CR1_Mam.1097.1593    0    0    0    0    0    5
CR1_Mam.1128.1285    0    0    3    0    0    0
CR1_Mam.1128.1445    0    9    0    0    0    0
CR1_Mam.1132.1259    0    0    0    0    0    3
CR1_Mam.1143.1293    0    0    1    0    0    0
CR1_Mam.1145.1278    0    0    0    0    3    0

> s<-factor(c("co2a", "co3b", "co4a", "ms2a", "ms3r", "ms4a"));

> condition<-factor(c("U", "U", "U", "T", "T", "T"));
> df<- data.frame(s,condition,row.names=1);
> dds<-DESeqDataSetFromMatrix(countData=D, colData=df, design= ~ condition, ignoreRank = FALSE);

> dds <- estimateSizeFactors(dds)

>apply(counts(dds, normalized=TRUE), 1, mean, trim=1/6)

> dds<-DESeq(dds);

> res<-results(dds,contrast=c("condition","T","U"));

Indeed I performed these steps and found that only approx. 1300 genes were detected as outliers while I had 320336 genes for which 5 samples were having zero counts and only one sample randomly having >0 counts.  And I did not get any significant p-adj.

I am very much looking forward to your advice that I would highly appreciate. Many thanks in advance for your time.

Kind regards, tanay




ADD REPLYlink modified 20 months ago • written 20 months ago by tg36910
gravatar for tg369
20 months ago by
United Kingdom
tg36910 wrote:

Dear Michael,

thank you very much. I will have to do filtering using trim mean.  People want justification from me why I am removing those genes (ooo vs 0047 ...). That may be important for writing method section of my paper. Please can you suggest a sentence to argue.

In my earlier run DEseq2: Interesting thing was that, if I run whole 576k genes, then I am getting only 104 genes for which I do not have p-adj.  But if I run DESe2 after  removing those genes (320336), then lot of genes have p adj =NA. why so?


thanks again, Tanay





ADD COMMENTlink written 20 months ago by tg36910

See the DESeq2 paper for justification for mean filtering.

I can't say exactly why things change, remember independent filtering is an optimization, and you now have used two ways of ranking -- pre-filtering by trimmed mean and the regular mean used by results(). I'd recommend exploring yourself. (or if you're just wondering about the NA, this is explained in the vignette FAQ).

ADD REPLYlink written 20 months ago by Michael Love13k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 136 users visited in the last hour