DESeq2_multiple correction_no significance
2
0
Entering edit mode
tg369 ▴ 10
@tg369-9320
Last seen 4.1 years ago
United Kingdom

I have total 6 samples (3 controls and 3 treated) and have a total of 576149 genes to be compared between control vs treatment. However, I found a sub set of 320336 genes for which five samples have 0 count and randomly any one of the sample has count greater than zero. Keeping these 320336 genes in my data set for is not giving any adjusted p-value significant (even I increase the cut off threshold 0.1) after DESeq2 run; however, removing those genes helped to identify differentially expressed genes with adjusted p-value <0.05 and we are able to capture relevant biology. By the way, independentFiltering option was kept TRUE by default. Do you think that I can remove those genes (counts for all 5 samples=0 and randomly one sample>0) before I run DEseq2. 

Thank you in advance.

Best wishes,

Tanay

 

deseq2 • 1.0k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 13 hours ago
United States

As Wolfgang mentioned previously, that's a lot of "genes"... I'm curious what you're looking at.

As you know DESeq2 automatically filters on the mean of normalized counts: rowMeans(counts(dds, normalized=TRUE)), which you can run after:

dds <- estimateSizeFactors(dds)

to avoid first running the whole pipeline, DESeq().

So I would generally recommend to pre-filter using this quantity we call "baseMean" before running DESeq().

Filtering on unnormalized counts can be problematic, if there are extreme differences between library sizes and if the libraries are confounded with condition. IIRC there have been a number of posts on this topic on the support forum in the past 1-2 months, which you might try to find and read over for more detail.

But if you have some extreme counts, e.g. 0,0,0 vs 0,0,10000, then I could see this creating a problem for filtering. You could try the trimmed mean of normalized counts: apply(counts(dds, normalized=TRUE), 1, mean, trim=1/6)

ADD COMMENT
0
Entering edit mode

Hi Michael,

I refer to our discussion for using trim mean filtering. My specific question is that running apply(counts(dds, normalized=TRUE), 1, mean, trim=1/6) after dds <- estimateSizeFactors(dds) and then doing dds<-DESeq(dds) would automatically filter genes?  or I am really missing some steps in between.

I already wrote the steps below if you want to see.

(I am analysing transposable elements expression difference. I took .fa.out file (mouse) from repeat masker web site. That is why this big list. however I have already removing some repeat elements which are not retroposons.)

my data:

> dim(D)
[1] 576149      6
> head(D)
                   co2a co3b co4a ms2a ms3r ms4a
CR1_Mam.1097.1593    0    0    0    0    0    5
CR1_Mam.1128.1285    0    0    3    0    0    0
CR1_Mam.1128.1445    0    9    0    0    0    0
CR1_Mam.1132.1259    0    0    0    0    0    3
CR1_Mam.1143.1293    0    0    1    0    0    0
CR1_Mam.1145.1278    0    0    0    0    3    0

> s<-factor(c("co2a", "co3b", "co4a", "ms2a", "ms3r", "ms4a"));

> condition<-factor(c("U", "U", "U", "T", "T", "T"));
> df<- data.frame(s,condition,row.names=1);
> dds<-DESeqDataSetFromMatrix(countData=D, colData=df, design= ~ condition, ignoreRank = FALSE);

> dds <- estimateSizeFactors(dds)

>apply(counts(dds, normalized=TRUE), 1, mean, trim=1/6)

> dds<-DESeq(dds);

> res<-results(dds,contrast=c("condition","T","U"));

Indeed I performed these steps and found that only approx. 1300 genes were detected as outliers while I had 320336 genes for which 5 samples were having zero counts and only one sample randomly having >0 counts.  And I did not get any significant p-adj.

I am very much looking forward to your advice that I would highly appreciate. Many thanks in advance for your time.

Kind regards, tanay

 

 

 

ADD REPLY
0
Entering edit mode
tg369 ▴ 10
@tg369-9320
Last seen 4.1 years ago
United Kingdom

Dear Michael,

thank you very much. I will have to do filtering using trim mean.  People want justification from me why I am removing those genes (ooo vs 0047 ...). That may be important for writing method section of my paper. Please can you suggest a sentence to argue.

In my earlier run DEseq2: Interesting thing was that, if I run whole 576k genes, then I am getting only 104 genes for which I do not have p-adj.  But if I run DESe2 after  removing those genes (320336), then lot of genes have p adj =NA. why so?

 

thanks again, Tanay

 

 

 

 

ADD COMMENT
0
Entering edit mode

See the DESeq2 paper for justification for mean filtering.

I can't say exactly why things change, remember independent filtering is an optimization, and you now have used two ways of ranking -- pre-filtering by trimmed mean and the regular mean used by results(). I'd recommend exploring yourself. (or if you're just wondering about the NA, this is explained in the vignette FAQ).

ADD REPLY

Login before adding your answer.

Traffic: 760 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6