I have 6 samples in total (3 controls and 3 treated) and 576149 genes to compare between control and treatment. However, I found a subset of 320336 genes for which five samples have a count of 0 and a single sample, seemingly at random, has a count greater than zero. If I keep these 320336 genes in my data set, the DESeq2 run gives no significant adjusted p-values (even when I raise the cutoff to 0.1); removing those genes, however, helped to identify differentially expressed genes with adjusted p-value < 0.05, and we are able to capture relevant biology. By the way, the independentFiltering option was left at its default of TRUE. Do you think I can remove those genes (count = 0 in five samples and count > 0 in one sample) before I run DESeq2?
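The pre-filter described above can be sketched in base R as follows. This is only an illustration on a made-up matrix: the object names `toy`, `n_nonzero`, `keep` and `filtered`, the toy counts, and the "at least two samples" threshold are all assumptions and are not taken from the real data.

```r
# Toy count matrix (hypothetical values), laid out like D: genes x 6 samples
toy <- matrix(
  c(0, 0, 0,  0,  0,  5,   # non-zero in one sample only
    0, 9, 0,  0,  0,  0,   # non-zero in one sample only
    4, 6, 5, 12, 15, 11,   # non-zero in all samples
    0, 2, 0,  3,  0,  0),  # non-zero in two samples
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"),
                  c("co2a", "co3b", "co4a", "ms2a", "ms3r", "ms4a"))
)

# Number of samples with a non-zero count, per gene
n_nonzero <- rowSums(toy > 0)

# Keep genes that are non-zero in at least two samples
# (the threshold of 2 is an assumption chosen for this sketch)
keep <- n_nonzero >= 2
filtered <- toy[keep, , drop = FALSE]
```

On the real data the same logical vector would be computed from D, and only the retained rows would be passed on to DESeqDataSetFromMatrix.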
Thank you in advance.
Best wishes,
Tanay
Hi Michael,
I refer to our discussion about trimmed-mean filtering. My specific question is: if I run apply(counts(dds, normalized=TRUE), 1, mean, trim=1/6) after dds <- estimateSizeFactors(dds) and then run dds <- DESeq(dds), will the genes be filtered automatically, or am I missing some steps in between?
I have written out my steps below in case you want to see them.
(I am analysing differences in transposable element expression. I took the .fa.out file (mouse) from the RepeatMasker web site, which is why the list is so big. However, I have already removed some repeat elements which are not retroposons.)
my data:
> dim(D)
[1] 576149 6
> head(D)
                  co2a co3b co4a ms2a ms3r ms4a
CR1_Mam.1097.1593    0    0    0    0    0    5
CR1_Mam.1128.1285    0    0    3    0    0    0
CR1_Mam.1128.1445    0    9    0    0    0    0
CR1_Mam.1132.1259    0    0    0    0    0    3
CR1_Mam.1143.1293    0    0    1    0    0    0
CR1_Mam.1145.1278    0    0    0    0    3    0
> s<-factor(c("co2a", "co3b", "co4a", "ms2a", "ms3r", "ms4a"));
> condition<-factor(c("U", "U", "U", "T", "T", "T"));
> df<- data.frame(s,condition,row.names=1);
> dds<-DESeqDataSetFromMatrix(countData=D, colData=df, design= ~ condition, ignoreRank = FALSE);
> dds <- estimateSizeFactors(dds)
> apply(counts(dds, normalized=TRUE), 1, mean, trim=1/6)
> dds<-DESeq(dds);
> res<-results(dds,contrast=c("condition","T","U"));
Indeed, I performed these steps and found that only approx. 1300 genes were detected as outliers, whereas I have 320336 genes for which five samples have zero counts and only one sample has a count > 0. And I did not get any significant adjusted p-values.
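For comparison, here is a minimal base-R sketch of a trimmed-mean filter in which the result of apply() is stored and then used to subset the rows. In the steps above the apply() result is printed but never assigned, so nothing is removed on its basis. Everything in the sketch is illustrative: `toy`, `norm_counts`, `tm`, `keep` and `filtered` are made-up names, the counts are invented, the matrix only stands in for counts(dds, normalized=TRUE), and the cutoff `tm > 0` is an assumption.

```r
# Toy matrix standing in for counts(dds, normalized = TRUE)
# (hypothetical values; size factors are assumed to be 1 here)
norm_counts <- matrix(
  c(0, 0, 0,  0,  0,  5,   # non-zero in one sample only
    0, 9, 0,  0,  0,  0,   # non-zero in one sample only
    4, 6, 5, 12, 15, 11,   # non-zero in all samples
    0, 2, 0,  3,  0,  0),  # non-zero in two samples
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"),
                  c("co2a", "co3b", "co4a", "ms2a", "ms3r", "ms4a"))
)

# With 6 samples, trim = 1/6 drops the smallest and the largest value of
# each gene and averages the remaining four
tm <- apply(norm_counts, 1, mean, trim = 1/6)

# Store the decision and subset explicitly; computing tm alone removes nothing.
# The cutoff (trimmed mean > 0) is an assumption for this sketch.
keep <- tm > 0
filtered <- norm_counts[keep, , drop = FALSE]
```

A gene with a single non-zero sample gets a trimmed mean of 0 and is dropped by this cutoff. With a real DESeqDataSet, the analogous step would presumably be dds <- dds[keep, ] before calling DESeq(dds).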
I am very much looking forward to your advice, which I would highly appreciate. Many thanks in advance for your time.
Kind regards, tanay