deseq2 - many differentially expressed genes
@prasad-siddavatam-4508

Hi Michael, I am following up on our previous discussion. I ran DESeq2 with minReplicatesForReplace=Inf and cooksCutoff=FALSE, and it actually increased the number of DE genes.

Here is the sample code

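library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = countsMatrix, colData = colData,
                              design = ~ type)
# Disable outlier replacement, as described above
dds <- DESeq(dds, minReplicatesForReplace = Inf)
# Compare ABCD_DIF against ABCD_UND with Cook's distance filtering turned off
Gres <- results(dds, contrast = c("type", "ABCD_DIF", "ABCD_UND"), cooksCutoff = FALSE)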

The big difference between DESeq and DESeq2 is that there are thousands of DE genes, even at an FDR of 0.01. Are there any other criteria for reducing the number of DEGs? The MA plots look fine (most of the genes lie along the x-axis and the DE genes are colored red).

@mikelove

Hi Prasad,

It is not surprising that the number of DE genes increased, as you have turned off outlier filtering.

That you have many genes with a small false discovery rate means that the fold changes between conditions are large, in particular large relative to the within-group dispersion, and that your experiment was sufficiently powered to discover many differences.

I would follow the suggestion in my previous response: "You can reduce the size of the list you are interested in by either lowering the alpha or using the lfcThreshold argument of results()."

results(dds, lfcThreshold=1)
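
As a rough sketch of the two options quoted above, the following compares the default call with a stricter `alpha` and with a test against `lfcThreshold=1`. It uses DESeq2's simulated data rather than the counts from this thread; the dataset size, the `betaSD` value, and the cutoffs are assumptions chosen only for illustration.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption for illustration)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 8, betaSD = 1)
dds <- DESeq(dds)

# Default: test for any non-zero log2 fold change, FDR target of 0.1
res_default <- results(dds)

# Option 1: a stricter FDR target
res_alpha <- results(dds, alpha = 0.01)

# Option 2: test for |log2 fold change| > 1 rather than filtering afterwards
res_lfc <- results(dds, lfcThreshold = 1, alpha = 0.01)

# Number of genes called under each setting (NA adjusted p-values are skipped)
n_default <- sum(res_default$padj < 0.1, na.rm = TRUE)
n_alpha   <- sum(res_alpha$padj < 0.01, na.rm = TRUE)
n_lfc     <- sum(res_lfc$padj < 0.01, na.rm = TRUE)
c(default = n_default, alpha = n_alpha, lfc = n_lfc)
```

Passing `lfcThreshold` to `results()` changes the null hypothesis being tested, so the adjusted p-values refer to the thresholded test, which is different from subsetting a default results table by fold change afterwards.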

 


Hi Michael, 

Thank you very much. 

Another potential problem could come from the data itself: the matrix contains ~45k genes, many of which are microRNAs and other noncoding RNAs. Since the data come from mRNA, having these genes in the matrix (even with zero counts) would affect the multiple-testing correction.

Is it a good idea to remove these genes beforehand? If so, where can I get a GFF/GTF file without the noncoding genes and pseudogenes?

Greatly appreciate your help.

Prasad


Having features with very small counts won't affect the PCA, for two reasons. First, the transformations we recommend dampen the signal from the log of low counts. Second, plotPCA selects the top 500 features by variance, and these low-count features won't have high variance.
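
A minimal sketch of those two points, assuming the variance stabilizing transformation is one of the transformations meant here and using simulated data in place of the real counts (the dataset size and `betaSD` are illustrative assumptions):

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption for illustration)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 8, betaSD = 1)

# Variance stabilizing transformation; it compresses differences among low counts
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# plotPCA keeps only the ntop features with the highest variance (500 by default)
pca <- plotPCA(vsd, intgroup = "condition", ntop = 500, returnData = TRUE)
head(pca)
```

With `returnData = TRUE`, `plotPCA` returns the sample coordinates as a data frame instead of drawing the plot, which makes it easy to check how the samples separate.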


I am not asking about the effect of the small counts (ncRNAs) on the PCA plot, but about their effect on the number of DE genes.
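
On the multiple-testing side of this question, a small base R sketch may help. DESeq2 reports an NA p-value for rows in which every sample has a zero count, and independent filtering in results() additionally sets the adjusted p-value to NA for low-count genes, so such features are not counted in the Benjamini-Hochberg adjustment. The p-values below are hypothetical and only illustrate how `p.adjust` behaves.

```r
# Hypothetical p-values for five expressed genes
p_expressed <- c(0.0001, 0.001, 0.01, 0.03, 0.2)

# Case 1: five unexpressed features carry NA p-values
p_with_na <- c(p_expressed, rep(NA, 5))
padj_na <- p.adjust(p_with_na, method = "BH")

# Case 2: only the expressed genes are tested
padj_only <- p.adjust(p_expressed, method = "BH")

# NA entries are dropped before the adjustment, so the two results agree
same_as_filtered <- isTRUE(all.equal(padj_na[1:5], padj_only))

# Case 3: the same five features carry real, uninformative p-values instead
p_with_null <- c(p_expressed, rep(1, 5))
padj_null <- p.adjust(p_with_null, method = "BH")

# Extra tested features raise the adjusted p-values of the expressed genes
raised <- all(padj_null[1:5] >= padj_only)
c(same_as_filtered = same_as_filtered, raised = raised)
```

In other words, features that never receive a p-value do not add to the correction burden; only features that are actually tested do.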
