Question

DESEQ2 genes with large count variation do not show as differentially expressed

0

Entering edit mode

sharvari gujja ▴ 40

@sharvari-gujja-6614

Last seen 14 months ago

United States

Hi,

I have a gene X (in a table of 57K genes) with unique counts from 2 conditions:

Control: 173 208 740

Case: 451397 391335 540771

On running DESEQ, this gene doesn't show up as significant.

dds <- DESeq(dds)

Are there any specific parameter for such large count variation for deseq?

Please advice.

Thanks

Sharvari

deseq2 • 2.2k views

ADD COMMENT • link 8.6 years ago • updated 8.5 years ago sharvari gujja ▴ 40

score 0 · Answer 1 · 2015-09-25

hi Sharvari,

Some of the default settings are likely not working well for your particular dataset with such large and variable counts for an individual gene. I would run DESeq() again but with the following non-default arguments: DESeq(dds, betaPrior=FALSE) and then results(dds, cooksCutoff=FALSE). I wouldn't trust the shrinkage of log fold changes in this case (because the dataset might include just a single gene with an extreme fold change, so the prior variance estimation would not pick up on the proper width), nor the automatic outlier filtering. It's better to examine the top genes in this case by eye, using plotCounts, rather than relying on the automatic outlier filtering.

score 0 · Answer 2 · 2015-09-25

Thanks Michael for the suggestions. The gene now shows up as significantly expressed. I did a quick comparison on the output lists generated from default setting and one without outlier filtering and only 8 genes differ.

I have one more question about pairwise comparison with 3 or more samples. I tried to use 'contrast' option , but the job failed.

Eg:

colData:

condition

A1 Control

A2 Control

A3 Control

B1 Case1

B2 Case1

B3 Case1

C1 Case2

C2 Case2

C3 Case3

Is there an automatic way of doing pairwise comparison between all the conditions defined in 'colData'? If not, what should be the correct way of specifying contrasts for pairwise comparison?

Thanks for helping

Sharvari

Michael Love · Answer 3 · 2015-09-28

Hello Michael,

I could get contrast to work.. I did not realize there was a typo in the function.

dds <- DESeqDataSetFromMatrix(countData = countData,colData = samples,design = ~condition)

dds <- DESeq(dds)

dds <- dds[ rowSums(counts(dds)) > 1, ]

res <- results(dds, contrast=c("condition", "Case1", "Case2"))

res <- res[complete.cases(res),]

res <-res[order(res$padj),]

resSig <- subset(res, padj < 0.1)

How do I modify this code to get the PCA plot and sample to sample distance for each pairwise comparison:

# PCA

rld <- rlog(dds)

plotPCA(rld, intgroup=c("condition"));title(main="PCAPlot",outer=TRUE)

# Sample to samples distances

distsRL <- dist(t(assay(rld)))

mat <- as.matrix(distsRL)

rownames(mat) <- colnames(mat) <- with(colData(dds),paste(condition, sep=" : "))

hc <- hclust(distsRL)

heatmap.2(mat, Rowv=as.dendrogram(hc),symm=TRUE, trace="none",col = rev(hmcol), margin=c(6,6));title(main="sample-sample_dist_raw",outer=TRUE)

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] splines   grid      stats4    parallel  stats     graphics  grDevices utils    
 [9] datasets  methods   base     

other attached packages:
 [1] gplots_2.17.0             VGAM_0.9-8                RUVSeq_1.2.0             
 [4] EDASeq_2.2.0              ShortRead_1.26.0          GenomicAlignments_1.4.1  
 [7] Rsamtools_1.20.4          Biostrings_2.36.4         XVector_0.8.0            
[10] BiocParallel_1.2.20       Biobase_2.28.0            RColorBrewer_1.1-2       
[13] pheatmap_1.0.7            DESeq2_1.8.1              RcppArmadillo_0.5.100.1.0
[16] Rcpp_0.12.0               sva_3.14.0                genefilter_1.50.0        
[19] mgcv_1.8-7                nlme_3.1-122              edgeR_3.10.2             
[22] limma_3.24.15             cummeRbund_2.10.0         Gviz_1.12.1              
[25] rtracklayer_1.28.9        GenomicRanges_1.20.6      fastcluster_1.1.16       
[28] reshape2_1.4.1            ggplot2_1.0.1             RSQLite_1.0.0            
[31] DBI_0.3.1                 corrplot_0.73             HSMMSingleCell_0.102.0   
[34] BiocInstaller_1.18.4      GenomeInfoDb_1.4.2        IRanges_2.2.7            
[37] S4Vectors_0.6.3           BiocGenerics_0.14.0      

loaded via a namespace (and not attached):
 [1] bitops_1.0-6              matrixStats_0.14.2        tools_3.2.0              
 [4] KernSmooth_2.23-14        rpart_4.1-10              Hmisc_3.16-0             
 [7] colorspace_1.2-6          nnet_7.3-10               gridExtra_2.0.0          
[10] caTools_1.17.1            scales_0.2.5              DESeq_1.20.0             
[13] stringr_1.0.0             digest_0.6.8              foreign_0.8-66           
[16] R.utils_2.1.0             dichromat_2.0-0           BSgenome_1.36.3          
[19] hwriter_1.3.2             gtools_3.4.2              acepack_1.3-3.3          
[22] R.oo_1.19.0               VariantAnnotation_1.14.11 RCurl_1.95-4.6           
[25] magrittr_1.5              Formula_1.2-1             futile.logger_1.4.1      
[28] Matrix_1.2-2              munsell_0.4.2             proto_0.3-10             
[31] R.methodsS3_1.7.0         stringi_0.4-1             MASS_7.3-43              
[34] zlibbioc_1.14.0           plyr_1.8.3                gdata_2.16.1             
[37] lattice_0.20-33           GenomicFeatures_1.20.3    annotate_1.46.1          
[40] locfit_1.5-9.1            geneplotter_1.46.0        biomaRt_2.24.0           
[43] futile.options_1.0.0      XML_3.98-1.1              biovizBase_1.16.0        
[46] latticeExtra_0.6-26       lambda.r_1.1.7            gtable_0.1.2             
[49] aroma.light_2.4.0         xtable_1.7-4              survival_2.38-3          
[52] AnnotationDbi_1.30.1      cluster_2.0.3

I have 28 pairwise comparisons. Is there a better way to generate plots for each comparison?

Thanks

Sharvari

score 0 · Answer 4 · 2015-09-30

0

Entering edit mode

sharvari gujja ▴ 40

@sharvari-gujja-6614

Last seen 14 months ago

United States

Oh okay. Thanks for the feedback.

Sharvari

ADD COMMENT • link 8.6 years ago sharvari gujja ▴ 40

score 0 · Answer 5 · 2015-11-10

0

Entering edit mode

sharvari gujja ▴ 40

@sharvari-gujja-6614

Last seen 14 months ago

United States

Hello Michael,

I am running DESeq with non-default arguments:

DESeq(dds, betaPrior=FALSE) and then results(dds, cooksCutoff=FALSE).

My gene of interest has counts from 2 conditions:

C7	253819
C6	288780
C8	337701
C4	187680
R1	90161
R2	63303
R4	264
R8	74752

The gene doesn't show up as significant with or without filtering turned off. I've run DESEQ2 excluding sample R4, but get the same results.

Could you please advice if any other options could be tweaked for this comparison C vs. R?

Thanks

Sharvari

ADD COMMENT • link 8.5 years ago sharvari gujja ▴ 40

0

Entering edit mode

I can't say why a given gene is or isn't passing an adjusted p-value threshold for an experiment. It depends on many things: in particular how the rest of the genes look, how many genes are being tested, etc. For all I know, this dataset may not conform to the distributional assumptions of DESeq2. Feel free to play around with other differential packages, it's fairly easy to switch between them once you have a count matrix.

ADD REPLY • link 8.5 years ago Michael Love 41k