DESEQ2 genes with large count variation do not show as differentially expressed
6
0
Entering edit mode
@sharvari-gujja-6614
Last seen 14 months ago
United States

Hi,

I have a gene X (in a table of 57K genes) with unique counts from 2 conditions:

Control: 173    208    740    

Case: 451397    391335    540771

On running DESEQ, this gene doesn't show up as significant.

dds <- DESeq(dds)

Are there any specific parameter for such large count variation for deseq?

Please advice.

Thanks

Sharvari

deseq2 • 2.2k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi Sharvari,

Some of the default settings are likely not working well for your particular dataset with such large and variable counts for an individual gene. I would run DESeq() again but with the following non-default arguments: DESeq(dds, betaPrior=FALSE) and then results(dds, cooksCutoff=FALSE). I wouldn't trust the shrinkage of log fold changes in this case (because the dataset might include just a single gene with an extreme fold change, so the prior variance estimation would not pick up on the proper width), nor the automatic outlier filtering. It's better to examine the top genes in this case by eye, using plotCounts, rather than relying on the automatic outlier filtering.

ADD COMMENT
0
Entering edit mode
@sharvari-gujja-6614
Last seen 14 months ago
United States

Thanks Michael for the suggestions. The gene now shows up as significantly expressed. I did a quick comparison on the output lists generated from default setting and one without outlier filtering and only 8 genes differ. 

I have one more question about pairwise comparison with 3 or more samples. I tried to use 'contrast' option , but the job failed.

Eg: 

colData:

     condition

A1 Control

A2 Control

A3 Control

B1 Case1

B2 Case1

B3 Case1

C1 Case2

C2 Case2

C3 Case3

Is there an automatic way of doing pairwise comparison between all the conditions defined in 'colData'? If not, what should be the correct way of specifying contrasts for pairwise comparison?

 

Thanks for helping

Sharvari

 

 

 

 

ADD COMMENT
0
Entering edit mode

I can't help much here unless you provide:

  • all of your code,
  • a copy of the error message
  • the output of: sessionInfo()
ADD REPLY
0
Entering edit mode
@sharvari-gujja-6614
Last seen 14 months ago
United States

Hello Michael,

I could get contrast to work.. I did not realize there was a typo in the function.

 

dds <- DESeqDataSetFromMatrix(countData = countData,colData = samples,design = ~condition)

dds <- DESeq(dds)

dds <- dds[ rowSums(counts(dds)) > 1, ]

res <- results(dds, contrast=c("condition", "Case1", "Case2"))

res <- res[complete.cases(res),]

res <-res[order(res$padj),]

resSig <- subset(res, padj < 0.1)

 

How do I modify this code to get the PCA plot and sample to sample distance for each pairwise comparison:

# PCA

rld <- rlog(dds)

plotPCA(rld, intgroup=c("condition"));title(main="PCAPlot",outer=TRUE)

# Sample to samples distances

distsRL <- dist(t(assay(rld)))

mat <- as.matrix(distsRL)

rownames(mat) <- colnames(mat) <- with(colData(dds),paste(condition, sep=" : "))

hc <- hclust(distsRL)

heatmap.2(mat, Rowv=as.dendrogram(hc),symm=TRUE, trace="none",col = rev(hmcol), margin=c(6,6));title(main="sample-sample_dist_raw",outer=TRUE)

 

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] splines   grid      stats4    parallel  stats     graphics  grDevices utils    
 [9] datasets  methods   base     

other attached packages:
 [1] gplots_2.17.0             VGAM_0.9-8                RUVSeq_1.2.0             
 [4] EDASeq_2.2.0              ShortRead_1.26.0          GenomicAlignments_1.4.1  
 [7] Rsamtools_1.20.4          Biostrings_2.36.4         XVector_0.8.0            
[10] BiocParallel_1.2.20       Biobase_2.28.0            RColorBrewer_1.1-2       
[13] pheatmap_1.0.7            DESeq2_1.8.1              RcppArmadillo_0.5.100.1.0
[16] Rcpp_0.12.0               sva_3.14.0                genefilter_1.50.0        
[19] mgcv_1.8-7                nlme_3.1-122              edgeR_3.10.2             
[22] limma_3.24.15             cummeRbund_2.10.0         Gviz_1.12.1              
[25] rtracklayer_1.28.9        GenomicRanges_1.20.6      fastcluster_1.1.16       
[28] reshape2_1.4.1            ggplot2_1.0.1             RSQLite_1.0.0            
[31] DBI_0.3.1                 corrplot_0.73             HSMMSingleCell_0.102.0   
[34] BiocInstaller_1.18.4      GenomeInfoDb_1.4.2        IRanges_2.2.7            
[37] S4Vectors_0.6.3           BiocGenerics_0.14.0      

loaded via a namespace (and not attached):
 [1] bitops_1.0-6              matrixStats_0.14.2        tools_3.2.0              
 [4] KernSmooth_2.23-14        rpart_4.1-10              Hmisc_3.16-0             
 [7] colorspace_1.2-6          nnet_7.3-10               gridExtra_2.0.0          
[10] caTools_1.17.1            scales_0.2.5              DESeq_1.20.0             
[13] stringr_1.0.0             digest_0.6.8              foreign_0.8-66           
[16] R.utils_2.1.0             dichromat_2.0-0           BSgenome_1.36.3          
[19] hwriter_1.3.2             gtools_3.4.2              acepack_1.3-3.3          
[22] R.oo_1.19.0               VariantAnnotation_1.14.11 RCurl_1.95-4.6           
[25] magrittr_1.5              Formula_1.2-1             futile.logger_1.4.1      
[28] Matrix_1.2-2              munsell_0.4.2             proto_0.3-10             
[31] R.methodsS3_1.7.0         stringi_0.4-1             MASS_7.3-43              
[34] zlibbioc_1.14.0           plyr_1.8.3                gdata_2.16.1             
[37] lattice_0.20-33           GenomicFeatures_1.20.3    annotate_1.46.1          
[40] locfit_1.5-9.1            geneplotter_1.46.0        biomaRt_2.24.0           
[43] futile.options_1.0.0      XML_3.98-1.1              biovizBase_1.16.0        
[46] latticeExtra_0.6-26       lambda.r_1.1.7            gtable_0.1.2             
[49] aroma.light_2.4.0         xtable_1.7-4              survival_2.38-3          
[52] AnnotationDbi_1.30.1      cluster_2.0.3            

I have 28 pairwise comparisons. Is there a better way to generate plots for each comparison?

Thanks

Sharvari

 

ADD COMMENT
0
Entering edit mode

hi Sharvari,

"this code to get the PCA plot and sample to sample distance for each pairwise comparison"

The PCA plot and distance matrix are unsupervised analyses really, in that, the condition information is not used directly, and certainly you don't need to specify two groups to make a PCA plot of distance matrix plot. If I were a reader, I would prefer to see all the samples together, so that I can judge if B vs A or C vs A is larger, etc.

"I have 28 pairwise comparisons. Is there a better way to generate plots for each comparison?"

That's a lot of pairs to compare. You might want to enlist a local statistician to help you correct for so many tests (DESeq2 only corrects along genes for multiple comparisons, but if you perform all possible comparison you should also consider correcting for this multiplicity as well). We do not provide functionality to programmatically generate these plots, but it should be able to script using basic R.

ADD REPLY
0
Entering edit mode
@sharvari-gujja-6614
Last seen 14 months ago
United States

Oh okay. Thanks for the feedback.

Sharvari

ADD COMMENT
0
Entering edit mode
@sharvari-gujja-6614
Last seen 14 months ago
United States

Hello Michael,

I am running DESeq with non-default arguments:

DESeq(dds, betaPrior=FALSE) and then results(dds, cooksCutoff=FALSE).

My gene of interest has counts from 2 conditions:

C7

253819

C6

288780

C8

337701

C4

187680

R1

90161   

R2

63303

R4

264

R8

74752

 

The gene doesn't show up as significant with or without filtering turned off. I've run DESEQ2 excluding sample R4, but get the same results.

Could you please advice if any other options could be tweaked for this comparison C vs. R? 

 

Thanks

Sharvari

ADD COMMENT
0
Entering edit mode

I can't say why a given gene is or isn't passing an adjusted p-value threshold for an experiment. It depends on many things: in particular how the rest of the genes look, how many genes are being tested, etc. For all I know, this dataset may not conform to the distributional assumptions of DESeq2. Feel free to play around with other differential packages, it's fairly easy to switch between them once you have a count matrix.

ADD REPLY

Login before adding your answer.

Traffic: 947 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6