Diff. Expression with DESeq2 (Two conditions)
1
0
Entering edit mode
coyoung ▴ 10
@coyoung-17963
Last seen 4.8 years ago

Hello, 

New to bioconductor & DESeq2. I have trying to look into the differential expression of primary tumor vs matched normal tumors. My question is that once I run the analysis & print a MA-plot I seem to have way to many differentially expressed genes. Is there any command of function that I may be missing in my analysis?

dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ condition)
ddsf <- estimateSizeFactors(dds)
ddsf$condition <- relevel(dds$condition,ref='NT')
dds <- DESeq(dds)
res <- results(dds, contrast=c('condition','MPT', 'NT'))
plotMA(dds,ylim=c(-5,5),main="Ma-Plot")

 

deseq2 • 1.8k views
ADD COMMENT
0
Entering edit mode

EDIT: Just a few of the samples below, not the entire dataset

as.data.frame(colData(dds))

                            condition
MPT_92d91508.7894.4d4d.b51e.748c367835ec_gdc_realn_rehead.bam       MPT
MPT_97213ae3.3d55.4d5c.9ab7.aaedb58adf91_gdc_realn_rehead.bam       MPT
MPT_97ccb4e0.a0b0.4541.88f9.af010324ab7e_gdc_realn_rehead.bam       MPT
MPT_9b96860f.7717.46b3.8ab7.2f7d9796c1d4_gdc_realn_rehead.bam       MPT
MPT_9f809399.b8bf.4a21.a703.14464e6594eb_gdc_realn_rehead.bam       MPT
MPT_a53b56fb.04c3.402c.9236.0bdbd80da2b8_gdc_realn_rehead.bam       MPT
MPT_a5e9d1ee.2e45.4f48.b50d.11f118fc25a2_gdc_realn_rehead.bam       MPT
MPT_aa44f578.7400.4ef8.b1b0.8c369bfda870_gdc_realn_rehead.bam       MPT
MPT_ac545a99.b3ab.4917.836b.031903d738d3_gdc_realn_rehead.bam       MPT
MPT_af2719bb.3eb8.4d96.add9.3f44ca91e6ae_gdc_realn_rehead.bam       MPT
MPT_af288cac.29cd.4451.8d3f.6a32fc3e6ab5_gdc_realn_rehead.bam       MPT
MPT_b79a9278.cdfb.429b.8439.f78b61b2a463_gdc_realn_rehead.bam       MPT
MPT_ba3418e3.a846.47e1.8eb4.e71f17277dad_gdc_realn_rehead.bam       MPT
MPT_bcc8e80a.c06c.4a1a.b308.9957fc3ca5d4_gdc_realn_rehead.bam       MPT
MPT_c0994e02.1842.4736.a010.78275693e2ab_gdc_realn_rehead.bam       MPT
MPT_c6173474.4555.47b3.8e79.08b77529d305_gdc_realn_rehead.bam       MPT
MPT_cc3ad632.c9d1.4d39.a88e.d96e16546502_gdc_realn_rehead.bam       MPT
MPT_d04d83a4.fbe1.4b6a.a48d.2950394a67bb_gdc_realn_rehead.bam       MPT
MPT_d3467666.fc2e.41f7.95d2.215c7e36c715_gdc_realn_rehead.bam       MPT
MPT_d42a6fe0.8988.43ec.93fb.eec85f3dfd62_gdc_realn_rehead.bam       MPT
MPT_d8183508.e323.4a1f.ac65.4b89c848247f_gdc_realn_rehead.bam       MPT
MPT_d908f378.a7f0.4f06.8e44.1da223489b11_gdc_realn_rehead.bam       MPT
MPT_de162e91.f450.4632.8779.d24839cd7769_gdc_realn_rehead.bam       MPT
MPT_de3be65f.e5f2.4cd1.8617.c8ca750d26c6_gdc_realn_rehead.bam       MPT
NT_1b9642cd.d963.4251.a929.b499516a3f82_gdc_realn_rehead.bam         NT
NT_218497eb.3c64.4e96.bd70.c2cf3d6d9877_gdc_realn_rehead.bam         NT
NT_29d635f0.3db9.4d3a.bae3.7e16de3b6bea_gdc_realn_rehead.bam         NT
NT_32b09cce.1d6a.44f1.8699.7cdf8d6e0fb3_gdc_realn_rehead.bam         NT
NT_38d095ea.9002.42a5.a556.1c751b0405e9_gdc_realn_rehead.bam         NT
NT_3f574e31.db95.4199.826c.c2f5b1e4332b_gdc_realn_rehead.bam         NT
NT_426159a8.d17a.4770.91ac.578ecdf46b8d_gdc_realn_rehead.bam         NT
NT_4340a3f4.4b56.432a.949a.655b5f74c2f7_gdc_realn_rehead.bam         NT
NT_4f4a304b.0e65.467e.b07e.155c528ceb15_gdc_realn_rehead.bam         NT
NT_53d5d566.f294.448b.ac48.6ffa2ba333b0_gdc_realn_rehead.bam         NT
NT_56885fdf.ae09.4478.be99.81bd282715eb_gdc_realn_rehead.bam         NT
NT_5b8d2548.b32e.4abb.9244.a618b1b2852e_gdc_realn_rehead.bam         NT
NT_67e0bb40.5006.4c03.839a.8bf3f9383b71_gdc_realn_rehead.bam         NT
NT_692d70a3.ee5a.4fb5.a99f.17cd77bb5274_gdc_realn_rehead.bam         NT
NT_6b81ca60.c91e.447d.8658.d3797422e4c7_gdc_realn_rehead.bam         NT
NT_6c329a5f.3866.4cc1.9d5e.a43413d188b5_gdc_realn_rehead.bam         NT
NT_71f1c38b.f8a7.444c.bd95.4ba50b20b658_gdc_realn_rehead.bam         NT
NT_755cf943.64da.48f7.9c92.ebda57d9253a_gdc_realn_rehead.bam         NT
NT_766f3ef2.d1a3.46a8.9881.966e8e2b9b02_gdc_realn_rehead.bam         NT
NT_7bf2426a.6731.4065.a201.8bce925a16f6_gdc_realn_rehead.bam         NT
NT_7f274fba.b331.4f6d.b257.66767f383ca8_gdc_realn_rehead.bam         NT
 

 

 

 

 

ADD REPLY
1
Entering edit mode
@mikelove
Last seen 1 day ago
United States

You may be interested in the lfcThreshold argument of results (take a look at ?results and at the DESeq2 paper).

I noticed you aren’t including the matching information in the design. This would look like ~patient + condition.

ADD COMMENT
0
Entering edit mode

Thank you for the info. What do you mean by "matching information"? Also I was able to show my MA-PLOT

 

ADD REPLY
0
Entering edit mode

You said you had tumor normal matched samples, so it sounded like there would be a covariate that identified the patient/donor.

ADD REPLY
0
Entering edit mode

No covariate. Would I need one for proper analysis?

I only have one type of factor in my colData object. The patient identifiers which are my row names & the condition: if they are MPT (matched primary) or NT (normal).  

Is there anymore information I need to provide?

ADD REPLY
0
Entering edit mode

I have recommended already to use ~patient + condition. This involves adding another factor patient to colData(dds). This is discussed in the DESeq2 vignette, that you can add additional covariates to account for this variation, while your final results table will focus on the condition differences. If this is confusing what the design means when you use ~patient + condition, I'd strongly recommend to discuss this model with a local statistician at your institute.

ADD REPLY
0
Entering edit mode

 

Thank you. I have added your suggestions. Here is my code: 

dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ condition)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds <- estimateSizeFactors(dds)
dds$condition <- relevel(dds$condition, ref=’NT’)
dds <- DESeq(dds)

edit to DESeqDataSetFromMatrix function

dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ patient + condition)

but when running the analysis I get the message:

63 rows did not converge in beta, labelled in mocks(object)$betaConv. Use larger maxi argument with nbinoWaldTest

My analysis also goes from taking ~15 mins to about 3 hours.

I did try the command:

dds <- nbinomWaldTest(dds, maxit=500)

Here is the error:

64 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

And this what DESeq2 ends with:

dds <- nbinomWaldTest(dds, maxit=500)
found results columns, replacing these
64 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Session Info:

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] DESeq2_1.20.0               SummarizedExperiment_1.10.1
 [3] DelayedArray_0.6.6          BiocParallel_1.14.2        
 [5] matrixStats_0.54.0          Biobase_2.40.0             
 [7] GenomicRanges_1.32.7        GenomeInfoDb_1.16.0        
 [9] IRanges_2.14.12             S4Vectors_0.18.3           
[11] BiocGenerics_0.26.0        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.5.0          Formula_1.2-3         
 [4] assertthat_0.2.0       latticeExtra_0.6-28    blob_1.1.1            
 [7] GenomeInfoDbData_1.1.0 pillar_1.3.0           RSQLite_2.1.1         
[10] backports_1.1.2        lattice_0.20-35        glue_1.3.0            
[13] digest_0.6.17          RColorBrewer_1.1-2     XVector_0.20.0        
[16] checkmate_1.8.5        colorspace_1.3-2       htmltools_0.3.6       
[19] Matrix_1.2-14          plyr_1.8.4             XML_3.98-1.16         
[22] pkgconfig_2.0.2        genefilter_1.62.0      zlibbioc_1.26.0       
[25] purrr_0.2.5            xtable_1.8-3           scales_1.0.0          
[28] htmlTable_1.12         tibble_1.4.2           annotate_1.58.0       
[31] ggplot2_3.0.0          nnet_7.3-12            lazyeval_0.2.1        
[34] survival_2.42-6        magrittr_1.5           crayon_1.3.4          
[37] memoise_1.1.0          foreign_0.8-71         tools_3.5.0           
[40] data.table_1.11.4      stringr_1.3.1          locfit_1.5-9.1        
[43] munsell_0.5.0          cluster_2.0.7-1        AnnotationDbi_1.42.1  
[46] bindrcpp_0.2.2         compiler_3.5.0         rlang_0.2.2           
[49] grid_3.5.0             RCurl_1.95-4.11        rstudioapi_0.7        
[52] htmlwidgets_1.3        bitops_1.0-6           base64enc_0.1-3       
[55] gtable_0.2.0           DBI_1.0.0              R6_2.2.2              
[58] gridExtra_2.3          knitr_1.20             dplyr_0.7.6           
[61] bit_1.1-14             bindr_0.1.1            Hmisc_4.1-1           
[64] stringi_1.2.4          Rcpp_0.12.19           geneplotter_1.58.0    
[67] rpart_4.1-13           acepack_1.4.1          tidyselect_0.2.4
ADD REPLY
0
Entering edit mode

And here is the first 40 of 125 in colData:

                                                              X condition
1  MPT_0ea510ed.4e24.4c5b.ab49.cfef85aedeab_gdc_realn_rehead.bam       MPT
2  MPT_10e0e7f3.3fc1.43f7.b7fc.7bb6a375d060_gdc_realn_rehead.bam       MPT
3  MPT_1132c05d.00f1.4c3b.a0ed.06a735c82401_gdc_realn_rehead.bam       MPT
4  MPT_1228fb38.e5a9.4521.ad83.a67977908527_gdc_realn_rehead.bam       MPT
5  MPT_233446a9.2cbf.4395.9fb4.0660276d9885_gdc_realn_rehead.bam       MPT
6  MPT_269f6d2a.87dc.4fd8.af61.2ab4e6e06728_gdc_realn_rehead.bam       MPT
7  MPT_284cdeba.e8fb.4e5d.960a.dc45815fd3bd_gdc_realn_rehead.bam       MPT
8  MPT_2950c941.f93f.4948.97f2.74cf6d5161de_gdc_realn_rehead.bam       MPT
9  MPT_298f6a56.5a1d.4b35.bb8a.e670ea6a0ee9_gdc_realn_rehead.bam       MPT
10 MPT_2adbf2eb.6c1c.4624.91ce.86765f2219c5_gdc_realn_rehead.bam       MPT
11 MPT_2babb909.4d6a.4873.9010.0481cd9d7f67_gdc_realn_rehead.bam       MPT
12 MPT_2c95ea97.62f0.4cdf.b5c2.e52f1dba1df7_gdc_realn_rehead.bam       MPT
13 MPT_2d5e614b.ab16.4ff9.bb9a.0c1004890a77_gdc_realn_rehead.bam       MPT
14 MPT_3efc7114.c5a2.4bb6.b6bf.02fafc3573f6_gdc_realn_rehead.bam       MPT
15 MPT_3f53a71c.e0a7.442f.92da.4e74f5c84581_gdc_realn_rehead.bam       MPT
16 MPT_41481131.9b42.47bb.9f05.879007ad423c_gdc_realn_rehead.bam       MPT
17 MPT_4585dd6a.7829.4919.8b36.25d94983144c_gdc_realn_rehead.bam       MPT
18 MPT_495752ac.1048.48c3.9791.623dbd116a1a_gdc_realn_rehead.bam       MPT
19 MPT_4d8bf6c2.5bfa.4ac3.945b.52522a7a0cf9_gdc_realn_rehead.bam       MPT
20 MPT_5735edb0.5df0.4425.b5db.614e94f1e2db_gdc_realn_rehead.bam       MPT
21 MPT_5a66bb8d.7df6.4655.806d.1451370d27a9_gdc_realn_rehead.bam       MPT
22 MPT_5ace35cc.50a9.49ec.84fe.dc6476b51b70_gdc_realn_rehead.bam       MPT
23 MPT_61274c16.dddd.4df9.815a.52bc8fdc2462_gdc_realn_rehead.bam       MPT
24 MPT_64e21887.298b.4768.8946.cf8fbfae611c_gdc_realn_rehead.bam       MPT
25 MPT_67dd9f18.b0ba.48ef.a01c.cefe2cee9263_gdc_realn_rehead.bam       MPT
26 MPT_6a30bada.db34.45d8.af22.2a185cfc120b_gdc_realn_rehead.bam       MPT
27 MPT_6cb90565.9ca5.48ee.a258.40ea601c7380_gdc_realn_rehead.bam       MPT
28 MPT_6e3dbbf4.ea7c.438f.a944.55adb7c291d5_gdc_realn_rehead.bam       MPT
29 MPT_6fc2c0be.53ca.418f.af9b.0a3a20640519_gdc_realn_rehead.bam       MPT
30 MPT_74be9457.39ce.4029.9076.53b0e57d2067_gdc_realn_rehead.bam       MPT
31 MPT_7505607d.48c9.43ab.bce8.283ff7d6ce6c_gdc_realn_rehead.bam       MPT
32 MPT_76a97a0e.a66b.4439.9b16.ea841f882758_gdc_realn_rehead.bam       MPT
33 MPT_79c5c92d.4f9b.4495.b9ce.957581959297_gdc_realn_rehead.bam       MPT
34 MPT_80c84645.5441.4a13.9b44.c5d6b4909d93_gdc_realn_rehead.bam       MPT
35 MPT_8a036b15.e79e.4e26.8700.cc6ff96ae01f_gdc_realn_rehead.bam       MPT
36 MPT_8f2741bc.e876.4acb.ab86.b90e88330644_gdc_realn_rehead.bam       MPT
37 MPT_92d91508.7894.4d4d.b51e.748c367835ec_gdc_realn_rehead.bam       MPT
38 MPT_97213ae3.3d55.4d5c.9ab7.aaedb58adf91_gdc_realn_rehead.bam       MPT
39 MPT_97ccb4e0.a0b0.4541.88f9.af010324ab7e_gdc_realn_rehead.bam       MPT
40 MPT_9b96860f.7717.46b3.8ab7.2f7d9796c1d4_gdc_realn_rehead.bam       MPT
        patient
1  TCGA.38.4626
2  TCGA.49.6761
3  TCGA.55.6986
4  TCGA.49.4490
5  TCGA.44.6146
6  TCGA.55.6981
7  TCGA.55.6972
8  TCGA.55.6971
9  TCGA.49.4512
10 TCGA.91.6828
11 TCGA.44.6777
12 TCGA.73.4676
13 TCGA.55.6968
14 TCGA.44.2665
15 TCGA.91.6829
16 TCGA.55.6970
17 TCGA.44.6146
18 TCGA.50.6595
19 TCGA.44.2668
20 TCGA.50.5933
21 TCGA.91.6847
22 TCGA.44.2657
23 TCGA.44.2665
24 TCGA.50.5936
25 TCGA.44.2661
26 TCGA.44.2662
27 TCGA.44.6147
28 TCGA.55.6983
29 TCGA.44.6147
30 TCGA.44.2662
31 TCGA.55.6979
32 TCGA.55.6982
33 TCGA.44.6778
34 TCGA.44.6145
35 TCGA.38.4627
36 TCGA.50.5939
37 TCGA.55.6985
38 TCGA.44.3398
39 TCGA.91.6849
40 TCGA.44.2668
ADD REPLY
0
Entering edit mode

The rows not converging is not always a problem. Rather than changing maxit you can use the following filter, where n is a minimal number of samples, for example 5 samples having a count of 10 or more.

keep <- rowSums(counts(dds) >= 10) >= n
ADD REPLY
0
Entering edit mode

Thank you. I am running this code as of now.

> dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ patient + condition)
> keep <- rowSums(counts(dds) >= 10) >= 5
> dds <- dds[keep,]
> dds <- estimateSizeFactors(dds)
> dds <- estimateDispersions(dds)

Print out:

gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates

Should I continue to use the nbinomWaldTest function. Or would the general deseq2() function work best?

dds <- nbinomWaldTest(dds, maxit=500) 

 

ADD REPLY
0
Entering edit mode

You can use this code. The message just says that local is substituted if you read it.

keep <- rowSums(counts(dds) >= 10) >= 5
dds <- dds[keep,]
dds <- DESeq(dds, fitType="local")
ADD REPLY
0
Entering edit mode

Thank you for your help. Is there any reasoning for choosing to only keep rows with at least 10 counts total & in 5 or more samples? Is there any other literature I can read to support these numbers or are they more at the discretion or the person funning the analysis. 

ADD REPLY
0
Entering edit mode

Final code & output

cts <- read.csv(file='/Users/Corey/Desktop/DESeq2/DESeq2_Output/GeneName.csv')
colData <- read.csv(file='/Users/Corey/Desktop/DESeq2/DESeq2_Output/colData_try.csv')
rownames(cts) <- cts$Geneid
cts$Geneid <- NULL
dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ condition)
dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ patient + condition)
keep <- rowSums(counts(dds) >= 10) >= 5
dds <- dds[keep,]
dds$condition <- relevel(dds$condition, ref=’NT’)
Error: unexpected input in "dds$condition <- relevel(dds$condition, ref=‚"
dds$condition <- relevel(dds$condition, ref='NT')
dds <- DESeq(dds, fitType='local')
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
57 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
ADD REPLY
0
Entering edit mode

Everything is up to your discretion as the analyst.

ADD REPLY

Login before adding your answer.

Traffic: 919 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6