DESeq2 input: normalized counts that were processed further
@karenchait841-12675

Hello,

I have data that was processed further after normalization with DESeq2. The samples were contaminated with melanoma cells, so for each sample we subtracted that sample's percentage of contamination from the count of every gene.

What is the best way to continue the differential expression (DE) analysis with DESeq2 using these processed values rather than the raw counts?

1- Round the values and use them as input to DESeq2.

2- Convert the values back to (approximately) the raw counts using the size factors.

3- Other options?
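Option 2 rests on how DESeq2 defines normalized counts: each sample's raw counts are divided by that sample's size factor, so multiplying back by the size factor recovers the raw counts. The sketch below shows that arithmetic in Python/NumPy on a toy matrix; the counts, size factors and contamination fractions are hypothetical values for illustration only, not DESeq2 output. It also shows that if the contamination step was applied to the normalized values first, the reversal returns raw × (1 − p) per sample, which is generally not an integer.

```python
import numpy as np

# Toy genes x samples matrix of raw integer counts (illustrative only).
raw = np.array([[10, 20, 7],
                [3, 8, 0],
                [150, 90, 210]], dtype=float)
# Hypothetical size factors, one per sample (column).
size_factors = np.array([0.8, 1.25, 1.0])

# DESeq2-style normalized counts: each column divided by its size factor.
normalized = raw / size_factors

# Option 2: multiply each column back by its size factor and round.
recovered = np.rint(normalized * size_factors).astype(int)
print(recovered)

# If each sample's contamination fraction was subtracted from the normalized
# values first, reversing gives raw * (1 - p) per sample, usually non-integer.
contamination = np.array([0.10, 0.25, 0.05])  # hypothetical fractions
reversed_adjusted = (normalized * (1 - contamination)) * size_factors
print(reversed_adjusted)
```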


Thank you for your help,

Karen

Session info:

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255
[3] LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C
[5] LC_TIME=Hebrew_Israel.1255

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
 [1] BiocInstaller_1.24.0       ggplot2_2.2.1              gplots_3.0.1
 [4] RColorBrewer_1.1-2         DESeq2_1.14.1              SummarizedExperiment_1.4.0
 [7] Biobase_2.34.0             GenomicRanges_1.26.4       GenomeInfoDb_1.10.3
[10] IRanges_2.8.2              S4Vectors_0.12.2           BiocGenerics_0.20.0

loaded via a namespace (and not attached):
 [1] genefilter_1.56.0    gtools_3.5.0         locfit_1.5-9.1
 [4] splines_3.3.2        lattice_0.20-34      colorspace_1.3-2
 [7] htmltools_0.3.5      base64enc_0.1-3      survival_2.41-2
[10] XML_3.98-1.5         foreign_0.8-67       DBI_0.6
[13] BiocParallel_1.8.1   plyr_1.8.4           stringr_1.2.0
[16] zlibbioc_1.20.0      munsell_0.4.3        gtable_0.2.0
[19] caTools_1.17.1       htmlwidgets_0.8      memoise_1.0.0
[22] labeling_0.3         latticeExtra_0.6-28  knitr_1.15.1
[25] geneplotter_1.52.0   AnnotationDbi_1.36.2 htmlTable_1.9
[28] Rcpp_0.12.9          KernSmooth_2.23-15   acepack_1.4.1
[31] xtable_1.8-2         scales_0.4.1         backports_1.0.5
[34] checkmate_1.8.2      gdata_2.17.0         Hmisc_4.0-2
[37] annotate_1.52.1      XVector_0.14.1       gridExtra_2.2.1
[40] digest_0.6.12        stringi_1.1.2        grid_3.3.2
[43] tools_3.3.2          bitops_1.0-6         magrittr_1.5
[46] lazyeval_0.2.0       RCurl_1.95-4.8       tibble_1.2
[49] RSQLite_1.1-2        Formula_1.2-1        cluster_2.0.5
[52] Matrix_1.2-7.1       data.table_1.10.4    assertthat_0.1
[55] rpart_4.1-10         nnet_7.3-12

@ryan-c-thompson-5618
Scripps Research, La Jolla, CA

I'm not sure what the point of scaling all the counts in a sample is. If you subtract the same percentage from every gene's count in a given sample, it will have no net effect on the fold-change calculations, because the size factors will undo this scaling. The only effect will be to make the dispersion estimation less accurate, because the adjusted values no longer have the mean-variance relationship of raw counts that the negative binomial model assumes, and this reduces your statistical power. You should definitely analyze the original raw counts, not any transformation of them.

If you want to control for the effect of contamination, you should include it as a covariate in your model. I'm not sure exactly what the right way to do this is, since the percent contamination should be linearly related to gene expression, while the negative binomial GLM coefficients are fit on a log scale. You could use the ns function from the splines package to fit a non-linear function of contamination percentage, or you could use the sva package to estimate the confounding effect on the proper log scale from the data itself. Perhaps others will weigh in on the best way to incorporate the contamination effect into your model, or perhaps you have a statistician in your lab who can advise you.
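The claim that size factors undo a per-sample percentage subtraction can be checked numerically. DESeq2's default size factors use the median-of-ratios method. The sketch below is a minimal re-implementation of that estimator in Python/NumPy, not DESeq2's own code. It runs on simulated negative binomial counts with hypothetical contamination fractions. Subtracting p% from every gene in a sample multiplies that sample's column by (1 − p). After normalization, the adjusted and raw versions differ only by one global constant, so per-gene log2 fold changes are identical. This illustrates only the fold-change point. It says nothing about the dispersion point, which depends on the data being raw counts.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix.

    Sketch of the DESeq-style estimator: each sample's factor is the median,
    over genes with no zero counts, of count / (gene's geometric mean).
    """
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)  # -inf for genes with any zero
    usable = np.isfinite(log_geo_means)
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


rng = np.random.default_rng(0)
# Simulated raw counts: 1000 genes x 4 samples (illustrative data only).
raw = rng.negative_binomial(n=5, p=0.05, size=(1000, 4)).astype(float)

# Hypothetical per-sample contamination fractions; subtracting that
# percentage from every gene multiplies the sample's column by (1 - p).
contamination = np.array([0.10, 0.25, 0.05, 0.40])
adjusted = raw * (1 - contamination)

norm_raw = raw / median_of_ratios_size_factors(raw)
norm_adjusted = adjusted / median_of_ratios_size_factors(adjusted)

# After normalization the two versions differ by a single global constant...
nonzero = raw > 0
ratio = norm_adjusted[nonzero] / norm_raw[nonzero]
print(ratio.min(), ratio.max())

# ...so per-gene log2 fold changes (sample 4 vs sample 1) are identical.
both = nonzero[:, 0] & nonzero[:, 3]
lfc_raw = np.log2(norm_raw[both, 3] / norm_raw[both, 0])
lfc_adjusted = np.log2(norm_adjusted[both, 3] / norm_adjusted[both, 0])
print(np.allclose(lfc_raw, lfc_adjusted))  # True
```

The constant works out to the geometric mean of the (1 − p) factors. Every size factor absorbs its own sample's scaling relative to that mean, which is why the percentage subtraction cannot change any fold change.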
