Cannot extract size factor from DESeq2 analysis
1
0
Entering edit mode
zroger499 • 0
@zroger499-23414
Last seen 10 months ago
Portugal

Hi all. I´m doing DEG analysis using kallisto and DESeq2. I was able to complete the analysis and identify differentially expressed transcripts. However, I noticed I could not extract size factors from the DESeq2 object

I imported the estimated counts from kallisto tsv files as such:

kallisto_Counts <- tximport(files, type="kallisto", txOut = TRUE, dropInfReps = TRUE)
ddskallisto <- DESeqDataSetFromTximport(kallisto_Counts, colData = sampleTable, design = ~ condition)

And computed the sizes factors using the DESeq function

dds <- DESeq(ddskallisto)


estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

I also tried to compute using the Size factor with the dedicated function

dds_with_sz <- estimateSizeFactors(dds) 
#using 'avgTxLength' from assays(dds), correcting for library size

In both cases when I tried to access the size factor it returned NULL

sizeFactors(dds)
#NULL

I´m unsure if the results are nonsensical because the counts may have not been normalized for sequencing deepness (at least that is my understanding of the size factor operation). Using the counts and the assay function return different count matrice, but I´m still unsure if I´m interpreting this right.

Thanks in advance

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] tximport_1.14.2             ggrepel_0.8.2              
 [3] pheatmap_1.0.12             dplyr_0.8.5                
 [5] RColorBrewer_1.1-2          ggplot2_3.3.0              
 [7] DESeq2_1.26.0               SummarizedExperiment_1.16.1
 [9] DelayedArray_0.12.3         BiocParallel_1.20.1        
[11] matrixStats_0.56.0          Biobase_2.46.0             
[13] GenomicRanges_1.38.0        GenomeInfoDb_1.22.1        
[15] IRanges_2.20.2              S4Vectors_0.24.4           
[17] BiocGenerics_0.32.0        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.6.1          Formula_1.2-3         
 [4] assertthat_0.2.1       latticeExtra_0.6-29    blob_1.2.1            
 [7] GenomeInfoDbData_1.2.2 pillar_1.4.3           RSQLite_2.2.0         
[10] backports_1.1.6        lattice_0.20-38        glue_1.4.0            
[13] digest_0.6.25          XVector_0.26.0         checkmate_2.0.0       
[16] colorspace_1.4-1       htmltools_0.4.0        Matrix_1.2-17         
[19] XML_3.99-0.3           pkgconfig_2.0.3        genefilter_1.68.0     
[22] zlibbioc_1.32.0        purrr_0.3.4            xtable_1.8-4          
[25] scales_1.1.0           jpeg_0.1-8.1           htmlTable_1.13.3      
[28] tibble_3.0.1           annotate_1.64.0        ellipsis_0.3.0        
[31] withr_2.2.0            nnet_7.3-12            survival_3.1-12       
[34] magrittr_1.5           crayon_1.3.4           memoise_1.1.0         
[37] foreign_0.8-71         tools_3.6.1            data.table_1.12.8     
[40] hms_0.5.3              lifecycle_0.2.0        stringr_1.4.0         
[43] locfit_1.5-9.4         munsell_0.5.0          cluster_2.1.0         
[46] AnnotationDbi_1.48.0   compiler_3.6.1         rlang_0.4.5           
[49] grid_3.6.1             RCurl_1.98-1.1         rstudioapi_0.11       
[52] htmlwidgets_1.5.1      bitops_1.0-6           base64enc_0.1-3       
[55] gtable_0.3.0           DBI_1.1.0              R6_2.4.1              
[58] gridExtra_2.3          knitr_1.28             bit_1.1-15.2          
[61] Hmisc_4.4-0            readr_1.3.1            stringi_1.4.6         
[64] Rcpp_1.0.4.6           geneplotter_1.64.0     vctrs_0.2.4           
[67] rpart_4.1-15           acepack_1.4.1          png_0.1-7             
[70] tidyselect_1.0.0       xfun_0.13
deseq2 tximport • 2.1k views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 3 days ago
United States

When average transcript lengths are used as offsets (in the tximeta / tximport pipeline this is the case), then we don't have sj (in the 2014 paper notation, see vignette section on the DESeq2 model), but we have sij which are gene- and sample-specific normalization factors. You can access these with:

normalizationFactors(dds)

These aren't the same across genes. If you want to get a sense for the sequencing depth trend you could take the geometric or arithmetic mean of the columns.

ADD COMMENT
0
Entering edit mode

I forgot to mention in the post, but I imported the estimated gene counts from the .tsv file kallisto output (the second to last collum). In this case, DESeq2 also does the normalization step both per genes and sample? Thanks

ADD REPLY
1
Entering edit mode

My reply is accurate for the code you posted above.

ADD REPLY

Login before adding your answer.

Traffic: 797 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6