Hi all. I´m doing DEG analysis using kallisto and DESeq2. I was able to complete the analysis and identify differentially expressed transcripts. However, I noticed I could not extract size factors from the DESeq2 object
I imported the estimated counts from kallisto tsv files as such:
kallisto_Counts <- tximport(files, type="kallisto", txOut = TRUE, dropInfReps = TRUE)
ddskallisto <- DESeqDataSetFromTximport(kallisto_Counts, colData = sampleTable, design = ~ condition)
And computed the sizes factors using the DESeq function
dds <- DESeq(ddskallisto)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
I also tried to compute using the Size factor with the dedicated function
dds_with_sz <- estimateSizeFactors(dds)
#using 'avgTxLength' from assays(dds), correcting for library size
In both cases when I tried to access the size factor it returned NULL
sizeFactors(dds)
#NULL
I´m unsure if the results are nonsensical because the counts may have not been normalized for sequencing deepness (at least that is my understanding of the size factor operation). Using the counts and the assay function return different count matrice, but I´m still unsure if I´m interpreting this right.
Thanks in advance
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] tximport_1.14.2 ggrepel_0.8.2
[3] pheatmap_1.0.12 dplyr_0.8.5
[5] RColorBrewer_1.1-2 ggplot2_3.3.0
[7] DESeq2_1.26.0 SummarizedExperiment_1.16.1
[9] DelayedArray_0.12.3 BiocParallel_1.20.1
[11] matrixStats_0.56.0 Biobase_2.46.0
[13] GenomicRanges_1.38.0 GenomeInfoDb_1.22.1
[15] IRanges_2.20.2 S4Vectors_0.24.4
[17] BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] bit64_0.9-7 splines_3.6.1 Formula_1.2-3
[4] assertthat_0.2.1 latticeExtra_0.6-29 blob_1.2.1
[7] GenomeInfoDbData_1.2.2 pillar_1.4.3 RSQLite_2.2.0
[10] backports_1.1.6 lattice_0.20-38 glue_1.4.0
[13] digest_0.6.25 XVector_0.26.0 checkmate_2.0.0
[16] colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-17
[19] XML_3.99-0.3 pkgconfig_2.0.3 genefilter_1.68.0
[22] zlibbioc_1.32.0 purrr_0.3.4 xtable_1.8-4
[25] scales_1.1.0 jpeg_0.1-8.1 htmlTable_1.13.3
[28] tibble_3.0.1 annotate_1.64.0 ellipsis_0.3.0
[31] withr_2.2.0 nnet_7.3-12 survival_3.1-12
[34] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
[37] foreign_0.8-71 tools_3.6.1 data.table_1.12.8
[40] hms_0.5.3 lifecycle_0.2.0 stringr_1.4.0
[43] locfit_1.5-9.4 munsell_0.5.0 cluster_2.1.0
[46] AnnotationDbi_1.48.0 compiler_3.6.1 rlang_0.4.5
[49] grid_3.6.1 RCurl_1.98-1.1 rstudioapi_0.11
[52] htmlwidgets_1.5.1 bitops_1.0-6 base64enc_0.1-3
[55] gtable_0.3.0 DBI_1.1.0 R6_2.4.1
[58] gridExtra_2.3 knitr_1.28 bit_1.1-15.2
[61] Hmisc_4.4-0 readr_1.3.1 stringi_1.4.6
[64] Rcpp_1.0.4.6 geneplotter_1.64.0 vctrs_0.2.4
[67] rpart_4.1-15 acepack_1.4.1 png_0.1-7
[70] tidyselect_1.0.0 xfun_0.13
I forgot to mention in the post, but I imported the estimated gene counts from the .tsv file kallisto output (the second to last collum). In this case, DESeq2 also does the normalization step both per genes and sample? Thanks
My reply is accurate for the code you posted above.