I want to explore RNA-seq data from Expression Atlas (EMBL-EBI database). The dataset I downloaded was in TPM, so I replaced the NAs with 0, multiplied the values by 1E+6 (to get integer values), and added 1 for the later log transformation:
data4PCA[is.na(data4PCA)] = 0
data4PCA = data4PCA * 1000000
data4PCAint = as.matrix.data.frame(data4PCA + 1)
I want to apply the rlog transformation (rlog) to the data, but I got stuck at creating the DESeqDataSet:
dds = DESeqDataSetFromMatrix(countData = data4PCAint,
                             colData = design4pca,
                             design = ~ System)
which returns an error:
converting counts to integer mode
Error in validObject(.Object) :
  invalid class “DESeqDataSet” object: NA values are not allowed in the count matrix
In addition: Warning message:
In mde(x) : NAs introduced by coercion to integer range
head(data4PCAint)
returns:
bone.marrow colon duodenum esophagus liver lymph.node
ENSG00000000003 600001 65000001 30000001 61000001 62000001 6000001
ENSG00000000005 1 1000001 400001 800001 1 100001
ENSG00000000419 87000001 83000001 66000001 98000001 46000001 132000001
ENSG00000000457 3000001 11000001 9000001 10000001 5000001 17000001
ENSG00000000460 6000001 3000001 3000001 3000001 2000001 7000001
ENSG00000000938 308000001 6000001 6000001 12000001 6000001 63000001
rectum saliva.secreting.gland small.intestine spleen
ENSG00000000003 67000001 53000001 22000001 17000001
ENSG00000000005 1000001 400001 400001 500001
ENSG00000000419 83000001 35000001 78000001 97000001
ENSG00000000457 13000001 6000001 9000001 14000001
ENSG00000000460 5000001 1000001 3000001 5000001
ENSG00000000938 7000001 3000001 10000001 180000001
stomach tonsil vermiform.appendix
ENSG00000000003 22000001 14000001 11000001
ENSG00000000005 1 300001 2000001
ENSG00000000419 63000001 114000001 100000001
ENSG00000000457 10000001 14000001 13000001
ENSG00000000460 2000001 10000001 8000001
ENSG00000000938 6000001 36000001 172000001
So it looks like the data is OK?!
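One thing that may be worth checking here: head() only shows the first six genes, while the warning mentions coercion to the integer range. R integers are 32-bit, so any value above .Machine$integer.max (2147483647) becomes NA on conversion, and a TPM above roughly 2147 multiplied by 1E+6 would exceed that limit. Below is a minimal sketch of such a check on a made-up toy matrix (the values and the names toy, geneA, geneB, s1 and s2 are hypothetical, not taken from the dataset):

```r
# Toy stand-in for data4PCAint (hypothetical values, not the real data):
# one entry is deliberately larger than R's 32-bit integer limit.
toy <- matrix(c(600001, 1, 87000001, 2500000001),
              nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

int_max <- .Machine$integer.max   # largest value an R integer can hold

# Row/column positions of cells that cannot be represented as integers
too_big <- which(toy > int_max, arr.ind = TRUE)
print(too_big)

# Coercing such a matrix to integer turns the oversized cell into NA
# (the warning is suppressed here only to keep the output short)
toy_int <- suppressWarnings(as.integer(toy))
print(anyNA(toy_int))
```

On the real matrix, the equivalent call would be which(data4PCAint > .Machine$integer.max, arr.ind = TRUE); range(data4PCAint) would also show whether the maximum is beyond the limit.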
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] gplots_3.0.1.1 RColorBrewer_1.1-2
[3] reshape2_1.4.3 pheatmap_1.0.12
[5] forcats_0.4.0 stringr_1.4.0
[7] dplyr_0.8.3 purrr_0.3.2
[9] readr_1.3.1 tidyr_0.8.3
[11] tibble_2.1.3 ggplot2_3.2.0
[13] tidyverse_1.2.1 DESeq2_1.24.0
[15] SummarizedExperiment_1.14.0 DelayedArray_0.10.0
[17] BiocParallel_1.18.0 matrixStats_0.54.0
[19] Biobase_2.44.0 GenomicRanges_1.36.0
[21] GenomeInfoDb_1.20.0 IRanges_2.18.0
[23] S4Vectors_0.22.0 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] nlme_3.1-140 bitops_1.0-6 lubridate_1.7.4
[4] bit64_0.9-7 httr_1.4.0 tools_3.6.0
[7] backports_1.1.4 R6_2.4.0 KernSmooth_2.23-15
[10] rpart_4.1-15 Hmisc_4.2-0 DBI_1.0.0
[13] lazyeval_0.2.2 colorspace_1.4-1 nnet_7.3-12
[16] withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3
[19] bit_1.1-14 compiler_3.6.0 cli_1.1.0
[22] rvest_0.3.4 htmlTable_1.13.1 xml2_1.2.0
[25] labeling_0.3 caTools_1.17.1.2 scales_1.0.0
[28] checkmate_1.9.4 genefilter_1.66.0 digest_0.6.20
[31] foreign_0.8-71 XVector_0.24.0 base64enc_0.1-3
[34] pkgconfig_2.0.2 htmltools_0.3.6 htmlwidgets_1.3
[37] rlang_0.4.0 readxl_1.3.1 rstudioapi_0.10
[40] RSQLite_2.1.1 generics_0.0.2 jsonlite_1.6
[43] gtools_3.8.1 acepack_1.4.1 RCurl_1.95-4.12
[46] magrittr_1.5 GenomeInfoDbData_1.2.1 Formula_1.2-3
[49] Matrix_1.2-17 Rcpp_1.0.1 munsell_0.5.0
[52] stringi_1.4.3 zlibbioc_1.30.0 plyr_1.8.4
[55] grid_3.6.0 blob_1.2.0 gdata_2.18.0
[58] crayon_1.3.4 lattice_0.20-38 haven_2.1.1
[61] splines_3.6.0 annotate_1.62.0 hms_0.5.0
[64] locfit_1.5-9.1 zeallot_0.1.0 knitr_1.23
[67] pillar_1.4.2 geneplotter_1.62.0 XML_3.98-1.20
[70] glue_1.3.1 latticeExtra_0.6-28 data.table_1.12.2
[73] modelr_0.1.4 vctrs_0.2.0 cellranger_1.1.0
[76] gtable_0.3.0 assertthat_0.2.1 xfun_0.8
[79] xtable_1.8-4 broom_0.5.2 survival_2.44-1.1
[82] AnnotationDbi_1.46.0 memoise_1.1.0 cluster_2.1.0
I found this example to test with:
...and it works fine?! But my own data does not get through.