DESeqDataSetFromTximport with kallisto abundances.
2
3
Entering edit mode
ben ▴ 30
@ben-13235
Last seen 7.0 years ago

Hi,

I'm trying to use DESeq2 to analyse a dataset where I've generated abundances using Kallisto following the vignette "Analyzing RNA-seq data with DESeq2".

I'm getting a consistent error with DESeqDataSetFromTximport:

> dds <- DESeqDataSetFromTximport(txi.kallisto, samples, ~condition)
Error in validObject(.Object) :
  invalid class “SummarizedExperiment” object: 1: invalid object for slot "NAMES" in class "SummarizedExperiment": got class "array", should be or extend class "character_OR_NULL"
invalid class “SummarizedExperiment” object: 2: 'names(x)' must be NULL or a character vector with no attributes

The same error occurs if you just take counts:

counts <- as.matrix(txi.kallisto$counts)

and then use DESeqDataSetFromMatrix.

Gone through this systematically and found that there's an issue with the rownames of the counts table within the tximport list, it looks like it is setting them as arrays?

> class(rownames(counts))
[1] "array"

Which then if I set them as a character it seems to accept the matrix (if I round the counts as in the fromTximport function). I can now fix the txi list by doing this:

> rownames(txi.kallisto$counts) <- as.character(rownames(txi.kallisto$counts))

> dds <- DESeqDataSetFromTximport(txi.kallisto, samples, ~condition)
using counts and average transcript lengths from tximport

This now appears to work. I think that this must just be a slight bug from one part of the workflow?

Session info below, I updated to the dev version of DESeq2 to see if that fixed it but the same error was in the main release.

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.17.2              SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2        
 [5] Biobase_2.36.2             GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2            
 [9] S4Vectors_0.14.3           BiocGenerics_0.22.0        tximport_1.4.0            

loaded via a namespace (and not attached):
 [1] genefilter_1.58.1       locfit_1.5-9.1          rhdf5_2.20.0            splines_3.4.0           lattice_0.20-35        
 [6] colorspace_1.3-2        htmltools_0.3.6         base64enc_0.1-3         survival_2.41-3         XML_3.98-1.7           
[11] rlang_0.1.1             DBI_0.6-1               foreign_0.8-68          BiocParallel_1.10.1     RColorBrewer_1.1-2     
[16] GenomeInfoDbData_0.99.0 plyr_1.8.4              stringr_1.2.0           zlibbioc_1.22.0         munsell_0.4.3          
[21] gtable_0.2.0            htmlwidgets_0.8         memoise_1.1.0           latticeExtra_0.6-28     knitr_1.16             
[26] geneplotter_1.54.0      AnnotationDbi_1.38.1    htmlTable_1.9           Rcpp_0.12.11            acepack_1.4.1          
[31] xtable_1.8-2            scales_0.4.1            backports_1.1.0         checkmate_1.8.2         Hmisc_4.0-3            
[36] annotate_1.54.0         XVector_0.16.0          gridExtra_2.2.1         ggplot2_2.2.1           digest_0.6.12          
[41] stringi_1.1.5           grid_3.4.0              tools_3.4.0             bitops_1.0-6            magrittr_1.5           
[46] RSQLite_1.1-2           lazyeval_0.2.0          RCurl_1.95-4.8          tibble_1.3.3            Formula_1.2-1          
[51] cluster_2.0.6           Matrix_1.2-10           data.table_1.10.4       rpart_4.1-11            nnet_7.3-12            
[56] compiler_3.4.0        

 

 

 

deseq2 tximport • 2.6k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 4 hours ago
United States
What do you get with biocValid()? It's a function from the BiocInstaller package. It looks like you have some kind of hybrid installation?
ADD COMMENT
0
Entering edit mode
> biocValid()

* sessionInfo()

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8  
[5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C               
[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocInstaller_1.26.0       openxlsx_4.0.17            readxl_1.0.0               mygene_1.12.0           
[5] GenomicFeatures_1.28.3     AnnotationDbi_1.38.1       RColorBrewer_1.1-2         pheatmap_1.0.8          
[9] DESeq2_1.17.2              SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2      
[13] Biobase_2.36.2             GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2          
[17] S4Vectors_0.14.3           BiocGenerics_0.22.0        tximport_1.4.0            

loaded via a namespace (and not attached):
[1] httr_1.2.1               jsonlite_1.5             splines_3.4.0            gsubfn_0.6-6          
[5] Formula_1.2-1            latticeExtra_0.6-28      cellranger_1.1.0         GenomeInfoDbData_0.99.0
[9] Rsamtools_1.28.0         RSQLite_1.1-2            backports_1.1.0          lattice_0.20-35       
[13] chron_2.3-50             digest_0.6.12            XVector_0.16.0           checkmate_1.8.2       
[17] colorspace_1.3-2         htmltools_0.3.6          Matrix_1.2-10            plyr_1.8.4            
[21] XML_3.98-1.7             biomaRt_2.32.1           genefilter_1.58.1        zlibbioc_1.22.0       
[25] xtable_1.8-2             scales_0.4.1             BiocParallel_1.10.1      htmlTable_1.9         
[29] tibble_1.3.3             annotate_1.54.0          ggplot2_2.2.1            sqldf_0.4-10          
[33] nnet_7.3-12              lazyeval_0.2.0           proto_1.0.0              survival_2.41-3       
[37] magrittr_1.5             memoise_1.1.0            foreign_0.8-68           tools_3.4.0           
[41] data.table_1.10.4        stringr_1.2.0            munsell_0.4.3            locfit_1.5-9.1        
[45] cluster_2.0.6            Biostrings_2.44.1        compiler_3.4.0           rlang_0.1.1           
[49] rhdf5_2.20.0             grid_3.4.0               RCurl_1.95-4.8           htmlwidgets_0.8       
[53] bitops_1.0-6             base64enc_0.1-3          labeling_0.3             gtable_0.2.0          
[57] curl_2.6                 DBI_0.6-1                R6_2.2.1                 GenomicAlignments_1.12.1
[61] gridExtra_2.2.1          knitr_1.16               rtracklayer_1.36.3       Hmisc_4.0-3           
[65] stringi_1.1.5            Rcpp_0.12.11             geneplotter_1.54.0       rpart_4.1-11          
[69] acepack_1.4.1           

* Out-of-date packages
      Package LibPath                                       Installed Built   ReposVer Repository                          
ggsci "ggsci" "/home/ben/R/x86_64-pc-linux-gnu-library/3.4" "2.4"     "3.4.0" "2.7"    "https://cran.rstudio.com/src/contrib"
rJava "rJava" "/usr/lib/R/site-library"                     "0.9-6"   "3.0.2" "0.9-8"  "https://cran.rstudio.com/src/contrib"

update with biocLite()

* Packages too new for Bioconductor version '3.5'

       Version  LibPath                                    
DESeq2 "1.17.2" "/home/ben/R/x86_64-pc-linux-gnu-library/3.4"

downgrade with biocLite("DESeq2")

Error: 2 package(s) out of date; 1 package(s) too new
ADD REPLY
0
Entering edit mode

So the first step would be to downgrade DESeq2, so that biocValid() is happy. Then report back if you still have a problem. Mixing different versions of Bioconductor packages is a recipe for errors. Only using biocLite() for installation is the solution.

ADD REPLY
0
Entering edit mode
Ming Wang • 0
@ming-wang-8281
Last seen 3.1 years ago
United States

I also found this problem when I import abundance.h5 of kallisto using tximport (1.4.0). Change the class of target_id from array to character in function read_kallisto_h5 [line 42 of helper.R] can fix my problem.

target_id = as.character(ids)

# code contributed from Andrew Morgan
read_kallisto_h5 <- function(fpath, ...) {
  if (!requireNamespace("rhdf5", quietly=TRUE)) {
    stop("reading kallisto results from hdf5 files requires Bioconductor package `rhdf5`")
  }
  counts <- rhdf5::h5read(fpath, "est_counts")
  ids <- rhdf5::h5read(fpath, "aux/ids")
  efflens <- rhdf5::h5read(fpath, "aux/eff_lengths")

  stopifnot(length(counts) == length(ids)) 
  stopifnot(length(efflens) == length(ids))

  result <- data.frame(target_id = as.character(ids),
                       eff_length = efflens,
                       est_counts = as.numeric(counts),
                       stringsAsFactors = FALSE)
  normfac <- with(result, (1e6)/sum(est_counts/eff_length))
  result$tpm <- with(result, normfac*(est_counts/eff_length))
  return(result)
}

I also checked the difference of txi objects from tsv and h5. Notice [1:178136(1d)] in txi.kallisto.

library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")
samples <- read.table(file.path(dir,"samples.txt"), header=TRUE)
samples$condition <- factor(rep(c("A","B"),each=3))
rownames(samples) <- samples$run
samples[,c("pop","center","run","condition")]
tx2gene <- read.csv(file.path(dir, "tx2gene.csv"))

files <- file.path(dir, "kallisto_boot", samples$run, "abundance.h5")
names(files) <- paste0("sample", 1:6)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE, dropInfReps = TRUE)
#rownames(txi.kallisto$counts) <- as.character(rownames(txi.kallisto$counts))
head(txi.kallisto$counts)

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv")
names(files) <- paste0("sample", 1:6)
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene)
head(txi.kallisto.tsv$counts)
> str(txi.kallisto)
List of 4
 $ abundance          : num [1:178136, 1:6] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:178136(1d)] "ENST00000448914.1" "ENST00000631435.1" "ENST00000632684.1" "ENST00000434970.2" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ counts             : num [1:178136, 1:6] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:178136(1d)] "ENST00000448914.1" "ENST00000631435.1" "ENST00000632684.1" "ENST00000434970.2" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ length             : num [1:178136, 1:6] 7.75 6.75 6.75 5.67 4.67 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:178136(1d)] "ENST00000448914.1" "ENST00000631435.1" "ENST00000632684.1" "ENST00000434970.2" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ countsFromAbundance: chr "no"

> str(txi.kallisto.tsv)
List of 4
 $ abundance          : num [1:25343, 1:6] 3.8408 2.7673 0.0798 0.3727 0.0367 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:25343] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ counts             : num [1:25343, 1:6] 108.6 86.2 9 24 1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:25343] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ length             : num [1:25343, 1:6] 1939 2136 7737 4417 1869 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:25343] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ countsFromAbundance: chr "no"
ADD COMMENT
0
Entering edit mode

Thanks for your report and fix. I just added this fix to the development branch (tximport 1.5.1), which will be released in a few weeks.

ADD REPLY

Login before adding your answer.

Traffic: 772 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6