Search
Question: DESeqDataSetFromTximport with kallisto abundances.
3
gravatar for bh129
5 months ago by
bh12930
bh12930 wrote:

Hi,

I'm trying to use DESeq2 to analyse a dataset where I've generated abundances using Kallisto following the vignette "Analyzing RNA-seq data with DESeq2".

I'm getting a consistent error with DESeqDataSetFromTximport:

> dds <- DESeqDataSetFromTximport(txi.kallisto, samples, ~condition)
Error in validObject(.Object) :
  invalid class “SummarizedExperiment” object: 1: invalid object for slot "NAMES" in class "SummarizedExperiment": got class "array", should be or extend class "character_OR_NULL"
invalid class “SummarizedExperiment” object: 2: 'names(x)' must be NULL or a character vector with no attributes

The same error occurs if you just take counts:

counts <- as.matrix(txi.kallisto$counts)

and then use DESeqDataSetFromMatrix.

Gone through this systematically and found that there's an issue with the rownames of the counts table within the tximport list, it looks like it is setting them as arrays?

> class(rownames(counts))
[1] "array"

Which then if I set them as a character it seems to accept the matrix (if I round the counts as in the fromTximport function). I can now fix the txi list by doing this:

> rownames(txi.kallisto$counts) <- as.character(rownames(txi.kallisto$counts))

> dds <- DESeqDataSetFromTximport(txi.kallisto, samples, ~condition)
using counts and average transcript lengths from tximport

This now appears to work. I think that this must just be a slight bug from one part of the workflow?

Session info below, I updated to the dev version of DESeq2 to see if that fixed it but the same error was in the main release.

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.17.2              SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2        
 [5] Biobase_2.36.2             GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2            
 [9] S4Vectors_0.14.3           BiocGenerics_0.22.0        tximport_1.4.0            

loaded via a namespace (and not attached):
 [1] genefilter_1.58.1       locfit_1.5-9.1          rhdf5_2.20.0            splines_3.4.0           lattice_0.20-35        
 [6] colorspace_1.3-2        htmltools_0.3.6         base64enc_0.1-3         survival_2.41-3         XML_3.98-1.7           
[11] rlang_0.1.1             DBI_0.6-1               foreign_0.8-68          BiocParallel_1.10.1     RColorBrewer_1.1-2     
[16] GenomeInfoDbData_0.99.0 plyr_1.8.4              stringr_1.2.0           zlibbioc_1.22.0         munsell_0.4.3          
[21] gtable_0.2.0            htmlwidgets_0.8         memoise_1.1.0           latticeExtra_0.6-28     knitr_1.16             
[26] geneplotter_1.54.0      AnnotationDbi_1.38.1    htmlTable_1.9           Rcpp_0.12.11            acepack_1.4.1          
[31] xtable_1.8-2            scales_0.4.1            backports_1.1.0         checkmate_1.8.2         Hmisc_4.0-3            
[36] annotate_1.54.0         XVector_0.16.0          gridExtra_2.2.1         ggplot2_2.2.1           digest_0.6.12          
[41] stringi_1.1.5           grid_3.4.0              tools_3.4.0             bitops_1.0-6            magrittr_1.5           
[46] RSQLite_1.1-2           lazyeval_0.2.0          RCurl_1.95-4.8          tibble_1.3.3            Formula_1.2-1          
[51] cluster_2.0.6           Matrix_1.2-10           data.table_1.10.4       rpart_4.1-11            nnet_7.3-12            
[56] compiler_3.4.0        

 

 

 

ADD COMMENTlink modified 6 weeks ago by Ming Wang0 • written 5 months ago by bh12930
0
gravatar for Michael Love
5 months ago by
Michael Love14k
United States
Michael Love14k wrote:
What do you get with biocValid()? It's a function from the BiocInstaller package. It looks like you have some kind of hybrid installation?
ADD COMMENTlink written 5 months ago by Michael Love14k
> biocValid()

* sessionInfo()

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8  
[5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C               
[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocInstaller_1.26.0       openxlsx_4.0.17            readxl_1.0.0               mygene_1.12.0           
[5] GenomicFeatures_1.28.3     AnnotationDbi_1.38.1       RColorBrewer_1.1-2         pheatmap_1.0.8          
[9] DESeq2_1.17.2              SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2      
[13] Biobase_2.36.2             GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2          
[17] S4Vectors_0.14.3           BiocGenerics_0.22.0        tximport_1.4.0            

loaded via a namespace (and not attached):
[1] httr_1.2.1               jsonlite_1.5             splines_3.4.0            gsubfn_0.6-6          
[5] Formula_1.2-1            latticeExtra_0.6-28      cellranger_1.1.0         GenomeInfoDbData_0.99.0
[9] Rsamtools_1.28.0         RSQLite_1.1-2            backports_1.1.0          lattice_0.20-35       
[13] chron_2.3-50             digest_0.6.12            XVector_0.16.0           checkmate_1.8.2       
[17] colorspace_1.3-2         htmltools_0.3.6          Matrix_1.2-10            plyr_1.8.4            
[21] XML_3.98-1.7             biomaRt_2.32.1           genefilter_1.58.1        zlibbioc_1.22.0       
[25] xtable_1.8-2             scales_0.4.1             BiocParallel_1.10.1      htmlTable_1.9         
[29] tibble_1.3.3             annotate_1.54.0          ggplot2_2.2.1            sqldf_0.4-10          
[33] nnet_7.3-12              lazyeval_0.2.0           proto_1.0.0              survival_2.41-3       
[37] magrittr_1.5             memoise_1.1.0            foreign_0.8-68           tools_3.4.0           
[41] data.table_1.10.4        stringr_1.2.0            munsell_0.4.3            locfit_1.5-9.1        
[45] cluster_2.0.6            Biostrings_2.44.1        compiler_3.4.0           rlang_0.1.1           
[49] rhdf5_2.20.0             grid_3.4.0               RCurl_1.95-4.8           htmlwidgets_0.8       
[53] bitops_1.0-6             base64enc_0.1-3          labeling_0.3             gtable_0.2.0          
[57] curl_2.6                 DBI_0.6-1                R6_2.2.1                 GenomicAlignments_1.12.1
[61] gridExtra_2.2.1          knitr_1.16               rtracklayer_1.36.3       Hmisc_4.0-3           
[65] stringi_1.1.5            Rcpp_0.12.11             geneplotter_1.54.0       rpart_4.1-11          
[69] acepack_1.4.1           

* Out-of-date packages
      Package LibPath                                       Installed Built   ReposVer Repository                          
ggsci "ggsci" "/home/ben/R/x86_64-pc-linux-gnu-library/3.4" "2.4"     "3.4.0" "2.7"    "https://cran.rstudio.com/src/contrib"
rJava "rJava" "/usr/lib/R/site-library"                     "0.9-6"   "3.0.2" "0.9-8"  "https://cran.rstudio.com/src/contrib"

update with biocLite()

* Packages too new for Bioconductor version '3.5'

       Version  LibPath                                    
DESeq2 "1.17.2" "/home/ben/R/x86_64-pc-linux-gnu-library/3.4"

downgrade with biocLite("DESeq2")

Error: 2 package(s) out of date; 1 package(s) too new
ADD REPLYlink written 5 months ago by bh12930

So the first step would be to downgrade DESeq2, so that biocValid() is happy. Then report back if you still have a problem. Mixing different versions of Bioconductor packages is a recipe for errors. Only using biocLite() for installation is the solution.

ADD REPLYlink written 5 months ago by Michael Love14k
0
gravatar for Ming Wang
6 weeks ago by
Ming Wang0
United States
Ming Wang0 wrote:

I also found this problem when I import abundance.h5 of kallisto using tximport (1.4.0). Change the class of target_id from array to character in function read_kallisto_h5 [line 42 of helper.R] can fix my problem.

target_id = as.character(ids)

# code contributed from Andrew Morgan
read_kallisto_h5 <- function(fpath, ...) {
  if (!requireNamespace("rhdf5", quietly=TRUE)) {
    stop("reading kallisto results from hdf5 files requires Bioconductor package `rhdf5`")
  }
  counts <- rhdf5::h5read(fpath, "est_counts")
  ids <- rhdf5::h5read(fpath, "aux/ids")
  efflens <- rhdf5::h5read(fpath, "aux/eff_lengths")

  stopifnot(length(counts) == length(ids)) 
  stopifnot(length(efflens) == length(ids))

  result <- data.frame(target_id = as.character(ids),
                       eff_length = efflens,
                       est_counts = as.numeric(counts),
                       stringsAsFactors = FALSE)
  normfac <- with(result, (1e6)/sum(est_counts/eff_length))
  result$tpm <- with(result, normfac*(est_counts/eff_length))
  return(result)
}

I also checked the difference of txi objects from tsv and h5. Notice [1:178136(1d)] in txi.kallisto.

library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")
samples <- read.table(file.path(dir,"samples.txt"), header=TRUE)
samples$condition <- factor(rep(c("A","B"),each=3))
rownames(samples) <- samples$run
samples[,c("pop","center","run","condition")]
tx2gene <- read.csv(file.path(dir, "tx2gene.csv"))

files <- file.path(dir, "kallisto_boot", samples$run, "abundance.h5")
names(files) <- paste0("sample", 1:6)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE, dropInfReps = TRUE)
#rownames(txi.kallisto$counts) <- as.character(rownames(txi.kallisto$counts))
head(txi.kallisto$counts)

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv")
names(files) <- paste0("sample", 1:6)
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene)
head(txi.kallisto.tsv$counts)
> str(txi.kallisto)
List of 4
 $ abundance          : num [1:178136, 1:6] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:178136(1d)] "ENST00000448914.1" "ENST00000631435.1" "ENST00000632684.1" "ENST00000434970.2" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ counts             : num [1:178136, 1:6] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:178136(1d)] "ENST00000448914.1" "ENST00000631435.1" "ENST00000632684.1" "ENST00000434970.2" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ length             : num [1:178136, 1:6] 7.75 6.75 6.75 5.67 4.67 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:178136(1d)] "ENST00000448914.1" "ENST00000631435.1" "ENST00000632684.1" "ENST00000434970.2" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ countsFromAbundance: chr "no"

> str(txi.kallisto.tsv)
List of 4
 $ abundance          : num [1:25343, 1:6] 3.8408 2.7673 0.0798 0.3727 0.0367 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:25343] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ counts             : num [1:25343, 1:6] 108.6 86.2 9 24 1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:25343] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ length             : num [1:25343, 1:6] 1939 2136 7737 4417 1869 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:25343] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
  .. ..$ : chr [1:6] "sample1" "sample2" "sample3" "sample4" ...
 $ countsFromAbundance: chr "no"
ADD COMMENTlink written 6 weeks ago by Ming Wang0

Thanks for your report and fix. I just added this fix to the development branch (tximport 1.5.1), which will be released in a few weeks.

ADD REPLYlink written 6 weeks ago by Michael Love14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 150 users visited in the last hour