Failure to load cufflinks data into cummeRbund
rhart • 0
Last seen 3.2 years ago

I'm suddenly unable to load cufflinks data into cummeRbund. The readCufflinks() function produces this output:

Creating database I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/cuffData.db
Reading Run Info File I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/
Writing runInfo Table
Reading Read Group Info  I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/
Writing replicates Table
Reading Var Model Info  I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/
Writing varModel Table
Reading I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/genes.fpkm_tracking
Checking samples table...
Populating samples table...
Error: Column name mismatch.
In addition: There were 50 or more warnings (use warnings() to see the first 50)


And then the warnings are all like this:

Warning messages:
1: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries
2: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries

Here is the traceback:

> traceback()
8: stop("Column name mismatch.", call. = FALSE)
7: match_col(value, col_names)
6: .local(conn, name, value, ...)
5: dbWriteTable(dbConn, "samples", samples, row.names = F, append = T)
4: dbWriteTable(dbConn, "samples", samples, row.names = F, append = T)
3: populateSampleTable(samples, dbConn)
2: loadGenes(geneFPKM, geneDiff, promoterFile, countFile = geneCount, 
       replicateFile = geneRep, dbConn)
1: readCufflinks(rebuild = T)

And my session info (after getting this error, I just used biocLite to upgrade cummeRbund):

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cummeRbund_2.18.0    Gviz_1.20.0          rtracklayer_1.36.4   GenomicRanges_1.28.4 GenomeInfoDb_1.12.2  IRanges_2.10.2       S4Vectors_0.14.3    
 [8] fastcluster_1.1.22   reshape2_1.4.2       ggplot2_2.2.1        RSQLite_2.0          BiocGenerics_0.22.0 

loaded via a namespace (and not attached):
 [1] httr_1.3.0                    Biobase_2.36.2                AnnotationHub_2.8.2           bit64_0.9-7                   splines_3.4.1                
 [6] shiny_1.0.4                   Formula_1.2-2                 interactiveDisplayBase_1.14.0 latticeExtra_0.6-28           blob_1.1.0                   
[11] BSgenome_1.44.0               GenomeInfoDbData_0.99.0       Rsamtools_1.28.0              yaml_2.1.14                   backports_1.1.0              
[16] lattice_0.20-35               biovizBase_1.24.0             digest_0.6.12                 RColorBrewer_1.1-2            XVector_0.16.0               
[21] checkmate_1.8.3               colorspace_1.3-2              httpuv_1.3.5                  htmltools_0.3.6               Matrix_1.2-11                
[26] plyr_1.8.4                    pkgconfig_2.0.1               XML_3.98-1.9                  biomaRt_2.32.1                zlibbioc_1.22.0              
[31] xtable_1.8-2                  scales_0.4.1                  BiocParallel_1.10.1           htmlTable_1.9                 tibble_1.3.3                 
[36] AnnotationFilter_1.0.0        SummarizedExperiment_1.6.3    GenomicFeatures_1.28.4        nnet_7.3-12                   lazyeval_0.2.0               
[41] mime_0.5                      survival_2.41-3               magrittr_1.5                  memoise_1.1.0                 foreign_0.8-69               
[46] BiocInstaller_1.26.0          tools_3.4.1                   data.table_1.10.4             matrixStats_0.52.2            stringr_1.2.0                
[51] munsell_0.4.3                 cluster_2.0.6                 DelayedArray_0.2.7            AnnotationDbi_1.38.2          ensembldb_2.0.4              
[56] Biostrings_2.44.2             compiler_3.4.1                rlang_0.1.2                   RCurl_1.95-4.8                dichromat_2.0-0              
[61] VariantAnnotation_1.22.3      htmlwidgets_0.9               bitops_1.0-6                  base64enc_0.1-3               gtable_0.2.0                 
[66] curl_2.8.1                    DBI_0.7                       R6_2.2.2                      GenomicAlignments_1.12.1      gridExtra_2.2.1              
[71] knitr_1.17                    bit_1.1-12                    Hmisc_4.0-3                   ProtGenerics_1.8.0            stringi_1.1.5                
[76] Rcpp_0.12.12                  rpart_4.1-11                  acepack_1.4.1                


cummerbund • 1.3k views

Also, here's the header and first line of genes.fpkm_tracking -- the file that seemed to generate the error:

tracking_id    class_code    nearest_ref_id    gene_id    gene_short_name    tss_id    locus    length    coverage    Ctrl0_FPKM    Ctrl0_conf_lo    Ctrl0_conf_hi    Ctrl0_status    Case0_FPKM    Case0_conf_lo    Case0_conf_hi    Case0_status    Ctrl24_FPKM    Ctrl24_conf_lo    Ctrl24_conf_hi    Ctrl24_status    Case24_FPKM    Case24_conf_lo    Case24_conf_hi    Case24_status
A1BG    -    -    A1BG    A1BG    TSS7852    chr19:58346805-58362848    -    -    1.08063    0    2.83236    OK    1.28488    0    3.27684    OK    2.83622    0    6.7891    OK    3.59967    0.396499    6.80284    OK



I've made progress with diagnostics. The error occurs in the loadGenes() function in the database-setup.R source file. First, in the "Handle Samples Names" section, line 152 calls make.db.names(), which is deprecated in favor of dbQuoteIdentifier(). If I substitute that function and proceed to the populateSampleTable(samples,dbConn) step on line 162, I get the same error as from readCufflinks(). It seems that the samples object is not in the format this function expects.

This suggests that RSQLite has been updated and that the existing cummeRbund code was written against older versions.
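The mismatch can be reproduced outside cummeRbund. The following is a minimal sketch, assuming RSQLite 2.0 or later and an in-memory database; the two-column samples table is a hypothetical stand-in whose field names (sample_index, sample_name) are taken from this thread, and the exact error text may differ between RSQLite versions.

```r
library(DBI)

# In-memory database, so no cuffData.db file is touched.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical stand-in for the samples table (field names follow this thread).
dbExecute(con, "CREATE TABLE samples (sample_index INTEGER, sample_name TEXT)")

sample_names <- c("Ctrl0", "Case0")

# Data frame whose first column is called "index" instead of "sample_index".
mismatched <- data.frame(index = seq_along(sample_names),
                         sample_name = sample_names,
                         stringsAsFactors = FALSE)

# Appending is expected to fail because "index" is not a column of the table.
bad <- tryCatch(
  dbWriteTable(con, "samples", mismatched, row.names = FALSE, append = TRUE),
  error = function(e) e
)

# With column names that match the table, the same call succeeds.
matched <- data.frame(sample_index = seq_along(sample_names),
                      sample_name = sample_names,
                      stringsAsFactors = FALSE)
dbWriteTable(con, "samples", matched, row.names = FALSE, append = TRUE)

n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM samples")$n
dbDisconnect(con)
```

Calling conditionMessage(bad) shows the error raised by the installed RSQLite version.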


rhart • 0

These changes solve the problem. All of them are in "database-setup.R" in the package's source R folder:


Line 227 insert: names(diff)[10]="log2_fold_change"  #fix problem with importing col name

Line 232 replace with: insert_SQL<-"INSERT INTO geneExpDiffData VALUES(:test_id,:sample_1,:sample_2,:status,:value_1,:value_2,:log2_fold_change,:test_stat,:p_value,:q_value,:significant)"

Line 1762 replace with: samples<-data.frame(sample_index=c(1:length(samples)),sample_name=samples)  #correction--name of field is sample_index, not index


Editing this file and then sourcing it (after loading cummeRbund) solved the problem. There are still many warnings about outdated RSQLite calls, but everything works.
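To illustrate why the column rename and the named-parameter INSERT go together, here is a minimal sketch, not the cummeRbund code itself. It assumes RSQLite 2.0 or later, uses a hypothetical three-column version of geneExpDiffData with made-up rows, and assumes the fold-change column arrives under a name that cannot be used as a bind parameter, such as the log2(fold_change) header that cuffdiff writes.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical reduced table; the real geneExpDiffData has more columns.
dbExecute(con, "CREATE TABLE geneExpDiffData (test_id TEXT, log2_fold_change REAL, p_value REAL)")

# Toy rows, invented for illustration only.
diff <- data.frame(test_id = c("A1BG", "A2M"),
                   fc = c(0.25, -1.5),
                   p_value = c(0.4, 0.01),
                   stringsAsFactors = FALSE)
names(diff)[2] <- "log2(fold_change)"  # assumed incoming header

# Rename so the column matches the :log2_fold_change placeholder.
names(diff)[2] <- "log2_fold_change"

insert_SQL <- "INSERT INTO geneExpDiffData VALUES(:test_id, :log2_fold_change, :p_value)"

# Named parameters are bound by name, one statement execution per row.
res <- dbSendStatement(con, insert_SQL)
dbBind(res, as.list(diff))
dbClearResult(res)

stored <- dbGetQuery(con, "SELECT * FROM geneExpDiffData ORDER BY test_id")
dbDisconnect(con)
```

Because binding is by name, every placeholder in the statement needs a matching column name in the data frame, which is what the rename at line 227 provides in the actual patch.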




rhart • 0

Better: load the development version:


