Question: Failure to load Cufflinks data into cummeRbund
rhart0 wrote, 14 months ago:

I'm suddenly unable to load Cufflinks data into cummeRbund. I get this output from the readCufflinks() function:

Creating database I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/cuffData.db
Reading Run Info File I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/
Writing runInfo Table
Reading Read Group Info  I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/
Writing replicates Table
Reading Var Model Info  I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/
Writing varModel Table
Reading I:/EpiCenter_ISI/Hart/coga/better/Hs/cuffHs/genes.fpkm_tracking
Checking samples table...
Populating samples table...
Error: Column name mismatch.
In addition: There were 50 or more warnings (use warnings() to see the first 50)


The warnings all look like this:

Warning messages:
1: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries
2: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries

Here is the traceback:

> traceback()
8: stop("Column name mismatch.", call. = FALSE)
7: match_col(value, col_names)
6: .local(conn, name, value, ...)
5: dbWriteTable(dbConn, "samples", samples, row.names = F, append = T)
4: dbWriteTable(dbConn, "samples", samples, row.names = F, append = T)
3: populateSampleTable(samples, dbConn)
2: loadGenes(geneFPKM, geneDiff, promoterFile, countFile = geneCount, 
       replicateFile = geneRep, dbConn)
1: readCufflinks(rebuild = T)

And here is my session info (taken after I got this error and used biocLite to upgrade cummeRbund):

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cummeRbund_2.18.0    Gviz_1.20.0          rtracklayer_1.36.4   GenomicRanges_1.28.4 GenomeInfoDb_1.12.2  IRanges_2.10.2       S4Vectors_0.14.3    
 [8] fastcluster_1.1.22   reshape2_1.4.2       ggplot2_2.2.1        RSQLite_2.0          BiocGenerics_0.22.0 

loaded via a namespace (and not attached):
 [1] httr_1.3.0                    Biobase_2.36.2                AnnotationHub_2.8.2           bit64_0.9-7                   splines_3.4.1                
 [6] shiny_1.0.4                   Formula_1.2-2                 interactiveDisplayBase_1.14.0 latticeExtra_0.6-28           blob_1.1.0                   
[11] BSgenome_1.44.0               GenomeInfoDbData_0.99.0       Rsamtools_1.28.0              yaml_2.1.14                   backports_1.1.0              
[16] lattice_0.20-35               biovizBase_1.24.0             digest_0.6.12                 RColorBrewer_1.1-2            XVector_0.16.0               
[21] checkmate_1.8.3               colorspace_1.3-2              httpuv_1.3.5                  htmltools_0.3.6               Matrix_1.2-11                
[26] plyr_1.8.4                    pkgconfig_2.0.1               XML_3.98-1.9                  biomaRt_2.32.1                zlibbioc_1.22.0              
[31] xtable_1.8-2                  scales_0.4.1                  BiocParallel_1.10.1           htmlTable_1.9                 tibble_1.3.3                 
[36] AnnotationFilter_1.0.0        SummarizedExperiment_1.6.3    GenomicFeatures_1.28.4        nnet_7.3-12                   lazyeval_0.2.0               
[41] mime_0.5                      survival_2.41-3               magrittr_1.5                  memoise_1.1.0                 foreign_0.8-69               
[46] BiocInstaller_1.26.0          tools_3.4.1                   data.table_1.10.4             matrixStats_0.52.2            stringr_1.2.0                
[51] munsell_0.4.3                 cluster_2.0.6                 DelayedArray_0.2.7            AnnotationDbi_1.38.2          ensembldb_2.0.4              
[56] Biostrings_2.44.2             compiler_3.4.1                rlang_0.1.2                   RCurl_1.95-4.8                dichromat_2.0-0              
[61] VariantAnnotation_1.22.3      htmlwidgets_0.9               bitops_1.0-6                  base64enc_0.1-3               gtable_0.2.0                 
[66] curl_2.8.1                    DBI_0.7                       R6_2.2.2                      GenomicAlignments_1.12.1      gridExtra_2.2.1              
[71] knitr_1.17                    bit_1.1-12                    Hmisc_4.0-3                   ProtGenerics_1.8.0            stringi_1.1.5                
[76] Rcpp_0.12.12                  rpart_4.1-11                  acepack_1.4.1                


(written 14 months ago by rhart0; modified 14 months ago)

Also, here are the header and first line of genes.fpkm_tracking, the file that seemed to trigger the error:

tracking_id    class_code    nearest_ref_id    gene_id    gene_short_name    tss_id    locus    length    coverage    Ctrl0_FPKM    Ctrl0_conf_lo    Ctrl0_conf_hi    Ctrl0_status    Case0_FPKM    Case0_conf_lo    Case0_conf_hi    Case0_status    Ctrl24_FPKM    Ctrl24_conf_lo    Ctrl24_conf_hi    Ctrl24_status    Case24_FPKM    Case24_conf_lo    Case24_conf_hi    Case24_status
A1BG    -    -    A1BG    A1BG    TSS7852    chr19:58346805-58362848    -    -    1.08063    0    2.83236    OK    1.28488    0    3.27684    OK    2.83622    0    6.7891    OK    3.59967    0.396499    6.80284    OK
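For reference, the four sample names in this file can be read off the `*_FPKM` columns. A minimal base-R sketch; it assumes (this is not confirmed in the thread) that cummeRbund derives its sample list from these column prefixes, and cummeRbund's actual parsing code may differ:

```r
# Header fields of genes.fpkm_tracking as posted above (tab-separated in the real file)
header <- c("tracking_id", "class_code", "nearest_ref_id", "gene_id",
            "gene_short_name", "tss_id", "locus", "length", "coverage",
            "Ctrl0_FPKM", "Ctrl0_conf_lo", "Ctrl0_conf_hi", "Ctrl0_status",
            "Case0_FPKM", "Case0_conf_lo", "Case0_conf_hi", "Case0_status",
            "Ctrl24_FPKM", "Ctrl24_conf_lo", "Ctrl24_conf_hi", "Ctrl24_status",
            "Case24_FPKM", "Case24_conf_lo", "Case24_conf_hi", "Case24_status")

# Each sample contributes a <sample>_FPKM column; strip the suffix to get its name
fpkm_cols <- grep("_FPKM$", header, value = TRUE)
sample_names <- sub("_FPKM$", "", fpkm_cols)

# Gives Ctrl0, Case0, Ctrl24, Case24 in file order
print(sample_names)
```

These names are the values that end up in the `sample_name` column of the samples table.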


(reply written 14 months ago by rhart0)

I've made progress with the diagnosis. The error comes from the loadGenes() function in the database-setup.R source file. First, in the "Handle Samples Names" section, on line 152, make.db.names() is called; this function is deprecated and has been replaced by dbQuoteIdentifier(). If I switch to dbQuoteIdentifier() and continue to the populateSampleTable(samples,dbConn) step on line 162, I get the same error as from readCufflinks(). It seems that the samples object is not in the format this function expects: the traceback shows dbWriteTable() stopping in match_col(), i.e. the column names of samples do not match those of the database's samples table.

This suggests that RSQLite has been updated (the session info shows RSQLite 2.0) and that the current cummeRbund code was written against an older version.
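A minimal sketch of the failing step, assuming DBI and RSQLite (>= 2.0) are installed. The two-column `samples` table below is a hypothetical, simplified stand-in for cummeRbund's schema (the real table may define more columns); the point is only that `dbWriteTable(..., append = TRUE)` stops with an error when the data frame's column names do not match the table's, and succeeds once they do:

```r
# Assumes DBI and RSQLite (>= 2.0) are installed.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical, simplified stand-in for cummeRbund's "samples" table
invisible(dbExecute(con, "CREATE TABLE samples (sample_index INTEGER, sample_name TEXT)"))

sample_names <- c("Ctrl0", "Case0", "Ctrl24", "Case24")

# Column called "index" does not match the table, so the append errors out
bad <- data.frame(index = seq_along(sample_names), sample_name = sample_names,
                  stringsAsFactors = FALSE)
append_error <- tryCatch(
  dbWriteTable(con, "samples", bad, row.names = FALSE, append = TRUE),
  error = function(e) conditionMessage(e)
)
print(append_error)

# With the column named sample_index, the append succeeds
good <- data.frame(sample_index = seq_along(sample_names), sample_name = sample_names,
                   stringsAsFactors = FALSE)
dbWriteTable(con, "samples", good, row.names = FALSE, append = TRUE)
loaded <- dbReadTable(con, "samples")
print(loaded)

# dbQuoteIdentifier() is the DBI replacement for the deprecated make.db.names()
print(dbQuoteIdentifier(con, "sample name"))

dbDisconnect(con)
```

The exact wording of the error depends on the RSQLite version; 2.0 reports "Column name mismatch.", as in the original post.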


(reply written 14 months ago by rhart0)
rhart0 wrote, 14 months ago:

These changes solve the problem. All of them are in "database-setup.R" in the package's R source folder:


Line 227 insert: names(diff)[10]="log2_fold_change"  #fix problem with importing col name

Line 232 replace with: insert_SQL<-"INSERT INTO geneExpDiffData VALUES(:test_id,:sample_1,:sample_2,:status,:value_1,:value_2,:log2_fold_change,:test_stat,:p_value,:q_value,:significant)"

Line 1762 replace with: samples<-data.frame(sample_index=c(1:length(samples)),sample_name=samples)  # correction: the field is named sample_index, not index
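To illustrate why the column rename at line 227 matters, here is a sketch (again assuming DBI and RSQLite >= 2.0) of binding a data frame to the named-placeholder INSERT from line 232. The table definition and the single row are made up for illustration. It also assumes, without confirmation from the thread, that the column originally arrives as `log2.fold_change.` (what `read.table()` with its default `check.names = TRUE` makes of cuffdiff's `log2(fold_change)` header), which cannot match any `:name` placeholder. The rename is done by name here rather than by position 10, because this toy data frame omits the gene_id, gene and locus columns:

```r
# Assumes DBI and RSQLite (>= 2.0) are installed.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical stand-in for geneExpDiffData: only the eleven columns named in
# the INSERT statement, in the same order (the real table may differ)
invisible(dbExecute(con, paste(
  "CREATE TABLE geneExpDiffData (test_id TEXT, sample_1 TEXT, sample_2 TEXT,",
  "status TEXT, value_1 REAL, value_2 REAL, log2_fold_change REAL,",
  "test_stat REAL, p_value REAL, q_value REAL, significant TEXT)"
)))

# One made-up row, with the column name read.table() would produce
diff <- data.frame(test_id = "GENE_X", sample_1 = "Ctrl0", sample_2 = "Case0",
                   status = "OK", value_1 = 1, value_2 = 2,
                   log2.fold_change. = 1, test_stat = 0, p_value = 1,
                   q_value = 1, significant = "no", stringsAsFactors = FALSE)

# The fix: give the column a name that matches the :log2_fold_change placeholder
names(diff)[names(diff) == "log2.fold_change."] <- "log2_fold_change"

insert_SQL <- paste0(
  "INSERT INTO geneExpDiffData VALUES(:test_id,:sample_1,:sample_2,:status,",
  ":value_1,:value_2,:log2_fold_change,:test_stat,:p_value,:q_value,:significant)"
)

# Named placeholders are bound by matching data frame column names;
# the statement is executed once per row of diff
dbExecute(con, insert_SQL, params = diff)

n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM geneExpDiffData")$n
dbDisconnect(con)
```

Binding by name means the order of columns in the data frame does not matter, but every placeholder must have a column with exactly that name.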


Editing this file and then sourcing it (after loading cummeRbund) solved the problem. There are still lots of warnings about outdated RSQLite calls, but everything works.




(written 14 months ago by rhart0)
gravatar for rhart
14 months ago by
rhart0 wrote:

Better: load the development version:


(written 14 months ago by rhart0)

