I upgraded to R 3.6.1 last week, and ever since my code for loading processed RNA-Seq data with tximport has stopped working. Previously, I would watch and wait as multiple files were loaded sequentially, but now the following command runs very quickly and exits, reporting an attempt to load only the first file:
txi = tximport(paste(inputfolder, inputfiles, sep=""), txIn = FALSE, txOut = FALSE, geneIdCol = "GeneId", abundanceCol = "FPKM", countsCol = "Expectedcount", lengthCol = "EffectiveLength")
reading in files with read_tsv
1
Although there's no error produced, the resulting txi object is not populated, so it clearly didn't work right. I have confirmed that all the files exist, as expected. Any ideas what I'm doing wrong? Did something change with the new version of R?
The first couple of lines of one of my input files are shown below:
GeneId Expectedcount tau TranscriptId(s) Length EffectiveLength TPM FPKM UQ1K RPK Detectcallp95 Detectcallp99 MeanIsoformCoverage
A1BG 175 2.02E-06 AK124712:uc002qsf.2,NM_130786:uc002qsd.4 1845.59 1686.03 2.02 1.12 58.3 94.8 D D 73.66
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
Random number generation:
 RNG: Mersenne-Twister
 Normal: Inversion
 Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] gplots_3.0.1.1 DESeq2_1.24.0 SummarizedExperiment_1.14.1 DelayedArray_0.10.0
[5] BiocParallel_1.18.1 matrixStats_0.55.0 Biobase_2.44.0 GenomicRanges_1.36.1
[9] GenomeInfoDb_1.20.0 IRanges_2.18.2 S4Vectors_0.22.1 BiocGenerics_0.30.0
[13] tximport_1.12.3 limma_3.40.6
loaded via a namespace (and not attached):
[1] bit64_0.9-7 splines_3.6.1 gtools_3.8.1 Formula_1.2-3 assertthat_0.2.1
[6] latticeExtra_0.6-28 blob_1.2.0 GenomeInfoDbData_1.2.1 pillar_1.4.2 RSQLite_2.1.2
[11] backports_1.1.4 lattice_0.20-38 glue_1.3.1 digest_0.6.20 RColorBrewer_1.1-2
[16] XVector_0.24.0 checkmate_1.9.4 colorspace_1.4-1 htmltools_0.3.6 Matrix_1.2-17
[21] XML_3.98-1.20 pkgconfig_2.0.2 genefilter_1.66.0 zlibbioc_1.30.0 purrr_0.3.2
[26] xtable_1.8-4 scales_1.0.0 gdata_2.18.0 htmlTable_1.13.1 tibble_2.1.3
[31] annotate_1.62.0 ggplot2_3.2.1 nnet_7.3-12 lazyeval_0.2.2 survival_2.44-1.1
[36] magrittr_1.5 crayon_1.3.4 memoise_1.1.0 foreign_0.8-72 tools_3.6.1
[41] data.table_1.12.2 hms_0.5.1 stringr_1.4.0 locfit_1.5-9.1 munsell_0.5.0
[46] cluster_2.1.0 AnnotationDbi_1.46.1 compiler_3.6.1 caTools_1.17.1.2 rlang_0.4.0
[51] grid_3.6.1 RCurl_1.95-4.12 rstudioapi_0.10 htmlwidgets_1.3 bitops_1.0-6
[56] base64enc_0.1-3 gtable_0.3.0 DBI_1.0.0 R6_2.4.0 gridExtra_2.3
[61] knitr_1.24 dplyr_0.8.3 bit_1.1-14 zeallot_0.1.0 Hmisc_4.2-0
[66] readr_1.3.1 KernSmooth_2.23-15 stringi_1.4.3 Rcpp_1.0.2 vctrs_0.2.0
[71] geneplotter_1.62.0 rpart_4.1-15 acepack_1.4.1 tidyselect_0.2.5 xfun_0.9
Unfortunately it contains some confidential information, but hopefully this redacted version gives you a sense:
[1] "C:\Users\Penny Lane\Documents\analysis\company\cmpd\clinicalsamples\downsampling\rsem\Paxgene\file1.rsem.genes.results"
[2] "C:\Users\Penny Lane\Documents\analysis\company\cmpd\clinicalsamples\downsampling\rsem\Paxgene\file2.rsem.genes.results"
Could it be the space in the username folder? I noticed that messed up the installation of some packages in the new version of R, whereas it wasn't an issue for me before.
Could be. I’m not sure, but tximport is doing some very basic things here: just reading in files in a loop using either base R or the readr package.
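One way to narrow this down outside tximport might be to read a single file directly with each of the two readers and compare what comes back. The sketch below is only an illustration: it writes a temporary file from a subset of the columns in the sample row posted above (the temp file and the variable names `f`, `base_tab`, and `readr_tab` are made up for this example), so swap in one of your real paths.

```r
# Illustrative input: a temp file built from the sample row posted above.
# Replace `f` with the path to one of your real .genes.results files.
header <- c("GeneId", "Expectedcount", "tau", "TranscriptId(s)", "Length",
            "EffectiveLength", "TPM", "FPKM")
row <- c("A1BG", "175", "2.02E-06", "AK124712:uc002qsf.2,NM_130786:uc002qsd.4",
         "1845.59", "1686.03", "2.02", "1.12")
f <- tempfile(fileext = ".genes.results")
writeLines(c(paste(header, collapse = "\t"), paste(row, collapse = "\t")), f)

# The columns the tximport call relies on.
needed <- c("GeneId", "Expectedcount", "EffectiveLength", "FPKM")

# Base R reader.
base_tab <- read.delim(f, stringsAsFactors = FALSE)
str(base_tab[, needed])

# readr reader, run only if readr is installed.
if (requireNamespace("readr", quietly = TRUE)) {
  readr_tab <- readr::read_tsv(f, progress = FALSE, col_types = readr::cols())
  str(readr_tab[, needed])
}
```

If the base R read looks fine but the readr one errors or comes back empty, that would point at readr rather than at the files or the column names.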
Well, it's probably not failing because of the whitespace in the folder name at least... I just moved the files to a folder without whitespace and confirmed the problem still exists. Did anything change about the package from Bioconductor 3.8 to 3.9 that could have broken the use of custom column names?
No, no changes to this basic functionality.
Can you try on another machine? If it works there, I’d recommend reinstalling your packages.
Yup, good thinking on that one. It worked on another machine, so I wiped R and reinstalled all my packages 1-by-1, testing the tximport command after each package I installed. Turns out that as soon as I install the GEOquery package, my tximport command starts malfunctioning (as described above). I also installed GEOquery recently, so my issues could be purely due to that (and not the R 3.6.1 upgrade as I had originally suspected).
Any idea why this might be happening and if there's a workaround? I'm fairly invested in using GEOquery. It appears that even if I remove GEOquery and/or reinstall tximport, the damage is already done and there's no returning to a functioning tximport without wiping out all my packages and starting from scratch.
GEOquery installs readr (it is an import), and that changes tximport’s behavior. So it seems readr is the potential culprit on your machine (in that it exits silently).
You can override by setting importer=read.delim when you run tximport().
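As a rough sketch, the original call with the override added would look something like the following. The two temporary files are stand-ins built from the sample row posted earlier (the `files` and `lines` names are made up for this example); in practice you would pass your own vector of paths.

```r
# Illustrative inputs: two temp files holding a subset of the columns from
# the sample row posted earlier. Use your real file paths instead.
lines <- c(
  paste(c("GeneId", "Expectedcount", "EffectiveLength", "FPKM"), collapse = "\t"),
  paste(c("A1BG", "175", "1686.03", "1.12"), collapse = "\t")
)
files <- c(tempfile(fileext = ".genes.results"),
           tempfile(fileext = ".genes.results"))
for (f in files) writeLines(lines, f)
stopifnot(all(file.exists(files)))

# Run only if tximport is installed.
if (requireNamespace("tximport", quietly = TRUE)) {
  txi <- tximport::tximport(files, txIn = FALSE, txOut = FALSE,
                            geneIdCol = "GeneId", abundanceCol = "FPKM",
                            countsCol = "Expectedcount",
                            lengthCol = "EffectiveLength",
                            importer = read.delim)  # base R instead of readr
  # A populated result should have one column per input file.
  print(dim(txi$counts))
}
```

Passing `importer = read.delim` makes tximport use the base R reader even when readr is installed, at the cost of somewhat slower loading on large files.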
Awesome, that did the trick. Thanks so much for the rapid assistance Michael!