(If you have hit a similar issue and just want the fix, skip to the end of this post.)
Hi, I was trying to do an RNA-seq analysis following the DESeq2 workflow.
I did this not long ago without any major issue, but since I moved to a new server my previous pipeline no longer works.
Specifically, when I use "GenomicFeatures" to get read counts, I get an error:
    library("GenomicFeatures")
    (txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character()))
    # Import genomic features from the file as a GRanges object ... Error in parseURI("") : cannot parse URI
I can run this step on my Mac. (Why not do all the counting on the Mac? I tried, but the read counting had not finished after 72 hours, so I assumed it was too heavy and moved the counting to the cluster.)
Since the Mac can import the genomic features, I suspect that something is missing on the cluster system.
The post "Error in parseURI("") : cannot parse URI (rtracklayer package)" seems to describe a similar issue, which was solved by updating libxml2 to 2.9.1.
I tried that, but it did not work for me.
I downloaded "libxml2-2-2.9.1-2.1.noarch.rpm" (I am using CentOS 5):
    rpm2cpio libxml2-2-2.9.1-2.1.noarch.rpm|cpio -i -d
    export LD_LIBRARY_PATH=/host/myusername/Programme/curl-7.52.1/lib/:/host/myusername/Programme/libxml2-2-2-9-1/usr/lib/
    echo $LD_LIBRARY_PATH # check the lib path setting
    /host/myusername/Programme/curl-7.52.1/lib/:/host/myusername/Programme/libxml2-2-2-9-1/usr/lib/
    source .bashrc
I still hit the same error when using the GenomicFeatures package.
Anyway, since the package works on my Mac, I guess something is missing on the cluster.
I also tried the DESeq2 workflow with the example files from the "airway" package and got the same error:
    > library("airway")
    ...
    > library("GenomicFeatures")
    > gtffile <- file.path(dir,"Homo_sapiens.GRCh37.75_subset.gtf")
    > gtffile
    [1] "/host/somewhere/Jun/Programme/R-3.3.1/lib64/R/library/airway/extdata/Homo_sapiens.GRCh37.75_subset.gtf"
    > txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character())
    Import genomic features from the file as a GRanges object ... Error in parseURI("") : cannot parse URI
On my Mac, things look OK:
    > library("GenomicFeatures")
    > gtffile <- file.path(dir,"Homo_sapiens.GRCh37.75_subset.gtf")
    > txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character())
    Import genomic features from the file as a GRanges object ... OK
    Prepare the 'metadata' data frame ... OK
    Make the TxDb object ... OK
It seems that R-3.3.* requires a more recent "libxml2.so.2" to work properly.
When I used R-3.2.5, the system "libxml2.so.2" seemed to work:
    > library("XML")
    > XML::parseURI("")
    $scheme
    [1] ""

    $authority
    [1] ""

    $server
    [1] ""

    $user
    [1] ""

    $path
    [1] ""

    $query
    [1] ""

    $fragment
    [1] ""

    $port
    [1] NA

    attr(,"class")
    [1] "URI"
    > path = unclass(getLoadedDLLs()[["XML"]])$path
    > path
    [1] "/host/somewhere/Jun/Programme/R-3.2.5/lib64/R/library/XML/libs/XML.so"
    > system2("ldd",args=path)
    linux-vdso.so.1 => (0x00007fffe67ff000)
    libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00002b59e8cc5000)
Any suggestions? Thanks!
    > sessionInfo()
    R version 3.3.1 (2016-06-21)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
     [9] LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel stats4 stats graphics grDevices utils datasets
    [8] methods base

    other attached packages:
     [1] GenomicFeatures_1.24.5 AnnotationDbi_1.34.4
     [3] Rsamtools_1.24.0 Biostrings_2.40.2
     [5] XVector_0.12.1 DESeq2_1.12.4
     [7] SummarizedExperiment_1.2.3 Biobase_2.32.0
     [9] GenomicRanges_1.24.3 GenomeInfoDb_1.8.7
    [11] IRanges_2.6.1 S4Vectors_0.10.3
    [13] BiocGenerics_0.18.0

    loaded via a namespace (and not attached):
     [1] genefilter_1.54.2 locfit_1.5-9.1 splines_3.3.1
     [4] lattice_0.20-34 colorspace_1.3-2 htmltools_0.3.5
     [7] rtracklayer_1.32.2 base64enc_0.1-3 survival_2.40-1
    [10] XML_3.98-1.5 foreign_0.8-67 DBI_0.5-1
    [13] BiocParallel_1.6.6 RColorBrewer_1.1-2 plyr_1.8.4
    [16] stringr_1.2.0 zlibbioc_1.18.0 munsell_0.4.3
    [19] gtable_0.2.0 htmlwidgets_0.8 memoise_1.0.0
    [22] latticeExtra_0.6-28 knitr_1.15.1 biomaRt_2.28.0
    [25] geneplotter_1.50.0 htmlTable_1.9 Rcpp_0.12.9
    [28] acepack_1.4.1 xtable_1.8-2 backports_1.0.5
    [31] scales_0.4.1 checkmate_1.8.2 Hmisc_4.0-2
    [34] annotate_1.50.1 gridExtra_2.2.1 ggplot2_2.2.1
    [37] digest_0.6.12 stringi_1.1.2 grid_3.3.1
    [40] tools_3.3.1 bitops_1.0-6 magrittr_1.5
    [43] lazyeval_0.2.0 RCurl_1.95-4.8 tibble_1.2
    [46] RSQLite_1.1-2 Formula_1.2-1 cluster_2.0.5
    [49] Matrix_1.2-8 data.table_1.10.4 assertthat_0.1
    [52] rpart_4.1-10 GenomicAlignments_1.8.4 nnet_7.3-12
For people who have a similar issue:
In your R session, run:
    > XML::parseURI("")
If you see:
    Error in XML::parseURI("") : cannot parse URI
then you have the same issue I had: R (maybe R 3.3.* and later) needs a more recent "libxml2" when installing "XML".
The administrator is unlikely to update the system "libxml2" for you, because many programs are compiled against it and a newer library might parse differently. So a feasible approach is to compile a more recent version of "libxml2" in your own directory.
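Before building anything, it may help to confirm which libxml2 the XML package is actually linked against; this is the same ldd check shown in the question, run from the shell instead of from R. A minimal sketch: the default `XML_SO` path and the helper name `libxml2_from_ldd` are assumptions for illustration, so point `XML_SO` at the XML.so inside your own R library.

```shell
# Hypothetical location of the XML package's shared object; adjust to your R library.
XML_SO="${XML_SO:-$HOME/Programme/R-3.3.1/lib64/R/library/XML/libs/XML.so}"

# libxml2_from_ldd: read `ldd` output on stdin and print the path that
# libxml2.so.* resolves to (illustrative helper, not part of any package).
libxml2_from_ldd() {
    awk '$1 ~ /^libxml2\.so/ && $2 == "=>" { print $3 }'
}

if [ -f "$XML_SO" ]; then
    # ldd honours the current LD_LIBRARY_PATH when resolving libraries.
    ldd "$XML_SO" | libxml2_from_ldd
else
    echo "XML.so not found at $XML_SO; set XML_SO to the real path" >&2
fi
```

If the printed path still points at the system library (for example /usr/lib64/libxml2.so.2) after you have set up your own build, the package is probably not picking up the new libxml2 yet.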
Here is what to do:
Download a recent libxml2 from http://www.linuxfromscratch.org/blfs/view/svn/general/libxml2.html and follow the instructions on that page.
Two points about the installation:
1. You have to point configure at zlib with "--with-zlib", which also means you need zlib installed. I had to add this option to make configure work:
    ./configure --prefix=$HOME/Programme --disable-static --with-history --with-zlib=/somewhere/Jun/Programme/zlib-1.2.11 && make
2. The instructions say you have to run "make install" as the root user. You do not: with a prefix inside your home directory, it works as a normal user.
3. Set the PATH for your installed "libxml2".
4. Reinstall your R packages.
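Step 3 could look roughly like the following in ~/.bashrc. This is only a sketch, assuming libxml2 was installed with --prefix=$HOME/Programme as in the configure line above. `prepend_path` is a hypothetical helper, and setting LD_LIBRARY_PATH and PKG_CONFIG_PATH alongside PATH is an assumption about what the build and the loader need, not something the linked instructions prescribe.

```shell
# Assumed install prefix; should match the --prefix given to ./configure.
PREFIX="$HOME/Programme"

# prepend_path VAR DIR: put DIR at the front of the colon-separated list
# stored in VAR, unless DIR is already listed. Hypothetical helper.
prepend_path() {
    var=$1
    dir=$2
    eval "current=\${$var:-}"
    case ":$current:" in
        *":$dir:"*) ;;   # already present, leave the variable unchanged
        *) eval "$var=\"\$dir\${current:+:\$current}\"" ;;
    esac
    export "$var"
}

prepend_path PATH "$PREFIX/bin"                       # xml2-config, xmllint
prepend_path LD_LIBRARY_PATH "$PREFIX/lib"            # libxml2.so.2 at run time
prepend_path PKG_CONFIG_PATH "$PREFIX/lib/pkgconfig"  # libxml-2.0.pc
```

With these exported in a fresh shell, step 4 amounts to reinstalling "XML" from R (for example with install.packages("XML")) and then rerunning XML::parseURI("") to check whether the error is gone.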
This is largely based on the suggestions from Mike and Martin; for details, see their answers and my replies.