TCGAbiolinks Error in gene.location$ensembl_gene_id : $ operator is invalid for atomic vectors
tangming2005 ▴ 200
@tangming2005-6754
Last seen 11 weeks ago
United States

Hi,

My code:

library(TCGAbiolinks)

query_rna_LUSC.hg38 <- GDCquery(project = "TCGA-LUSC",
                                data.category = "Transcriptome Profiling",
                                data.type = "Gene Expression Quantification",
                                workflow.type = "HTSeq - Counts")

GDCdownload(query_rna_LUSC.hg38, method = "client")

LUSC_rna_data <- GDCprepare(query_rna_LUSC.hg38)

Output:

> LUSC_rna_data <- GDCprepare(query_rna_LUSC.hg38)
|================================================================================| 100%    1 MB       |  10%
|================================================================================| 100%    1 MB       |  52%
  |===================================================================================================| 100%
Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature11404
Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Error in gene.location$ensembl_gene_id : 
  $ operator is invalid for atomic vectors

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SummarizedExperiment_1.4.0 Biobase_2.34.0             GenomicRanges_1.26.1      
[4] GenomeInfoDb_1.10.1        IRanges_2.8.1              S4Vectors_0.12.1          
[7] BiocGenerics_0.20.0        TCGAbiolinks_2.2.8        

loaded via a namespace (and not attached):
  [1] circlize_0.3.9              fastmatch_1.0-4             aroma.light_3.4.0          
  [4] plyr_1.8.4                  igraph_1.0.1                ConsensusClusterPlus_1.38.0
  [7] lazyeval_0.2.0              splines_3.3.1               BiocParallel_1.8.0         
 [10] pathview_1.14.0             ggplot2_2.2.1               digest_0.6.10              
 [13] foreach_1.4.3               BiocInstaller_1.24.0        GOSemSim_2.0.0             
 [16] GO.db_3.4.0                 gdata_2.17.0                magrittr_1.5               
 [19] cluster_2.0.5               doParallel_1.0.10           limma_3.30.0               
 [22] ComplexHeatmap_1.13.1       Biostrings_2.42.1           readr_1.0.0                
 [25] annotate_1.52.0             matrixStats_0.51.0          R.utils_2.4.0              
 [28] colorspace_1.2-7            rvest_0.3.2                 ggrepel_0.6.5              
 [31] dplyr_0.5.0                 RCurl_1.95-4.8              jsonlite_1.1               
 [34] hexbin_1.27.1               graph_1.52.0                genefilter_1.56.0          
 [37] supraHex_1.12.0             survival_2.39-5             iterators_1.0.8            
 [40] ape_3.5                     survminer_0.2.2             gtable_0.2.0               
 [43] zlibbioc_1.20.0             XVector_0.14.0              GetoptLong_0.1.5           
 [46] kernlab_0.9-25              Rgraphviz_2.18.0            shape_1.4.2                
 [49] prabclus_2.2-6              DEoptimR_1.0-6              scales_0.4.1               
 [52] DOSE_3.0.4                  DESeq_1.26.0                mvtnorm_1.0-5              
 [55] DBI_0.5-1                   edgeR_3.16.0                ggthemes_3.2.0             
 [58] Rcpp_0.12.8                 xtable_1.8-2                matlab_1.0.2               
 [61] mclust_5.2                  preprocessCore_1.36.0       httr_1.2.1                 
 [64] fgsea_1.0.0                 gplots_3.0.1                RColorBrewer_1.1-2         
 [67] fpc_2.1-10                  modeltools_0.2-21           XML_3.98-1.5               
 [70] R.methodsS3_1.7.1           flexmix_2.3-13              nnet_7.3-12                
 [73] locfit_1.5-9.1              reshape2_1.4.2              AnnotationDbi_1.36.0       
 [76] munsell_0.4.3               tools_3.3.1                 downloader_0.4             
 [79] RSQLite_1.0.0               stringr_1.1.0               knitr_1.14                 
 [82] robustbase_0.92-6           caTools_1.17.1              KEGGREST_1.14.0            
 [85] dendextend_1.3.0            EDASeq_2.8.0                nlme_3.1-128               
 [88] c3net_1.1.1                 whisker_0.3-2               R.oo_1.20.0                
 [91] KEGGgraph_1.32.0            DO.db_2.9                   xml2_1.0.0                 
 [94] biomaRt_2.30.0              curl_2.1                    png_0.1-7                  
 [97] affyio_1.44.0               minet_3.32.0                tibble_1.2                 
[100] geneplotter_1.52.0          stringi_1.1.2               GenomicFeatures_1.26.0     
[103] lattice_0.20-34             trimcluster_0.1-2           Matrix_1.2-7.1             
[106] GlobalOptions_0.0.10        data.table_1.10.0           bitops_1.0-6               
[109] parmigene_1.0.2             dnet_1.0.9                  rtracklayer_1.34.1         
[112] qvalue_2.6.0                R6_2.2.0                    latticeExtra_0.6-28        
[115] affy_1.52.0                 hwriter_1.3.2               ShortRead_1.32.0           
[118] KernSmooth_2.23-15          gridExtra_2.2.1             codetools_0.2-15           
[121] MASS_7.3-45                 gtools_3.5.0                assertthat_0.1             
[124] rjson_0.2.15                GenomicAlignments_1.10.0    Rsamtools_1.26.1           
[127] diptest_0.75-7              clusterProfiler_3.2.4       grid_3.3.1                 
[130] tidyr_0.6.0                 class_7.3-14 
ADD COMMENT

TCGAbiolinks uses biomaRt to download the latest version of the genome information.

I get the same error with the following code:

biomaRt::listMarts(host="www.ensembl.org")
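
For readers wondering why a BioMart outage would surface as a `$` error: one plausible reading, not confirmed anywhere in this thread, is that the annotation lookup returned something other than a data frame and the later `gene.location$ensembl_gene_id` access failed on it. The base-R sketch below reproduces the message with a stand-in value. `has_gene_ids` is a hypothetical helper for illustration and is not part of TCGAbiolinks or biomaRt.

```r
# Stand-in for a failed lookup: a plain character vector instead of a data frame.
gene.location <- "Down for maintenance"

# `$` is not defined for atomic vectors, so this raises the error from the post.
msg <- tryCatch(gene.location$ensembl_gene_id,
                error = function(e) conditionMessage(e))
print(msg)

# Hypothetical guard: confirm the shape of the result before using `$` on it.
has_gene_ids <- function(x) {
  is.data.frame(x) && "ensembl_gene_id" %in% names(x)
}

ok  <- has_gene_ids(data.frame(ensembl_gene_id = "ENSG00000141510"))
bad <- has_gene_ids(gene.location)
```

A check like this would turn the cryptic `$` failure into an explicit "annotation lookup did not return a table" condition that can be reported or retried.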
ADD REPLY

I just hit the same error when using GDCprepare, which was working fine two weeks ago.

ADD REPLY
Mike Smith ★ 6.6k
@mike-smith
Last seen 9 hours ago
EMBL Heidelberg

The main Ensembl web service was down for maintenance yesterday, and it looks like the BioMart interface is not back up and running at the moment.  I currently get a 500 error when I browse to http://www.ensembl.org/biomart/martservice

I updated the development version of biomaRt (> v. 2.31.8) to provide a more helpful error message when listMarts() doesn't find what it expects, e.g.

> listMarts(host="www.ensembl.org")
Error in listMarts(host = "www.ensembl.org") : 
  Unexpected format to the list of available marts.
Please check the following URL manually, and try ?listMarts for advice.
http://www.ensembl.org:80/biomart/martservice?type=registry&requestid=biomaRt

It looks like the Ensembl mirrors are still working, so maybe TCGAbiolinks can be supplied an argument to use one of those for the moment?
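
The idea of falling back to a mirror could be sketched as below. This is only an illustration under stated assumptions: `first_working_host` and `fake_connect` are hypothetical names, and the connector is passed in as a function so the sketch runs without network access.

```r
# Try each host in turn and return the first connection that succeeds.
# `connect` is any function that takes a host name and either returns a
# connection object or signals an error.
first_working_host <- function(hosts, connect) {
  for (host in hosts) {
    result <- tryCatch(connect(host), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(host = host, mart = result))
    }
  }
  stop("None of the hosts responded: ", paste(hosts, collapse = ", "))
}

# Offline stand-in connector that fails for the main site only.
fake_connect <- function(host) {
  if (host == "www.ensembl.org") stop("500 Internal Server Error")
  paste("connected to", host)
}

res <- first_working_host(c("www.ensembl.org", "uswest.ensembl.org"),
                          fake_connect)
print(res$host)
```

With biomaRt installed, the connector could be something like `function(h) biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = h)`, with the caveat that the Ensembl site may redirect the request to a different mirror than the one named.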

ADD COMMENT

Thank you very much for doing that. @tiagochst, how about updating `TCGAbiolinks` to use mirrors?

ADD REPLY

I committed the fix yesterday:

https://github.com/BioinformaticsFMRP/TCGAbiolinks/commit/39265220f7cfeb677c3133f13214ffc4abf6df91

It was working with the latest development version of biomaRt (2.31.7). You can install it with

devtools::install_github("Bioconductor-mirror/biomaRt")

devtools::install_github("BioinformaticsFMRP/TCGAbiolinks")
ADD REPLY

Thanks for adding the patch to try the mirror site.  Unfortunately, I think it still won't work at the moment, because Ensembl will redirect you to your nearest mirror regardless of which host you specify (an issue we discovered in the thread "biomaRt does not return entrezgene id").

I have added an extra argument to useMart() that overrides this behaviour, so you need to include that as well.  It's available in biomaRt ver. > 2.31.6

ensembl = useMart("ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   host = "uswest.ensembl.org",
                   ensemblRedirect = FALSE)

Alternatively you can use the useEnsembl() function, where I made specifying the mirror argument turn the redirect off too.

ensembl2 = useEnsembl("ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   mirror = "uswest")
ADD REPLY

 

Version 2.31.9 gave me this:

> useEnsembl("ensembl", 
+            dataset = "hsapiens_gene_ensembl", 
+            mirror = "uswest")
Error in listMarts(host = host, path = path, port = port, includeHosts = TRUE,  : 
  Unexpected format to the list of available marts.
Please check the following URL manually, and try ?listMarts for advice.
http://www.ensembl.org.ensembl.org:80/biomart/martservice?type=registry&requestid=biomaRt

That last link is also unavailable.

The first function worked with either ensemblRedirect = FALSE or ensemblRedirect = TRUE, and the entrezgene is correct in both outputs.

ADD REPLY

I caught that error myself when I was writing the message.  Version 2.31.10 in the SVN should have patched it already.


The entrezgene results should be the same now, as the mirrors are back in sync. Their being out of sync for a few days is what allowed me to find this bug in biomaRt, since until then I had been unable to reproduce the erroneous results that users in America were seeing.

Ensembl are clearly still having some issues, so it's a little hard to demonstrate using the main site here in Europe, but if I run biomaRt on a server in Boston, USA I can see that it is redirected to the useast mirror no matter what ensembl host I specify.

> ensembl_uswest = useMart( "ensembl", host = "uswest.ensembl.org" )
> ensembl_asia = useMart( "ensembl", host = "asia.ensembl.org" )
> biomaRt:::martHost(ensembl_uswest)
[1] "http://useast.ensembl.org:80/biomart/martservice"
> biomaRt:::martHost(ensembl_asia)
[1] "http://useast.ensembl.org:80/biomart/martservice"

This doesn't happen if you use the ensemblRedirect = FALSE option:

> ensembl_asia = useMart( "ensembl", host = "asia.ensembl.org", 
                           ensemblRedirect = FALSE )
> biomaRt:::martHost(ensembl_asia)
[1] "http://asia.ensembl.org:80/biomart/martservice?redirect=no"
ADD REPLY

Did you propagate this change to the release version, or will it only be in devel?

ADD REPLY

The next release of Bioconductor is scheduled for two weeks' time (April 25th).  Changes to the existing release versions of the packages are now closed, and on that date the current devel version will become the new release (which should be biomaRt v2.32.0).  All the changes I've made will be present in that version, but for now you have to use the devel version to get access to them.

ADD REPLY

Thanks. Another error has shown up:

 

> LUAD_rna_data <- GDCprepare(query_rna_LUAD.hg38)
|================================================================================| 100%    1 MB                 |  18%
|================================================================================| 100%    1 MB                 |  23%
|================================================================================| 100%    1 MB                 |  33%
|================================================================================| 100%    1 MB                 |  37%
|================================================================================| 100%    1 MB 
|================================================================================| 100%    1 MB                 |  49%
|================================================================================| 100%    1 MB                 |  52%
|================================================================================| 100%    1 MB                 |  53%
  |=============================================================================================================| 100%
Starting to add information to samples
=> Add clinical information to samples
=> Adding subtype information to samples
Subtype information from:doi:10.1038/nature13385
Entity 'copy' not defined
Error: 1: Entity 'copy' not defined

ADD REPLY

Ensembl BioMart is still very unstable.  This is the same error people were getting yesterday: your query gets redirected to a 'Down for maintenance' page, and biomaRt doesn't handle that very helpfully.  You should get a more informative error message with the devel version of biomaRt (see the thread "biomaRt error when connecting to host").

ADD REPLY

Now it seems to pick a genome annotation at random.

This is a human data set, but it is using elephant genes:

 

> LUAD_rna_data <- GDCprepare(query_rna_LUAD.hg38)
|================================================================================| 100%    1 MB                 |  33%
|================================================================================| 100%    1 MB                 |  54%
  |=============================================================================================================| 100%
Starting to add information to samples
=> Add clinical information to samples
=> Adding subtype information to samples
Subtype information from:doi:10.1038/nature13385
Downloading genome information. Using: Elephant genes (Loxafr3.0)
From the 60488 genes we couldn't map 3453
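
One way a reader might sanity-check which species an annotation table really came from is to look at the gene ID prefixes, since human Ensembl gene IDs start with `ENSG` followed by 11 digits while other species carry extra letters between `ENS` and `G`. The sketch below is base R only. `fraction_human_ids` is a hypothetical helper, and the non-human IDs are made up for illustration.

```r
# Fraction of IDs that look like human Ensembl gene IDs: "ENSG" plus 11 digits,
# optionally followed by a ".version" suffix.
fraction_human_ids <- function(ids) {
  mean(grepl("^ENSG[0-9]{11}(\\.[0-9]+)?$", ids))
}

human_ids    <- c("ENSG00000141510", "ENSG00000012048.20")
# Made-up IDs carrying a species prefix, as non-human Ensembl gene IDs do.
nonhuman_ids <- c("ENSLAFG00000000001", "ENSFALG00000000001")

human_fraction    <- fraction_human_ids(human_ids)
nonhuman_fraction <- fraction_human_ids(nonhuman_ids)
print(c(human = human_fraction, nonhuman = nonhuman_fraction))
```

Applied to the `ensembl_gene_id` column of whatever annotation table was retrieved, a fraction close to 1 would suggest that the table is human despite the species named in the log message, while a fraction close to 0 would suggest that the wrong dataset really was used.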

ADD REPLY

Could you please check that your TCGAbiolinks version is 2.3.22 and your biomaRt version is 2.31.10? The output you posted is not from the latest version.
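
A version check of this kind could be sketched as follows. `meets_minimum` is a hypothetical helper; `package_version()` and `packageVersion()` are base R and utils functions. The point of the helper is that versions are compared component by component, which plain string comparison gets wrong for 2.31.9 versus 2.31.10.

```r
# Compare versions component by component rather than as strings.
meets_minimum <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

old_is_enough  <- meets_minimum("2.31.9", "2.31.10")   # 9 < 10 in the last component
string_compare <- "2.31.9" >= "2.31.10"                # character comparison, misleading here
print(c(versions = old_is_enough, strings = string_compare))

# With the package installed, the installed version could be checked like this.
if (requireNamespace("biomaRt", quietly = TRUE)) {
  print(meets_minimum(as.character(utils::packageVersion("biomaRt")), "2.31.10"))
}
```

The same call with `"TCGAbiolinks"` and `"2.3.22"` would cover the other version mentioned here.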

ADD REPLY

Installing from GitHub gave me this error. Thanks for looking into it.

> devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks")
Downloading GitHub repo BioinformaticsFMRP/TCGAbiolinks@master
from URL https://api.github.com/repos/BioinformaticsFMRP/TCGAbiolinks/zipball/master
Installing TCGAbiolinks
trying URL 'https://cran.rstudio.com/src/contrib/survminer_0.3.1.tar.gz'
Content type 'application/x-gzip' length 3452247 bytes (3.3 MB)
==================================================
downloaded 3.3 MB

Installing survminer
trying URL 'https://cran.rstudio.com/src/contrib/broom_0.4.2.tar.gz'
Content type 'application/x-gzip' length 1388242 bytes (1.3 MB)
==================================================
downloaded 1.3 MB

Installing broom
trying URL 'https://cran.rstudio.com/src/contrib/psych_1.7.3.21.tar.gz'
downloaded 0 bytes

Error in download.file(url, destfile, method, mode = "wb", ...) :
  cannot download all files
In addition: Warning message:
In download.file(url, destfile, method, mode = "wb", ...) :
  URL 'https://cran.rstudio.com/src/contrib/psych_1.7.3.21.tar.gz': status was 'SSL connect error'
Warning in download.packages(x$name, destdir = dest_dir, repos = x$repos,  :
  download of package 'psych' failed
Error in download.packages(x$name, destdir = dest_dir, repos = x$repos,  :
  subscript out of bounds

ADD REPLY

And does install.packages("psych") work?

ADD REPLY

Yes, the install worked once I installed psych first.

ADD REPLY

Using 2.3.22 gave me this:

 

> LUSC_rna_data <- GDCprepare(query_rna_LUSC.hg38)
|========================================================================================================| 100%    1 MB
|========================================================================================================| 100%    1 MB
|========================================================================================================| 100%    1 MB
  |=============================================================================================================| 100%
Starting to add information to samples
=> Add clinical information to samples
=> Adding subtype information to samples
Subtype information from:doi:10.1038/nature11404
Downloading genome information (try:0) Using: Flycatcher genes (FicAlb_1.4)
From the 60488 genes we couldn't map 3453

 

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8   
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                
[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.3.22    ensembldb_1.6.2        GenomicFeatures_1.26.2 AnnotationDbi_1.36.2   Biobase_2.34.0       
[6] GenomicRanges_1.26.4   GenomeInfoDb_1.10.3    IRanges_2.8.2          S4Vectors_0.12.2       BiocGenerics_0.20.0   

    

ADD REPLY

I also got this bug when I used a biomaRt version < 2.31.10. Version 2.31.10 seems to be working correctly. Could you check your biomaRt version, please?

ADD REPLY
genomics • 0
@genomics-10970
Last seen 7.7 years ago

This error is related to biomaRt and has been unresolved for many months now. I'm not sure whether this is a chronic bug, but biomaRt often fails just when it is needed most. I have been experiencing this for the past two days. :( Here is what I get:

> listMarts(host="www.ensembl.org")
Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Error: 1: Space required after the Public Identifier
2: SystemLiteral " or ' expected
3: SYSTEM or PUBLIC, the URI is missing

Please post an update if there is any workaround or fix for this situation!
-- Venkatesh Chellappa
ADD COMMENT
