Question: TCGAbiolinks Error in gene.location$ensembl_gene_id : $ operator is invalid for atomic vectors
16 months ago by
tangming200590
United States
tangming200590 wrote:

Hi,

My code:

library(TCGAbiolinks)

query_rna_LUSC.hg38 <- GDCquery(project = "TCGA-LUSC",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

GDCdownload(query_rna_LUSC.hg38, method = "client")

LUSC_rna_data <- GDCprepare(query_rna_LUSC.hg38)

|================================================================================| 100%    1 MB       |  10%
|================================================================================| 100%    1 MB       |  52%
  |===================================================================================================| 100%
Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature11404
Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Error in gene.location$ensembl_gene_id : 
  $ operator is invalid for atomic vectors
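
For context, this message is base R's complaint when `$` is applied to something that is not a list or data frame. A minimal sketch, assuming the annotation lookup handed back a plain character vector instead of a data frame; the variable name mirrors the error message and the contents are made up for illustration:

```r
# Hypothetical stand-in for a failed lookup: a character vector, not a data frame
gene.location <- c("ENSG00000000003", "ENSG00000000005")

# Capture the error message instead of stopping the script
res <- tryCatch(
  gene.location$ensembl_gene_id,
  error = function(e) conditionMessage(e)
)
print(res)

# A defensive check that could be run before using `$`
if (!is.data.frame(gene.location)) {
  message("Annotation lookup did not return a data frame; check the biomaRt connection")
}
```

So the error itself says little about TCGAbiolinks; it only shows that whatever was stored in `gene.location` was not the expected table.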

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SummarizedExperiment_1.4.0 Biobase_2.34.0             GenomicRanges_1.26.1      
[4] GenomeInfoDb_1.10.1        IRanges_2.8.1              S4Vectors_0.12.1          
[7] BiocGenerics_0.20.0        TCGAbiolinks_2.2.8        

loaded via a namespace (and not attached):
  [1] circlize_0.3.9              fastmatch_1.0-4             aroma.light_3.4.0          
  [4] plyr_1.8.4                  igraph_1.0.1                ConsensusClusterPlus_1.38.0
  [7] lazyeval_0.2.0              splines_3.3.1               BiocParallel_1.8.0         
 [10] pathview_1.14.0             ggplot2_2.2.1               digest_0.6.10              
 [13] foreach_1.4.3               BiocInstaller_1.24.0        GOSemSim_2.0.0             
 [16] GO.db_3.4.0                 gdata_2.17.0                magrittr_1.5               
 [19] cluster_2.0.5               doParallel_1.0.10           limma_3.30.0               
 [22] ComplexHeatmap_1.13.1       Biostrings_2.42.1           readr_1.0.0                
 [25] annotate_1.52.0             matrixStats_0.51.0          R.utils_2.4.0              
 [28] colorspace_1.2-7            rvest_0.3.2                 ggrepel_0.6.5              
 [31] dplyr_0.5.0                 RCurl_1.95-4.8              jsonlite_1.1               
 [34] hexbin_1.27.1               graph_1.52.0                genefilter_1.56.0          
 [37] supraHex_1.12.0             survival_2.39-5             iterators_1.0.8            
 [40] ape_3.5                     survminer_0.2.2             gtable_0.2.0               
 [43] zlibbioc_1.20.0             XVector_0.14.0              GetoptLong_0.1.5           
 [46] kernlab_0.9-25              Rgraphviz_2.18.0            shape_1.4.2                
 [49] prabclus_2.2-6              DEoptimR_1.0-6              scales_0.4.1               
 [52] DOSE_3.0.4                  DESeq_1.26.0                mvtnorm_1.0-5              
 [55] DBI_0.5-1                   edgeR_3.16.0                ggthemes_3.2.0             
 [58] Rcpp_0.12.8                 xtable_1.8-2                matlab_1.0.2               
 [61] mclust_5.2                  preprocessCore_1.36.0       httr_1.2.1                 
 [64] fgsea_1.0.0                 gplots_3.0.1                RColorBrewer_1.1-2         
 [67] fpc_2.1-10                  modeltools_0.2-21           XML_3.98-1.5               
 [70] R.methodsS3_1.7.1           flexmix_2.3-13              nnet_7.3-12                
 [73] locfit_1.5-9.1              reshape2_1.4.2              AnnotationDbi_1.36.0       
 [76] munsell_0.4.3               tools_3.3.1                 downloader_0.4             
 [79] RSQLite_1.0.0               stringr_1.1.0               knitr_1.14                 
 [82] robustbase_0.92-6           caTools_1.17.1              KEGGREST_1.14.0            
 [85] dendextend_1.3.0            EDASeq_2.8.0                nlme_3.1-128               
 [88] c3net_1.1.1                 whisker_0.3-2               R.oo_1.20.0                
 [91] KEGGgraph_1.32.0            DO.db_2.9                   xml2_1.0.0                 
 [94] biomaRt_2.30.0              curl_2.1                    png_0.1-7                  
 [97] affyio_1.44.0               minet_3.32.0                tibble_1.2                 
[100] geneplotter_1.52.0          stringi_1.1.2               GenomicFeatures_1.26.0     
[103] lattice_0.20-34             trimcluster_0.1-2           Matrix_1.2-7.1             
[106] GlobalOptions_0.0.10        data.table_1.10.0           bitops_1.0-6               
[109] parmigene_1.0.2             dnet_1.0.9                  rtracklayer_1.34.1         
[112] qvalue_2.6.0                R6_2.2.0                    latticeExtra_0.6-28        
[115] affy_1.52.0                 hwriter_1.3.2               ShortRead_1.32.0           
[118] KernSmooth_2.23-15          gridExtra_2.2.1             codetools_0.2-15           
[121] MASS_7.3-45                 gtools_3.5.0                assertthat_0.1             
[124] rjson_0.2.15                GenomicAlignments_1.10.0    Rsamtools_1.26.1           
[127] diptest_0.75-7              clusterProfiler_3.2.4       grid_3.3.1                 
[130] tidyr_0.6.0                 class_7.3-14 
ADD COMMENTlink modified 16 months ago by Mike Smith2.8k • written 16 months ago by tangming200590

We use biomaRt to download the latest version of the genome annotation.

I have the same error using the following code: 

biomaRt::listMarts(host="www.ensembl.org")
ADD REPLYlink written 16 months ago by tiagochst130

I just hit the same error with GDCprepare, which was working fine two weeks ago.

ADD REPLYlink written 16 months ago by ycchiu8210
16 months ago by
Mike Smith2.8k
EMBL Heidelberg / de.NBI
Mike Smith2.8k wrote:

The main Ensembl web service was down for maintenance yesterday, and it looks like the BioMart interface is not back up and running at the moment.  I currently get a 500 error when I browse to http://www.ensembl.org/biomart/martservice

I updated the development version of biomaRt (> v2.31.8) to provide a more helpful error message when listMarts() doesn't find what it expects, e.g.

> listMarts(host="www.ensembl.org")
Error in listMarts(host = "www.ensembl.org") : 
  Unexpected format to the list of available marts.
Please check the following URL manually, and try ?listMarts for advice.
http://www.ensembl.org:80/biomart/martservice?type=registry&requestid=biomaRt
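
The URL in that message follows a fixed pattern, so the same manual check can be scripted for any host. A minimal sketch, assuming the pattern shown in the message (host, port 80, `type=registry`); `mart_registry_url()` is a hypothetical helper, not part of biomaRt, and the optional status check assumes the httr package is installed:

```r
# Build the registry URL that listMarts() reports in its error message
mart_registry_url <- function(host, port = 80) {
  sprintf("http://%s:%d/biomart/martservice?type=registry&requestid=biomaRt",
          host, port)
}

url <- mart_registry_url("www.ensembl.org")
print(url)

# Optional live check (needs network access and httr);
# a 500 status here would match the outage described in this answer
if (interactive() && requireNamespace("httr", quietly = TRUE)) {
  print(httr::status_code(httr::GET(url)))
}
```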

It looks like the Ensembl mirrors are still working, so maybe TCGAbiolinks can be given an argument to use one of those for the moment?

ADD COMMENTlink written 16 months ago by Mike Smith2.8k

Thank you very much for doing it. @tiagochst, how about updating `TCGAbiolinks` to use the mirrors?

 

ADD REPLYlink written 16 months ago by tangming200590

I committed the fix yesterday.

https://github.com/BioinformaticsFMRP/TCGAbiolinks/commit/39265220f7cfeb677c3133f13214ffc4abf6df91

 

It was working with the latest development version of biomaRt (2.31.7). You can install both with

devtools::install_github("Bioconductor-mirror/biomaRt")

devtools::install_github("BioinformaticsFMRP/TCGAbiolinks")
ADD REPLYlink modified 16 months ago • written 16 months ago by tiagochst130

Thanks for adding the patch to try the mirror site.  Unfortunately, I think it still won't work at the moment, because Ensembl redirects you to your nearest mirror regardless of which host you specify (an issue we discovered in the thread "biomaRt does not return entrezgene id").

I have added an extra argument to useMart() that overrides this behaviour, and so you need to include that as well.  It's available in biomaRt ver. > 2.31.6

ensembl = useMart("ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   host = "uswest.ensembl.org",
                   ensemblRedirect = FALSE)

Alternatively you can use the useEnsembl() function, where I made specifying the mirror argument turn the redirect off too.

ensembl2 = useEnsembl("ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   mirror = "uswest")
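
If one mirror is down, the same call can be tried against several hosts in turn. A minimal sketch, assuming the `useMart()` arguments shown above are available in the installed biomaRt (the `ensemblRedirect` argument may differ in other versions); `try_hosts()` and `connect_ensembl()` are hypothetical helpers written for this example:

```r
# Try each host in turn and return the first connection that does not error
try_hosts <- function(hosts, connect) {
  for (h in hosts) {
    res <- tryCatch(connect(h), error = function(e) NULL)
    if (!is.null(res)) {
      return(list(host = h, mart = res))
    }
  }
  stop("No host responded: ", paste(hosts, collapse = ", "))
}

# Connector that reuses the useMart() call from this answer
connect_ensembl <- function(host) {
  biomaRt::useMart("ensembl",
                   dataset = "hsapiens_gene_ensembl",
                   host = host,
                   ensemblRedirect = FALSE)
}

# Usage (needs network access and biomaRt):
# hit <- try_hosts(c("uswest.ensembl.org", "useast.ensembl.org",
#                    "asia.ensembl.org"), connect_ensembl)
# hit$host
```

Passing the connector as a function keeps the retry logic separate from biomaRt, so it can be exercised with a dummy connector while the service is unavailable.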
ADD REPLYlink written 16 months ago by Mike Smith2.8k

 

Version 2.31.9 gave me this:

> useEnsembl("ensembl", 
+            dataset = "hsapiens_gene_ensembl", 
+            mirror = "uswest")
Error in listMarts(host = host, path = path, port = port, includeHosts = TRUE,  : 
  Unexpected format to the list of available marts.
Please check the following URL manually, and try ?listMarts for advice.
http://www.ensembl.org.ensembl.org:80/biomart/martservice?type=registry&requestid=biomaRt

Also this last link is unavailable.

The first function (useMart) worked with both ensemblRedirect = FALSE and ensemblRedirect = TRUE, and the entrezgene is correct in both outputs.

ADD REPLYlink written 16 months ago by tiagochst130

I caught that error myself when I was writing the message.  Version 2.31.10 in the SVN should have patched it already.


The entrezgene results should be the same now, as the mirrors are back in sync, but them being out of sync for a few days allowed me to find this bug in biomaRt, as I was unable to reproduce the same erroneous results as users in America were seeing.

Ensembl are clearly still having some issues, so it's a little hard to demonstrate using the main site here in Europe, but if I run biomaRt on a server in Boston, USA I can see that it is redirected to the useast mirror no matter what ensembl host I specify.

> ensembl_uswest = useMart( "ensembl", host = "uswest.ensembl.org" )
> ensembl_asia = useMart( "ensembl", host = "asia.ensembl.org" )
> biomaRt:::martHost(ensembl_uswest)
[1] "http://useast.ensembl.org:80/biomart/martservice"
> biomaRt:::martHost(ensembl_asia)
[1] "http://useast.ensembl.org:80/biomart/martservice"

This doesn't happen if you use the ensemblRedirect = FALSE option:

> ensembl_asia = useMart( "ensembl", host = "asia.ensembl.org", 
                           ensemblRedirect = FALSE )
> biomaRt:::martHost(ensembl_asia)
[1] "http://asia.ensembl.org:80/biomart/martservice?redirect=no"
ADD REPLYlink modified 16 months ago • written 16 months ago by Mike Smith2.8k

Will this change be propagated to the release version, or will it only be in devel?

ADD REPLYlink written 16 months ago by tiagochst130

The next release of Bioconductor is scheduled for two weeks' time (April 25th).  Changes to the existing release versions of the packages are now closed, and on that date the current devel version will become the new release (which should be biomaRt v2.32.0).  All the changes I've made will be present in that version, but for now you have to use the devel version to get access to them.

ADD REPLYlink written 16 months ago by Mike Smith2.8k

Thanks,

another error has shown up.

 

> LUAD_rna_data <- GDCprepare(query_rna_LUAD.hg38)
|================================================================================| 100%    1 MB                 |  18%
|================================================================================| 100%    1 MB                 |  23%
|================================================================================| 100%    1 MB                 |  33%
|================================================================================| 100%    1 MB                 |  37%
|================================================================================| 100%    1 MB 
|================================================================================| 100%    1 MB                 |  49%
|================================================================================| 100%    1 MB                 |  52%
|================================================================================| 100%    1 MB                 |  53%
  |=============================================================================================================| 100%
Starting to add information to samples
=> Add clinical information to samples
=> Adding subtype information to samples
Subtype information from:doi:10.1038/nature13385
Entity 'copy' not defined
Error: 1: Entity 'copy' not defined

ADD REPLYlink written 16 months ago by tangming200590

Ensembl BioMart is still very unstable.  This is the same error people were getting yesterday: your query gets redirected to a 'Down for maintenance' page, and biomaRt didn't handle that very helpfully.  You should get a more informative error message with the devel version of biomaRt (see the thread "biomaRt error when connecting to host").
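
One way to picture this: the reply is expected to be a BioMart registry in XML, but an HTML maintenance page arrives instead and the XML parser trips over it. A minimal sketch of a pre-check, assuming a healthy registry reply begins with a `<MartRegistry>` element; both reply strings are made-up examples and `looks_like_registry()` is a hypothetical helper, not a biomaRt function:

```r
# TRUE when the text starts like a BioMart registry document
looks_like_registry <- function(txt) {
  grepl("^\\s*<MartRegistry>", txt)
}

# Made-up example replies, for illustration only
registry_reply <- paste0(
  "<MartRegistry>\n",
  "  <MartURLLocation name=\"ENSEMBL_MART_ENSEMBL\" />\n",
  "</MartRegistry>"
)
maintenance_reply <- paste0(
  "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\">\n",
  "<html><body>&copy; Down for maintenance</body></html>"
)

print(looks_like_registry(registry_reply))
print(looks_like_registry(maintenance_reply))
```

An HTML entity such as `&copy;` is not defined in plain XML, which would be consistent with the "Entity 'copy' not defined" message, although that link is my reading rather than something confirmed here.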

ADD REPLYlink modified 16 months ago • written 16 months ago by Mike Smith2.8k

Now it is picking genome annotations at random...

It is a human data set, but it is using elephant genes...

 

> LUAD_rna_data <- GDCprepare(query_rna_LUAD.hg38)
|================================================================================| 100%    1 MB                 |  33%
|================================================================================| 100%    1 MB                 |  54%
  |=============================================================================================================| 100%
Starting to add information to samples
=> Add clinical information to samples
=> Adding subtype information to samples
Subtype information from:doi:10.1038/nature13385
Downloading genome information. Using: Elephant genes (Loxafr3.0)
From the 60488 genes we couldn't map 3453
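
A quick sanity check for this situation is to look at the gene identifiers themselves. A minimal sketch, assuming human Ensembl gene IDs have the form `ENSG` plus 11 digits with an optional version suffix; `looks_human_gene_id()` is a hypothetical helper, and the version suffix and the non-human ID below are made up for illustration:

```r
# TRUE for IDs shaped like human Ensembl gene IDs (optionally versioned)
looks_human_gene_id <- function(ids) {
  grepl("^ENSG[0-9]{11}(\\.[0-9]+)?$", ids)
}

ids <- c("ENSG00000141510",      # human TP53
         "ENSG00000141510.15",   # same gene with a made-up version suffix
         "ENSLAFG00000000001")   # made-up ID with a non-human species prefix
print(looks_human_gene_id(ids))
```

Applied to the row names of the prepared object, for example `mean(looks_human_gene_id(rownames(LUAD_rna_data)))`, this would give the fraction of rows that look human. It only checks the ID format, not which annotation was used for the mapping.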

ADD REPLYlink written 16 months ago by tangming200590

Could you please check that your TCGAbiolinks version is 2.3.22 and your biomaRt version is 2.31.10? The output you have is not from the latest version.
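
The version check can be done from the R console without loading the packages. A minimal sketch using `utils::packageVersion()`; `has_min_version()` is a hypothetical helper written for this example:

```r
# TRUE when `pkg` is installed at `min_version` or newer
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Versions compare component by component, so 2.31.9 sorts before 2.31.10
print(package_version("2.31.9") < package_version("2.31.10"))

# Usage with the versions mentioned in this thread:
# has_min_version("TCGAbiolinks", "2.3.22")
# has_min_version("biomaRt", "2.31.10")
```

Comparing version objects rather than strings matters here, because as plain text "2.31.9" would sort after "2.31.10".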

ADD REPLYlink written 16 months ago by tiagochst130

Installing from GitHub gave me this error. Thanks for looking into it.

> devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks")
Downloading GitHub repo BioinformaticsFMRP/TCGAbiolinks@master
from URL https://api.github.com/repos/BioinformaticsFMRP/TCGAbiolinks/zipball/master
Installing TCGAbiolinks
trying URL 'https://cran.rstudio.com/src/contrib/survminer_0.3.1.tar.gz'
Content type 'application/x-gzip' length 3452247 bytes (3.3 MB)
==================================================
downloaded 3.3 MB

Installing survminer
trying URL 'https://cran.rstudio.com/src/contrib/broom_0.4.2.tar.gz'
Content type 'application/x-gzip' length 1388242 bytes (1.3 MB)
==================================================
downloaded 1.3 MB

Installing broom
trying URL 'https://cran.rstudio.com/src/contrib/psych_1.7.3.21.tar.gz'
downloaded 0 bytes

Error in download.file(url, destfile, method, mode = "wb", ...) :
  cannot download all files
In addition: Warning message:
In download.file(url, destfile, method, mode = "wb", ...) :
  URL 'https://cran.rstudio.com/src/contrib/psych_1.7.3.21.tar.gz': status was 'SSL connect error'
Warning in download.packages(x$name, destdir = dest_dir, repos = x$repos,  :
  download of package 'psych' failed
Error in download.packages(x$name, destdir = dest_dir, repos = x$repos,  :
  subscript out of bounds

ADD REPLYlink written 16 months ago by tangming200590

And does install.packages("psych") work?

ADD REPLYlink written 16 months ago by tiagochst130

Yes, the install worked once I installed psych first.
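
For anyone hitting the same failed dependency download, the workaround can be scripted. A minimal sketch, assuming the only problem is a dependency that failed to download; `missing_pkgs()` is a hypothetical helper, and the install calls are left commented out because they need network access:

```r
# Return the subset of `pkgs` that is not currently installed
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

print(missing_pkgs(c("stats", "notARealPackageXYZ")))

# Usage: install whatever is missing, then retry the GitHub install
# to_install <- missing_pkgs("psych")
# if (length(to_install) > 0) install.packages(to_install)
# devtools::install_github("BioinformaticsFMRP/TCGAbiolinks")
```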

ADD REPLYlink written 16 months ago by tangming200590

Using 2.3.22 gave me this:

 

> LUSC_rna_data <- GDCprepare(query_rna_LUSC.hg38)
|========================================================================================================| 100%    1 MB
|========================================================================================================| 100%    1 MB
|========================================================================================================| 100%    1 MB
  |=============================================================================================================| 100%
Starting to add information to samples
=> Add clinical information to samples
=> Adding subtype information to samples
Subtype information from:doi:10.1038/nature11404
Downloading genome information (try:0) Using: Flycatcher genes (FicAlb_1.4)
From the 60488 genes we couldn't map 3453

 

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8   
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                
[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.3.22    ensembldb_1.6.2        GenomicFeatures_1.26.2 AnnotationDbi_1.36.2   Biobase_2.34.0       
[6] GenomicRanges_1.26.4   GenomeInfoDb_1.10.3    IRanges_2.8.2          S4Vectors_0.12.2       BiocGenerics_0.20.0   

    

ADD REPLYlink written 16 months ago by tangming200590

I also got this bug when I used a biomaRt version < 2.31.10. Version 2.31.10 seems to be working correctly. Could you check your biomaRt version, please?

ADD REPLYlink written 16 months ago by tiagochst130
16 months ago by
genomics0
genomics0 wrote:

This error is related to biomaRt and has been unresolved for many months now. Not sure if this is a chronic bug! But biomaRt keeps bailing out, often at times of utmost necessity. I have been experiencing this for the past two days. :( Here is what I get!

> listMarts(host="www.ensembl.org")
Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Error: 1: Space required after the Public Identifier
2: SystemLiteral " or ' expected
3: SYSTEM or PUBLIC, the URI is missing

Please update if we have any workaround or fix for this situation!
-- Venkatesh Chellappa
ADD COMMENTlink modified 16 months ago • written 16 months ago by genomics0