TCGAbiolinks: GDCprepare fails to connect to the BioMart web server
@sabarinathchandrasekharan-11149

Hi,

I am trying to work with TCGAbiolinks, but I am having a problem with GDCprepare: it fails to connect to the BioMart web service.

> queryGBM <- GDCquery(project = "TCGA-GBM",
+                   data.category = "Gene expression",
+                   data.type = "Gene expression quantification",
+                   platform = "Illumina HiSeq", file.type  = "normalized_results",
+                   experimental.strategy = "RNA-Seq",
+                   barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"),
+                   legacy = TRUE)
Accessing GDC. This might take a while...
> GDCdownload(queryGBM)
All samples have been already downloded
> data <- GDCprepare(queryGBM)
  |============================================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Error in value[[3L]](cond) : 
  Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.

However, I am able to access the BioMart web service directly with biomaRt:

> entrez = c("673","837")
> goids = getBM(attributes=c('entrezgene','go_id'), filters='entrezgene', values=entrez, mart=ensembl)
> head(goids)
  entrezgene      go_id
1        673           
2        673 GO:0005737
3        673 GO:0005886
4        673 GO:0005634
5        673 GO:0005829
6        673 GO:0005509

What could be the issue?

Thanks and Regards,

Sabari

@tiago-chedraoui-silva-8877
Brazil - University of São Paulo/ Los A…

Hi,

Because you used legacy = TRUE, TCGAbiolinks accesses Ensembl 75 (hg19/GRCh37). One of these servers might have been temporarily down.

Ensembl 75 can be accessed with either of these calls (source: https://www.biostars.org/p/136775/):

grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

or 

ensembl_75 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="feb2014.archive.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

These are the sources TCGAbiolinks uses for Ensembl 75 (hg19/GRCh37):

https://github.com/BioinformaticsFMRP/TCGAbiolinks/blob/0fa5099c1c9d0d1bdab9365146e769af72c7c54e/R/TCGAPrepare.R#L502-L531
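A minimal sketch, assuming biomaRt is installed, of trying the two GRCh37 hosts listed above in turn and keeping the first one that responds. The helper names `first_working_mart` and `connect_grch37` are hypothetical (not part of TCGAbiolinks or biomaRt), and the connector simply reuses the useMart() arguments shown above. It only tells you whether a host is reachable from your machine; it does not change the connection GDCprepare makes internally.

```r
# Hypothetical helper: try each host in order and return the first successful connection.
# `connect` is any function that takes a host name and either returns a mart or throws an error.
first_working_mart <- function(hosts, connect) {
  for (host in hosts) {
    mart <- tryCatch(connect(host), error = function(e) {
      message("Host ", host, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(mart)) return(list(host = host, mart = mart))
  }
  NULL  # every host failed
}

# Hypothetical connector using the same arguments as the useMart() calls above
# (needs biomaRt and network access when it is actually called)
connect_grch37 <- function(host) {
  biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                   host = host,
                   path = "/biomart/martservice",
                   dataset = "hsapiens_gene_ensembl")
}

grch37_hosts <- c("grch37.ensembl.org", "feb2014.archive.ensembl.org")
# Usage: res <- first_working_mart(grch37_hosts, connect_grch37); res$host
```

If neither host responds while www.ensembl.org does, the cause is more likely on the network side than in TCGAbiolinks itself.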


Still no luck.

> grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
> ensembl_75 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="feb2014.archive.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
> GDCdownload(queryGBM)
All samples have been already downloded
> GDCprepare(queryGBM)
  |============================================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Error in value[[3L]](cond) : 
  Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.
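Creating `grch37` and `ensembl_75` in the workspace only tests the connection: those objects are not passed to GDCprepare, which (judging from the linked source) opens its own connection, so the same error is presumably expected. The prefix `Error in value[[3L]](cond)` indicates the message was raised inside a tryCatch() error handler, which hides the original error. A minimal sketch, assuming biomaRt is installed, for running an equivalent request yourself and seeing the original message; `underlying_error` is a hypothetical helper, not part of either package.

```r
# Hypothetical helper: evaluate a zero-argument function and return "OK" on success,
# or the original error message on failure (instead of a generic wrapper message).
underlying_error <- function(f) {
  tryCatch({
    f()
    "OK"
  }, error = function(e) conditionMessage(e))
}

# Usage (requires biomaRt and network access); arguments mirror the useMart() calls above:
# underlying_error(function() {
#   mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
#                            host = "grch37.ensembl.org",
#                            path = "/biomart/martservice",
#                            dataset = "hsapiens_gene_ensembl")
#   biomaRt::getBM(attributes = c("ensembl_gene_id", "chromosome_name"),
#                  filters = "chromosome_name", values = "21", mart = mart)
# })
```

A message mentioning a proxy, SSL, or a timeout would suggest looking at local network settings rather than at the package.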

@sabarinathchandrasekharan-11149

GDCprepare threw a different error when I tried a different data set:

> query <- GDCquery(project = "TCGA-LUSC", data.category = "Gene expression", data.type = "Gene Expression Quantification", platform = "Illumina HiSeq", file.type  = "normalized_results", experimental.strategy = "RNA-Seq", sample.type = c("Primary solid Tumor"),  barcode = c("TCGA-85-8481-01A-11R-2326-07"," TCGA-56-8626-01A-11R-2403-07 "), legacy = TRUE)
Accessing GDC. This might take a while...

> GDCdownload(query)
All samples have been already downloded

> data <- GDCprepare(query)
Error in names(frame)[names(frame) == "x"] <- name : 
  names() applied to a non-vector

> data <- GDCprepare(query)

> data
function (..., list = character(), package = NULL, lib.loc = NULL, 
    verbose = getOption("verbose"), envir = .GlobalEnv) 
{
    fileExt <- function(x) {
        db <- grepl("\\.[^.]+\\.(gz|bz2|xz)$", x)
        ans <- sub(".*\\.", "", x)
        ans[db] <- sub(".*\\.([^.]+\\.)(gz|bz2|xz)$", "\\1\\2", 
            x[db])
        ans
    }
........................
    [rest of the function body removed due to the post length limit]
........................
                  }
                  if (found) 
                    break
                }
                if (verbose) 
                  message(if (!found) 
                    "*NOT* ", "found", domain = NA)
            }
            if (found) 
                break
        }
        if (!found) 
            warning(gettextf("data set %s not found", sQuote(name)), 
                domain = NA)
    }
    invisible(names)
}
<bytecode: 0x0000000012c81af0>
<environment: namespace:utils>
> 

Is this a problem with accessing the API, or is there a problem with my query structure?

Thanks,

Sabari
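Two details in the LUSC transcript may be worth checking. First, the second barcode is written as " TCGA-56-8626-01A-11R-2403-07 ", with leading and trailing spaces, so it may not match any GDC record; this is a guess, not a confirmed cause of the names() error. Second, typing `data` printed the source of utils::data, which means the assignment `data <- GDCprepare(query)` did not take effect (presumably because GDCprepare failed), so R fell back to the base function of that name. A minimal sketch of both checks; `clean_barcodes`, `is_tcga_aliquot`, `prepare_or_null` and the regular expression are hypothetical, and the pattern only covers full 28-character aliquot barcodes like the ones used here.

```r
# Hypothetical helper: strip stray whitespace from barcodes before passing them to GDCquery()
clean_barcodes <- function(barcodes) trimws(barcodes)

# Assumed pattern for full aliquot barcodes such as TCGA-85-8481-01A-11R-2326-07
is_tcga_aliquot <- function(barcodes) {
  grepl("^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-[0-9]{2}[A-Z]-[0-9]{2}[A-Z]-[A-Z0-9]{4}-[0-9]{2}$", barcodes)
}

barcodes_raw <- c("TCGA-85-8481-01A-11R-2326-07", " TCGA-56-8626-01A-11R-2403-07 ")
is_tcga_aliquot(barcodes_raw)            # TRUE FALSE: the padded barcode fails
barcodes <- clean_barcodes(barcodes_raw)
stopifnot(all(is_tcga_aliquot(barcodes)))

# Hypothetical wrapper: return NULL with a message on failure, so a failed run leaves an
# explicit NULL under a name that does not mask utils::data
prepare_or_null <- function(f) {
  tryCatch(f(), error = function(e) {
    message("GDCprepare failed: ", conditionMessage(e))
    NULL
  })
}
# Usage: lusc_se <- prepare_or_null(function() TCGAbiolinks::GDCprepare(query))
#        if (is.null(lusc_se)) stop("GDCprepare did not return a result")
```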

Could you send me the output of sessionInfo() from R?

Also, is this the latest version of the package?
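A small sketch for reporting the installed versions of the packages involved, so they can be compared against the current Bioconductor release; `pkg_versions` is a hypothetical helper, and the full sessionInfo() output remains the more complete answer.

```r
# Hypothetical helper: installed version of each package, or NA if it is not installed
pkg_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) as.character(utils::packageVersion(p)) else NA_character_
  }, character(1))
}

# Usage: pkg_versions(c("TCGAbiolinks", "biomaRt"))
# On current Bioconductor releases packages are updated with BiocManager::install("TCGAbiolinks");
# at the time of this thread the equivalent was biocLite() from https://bioconductor.org/biocLite.R
```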


Sorry about the delay. Here is the output again, along with sessionInfo():

> query_GBM <- GDCquery(project = "TCGA-GBM",
+                   data.category = "Gene expression",
+                   data.type = "Gene expression quantification",
+                   platform = "Illumina HiSeq", file.type  = "normalized_results",
+                   experimental.strategy = "RNA-Seq",
+                   barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"),
+                   legacy = TRUE)
Accessing GDC. This might take a while...
> GDCdownload(query_GBM)
Of the 2 files for download 2 already exist.
All samples have been already downloaded
> z <- GDCprepare(query_GBM)
  |============================================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Error in value[[3L]](cond) : 
  Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.
> 
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.0.13

loaded via a namespace (and not attached):
  [1] TH.data_1.0-7                           colorspace_1.2-6                       
  [3] rjson_0.2.15                            hwriter_1.3.2                          
  [5] class_7.3-14                            modeltools_0.2-21                      
  [7] mclust_5.2                              circlize_0.3.9                         
  [9] XVector_0.12.1                          GenomicRanges_1.24.3                   
 [11] GlobalOptions_0.0.10           
 [.]                          
                          
[129] munsell_0.4.3                         

>

@sabarinathchandrasekharan-11149

I am still having this problem: GDCprepare cannot connect to the BioMart server, while other programs can. Is anybody else facing this problem? Is there any workaround?
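If the BioMart outage is intermittent, simply retrying may be enough. A minimal sketch of a retry wrapper with a growing pause between attempts; `with_retries` is a hypothetical helper and the attempt count and wait times are arbitrary choices. It will not help if the failure is persistent, which could happen, for example, if something on the network blocks the GRCh37 host but not www.ensembl.org.

```r
# Hypothetical helper: call f() up to `attempts` times, waiting longer after each failure,
# and re-raise the last error if every attempt fails.
with_retries <- function(f, attempts = 3, wait = 5) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(list(ok = TRUE, value = f()),
                       error = function(e) list(ok = FALSE, error = e))
    if (result$ok) return(result$value)
    last_error <- result$error
    message("Attempt ", i, " failed: ", conditionMessage(last_error))
    if (i < attempts) Sys.sleep(wait * i)  # linear back-off between attempts
  }
  stop(last_error)
}

# Usage: gbm_se <- with_retries(function() TCGAbiolinks::GDCprepare(queryGBM), attempts = 3, wait = 30)
```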

Does the code below work?

Sorry, this code does not work either.

> hg19 <- get.GRCh.bioMart()
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Error in value[[3L]](cond) : 
  Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.
> hg38 <- get.GRCh.bioMart("hg38")
Downloading genome information. Using: Homo sapiens genes (GRCh38.p7)
Error in value[[3L]](cond) : 
  Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.
> 
> # Test 2: default server
> ensembl <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
+                    dataset = "hsapiens_gene_ensembl")
> attributes <- c("chromosome_name",
+                 "start_position",
+                 "end_position", "strand",
+                 "ensembl_gene_id", "entrezgene",
+                 "external_gene_id")
> chrom <- c(1:22, "X", "Y")
> gene.location <- getBM(attributes = attributes,
+                        filters = c("chromosome_name"),
+                        values = list(chrom), mart = ensembl)
Error in getBM(attributes = attributes, filters = c("chromosome_name"),  : 
  Invalid attribute(s): external_gene_id 
Please use the function 'listAttributes' to get valid attribute names
> 
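The last error is different from the others: useMart() against the default server succeeded, and only getBM() failed, because the current Ensembl mart no longer offers an attribute called `external_gene_id` (the gene symbol attribute is now `external_gene_name`). A minimal sketch for checking requested attribute names against the mart before calling getBM(); `missing_attributes` is a hypothetical helper, and the list of valid names comes from biomaRt's listAttributes().

```r
# Hypothetical helper: return the requested attribute names that the mart does not offer
missing_attributes <- function(requested, available) {
  setdiff(requested, available)
}

requested <- c("chromosome_name", "start_position", "end_position", "strand",
               "ensembl_gene_id", "entrezgene", "external_gene_id")

# Usage (requires biomaRt and network access):
# ensembl   <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
# available <- biomaRt::listAttributes(ensembl)$name
# missing_attributes(requested, available)   # any name listed here will make getBM() fail
# requested[requested == "external_gene_id"] <- "external_gene_name"
```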
