TCGAbiolinks GDCquery downloads a file that reads "message": "internal server error"
jshelton • 0
@jshelton-11836
Last seen 8.1 years ago

Hi,

I am attempting to download data but instead get a file with the extension '.tar.gz'. However, this is not a compressed file. It is a text file with the words:

{
  "message": "internal server error"
}

Below is the code I ran:

library(TCGAbiolinks)

samples <- c("TCGA-BA-4074", "TCGA-BA-4075")
interesting.genes <- c("TP53", "PIK3CA", "FAT1")

# Query the legacy archive for RNA-Seq gene-level results of the two cases
query.exp <- GDCquery(project = "TCGA-HNSC",
                      legacy = TRUE,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      experimental.strategy = "RNA-Seq",
                      barcode = samples)

GDCdownload(query.exp)
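
A quick way to confirm the symptom described above (a '.tar.gz' file that is really a text error message) is to look at the first two bytes of the file: a gzip archive always starts with 0x1f 0x8b. The sketch below uses only base R. The helper name is_gzip_file is made up for illustration and is not part of TCGAbiolinks; the temporary files stand in for a real download.

```r
# Hypothetical helper (not part of TCGAbiolinks): TRUE if the file starts
# with the two-byte gzip magic number 0x1f 0x8b.
is_gzip_file <- function(path) {
  magic <- readBin(path, what = "raw", n = 2L)
  length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Stand-in for the bad download: a text file with a .tar.gz extension.
fake <- tempfile(fileext = ".tar.gz")
writeLines(c("{", '  "message": "internal server error"', "}"), fake)

# Stand-in for a good download: a file really written through gzip.
real <- tempfile(fileext = ".gz")
con <- gzfile(real, open = "w")
writeLines("some content", con)
close(con)

is_gzip_file(fake)  # FALSE
is_gzip_file(real)  # TRUE

# When the check fails, print the file to see the server's message.
if (!is_gzip_file(fake)) cat(readLines(fake, warn = FALSE), sep = "\n")
```

Running the same check on the timestamped '.tar.gz' file that GDCdownload reports would show whether the server returned an archive or an error body.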

Thanks for your help

 
tcgabiolinks gdc • 3.6k views

Hi,

I was able to run the code here. Could you please post your session info, and what does query.exp$results show?

Best regards,

Tiago 

My output shows this


> GDCdownload(query.exp)
GDCdownload will download 2 files. A total of 3.031802 MB
Downloading as: Fri_Nov_11_09_35_27_2016.tar.gz
Downloading: 1.2 MB     [1] 1

> GDCprepare(query.exp)
  |=======================================================================================================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature14129
class: RangedSummarizedExperiment 
dim: 20330 2 
metadata(0):
assays(2): raw_count scaled_estimate
rownames(20330): A1BG|1 A1CF|29974 ... ZZEF1|23140 ZZZ3|26009
rowData names(4): gene_id entrezgene ensembl_gene_id transcript_id.transcript_id_TCGA-BA-4075-01A-01R-1436-07
colnames(2): TCGA-BA-4075-01A-01R-1436-07 TCGA-BA-4074-01A-01R-1436-07
colData names(62): sample patient ... subtype_Copy.Number subtype_PARADIGM
> data <- GDCprepare(query.exp)
  |==================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature14129
> query.exp$results
[[1]]
     center.code                  center.name center.short_name                     center.center_id center.namespace center.center_type                      data_type                 updated_datetime
1276          07 University of North Carolina               UNC ee7a85b3-8177-5d60-a10c-51180eb9009c          unc.edu               CGCC Gene expression quantification 2016-09-07T11:17:30.997957-05:00
1425          07 University of North Carolina               UNC ee7a85b3-8177-5d60-a10c-51180eb9009c          unc.edu               CGCC Gene expression quantification 2016-09-07T11:17:30.997957-05:00
                                                                   file_name                           md5sum data_format  acl access       platform state state_comment                              file_id   data_category file_size
1276 unc.edu.85034d8f-c10c-4db2-ade2-f26ea7cf2d95.1507611.rsem.genes.results 654b40396ed647c6ba22c3fbaf963b1b         TXT open   open Illumina HiSeq  live            NA 9bbe732f-4592-4681-91ee-d9e00c88ef1c Gene expression   1508723
1425 unc.edu.78a8e33e-fd10-4dcd-b8fd-aad93db18c45.1484374.rsem.genes.results 2faf952fe332870fbec90dbe81b96b2b         TXT open   open Illumina HiSeq  live            NA 20a606f9-2aef-489b-a1d6-0044533e96ff Gene expression   1523079
                            cases submitter_id type                   tags experimental_strategy   tissue.definition
1276 TCGA-BA-4075-01A-01R-1436-07           NA file v2, unnormalized, gene               RNA-Seq Primary solid Tumor
1425 TCGA-BA-4074-01A-01R-1436-07           NA file v2, unnormalized, gene               RNA-Seq Primary solid Tumor

jshelton • 0
@jshelton-11836
Last seen 8.1 years ago

My actual gene list and sample list are longer (I can email them to you if you would like, but they are in the same format as in my posted code). From the download I get:

> query.exp <- GDCquery(project = "TCGA-HNSC",
+                       legacy = TRUE,
+                       data.category = "Gene expression",
+                       data.type = "Gene expression quantification",
+                       platform = "Illumina HiSeq",
+                       file.type = "results",
+                       experimental.strategy = "RNA-Seq",
+                       barcode = samples)
Accessing GDC. This might take a while...
>
> GDCdownload(query.exp)
Of the 82 files for download 2 already exist.
We will download only those that are missing ones.
GDCdownload will download 80 files. A total of 121.269906 MB
Downloading as: Fri_Nov_11_13_53_28_2016.tar.gz
  |======================================================================| 100%tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
Download completed

query.exp$results looks normal (I'm just pasting in the top for space):

> query.exp$results
[[1]]
     center.code                  center.name center.short_name
31            07 University of North Carolina               UNC
65            07 University of North Carolina               UNC
67            07 University of North Carolina               UNC
68            07 University of North Carolina               UNC

My session info:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Thanks:)


Which TCGAbiolinks version do you have installed?

We had a problem a month ago in this part of the code. I'm just wondering if it is the old version.
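
For reference, the installed version can be read with packageVersion() from the base utils package. The sketch below wraps it in a small check. The helper name has_min_version and the version numbers are illustrative assumptions, and it is demonstrated on the stats package so that it runs without TCGAbiolinks installed.

```r
# Hypothetical helper: is 'pkg' installed at version 'minimum' or newer?
has_min_version <- function(pkg, minimum) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= minimum
}

# Demonstrated on a package that ships with every R installation.
has_min_version("stats", "3.0.0")

# For this thread one would instead look at:
# utils::packageVersion("TCGAbiolinks")
```

The output of sessionInfo() with the package loaded gives the same information, along with the versions of its dependencies.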


I have TCGAbiolinks_2.3.4

Thanks


I just tried the same script again today and it worked. The server issue may have cleared up?
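
If the failure really is a temporary server problem, one option is to repeat the call a few times before giving up. The following is a minimal base-R sketch of that idea, not something TCGAbiolinks provides: retry and flaky are hypothetical names, and flaky only simulates a call that fails twice and then succeeds.

```r
# Hypothetical helper: call 'f()' up to 'max_tries' times, pausing
# 'wait_seconds' between attempts, and re-raise the last error on failure.
retry <- function(f, max_tries = 3L, wait_seconds = 0) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  stop(last_error)
}

# Stand-in for a flaky server call: fails twice, then succeeds.
calls <- 0L
flaky <- function() {
  calls <<- calls + 1L
  if (calls < 3L) stop("internal server error")
  "ok"
}

outcome <- retry(flaky)
```

In this thread the wrapped call would be something like retry(function() GDCdownload(query.exp), wait_seconds = 60). Note that this only helps when the call raises an R error; a download that silently saves an error message as a '.tar.gz' file would not trigger a retry.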

Thanks for your help


Okay so I spoke too soon. The script is now stalling at a later step.

Now when I run the next steps I get:

> GDCdownload(query.exp)
Of the 82 files for download 2 already exist.
We will download only those that are missing ones.
GDCdownload will download 80 files. A total of 121.269906 MB
Downloading as: Wed_Nov_16_14_08_55_2016.tar.gz
Downloading: 49 MB     [1] 1
> exp <- GDCprepare(query = query.exp,
+                   add.gistic2.mut = interesting.genes,
+                   save = TRUE,
+                   save.filename = "exp.rda")
  |======================================================================| 100%Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)

Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature14129
=> Adding GISTIC2 and mutation information....
============================================================================
 For more information about MAF data please read the following GDC manual:
 GDC manual: https://gdc-docs.nci.nih.gov/Data/PDF/Data_UG.pdf
============================================================================
Accessing GDC. This might take a while...
Warning: There are more than one file for the same case. Please verify query results.
GDCdownload will download 4 files. A total of 55.154194 MB
Downloading as: Wed_Nov_16_14_13_06_2016.tar.gz
Downloading: 55 MB

|    |updated_datetime                 |file_name                                                                   |access | file_size|submitter_id                 |experimental_strategy |
|:---|:--------------------------------|:---------------------------------------------------------------------------|:------|---------:|:----------------------------|:---------------------|
|1   |2016-10-27T19:38:31.972682-05:00 |TCGA.HNSC.somaticsniper.b4c01bf0-e85a-43c1-9d9e-8d57ec4c7405.somatic.maf.gz |open   |  10011161|TCGA-HNSC-somaticsniper-open |WXS                   |
|2   |2016-10-27T19:38:37.456535-05:00 |TCGA.HNSC.mutect.b741ff1d-41e2-47ad-a148-77d66650a703.somatic.maf.gz        |open   |  17718800|TCGA-HNSC-mutect-open        |WXS                   |
|3   |2016-10-27T19:38:26.451199-05:00 |TCGA.HNSC.muse.2d2f9250-7ba6-48ef-8111-b373c5fffd6f.somatic.maf.gz          |open   |  13750833|TCGA-HNSC-muse-open          |WXS                   |
|4   |2016-10-27T19:38:20.923217-05:00 |TCGA.HNSC.varscan.61eeb81b-0591-4836-a50d-070f08ace451.somatic.maf.gz       |open   |  13673400|TCGA-HNSC-varscan-open       |WXS                   |
|1.1 |2016-10-27T19:38:31.972682-05:00 |TCGA.HNSC.somaticsniper.b4c01bf0-e85a-43c1-9d9e-8d57ec4c7405.somatic.maf.gz |open   |  10011161|TCGA-HNSC-somaticsniper-open |WXS                   |
|2.1 |2016-10-27T19:38:37.456535-05:00 |TCGA.HNSC.mutect.b741ff1d-41e2-47ad-a148-77d66650a703.somatic.maf.gz        |open   |  17718800|TCGA-HNSC-mutect-open        |WXS                   |
|3.1 |2016-10-27T19:38:26.451199-05:00 |TCGA.HNSC.muse.2d2f9250-7ba6-48ef-8111-b373c5fffd6f.somatic.maf.gz          |open   |  13750833|TCGA-HNSC-muse-open          |WXS                   |
|4.1 |2016-10-27T19:38:20.923217-05:00 |TCGA.HNSC.varscan.61eeb81b-0591-4836-a50d-070f08ace451.somatic.maf.gz       |open   |  13673400|TCGA-HNSC-varscan-open       |WXS                   |
|1.2 |2016-10-27T19:38:31.972682-05:00 |TCGA.HNSC.somaticsniper.b4c01bf0-e85a-43c1-9d9e-8d57ec4c7405.somatic.maf.gz |open   |  10011161|TCGA-HNSC-somaticsniper-open |WXS                   |
|2.2 |2016-10-27T19:38:37.456535-05:00 |TCGA.HNSC.mutect.b741ff1d-41e2-47ad-a148-77d66650a703.somatic.maf.gz        |open   |  17718800|TCGA-HNSC-mutect-open        |WXS                   |
|3.2 |2016-10-27T19:38:26.451199-05:00 |TCGA.HNSC.muse.2d2f9250-7ba6-48ef-8111-b373c5fffd6f.somatic.maf.gz          |open   |  13750833|TCGA-HNSC-muse-open          |WXS                   |
|4.2 |2016-10-27T19:38:20.923217-05:00 |TCGA.HNSC.varscan.61eeb81b-0591-4836-a50d-070f08ace451.somatic.maf.gz       |open   |  13673400|TCGA-HNSC-varscan-open       |WXS                   |
downloaded 0 bytes

Error in download.file(url, method = method, ...) :
  cannot download all files
In addition: Warning message:
In download.file(url, method = method, ...) :
  URL 'https://gdc-api.nci.nih.gov/data//64683606-b957-4478-a7d5-673de68b0341': status was '400 Bad Request'

Hi,

I believe the package version you have is old: that table print was removed some weeks ago. Could you update the package, please?

Best regards,

Tiago

jshelton • 0
@jshelton-11836
Last seen 8.1 years ago

Thanks,

I rebuilt everything (from R up). I can get most of this to work but now I get:

> cnv <- TCGAbiolinks::getGistic("HNSC")
Error: 'getGistic' is not an exported object from 'namespace:TCGAbiolinks'

Any suggestions? :)

This function is not exported, so you will need to use three ":" to call it:

cnv <- TCGAbiolinks:::getGistic("HNSC")

The reason it is not exported is that it downloads from GDAC Firehose (http://gdac.broadinstitute.org/) and the data are aligned to hg19. I'm still waiting to see whether GDC will provide level 4 copy number data; if it does, I will export a function.

If you need the other GISTIC results, please take a look at either RTCGAToolbox or GDAC Firehose (http://gdac.broadinstitute.org/).

Tiago Chedraoui Silva

Here is my session info:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] SummarizedExperiment_1.4.0 Biobase_2.34.0
[3] GenomicRanges_1.26.1       GenomeInfoDb_1.10.1
[5] IRanges_2.8.1              S4Vectors_0.12.0
[7] BiocGenerics_0.20.0        TCGAbiolinks_2.3.9

I'm also struggling to get this to work:

> TCGAbiolinks:::getGistic("HNSC")

Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'getGistic' not found
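
The two error messages in this thread point to different situations: "not an exported object" means the name exists in the package but is internal, while "object ... not found" from ":::" means the name is not defined in the installed version at all. A small base-R sketch for telling these cases apart is below. The helper name symbol_status is made up for illustration, and it is demonstrated on the stats package so that it runs anywhere.

```r
# Hypothetical helper: classify a symbol in an installed package as
# "exported", "internal" (present but not exported), or "absent".
symbol_status <- function(pkg, name) {
  ns <- asNamespace(pkg)
  if (name %in% getNamespaceExports(ns)) {
    "exported"   # reachable as pkg::name
  } else if (exists(name, envir = ns, inherits = FALSE)) {
    "internal"   # reachable only as pkg:::name
  } else {
    "absent"     # not defined in this installed version
  }
}

symbol_status("stats", "median")            # "exported"
symbol_status("stats", "no_such_function")  # "absent"
```

Calling symbol_status("TCGAbiolinks", "getGistic") in the session that produced the error above would be expected to return "absent", which would suggest comparing the installed version with the one the function was written for.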
