TCGAbiolinks GDCquery downloads a file that reads "message": "internal server error"
jshelton • 0
@jshelton-11836
Last seen 8.4 years ago

Hi,

I am attempting to download data, but the file I get, despite its '.tar.gz' extension, is not a compressed archive. It is a text file containing:

{
  "message": "internal server error"
}

Below is the code I ran:

library(TCGAbiolinks)  # provides GDCquery() and GDCdownload()

samples <- c("TCGA-BA-4074", "TCGA-BA-4075")
interesting.genes <- c("TP53", "PIK3CA", "FAT1")

query.exp <- GDCquery(project = "TCGA-HNSC",
                      legacy = TRUE,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      experimental.strategy = "RNA-Seq",
                      barcode = samples)

GDCdownload(query.exp)

Thanks for your help
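
A quick way to confirm what the post describes (a '.tar.gz' file that is really plain text) is to look at the first two bytes of the file, since gzip files start with 0x1f 0x8b. The sketch below uses only base R. `looks_like_gzip` is a hypothetical helper name, not part of TCGAbiolinks, and the two temporary files are stand-ins created only for the demonstration.

```r
# Hypothetical helper (not part of TCGAbiolinks): TRUE if the file starts
# with the gzip magic bytes 0x1f 0x8b.
looks_like_gzip <- function(path) {
  magic <- readBin(path, what = "raw", n = 2L)
  length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Stand-in 1: a genuine gzip-compressed file.
good <- tempfile(fileext = ".gz")
con <- gzfile(good, "w")
writeLines("some data", con)
close(con)

# Stand-in 2: a JSON error body saved under a misleading .tar.gz name.
bad <- tempfile(fileext = ".tar.gz")
writeLines('{\n  "message": "internal server error"\n}', bad)

looks_like_gzip(good)  # TRUE
looks_like_gzip(bad)   # FALSE

# When the file is not gzip, print it to read the server's message.
if (!looks_like_gzip(bad)) cat(readLines(bad, warn = FALSE), sep = "\n")
```

Running the same check on the file written by GDCdownload would show whether the server returned an archive or an error message.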

 

Hi,

I was able to run the code here. Could you please post your session info, and what does query.exp$results show?

Best regards,

Tiago 

My output shows this:

> GDCdownload(query.exp)
GDCdownload will download 2 files. A total of 3.031802 MB
Downloading as: Fri_Nov_11_09_35_27_2016.tar.gz
Downloading: 1.2 MB     [1] 1

> GDCprepare(query.exp)
  |======================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature14129
class: RangedSummarizedExperiment
dim: 20330 2
metadata(0):
assays(2): raw_count scaled_estimate
rownames(20330): A1BG|1 A1CF|29974 ... ZZEF1|23140 ZZZ3|26009
rowData names(4): gene_id entrezgene ensembl_gene_id transcript_id.transcript_id_TCGA-BA-4075-01A-01R-1436-07
colnames(2): TCGA-BA-4075-01A-01R-1436-07 TCGA-BA-4074-01A-01R-1436-07
colData names(62): sample patient ... subtype_Copy.Number subtype_PARADIGM

> data <- GDCprepare(query.exp)
  |======================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature14129

> query.exp$results
[[1]]
     center.code                  center.name center.short_name                     center.center_id center.namespace center.center_type                      data_type                 updated_datetime
1276          07 University of North Carolina               UNC ee7a85b3-8177-5d60-a10c-51180eb9009c          unc.edu               CGCC Gene expression quantification 2016-09-07T11:17:30.997957-05:00
1425          07 University of North Carolina               UNC ee7a85b3-8177-5d60-a10c-51180eb9009c          unc.edu               CGCC Gene expression quantification 2016-09-07T11:17:30.997957-05:00
                                                                   file_name                           md5sum data_format  acl access      platform state state_comment                              file_id   data_category file_size
1276 unc.edu.85034d8f-c10c-4db2-ade2-f26ea7cf2d95.1507611.rsem.genes.results 654b40396ed647c6ba22c3fbaf963b1b         TXT open   open Illumina HiSeq  live            NA 9bbe732f-4592-4681-91ee-d9e00c88ef1c Gene expression   1508723
1425 unc.edu.78a8e33e-fd10-4dcd-b8fd-aad93db18c45.1484374.rsem.genes.results 2faf952fe332870fbec90dbe81b96b2b         TXT open   open Illumina HiSeq  live            NA 20a606f9-2aef-489b-a1d6-0044533e96ff Gene expression   1523079
                            cases submitter_id type                   tags experimental_strategy   tissue.definition
1276 TCGA-BA-4075-01A-01R-1436-07           NA file v2, unnormalized, gene               RNA-Seq Primary solid Tumor
1425 TCGA-BA-4074-01A-01R-1436-07           NA file v2, unnormalized, gene               RNA-Seq Primary solid Tumor

jshelton • 0

My actual gene list and sample list are longer (I can email them to you if you would like, but they are in the same format as my posted code). From the download I get:

> query.exp <- GDCquery(project = "TCGA-HNSC",
+                       legacy = TRUE,
+                       data.category = "Gene expression",
+                       data.type = "Gene expression quantification",
+                       platform = "Illumina HiSeq",
+                       file.type = "results",
+                       experimental.strategy = "RNA-Seq",
+                       barcode = samples)
Accessing GDC. This might take a while...
>
> GDCdownload(query.exp)
Of the 82 files for download 2 already exist.
We will download only those that are missing ones.
GDCdownload will download 80 files. A total of 121.269906 MB
Downloading as: Fri_Nov_11_13_53_28_2016.tar.gz
  |======================================================================| 100%
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
Download completed

query.exp$results looks normal (I'm just pasting in the top for space):

> query.exp$results
[[1]]
     center.code                  center.name center.short_name
31            07 University of North Carolina               UNC
65            07 University of North Carolina               UNC
67            07 University of North Carolina               UNC
68            07 University of North Carolina               UNC

My session info:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Thanks:)
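
Since the two-sample query ran fine for Tiago while the 82-file request failed here, one way to narrow the problem down is to request the samples in smaller batches and see which batch, if any, triggers the error. The sketch below is only a base R illustration under that assumption. `split_into_batches` is a hypothetical helper, the barcodes are placeholders rather than real TCGA identifiers, and the TCGAbiolinks calls are left as comments because they need the package and network access.

```r
# Hypothetical helper: split a vector into consecutive batches of at most
# `size` elements, preserving order.
split_into_batches <- function(x, size) {
  stopifnot(size >= 1)
  unname(split(x, ceiling(seq_along(x) / size)))
}

# Placeholder barcodes for illustration only (not real TCGA samples).
barcodes <- sprintf("PLACEHOLDER-%02d", 1:5)
batches <- split_into_batches(barcodes, 2)
length(batches)  # 3

# Each batch could then be queried and downloaded on its own, reusing the
# arguments from the original post (requires TCGAbiolinks and network access):
# for (batch in batches) {
#   q <- GDCquery(project = "TCGA-HNSC",
#                 legacy = TRUE,
#                 data.category = "Gene expression",
#                 data.type = "Gene expression quantification",
#                 platform = "Illumina HiSeq",
#                 file.type = "results",
#                 experimental.strategy = "RNA-Seq",
#                 barcode = batch)
#   GDCdownload(q)
# }
```

This does not fix a server-side error, but it limits how much has to be downloaded again when one request fails.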


Which TCGAbiolinks version do you have installed?

We had a problem a month ago in this part of the code. I'm just wondering if it is the old version.
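
The installed version can be checked without printing the full session info. The sketch below uses base R only. `installed_older_than` is a hypothetical helper, and the "2.3.9" threshold is just an illustrative value, not a version confirmed to contain the fix.

```r
# Hypothetical helper: is the installed version of `pkg` older than `minimum`?
# Returns NA when the package is not installed.
installed_older_than <- function(pkg, minimum) {
  if (!nzchar(system.file(package = pkg))) return(NA)
  utils::packageVersion(pkg) < package_version(minimum)
}

# Illustrative threshold only; substitute the version you want to compare to.
installed_older_than("TCGAbiolinks", "2.3.9")
```

A result of TRUE would suggest updating the package before investigating further.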


I have TCGAbiolinks_2.3.4

Thanks


I just tried the same script again today and it worked. The server issue may have cleared up?

Thanks for your help


Okay so I spoke too soon. The script is now stalling at a later step.

Now when I run the next steps I get:

> GDCdownload(query.exp)
Of the 82 files for download 2 already exist.
We will download only those that are missing ones.
GDCdownload will download 80 files. A total of 121.269906 MB
Downloading as: Wed_Nov_16_14_08_55_2016.tar.gz
Downloading: 49 MB     [1] 1
> exp <- GDCprepare(query = query.exp,
+                   add.gistic2.mut = interesting.genes,
+                   save = TRUE,
+                   save.filename = "exp.rda")
  |======================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)

Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature14129
=> Adding GISTIC2 and mutation information....
============================================================================
 For more information about MAF data please read the following GDC manual:
 GDC manual: https://gdc-docs.nci.nih.gov/Data/PDF/Data_UG.pdf
============================================================================
Accessing GDC. This might take a while...
Warning: There are more than one file for the same case. Please verify query results.
GDCdownload will download 4 files. A total of 55.154194 MB
Downloading as: Wed_Nov_16_14_13_06_2016.tar.gz
Downloading: 55 MB

|    |updated_datetime                 |file_name                                                                   |access | file_size|submitter_id                 |experimental_strategy |
|:---|:--------------------------------|:---------------------------------------------------------------------------|:------|---------:|:----------------------------|:---------------------|
|1   |2016-10-27T19:38:31.972682-05:00 |TCGA.HNSC.somaticsniper.b4c01bf0-e85a-43c1-9d9e-8d57ec4c7405.somatic.maf.gz |open   |  10011161|TCGA-HNSC-somaticsniper-open |WXS                   |
|2   |2016-10-27T19:38:37.456535-05:00 |TCGA.HNSC.mutect.b741ff1d-41e2-47ad-a148-77d66650a703.somatic.maf.gz        |open   |  17718800|TCGA-HNSC-mutect-open        |WXS                   |
|3   |2016-10-27T19:38:26.451199-05:00 |TCGA.HNSC.muse.2d2f9250-7ba6-48ef-8111-b373c5fffd6f.somatic.maf.gz          |open   |  13750833|TCGA-HNSC-muse-open          |WXS                   |
|4   |2016-10-27T19:38:20.923217-05:00 |TCGA.HNSC.varscan.61eeb81b-0591-4836-a50d-070f08ace451.somatic.maf.gz       |open   |  13673400|TCGA-HNSC-varscan-open       |WXS                   |
|1.1 |2016-10-27T19:38:31.972682-05:00 |TCGA.HNSC.somaticsniper.b4c01bf0-e85a-43c1-9d9e-8d57ec4c7405.somatic.maf.gz |open   |  10011161|TCGA-HNSC-somaticsniper-open |WXS                   |
|2.1 |2016-10-27T19:38:37.456535-05:00 |TCGA.HNSC.mutect.b741ff1d-41e2-47ad-a148-77d66650a703.somatic.maf.gz        |open   |  17718800|TCGA-HNSC-mutect-open        |WXS                   |
|3.1 |2016-10-27T19:38:26.451199-05:00 |TCGA.HNSC.muse.2d2f9250-7ba6-48ef-8111-b373c5fffd6f.somatic.maf.gz          |open   |  13750833|TCGA-HNSC-muse-open          |WXS                   |
|4.1 |2016-10-27T19:38:20.923217-05:00 |TCGA.HNSC.varscan.61eeb81b-0591-4836-a50d-070f08ace451.somatic.maf.gz       |open   |  13673400|TCGA-HNSC-varscan-open       |WXS                   |
|1.2 |2016-10-27T19:38:31.972682-05:00 |TCGA.HNSC.somaticsniper.b4c01bf0-e85a-43c1-9d9e-8d57ec4c7405.somatic.maf.gz |open   |  10011161|TCGA-HNSC-somaticsniper-open |WXS                   |
|2.2 |2016-10-27T19:38:37.456535-05:00 |TCGA.HNSC.mutect.b741ff1d-41e2-47ad-a148-77d66650a703.somatic.maf.gz        |open   |  17718800|TCGA-HNSC-mutect-open        |WXS                   |
|3.2 |2016-10-27T19:38:26.451199-05:00 |TCGA.HNSC.muse.2d2f9250-7ba6-48ef-8111-b373c5fffd6f.somatic.maf.gz          |open   |  13750833|TCGA-HNSC-muse-open          |WXS                   |
|4.2 |2016-10-27T19:38:20.923217-05:00 |TCGA.HNSC.varscan.61eeb81b-0591-4836-a50d-070f08ace451.somatic.maf.gz       |open   |  13673400|TCGA-HNSC-varscan-open       |WXS                   |
downloaded 0 bytes

Error in download.file(url, method = method, ...) :
  cannot download all files
In addition: Warning message:
In download.file(url, method = method, ...) :
  URL 'https://gdc-api.nci.nih.gov/data//64683606-b957-4478-a7d5-673de68b0341': status was '400 Bad Request'
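
The printed table lists each of the four MAF files three times (rows 1, 1.1, 1.2 and so on). A simple diagnostic is to count the repeated file names in a results table. The sketch below works on a small made-up data frame with placeholder file names, so it shows only the base R technique and makes no claim about how TCGAbiolinks builds or deduplicates its results. It also does not address the '400 Bad Request' itself.

```r
# Made-up results table with placeholder file names, mimicking the
# repeated rows seen in the printed output.
results <- data.frame(
  file_name = rep(c("file_a.maf.gz", "file_b.maf.gz"), times = 3),
  file_size = rep(c(100, 200), times = 3),
  stringsAsFactors = FALSE
)

# How many times does each file appear?
table(results$file_name)

# Keep only the first occurrence of each file name.
unique_results <- results[!duplicated(results$file_name), , drop = FALSE]
nrow(unique_results)  # 2
```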

Hi,

I believe the package version you have is old: that table print was removed some weeks ago. Could you update it, please?

Best regards,

Tiago

jshelton • 0

Thanks,

I rebuilt everything (from R up). I can get most of this to work but now I get:

> cnv <- TCGAbiolinks::getGistic("HNSC")
Error: 'getGistic' is not an exported object from 'namespace:TCGAbiolinks'

Any suggestions:)

This function was not exported. You will need to use three colons (":::"):

cnv <- TCGAbiolinks:::getGistic("HNSC")

The reason it is not exported is that it downloads from the GDAC Firehose (http://gdac.broadinstitute.org/) and the data are aligned to hg19. I'm still waiting to see whether GDC will provide level 4 data for copy number; if it does, I should export a function.

If you need the other GISTIC results, please take a look at either RTCGAToolbox or the GDAC Firehose (http://gdac.broadinstitute.org/).

Tiago Chedraoui Silva

Here is my session info:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] SummarizedExperiment_1.4.0 Biobase_2.34.0
[3] GenomicRanges_1.26.1       GenomeInfoDb_1.10.1
[5] IRanges_2.8.1              S4Vectors_0.12.0
[7] BiocGenerics_0.20.0        TCGAbiolinks_2.3.9

I'm also struggling to get this to work:

> TCGAbiolinks:::getGistic("HNSC")

Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'getGistic' not found
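
The two errors in this thread point to different situations: "not an exported object" means `::` could not find the name among the package's exports, while "object ... not found" from `:::` means the name is not in the package's namespace at all in the installed version. The sketch below tells these cases apart using base R only. `function_status` is a hypothetical helper, demonstrated on the `stats` package so that it runs without TCGAbiolinks installed.

```r
# Hypothetical helper: report whether `name` is exported from `pkg`,
# present only internally, or absent from the namespace.
# asNamespace() loads the package and errors if it is not installed.
function_status <- function(pkg, name) {
  ns <- asNamespace(pkg)
  if (name %in% getNamespaceExports(ns)) {
    "exported"   # reachable with pkg::name
  } else if (exists(name, envir = ns, inherits = FALSE)) {
    "internal"   # reachable only with pkg:::name
  } else {
    "absent"     # not defined in this version of the package
  }
}

function_status("stats", "median")                 # "exported"
function_status("stats", "no_such_function_here")  # "absent"

# With TCGAbiolinks installed, the same check could be run as:
# function_status("TCGAbiolinks", "getGistic")
```

A result of "absent" would indicate that the installed version does not define the function, so neither `::` nor `:::` can reach it.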
