Cannot retrieve projects from TCGA using getGDCprojects function
jacorvar ▴ 40
@jacorvar-8972
Last seen 5 months ago
European Union

Hi,

I've just started using the TCGAbiolinks package, and there seems to be a bug in the getGDCprojects function: I cannot retrieve the projects from TCGA. When I type getGDCprojects(), I get:

Warning: 40 parsing failures.
row # A tibble: 5 x 5 col     row   col  expected     actual expected   <int> <chr>     <chr>      <chr> actual 1     1  <NA> 8 columns 71 columns file 2     2  <NA> 8 columns 71 columns row 3     3  <NA> 8 columns 71 columns col 4     4  <NA> 8 columns 71 columns expected 5     5  <NA> 8 columns 71 columns actual # ... with 1 more variables: file <chr>
... ................. ... .................................. ........ .................................. ...... .................................. .... .................................. ... .................................. ... .................................. ........ .................................. ...... .......................................
See problems(...) for more details.

# A tibble: 40 x 8
   disease_type.1                       disease_type.0 disease_type.3
            <chr>                                <chr>          <chr>
 1           <NA>                       Uveal Melanoma           <NA>
 2           <NA>               Stomach Adenocarcinoma           <NA>
 3           <NA>         Lung Squamous Cell Carcinoma           <NA>
 4           <NA> Uterine Corpus Endometrial Carcinoma           <NA>
 5           <NA>                   Cholangiocarcinoma           <NA>
 6           <NA>    Ovarian Serous Cystadenocarcinoma           <NA>
 7           <NA>             Adrenocortical Carcinoma           <NA>
 8           <NA>                         Mesothelioma           <NA>
 9           <NA>               Acute Myeloid Leukemia           <NA>
10           <NA>             Brain Lower Grade Glioma           <NA>
# ... with 30 more rows, and 5 more variables: disease_type.2 <chr>,
#   disease_type.5 <chr>, disease_type.4 <chr>, disease_type.7 <chr>,
#   disease_type.6 <chr>
Warning messages:
1: Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.
2: In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 1)
3: Unknown or uninitialised column: 'project_id'.

EDIT: I've found that getGDCprojects requires the readr package to be loaded (it should be loaded automatically). Furthermore, the line containing:

projects <- read_tsv("https://gdc-api.nci.nih.gov/projects?size=1000&format=tsv",
            col_types = "cccccccc")

only works if the col_types argument is removed.
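
As a possible alternative to dropping col_types entirely, readr can be told to read every column as character however many columns the file has. The sketch below is only an illustration: the inline TSV is made-up stand-in data (not the real GDC response), and the same col_types value could presumably be passed along with the URL above.

```r
# Sketch only: the inline TSV is hypothetical stand-in data, not the GDC response.
library(readr)

tsv <- paste0(
  "project_id\tdisease_type.0\tdisease_type.1\n",
  "TCGA-GBM\tGlioblastoma Multiforme\tNA\n",
  "TCGA-UVM\tUveal Melanoma\tNA\n"
)

# cols(.default = col_character()) does not depend on the number of columns,
# unlike a fixed-length string such as "cccccccc".
projects <- read_tsv(tsv, col_types = cols(.default = col_character()))

projects$project_id
```

This keeps the column types predictable without hard-coding how many columns the API returns.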

 

tcgabiolinks TCGA software error • 2.0k views

Which version of the package are you using? I don't get that with the latest version.

Could you please update it from GitHub: devtools::install_github('BioinformaticsFMRP/TCGAbiolinks')

You can also use: `biocLite('BioinformaticsFMRP/TCGAbiolinks')`

@sean-davis-490
Last seen 3 months ago
United States

These are warnings, not errors. When using functionality from other packages (like read_tsv from readr), the best practice is NOT to load the package (i.e., not to call library(readr)), but only to import the functionality.

That said, the GDC recently changed the underlying data for projects in a way that likely breaks most code that parses the projects into a nice data.frame. Note the many columns named like disease_type.x. This change is probably why removing col_types removes some of the warnings.
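
A minimal sketch of both points, assuming readr is installed: the function is called through its namespace rather than attached with library(), and the flattened disease_type.x columns are picked out by name. The inline TSV is hypothetical stand-in data, not the actual GDC response. Inside a package, the equivalent would be an importFrom(readr, read_tsv) directive in NAMESPACE.

```r
# Sketch only: the inline TSV is hypothetical stand-in data, not the GDC response.
if (!requireNamespace("readr", quietly = TRUE)) {
  stop("The readr package must be installed for this example.")
}

tsv <- paste0(
  "project_id\tdisease_type.0\tdisease_type.1\n",
  "TCGA-UVM\tUveal Melanoma\tNA\n"
)

# Namespace-qualified call: readr is used here without library(readr).
projects <- readr::read_tsv(
  tsv,
  col_types = readr::cols(.default = readr::col_character())
)

# List the columns produced by flattening the disease_type field.
flattened <- grep("^disease_type\\.[0-9]+$", names(projects), value = TRUE)
flattened
```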

jacorvar ▴ 40
@jacorvar-8972
Last seen 5 months ago
European Union

That specific case does not prompt any errors, but the following does:

# import the read_tsv function from the readr package
> backports::import("readr", "read_tsv")

# load the TCGAbiolinks package
> library(TCGAbiolinks)

# lines 4-8 from listing 1 in https://f1000research.com/articles/5-1542/v2
> query.met.gbm <- GDCquery(project = "TCGA-GBM", legacy = TRUE, data.category = "DNA methylation", platform = "Illumina Human Methylation 450", barcode = c("TCGA-76-4926-01B-01D-1481-05", "TCGA-28-5211-01C-11D-1844-05"))


I am unable to use the GDCquery function without errors.

I guess this issue is directly related to the fact I'm unable to retrieve the projects when using getGDCprojects(), since the error I get comes from the checkProjectInput function, which calls getGDCprojects().


The output of the GDCquery function is:

--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------

Genome of reference: hg19
Warning: 40 parsing failures.
row # A tibble: 5 x 5 col     row   col  expected     actual expected   <int> <chr>     <chr>      <chr> actual 1     1  <NA> 8 columns 71 columns file 2     2  <NA> 8 columns 71 columns row 3     3  <NA> 8 columns 71 columns col 4     4  <NA> 8 columns 71 columns expected 5     5  <NA> 8 columns 71 columns actual # ... with 1 more variables: file <chr>
... ................. ... .................................. ........ .................................. ...... .................................. .... .................................. ... .................................. ... .................................. ........ .................................. ...... .......................................
See problems(...) for more details.

And also:

|disease_type.2              |disease_type.5                      |disease_type.4       |disease_type.6               |
|:---------------------------|:-----------------------------------|:--------------------|:----------------------------|
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |

And finally:

|Thymic Epithelial Neoplasms |Complex Mixed and Stromal Neoplasms |Basal Cell Neoplasms |Ductal and Lobular Neoplasms |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
Error in checkProjectInput(project) :
  Please set a valid project argument from the column id above. Project TCGA-GBM was not found.
In addition: Warning messages:
1: Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.
2: In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 1)
3: Unknown or uninitialised column: 'project_id'.
4: Unknown or uninitialised column: 'project_id'.