Question: Cannot retrieve projects from TCGA using getGDCprojects function
21 days ago by jacorvar (European Union):

Hi,

I've just started using the TCGAbiolinks package, and there seems to be a bug in the getGDCprojects function: I cannot retrieve the projects from TCGA. When I type getGDCprojects(), I get:

Warning: 40 parsing failures.
row  col   expected   actual
  1  <NA>  8 columns  71 columns
  2  <NA>  8 columns  71 columns
  3  <NA>  8 columns  71 columns
  4  <NA>  8 columns  71 columns
  5  <NA>  8 columns  71 columns
# ... with 1 more variables: file <chr>
See problems(...) for more details.

# A tibble: 40 x 8
   disease_type.1                       disease_type.0 disease_type.3
            <chr>                                <chr>          <chr>
 1           <NA>                       Uveal Melanoma           <NA>
 2           <NA>               Stomach Adenocarcinoma           <NA>
 3           <NA>         Lung Squamous Cell Carcinoma           <NA>
 4           <NA> Uterine Corpus Endometrial Carcinoma           <NA>
 5           <NA>                   Cholangiocarcinoma           <NA>
 6           <NA>    Ovarian Serous Cystadenocarcinoma           <NA>
 7           <NA>             Adrenocortical Carcinoma           <NA>
 8           <NA>                         Mesothelioma           <NA>
 9           <NA>               Acute Myeloid Leukemia           <NA>
10           <NA>             Brain Lower Grade Glioma           <NA>
# ... with 30 more rows, and 5 more variables: disease_type.2 <chr>,
#   disease_type.5 <chr>, disease_type.4 <chr>, disease_type.7 <chr>,
#   disease_type.6 <chr>
Warning messages:
1: Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.
2: In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 1)
3: Unknown or uninitialised column: 'project_id'.

EDIT: I've found that getGDCprojects requires the readr package to be loaded (it should be loaded automatically). Furthermore, the call

projects <- read_tsv("https://gdc-api.nci.nih.gov/projects?size=1000&format=tsv",
            col_types = "cccccccc")

only works if the col_types argument is removed.

 


Which version of the package are you using? I don't see this with the latest version.

Could you please update it from GitHub: devtools::install_github('BioinformaticsFMRP/TCGAbiolinks')
(comment by tiagochst, 15 days ago)

You can also use: `biocLite('BioinformaticsFMRP/TCGAbiolinks')`

(comment by Sean Davis, 14 days ago)
20 days ago by Sean Davis (United States):

These are warnings, not errors. When using functionality from other packages (like read_tsv from readr), the best practice is NOT to load the package (i.e., not to call library(readr)), but only to import the functionality you need.
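
As a rough illustration of calling a function through its namespace instead of attaching the package, here is a minimal sketch. The temporary file and its two-column contents are made up for the example and are not the real GDC response. Inside a package, the equivalent would typically be an importFrom(readr, read_tsv) directive in the NAMESPACE file.

```r
# Made-up two-column table written to a temporary file (illustration only).
tsv <- tempfile(fileext = ".tsv")
writeLines(c("project_id\tname", "TCGA-GBM\tGlioblastoma Multiforme"), tsv)

# Namespace-qualified call: this works without library(readr),
# as long as readr is installed.
projects <- readr::read_tsv(tsv, col_types = "cc")
```

The `::` form keeps readr off the search path, so the calling code does not depend on what the user happens to have attached.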

That said, the GDC recently changed the underlying project data in a way that likely breaks most of the code that parses the projects into a nice data.frame. Note the many columns named like disease_type.x. Because the response now has far more columns than the eight that col_types = "cccccccc" describes, this change is probably why removing col_types removes some of the warnings.
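
One way to avoid hard-coding the column count is to give readr a default column type rather than one letter per column. The sketch below uses a small made-up table as a stand-in for the GDC response (the real one has many more columns), and it is only a possible workaround, not necessarily what TCGAbiolinks does.

```r
# Toy stand-in for the projects table: three columns instead of the eight
# that a fixed col_types = "cccccccc" string would assume.
tsv <- tempfile(fileext = ".tsv")
writeLines(c(
  "project_id\tdisease_type.0\tdisease_type.1",
  "TCGA-UVM\tUveal Melanoma\t",
  "TCGA-STAD\tStomach Adenocarcinoma\t"
), tsv)

# Read every column as character, however many columns are present.
projects <- readr::read_tsv(
  tsv,
  col_types = readr::cols(.default = readr::col_character())
)
```

With a default type, the column specification no longer has to match the number of columns the server returns, so a schema change adds columns rather than triggering a length mismatch.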

20 days ago by jacorvar (European Union):

That specific case does not raise any errors, but the following does:

# import the read_tsv function from the readr package
> backports::import("readr", "read_tsv")

# load the TCGAbiolinks package
> library(TCGAbiolinks)

# lines 4-8 from listing 1 in https://f1000research.com/articles/5-1542/v2
> query.met.gbm <- GDCquery(project = "TCGA-GBM",
+                           legacy = TRUE,
+                           data.category = "DNA methylation",
+                           platform = "Illumina Human Methylation 450",
+                           barcode = c("TCGA-76-4926-01B-01D-1481-05",
+                                       "TCGA-28-5211-01C-11D-1844-05"))


I am unable to use the GDCquery function without errors.

I guess this issue is directly related to my being unable to retrieve the projects with getGDCprojects(), since the error comes from the checkProjectInput function, which calls getGDCprojects().


The output of the GDCquery function is:

--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------

Genome of reference: hg19
Warning: 40 parsing failures.
row  col   expected   actual
  1  <NA>  8 columns  71 columns
  2  <NA>  8 columns  71 columns
  3  <NA>  8 columns  71 columns
  4  <NA>  8 columns  71 columns
  5  <NA>  8 columns  71 columns
# ... with 1 more variables: file <chr>
See problems(...) for more details.
(comment by jacorvar, 20 days ago)

And also:

|disease_type.2              |disease_type.5                      |disease_type.4       |disease_type.6               |
|:---------------------------|:-----------------------------------|:--------------------|:----------------------------|
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
(comment by jacorvar, 20 days ago)

And finally:

|Thymic Epithelial Neoplasms |Complex Mixed and Stromal Neoplasms |Basal Cell Neoplasms |Ductal and Lobular Neoplasms |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
|NA                          |NA                                  |NA                   |NA                           |
Error in checkProjectInput(project) :
  Please set a valid project argument from the column id above. Project TCGA-GBM was not found.
In addition: Warning messages:
1: Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.
2: In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 1)
3: Unknown or uninitialised column: 'project_id'.
4: Unknown or uninitialised column: 'project_id'.
(comment by jacorvar, 20 days ago)