Using TCGAbiolinks for exon- and oligonucleotide-level microarrays
Fernando • 0

Hello, I'm starting a project that involves analysing microarray gene expression data, and I began with TCGAbiolinks. I have downloaded the data successfully from TCGA and am now trying to prepare it with the function TCGAprepare. For probe-level microarrays this works fine (as long as I set summarizedExperiment to FALSE; otherwise I get an error), but for the oligonucleotide and exon microarray platforms it returns nothing (an empty object). Is this because the package isn't intended to deal with this kind of data, or am I doing something wrong?

tcga tcgabiolinks microarray exons oligonucleotide • 1.4k views

Hi Fernando, TCGAprepare doesn't support all platforms yet. Do you have an example for each of these two kinds of platform?

  • oligonucleotide microarrays
  • exon microarrays

There is a list of platforms in the vignette, but I'm not sure which ones you mean. I'll try to add them asap.


Hello, thanks for answering me again. For example, for exon microarrays, I ran

TCGAquery(tumor = "gbm", platform = "HuEx-1_0-st-v2", level = 3) (which works perfectly, as does the TCGAdownload function)

but I get a NULL object when I use TCGAprepare. It isn't a big problem, because I can try to build the matrix of experiments myself. What I don't understand is something in the downloaded files: the number of experiment barcodes in a .txt file is smaller than the number of signal measurements in that file. Could you explain why this happens?


Yes, there was no code for this platform ("HuEx-1_0-st-v2").

I just added the code to the GitHub repository.

I'll add it to Bioconductor soon.


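library(TCGAbiolinks)

# Query level 3 exon-array data for GBM
query <- TCGAquery(tumor = "gbm", platform = "HuEx-1_0-st-v2", level = 3)

# Gene-level files: download, then read them from the current directory
TCGAdownload(query, type = "gene")
gene_data <- TCGAprepare(query, dir = ".", type = "gene")

# FIRMA files: same steps, returning a plain object instead of a SummarizedExperiment
TCGAdownload(query, type = "FIRMA")
firma_data <- TCGAprepare(query, dir = ".", type = "FIRMA", summarizedExperiment = FALSE)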

Thank you very much, tiagochst. Since we are asking... could you please also do something for these platforms:
HG-CGH-244A, HG-CGH-415k_G4124A, CGH-1x1M_G4447A ?

I see that you also added the "gene" option for type, which wasn't there before; in here and here the package always seemed more oriented to RNA-seq data.

One last thing (maybe this is too much, but who knows?): could you recommend some reading to better understand the methods behind the package? I would also like to learn how things are done. For example, I don't know how the TCGAprepare function joins the exon microarray experiments with their respective TCGA codes (I know you have covered this, but I don't understand how). I am very, very new to this subject.


Hi Fernando.

I added the platforms. They are on GitHub and will be in Bioconductor soon.


Yes, the type argument was more RNA-seq oriented. But since other platforms sometimes have more than one data type, we are trying to provide both to users. We need to improve/update the vignette.

As I understood it, the question is how I linked the experiments (the files with data) to the barcodes (I'm sorry if I got it wrong). To map the barcodes to the experiments there are three situations:

  • Barcode in the filename
  • UUID in the filename: we then map the UUID to the TCGA barcode using the TCGA API
  • SDRF files in the mage folder: each experiment normally has a folder with the metadata that maps each file to its TCGA barcode. Example for HuEx-1_0-st-v2
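
As a rough illustration of the first and third situations, here is a minimal base R sketch. It is not the code used inside TCGAprepare: the file names, the two-column SDRF-like table and its column names (file, barcode) are all made up for the example, and the UUID case is left out because it needs a call to the TCGA API.

```r
# Hypothetical data file names, invented for illustration only.
files <- c(
  "TCGA-02-0001-01C-01R-0179-03.gene.txt",  # barcode embedded in the name
  "array_scan_0042.FIRMA.txt"               # no barcode in the name
)

# Situation 1: barcode in the filename.
# Aliquot barcodes look like TCGA-02-0001-01C-01R-0179-03.
barcode_pattern <- paste0(
  "TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-",
  "[0-9]{2}[A-Z]-[0-9]{2}[A-Z]-[A-Z0-9]{4}-[0-9]{2}"
)
hit <- regexpr(barcode_pattern, files)
from_name <- rep(NA_character_, length(files))
from_name[hit > 0] <- regmatches(files, hit)

# Situation 3: look the file up in an SDRF-like table.
# Real SDRF files have many more columns and different headers;
# this simplified two-column table is an assumption.
sdrf <- data.frame(
  file    = "array_scan_0042.FIRMA.txt",
  barcode = "TCGA-06-0125-01A-01R-0849-03",
  stringsAsFactors = FALSE
)
from_sdrf <- sdrf$barcode[match(files, sdrf$file)]

# Prefer the barcode found in the name, fall back to the SDRF lookup.
barcodes <- ifelse(is.na(from_name), from_sdrf, from_name)
names(barcodes) <- files
print(barcodes)
```

Each data file ends up labelled with one barcode, which is what lets the columns of the final matrix be named by sample.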

These cases can more or less be seen in the package's source code.

In TCGAprepare we only read the data; no normalization is done. If further steps need to be taken, it is up to the user.
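
To give an idea of what such a further step could look like, here is a small base R sketch of a log2 transform followed by quantile normalization on a toy matrix. This is not part of TCGAbiolinks; the matrix is simulated, the function name quantile_normalize is made up for the example, and whether any normalization is appropriate depends on the platform and data level you downloaded.

```r
# Toy expression matrix (genes x samples), simulated for illustration only.
set.seed(1)
expr <- matrix(
  2^rnorm(20, mean = 8),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4))
)

log_expr <- log2(expr)

# Quantile normalization: give every sample the same empirical distribution,
# namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each sample
  sorted <- apply(x, 2, sort)                         # sort each sample
  target <- rowMeans(sorted)                          # reference distribution
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

norm_expr <- quantile_normalize(log_expr)
print(round(norm_expr, 2))
```

After this step every column contains the same set of values, only in a different order, so differences in overall intensity between arrays are removed.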

Also, there is another database called Firehose. Firehose takes TCGA data and sometimes applies some processing steps; you can also download the data from there with other packages (TCGA2STAT, RTCGAToolbox).




Thanks, tiagochst, for your answer (which was very fast) and for the changes to the package that make my life easier. You got it right; that was exactly what I was asking.

I know (from what I read) that the package is not intended, at least for now, to normalize data for all platforms. In fact I only want a package to download and prepare the information from TCGA in R, so TCGAbiolinks fits my needs perfectly.

I have read about the packages you mention, but I get the impression that they don't work on Windows (I prefer to work on Windows). Am I right? Even if they did, I'm happy with your package.

Thanks again for your answer and explanations. By the way (so as not to lose the habit), I tried to install the package from GitHub, but I got an error:

> devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks")
Downloading GitHub repo BioinformaticsFMRP/TCGAbiolinks@master
from URL
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached

What should I do?
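
One thing that may be worth trying if the timeout is transient (a slow or flaky connection rather than a proxy or firewall problem) is simply to retry the call a few times. The sketch below is only a suggestion: the retry helper is a made-up name written in base R, not a devtools function, and the install call is left commented out because it needs network access.

```r
# Hypothetical helper: call f() up to `attempts` times, waiting `wait`
# seconds between failed attempts, and re-raise the last error if all fail.
retry <- function(f, attempts = 3, wait = 5) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait)
  }
  stop(result)
}

# Intended use (not run here, needs network access and devtools):
# retry(function() devtools::install_github("BioinformaticsFMRP/TCGAbiolinks"))

# Offline demonstration with a function that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Timeout was reached")
  "ok"
}
outcome <- retry(flaky, attempts = 3, wait = 0)
print(outcome)
```

If every attempt fails in the same way, the cause is more likely the network setup than the package itself.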

