Question

Using TCGAbiolinks for exon and oligonucleotide level microarray

0

Fernando • 0

@fernando-10107

Last seen 6.9 years ago

Cuba

Hello, I'm starting a project involving the analysis of microarray gene expression data, and I began with TCGAbiolinks. I have successfully downloaded the data from TCGA and am now trying to prepare it with the TCGAprepare function. For probe-level microarrays this works (as long as I set summarizedExperiment to FALSE; otherwise I get an error), but for the oligonucleotide and exon microarray platforms it returns nothing (an empty object). Is this because the package isn't intended to handle this kind of data, or am I doing something wrong?

tcga tcgabiolinks microarray exons oligonucleotide • 2.0k views

ADD COMMENT • link updated 7.9 years ago by Tiago C. Silva ▴ 270 • written 8.0 years ago by Fernando • 0

0

Hi Fernando, TCGAprepare does not support every platform yet. Can you give an example for each of these two?

platforms of oligonucleotide
exon microarrays

There is a list of platforms in the vignette, but I'm not sure which ones you mean. I'll try to add them as soon as possible.

ADD REPLY • link 8.0 years ago Tiago C. Silva ▴ 270

0

Hello, thanks for answering me again. For example, for exon microarrays, I ran

TCGAquery(tumor = "gbm", platform = "HuEx-1_0-st-v2", level=3) (which works perfectly, as does the TCGAdownload function)

but I get a NULL object when I use TCGAprepare. That isn't a big problem, because I can try to build the matrix of experiments myself. What I don't understand is something in the downloaded files: the number of experiment barcodes in a .txt file is smaller than the number of signal measurements in that file. Could you explain why this happens?

ADD REPLY • link 7.9 years ago Fernando • 0

1

Yes, there was no code for this platform ("HuEx-1_0-st-v2").

I just added the code to the GitHub repository (https://github.com/BioinformaticsFMRP/TCGAbiolinks).

I'll add it to Bioconductor soon.

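library(TCGAbiolinks)

# Gene-level data: query, download, then prepare
query <- TCGAquery(tumor = "gbm", platform = "HuEx-1_0-st-v2", level = 3)
TCGAdownload(query, type = "gene")
data <- TCGAprepare(query, dir = ".", type = "gene")

# FIRMA data: prepared without building a SummarizedExperiment
TCGAdownload(query, type = "FIRMA")
data <- TCGAprepare(query, dir = ".", type = "FIRMA", summarizedExperiment = FALSE)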

ADD REPLY • link 7.9 years ago Tiago C. Silva ▴ 270

0

Thank you very much, tiagochst. Since we are asking... could you please also do something for these platforms:
HG-CGH-244A, HG-CGH-415k_G4124A, CGH-1x1M_G4447A?

I see that you also added the gene option for type, which wasn't there before; in here and here the package always seems more oriented to RNA-seq data.

One last thing (maybe this is too much, but who knows?): could you recommend some reading to better understand the methods behind the package? I would also like to learn how things are done. For example, I don't know how the TCGAprepare function joins the exon microarray experiments with their respective TCGA codes (I know you have handled it, but I don't understand how). I am very new to this subject.

ADD REPLY • link 7.9 years ago Fernando • 0

score 1 · Answer 1 · 2016-05-16

Hi Fernando.

I added the platforms; they are on GitHub and will be in Bioconductor soon.

Example:

Yes, the type argument was more RNA-seq oriented. But since other platforms sometimes have more than one data type, we are trying to provide both to users. We need to improve/update the vignette.

What I understood from the question is that you are asking how I linked the experiments (the data files) to the barcodes (I'm sorry if I got it wrong). To map the barcodes to the experiments, there are three situations:

Barcode in the file name
UUID in the file name; we then map the UUID to the TCGA barcode using the TCGA API
SDRF files in the MAGE folder: each experiment normally has a folder with the metadata that maps each file to its TCGA barcode. Example for HuEx-1_0-st-v2
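
As a rough illustration of the first and third situations, here is a minimal base R sketch. It is not the code used inside TCGAprepare. The file names, the barcodes, and the two-column `sdrf` table (with hypothetical column names `file` and `barcode`) are made-up toy values; a real SDRF file has many more columns, and its column names should be checked before use. The UUID case is left out because it needs a call to a web service.

```r
# Hypothetical file names, as they might appear in a download folder.
files <- c(
  "TCGA-02-0001-01C-01R-0177-03.gene.txt",  # barcode embedded in the name
  "sample_A.FIRMA.txt",                     # no barcode: needs the SDRF table
  "sample_B.FIRMA.txt"
)

# Situation 1: extract a TCGA aliquot barcode directly from the file name.
barcode_pattern <- paste0(
  "TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-[0-9]{2}[A-Z]-",
  "[0-9]{2}[A-Z]-[A-Z0-9]{4}-[0-9]{2}"
)
hits <- regexpr(barcode_pattern, files)
from_name <- rep(NA_character_, length(files))
from_name[hits > 0] <- regmatches(files, hits)

# Situation 3: look the file up in an SDRF-like table (toy stand-in here;
# in practice this would be read from the SDRF file in the MAGE folder).
sdrf <- data.frame(
  file    = c("sample_A.FIRMA.txt", "sample_B.FIRMA.txt"),
  barcode = c("TCGA-02-0003-01A-01R-0177-03", "TCGA-02-0006-01B-01R-0177-03"),
  stringsAsFactors = FALSE
)
from_sdrf <- sdrf$barcode[match(files, sdrf$file)]

# Prefer the barcode found in the name, fall back to the SDRF lookup.
barcodes <- ifelse(is.na(from_name), from_sdrf, from_name)
mapping <- data.frame(file = files, barcode = barcodes, stringsAsFactors = FALSE)
print(mapping)
```

Files that match neither route end up with an NA barcode, which makes unmapped experiments easy to spot before building an expression matrix.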

More or less these cases can be seen in this code

In TCGAprepare we only read the data; no normalization is done. Any further steps are up to the user.

Also, there is another database called Firehose (https://gdac.broadinstitute.org/). Firehose takes TCGA data and sometimes applies some processing steps; you can also download the data from there with other packages (TCGA2STAT, RTCGAToolbox).