Question: Analysis using TCGAbiolinks
fawazfebin wrote (9 months ago):

Hi 

I would like to analyse all the available triple-negative breast cancer data sets in TCGA using TCGAbiolinks. I couldn't find an option for this type of cancer in the manual. Can anyone help, please? Thanks in advance.

antonio.colaprico (USA/ Florida/ University of Miami Hospital) wrote (9 months ago):

Hi fawazfebin, thank you for your interest in our tool TCGAbiolinks, and thanks to pb.panigrahi86 for helping to find a solution. Here is the code to obtain the triple-negative TCGA-BRCA samples; I will also add it to our manual. If you have any other questions or issues, please post them here or at https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues, where the GitHub community may be able to respond promptly if Tiago (tiagochst) or I cannot. Have a nice day and good work. Best, Antonio.

library(TCGAbiolinks)

#-------------------  4.1 Parameter Definition                 --------------------

CancerProject <- "TCGA-BRCA"
DataDirectory <- paste0("GDC_",gsub("-","_",CancerProject))
FileNameData <- paste0(DataDirectory, "_","Illumina HiSeq",".rda")

# Query all Illumina HiSeq RNA-Seq expression files for the project (no barcode filter yet)
query <- GDCquery(project = CancerProject, 
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq", 
                  file.type = "results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)

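# Barcodes of every sample returned by the query
samplesDown <- query$results[[1]]$cases

# Molecular subtype table for BRCA (ER, PR and HER2 status per patient)
dataAssy.sub <- TCGAquery_subtype(tumor = gsub("TCGA-","",CancerProject))

# Patients that are negative for each receptor
dataERneg <- dataAssy.sub[dataAssy.sub$ER.Status %in% "Negative",]
dataPRneg <- dataAssy.sub[dataAssy.sub$PR.Status %in% "Negative",]
dataHER2neg <- dataAssy.sub[dataAssy.sub$HER2.Final.Status %in% "Negative",]

# Triple negative: patients that are negative for all three
dataTNBC <- Reduce(intersect, list(dataERneg$patient, 
                                   dataPRneg$patient,
                                   dataHER2neg$patient))

# Primary solid tumor (TP) barcodes
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

# Solid tissue normal (NT) barcodes
dataSmNT <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "NT")

# Tumor barcodes whose patient ID (first 12 characters) is in the TNBC list
dataSmTP_TNBC <- dataSmTP[substr(dataSmTP,1,12) %in% dataTNBC]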

queryDown <- GDCquery(project = CancerProject, 
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq", 
                      file.type = "results",
                      barcode = c(dataSmTP_TNBC, dataSmNT),  # TNBC tumors plus normal samples
                      experimental.strategy = "RNA-Seq",
                      legacy = TRUE)

# Download the selected files, then assemble them into one object saved as .rda
GDCdownload(query = queryDown,
            directory = DataDirectory)

dataPrep <- GDCprepare(query = queryDown, 
                       save = TRUE, 
                       directory =  DataDirectory,
                       save.filename = FileNameData)
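
The barcode-matching step above (`substr(dataSmTP,1,12) %in% dataTNBC`) may be easier to follow on a toy input. Below is a minimal base-R sketch, not part of TCGAbiolinks. The barcodes and patient IDs are made up, and it assumes the standard TCGA barcode layout, in which the first 12 characters identify the patient and characters 14-15 hold the sample type code (01 for primary tumor, 11 for solid tissue normal).

```r
# Hypothetical barcodes for illustration only; these are not real TCGA samples.
barcodes <- c("TCGA-AA-0001-01A-11R-0000-07",  # patient 0001, primary tumor (01)
              "TCGA-AA-0002-01A-11R-0000-07",  # patient 0002, primary tumor (01)
              "TCGA-AA-0002-11A-11R-0000-07",  # patient 0002, normal tissue (11)
              "TCGA-AA-0003-01A-11R-0000-07")  # patient 0003, primary tumor (01)

# Hypothetical list of triple-negative patients (12-character patient barcodes)
tnbc_patients <- c("TCGA-AA-0001", "TCGA-AA-0003")

# Characters 14-15 of the barcode carry the sample type code
sample_type <- substr(barcodes, 14, 15)
tumor  <- barcodes[sample_type == "01"]
normal <- barcodes[sample_type == "11"]

# Keep only tumor barcodes whose patient part is in the TNBC list
tumor_tnbc <- tumor[substr(tumor, 1, 12) %in% tnbc_patients]

print(tumor_tnbc)
print(normal)
```

In the real workflow, `TCGAquery_SampleTypes` does the tumor/normal split and `dataTNBC` plays the role of `tnbc_patients`.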

 

 

Many thanks, Antonio, for the specific code. Is gene expression the only type of data available for triple-negative breast cancer?

 

Reply written 9 months ago by fawazfebin

I got the following warning message while running one of the commands above:

> queryDown <- GDCquery(project = CancerProject, 
+                       data.category = "Gene expression",
+                       data.type = "Gene expression quantification",
+                       platform = "Illumina HiSeq", 
+                       file.type = "results",
+                       barcode = c(dataSmTP_TNBC, dataSmTP),
+                       experimental.strategy = "RNA-Seq",
+                       legacy = TRUE)
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-BRCA
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By experimental.strategy
ooo By data.type
ooo By file.type
ooo By barcode
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
Warning: There are more than one file for the same case. Please verify query results. You can use the command View(getResults(query)) in rstudio
ooo Check if there results for the query
-------------------
o Preparing output
-------------------

Is there anything I need to fix at this point? Many thanks in advance.

Reply written 9 months ago by fawazfebin
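
The warning in the log above points to `getResults(query)` as the place to look. As a rough sketch of how the affected cases could be listed, the base-R snippet below works on a toy data frame standing in for the real result table. The `cases` column is the one the code earlier in the thread reads from `query$results[[1]]`; the `file_name` column and all values are assumptions made for illustration.

```r
# Toy stand-in for the query result table; values are made up.
res <- data.frame(
  cases = c("TCGA-AA-0001-01A-11R-0000-07",
            "TCGA-AA-0001-01A-11R-0000-07",   # same case listed twice
            "TCGA-AA-0003-01A-11R-0000-07"),
  file_name = c("file_a.txt", "file_b.txt", "file_c.txt"),  # hypothetical column
  stringsAsFactors = FALSE
)

# Cases that appear in more than one row
dup_cases <- unique(res$cases[duplicated(res$cases)])

# All rows belonging to those cases, for manual inspection
dup_rows <- res[res$cases %in% dup_cases, ]

print(dup_rows)
```

With the real query, `res` would presumably come from `getResults(queryDown)`, and the rows in `dup_rows` would show which files share a case.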
pb.panigrahi86 wrote (9 months ago):

Using the clinical data, filter for the samples whose ER, PR and HER2 statuses are all negative. Once you have the sample IDs, you can use them to fetch the data for those samples.

Meanwhile, I will see if I can provide you sample code for doing that.
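
The filtering step described above could look roughly like the following base-R sketch. It runs on a toy table rather than real clinical data: the column names are borrowed from the code posted earlier in this thread, and the patient IDs and status values are made up.

```r
# Toy status table; patient IDs and values are hypothetical.
subtypes <- data.frame(
  patient           = c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003", "TCGA-AA-0004"),
  ER.Status         = c("Negative", "Positive", "Negative", "Negative"),
  PR.Status         = c("Negative", "Negative", "Negative", NA),
  HER2.Final.Status = c("Negative", "Negative", "Negative", "Negative"),
  stringsAsFactors  = FALSE
)

# %in% returns FALSE for NA, so patients with a missing status are excluded
is_tnbc <- subtypes$ER.Status %in% "Negative" &
           subtypes$PR.Status %in% "Negative" &
           subtypes$HER2.Final.Status %in% "Negative"

# Patient IDs that are negative for all three receptors
tnbc_patients <- subtypes$patient[is_tnbc]

print(tnbc_patients)
```

Using `%in%` instead of `==` keeps missing values from turning into `NA` row selections.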

Thanks for the help. I would be grateful if you could provide the code.
Reply written 9 months ago by fawazfebin