Question: Analysis using TCGAbiolinks
1
12 months ago by
fawazfebin30
fawazfebin30 wrote:

Hi 

I would like to analyse all the available triple-negative breast cancer data sets in TCGA using TCGAbiolinks. I couldn't find an option for this type of cancer in the manual. Can anyone help, please? Thanks in advance.

ADD COMMENTlink modified 12 months ago by antonio.colaprico40 • written 12 months ago by fawazfebin30
2
12 months ago by
USA/ Florida/ University of Miami Hospital
antonio.colaprico40 wrote:

Hi fawazfebin, thank you for your interest in our tool TCGAbiolinks, and thanks to pb.panigrahi86 for helping to find a solution. I am sharing here the code to obtain the triple-negative TCGA-BRCA samples, and I will add it to our manual. If you have any other questions or issues, please write here or at https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues, where, if Tiago (tiagochst) or I cannot answer, you may still get a prompt response from the GitHub community. Have a nice day and good work. Best, Antonio.

library(TCGAbiolinks)

#-------------------  4.1 Parameter Definition                 --------------------

CancerProject <- "TCGA-BRCA"
DataDirectory <- paste0("GDC_",gsub("-","_",CancerProject))
FileNameData <- paste0(DataDirectory, "_","Illumina HiSeq",".rda")

# Query the Illumina HiSeq platform to list all available sample barcodes
query <- GDCquery(project = CancerProject, 
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq", 
                  file.type = "results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)

samplesDown <- query$results[[1]]$cases

dataAssy.sub <- TCGAquery_subtype(tumor = gsub("TCGA-","",CancerProject))

dataERneg <- dataAssy.sub[dataAssy.sub$ER.Status %in% "Negative",]
dataPRneg <- dataAssy.sub[dataAssy.sub$PR.Status %in% "Negative",]
dataHER2neg <- dataAssy.sub[dataAssy.sub$HER2.Final.Status %in% "Negative",]

dataTNBC <- Reduce(intersect, list(dataERneg$patient, 
                                   dataPRneg$patient,
                                   dataHER2neg$patient))

dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

dataSmNT <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "NT")

dataSmTP_TNBC <- dataSmTP[substr(dataSmTP,1,12) %in% dataTNBC]

queryDown <- GDCquery(project = CancerProject, 
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq", 
                      file.type = "results",
                      barcode = c(dataSmTP_TNBC, dataSmNT),
                      experimental.strategy = "RNA-Seq",
                      legacy = TRUE)

GDCdownload(query = queryDown,
            directory = DataDirectory)

dataPrep <- GDCprepare(query = queryDown, 
                       save = TRUE, 
                       directory =  DataDirectory,
                       save.filename = FileNameData)
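
The subsetting on `substr(dataSmTP,1,12)` works because the first 12 characters of a TCGA barcode identify the patient, while characters 14-15 carry the sample-type code (01 for primary solid tumor, 11 for solid tissue normal). A minimal base-R sketch of that logic follows. The barcodes and the `tnbc_patients` vector are made up for illustration and stand in for real query results.

```r
# Made-up barcodes in the TCGA format; these are not real samples.
barcodes <- c("TCGA-XX-0001-01A-11R-0000-07",
              "TCGA-XX-0001-11A-11R-0000-07",
              "TCGA-XX-0002-01A-11R-0000-07",
              "TCGA-XX-0003-11A-11R-0000-07")

# Hypothetical list of triple-negative patients (12-character IDs).
tnbc_patients <- c("TCGA-XX-0001")

# Characters 14-15 hold the sample-type code.
sample_code <- substr(barcodes, 14, 15)

tumor_barcodes  <- barcodes[sample_code == "01"]  # primary solid tumor
normal_barcodes <- barcodes[sample_code == "11"]  # solid tissue normal

# Keep only tumor samples whose patient ID is in the TNBC list.
tnbc_tumor_barcodes <-
  tumor_barcodes[substr(tumor_barcodes, 1, 12) %in% tnbc_patients]

print(tnbc_tumor_barcodes)
```

In the answer's code, `TCGAquery_SampleTypes` performs the sample-type split and `dataTNBC` plays the role of `tnbc_patients`.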

ADD COMMENTlink written 12 months ago by antonio.colaprico40

 

Many thanks, Antonio, for the specific code. Is gene expression the only data type available for triple-negative breast cancer?

 

ADD REPLYlink written 12 months ago by fawazfebin30

I got the following warning message while running one of the commands above:

> queryDown <- GDCquery(project = CancerProject, 
+                       data.category = "Gene expression",
+                       data.type = "Gene expression quantification",
+                       platform = "Illumina HiSeq", 
+                       file.type = "results",
+                       barcode = c(dataSmTP_TNBC, dataSmTP),
+                       experimental.strategy = "RNA-Seq",
+                       legacy = TRUE)
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-BRCA
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By experimental.strategy
ooo By data.type
ooo By file.type
ooo By barcode
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
Warning: There are more than one file for the same case. Please verify query results. You can use the command View(getResults(query)) in rstudio
ooo Check if there results for the query
-------------------
o Preparing output
-------------------

Is there anything I need to resolve at this point? Many thanks in advance.
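
One way to follow the warning's suggestion is to look for repeated entries in the `cases` column of the query results. The sketch below runs on a small made-up table so that it works offline. With a real query, the table would presumably come from `getResults(queryDown)`; the `file_name` column is an assumption about that table's layout.

```r
# Made-up stand-in for a query results table; not real GDC output.
results <- data.frame(
  cases = c("TCGA-XX-0001-01A-11R-0000-07",
            "TCGA-XX-0002-01A-11R-0000-07",
            "TCGA-XX-0001-01A-11R-0000-07"),
  file_name = c("file_a.txt", "file_b.txt", "file_c.txt"),
  stringsAsFactors = FALSE
)

# Barcodes that appear more than once in the results.
dup_cases <- unique(results$cases[duplicated(results$cases)])

# All rows belonging to those barcodes, for manual inspection.
dup_rows <- results[results$cases %in% dup_cases, ]
print(dup_rows)
```

If the duplicates turn out to be the same barcode listed twice in the `barcode` argument, passing `unique()` over that vector before querying is a simple thing to try.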

ADD REPLYlink written 11 months ago by fawazfebin30
0
12 months ago by
pb.panigrahi860 wrote:

From clinical data, you have to filter samples whose ER/PR/HER2 status is negative. Once you get sample ids, you can use these to fetch data for these samples.

Meanwhile, I will see if I can provide you sample code for doing that.
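
As a rough illustration of that filtering step, here is a base-R sketch on a toy clinical table. The column names follow those used in Antonio's answer (`ER.Status`, `PR.Status`, `HER2.Final.Status`); the patient IDs and values are made up.

```r
# Toy clinical table; patient IDs and statuses are invented for illustration.
clinical <- data.frame(
  patient = c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003", "TCGA-AA-0004"),
  ER.Status = c("Negative", "Positive", "Negative", "Negative"),
  PR.Status = c("Negative", "Negative", "Negative", NA),
  HER2.Final.Status = c("Negative", "Negative", "Positive", "Negative"),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for NA, so patients with a missing status are excluded.
is_tnbc <- clinical$ER.Status %in% "Negative" &
  clinical$PR.Status %in% "Negative" &
  clinical$HER2.Final.Status %in% "Negative"

tnbc_patients <- clinical$patient[is_tnbc]
print(tnbc_patients)
```

Only the patient that is negative for all three receptors is kept; the resulting IDs can then be matched against sample barcodes to fetch the data.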

ADD COMMENTlink written 12 months ago by pb.panigrahi860
1
Thanks for the help. I will be grateful if you could provide the code.
ADD REPLYlink written 12 months ago by fawazfebin30