Hi

I would like to analyse all the available triple-negative breast cancer data sets in TCGA using TCGAbiolinks. I couldn't find an option for this type of cancer in the manual. Can anyone help, please? Thanks in advance.

Hi fawazfebin, thank you for your interest in our tool TCGAbiolinks, and thanks to pb.panigrahi86 for helping to find a solution. Here is the code to obtain the triple-negative TCGA-BRCA samples; I will also add it to our manual. If you have any other questions or issues, please post them here or at https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues, where, if Tiago (tiagochst) or I cannot answer, you may get a prompt response from the GitHub community as well. Have a nice day and good work. Best, Antonio.

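#-------------------  4.1 Parameter Definition                 --------------------

library(TCGAbiolinks)

CancerProject <- "TCGA-BRCA"

# Output file for GDCprepare (example name; choose your own)
FileNameData <- paste0(CancerProject, "_TNBC_IlluminaHiSeq.rda")

# Query all Illumina HiSeq RNA-Seq gene expression files for the project
query <- GDCquery(project = CancerProject,
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)

# Barcodes of all samples returned by the query
samplesDown <- query$results[[1]]$cases

# Molecular subtype annotation for BRCA
dataAssy.sub <- TCGAquery_subtype(tumor = gsub("TCGA-", "", CancerProject))

# Patients that are negative for each receptor
dataERneg <- dataAssy.sub[dataAssy.sub$ER.Status %in% "Negative", ]
dataPRneg <- dataAssy.sub[dataAssy.sub$PR.Status %in% "Negative", ]
dataHER2neg <- dataAssy.sub[dataAssy.sub$HER2.Final.Status %in% "Negative", ]

# Triple negative = negative for ER, PR and HER2
dataTNBC <- Reduce(intersect, list(dataERneg$patient,
                                   dataPRneg$patient,
                                   dataHER2neg$patient))

# Primary solid tumor (TP) and solid tissue normal (NT) barcodes
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

dataSmNT <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "NT")

# TP barcodes whose patient ID (first 12 characters) is triple negative
dataSmTP_TNBC <- dataSmTP[substr(dataSmTP, 1, 12) %in% dataTNBC]

# Note: dataSmTP already contains every barcode in dataSmTP_TNBC, so the
# barcode vector below selects all primary tumors. Use barcode = dataSmTP_TNBC
# for TNBC tumors only, or c(dataSmTP_TNBC, dataSmNT) to add the normal samples.
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      barcode = c(dataSmTP_TNBC, dataSmTP),
                      experimental.strategy = "RNA-Seq",
                      legacy = TRUE)

# Download the files, then read them into a single object
GDCdownload(queryDown)

dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       save.filename = FileNameData)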
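
The barcode handling in the code above relies on the TCGA barcode layout: the first 12 characters identify the patient, and characters 14-15 encode the sample type (01 for primary solid tumor, 11 for solid tissue normal). A minimal base-R sketch of the same kind of selection, using made-up barcodes and a hypothetical patient list rather than TCGAbiolinks itself, might look like this:

```r
# Toy barcodes (invented for illustration; not real query results)
barcodes <- c("TCGA-AA-0001-01A-11R-A000-07",
              "TCGA-AA-0001-11A-11R-A000-07",
              "TCGA-BB-0002-01A-11R-A000-07",
              "TCGA-CC-0003-01B-11R-A000-07")

# Hypothetical triple-negative patient IDs (first 12 characters of a barcode)
tnbc_patients <- c("TCGA-AA-0001", "TCGA-CC-0003")

sample_type <- substr(barcodes, 14, 15)  # "01" = primary tumor, "11" = normal

tumor_barcodes  <- barcodes[sample_type == "01"]
normal_barcodes <- barcodes[sample_type == "11"]

# Primary tumors that belong to a triple-negative patient
tnbc_tumor_barcodes <-
  tumor_barcodes[substr(tumor_barcodes, 1, 12) %in% tnbc_patients]
print(tnbc_tumor_barcodes)
```

This is only meant to show what the substr() matching does; it is not how TCGAquery_SampleTypes is implemented.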

Many thanks, Antonio, for the specific code. Is gene expression the only type of data available for triple-negative breast cancer?

I got the following warning message while running one of the commands above:

> queryDown <- GDCquery(project = CancerProject,
+                       data.category = "Gene expression",
+                       data.type = "Gene expression quantification",
+                       platform = "Illumina HiSeq",
+                       file.type = "results",
+                       barcode = c(dataSmTP_TNBC, dataSmTP),
+                       experimental.strategy = "RNA-Seq",
+                       legacy = TRUE)
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-BRCA
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By experimental.strategy
ooo By data.type
ooo By file.type
ooo By barcode
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
Warning: There are more than one file for the same case. Please verify query results. You can use the command View(getResults(query)) in rstudio
ooo Check if there results for the query
-------------------
o Preparing output
-------------------

Is there anything I need to address at this point? Many thanks in advance.
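
The warning in the log says that at least one case appears in more than one row of the query results. One way to see which cases are affected is to look for repeated values in the `cases` column of the table returned by getResults(queryDown). A minimal sketch on a toy table (the rows and the `file_name` values are invented; only the `cases` column name comes from the code earlier in this thread):

```r
# Toy stand-in for getResults(queryDown); all values are invented
results <- data.frame(
  cases = c("TCGA-AA-0001-01A", "TCGA-AA-0001-01A", "TCGA-BB-0002-01A"),
  file_name = c("file_a.txt", "file_b.txt", "file_c.txt"),
  stringsAsFactors = FALSE
)

# Cases that occur more than once
dup_cases <- unique(results$cases[duplicated(results$cases)])
print(dup_cases)

# All rows belonging to those cases, for manual inspection
print(results[results$cases %in% dup_cases, ])

# One possible clean-up: keep only the first file per case
results_unique <- results[!duplicated(results$cases), ]
```

Whether keeping the first file per case is appropriate depends on why the duplicates exist, so it is worth inspecting the affected rows before dropping anything.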

Thanks for this detailed response, Antonio. Has TCGAquery_subtype() been updated since then? The data frame that I retrieve does not contain the columns you mention ("ER.Status", "PR.Status", "HER2.Final.Status"). Is there another way to retrieve this information? I should add that I also tried:

TCGA_MolecularSubtype("TCGA-60-2721-01A-01R-0851-07") # just using the vignette example barcode

But this returned an empty data frame for some reason; I tried other barcodes, with the same result. I am not sure what is going wrong, so any help would be greatly appreciated!
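
When expected columns seem to be missing, it can help to compare the expected names against what the data frame actually contains and to search for similarly named columns. A minimal sketch with a toy data frame standing in for the output of TCGAquery_subtype() (the column names in `subtypes` are invented for illustration and are not a claim about what any package version returns):

```r
# Toy stand-in for the subtype table; column names are invented
subtypes <- data.frame(
  patient = c("TCGA-AA-0001", "TCGA-BB-0002"),
  er_status_by_ihc = c("Negative", "Positive"),
  stringsAsFactors = FALSE
)

expected <- c("ER.Status", "PR.Status", "HER2.Final.Status")

# Expected columns that are not present in the table
missing_cols <- setdiff(expected, colnames(subtypes))
print(missing_cols)

# Case-insensitive search for columns whose names mention ER, PR or HER2
# as a separate token (delimited by ".", "_" or the start/end of the name)
candidates <- grep("(^|[._])(er|pr|her2)([._]|$)", colnames(subtypes),
                   ignore.case = TRUE, value = TRUE)
print(candidates)
```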

Best,

Ralitsa

Using the clinical data, you have to select the samples whose ER, PR, and HER2 status are all negative. Once you have those sample IDs, you can use them to fetch the data for those samples.

Meanwhile, I will see if I can provide sample code for doing that.
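
The filtering step described above can be sketched in base R on a toy clinical table. The column names follow the ones used earlier in this thread (patient, ER.Status, PR.Status, HER2.Final.Status); the rows are invented, and a real table may use different names or value labels:

```r
# Toy clinical table; all rows are invented for illustration
clinical <- data.frame(
  patient = c("TCGA-AA-0001", "TCGA-BB-0002", "TCGA-CC-0003", "TCGA-DD-0004"),
  ER.Status = c("Negative", "Positive", "Negative", "Negative"),
  PR.Status = c("Negative", "Negative", "Negative", NA),
  HER2.Final.Status = c("Negative", "Negative", "Negative", "Negative"),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for NA, so patients with a missing status are excluded
is_tnbc <- clinical$ER.Status %in% "Negative" &
  clinical$PR.Status %in% "Negative" &
  clinical$HER2.Final.Status %in% "Negative"

tnbc_patients <- clinical$patient[is_tnbc]
print(tnbc_patients)
```

The resulting patient IDs can then be matched against the first 12 characters of the sample barcodes to pick the samples to fetch.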

Thanks for the help. I would be grateful if you could provide the code.