@fawazfebin-14053

Hi

I would like to analyse all the available triple-negative breast cancer (TNBC) data sets in TCGA using TCGAbiolinks, but I couldn't find an option for this cancer subtype in the manual. Can anyone help, please? Thanks in advance.

@antoniocolaprico-14504
USA/ Florida/ University of Miami Hospi…

Hi fawazfebin, thank you for your interest in our tool TCGAbiolinks, and thanks to pb.panigrahi86 for helping to find a solution. Here is the code to obtain the triple-negative samples from TCGA-BRCA; I will also add it to our manual. If you have any other questions or issues, please post them here or at https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues, where, if Tiago (tiagochst) or I cannot answer, you may also get a prompt response from the GitHub community. Have a nice day and good work. Best, Antonio.

#-------------------  4.1 Parameter Definition                 --------------------

library(TCGAbiolinks)

CancerProject <- "TCGA-BRCA"

# Query all Illumina HiSeq RNA-Seq expression results for the project (legacy archive, hg19)
query <- GDCquery(project = CancerProject,
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)

samplesDown <- query$results[[1]]$cases

dataAssy.sub <- TCGAquery_subtype(tumor = gsub("TCGA-","",CancerProject))

# Patients negative for ER, PR and HER2 (triple negative)
dataERneg <- dataAssy.sub[dataAssy.sub$ER.Status %in% "Negative",]
dataPRneg <- dataAssy.sub[dataAssy.sub$PR.Status %in% "Negative",]
dataHER2neg <- dataAssy.sub[dataAssy.sub$HER2.Final.Status %in% "Negative",]
dataTNBC <- Reduce(intersect, list(dataERneg$patient,
                                   dataPRneg$patient,
                                   dataHER2neg$patient))

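# Primary solid tumor (TP) and solid tissue normal (NT) barcodes
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

dataSmNT <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "NT")

# Tumor samples from triple-negative patients (first 12 characters = patient ID)
dataSmTP_TNBC <- dataSmTP[substr(dataSmTP,1,12) %in% dataTNBC]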

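# TNBC tumor samples plus normal tissue samples
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      barcode = c(dataSmTP_TNBC, dataSmNT),
                      experimental.strategy = "RNA-Seq",
                      legacy = TRUE)

# The files must be downloaded before GDCprepare() can read them
GDCdownload(queryDown)

# Output file name (example only; choose any name)
FileNameData <- "TCGA-BRCA_TNBC_HiSeq.rda"

dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       save.filename = FileNameData)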
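The barcode handling above relies on the structure of TCGA barcodes: the first 12 characters identify the patient, and characters 14-15 encode the sample type ("01" is a primary solid tumor, "TP"; "11" is solid tissue normal, "NT"). The base R sketch below is a simplified stand-in for that logic, not a description of how TCGAquery_SampleTypes() is implemented internally; the barcodes, the patient list, and the helper `select_sample_type()` are all made up for illustration. It also shows why combining the TNBC tumors with the normals avoids the repeated barcodes you get from combining them with all tumors.

```r
# Toy barcodes (made up for illustration; not real TCGA samples)
barcodes <- c("TCGA-AA-0001-01A-11R-A000-07",
              "TCGA-AA-0001-11A-11R-A000-07",
              "TCGA-AA-0002-01A-11R-A000-07",
              "TCGA-AA-0003-01A-11R-A000-07",
              "TCGA-AA-0003-11A-11R-A000-07")

# Hypothetical helper: keep barcodes whose sample-type code (characters 14-15)
# equals `code`, e.g. "01" (primary solid tumor) or "11" (solid tissue normal)
select_sample_type <- function(barcodes, code) {
  barcodes[substr(barcodes, 14, 15) == code]
}

tumor  <- select_sample_type(barcodes, "01")
normal <- select_sample_type(barcodes, "11")

# Made-up triple-negative patient IDs (first 12 characters of the barcode)
tnbc_patients <- c("TCGA-AA-0001", "TCGA-AA-0002")

# Tumors from triple-negative patients only
tumor_tnbc <- tumor[substr(tumor, 1, 12) %in% tnbc_patients]

# TNBC tumors plus normals; c(tumor_tnbc, tumor) would instead list every
# TNBC tumor twice and also pull in the non-TNBC tumors
selected <- c(tumor_tnbc, normal)
print(selected)
```

Note that `normal` here keeps normal tissue from all patients, mirroring dataSmNT in the code above.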


Many thanks, Antonio, for the specific code. Is gene expression the only data type available for triple-negative breast cancer?


I got the following warning message while running one of the commands above:

> queryDown <- GDCquery(project = CancerProject,
+                       data.category = "Gene expression",
+                       data.type = "Gene expression quantification",
+                       platform = "Illumina HiSeq",
+                       file.type = "results",
+                       barcode = c(dataSmTP_TNBC, dataSmTP),
+                       experimental.strategy = "RNA-Seq",
+                       legacy = TRUE)
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-BRCA
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By experimental.strategy
ooo By data.type
ooo By file.type
ooo By barcode
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
Warning: There are more than one file for the same case. Please verify query results. You can use the command View(getResults(query)) in rstudio
ooo Check if there results for the query
-------------------
o Preparing output
-------------------

Is there anything I need to address at this point? Many thanks in advance.
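The warning says that some cases (patients) have more than one file in the query results. One quick way to see which patients are involved is to look for exact duplicate barcodes and count barcodes per 12-character patient ID, as sketched below on made-up barcodes. In a real session the barcodes could come from getResults(queryDown), as the warning suggests; the `cases` column name is an assumption based on `query$results[[1]]$cases` in the code above. Note also that c(dataSmTP_TNBC, dataSmTP) lists every TNBC tumor barcode twice, since dataSmTP_TNBC is a subset of dataSmTP.

```r
# Made-up barcodes standing in for the query results (illustration only)
barcodes <- c("TCGA-AA-0001-01A-11R-A000-07",
              "TCGA-AA-0001-11A-11R-A000-07",
              "TCGA-AA-0002-01A-11R-A000-07",
              "TCGA-AA-0002-01A-11R-A000-07",
              "TCGA-AA-0003-01A-11R-A000-07")

# Exact duplicates: the same barcode requested more than once
exact_dups <- unique(barcodes[duplicated(barcodes)])

# Patients (first 12 characters) with more than one barcode,
# e.g. a tumor plus a matched normal, or a repeated barcode
per_patient <- table(substr(barcodes, 1, 12))
multi_patients <- names(per_patient)[per_patient > 1]

print(exact_dups)
print(multi_patients)
```

A patient with both a tumor and a normal sample is expected to appear more than once, so the second check is for inspection rather than a sign that something is wrong.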


Thanks for this detailed response, Antonio. Has TCGAquery_subtype() been updated? The data frame I retrieve does not contain the columns you mention ("ER.Status", "PR.Status", "HER2.Final.Status"). Is there any other way to retrieve this information? I also tried

TCGA_MolecularSubtype("TCGA-60-2721-01A-01R-0851-07") # just using the vignette example barcode

However, this returned an empty data frame. I tried other barcodes with the same result. I'm not sure what is going wrong, so any help would be greatly appreciated!

Best,

Ralitsa

@pbpanigrahi86-14641

From the clinical data, filter the samples whose ER, PR, and HER2 statuses are all negative. Once you have those sample IDs, you can use them to fetch data for just those samples.
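The filtering described above can be sketched in base R on a toy clinical table. The table, its column names (`er`, `pr`, `her2`), and the helper `triple_negative()` are all hypothetical; the real column names depend on where the clinical or subtype data come from (the thread mentions ER.Status, PR.Status and HER2.Final.Status, which may not be present in every version).

```r
# Made-up clinical table; real column names depend on the data source
clinical <- data.frame(
  patient = c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003", "TCGA-AA-0004"),
  er      = c("Negative", "Negative", "Positive", "Negative"),
  pr      = c("Negative", "Negative", "Negative", NA),
  her2    = c("Negative", "Positive", "Negative", "Negative"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IDs of patients whose status columns are all "Negative".
# %in% returns FALSE for NA, so patients with an unknown status are dropped.
triple_negative <- function(df, cols, id_col = "patient") {
  is_neg <- Reduce(`&`, lapply(cols, function(col) df[[col]] %in% "Negative"))
  df[[id_col]][is_neg]
}

tnbc <- triple_negative(clinical, c("er", "pr", "her2"))
print(tnbc)

# Map patient IDs back to sample barcodes (first 12 characters = patient ID)
barcodes <- c("TCGA-AA-0001-01A-11R-A000-07",
              "TCGA-AA-0002-01A-11R-A000-07",
              "TCGA-AA-0004-01A-11R-A000-07")
tnbc_barcodes <- barcodes[substr(barcodes, 1, 12) %in% tnbc]
print(tnbc_barcodes)
```

Treating a missing status as "not negative" is a conservative choice; relaxing it would require deciding how to handle patients with incomplete receptor testing.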

Meanwhile, I will see whether I can provide sample code for doing that.

Thanks for the help. I would be grateful if you could provide the code.