Hi
I would like to analyse all the available triple negative breast cancer data sets in TCGA using TCGAbiolinks. I couldn't find an option for this type of cancer in the manual. Can anyone help, please? Thanks in advance.
Hi fawazfebin, thank you for your interest in our tool TCGAbiolinks, and thanks to pb.panigrahi86 for helping to find a solution. I am sharing here the code to obtain the triple negative TCGA-BRCA samples, and I will add it to our manual. If you have any other questions or issues, please write here or at https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues, where, if Tiago (tiagochst) or I cannot answer, you may still get a prompt response from the GitHub community. Have a nice day and good work. Best, Antonio.
library(TCGAbiolinks)
#------------------- 4.1 Parameter Definition --------------------
CancerProject <- "TCGA-BRCA"
DataDirectory <- paste0("GDC_",gsub("-","_",CancerProject))
FileNameData <- paste0(DataDirectory, "_","Illumina HiSeq",".rda")
# Query all Illumina HiSeq RNA-Seq gene expression files for the project
# (no barcode filter yet; this query is only used to list the available samples)
query <- GDCquery(project = CancerProject,
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)
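# Barcodes of all samples returned by the query
samplesDown <- query$results[[1]]$cases
# Subtype annotation for BRCA (the ER/PR/HER2 status columns are used below)
dataAssy.sub <- TCGAquery_subtype(tumor = gsub("TCGA-","",CancerProject))
# Keep the patients that are negative for each receptor
dataERneg <- dataAssy.sub[dataAssy.sub$ER.Status %in% "Negative",]
dataPRneg <- dataAssy.sub[dataAssy.sub$PR.Status %in% "Negative",]
dataHER2neg <- dataAssy.sub[dataAssy.sub$HER2.Final.Status %in% "Negative",]
# Triple negative = negative for ER, PR and HER2 at the same time
dataTNBC <- Reduce(intersect, list(dataERneg$patient,
                                   dataPRneg$patient,
                                   dataHER2neg$patient))
# Split the sample barcodes into primary tumour (TP) and normal tissue (NT)
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")
dataSmNT <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "NT")
# Primary tumour samples whose patient ID (first 12 characters) is triple negative
dataSmTP_TNBC <- dataSmTP[substr(dataSmTP,1,12) %in% dataTNBC]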
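# Restrict the download to the triple negative tumour samples plus the normal samples.
# (Passing dataSmTP here would pull in every tumour sample and undo the TNBC filter.)
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      barcode = c(dataSmTP_TNBC, dataSmNT),
                      experimental.strategy = "RNA-Seq",
                      legacy = TRUE)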
GDCdownload(query = queryDown,
            directory = DataDirectory)
dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       directory = DataDirectory,
                       save.filename = FileNameData)
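For anyone following the barcode handling in the code above, here is a minimal base R sketch of what the substr(..., 1, 12) step relies on: the first 12 characters of a TCGA barcode identify the patient, and characters 14-15 hold the sample type code (01 for primary tumour, 11 for solid tissue normal). The barcodes and the tnbc_patients vector below are made up for illustration; they are not real TCGA samples, and this does not replace TCGAquery_SampleTypes.

```r
# Toy barcodes, invented for illustration only
barcodes <- c("TCGA-AA-0001-01A-11R-0000-07",
              "TCGA-AA-0001-11A-11R-0000-07",
              "TCGA-BB-0002-01A-11R-0000-07")

# Characters 1-12 identify the patient
patient <- substr(barcodes, 1, 12)

# Characters 14-15 hold the sample type code
sample_code <- substr(barcodes, 14, 15)
is_tumour <- sample_code == "01"   # primary tumour (TP)
is_normal <- sample_code == "11"   # solid tissue normal (NT)

# Hypothetical list of triple negative patients
tnbc_patients <- c("TCGA-AA-0001")

# Tumour samples belonging to a triple negative patient
tumour_tnbc <- barcodes[is_tumour & patient %in% tnbc_patients]
# Normal samples, regardless of patient
normals <- barcodes[is_normal]

print(tumour_tnbc)
print(normals)
```

Matching on the 12-character patient ID is needed because the subtype table is per patient, while the query returns one barcode per sample or aliquot.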
From the clinical data, you have to filter for samples whose ER, PR and HER2 status are all negative. Once you have the sample IDs, you can use them to fetch the data for those samples.
Meanwhile, I will see if I can provide you with sample code for doing that.
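The filtering step described above can be sketched on a toy table as follows. This is only an illustration, not the TCGAbiolinks API: the data frame is invented, and the column names (ER.Status, PR.Status, HER2.Final.Status) are the ones used elsewhere in this thread, which may not match what your version of the package returns. The sketch therefore checks that the columns exist before filtering.

```r
# Toy clinical table, invented for illustration only
clin <- data.frame(
  patient           = c("P1", "P2", "P3", "P4"),
  ER.Status         = c("Negative", "Negative", "Positive", NA),
  PR.Status         = c("Negative", "Negative", "Negative", "Negative"),
  HER2.Final.Status = c("Negative", "Positive", "Negative", "Negative"),
  stringsAsFactors  = FALSE
)

# Fail early if the expected status columns are not present
required <- c("ER.Status", "PR.Status", "HER2.Final.Status")
missing_cols <- setdiff(required, colnames(clin))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# %in% returns FALSE for NA, so patients with unknown status are excluded
is_tn <- clin$ER.Status %in% "Negative" &
  clin$PR.Status %in% "Negative" &
  clin$HER2.Final.Status %in% "Negative"

tnbc_ids <- clin$patient[is_tn]
print(tnbc_ids)
```

Using %in% rather than == is a deliberate choice here: with ==, a missing status yields NA in the logical index, which would introduce NA entries into the result.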
Many thanks, Antonio, for the specific code. May I ask whether gene expression is the only type of data available for triple negative breast cancer?
I got the following warning message while running one of the commands above:
Is there anything I need to address at this point? Many thanks in advance.
Thanks for this detailed response, Antonio. Has TCGAquery_subtype() been updated somehow? The data frame that I retrieve does not contain the columns you mention: "ER.Status", "PR.Status", "HER2.Final.Status". Is there any other way I can retrieve this information? I should say, I also tried
TCGA_MolecularSubtype("TCGA-60-2721-01A-01R-0851-07") # just using the vignette example barcode
But this returned an empty data frame for some reason. I tried other barcodes, with the same result. I am not sure what is going wrong, so any help would be greatly appreciated!
Best,
Ralitsa