17 months ago by
chrisclarkson10030 wrote:

I need to make a statistical comparison using breast cancer data. I have made a heat map on cBioPortal at the following link:

http://www.cbioportal.org/index.do?cancer_study_id=brca_tcga&Z_SCORE_THRESHOLD=2.0&RPPA_SCORE_THRESHOLD=2.0&data_priority=0&case_set_id=brca_tcga_mrna&gene_list=BRCA1%250ABRCA2%250ATP53%250ACCL2%250ACCR3%250ACD44%250AENG%250AIL6%250AIL33%250ACD33%250ACSF1%250AHIF1A%250ACLEC7A&geneset_list=%20&tab_index=tab_visualize&Action=Submit&genetic_profile_ids_PROFILE_MRNA_EXPRESSION=brca_tcga_mrna_median_Zscores&show_samples=false&heatmap_track_groups=brca_tcga_mrna_median_Zscores%2CBRCA1%2CBRCA2%2CTP53%2CCCL2%2CCCR3%2CCD44%2CENG%2CIL6%2CIL33%2CCD33%2CCSF1%2CHIF1A%2CCLEC7A

To do this I initially went to the http://www.cbioportal.org page and selected the cancer samples that I am interested in: Breast Invasive Carcinoma (TCGA, Provisional)

I then went to the text box to enter the names of the genes that I am interested in: BRCA1 BRCA2 TP53 CCL2 CCR3 CD44 ENG IL6 IL33 CD33 CSF1 HIF1A CLEC7A

Then I submitted the query and was able to plot a clustered heat map in the "oncoprint" tab.

While this type of annotation is useful, I would also like to be able to download the data that is generating this heat map and make a similar heat map on my local computer (for experimental reasons).

To attempt this I went to the original search page and clicked the "View summary" button.

From this I found a "Download Data" button at the top of the page.

This returns a 'tar.gz' file with lots of interesting datasets, e.g.:

data_mRNA_median_Zscores.txt

data_expression_median.txt

data_RNA_Seq_v2_mRNA_median_Zscores.txt

data_RNA_Seq_v2_expression_median.txt

I want to find the expression data that was used to generate the heat map shown in the first provided link. From the downloaded files, I initially tried data_RNA_Seq_v2_expression_median.txt
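One way to check which profile the portal plot was drawn from is to read it out of the query URL itself. The sketch below parses a shortened copy of the URL from the top of the post using base R only; `parse_query` is a hypothetical helper written for this illustration, not a cBioPortal function.

```r
# Shortened copy of the query URL from the post (only the parameters of interest).
portal_url <- paste0(
  "http://www.cbioportal.org/index.do?cancer_study_id=brca_tcga",
  "&case_set_id=brca_tcga_mrna",
  "&genetic_profile_ids_PROFILE_MRNA_EXPRESSION=brca_tcga_mrna_median_Zscores"
)

# Hypothetical helper: split a URL query string into a named character vector.
parse_query <- function(url) {
  query <- sub("^[^?]*\\?", "", url)                       # drop everything up to "?"
  pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  values <- vapply(pairs,
                   function(p) if (length(p) > 1) URLdecode(p[2]) else "",
                   character(1))
  names(values) <- vapply(pairs, `[`, character(1), 1)     # parameter names
  values
}

params <- parse_query(portal_url)
profile_id <- params[["genetic_profile_ids_PROFILE_MRNA_EXPRESSION"]]
print(profile_id)
print(params[["case_set_id"]])

# Does the plotted profile name mention RNA-Seq v2 at all?
print(grepl("rna_seq_v2", profile_id, ignore.case = TRUE))
```

The profile identifier in the URL does not mention RNA-Seq v2, so, judging by file names alone, data_mRNA_median_Zscores.txt may be a closer match than the RNA_Seq_v2 files. That mapping is an assumption based on naming only and would need confirming against the study's metadata.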

Below is my attempt to reproduce a heatmap similar to the one above:

data=read.table('data_RNA_Seq_v2_expression_median.txt',header=T,fill = T,stringsAsFactors = F)
# genes of interest (the same list that was entered in the portal query)
genes_OI=c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")
data_OI=data.frame()
for(i in genes_OI){
  data_OI=rbind(data_OI,data[which(data[,1]==i),])
}
sum(is.na(data_OI))
library(gplots)
png('test_TCGA_patients.png',height = 1000,width=1000)
data_OI[,-c(1,2)]=apply(as.matrix(data_OI[,-c(1,2)]), 2, as.numeric)
# drop genes with missing values before plotting
data_OI=na.omit(data_OI)
heatmap.2(as.matrix(data_OI[,-c(1,2)]),labCol = NA, labRow = data_OI[,1],cexRow = 1.4,keysize = 1.4)
dev.off()

The resulting heat map is as follows:

[Image: resulting heat map]

But this is not at all like the heat map in the link at the top of the page. Is there some normalisation step that I am missing?

I also tried using the file where the Z-scores were computed: data_RNA_Seq_v2_mRNA_median_Zscores.txt

This file, however (just from looking at the file content), does not require any distance matrix:

data=read.table('data_RNA_Seq_v2_mRNA_median_Zscores.txt',header=T,fill = T,stringsAsFactors = F)
genes_OI=c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")
data_OI=data.frame()
# genes_OI is a plain character vector here, so iterate over it directly
for(i in genes_OI){
  data_OI=rbind(data_OI,data[which(data[,1]==i),])
}
png('test_TCGA_patients.png',height = 1000,width=1000)
data_OI[,-c(1,2)]=apply(as.matrix(data_OI[,-c(1,2)]), 2, as.numeric)
# drop genes with missing values before clustering
data_OI=na.omit(data_OI)
# hierarchical clustering on a distance matrix (complete linkage, Euclidean by default)
hclustfunc <- function(x, method = "complete", dmeth = "euclidean") {
  hclust(dist(x, method = dmeth), method = method)
}
rc<-hclustfunc(data_OI[,-c(1,2)])
cd=t(data_OI[,-c(1,2)])
cc<-hclustfunc(cd)
heatmap(as.matrix(data_OI[,-c(1,2)]), Rowv=as.dendrogram(rc),
        Colv=as.dendrogram(cc),labRow = data_OI[,1],labCol = NA)
dev.off()

Unfortunately, this does not produce anything similar to the heat map seen in the link above.
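The gene-selection step can also be written without the rbind loop. The following is a minimal sketch on a toy table, assuming the layout implied by the code (two annotation columns followed by one column per sample); the column names `Hugo_Symbol` and `Entrez_Gene_Id`, the sample names, and the values are all illustrative assumptions.

```r
# Toy stand-in for an expression table: two annotation columns,
# then one numeric column per sample (layout assumed, values random).
set.seed(1)
toy <- data.frame(
  Hugo_Symbol    = c("TP53", "BRCA1", "IL6", "GAPDH", "CD44"),
  Entrez_Gene_Id = c(7157, 672, 3569, 2597, 960),
  matrix(rnorm(5 * 4), nrow = 5, dimnames = list(NULL, paste0("S", 1:4))),
  stringsAsFactors = FALSE
)
toy[3, "S2"] <- NA   # simulate a missing value for IL6

genes_OI <- c("BRCA1", "TP53", "IL6", "CD44", "ENG")  # ENG is absent from toy

# Keep the requested genes in the requested order, skipping absent ones.
idx <- match(genes_OI, toy$Hugo_Symbol)
idx <- idx[!is.na(idx)]

# Numeric matrix of sample values with gene symbols as row names.
mat <- as.matrix(toy[idx, -c(1, 2)])
rownames(mat) <- toy$Hugo_Symbol[idx]

# Drop genes with any missing value so dist() and hclust() see complete rows.
mat <- mat[complete.cases(mat), , drop = FALSE]
print(rownames(mat))
```

Working on a matrix with row names avoids passing a data frame with mixed column types to `dist()`, and `match()` keeps the gene order under explicit control.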

Hence I am wondering where I am going wrong. Am I using the correct file, or is there a normalisation step that I am missing?
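On the normalisation question: `heatmap.2` does not scale rows by default (`scale = "none"`), so raw expression values on very different scales tend to be dominated by the most highly expressed genes. A common generic approach, sketched below on toy data, is a log2 transform followed by a row-wise z-score, with values clipped to a fixed range before colouring. This is an assumption about what might make the plots more comparable, not a description of how cBioPortal computes its z-scores; the gene names, clipping limit of 3, and colour palette are illustrative choices.

```r
set.seed(42)
# Toy raw expression values (genes x samples) on very different scales.
expr <- rbind(
  GENE_A = rlnorm(6, meanlog = 8),
  GENE_B = rlnorm(6, meanlog = 3),
  GENE_C = rlnorm(6, meanlog = 5)
)
colnames(expr) <- paste0("S", 1:6)

# 1. Log-transform (the +1 avoids log2(0)).
log_expr <- log2(expr + 1)

# 2. Row-wise z-score: scale() works on columns, so transpose before and after.
z <- t(scale(t(log_expr)))

# 3. Clip extreme values so a few outliers do not stretch the colour scale.
z_clipped <- pmax(pmin(z, 3), -3)

# 4. Cluster genes (rows) and samples (columns) separately.
rc <- hclust(dist(z_clipped), method = "complete")
cc <- hclust(dist(t(z_clipped)), method = "complete")

# 5. Plot; scale = "none" because the rows are already z-scored.
#    A PDF in the temporary directory is used here instead of a PNG.
pdf(file.path(tempdir(), "toy_heatmap.pdf"))
heatmap(z_clipped, Rowv = as.dendrogram(rc), Colv = as.dendrogram(cc),
        scale = "none", labCol = NA,
        col = colorRampPalette(c("blue", "white", "red"))(51))
dev.off()
```

If a file already contains z-scores, steps 1 and 2 would be skipped and only the clipping and clustering would apply.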

modified 17 months ago • written 17 months ago by chrisclarkson10030

Hello chrisclarkson100!

We believe that this post does not fit the main topic of this site.

This question doesn't have anything to do with any Bioconductor package, and is instead a general question about how to analyze data from cBioPortal. You should ask the cBioPortal community instead.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

Hi Chris! If you post your question on https://groups.google.com/forum/#!forum/cbioportal the cBioPortal community will take a look at it!

Best, Sander