chrisclarkson10030 wrote:

I need to make a statistical comparison using breast cancer data. I have made a heat map on cBioPortal at the following link:

http://www.cbioportal.org/index.do?cancer_study_id=brca_tcga&Z_SCORE_THRESHOLD=2.0&RPPA_SCORE_THRESHOLD=2.0&data_priority=0&case_set_id=brca_tcga_mrna&gene_list=BRCA1%250ABRCA2%250ATP53%250ACCL2%250ACCR3%250ACD44%250AENG%250AIL6%250AIL33%250ACD33%250ACSF1%250AHIF1A%250ACLEC7A&geneset_list=%20&tab_index=tab_visualize&Action=Submit&genetic_profile_ids_PROFILE_MRNA_EXPRESSION=brca_tcga_mrna_median_Zscores&show_samples=false&heatmap_track_groups=brca_tcga_mrna_median_Zscores%2CBRCA1%2CBRCA2%2CTP53%2CCCL2%2CCCR3%2CCD44%2CENG%2CIL6%2CIL33%2CCD33%2CCSF1%2CHIF1A%2CCLEC7A

To do this I initially went to the http://www.cbioportal.org page and selected the cancer samples that I am interested in: Breast Invasive Carcinoma (TCGA, Provisional)

I then went to the text box to enter the names of the genes that I am interested in: BRCA1 BRCA2 TP53 CCL2 CCR3 CD44 ENG IL6 IL33 CD33 CSF1 HIF1A CLEC7A

Then I submitted the query and was able to plot a clustered heat map in the "oncoprint" tab.

While this type of annotation is useful, I would also like to be able to download the data that is generating this heat map and make a similar heat map on my local computer (for experimental reasons).

To attempt this I went to the original search page and clicked the "View summary" button.

From this I found a "Download Data" button at the top of the page.

This returns a 'tar.gz' file with lots of interesting datasets, e.g.:

data_mRNA_median_Zscores.txt

data_expression_median.txt

data_RNA_Seq_v2_mRNA_median_Zscores.txt

data_RNA_Seq_v2_expression_median.txt

I want to find the expression data that was used to generate the heat map shown in the first provided link. From the downloaded files, I initially tried data_RNA_Seq_v2_expression_median.txt

Below is my attempt to reproduce a heatmap similar to the one above:

library(gplots)

data=read.table('data_RNA_Seq_v2_expression_median.txt',header=T,fill = T,stringsAsFactors = F)
genes_OI=c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")

# collect the rows for the genes of interest
data_OI=data.frame()
for(i in genes_OI){
  data_OI=rbind(data_OI,data[which(data[,1]==i),])
}
sum(is.na(data_OI))

png('test_TCGA_patients.png',height = 1000,width=1000)
# drop the first two (non-sample) columns and force the rest to numeric
data_OI[,-c(1,2)]=apply(as.matrix(data_OI[,-c(1,2)]), 2, as.numeric)
data_OI=na.omit(data_OI)
heatmap.2(as.matrix(data_OI[,-c(1,2)]),labCol = NA, labRow = data_OI[,1],cexRow = 1.4,keysize = 1.4)
dev.off()

The resulting heatmap is as follows: (image)

But this is not at all like the heatmap in the link at the top of the page... Is there some normalisation step that I am missing?

I also tried using the file where the Z-scores were computed: data_RNA_Seq_v2_mRNA_median_Zscores.txt

This file, however (just from looking at the file content), does not require any distance matrix:

data=read.table('data_RNA_Seq_v2_mRNA_median_Zscores.txt',header=T,fill = T,stringsAsFactors = F)
genes_OI=c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")

data_OI=data.frame()
for(i in genes_OI){
  data_OI=rbind(data_OI,data[which(data[,1]==i),])
}
png('test_TCGA_patients.png',height = 1000,width=1000)
data_OI[,-c(1,2)]=apply(as.matrix(data_OI[,-c(1,2)]), 2, as.numeric)
data_OI=na.omit(data_OI)

# hierarchical clustering on an explicit distance matrix
hclustfunc <- function(x, method = "complete", dmeth = "euclidean") {
  hclust(dist(x, method = dmeth), method = method)
}
rc<-hclustfunc(data_OI[,-c(1,2)])   # cluster genes (rows)
cd=t(data_OI[,-c(1,2)])
cc<-hclustfunc(cd)                  # cluster samples (columns)
heatmap(as.matrix(data_OI[,-c(1,2)]), Rowv=as.dendrogram(rc),
        Colv=as.dendrogram(cc),labRow = data_OI[,1],labCol = NA)
dev.off()

This unfortunately does not produce anything similar to the heat map seen in the link above....
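
To rule out the gene-subsetting step as the source of the difference, here is a minimal sketch of that step in isolation, using match() instead of the rbind loop. The table is made up (sample IDs and values are invented), and the column names Hugo_Symbol and Entrez_Gene_Id are an assumption about the layout of the downloaded files:

```r
# Toy stand-in for the downloaded table; sample IDs and values are invented.
toy <- data.frame(
  Hugo_Symbol    = c("TP53", "BRCA1", "IL6", "GAPDH"),
  Entrez_Gene_Id = c(7157, 672, 3569, 2597),
  S1 = c(1.2, -0.3, 0.8, 0.1),
  S2 = c(-0.7, 0.4, NA, 0.0),
  stringsAsFactors = FALSE
)
genes <- c("BRCA1", "TP53", "IL6", "NOT_A_GENE")

# match() gives the row of each requested gene, or NA if it is absent.
idx <- match(genes, toy$Hugo_Symbol)
missing_genes <- genes[is.na(idx)]
sub <- toy[idx[!is.na(idx)], ]

# Numeric matrix with gene symbols as row names, in the requested order.
mat <- as.matrix(sub[, -c(1, 2)])
rownames(mat) <- sub$Hugo_Symbol

# Keep only genes without missing values (the equivalent of na.omit()).
mat_complete <- mat[complete.cases(mat), , drop = FALSE]
```

Here missing_genes lists any requested symbol that is not in the file, which is easy to overlook with the loop version.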

Hence I am wondering where I am going wrong. Am I using the correct file, or is there a normalisation step that I am missing?
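
To make concrete what I mean by a normalisation step, here is a minimal sketch on made-up numbers (not TCGA data): log-transform the expression values, z-score each gene across samples, and cap extreme values before plotting. I do not know whether the portal does anything like this; the log2(x + 1) transform and the cap of 3 are my own assumptions for illustration:

```r
# Made-up expression values (genes in rows, samples in columns); not TCGA data.
expr <- matrix(
  c( 10, 200, 3000,  40,
      5,   5,   80,   6,
    700, 650,  720, 690),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("S", 1:4))
)

# log2(x + 1) compresses the long right tail of the raw values.
log_expr <- log2(expr + 1)

# scale() standardises columns, so transpose to z-score each gene (row).
z <- t(scale(t(log_expr)))

# Cap extreme z-scores so a few outliers do not dominate the colour scale.
# The cap of 3 is an arbitrary choice for illustration.
cap <- 3
z_capped <- pmin(pmax(z, -cap), cap)

# Each gene should now have mean 0 and standard deviation 1 across samples.
gene_means <- rowMeans(z)
gene_sds <- apply(z, 1, sd)
```

A matrix like z_capped could then be passed to heatmap() or heatmap.2(). Note that base heatmap() already scales rows by default (scale = "row"), whereas heatmap.2() defaults to scale = "none", so with pre-scaled data scale = "none" avoids scaling twice.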

