Using gene symbols as labels for heatmap instead of microarray ID
@antonio-brito-camacho-6297
Last seen 9.6 years ago
Dear all,

I am trying to analyze a publicly available dataset from GEO, and I would like the heatmap row labels to show the more human-readable gene symbols instead of the chip IDs. I am aware that the function heatmap.2 accepts a "labRow" parameter, but I am not able to access the values in the fvarLabel "Gene Symbol". Can someone help me?

The code that I have cobbled together from some websites and that I am using is the following:

    library(limma)
    library(GEOquery)
    library(gplots)

    # get the GEO dataset; the authors mention that the expression values are
    # already normalized using systematic variation normalization and log2 transformed
    gse <- getGEO("GSE41342")

    # select a subset of samples
    tmp <- gse[[1]]
    eset <- tmp[, tmp$characteristics_ch1.2 %in% c("protocol: no surgery", "protocol: DMM surgery") &
                  tmp$characteristics_ch1.4 %in% c("age: 12 weeks", "age: 20 weeks")]

    # create groups
    f <- factor(as.character(eset$characteristics_ch1.2))
    design <- model.matrix(~f)  # I don't fully understand what this command does

    # compare differences in expression
    fit <- eBayes(lmFit(eset, design))

    # select genes that have a meaningful significance
    selected <- p.adjust(fit$p.value[, 2]) < 0.05
    esetSel <- eset[selected, ]

    # create the heatmap
    heatmap.2(exprs(esetSel), col=redgreen(75), scale="none",
              key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5)

    sessionInfo()
    R version 3.0.2 (2013-09-25)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] pt_PT.UTF-8/pt_PT.UTF-8/pt_PT.UTF-8/C/pt_PT.UTF-8/pt_PT.UTF-8

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] gplots_2.12.1 limma_3.18.7 GEOquery_2.28.0 Biobase_2.22.0
    [5] BiocGenerics_0.8.0

    loaded via a namespace (and not attached):
    [1] bitops_1.0-6 caTools_1.16 gdata_2.13.2 gtools_3.1.1
    [5] KernSmooth_2.23-10 RCurl_1.95-4.1 tools_3.0.2 XML_3.95-0.2

Thank you for your help,

António
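On the model.matrix(~f) line that the question flags as unclear: a minimal sketch of what that call produces for a two-level factor, using base R only. The toy factor below merely stands in for eset$characteristics_ch1.2; the four samples and the explicit level order are assumptions made for illustration, not taken from the dataset.

```r
# Toy group factor standing in for eset$characteristics_ch1.2.
# The level order is set explicitly here (an assumption for clarity); without
# `levels=`, R uses the sorted order and treats the first level as reference.
f <- factor(
  c("protocol: no surgery", "protocol: no surgery",
    "protocol: DMM surgery", "protocol: DMM surgery"),
  levels = c("protocol: no surgery", "protocol: DMM surgery")
)

# One row per sample, one column per model coefficient
design <- model.matrix(~f)
print(design)

# Column 1 is the intercept (all ones): the mean of the reference level.
# Column 2 is an indicator for the second level, so its coefficient is the
# difference "second level minus reference level".
colnames(design)
```

Under these assumptions, column 2 of fit$p.value in the question corresponds to the test of that second coefficient, that is, the difference between the two protocol groups.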
@aliaksei-holik-4992
Last seen 8.2 years ago
Spain/Barcelona/Centre for Genomic Regu…
Hi Antonio,

I'm not sure what you have tried so far to access the gene symbol values, or whether they are even included in the dataset. I would suggest generating your own list of gene symbols from the probe IDs using the annotation package for your platform. This way you can also be sure that you are using the most up-to-date annotation, as new genes get mapped to existing probe IDs.

All the best,

Aliaksei
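The annotation-package route suggested above could look roughly like the following sketch. It assumes a Bioconductor chip annotation package exists for the array platform and exposes the usual "PROBEID" key type and "SYMBOL" column; the thread does not name that package, so `annotation_db` is left as a parameter. The helper names `probe_symbols` and `labels_with_fallback` are hypothetical.

```r
# Look up gene symbols for probe IDs via AnnotationDbi::mapIds().
# `annotation_db` is an assumption: the database object exported by whichever
# chip annotation package matches the platform (not named in this thread).
probe_symbols <- function(annotation_db, probe_ids) {
  AnnotationDbi::mapIds(annotation_db, keys = probe_ids,
                        column = "SYMBOL", keytype = "PROBEID",
                        multiVals = "first")
}

# Probes without a mapped symbol come back as NA; fall back to the probe ID
# so that every heatmap row still gets a label.
labels_with_fallback <- function(symbols, probe_ids) {
  symbols <- as.character(symbols)
  missing <- is.na(symbols) | symbols == ""
  ifelse(missing, probe_ids, symbols)
}

# Toy illustration of the fallback step (symbols and IDs are made up)
labels_with_fallback(c("Sox9", NA, ""), c("p1", "p2", "p3"))
```

With a real annotation object, the two helpers would be combined as labels_with_fallback(probe_symbols(db, featureNames(esetSel)), featureNames(esetSel)) and the result passed to labRow.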
Hello Aliaksei,

The data is included in the dataset; I can see it. It is under an fvarLabel called "Gene Symbol". I hadn't thought of using the annotation from the package, but I will give it a try.

Best regards,

António Brito Camacho
On 12/19/2013 12:11 AM, António Brito Camacho wrote:
> The data is included in the dataset, i can see it. It is under a fvarLabel called "Gene Symbol".

Probably the symbols are accessible as

    fData(esetSel)[["Gene Symbol"]]

so

    heatmap.2(exprs(esetSel), labRow=fData(esetSel)[["Gene Symbol"]])

Martin
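A small sketch of why passing the feature column straight to labRow lines up with the rows: the labels are supplied in the matrix's original row order, and the heatmap function then reorders them together with the rows when it applies the dendrogram. Base R's heatmap() is used here as a stand-in so the example runs without gplots or Biobase; the matrix, probe names and gene symbols are all made up for illustration.

```r
set.seed(1)

# Toy expression matrix standing in for exprs(esetSel)
mat <- matrix(rnorm(20), nrow = 5,
              dimnames = list(paste0("probe_", 1:5), paste0("sample_", 1:4)))

# Toy feature table standing in for fData(esetSel): one row per probe,
# in the same order as the rows of `mat`
features <- data.frame(ID = rownames(mat),
                       `Gene Symbol` = c("Acan", "Col2a1", "Mmp13", "Sox9", "Runx2"),
                       check.names = FALSE, stringsAsFactors = FALSE)

# Labels must match the matrix rows one-to-one before plotting
stopifnot(identical(features$ID, rownames(mat)))
symbols <- features[["Gene Symbol"]]

# heatmap() returns the row permutation it applied, invisibly
hm <- heatmap(mat, labRow = symbols, scale = "none")

# Labels in dendrogram order, as they are drawn along the row axis
symbols[hm$rowInd]
```

The practical point is to take the labels from the same object, after the same subsetting, as the matrix being plotted; reordering them by hand beforehand would misalign them.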
Hello Martin,

I tried as you suggested and it worked out great! Problem solved. Thanks,

António

PS: I tried a similar approach before, creating an intermediate variable to get the symbols using

    x <- fData(esetSel)
    labels <- x$Gene.Symbol

but when I used it in

    heatmap.2(exprs(esetSel), labRow=labels)

the labels would appear in the wrong order. The documentation said that this was because of the reordering that the dendrogram does, but with your solution the labels appear in the correct order!
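One thing worth checking in the earlier attempt from the PS: if the column is literally named "Gene Symbol" (with a space), then `$Gene.Symbol` does not refer to it. The sketch below shows this on a plain data frame with made-up values; whether this is what caused the mis-ordered labels in this particular case is not established by the thread.

```r
# Toy stand-in for fData(esetSel); check.names = FALSE keeps the space
# in the column name instead of converting it to "Gene.Symbol"
x <- data.frame(ID = c("p1", "p2"),
                `Gene Symbol` = c("Sox9", "Acan"),
                check.names = FALSE, stringsAsFactors = FALSE)

names(x)

# No column is called "Gene.Symbol", so `$` returns NULL rather than an error
is.null(x$Gene.Symbol)

# Either of these reaches the column whose name contains a space
x[["Gene Symbol"]]
x$`Gene Symbol`
```

Running names(fData(esetSel)) first shows the exact spelling of the available columns.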