error under "hclust" for microarray clustering
avehna ▴ 240
@avehna-3930
Last seen 10.2 years ago
Hi All,

I'm trying to cluster 21657 genes that are differentially expressed in my microarray data, but it is not working. After reading the normalized signal and calculating the mean for each treatment, I read in the list of differentially expressed genes (previously calculated using limma). The problem occurs in the "hclust" call (please see my code and the corresponding error below). Could this error be due to the number of genes? When I use the same code for only 1000 genes it works well.

How can I solve this problem? I need this figure for my paper...

Thank you for your help!

Sincerely,
Avhena

************************************************
> signal <- signal[-grep("AFFX", rownames(signal)), , drop=FALSE]
> pDatam <- read.AnnotatedDataFrame('pdatam.txt', row.names = 1, header = TRUE, sep = '\t')
> pData <- read.AnnotatedDataFrame('pdata.txt', row.names = 1, header = TRUE, sep = '\t')
> expset <- new("ExpressionSet", exprs = signal, phenoData = pData)
> means1 <- means(pairwise.comparison(expset, "Type", c("Control", "BMP"), method="logged", logged=FALSE))
> means2 <- means(pairwise.comparison(expset, "Type", c("BMP.VPA", "SHH.1D"), method="logged", logged=FALSE))
> means3 <- means(pairwise.comparison(expset, "Type", c("SHH.6H", "SHH.VPA.1D"), method="logged", logged=FALSE))
> all_means <- cbind(means1, means2, means3)
> expmeans <- new("ExpressionSet", exprs = all_means, phenoData = pDatam)
> subset <- get.array.subset(expmeans, "Type", c("Control", "BMP", "SHH.1D", "SHH.VPA.1D"))
> genes <- read.table("affy_ids_diff_exprs05.dat")
> mysubset <- exprs(subset)[match(levels(genes[,]), rownames(exprs(subset))),]
> hr <- hclust(as.dist(1-cor(t(mysubset), method="spearman")), method="complete")
Error in hclust(as.dist(1 - cor(t(mysubset), method = "spearman")), method = "complete") :
  NA/NaN/Inf in foreign function call (arg 11)
Calls: hclust -> .Fortran
In addition: Warning message:
In cor(t(mysubset), method = "spearman") : the standard deviation is zero
Execution halted
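One place where missing values can enter a pipeline like the one above is the `match()` step: any ID that is not found among the row names yields `NA`, and indexing a matrix with `NA` produces a row filled with `NA`. Whether that happened in the poster's data is not known from the thread; the following is only a sketch with a hypothetical matrix `expr` and a hypothetical ID vector `ids`.

```r
# Hypothetical expression matrix and ID list (not the poster's data).
expr <- matrix(1:12, nrow = 4,
               dimnames = list(c("a_at", "b_at", "c_at", "d_at"), NULL))
ids <- c("b_at", "zzz_at", "d_at")   # "zzz_at" is absent from rownames(expr)

idx <- match(ids, rownames(expr))    # unmatched IDs yield NA
naive <- expr[idx, , drop = FALSE]   # an NA index gives a row filled with NA

# Dropping unmatched IDs before subsetting avoids the all-NA rows.
safe <- expr[idx[!is.na(idx)], , drop = FALSE]
print(rownames(safe))
```

Checking `anyNA(idx)` right after the `match()` call is a cheap way to notice IDs that did not map to any row.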
@sean-davis-490
Last seen 3 months ago
United States
On Thu, Jan 27, 2011 at 12:00 AM, avehna wrote:

> Error in hclust(as.dist(1 - cor(t(mysubset), method = "spearman")), method = "complete") :
>   NA/NaN/Inf in foreign function call (arg 11)

Looks like you might have some NAs or Inf in your data. Try summary(mysubset) to see.

Sean
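For a matrix with thousands of rows, `summary()` output can be hard to scan, so counting non-finite entries directly may be easier. The sketch below uses a small made-up matrix `m` as a stand-in for `mysubset`; the values and row names are assumptions for illustration only.

```r
# Toy stand-in for `mysubset` (hypothetical values, not the poster's data).
set.seed(1)
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("gene", 1:8), paste0("s", 1:5)))
m["gene2", 3] <- NA     # a missing value
m["gene5", 1] <- -Inf   # e.g. what log2(0) would give

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so it covers all three cases
# named in the hclust error message.
n_bad <- sum(!is.finite(m))
bad_rows <- rownames(m)[rowSums(!is.finite(m)) > 0]
print(n_bad)
print(bad_rows)
```

The same two lines can be run on the real matrix to list which genes carry `NA`, `NaN`, or infinite values before any correlation is computed.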
Hi Sean, you were right... there are some Inf values in my data. Thanks a lot for your help!
@james-w-macdonald-5106
Last seen 4 hours ago
United States
Hi Avhena,

On 1/27/2011 12:00 AM, avehna wrote:
> How could I solve this problem? I need this figure for my paper...

You need to remove the rows that have no variability. For example:

> dat <- matrix(rnorm(1000), nc=10)
> dat[3,] <- rep(dat[3,3], 10) ## make row three have var=0
> hclust(as.dist(1-cor(t(dat), method="spearman")), method="complete")
Error in hclust(as.dist(1 - cor(t(dat), method = "spearman")), method = "complete") :
  NA/NaN/Inf in foreign function call (arg 11)
In addition: Warning message:
In cor(t(dat), method = "spearman") : the standard deviation is zero

Now again, without this row:

> hclust(as.dist(1-cor(t(dat[-3,]), method="spearman")), method="complete")

Call:
hclust(d = as.dist(1 - cor(t(dat[-3, ]), method = "spearman")), method = "complete")

Cluster method : complete
Number of objects: 99

Something like

ind <- apply(mysubset, 1, var) == 0
mysubset <- mysubset[!ind,]

should do the trick.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
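When a matrix contains both constant rows and non-finite values, a variance test with `== 0` alone may not be enough: `var()` returns `NA` or `NaN` for rows containing `NA` or `Inf`, and a logical index with `NA` in it produces `NA` rows. A minimal sketch of a combined filter follows, using simulated data; the names `all_finite`, `row_var`, `keep`, and `clean` are illustrative and not from the thread.

```r
# Simulated data with three kinds of problem rows (assumed for illustration).
set.seed(42)
dat <- matrix(rnorm(1000), ncol = 10)
dat[3, ] <- dat[3, 3]   # constant row: zero variance
dat[7, 2] <- Inf        # infinite value
dat[11, 5] <- NA        # missing value

# Keep a row only if every entry is finite and its variance is positive.
all_finite <- rowSums(!is.finite(dat)) == 0
row_var <- apply(dat, 1, var)   # NA or NaN for rows 7 and 11
keep <- all_finite & !is.na(row_var) & row_var > 0

clean <- dat[keep, , drop = FALSE]
d <- as.dist(1 - cor(t(clean), method = "spearman"))
hr <- hclust(d, method = "complete")
print(sum(keep))
```

Filtering on `is.finite()` first means the correlation matrix passed to `as.dist()` contains no missing entries, which is what `hclust()` requires.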
Jim, thanks a lot! There were some Inf values in my data... I will try again after removing them.

Avhena