Error in "hclust" when clustering microarray data

avehna (Thu, Jan 27, 2011)

Hi All,

I'm trying to cluster 21657 genes that are differentially expressed in my microarray data, but it's not working for me. After reading the normalized signal and calculating the mean for each treatment, I read the list of differentially expressed genes (previously computed with limma). The problem occurs in the "hclust" function (see my code and the corresponding error below). Could this error be due to the number of genes? When I run the same code on only 1000 genes, it works well.

How could I solve this problem? I need this figure for my paper...

Thank you for your help!

Sincerely,
Avhena

```
> signal <- signal[-grep("AFFX", rownames(signal)), , drop=FALSE]
> pDatam <- read.AnnotatedDataFrame('pdatam.txt', row.names = 1, header = TRUE, sep = '\t')
> pData <- read.AnnotatedDataFrame('pdata.txt', row.names = 1, header = TRUE, sep = '\t')
> expset <- new("ExpressionSet", exprs = signal, phenoData = pData)
> means1 <- means(pairwise.comparison(expset, "Type", c("Control", "BMP"), method="logged", logged=FALSE))
> means2 <- means(pairwise.comparison(expset, "Type", c("BMP.VPA", "SHH.1D"), method="logged", logged=FALSE))
> means3 <- means(pairwise.comparison(expset, "Type", c("SHH.6H", "SHH.VPA.1D"), method="logged", logged=FALSE))
> all_means <- cbind(means1, means2, means3)
> expmeans <- new("ExpressionSet", exprs = all_means, phenoData = pDatam)
> subset <- get.array.subset(expmeans, "Type", c("Control", "BMP", "SHH.1D", "SHH.VPA.1D"))
> genes <- read.table("affy_ids_diff_exprs05.dat")
> mysubset <- exprs(subset)[match(levels(genes[,]), rownames(exprs(subset))),]
> hr <- hclust(as.dist(1-cor(t(mysubset), method="spearman")), method="complete")
Error in hclust(as.dist(1 - cor(t(mysubset), method = "spearman")), method = "complete") :
  NA/NaN/Inf in foreign function call (arg 11)
Calls: hclust -> .Fortran
In addition: Warning message:
In cor(t(mysubset), method = "spearman") :
  the standard deviation is zero
Execution halted
```
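A side note on the gene-selection step above, offered as a sketch rather than a diagnosis of this particular dataset: `match()` returns `NA` for any ID in the gene list that is not among the row names, and indexing a matrix with `NA` yields an all-`NA` row, which would also make `cor()` and `hclust()` fail. The toy matrix `expr` and the IDs below are hypothetical. Relatedly, `levels(genes[,])` only works when `read.table()` returns factors; since R 4.0.0 strings are no longer converted to factors by default, so `as.character(genes[, 1])` (assuming the IDs are in the first column) is the safer way to get the ID vector.

```r
# Hypothetical expression matrix with probe-set IDs as row names
expr <- matrix(c(1, 2, 3, 4,
                 2, 2, 5, 1,
                 7, 3, 1, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3"), c("A", "B", "C", "D")))

# Hypothetical list of IDs; "p9" is not present in the matrix
ids <- c("p1", "p3", "p9")

# Naive subsetting: match() gives NA for "p9", which produces an all-NA row
bad <- expr[match(ids, rownames(expr)), ]
print(anyNA(bad))   # TRUE

# Safer: keep only IDs that exist in the matrix and report the others
present <- intersect(ids, rownames(expr))
missing <- setdiff(ids, rownames(expr))
if (length(missing) > 0) {
  message("IDs not found: ", paste(missing, collapse = ", "))
}
good <- expr[present, , drop = FALSE]
print(anyNA(good))  # FALSE
```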


Sean Davis (Thu, Jan 27, 2011)

Looks like you might have some NAs or Inf in your data. Try `summary(mysubset)` to see.

Sean

_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
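Beyond `summary()`, base R can pinpoint exactly which rows hold non-finite values. A minimal sketch on a small hypothetical matrix `toy` standing in for `mysubset`; `is.finite()` is `FALSE` for `NA`, `NaN`, `Inf` and `-Inf` alike, so one test catches all of them.

```r
# Hypothetical stand-in for mysubset: row "g2" contains -Inf, row "g3" an NA
toy <- rbind(g1 = c(5.1, 6.2, 7.0, 4.8),
             g2 = c(-Inf, 3.3, 2.9, 3.1),
             g3 = c(8.0, NA, 7.5, 7.9))

# Per-column summaries, as suggested above; look for NA's and -Inf/Inf
print(summary(toy))

# TRUE for every cell that is NA, NaN, Inf or -Inf
nonfinite <- !is.finite(toy)

# Row/column positions of the offending cells
print(which(nonfinite, arr.ind = TRUE))

# Names of the rows with at least one non-finite value
bad_rows <- rownames(toy)[rowSums(nonfinite) > 0]
print(bad_rows)   # "g2" "g3"
```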


avehna

Hi Sean, you were right... there are some Inf values in my data. Thanks a lot for your help!


James W. MacDonald (Thu, Jan 27, 2011)

Hi Avhena,

You need to remove the rows that have no variability. For example:

```
> dat <- matrix(rnorm(1000), ncol=10)
> dat[3,] <- rep(dat[3,3], 10) ## make row three have var=0
> hclust(as.dist(1-cor(t(dat), method="spearman")), method="complete")
Error in hclust(as.dist(1 - cor(t(dat), method = "spearman")), method = "complete") :
  NA/NaN/Inf in foreign function call (arg 11)
In addition: Warning message:
In cor(t(dat), method = "spearman") : the standard deviation is zero
```

Now again, without this row:

```
> hclust(as.dist(1-cor(t(dat[-3,]), method="spearman")), method="complete")

Call:
hclust(d = as.dist(1 - cor(t(dat[-3, ]), method = "spearman")), method = "complete")

Cluster method   : complete
Number of objects: 99
```

Something like

```
ind <- apply(mysubset, 1, var) == 0
mysubset <- mysubset[!ind,]
```

should do the trick.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
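Since this dataset turned out to contain Inf values as well as zero-variance rows, Jim's `var`-based filter can trip over them: `var()` gives `NaN` for a row containing `Inf`, so `ind` holds `NA` for that row and `mysubset[!ind,]` then produces an all-`NA` row. A sketch of a filter that drops both kinds of problem row before clustering, on a hypothetical 6 x 4 matrix `m`; it keeps the Spearman distance and complete linkage used in the thread.

```r
set.seed(1)

# Hypothetical stand-in for mysubset: 6 genes x 4 conditions
m <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("g", 1:6),
                            c("Control", "BMP", "SHH.1D", "SHH.VPA.1D")))
m["g2", ] <- 3           # constant row: zero variance
m["g4", "BMP"] <- -Inf   # non-finite value

# The var-only test: g2 is TRUE, but g4 is NA rather than FALSE
naive <- apply(m, 1, var) == 0
print(naive)

# Keep rows whose values are all finite AND not all identical;
# && skips var() for rows that already failed the finiteness check
keep <- apply(m, 1, function(r) all(is.finite(r)) && var(r) > 0)
m_ok <- m[keep, , drop = FALSE]
print(rownames(m_ok))    # "g1" "g3" "g5" "g6"

# Spearman-correlation distance and complete-linkage clustering
d  <- as.dist(1 - cor(t(m_ok), method = "spearman"))
hr <- hclust(d, method = "complete")
print(hr)
```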


avehna

Jim, thanks a lot! There were some Inf values in my data... I will try again after removing them.

Avhena

