Problems with robustPca in pcaMethods

Julian Gehring

Hi,

You may want to check whether your data contain missing values (NAs).

Best wishes
Julian

On 27/02/14 21:53, J Brown [guest] wrote:
> Hi all;
>
> I'm new to R and trying to use the pcaMethods package to analyze a qPCR dataset. My dataset contains many missing values and I think the module I want to use is robustPca, but when I try to apply it to my dataset I keep getting the error described below. Using nipalsPca on my dataset works without errors, so I don't think it's a data-format issue. Using robustPca on the pcaMethods sample dataset "metaboliteData", which has missing values, also works fine (although it warns about missing values), so it isn't a general problem with my installation of R and the relevant packages.
>
> The traceback results seem to say that the error is caused by a weighted-median calculation that is part of the robustPca command, but I have no idea why this only comes up with my dataset. Could it be because my dataset was already median-normalized before I imported it into R? Troubleshooting this is beyond my abilities at this point; I'd be grateful for any insight anyone can offer.
>
>> pca_results <- pca(centered_data, method = "robustPca", nPcs = 10, center = FALSE)
> Error in if (!all(tmp)) { : missing value where TRUE/FALSE needed
> In addition: Warning message:
> In robustPca(prepres$data, nPcs = nPcs, ...) :
>   Data is incomplete, it is not recommended to use robustPca for missing value estimation
>
>> traceback()
> 7: weightedMedian.default(x[keep]/a, abs(a), interpolate = FALSE)
> 6: weightedMedian(x[keep]/a, abs(a), interpolate = FALSE)
> 5: FUN(newX[, i], ...)
> 4: apply(x, 1, L1RegCoef, bk)
> 3: robustSvd(Matrix)
> 2: robustPca(prepres$data, nPcs = nPcs, ...)
> 1: pca(centered_data, method = "robustPca", nPcs = 10, center = FALSE)
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats  graphics  grDevices  utils  datasets  methods  base
>
> other attached packages:
> [1] pcaMethods_1.52.1  Rcpp_0.11.0  matrixStats_0.8.14  Biobase_2.22.0  BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] R.methodsS3_1.6.1  tools_3.0.2
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
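Julian's suggestion can be checked directly in base R. The sketch below shows why the error message talks about TRUE/FALSE (an `if` condition that evaluates to NA is an error in R, and `all()` over a vector containing NA but no FALSE returns NA), and how one might count and locate missing entries before calling robustPca. The small matrix standing in for `centered_data` and the helper name `count_missing` are hypothetical; rows are observations and columns are variables, as pcaMethods expects.

```r
# all() over a logical vector with NA (and no FALSE) returns NA, so
# `if (!all(tmp))` fails with "missing value where TRUE/FALSE needed".
tmp <- c(TRUE, NA, TRUE)
msg <- tryCatch({
  if (!all(tmp)) "some FALSE" else "all TRUE"
}, error = function(e) conditionMessage(e))
print(msg)

# Hypothetical helper: summarise missing (NA/NaN) and infinite entries.
count_missing <- function(m) {
  list(
    n_na       = sum(is.na(m)),        # is.na() is TRUE for both NA and NaN
    n_inf      = sum(is.infinite(m)),
    na_per_row = rowSums(is.na(m)),
    na_per_col = colSums(is.na(m))
  )
}

# Small made-up matrix standing in for the real data.
m <- matrix(c(0.1,  NA, -0.3,
              0.5, 0.2,  0.6,
              NaN, 0.4,  0.0), nrow = 3, byrow = TRUE)
summary_m <- count_missing(m)
print(summary_m$n_na)

# Rows without any missing value, e.g. for a complete-case robustPca run.
complete_rows <- m[complete.cases(m), , drop = FALSE]
print(nrow(complete_rows))
```

Dropping incomplete rows is only one option; it can discard a lot of data when, as in the original question, many values are missing.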

Henrik Bengtsson

[cc:ing the maintainer of pcaMethods]

Author of matrixStats::weightedMedian() here. I won't solve the OP's problem, but I'll add some clues to the non-informative error message from weightedMedian().

On Mon, Mar 3, 2014 at 9:35 AM, Julian Gehring <julian.gehring at embl.de> wrote:
> On 27/02/14 21:53, J Brown [guest] wrote:
>>> traceback()
>> 7: weightedMedian.default(x[keep]/a, abs(a), interpolate = FALSE)
>> 6: weightedMedian(x[keep]/a, abs(a), interpolate = FALSE)
>> 5: FUN(newX[, i], ...)
>> 4: apply(x, 1, L1RegCoef, bk)
>> 3: robustSvd(Matrix)
>> 2: robustPca(prepres$data, nPcs = nPcs, ...)
>> 1: pca(centered_data, method = "robustPca", nPcs = 10, center = FALSE)

That error in weightedMedian() occurs because there are missing values either in x[keep]/a or in the weights abs(a), and the argument 'na.rm' defaults to NA (*). It would be better if robustSvd() called weightedMedian(x[keep]/a, abs(a), na.rm=TRUE, interpolate=FALSE), or possibly used na.rm=FALSE.

(*) With weightedMedian(..., na.rm=NA), one tells the function to trust that the data (including the weights) contain no missing values. This option exists for efficiency reasons. If there are missing values, na.rm=TRUE should be used. If there could be missing values, na.rm=FALSE should be used (in which case NA is returned when missing values are present). A default of NA is unconventional, and I may consider changing matrixStats to use the more common na.rm=FALSE (no promises though).

/Henrik
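A minimal sketch of the behaviour Henrik describes, using made-up values and assuming a current matrixStats from CRAN. It passes na.rm explicitly instead of relying on the default, since the default may differ between matrixStats versions (Henrik mentions possibly changing it from NA to FALSE). The na.rm=NA mode is not shown, because it assumes the input is already free of missing values.

```r
# Requires the CRAN package matrixStats.
if (!requireNamespace("matrixStats", quietly = TRUE)) {
  stop("Please install matrixStats: install.packages('matrixStats')")
}

ratios  <- c(1, 2, NA, 4, 5, 9)   # made-up values with one missing entry
weights <- rep(1, length(ratios)) # equal weights, for simplicity

# na.rm = FALSE: the missing value propagates and the result is NA.
wm_keep <- matrixStats::weightedMedian(ratios, weights,
                                       na.rm = FALSE, interpolate = FALSE)

# na.rm = TRUE: the missing entry is dropped before the median is computed.
wm_drop <- matrixStats::weightedMedian(ratios, weights,
                                       na.rm = TRUE, interpolate = FALSE)

print(c(keep = wm_keep, drop = wm_drop))
```

Either explicit choice avoids the `if (!all(tmp))` failure seen in the traceback; which one is appropriate depends on whether a missing entry should be ignored or should make the whole result NA.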

Henning Redestig

Sorry for the slow reply. I changed the call to weightedMedian() to use na.rm=TRUE, but I'm not sure whether that solves the issue. Julian, any chance you could share the data (preferably subsetted to the minimum size that still causes the error)?

Cheers,
Henning (pcaMethods maintainer)

2014-03-03 20:34 GMT+01:00 Henrik Bengtsson <hb@biostat.ucsf.edu>:
> It's better if robustSvd() would call weightedMedian(x[keep]/a, abs(a), na.rm=TRUE, interpolate=FALSE), or possibly na.rm=FALSE.
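Henning's change concerns the weighted-median step inside robustSvd(). One way to see why a weighted median appears there at all: for a fixed vector a, the coefficient c that minimises the L1 loss sum_i |x_i - c * a_i| is a weighted median of the ratios x_i / a_i with weights |a_i|, because |x_i - c * a_i| = |a_i| * |x_i / a_i - c|. The base-R sketch below illustrates that idea with missing-value handling in the spirit of na.rm=TRUE. It is a hypothetical illustration (the names `weighted_median_lower` and `l1_coef` are made up), not the actual pcaMethods L1RegCoef() code.

```r
# Hypothetical helper: lower weighted median, i.e. the smallest value whose
# cumulative weight reaches half of the total. It minimises sum(w * abs(v - m)).
weighted_median_lower <- function(v, w) {
  o  <- order(v)
  v  <- v[o]
  cw <- cumsum(w[o])
  v[which(cw >= sum(w) / 2)[1]]
}

# Hypothetical L1 regression coefficient of x on a (no intercept).
# Entries with missing x or a are dropped, as are entries with a == 0,
# whose loss term does not depend on the coefficient.
l1_coef <- function(x, a) {
  keep <- !is.na(x) & !is.na(a) & a != 0
  if (!any(keep)) return(NA_real_)
  weighted_median_lower(x[keep] / a[keep], abs(a[keep]))
}

print(l1_coef(c(2, 4, 6), c(1, 2, 3)))          # exact fit, slope 2
print(l1_coef(c(2, NA, 6, 50), c(1, 2, 3, 1)))  # NA dropped, outlier 50 resisted
```

With matrixStats, the weighted-median step would roughly correspond to `matrixStats::weightedMedian(x[keep] / a[keep], abs(a[keep]), na.rm = TRUE, interpolate = FALSE)`, although its tie-breaking rule may differ from this lower-median sketch.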

Julian Gehring

Hi Henning,

I'm not the OP; you would have to ask him for the data.

Best wishes
Julian

On 12/03/14 22:25, Henning Redestig wrote:
> Julian, any chance you could share the data? (preferably subsetted to the minimum size that still causes the error)
>
> cheers, Henning (pcaMethods maintainer)
