VSN: minimum number of controls?
@eric-e-snyder-4010
Hello,

In my first project with R and BioConductor, I am analyzing some small microarrays, starting with variance normalization with vsn. Using Wolfgang Huber's VSN.pdf tutorial I was able to do the exercise with the "kidney" dataset without trouble. However, when trying to run:

    > fit = vsn2( noDNAcontrols )
    Error in .local(x, reference, strata, ...) :
      One or more of the strata contain less than 42 elements.
      Please reduce the number of strata so that there is enough in each stratum.

using my own data, I got the error above. I finally got around the error by simulating a dataset containing 50 controls (my original data had only 6). Surprisingly, even 42 controls was insufficient.

A collaborator, using the same dataset, was able to run vsn successfully using an earlier version of R (2.9.0) and Bioconductor (version ?).

Is anyone familiar with this problem?

I see two ways forward:

1. Find the appropriate (old) version of Bioconductor and analyze with the original controls.

2. Use the current R/Bioconductor releases and either find a software patch or a work-around.

As for #2, maybe it is not unreasonable to use >42 controls on most microarrays. However, this particular dataset is from a series of small protein arrays (each probed with patient serum then visualized with labeled anti-IgG) that contain only 214 antigens and 6 no DNA (meaning "no protein") controls per patient (with a total 853 patients in the dataset). Consequently, it is not possible to run a huge number of controls, given the number of experimental cells per slide.

On a related note, in my effort to inflate the controls that I did have into a sufficiently large number, I used "rnorm" to simulate/synthesize the data. Here "noDNAstats" is a 2 x 853 matrix consisting of the mean and standard deviation from the patients' noDNAcontrols in the first and second rows, respectively.

    i=1
    noDNAsim50 = rnorm(50, noDNAstats[1,i], noDNAstats[2,i])
    for(i in c( 2:ncol(noDNAstats) ) ){
        noDNAsim50 = cbind(noDNAsim50, rnorm(50, noDNAstats[1,i], noDNAstats[2,i]))
    }

My understanding was that rnorm would create a dataset of the requested size with the requested mean and SD. The numbers I get are in the same ballpark but the means and SD are not the same. Am I missing something?

Thanks!
eesnyder

--
Eric E. Snyder, Ph.D.
Virginia Bioinformatics Institute
Virginia Polytechnic Institute and State University
Blacksburg, VA 24061-0447 USA
Email: eesnyder at vbi.vt.edu
Phone: (540) 231-5428
JDAM: N 37 13.248', W 80 25.551'
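On the rnorm() question, a minimal sketch may help. It assumes the point of confusion is that rnorm() draws random values from a normal distribution with the given parameters, so the mean and SD of any finite sample only approximate them. The numbers below (target_mean, target_sd) are made up for illustration and are not taken from noDNAstats.

```r
set.seed(1)                      # make the random draws repeatable

target_mean <- 5000              # hypothetical values, not from the real data
target_sd   <- 1500

x <- rnorm(50, mean = target_mean, sd = target_sd)

# Sample statistics estimate the parameters; they are not equal to them.
sample_stats <- c(mean = mean(x), sd = sd(x))
print(sample_stats)

# If exact sample moments are really wanted, standardise and rescale.
z <- (x - mean(x)) / sd(x)       # sample mean 0, sample SD 1
x_exact <- target_mean + target_sd * z
exact_stats <- c(mean = mean(x_exact), sd = sd(x_exact))
print(exact_stats)
```

Rescaling forces the sample moments to match, but the result is no longer an independent random sample in the usual sense, so whether it is appropriate depends on what the simulated controls are used for.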
Normalization vsn
@martin-morgan-1513
Hi Eric --

On 04/02/2010 02:41 PM, Eric E. Snyder wrote:
> Hello,
>
> In my first project with R and BioConductor, I am analyzing some small
> microarrays, starting with variance normalization with vsn. Using
> Wolfgang Huber's VSN.pdf tutorial I was able to do the exercise with the
> "kidney" dataset without trouble. However, when trying to run:
>
>> fit = vsn2( noDNAcontrols )
> Error in .local(x, reference, strata, ...) :
> One or more of the strata contain less than 42 elements.
> Please reduce the number of strata so that there is enough in each stratum.

Always good to provide sessionInfo() so that we know the details of the software you're using

    > library(vsn)
    > sessionInfo()
    R version 2.10.1 Patched (2010-03-27 r51570)
    x86_64-unknown-linux-gnu

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] vsn_3.14.0 Biobase_2.6.1

    loaded via a namespace (and not attached):
    [1] affy_1.24.2 affyio_1.14.0 grid_2.10.1
    [4] lattice_0.18-3 limma_3.2.3 preprocessCore_1.8.0

and then good to try for a reproducible example, or at least enough info for others to reproduce your error. I started with example(vsn2) and then

    > vsn2(kidney[1:20,])
    Error in vsnMatrix(exprs(x), reference, strata, ...) :
      One or more of the strata contain less than 42 elements.
      Please reduce the number of strata so that there is enough in each stratum.

My guess is that noDNAcontrols is a matrix-like object with rows and columns transposed, i.e., samples x features rather than features x samples. What is class(noDNAcontrols) and dim(noDNAcontrols)? Might as well copy and paste the output directly from R.

> On a related note, in my effort to inflate the controls that I did have
> into a sufficiently large number, I used "rnorm" to simulate/synthesize
> the data. Here "noDNAstats" is a 2 x 853 matrix consisting of the mean
> and standard deviation from the patients' noDNAcontrols in the first and
> second rows, respectively.
>
> i=1
> noDNAsim50 = rnorm(50, noDNAstats[1,i], noDNAstats[2,i])
> for(i in c( 2:ncol(noDNAstats) ) ){
> noDNAsim50 = cbind(noDNAsim50, rnorm(50, noDNAstats[1,i],
> noDNAstats[2,i]))
> }
>
> My understanding was that rnorm would create a dataset of the requested
> size with the requested mean and SD. The numbers I get are in the same
> ballpark but the means and SD are not the same. Am I missing something?

At one level this looks ok, but there isn't enough info to reproduce, or to see precisely what your problem is. Can you be more specific, maybe with a simpler example, say creating a matrix with two columns, where you specify mean and sd as numbers directly rather than 'hidden' in a matrix that we don't have access to?

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
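The kind of self-contained example requested here could look roughly like the sketch below. It is only an illustration: the means, SDs and the object names (means, sds, sim, report) are invented, and vapply() is swapped in for the original cbind() loop as one way to build the matrix.

```r
set.seed(42)

# Two columns with the mean and SD given directly (hypothetical numbers).
means <- c(5000, 3500)
sds   <- c(2000, 600)
n     <- 50

# One column per (mean, sd) pair; vapply avoids growing a matrix with cbind().
sim <- vapply(seq_along(means),
              function(i) rnorm(n, mean = means[i], sd = sds[i]),
              numeric(n))

# The two facts asked for in the reply: class and dimensions.
class(sim)
dim(sim)

# Requested versus observed moments, side by side.
report <- rbind(requested_mean = means,
                observed_mean  = colMeans(sim),
                requested_sd   = sds,
                observed_sd    = apply(sim, 2, sd))
print(round(report, 1))
```

Posting output like this, together with sessionInfo(), gives readers everything needed to rerun the example on their own machine.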
Martin Morgan wrote:
> Always good to provide sessionInfo() so that we know the details of the
> software you're using

Okay, my sessionInfo:

    > sessionInfo()
    R version 2.10.1 (2009-12-14)
    x86_64-unknown-linux-gnu

    locale:
    [1] C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] vsn_3.14.0 Biobase_2.6.1

    loaded via a namespace (and not attached):
    [1] affy_1.24.2 affyio_1.14.0 grid_2.10.1
    [4] lattice_0.17-26 limma_3.2.1 preprocessCore_1.8.0

> My guess is that noDNAcontrols is a matrix-like object with rows and
> columns transposed, i.e., samples x features rather than features x
> samples. What is class(noDNAcontrols) and dim(noDNAcontrols) ? Might as
> well copy and paste the output directly from R

    > dim(noDNAcontrols)
    [1]   6 853

    > noDNAcontrols
              X1   X2   X3   X4  ...  X853
    no_DNA4 9840 5193 4854 6466  ...  6121
    no_DNA5 5244 3569 3419 4587  ...  3595
    no_DNA6 4630 3271 2877 5270  ...  2729
    no_DNA3 4403 3782 3368 6004  ...  1557
    no_DNA1 3745 4984 2842 6701  ...   783
    no_DNA2 2099 4230 3165 6777  ...   756

[ellipsis provided by me and vi]

This is the data that vsn2() gags on. Since the vsn2( kidney[1:20,] ) example also fails, it looks like vsn2() has a pretty strict requirement for minimum sample size. If so, why is that and is there any way around it?

I hope I have supplied enough information to work on now; if you need anything else, please ask.

Many thanks!

As for my second question concerning the behavior of rnorm(), I should probably simplify matters and resubmit it under a separate subject line.

Cheers,
eesnyder

--
Eric E. Snyder, Ph.D.
Virginia Bioinformatics Institute
Virginia Polytechnic Institute and State University
Blacksburg, VA 24061-0447 USA
Email: eesnyder at vbi.vt.edu
JDAM: N 37 12'01.6", W 80 24'26.9"
On 04/03/2010 05:39 PM, Eric E. Snyder wrote:
>> dim(noDNAcontrols)
> [1] 6 853
>
> This is the data that vsn2() gags on. Since the vsn2( kidney[1:20,] )
> example also fails, it looks like vsn2() has a pretty strict requirement
> for minimum sample size. If so, why is that and is there any way around it?
>
> I hope I have supplied enough information to work on now; it you need
> anything else. please ask.

I think you have 6 samples and 853 features; you want to transpose your data, so that you have 853 features and 6 samples. If noDNAcontrols is a matrix, then

    vsn2(t(noDNAcontrols))

Also, but secondary to getting your data oriented correctly, vsn2 has an argument, minDataPointsPerStratum, described on the help page ?vsn2, which dictates how many rows (i.e., features) each stratum must contain. The vignette, browseVignettes('vsn'), describes what stratum is meant to refer to.

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
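As a rough sketch of the suggestion above, the following builds a simulated stand-in for noDNAcontrols (6 rows by 853 columns; the values are invented, not the poster's data) and compares the row counts before and after t() against the threshold of 42 quoted in the error message. The vsn2() call runs only if the vsn package is installed, and passing minDataPointsPerStratum explicitly is shown purely to make the threshold visible.

```r
set.seed(7)

# Stand-in for noDNAcontrols: 6 rows by 853 columns of positive intensities.
# The values are simulated; they are not the poster's data.
x <- matrix(rlnorm(6 * 853, meanlog = 8, sdlog = 0.5), nrow = 6,
            dimnames = list(paste0("no_DNA", 1:6), paste0("X", 1:853)))

min_rows <- 42L                  # threshold quoted in the error message

dims_before <- dim(x)            # 6 853
dims_after  <- dim(t(x))         # 853 6

# vsn2() counts rows, so only the transposed matrix clears the threshold.
enough_before <- nrow(x)    >= min_rows
enough_after  <- nrow(t(x)) >= min_rows

# Only attempt the fit when vsn is installed.
if (requireNamespace("vsn", quietly = TRUE)) {
  fit <- vsn::vsn2(t(x), minDataPointsPerStratum = min_rows)
  print(fit)
}
```

Note that transposing changes which dimension vsn2() treats as features, so it is worth confirming against the vignette that rows really should be the 853 arrays before relying on the fit.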