Help on invariantset normalization function
@sophie-lamarre-5372
Last seen 9.7 years ago
Hello,

I am trying the invariantset normalization function (affy package) on my data:

    > test_pat1 = normalize.invariantset(data_ready_to_normalize_met1[,1],
    +                                    bd_20hk_norm[,1],
    +                                    prd.td=c(0.003,0.007))
    Error on while ((ns.old - ns) > 50) { :
      missing value where TRUE / FALSE is required

    # My data to normalize
    > data_ready_to_normalize_met1[1:5,1]
    [1]  5.803779 11.566477  8.583049  8.531674  9.490483

    # My vector containing my 20 housekeeping genes
    > bd_20hk_norm[1:5,1]
    [1] 14.92680 15.58281 15.15885 15.09599 15.23146

My session info:

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-redhat-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8
     [4] LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
    [10] LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid      stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] affy_1.32.1           preprocessCore_1.16.0 gplots_2.10.1         KernSmooth_2.23-7
     [5] caTools_1.13          bitops_1.0-4.1        gdata_2.8.2           gtools_2.6.2
     [9] geneplotter_1.32.1    lattice_0.20-0        annotate_1.32.3       AnnotationDbi_1.16.19
    [13] Biobase_2.14.0        limma_3.10.3

    loaded via a namespace (and not attached):
    [1] affyio_1.22.0      BiocInstaller_1.2.1 DBI_0.2-5          IRanges_1.12.6
    [5] RColorBrewer_1.0-5 RSQLite_0.11.1      tools_2.14.1       xtable_1.7-0
    [9] zlibbioc_1.0.1

I have no missing values:

    > test = is.na(data_ready_to_normalize_met1[,1])
    > sum(test)
    [1] 0

Could you help me, or give me an example, so that I can resolve my problem?

Thank you very much,

Kind regards,

Sophie LAMARRE
@james-w-macdonald-5106
Last seen 3 days ago
United States
Hi Sophie,

On 7/2/2012 8:03 AM, Sophie Lamarre wrote:
> I try the invariantset normalization function (affy package) on my data:
>
>> test_pat1 = normalize.invariantset(data_ready_to_normalize_met1[,1],
> +                                   bd_20hk_norm[,1],
> +                                   prd.td=c(0.003,0.007))
> Error on while ((ns.old - ns) > 50) { :
>   missing value where TRUE / FALSE is required

When you do

    data_ready_to_normalize_met1[,1]

you are selecting data from only one array. It isn't possible to figure out which probesets are invariant with only one array (because the implication is that the probesets don't vary in any array).

Is there a particular reason that you are trying to normalize just one array?

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Hello Jim,

I have 151 patients in my file and 16 417 genes that I need to normalize, not counting the 20 housekeeping genes. I want to try different normalization methods that use housekeeping genes. The classic method is to calculate, for each patient, the mean of the selected housekeeping genes and to subtract this value from every gene of the same patient.

I would like to try the invariant set method with my data file and my list of housekeeping genes. As I read the help page, I need two vectors: the data to normalize and the intensities of the housekeeping genes (which help me to normalize):

Usage

    normalize.AffyBatch.invariantset(abatch, prd.td = c(0.003, 0.007),
        verbose = FALSE,
        baseline.type = c("mean","median","pseudo-mean","pseudo-median"),
        type = c("separate","pmonly","mmonly","together"))

    normalize.invariantset(data, ref, prd.td=c(0.003,0.007))

Arguments

    abatch   an AffyBatch object.
    data     a vector of intensities on a chip (to normalize to the reference).
    ref      a vector of reference intensities.

Thank you for your help,

Kind regards,

--
Sophie LAMARRE
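The housekeeping-mean method described in this post can be sketched in base R with colMeans() and sweep(). This is only an illustration on simulated data: the object names expr and hk, the matrix sizes, and the simulated values are assumptions made for the example, and both matrices are assumed to hold log-scale values with genes in rows and patients in columns, in the same column order.

```r
# Simulated stand-ins for the real data (hypothetical names and values).
set.seed(1)
n_genes <- 100
n_hk <- 20
n_patients <- 5

# Genes to normalize: genes in rows, patients in columns (log scale).
expr <- matrix(rnorm(n_genes * n_patients, mean = 9, sd = 2),
               nrow = n_genes,
               dimnames = list(paste("gene", 1:n_genes, sep = ""),
                               paste("pat", 1:n_patients, sep = "")))

# Housekeeping genes measured on the same patients, same column order.
hk <- matrix(rnorm(n_hk * n_patients, mean = 15, sd = 0.3),
             nrow = n_hk,
             dimnames = list(paste("hk", 1:n_hk, sep = ""), colnames(expr)))

# The columns of both matrices must refer to the same patients.
stopifnot(identical(colnames(expr), colnames(hk)))

# One housekeeping mean per patient (one value per column).
hk_means <- colMeans(hk)

# Subtract each patient's housekeeping mean from that patient's column.
expr_norm <- sweep(expr, MARGIN = 2, STATS = hk_means, FUN = "-")
```

With real data, expr and hk would be replaced by the full matrices (for instance data_ready_to_normalize_met1 and bd_20hk_norm, assuming these are numeric matrices with one column per patient), not by a single column of each.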
Hi Sophie,

Ah, I see. The problem here is that you misunderstand what normalize.invariantset() is intended to do. It is not intended to do what you want, which is to use a set of housekeeping genes to normalize the data. Instead, this is really an internal function for normalize.AffyBatch.invariantset().

The idea here is to take one chip (which is what you did), and then some artificially derived 'reference' chip that contains the same number of genes as your chip (and is derived from the mean, median, etc. for each gene), and then determine which genes don't change expression between the two, and then fit a line on those 'invariant' genes, which will then be used to normalize your data. If your two vectors are not the same length, you will get the error you see.

This is quite different from what you want to do. I don't think there are any functions to do such a simple normalization, and quite frankly what you propose is neither classic nor recommended (if by classic you mean 'a very common and accepted method' rather than 'what people did way back in the past before they knew better').

To do what you propose is just a simple application of colMeans() and sweep().

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
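The invariant-set idea described in this reply can be illustrated with a rough base-R sketch. It is a simplified, single-pass version written for this example and is not the code used by affy: the function name invariant_sketch, the fixed max_rank_diff threshold, the use of a straight-line lm() fit, and the simulated chips are all assumptions made for illustration. The point it shows is that the chip and the reference must be vectors over the same genes, so they must have the same length.

```r
# Simplified, single-pass illustration of the invariant-set idea.
# NOT the implementation in affy::normalize.invariantset().
invariant_sketch <- function(data, ref, max_rank_diff = 0.02) {
  # Chip and reference must describe the same genes in the same order.
  stopifnot(length(data) == length(ref),
            !any(is.na(data)), !any(is.na(ref)))
  n <- length(data)
  # Genes whose rank barely changes between chip and reference
  # are treated as 'invariant'.
  rank_diff <- abs(rank(data) - rank(ref)) / n
  invariant <- rank_diff < max_rank_diff
  stopifnot(sum(invariant) >= 2)
  # Fit a straight line (reference as a function of chip) on those genes.
  fit <- lm(r ~ d, data = data.frame(d = data[invariant], r = ref[invariant]))
  # Apply the fitted line to every gene on the chip.
  normalized <- as.numeric(predict(fit, newdata = data.frame(d = data)))
  list(normalized = normalized, invariant = invariant)
}

# Simulated example: 500 genes on 4 chips; chip 1 has a different
# scale and offset from the others (hypothetical values).
set.seed(2)
true_signal <- rnorm(500, mean = 9, sd = 2)
chips <- sapply(1:4, function(i) true_signal + rnorm(500, sd = 0.05))
chips[, 1] <- 2 + 0.8 * true_signal + rnorm(500, sd = 0.05)

# Artificial reference chip: per-gene median across all chips.
ref <- apply(chips, 1, median)

res <- invariant_sketch(chips[, 1], ref)
```

Calling invariant_sketch() with a 16 417-element chip and a 20-element reference would stop at the length check, which mirrors the mismatch described above.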
Hi Jim,

Now I understand the problem!

But I have to normalize a diagnostic microarray, so I am looking at several normalization methods in order to retain the best one. I can't use quantile normalization because I don't know whether the majority of genes are invariant. I think normalization on housekeeping genes could be a possible approach. I selected the 20 housekeeping genes that seem to be the most invariant. I don't think normalization with the invariantset function is appropriate in my case.

But if you have any suggestions, I would be glad to hear them!

Thank you very much for your help,

Sophie
Dear Sophie,

you could have a look at Section 7, "Normalisation with 'spike-in' probes", of the vsn package vignette.

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber