tilingArray normalizeByReference Data Scaling

Dario Strbenac ★ 1.5k

Hello,

Could the description of the steps that are performed be made clearer? Looking at the output, it seems that the data have been divided by the DNA values and log2-transformed after DNA reference normalisation. But there's no mention of this in ?normalizeByReference, and the Bioinformatics article gives a number of different ways to normalise the data. I also notice that the 5% quantiles of the normalised results aren't shifted to 0. Is this an additional step?

Looking at the source code, I also noticed that it log2-transforms the reference probes and natural-log-transforms the background probes. Is this intentional?

Thanks,
Dario.

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
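Dario's reading of the output (divided by the DNA values, then log2-transformed) is the same as taking a difference of log2 values. The sketch below uses made-up intensities and only illustrates that arithmetic identity; it is not the full normalizeByReference procedure, which the reply below says is described in Section 2.3 of the Bioinformatics article.

```r
# Made-up intensities for four probes (hypothetical values, not real array data)
rna <- c(120, 800, 45, 3000)   # hybridisation signal
dna <- c(400, 410, 90, 1500)   # genomic DNA reference signal

ratioThenLog <- log2(rna / dna)         # "divided by the DNA values, then log2"
logThenDiff  <- log2(rna) - log2(dna)   # same quantity computed on the log scale

isTRUE(all.equal(ratioThenLog, logThenDiff))   # TRUE
```

Working on the log scale turns the division by the reference into a subtraction, which is why a log-ratio and a difference of logs are interchangeable here.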

Wolfgang Huber ★ 13k

Dear Dario,

thank you for the feedback. Sorry that this is so much trouble for you.

The method implemented in the normalizeByReference function is described in the "Methods" part, Section 2.3, of the article: Huber W, Toedling J, Steinmetz L. Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics 22, 1963-1970 (2006); see http://www-huber.embl.de/pub/pdf/Huber-Bioinf-2006.pdf

A single method is described there. Did you perhaps refer to Section 3 of the paper, where the method is applied and compared to potential alternatives, or to Fig. 5, where some results of the comparison are visualised? In any case, I have added a more specific reference to Section 2.3 to the function's manual page.

Regarding the subtraction of the 5% quantile from the final values, which moves the lower range of the data to around 0: this is mentioned in the description of Fig. 5, but it is not described in the Methods part, and it is not part of the function. It is simple enough that you can do it yourself if you find it useful.

So, to summarise: what the function does is described in Section 2.3, and applications of the function are described in other parts of that paper and in various other papers. If you can tell me exactly what is unclear, I'll be happy to add a clarification to the function's manual page.

Regarding logarithms, base 2 is used throughout. I am not sure I understand your question; can you point out what you think the problem is?

Best wishes
Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
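Since the 5% quantile shift is left to the user, here is a minimal base-R sketch of it. It assumes the normalised values have already been put into a plain numeric matrix with one column per array; the matrix below is simulated, and shifting each column separately is an assumption (subtracting a single quantile computed over all values would also fit the description).

```r
# Simulated stand-in for normalised values: 500 probes x 2 arrays (hypothetical data)
set.seed(42)
y <- matrix(rnorm(1000, mean = 1), ncol = 2)

# 5% quantile of each column (array)
q05 <- apply(y, 2, quantile, probs = 0.05, na.rm = TRUE)

# Subtract each column's own 5% quantile, so the lower range sits at about 0
yShifted <- sweep(y, 2, q05, FUN = "-")

# Check: the 5% quantile of every shifted column is now (numerically) 0
apply(yShifted, 2, quantile, probs = 0.05)
```

The shift only moves each column's location; it leaves the spread and the ordering of the values unchanged.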

Dario Strbenac ★ 1.5k

Thanks for the clarification. Yes, I was confused about which of the options in Figure 5 of the article is implemented in the tilingArray package.

What I meant by the difference in log bases is:

    refSig = rowMeans(log2(exprs(reference)[pm, , drop = FALSE]))

whereas

    ybg[, j] = tapply(log(exprs(x)[background, j], 2), strata, genefilter::shorth, tie.action = "min")

?log says the default base is e. So is there a mathematical reason to use base 2 for the reference probes and base e for the background probes?

- Dario.
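To make the ybg[, j] line easier to follow, here is a base-R sketch of a stratified shorth background estimate. shorth_sketch is a hypothetical helper written for this illustration; it only approximates genefilter::shorth (its window width and tie handling may differ), and the intensities and the quartile-based strata are made up. How tilingArray actually defines strata is not shown in this thread.

```r
# Hypothetical approximation of genefilter::shorth, for illustration only:
# returns the mean of the values inside the shortest window that covers
# half of the sorted, non-missing data. On ties the leftmost window wins,
# loosely mirroring tie.action = "min".
shorth_sketch <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  w <- max(1L, ceiling(n / 2))               # number of points per window
  starts <- seq_len(n - w + 1L)
  widths <- x[starts + w - 1L] - x[starts]   # width of each candidate window
  i <- which.min(widths)                     # first minimum = leftmost window
  mean(x[i:(i + w - 1L)])
}

set.seed(1)
bg  <- 2^rnorm(200, mean = 6)   # made-up background-probe intensities
ref <- 2^rnorm(200, mean = 8)   # made-up reference intensities for the same probes

# Hypothetical strata: quartiles of the log2 reference signal
strata <- cut(log2(ref),
              breaks = quantile(log2(ref), probs = seq(0, 1, 0.25)),
              include.lowest = TRUE)

# log(bg, 2) uses base 2, exactly like log2(bg)
ybg <- tapply(log(bg, 2), strata, shorth_sketch)
ybg
```

The shorth is used as a robust location estimate: it looks only at the densest half of each stratum, so a minority of unusually bright background probes has little influence on the estimate.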

Wolfgang Huber ★ 13k

On May/15/12 4:00 AM, Dario Strbenac wrote:
> ?log says the default base is e. So is there a mathematical reason to use base 2 for the reference probes and base e for the background probes?

Hi Dario,

I see. The second argument of 'log' is 'base', which in the ybg[, j] line above is set to 2. In fact, base = 2 is used in all calls to the 'log' function in the package.

Best wishes
Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
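A quick base-R check of the point above: passing 2 as the second positional argument of log() gives the same result as log2(). Only a call without a base argument uses the natural logarithm. The test vector is arbitrary.

```r
x <- c(1, 2, 8, 1000)

# log()'s second positional argument is 'base', so these agree:
isTRUE(all.equal(log(x, 2), log2(x)))          # TRUE
isTRUE(all.equal(log(x, base = 2), log2(x)))   # TRUE

# Without a base argument, log() is the natural logarithm:
isTRUE(all.equal(log(x), log2(x)))             # FALSE

# Change of base: log2(x) = ln(x) / ln(2)
isTRUE(all.equal(log(x) / log(2), log2(x)))    # TRUE
```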
