preprocess core for quantile normalization


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

I need to run preprocessCore for quantile normalization of my data. The data have protein names in columns and peak intensities in rows.

I have data from different time points, and each time point has three replicates. I have done all-protein normalization so far, but two time points are not behaving as expected. So I would like to apply quantile normalization to the data, not between replicates but between time points.

Please advise me how to do this. I can run R if you explain the matrix preparation to me.

Best regards, Yashwant

-- output of sessionInfo(): no

-- Sent via the guest posting facility at bioconductor.org.

Normalization • 1.6k views

updated 10.9 years ago by Tim Triche ★ 4.2k • written 10.9 years ago by Guest User ★ 13k


Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 3.6 years ago

United States

If you have different numbers of peaks, bins, or whatever, you will need to do kernel smoothing to get the appropriate distribution of quantiles, then normalize to that. Pan Du implemented this in the 'lumi' package, FWIW. Personally I think it's a clever enough idea (kernel-smoothed quantile normalization) that you should read it anyway, but...

If you just want to normalize equal numbers of bins, it is pretty straightforward. For example, suppose you have 100 runs of 100 bins/peaks/whatever (the key point being that they all have the same number of things per run to normalize the ranks of):

    library(preprocessCore)

    par(mfrow=c(1,2))
    # simulate 100 runs (columns) of 100 counts each
    foo <- matrix(rpois(10000, rgamma(1.5, 90)), ncol=100)
    plot(density(foo[,1]), xlab='reads', main='before')
    for(i in 2:100) lines(density(foo[,i]))

    # quantile-normalize the columns, then plot the densities again
    bar <- normalize.quantiles(foo)
    plot(density(bar[,1]), xlab='reads', main='after')
    for(i in 2:100) lines(density(bar[,i]))

    sessionInfo()
    ## R version 3.0.0 (2013-04-03)
    ## Platform: x86_64-unknown-linux-gnu (64-bit)
    ##
    ## attached base packages:
    ## [1] stats graphics grDevices datasets utils methods base
    ##
    ## other attached packages:
    ## [1] preprocessCore_1.23.0 BiocInstaller_1.11.1 gtools_2.7.1
    ## [4] devtools_1.2
    ##
    ## loaded via a namespace (and not attached):
    ## [1] digest_0.6.3 evaluate_0.4.3 httr_0.2 memoise_0.1 parallel_3.0.0
    ## [6] RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.0 whisker_0.3-2

On Tue, Jun 4, 2013 at 6:21 AM, Yashwant Kumar [guest] wrote:

> Please advise me how to do this. I can run R if you will explain me to matrix preparation.
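For the time-point case in the question, the matrix preparation could look roughly like the sketch below. It is only one possible reading of "between time points, not between replicates": replicates are averaged per time point, and the time-point columns are then quantile-normalized. The data, the names `x`, `timepoint`, `tp_means`, and the helper `quantile_normalize` are all made up for illustration. The helper is a plain base-R stand-in so the example runs without Bioconductor; with preprocessCore installed, `normalize.quantiles(tp_means)` plays the same role (results may differ slightly where there are ties). The layout assumed here is one row per protein and one column per sample, so transpose with `t()` first if your table is the other way round.

```r
# Base-R quantile normalization of the columns of x.
# Hypothetical helper, not preprocessCore's code; assumes no missing values.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each column
  target <- rowMeans(apply(x, 2, sort))               # mean of the k-th smallest values
  out    <- apply(ranks, 2, function(r) target[r])    # map each rank onto the target
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Made-up intensities: 50 proteins (rows) x 9 samples (3 time points x 3 replicates)
timepoint <- rep(c("t0", "t1", "t2"), each = 3)
x <- matrix(rlnorm(50 * 9, meanlog = 10), nrow = 50,
            dimnames = list(paste0("prot", 1:50),
                            paste0(timepoint, "_rep", 1:3)))

# Assumption: collapse the replicates of each time point to their mean,
# so that the normalization acts between time points only.
tp_means <- sapply(unique(timepoint),
                   function(tp) rowMeans(x[, timepoint == tp]))

# After this step every time-point column has the same distribution of values.
tp_norm <- quantile_normalize(tp_means)
head(tp_norm)
```

Whether averaging replicates first is appropriate depends on the experiment; normalizing all nine sample columns together would also force the replicates onto one distribution, which is what the question wanted to avoid.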

written 10.9 years ago by Tim Triche ★ 4.2k


Incidentally, this (quantile-normalizing read counts) would be a really dumb idea if you wanted to do sensitive differential (expression/DNase/ChIP/whatever) analysis, because it destroys count information. But that's another matter.

On Tue, Jun 4, 2013 at 9:46 AM, Tim Triche, Jr. wrote:

> bar <- normalize.quantiles(foo)
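A small sketch of what gets lost when counts are quantile-normalized, using made-up Poisson counts with very different depths per run. The names `depth`, `counts`, and `qn` are hypothetical, and `qn` is a base-R stand-in for the idea rather than preprocessCore's implementation.

```r
set.seed(42)
# Made-up counts: 200 features x 4 runs with very different mean depth
depth  <- c(20, 50, 100, 400)
counts <- sapply(depth, function(mu) rpois(200, mu))

# Base-R quantile normalization of columns (hypothetical helper)
qn <- function(x) {
  target <- rowMeans(apply(x, 2, sort))  # shared reference distribution
  apply(x, 2, function(col) target[rank(col, ties.method = "first")])
}

normed <- qn(counts)

colSums(counts)  # totals differ by run, reflecting depth
colSums(normed)  # totals agree across runs (up to floating point)
```

Every run ends up as a permutation of the same set of values, so per-run depth is gone and the values are generally no longer integers. Count-based methods rely on exactly that information, which is presumably what the remark about sensitive differential analysis refers to.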

written 10.9 years ago by Tim Triche ★ 4.2k
