Question regarding cellhts2 output
@maziztgenorg-5342
Hi,

Will CellHTS2 work if I do not have positive controls on my plate? I only have negative controls.

Thanks,
Meraj

-----Original Message-----
From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of maziz@tgen.org
Sent: Saturday, July 28, 2012 12:11 PM
To: joseph.barry at embl.de
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Question regarding cellhts2 output

Hi Joseph,

We are getting ready to write up our findings. I am still wondering about one question. I apologize if you have answered it before, but to clarify, please help me understand how CellHTS2 processes our data.

As I mentioned before, we have about 900 genes (x4 siRNA per gene) and no replicates in our experiments. The input to cellhts2 is already a simple ratio of phosphorylated to baseline protein.

I am using the online version of cellhts2 (http://web-cellhts2.dkfz.de/cellHTS-java/CellHTS2) and not the R version. The following parameters are part of the output generated by the online web cellhts2.
////////////////////////////////////////////////////////////////////////
orgDir=getwd()
setwd("/temp/cellHTS2/JOB8905381608534530143")
Indir="/temp/cellHTS2/JOB8905381608534530143"
zz <- file("/temp/cellHTS2/JOB8905381608534530143_RUN4416107305046484487/R_OUTPUT.TXT", open="w")
sink(file=zz,type="message" )
Name="test"
Outdir_report="/temp/cellHTS2/JOB8905381608534530143_RUN4416107305046484487"
LogTransform=FALSE
PlateList="Platelist.txt"
Plateconf="PlateConfig.txt"
Description="Description.txt"
NormalizationMethod="Bscore"
NormalizationScaling="additive"
VarianceAdjust="byPlate"
SummaryMethod="mean"
Screenlog="Screenlog.txt"
Score="zscore"
Annotation="GeneIDs.txt"
library(cellHTS2)
x=readPlateList(PlateList, name = Name, path = Indir)
x=configure(x, descripFile=Description, confFile=Plateconf, logFile=Screenlog,path=Indir)
xn=normalizePlates(x, scale =NormalizationScaling , log =LogTransform,method=NormalizationMethod, varianceAdjust=VarianceAdjust)
comp=compare2cellHTS(x, xn)
xsc=scoreReplicates(xn, sign = "-", method = Score)
xsc=summarizeReplicates(xsc, summary = SummaryMethod)
scores=Data(xsc)
ylim=quantile(scores, c(0.001, 0.999), na.rm = TRUE)
xsc=annotate(xsc, geneIDFile = Annotation)
out=writeReport(raw = x, normalized = xn, scored = xsc, outdir = Outdir_report, force = TRUE, settings = list(xrange = c(0.5,3),zrange = c(-4, 8), ar = 1))
setwd(orgDir)
sink()
////////////////////////////////////////////////////////////////////////

I chose Bscore as the normalization method and VarianceAdjust "byPlate". One of my questions is: after cellhts2 does the Bscore normalization/smoothing of the plate, does it then take those values and calculate Zscores?

You have been very helpful, and I really appreciate it.
Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, July 16, 2012 12:22 AM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Dear Meraj,

Wolfgang has already replied to your questions on the mailing list. Please make sure that you are properly subscribed: bioconductor at r-project.org

Best wishes,
Joseph Barry

On Jul 14, 2012, at 1:07 AM, <maziz at tgen.org> wrote:

Thank you for responding. Somehow I did not receive your reply email, and I only found your response to my question while searching for a solution online.

So, given that the variances are accounted for:

According to Wikipedia:
FPR = FP/(FP+TN)

Suppose I have 50 wells of "-ve" controls in total across all plates and 20 (TP) show up in the "hit list". This will give me the True Positive Rate (TPR), or sensitivity:

TPR = TP/(TP+FN)
TPR = 20/(20+30) = 0.4

I am not sure how to translate that to FPR since I do not know the FPs and TNs. If we had done a confirmation screen, we could have found the false positives and true negatives.

Am I on the right track?

meraj

From: Meraj Aziz
Sent: Saturday, June 16, 2012 11:49 PM
To: 'Joseph Barry'
Cc: 'bioconductor at r-project.org'
Subject: RE: Question regarding cellhts2 output

I am using CellHTS2 to calculate Bscores. My experiment has only one replicate. There are approx 900 genes (x4 siRNA).

From: Meraj Aziz
Sent: Saturday, June 16, 2012 5:05 PM
To: 'Joseph Barry'
Cc: 'bioconductor at r-project.org'
Subject: RE: Question regarding cellhts2 output

Hi,

Is there a way to calculate the False Discovery Rate (FDR) for an RNAi experiment?
Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Wednesday, June 13, 2012 2:45 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

Yes, that would be great. Thanks for being understanding.

Best wishes,
Joseph

On Jun 13, 2012, at 7:49 PM, <maziz at tgen.org> wrote:

So next time I ask a question I will include bioconductor at r-project.org in my CC.

I apologize for this.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Wednesday, June 13, 2012 2:55 AM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

The negative/positive controls are defined by the user, and their "significance" varies greatly from experiment to experiment. Some have no negative controls, others do. It depends on experimental design. Most of the time they are for quality control, as you say. However, the normalization method "negatives" does make use of this information. See the package documentation for further details.

As regards the assignment of probabilities, I would not interpret the Z or Bscores in this way. Each well can be viewed as being independent from the others (again depending on exp design) so you are not really sampling in the way you are suggesting. The setting of a threshold is usually an arbitrary choice based on the data. It is fine to just state your threshold and present the results directly.

I am happy to answer any further questions, should you have any. However, it would be great if you could send any such questions out through the bioconductor mailing list so that other users may contribute to the discussion and benefit from the commentary.
Many thanks,
Joseph

On Jun 13, 2012, at 1:46 AM, <maziz at tgen.org> wrote:

Hi Joseph

One more question regarding CellHTS2 concerns the use of negative controls. When I run CellHTS2 with Bscore normalization, what is the significance of the negative controls on the plates? Are negative and positive controls only for quality control and visualization purposes, or are they actually used somehow in the Bscore calculations?

Thanks,
Meraj

From: Meraj Aziz
Sent: Tuesday, June 12, 2012 1:12 PM
To: 'Joseph Barry'
Subject: RE: Question regarding cellhts2 output

Hi Joseph,

Thanks for your reply. The reason I was interested in knowing whether my screen is normally distributed is that I want to use the Bscores (assuming the scores are standard deviations from the median) to assign a probability to each siRNA (using something like Zscore-to-probability tables/calculators).

The conversion from Z/Bscore to probability should give the probability x% that the given siRNA effect was observed by chance.

For example:

At threshold "2" (Bscore) the probability is 0.023, or 2.3%. This 2.3% means that the probability of a siRNA giving you the observed effect by chance is less than 2.3%. For threshold "3" the probability is 0.00135, or 0.135%.

For that I need to be sure whether or not the assumption of normality holds.

I hope I am interpreting the results from CellHTS2 Bscore normalization the right way. Our aim is to justify why we are using a particular Bscore cutoff.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Tuesday, June 12, 2012 12:07 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

The density plot just shows the distribution of scores for your screen and conveniently marks the positions of positive/negative controls. Your screen is not fully normal as it does not have the classical bell shape.
However, I would not read too much into whether a screen is normally distributed or not. Scores which seem to break the trend (such as your SMG1) tend to lie further from the line on the Q-Q plot but I would not waste too much time looking at this. They are primarily for quality control, to check that the distribution does not look "funny".

Best wishes,
Joseph

On Jun 12, 2012, at 8:18 PM, <maziz at tgen.org> wrote:

Hi Joseph

The Q-Q plot gives a way of testing the normality of our RNAi distribution. Attached is my screen's Q-Q plot.

What does the density plot imply and is my screen normally distributed? You have been really helpful.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, June 11, 2012 12:55 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

My apologies, I had not spotted the line:

xsc=scoreReplicates(xn, sign = "-", method = Score)

which is calculating the zscore at this stage and multiplying by -1. This is absolutely fine.

Therefore I don't think there is anything wrong with your analysis. I would not be concerned that you get a score of -76 s.d. This is perfectly reasonable, given that the standard deviation is ~0.07, i.e. the scores seem high simply because you divide by a small number.

Hope this helps,
Joseph

On Jun 11, 2012, at 9:26 PM, Joseph Barry wrote:

Hi Meraj,

I noticed in your output that VarianceAdjust="none", so I guess that you have not divided by the MAD (or standard deviation) using cellHTS2, but have rather done this as a post-processing step?

Can you check that you have not made a mistake in calculating the zscore? In R, I quickly manually divided by MAD and obtained a more conservative range:

range(x$normalized_r1_ch1/mad(x$normalized_r1_ch1, na.rm=TRUE), na.rm=TRUE)
[1] -12.46033 63.93462

The median is zero, as it should be, so the subtraction of the median is working fine.
As a solution, I recommend you reanalyze your data with the VarianceAdjust="byPlate" option turned on.

Best wishes,
Joseph

On Jun 11, 2012, at 9:04 PM, <maziz at tgen.org> wrote:

Attached is my output from CellHTS2. I was interested in gene "SMG1", and at a cutoff of "-2 BScore" I get all 4 siRNA, which is good. But the score in the negative direction goes up to -76.26 standard deviations, which seems a lot.

My parameters are as follows:

orgDir=getwd()
setwd("/temp/cellHTS2/JOB5676000587616137010")
Indir="/temp/cellHTS2/JOB5676000587616137010"
zz <- file("/temp/cellHTS2/JOB5676000587616137010_RUN1370378843309182339/R_OUTPUT.TXT", open="w")
sink(file=zz,type="message" )
Name="SCNA_with_pos_ctrl"
Outdir_report="/temp/cellHTS2/JOB5676000587616137010_RUN1370378843309182339"
LogTransform=FALSE
PlateList="Platelist.txt"
Plateconf="PlateConfig.txt"
Description="Description.txt"
NormalizationMethod="Bscore"
NormalizationScaling="additive"
VarianceAdjust="none"
SummaryMethod="mean"
Screenlog="Screenlog.txt"
Score="zscore"
Annotation="GeneIDs.txt"
library(cellHTS2)
x=readPlateList(PlateList, name = Name, path = Indir)
x=configure(x, descripFile=Description, confFile=Plateconf, logFile=Screenlog,path=Indir)
xn=normalizePlates(x, scale =NormalizationScaling , log =LogTransform,method=NormalizationMethod, varianceAdjust=VarianceAdjust)
comp=compare2cellHTS(x, xn)
xsc=scoreReplicates(xn, sign = "-", method = Score)
xsc=summarizeReplicates(xsc, summary = SummaryMethod)
scores=Data(xsc)
ylim=quantile(scores, c(0.001, 0.999), na.rm = TRUE)
xsc=annotate(xsc, geneIDFile = Annotation)
out=writeReport(raw = x, normalized = xn, scored = xsc, outdir = Outdir_report, force = TRUE, settings = list(xrange = c(0.5,3),zrange = c(-4, 8), ar = 1))
setwd(orgDir)
sink()

Any comments from you will really help guide me in the right direction.
meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, June 11, 2012 11:53 AM
To: Meraj Aziz
Cc: bioconductor at r-project.org
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

One clarification: the Bscore method in cellHTS2 does not automatically divide by the MAD. One must explicitly specify varianceAdjust="byPlate" to enforce this.

Best wishes,
Joseph

On Jun 11, 2012, at 8:37 PM, Joseph Barry wrote:

Hi Meraj,

I would recommend that you use the method="median" and varianceAdjust="byPlate" (or alternatively "byExperiment" or "byBatch", depending on the context) options to normalizePlates. This will subtract the median and divide by the median absolute deviation (MAD), which is slightly more robust than the classical zscore, where one subtracts the mean and divides by the standard deviation.

The Bscore normalization method subtracts the plate median and divides by the plate MAD, but also applies a two-way median polish to correct for row and column effects. Thus it is essentially a zscore with a few more bells and whistles attached, if you will. The references at the bottom of the ?Bscore documentation explain this in more detail and will help you to decide whether or not this is appropriate for your data.

(cc'd to the bioconductor mailing list for future googlers :) )

Best wishes,
Joseph

On Jun 11, 2012, at 8:11 PM, <maziz at tgen.org> wrote:

Hi Joseph

I have a question regarding the scores generated by cellhts2. I would really appreciate it if you could answer it.

In your paper http://genomebiology.com/content/pdf/gb-2006-7-7-r66.pdf you mention the zscore as the basis of your score. Online cellhts2 does not have a zscore normalization mechanism/option.

Question is:
1) How can I only choose zscore normalization.
2) And if I choose Bscore normalization.
Is the score really the number of standard deviations from the mean/median?

In the R_OUTPUT file I see:
NormalizationMethod="Bscore"
Score="zscore"
(what exactly does this imply)

Thank you for your help

Meraj

<cellhts2_output_scna_project.xlsx>
<density.pdf><qqplot.pdf>

[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
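For readers following the thread, the median/MAD and Bscore steps that Joseph describes can be sketched in base R. This is only a minimal sketch on a simulated plate, not the cellHTS2 implementation: the 8 x 12 plate, the helper names robust_z and b_score, and the scale_by_mad flag are all assumptions made for illustration. It relies only on median, mad and medpolish from the stats package.

```r
# Hypothetical 8 x 12 plate of raw ratios (simulated, not data from the thread).
set.seed(1)
plate <- matrix(rnorm(96, mean = 1, sd = 0.07), nrow = 8, ncol = 12)
plate[2, 3] <- 0.4  # one artificially strong well

# Robust z-score: subtract the plate median, divide by the plate MAD.
robust_z <- function(m) {
  (m - median(m, na.rm = TRUE)) / mad(m, na.rm = TRUE)
}

# B-score-style values: a two-way median polish removes row and column
# effects; dividing the residuals by their MAD is a separate, optional step.
b_score <- function(m, scale_by_mad = TRUE) {
  res <- medpolish(m, na.rm = TRUE, trace.iter = FALSE)$residuals
  if (scale_by_mad) res / mad(res, na.rm = TRUE) else res
}

z     <- robust_z(plate)
b_raw <- b_score(plate, scale_by_mad = FALSE)  # residuals on the original scale
b     <- b_score(plate)                        # residuals in MAD units
```

Leaving scale_by_mad = FALSE mirrors the clarification in the thread that the division by the MAD has to be requested explicitly. When the MAD is small, even modest residuals turn into large scores, which is the effect Joseph points to when explaining the score of -76.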
Joseph Barry ▴ 160
@joseph-barry-5000
Dana-Farber Cancer Institute, Boston, U…
Dear Meraj,

Yes, it will work without positive controls. You can annotate your controls however you see fit. See page 12 of the package vignette for more detail ( http://www.bioconductor.org/packages/2.12/bioc/vignettes/cellHTS2/inst/doc/cellhts2Complete.pdf ). Whether or not you require positive controls for proper interpretation of your results is a different question.

Best wishes,
Joseph

On Apr 4, 2013, at 12:53 AM, <maziz at tgen.org> wrote:

> Hi,
> Will CellHTS2 work if I do not have positive controls on my plate? I only have negative controls.
> Thanks,
> Meraj
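The tail probabilities quoted earlier in the thread for score thresholds of 2 and 3 can be checked with pnorm. The sketch below assumes the scores follow a standard normal distribution, which Joseph cautions is not fully the case for this screen, so the numbers are illustrative rather than a justification for a cutoff. The score vector is hypothetical, and the Benjamini-Hochberg step is just one common way to approach the FDR question raised in the thread, not something the thread or cellHTS2 prescribes.

```r
# One-sided upper-tail probability of a standard normal at a given threshold.
tail_prob <- function(threshold) pnorm(threshold, lower.tail = FALSE)

p2 <- tail_prob(2)  # roughly 0.0228
p3 <- tail_prob(3)  # roughly 0.00135

# Hypothetical scores (not from the thread). Hits are assumed to be strongly
# negative, so the lower tail is used for the p-values.
scores <- c(-4.1, -2.5, -0.3, 0.8, 1.2)
p <- pnorm(scores)

# Benjamini-Hochberg adjusted p-values (assumption: normal null distribution).
q <- p.adjust(p, method = "BH")
```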
@maziztgenorg-5342
Hi Joseph,

If we are using the following (from the documentation):

## robust Z score method (plate intensities are subtracted by the per-plate median on sample wells and divided by the per-plate MAD on sample wells):
xZ <- normalizePlates(KcViabSmall, scale="additive", log=FALSE, method="median", varianceAdjust="byPlate")

then essentially we do not really need to use:

scoreReplicates(object, sign="+", method="zscore", ...)

Documentation: "method="zscore" (robust z-scores), for each replicate, this is calculated by subtracting the overall median from each measurement and dividing the result by the overall mad. These are estimated for each replicate by considering the distribution of intensities (over all plates) in the wells whose content is annotated as sample."

which is essentially doing the same thing. Am I correct?

Meraj

-----Original Message-----
From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of maziz@tgen.org
Sent: Saturday, July 28, 2012 12:11 PM
To: joseph.barry@embl.de
Cc: bioconductor@r-project.org
Subject: Re: [BioC] Question regarding cellhts2 output

Hi Joseph,

We are getting ready to write up our findings. I am still wondering about one question. I apologize if you have answered it before, but to clarify, please help me understand how CellHTS2 processes our data.

As I mentioned before, we have 900 siRNA (x4 siRNA per gene) and no replicates in our experiments. The input to cellhts2 is already a simple ratio between a phosphorylated and a baseline protein.

I am using the online version of cellhts2 (http://web-cellhts2.dkfz.de/cellHTS-java/CellHTS2) and not the R version. The following parameters are part of the output generated by the online web cellhts2.

//////////////////////////////////////////////////////////////////////
orgDir=getwd()
setwd("/temp/cellHTS2/JOB8905381608534530143")
Indir="/temp/cellHTS2/JOB8905381608534530143"
zz <- file("/temp/cellHTS2/JOB8905381608534530143_RUN4416107305046484487/R_OUTPUT.TXT", open="w")
sink(file=zz,type="message" )
Name="test"
Outdir_report="/temp/cellHTS2/JOB8905381608534530143_RUN4416107305046484487"
LogTransform=FALSE
PlateList="Platelist.txt"
Plateconf="PlateConfig.txt"
Description="Description.txt"
NormalizationMethod="Bscore"
NormalizationScaling="additive"
VarianceAdjust="byPlate"
SummaryMethod="mean"
Screenlog="Screenlog.txt"
Score="zscore"
Annotation="GeneIDs.txt"
library(cellHTS2)
x=readPlateList(PlateList, name = Name, path = Indir)
x=configure(x, descripFile=Description, confFile=Plateconf, logFile=Screenlog,path=Indir)
xn=normalizePlates(x, scale =NormalizationScaling , log =LogTransform,method=NormalizationMethod, varianceAdjust=VarianceAdjust)
comp=compare2cellHTS(x, xn)
xsc=scoreReplicates(xn, sign = "-", method = Score)
xsc=summarizeReplicates(xsc, summary = SummaryMethod)
scores=Data(xsc)
ylim=quantile(scores, c(0.001, 0.999), na.rm = TRUE)
xsc=annotate(xsc, geneIDFile = Annotation)
out=writeReport(raw = x, normalized = xn, scored = xsc, outdir = Outdir_report, force = TRUE, settings = list(xrange = c(0.5,3),zrange = c(-4, 8), ar = 1))
setwd(orgDir)
sink()
//////////////////////////////////////////////////////////////////////

I chose Bscore as the normalization method and "byPlate" for VarianceAdjust. One of my questions is: after cellhts2 does the Bscore normalization/smoothing of the plate, does it then take those values and calculate z-scores?

You have been very helpful, and I really appreciate it.
Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, July 16, 2012 12:22 AM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Dear Meraj,

Wolfgang has already replied to your questions on the mailing list. Please make sure that you are properly subscribed: bioconductor@r-project.org

Best wishes,
Joseph Barry

On Jul 14, 2012, at 1:07 AM, <maziz@tgen.org> wrote:

Thank you for responding. Somehow I did not receive your reply email, and I found your response to my question while searching for a solution online.

So given the variances are accounted for:

According to Wikipedia:
FPR = FP/(FP+TN)

Suppose I have 50 wells of "-ve" controls in total across all plates and 20 (TP) show up in the "hit list". This will give me the True Positive Rate (TPR), or sensitivity:

TPR = TP/(TP+FN)
TPR = 20/(20+30) = 0.4

I am not sure how to translate that to FPR since I do not know the FPs and TNs. If we had done a confirmation screen, we could have found out the false positives and true negatives.

Am I on the right track?

meraj

From: Meraj Aziz
Sent: Saturday, June 16, 2012 11:49 PM
To: 'Joseph Barry'
Cc: 'bioconductor@r-project.org'
Subject: RE: Question regarding cellhts2 output

I am using CellHTS2 to calculate Bscores. My experiment has only one replicate. There are approx 900 genes (x4 siRNA).

From: Meraj Aziz
Sent: Saturday, June 16, 2012 5:05 PM
To: 'Joseph Barry'
Cc: 'bioconductor@r-project.org'
Subject: RE: Question regarding cellhts2 output

Hi,

Is there a way to calculate the False Discovery Rate (FDR) for an RNAi experiment?
Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Wednesday, June 13, 2012 2:45 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

Yes, that would be great. Thanks for being understanding.

Best wishes,
Joseph

On Jun 13, 2012, at 7:49 PM, <maziz@tgen.org> wrote:

So next time I ask a question I will include bioconductor@r-project.org in my CC. I apologize for this.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Wednesday, June 13, 2012 2:55 AM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

The negative/positive controls are defined by the user, and their "significance" varies greatly from experiment to experiment. Some have no negative controls, others do. It depends on experimental design. Most of the time they are for quality control, as you say. However, the normalization method "negatives" does make use of this information. See the package documentation for further details.

As regards the assignment of probabilities, I would not interpret the Z or Bscores in this way. Each well can be viewed as being independent from the others (again depending on exp design) so you are not really sampling in the way you are suggesting. The setting of a threshold is usually an arbitrary choice based on the data. It is fine to just state your threshold and present the results directly.

I am happy to answer any further questions, should you have any. However, it would be great if you could send any such questions out through the bioconductor mailing list so that other users may contribute to the discussion and benefit from the commentary.
Many thanks,
Joseph

On Jun 13, 2012, at 1:46 AM, <maziz@tgen.org> wrote:

Hi Joseph,

One more question regarding CellHTS2 concerns the use of negative controls. When I run CellHTS2 with Bscore normalization, what is the significance of the negative controls on the plates? Are negative and positive controls only for quality control and visualization purposes, or are they actually used somehow in the Bscore calculations?

Thanks,
Meraj

From: Meraj Aziz
Sent: Tuesday, June 12, 2012 1:12 PM
To: 'Joseph Barry'
Subject: RE: Question regarding cellhts2 output

Hi Joseph,

Thanks for your reply. The reason I was interested in knowing whether my screen is normally distributed is that I want to use the Bscores (assuming the scores are standard deviations from the median) to assign a probability to each siRNA (using something like z-score to probability tables/calculators).

The conversion from Z/Bscore to probability should give the probability that the given siRNA effect is observed by chance.

For example:

At threshold "2" (Bscore) the one-sided probability is 0.0228, or 2.28%. This means that the probability of an siRNA giving the observed effect by chance is less than 2.28%. For threshold "3" the probability is 0.00135, or 0.135%.

For that I need to be sure whether our assumption of normality holds.

I hope I am interpreting the results from CellHTS2 Bscore normalization the right way. Our aim is to justify why we are using a particular Bscore cutoff.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Tuesday, June 12, 2012 12:07 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

The density plot just shows the distribution of scores for your screen and conveniently marks the positions of positive/negative controls. Your screen is not fully normal as it does not have the classical bell shape. However, I would not read too much into whether a screen is normally distributed or not. Scores which seem to break the trend (such as your SMG1) tend to lie further from the line on the Q-Q plot, but I would not waste too much time looking at this. These plots are primarily for quality control, to check that the distribution does not look "funny".

Best wishes,
Joseph

On Jun 12, 2012, at 8:18 PM, <maziz@tgen.org> wrote:

Hi Joseph,

The Q-Q plot gives a way of testing the normality of our RNAi score distribution. Attached is my screen's Q-Q plot.

What does the density plot imply, and is my screen normally distributed? You have been really helpful.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, June 11, 2012 12:55 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

My apologies, I had not spotted the line:

xsc=scoreReplicates(xn, sign = "-", method = Score)

which is calculating the zscore at this stage and multiplying by -1. This is absolutely fine.

Therefore I don't think there is anything wrong with your analysis. I would not be concerned that you get a score of -76 s.d. This is perfectly reasonable, given that the standard deviation is ~0.07, i.e. the scores seem high simply because you divide by a small number.

Hope this helps,
Joseph

On Jun 11, 2012, at 9:26 PM, Joseph Barry wrote:

Hi Meraj,

I noticed in your output that VarianceAdjust="none", so I guess that you have not divided by the MAD (or standard deviation) using cellHTS2, but have rather done this as a post-processing step?

Can you check that you have not made a mistake in calculating the zscore?
In R, I quickly manually divided by MAD and obtained a more conservative range:

range(x$normalized_r1_ch1/mad(x$normalized_r1_ch1, na.rm=TRUE), na.rm=TRUE)
[1] -12.46033 63.93462

The median is zero, as it should be, so the subtraction of the median is working fine.

As a solution, I recommend you reanalyze your data with the VarianceAdjust="byPlate" option turned on.

Best wishes,
Joseph

On Jun 11, 2012, at 9:04 PM, <maziz@tgen.org> wrote:

Attached is my output from CellHTS2. I was interested in gene "SMG1", and at a cutoff of "-2 BScore" I get all 4 siRNA, which is good. But the score in the negative direction goes down to -76.26 standard deviations, which seems a lot.

My parameters are as follows:

orgDir=getwd()
setwd("/temp/cellHTS2/JOB5676000587616137010")
Indir="/temp/cellHTS2/JOB5676000587616137010"
zz <- file("/temp/cellHTS2/JOB5676000587616137010_RUN1370378843309182339/R_OUTPUT.TXT", open="w")
sink(file=zz,type="message" )
Name="SCNA_with_pos_ctrl"
Outdir_report="/temp/cellHTS2/JOB5676000587616137010_RUN1370378843309182339"
LogTransform=FALSE
PlateList="Platelist.txt"
Plateconf="PlateConfig.txt"
Description="Description.txt"
NormalizationMethod="Bscore"
NormalizationScaling="additive"
VarianceAdjust="none"
SummaryMethod="mean"
Screenlog="Screenlog.txt"
Score="zscore"
Annotation="GeneIDs.txt"
library(cellHTS2)
x=readPlateList(PlateList, name = Name, path = Indir)
x=configure(x, descripFile=Description, confFile=Plateconf, logFile=Screenlog,path=Indir)
xn=normalizePlates(x, scale =NormalizationScaling , log =LogTransform,method=NormalizationMethod, varianceAdjust=VarianceAdjust)
comp=compare2cellHTS(x, xn)
xsc=scoreReplicates(xn, sign = "-", method = Score)
xsc=summarizeReplicates(xsc, summary = SummaryMethod)
scores=Data(xsc)
ylim=quantile(scores, c(0.001, 0.999), na.rm = TRUE)
xsc=annotate(xsc, geneIDFile = Annotation)
out=writeReport(raw = x, normalized = xn, scored = xsc, outdir = Outdir_report, force = TRUE, settings = list(xrange = c(0.5,3),zrange = c(-4, 8), ar = 1))
setwd(orgDir)
sink()

Any comments from you will really help guide me in the right direction.

meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, June 11, 2012 11:53 AM
To: Meraj Aziz
Cc: bioconductor@r-project.org
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

One clarification: the Bscore method in cellHTS2 does not automatically divide by the MAD. One must explicitly specify varianceAdjust="byPlate" to enforce this.

Best wishes,
Joseph

On Jun 11, 2012, at 8:37 PM, Joseph Barry wrote:

Hi Meraj,

I would recommend that you use the method="median" and varianceAdjust="byPlate" (or alternatively "byExperiment" or "byBatch", depending on the context) options to normalizePlates. This will subtract the median and divide by the median absolute deviation (MAD), which is slightly more robust than the classical zscore, where one subtracts the mean and divides by the standard deviation.

The Bscore normalization method subtracts the plate median and divides by the plate MAD, but also applies a two-way median polish to correct for row and column effects. Thus it is essentially a zscore with a few more bells and whistles attached, if you will. The references at the bottom of the ?Bscore documentation explain this in more detail and will help you to decide whether or not this is appropriate for your data.
(cc'd to the bioconductor mailing list for future googlers :) )

Best wishes,
Joseph

On Jun 11, 2012, at 8:11 PM, <maziz@tgen.org> wrote:

Hi Joseph,

I have a question regarding the scores generated by cellhts2. I would really appreciate it if you could answer it.

In your paper http://genomebiology.com/content/pdf/gb-2006-7-7-r66.pdf you mention the zscore as the basis of your score. The online cellhts2 does not have a zscore normalization option.

My questions are:
1) How can I choose only zscore normalization?
2) If I choose Bscore normalization, is the score really in standard deviations from the mean/median?

In the R_OUTPUT file I see:
NormalizationMethod="Bscore"
Score="zscore"
(what exactly does this imply)

Thank you for your help

Meraj

<cellhts2_output_scna_project.xlsx>
<density.pdf><qqplot.pdf>
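The distinction drawn in the thread above, that a B-score-style median polish removes row and column effects while division by the MAD is a separate, optional step, can be sketched in base R. This is only an illustration under stated assumptions: the plate is simulated, the names `plate`, `res` and `bscore` are hypothetical, and `medpolish()` from the stats package stands in for whatever cellHTS2 does internally, which may differ in detail.

```r
# Base-R sketch of a B-score-style normalization for ONE simulated plate.
# Nothing here calls cellHTS2; all object names are hypothetical.
set.seed(1)
plate <- matrix(rnorm(96, mean = 1, sd = 0.07), nrow = 8, ncol = 12)
plate[, 1] <- plate[, 1] + 0.2   # artificial column (edge) effect
plate[3, 5] <- 0.2                # one strong "hit" with a low ratio

# Two-way median polish: value = overall + row + column + residual
fit <- medpolish(plate, na.rm = TRUE, trace.iter = FALSE)
res <- fit$residuals              # row/column effects removed, not yet scaled

# Separate scaling step: divide the residuals by their MAD
bscore <- res / mad(res, na.rm = TRUE)

range(res)      # still in the units of the input ratio
range(bscore)   # in units of robust standard deviations
```

With a small spread in the input (an sd of about 0.07 here, echoing the value mentioned in the thread), a residual of about -0.8 becomes a score of roughly -10 or beyond after scaling, which is the same effect that made a score of -76 look surprising.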
Hi Meraj,

Almost. There are a couple of subtle differences.

normalizePlates(KcViabSmall, scale="additive", log=FALSE, method="median", varianceAdjust="byPlate")

subtracts the sample median and divides by the sample MAD on a per-plate basis.

scoreReplicates(object, sign="+", method="zscore", ...)

subtracts the sample median and divides by the sample MAD, where both are measured for the whole screen (all plates grouped together).

Changing varianceAdjust to "byExperiment" will cause normalizePlates to divide by the whole-screen MAD, but it will still estimate the median on a per-plate basis. I hope the following code makes it clear.

library(cellHTS2)
data(KcViabSmall)

# compare normalization methods, demonstrating shift in plate median
x1=normalizePlates(KcViabSmall, scale="additive", log=FALSE, method="median", varianceAdjust="byExperiment")
x2=scoreReplicates(KcViabSmall, sign="+", method="zscore")
plot(Data(x1),Data(x2))

# an example of how to compute the normalization yourself
x3=KcViabSmall
I=which(fData(x3)[,"controlStatus"]=="sample")
Data(x3)=apply(Data(x3), 2:3, function(x) (x-median(x[I], na.rm=TRUE))/mad(x[I], na.rm=TRUE))
plot(Data(x2),Data(x3))

Best wishes,
Joseph

On Apr 16, 2013, at 7:56 PM, <maziz@tgen.org> wrote:

> Hi Joseph,
>
> If we are using the following:
> Documentation:
> ## robust Z score method (plate intensities are subtracted by the per-plate median on sample wells and divided by the per-plate MAD on sample wells):
> xZ <- normalizePlates(KcViabSmall, scale="additive", log=FALSE, method="median", varianceAdjust="byPlate")
>
> then essentially we donot really need to use:
> scoreReplicates(object, sign="+", method="zscore", ...)
>
> Which is essentially doing the same thing.
> Am I correct?
>
> Meraj
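The three variants described in the reply above (per-plate centre and scale, whole-screen centre and scale, and per-plate centre with whole-screen scale) can also be compared without cellHTS2. The following is a minimal base-R sketch on simulated data; `x`, `is_sample` and `robust_z` are hypothetical names, and the "mixed" variant is only an analogue of what the reply describes for varianceAdjust="byExperiment", not a reproduction of the package code.

```r
# Simulated screen: 96 wells x 3 plates; plate 2 has a shifted baseline.
set.seed(42)
x <- cbind(rnorm(96, 1.0, 0.05),
           rnorm(96, 1.3, 0.05),
           rnorm(96, 1.0, 0.10))
is_sample <- rep(TRUE, 96)
is_sample[1:8] <- FALSE          # pretend the first 8 wells are controls

# Robust z-score of v, with centre and scale estimated from ref
robust_z <- function(v, ref) {
  (v - median(ref, na.rm = TRUE)) / mad(ref, na.rm = TRUE)
}

# 1) per-plate median and per-plate MAD (sample wells only)
z_plate <- apply(x, 2, function(p) robust_z(p, p[is_sample]))

# 2) whole-screen median and whole-screen MAD (sample wells only)
z_screen <- robust_z(x, x[is_sample, ])

# 3) per-plate median, then one MAD for the whole screen
centred <- sweep(x, 2, apply(x[is_sample, ], 2, median))
z_mixed <- centred / mad(centred[is_sample, ])

apply(z_plate[is_sample, ], 2, median)    # zero on every plate
apply(z_screen[is_sample, ], 2, median)   # plate 2 keeps its offset
apply(z_mixed[is_sample, ], 2, median)    # zero on every plate
```

The point of the comparison is that the whole-screen score leaves plate-to-plate baseline shifts in the data, whereas the two per-plate-centred variants remove them and differ only in how the spread is estimated.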
@maziztgenorg-5342
Thanks Joseph for answering all my questions.

I am trying to compare the outputs from "negative" control based normalization and the simple zscore (median) non-control based normalization. For non-control based normalization I understand that the median of all the "sample" wells is subtracted from the raw value and the result is divided by the MAD. The MAD in this case is calculated from the "sample" wells, but what about the MAD calculation when we use the "negatives" based normalization? Do we ever take into consideration the variance in the negative controls?

%%%% My parameters %%%%%%
xn=normalizePlates(x, scale="additive", log=FALSE, method="median", varianceAdjust="byPlate")
vs
xn=normalizePlates(x, scale="additive", log=FALSE, method="negatives", varianceAdjust="byPlate")
%%%%%%%%%%%%%%%%%%%

Thanks,
Meraj

On Apr 16, 2013, at 7:56 PM, <maziz@tgen.org> wrote:

Hi Joseph,

If we are using the following:

Documentation:
## robust Z score method (plate intensities are subtracted by the per-plate median on sample wells and divided by the per-plate MAD on sample wells):
xZ <- normalizePlates(KcViabSmall, scale="additive", log=FALSE, method="median", varianceAdjust="byPlate")

then essentially we do not really need to use:
scoreReplicates(object, sign="+", method="zscore", ...)

Documentation:
"method="zscore" (robust z-scores), for each replicate, this is calculated by subtracting the overall median from each measurement and dividing the result by the overall mad. These are estimated for each replicate by considering the distribution of intensities (over all plates) in the wells whose content is annotated as sample."

Which is essentially doing the same thing. Am I correct?
Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, July 16, 2012 12:22 AM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Dear Meraj,

Wolfgang has already replied to your questions on the mailing list. Please make sure that you are properly subscribed: bioconductor@r-project.org

Best wishes,
Joseph Barry

On Jul 14, 2012, at 1:07 AM, <maziz@tgen.org> wrote:

Thank you for responding. Somehow I did not receive your reply email and I got to your response to my question when I was searching for a solution online.

So given the variances are accounted for:

According to wikipedia:
FPR = FP/(FP+TN)

Suppose I have 50 wells of "-ve" controls in total across all plates and 20 (TP) show up in the "hit list".
This will give me True Positive Rate (TPR) sensitivity:

TPR = TP/(TP+FN)
TPR = 20/(20+30) = 0.4

I am not sure how to translate that to FPR since I do not know the FPs and TNs.
If we had done a confirmation screen then we could have found out the false positives and true negatives.

Am I on the right track?

meraj

From: Meraj Aziz
Sent: Saturday, June 16, 2012 11:49 PM
To: 'Joseph Barry'
Cc: 'bioconductor@r-project.org'
Subject: RE: Question regarding cellhts2 output

I am using CellHTS2 to calculate Bscores. My experiment has only one replicate.
There are approx 900 genes (x4 siRNA).

From: Meraj Aziz
Sent: Saturday, June 16, 2012 5:05 PM
To: 'Joseph Barry'
Cc: 'bioconductor@r-project.org'
Subject: RE: Question regarding cellhts2 output

Hi,

Is there a way to calculate the False Discovery Rate (FDR) for an RNAi Experiment.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Wednesday, June 13, 2012 2:45 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

Yes, that would be great. Thanks for being understanding.

Best wishes,
Joseph

On Jun 13, 2012, at 7:49 PM, <maziz@tgen.org> wrote:

So next time I ask a question I will include bioconductor@r-project.org
In my CC.

I apologize for this.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Wednesday, June 13, 2012 2:55 AM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

The negative/positive controls are defined by the user, and their "significance" varies greatly from experiment to experiment. Some have no negative controls, others do. It depends on experimental design. Most of the time they are for quality control, as you say. However, the normalization method "negatives" does make use of this information. See the package documentation for further details.

As regards the assignment of probabilities, I would not interpret the Z or Bscores in this way. Each well can be viewed as being independent from the others (again depending on exp design) so you are not really sampling in the way you are suggesting. The setting of a threshold is usually an arbitrary choice based on the data.
It is fine to just state your threshold and present the results directly.

I am happy to answer any further questions, should you have any. However, it would be great if you could send any such questions out through the bioconductor mailing list so that other users may contribute to the discussion and benefit from the commentary.

Many thanks,
Joseph

On Jun 13, 2012, at 1:46 AM, <maziz@tgen.org> wrote:

Hi Joseph

One more question regarding CellHTS2 is the use of negative controls.
When I run CellHTS2 with Bscore normalization, what is the significance of the negative controls on the plates? Are negative and positive controls only for quality control and visualization purposes, or are they actually used somehow in the Bscore calculations?

Thanks,
Meraj

From: Meraj Aziz
Sent: Tuesday, June 12, 2012 1:12 PM
To: 'Joseph Barry'
Subject: RE: Question regarding cellhts2 output

Hi Joseph,

Thanks for your reply.
The reason I was interested in knowing if my screen was normally distributed is that I want to use the Bscores (assuming the scores are standard deviations from the median) to assign a probability to each siRNA (using something like Zscore to Probability tables/calculators).

The conversion from Z/Bscore to probability should give the probability x% that the observed siRNA effect arose by chance.

For example:

At threshold "2" (Bscore) the probability is 0.023, or about 2.3%.
This means that the probability of a siRNA giving the observed effect by chance is less than about 2.3%.
For threshold "3" the probability is 0.00135 or 0.135%.

For that I need to know whether our assumption of normality holds or not.

I hope I am interpreting the results from CellHTS2 Bscore normalization the right way.
Our aim is to justify why we are using a particular Bscore cutoff.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Tuesday, June 12, 2012 12:07 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

The density plot just shows the distribution of scores for your screen and conveniently marks the positions of positive/negative controls. Your screen is not fully normal as it does not have the classical bell shape. However I would not read too much into whether a screen is normally distributed or not. Scores which seem to break the trend (such as your SMG1) tend to lie further from the line on the Q-Q plot but I would not waste too much time looking at this. They are primarily for quality control, to check that the distribution does not look "funny".

Best wishes,
Joseph

On Jun 12, 2012, at 8:18 PM, <maziz@tgen.org> wrote:

Hi Joseph

The Q-Q plot gives a way of testing the normality of our RNAi distribution.
Attached is my screen's Q-Q plot.

What does the density plot imply and is my screen normally distributed?
You have been really helpful.

Thanks,
Meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, June 11, 2012 12:55 PM
To: Meraj Aziz
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

My apologies, I had not spotted the line:

xsc=scoreReplicates(xn, sign = "-", method = Score)

which is calculating the zscore at this stage and multiplying by -1. This is absolutely fine.

Therefore I don't think there is anything wrong with your analysis. I would not be concerned that you get a score of -76 s.d. This is perfectly reasonable, given that the standard deviation is ~0.07, i.e. the scores seem high simply because you divide by a small number.

Hope this helps,
Joseph

On Jun 11, 2012, at 9:26 PM, Joseph Barry wrote:

Hi Meraj,

I noticed in your output that
VarianceAdjust="none"
so I guess that you have not divided by the MAD (or standard deviation) using cellHTS2, but have rather done this as a post-processing step?

Can you check that you have not made a mistake in calculating the zscore? In R, I quickly manually divided by MAD and obtained a more conservative range:

range(x$normalized_r1_ch1/mad(x$normalized_r1_ch1, na.rm=TRUE), na.rm=TRUE)
[1] -12.46033 63.93462

The median is zero, as it should be, so the subtraction of the median is working fine.

As a solution, I recommend you reanalyze your data with the VarianceAdjust="byPlate" option turned on.

Best wishes,
Joseph

On Jun 11, 2012, at 9:04 PM, <maziz@tgen.org> wrote:

Attached is my output from CellHTS2.
So I was interested in gene "SMG1" and at a cutoff of "-2 BScore" I get all 4 siRNA, which is good. But the score in the negative direction goes up to -76.26 standard deviations, which seems a lot.

My parameters are as follows:

orgDir=getwd()
setwd("/temp/cellHTS2/JOB5676000587616137010")
Indir="/temp/cellHTS2/JOB5676000587616137010"
zz <- file("/temp/cellHTS2/JOB5676000587616137010_RUN1370378843309182339/R_OUTPUT.TXT", open="w")
sink(file=zz,type="message" )
Name="SCNA_with_pos_ctrl"
Outdir_report="/temp/cellHTS2/JOB5676000587616137010_RUN1370378843309182339"
LogTransform=FALSE
PlateList="Platelist.txt"
Plateconf="PlateConfig.txt"
Description="Description.txt"
NormalizationMethod="Bscore"
NormalizationScaling="additive"
VarianceAdjust="none"
SummaryMethod="mean"
Screenlog="Screenlog.txt"
Score="zscore"
Annotation="GeneIDs.txt"
library(cellHTS2)
x=readPlateList(PlateList, name = Name, path = Indir)
x=configure(x, descripFile=Description, confFile=Plateconf, logFile=Screenlog,path=Indir)
xn=normalizePlates(x, scale =NormalizationScaling , log =LogTransform,method=NormalizationMethod, varianceAdjust=VarianceAdjust)
comp=compare2cellHTS(x, xn)
xsc=scoreReplicates(xn, sign = "-", method = Score)
xsc=summarizeReplicates(xsc, summary = SummaryMethod)
scores=Data(xsc)
ylim=quantile(scores, c(0.001, 0.999), na.rm = TRUE)
xsc=annotate(xsc, geneIDFile = Annotation)
out=writeReport(raw = x, normalized = xn, scored = xsc, outdir = Outdir_report, force = TRUE, settings = list(xrange = c(0.5,3),zrange = c(-4, 8), ar = 1))
setwd(orgDir)
sink()

Any comments from you will really help guiding me towards the right direction.

meraj

From: Joseph Barry [mailto:joseph.barry@embl.de]
Sent: Monday, June 11, 2012 11:53 AM
To: Meraj Aziz
Cc: bioconductor@r-project.org
Subject: Re: Question regarding cellhts2 output

Hi Meraj,

One clarification: the Bscore method in cellHTS2 does not automatically divide by the MAD. One must explicitly specify varianceAdjust="byPlate" to enforce this.
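The question raised in this post, whether scoreReplicates(method="zscore") is redundant after normalizePlates(method="median", varianceAdjust="byPlate"), can be explored with a small base-R sketch. It only mimics the arithmetic described in the quoted documentation (per-plate median and MAD first, then the overall median and MAD over all plates) on simulated numbers; the three plates and all variable names are hypothetical and nothing here calls cellHTS2 itself.

```r
set.seed(42)
# Three hypothetical plates of 96 sample wells with different centres and spreads.
plates <- list(
  rnorm(96, mean = 1.0, sd = 0.05),
  rnorm(96, mean = 1.2, sd = 0.10),
  rnorm(96, mean = 0.9, sd = 0.07)
)

# Step 1: per-plate robust z-score (median subtraction, division by the plate MAD).
per_plate <- lapply(plates, function(p) (p - median(p)) / mad(p))

# Step 2: overall robust z-score over all plates, with the sign flipped
# to imitate sign = "-".
pooled  <- unlist(per_plate)
overall <- -(pooled - median(pooled)) / mad(pooled)

# After step 1 the pooled median is near 0 and the pooled MAD near 1,
# so step 2 mostly just flips the sign.
c(overall_median = median(pooled), overall_mad = mad(pooled))
max(abs(overall + pooled))
```

Under these assumptions the second step is close to, but not exactly, a no-op apart from the sign: the per-plate scores are already centred and scaled plate by plate, while the overall step re-centres and re-scales them once across the whole screen.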
Hi Meraj,

In both cases the MAD is determined from the sample wells. As explained in the ?normalizePlates documentation, method="median" and method="negatives" only affect the median subtraction step. The varianceAdjust step always uses the sample wells to calculate the MAD.

If you decide that for some reason you wish to divide by the variance of the controls, you would need to do this normalization yourself. You could use the code from my previous email (note to others: this is not below. see mail in yesterday's archive) as a starting point. As I don't know all of the details about your experimental design, I can't comment whether or not this is a reasonable way forward. It is unclear to me why you want to do this.

Best wishes,
Joe

On Apr 17, 2013, at 10:52 PM, <maziz@tgen.org> wrote:

> Thanks Joseph for answering all my questions.
> I am trying to compare the outputs from “negative” control based normalization and simply the zscore (median) non-control based normalization.
> For non-control based normalization I understand that the raw value is subtracted from the median of the all the “sample” wells and divided
> by the MAD. The MAD in this case is calculated based on “sample” wells but what about the MAD calculation when we consider the “negatives”
> based normalization. Do we ever take into consideration the variance in the negative controls?
> > %%%% My parameters %%%%%% > xn=normalizePlates(x, scale =additive , log =FALSE, method=median, varianceAdjust=byPlate) > vs > xn=normalizePlates(x, scale =additive , log =FALSE, method=negatives, varianceAdjust=byPlate) > %%%%%%%%%%%%%%%%%%% > > Thanks, > Meraj > > > > > > > On Apr 16, 2013, at 7:56 PM, <maziz@tgen.org> <maziz@tgen.org> wrote: > > > > Hi Joseph, > > If we are using the following: > Documentation: > ## robust Z score method (plate intensities are subtracted by the per-plate median on sample wells and divided by the per-plate MAD on sample wells): > xZ <- normalizePlates(KcViabSmall, scale="additive", log=FALSE, method="median", varianceAdjust="byPlate") > > then essentially we donot really need to use: > scoreReplicates(object, sign="+", method="zscore", ...) > > Documentation: > "method="zscore" (robust z-scores), for each replicate, this is calculated by subtracting the overall median from each measurement and dividing the result > by the overall mad. These are estimated for each replicate by considering the distribution of intensities (over all plates) in the wells whose content is > annotated as sample." > > Which is essentially doing the same thing. > Am I correct? > > Meraj > > > > > > > > > -----Original Message----- > From: bioconductor-bounces@r-project.org [mailto:bioconductor- bounces@r-project.org] On Behalf Of maziz@tgen.org > Sent: Saturday, July 28, 2012 12:11 PM > To: joseph.barry@embl.de > Cc: bioconductor@r-project.org > Subject: Re: [BioC] Question regarding cellhts2 output > > Hi Joseph, > > We are getting ready to writeup our findings. > I am still wondering about one question. I apologize if you have answered that before But in order to clarify please help me understand the process by which CellHTS2 processes out data. > > So as I mentioned before we have 900 siRNA (x4 siRNA per gene), no replicates in our experiments. > The input to cellhts2 is already a simple ratio between a phosphorylated vs baseline protein. 
> > I am using the online version of cellhts2 (http://web- cellhts2.dkfz.de/cellHTS-java/CellHTS2) and not the R version. > Following are my parameters that are part of the output generated by the online web cellhts2. > > //////////////////////////////////////////////////////////////////// //// > orgDir=getwd() > setwd("/temp/cellHTS2/JOB8905381608534530143") > Indir="/temp/cellHTS2/JOB8905381608534530143" > zz <- file("/temp/cellHTS2/JOB8905381608534530143_RUN441610730504648 4487/R_OUTPUT.TXT", open="w") sink(file=zz,type="message" ) Name="test" > Outdir_report="/temp/cellHTS2/JOB8905381608534530143_RUN441610730504 6484487" > LogTransform=FALSE > PlateList="Platelist.txt" > Plateconf="PlateConfig.txt" > Description="Description.txt" > NormalizationMethod="Bscore" > NormalizationScaling="additive" > VarianceAdjust="byPlate" > SummaryMethod="mean" > Screenlog="Screenlog.txt" > Score="zscore" > Annotation="GeneIDs.txt" > library(cellHTS2) > x=readPlateList(PlateList, name = Name, path = Indir) x=configure(x, descripFile=Description, confFile=Plateconf, logFile=Screenlog,path=Indir) xn=normalizePlates(x, scale =NormalizationScaling , log =LogTransform,method=NormalizationMethod, varianceAdjust=VarianceAdjust) comp=compare2cellHTS(x, xn) xsc=scoreReplicates(xn, sign = "-", method = Score) xsc=summarizeReplicates(xsc, summary = SummaryMethod) > scores=Data(xsc) > ylim=quantile(scores, c(0.001, 0.999), na.rm = TRUE) xsc=annotate(xsc, geneIDFile = Annotation) out=writeReport(raw = x, normalized = xn, scored = xsc, outdir = Outdir_report, force = TRUE, settings = list(xrange = c(0.5,3),zrange = c(-4, 8), ar = 1)) > setwd(orgDir) > sink() > //////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////// > > I choose the normalization method as Bscore and VarianceAdjust "byPlate". 
> One of the questions is After cellhts2 does the Bscore normalization/smoothing of the plate, Does it then take those values and calculate Zscores. > > You have been very helpful, and I really appreciate it. > > Thanks, > Meraj > > > From: Joseph Barry [mailto:joseph.barry@embl.de] > Sent: Monday, July 16, 2012 12:22 AM > To: Meraj Aziz > Subject: Re: Question regarding cellhts2 output > > Dear Meraj, > > Wolfgang has already replied to your questions on the mailing list. Please make sure that you are properly subscribed: bioconductor@r-project.org<mailto:bioconductor@r-project.org> > > Best wishes, > Joseph Barry > > On Jul 14, 2012, at 1:07 AM, <maziz@tgen.org<mailto:maziz@tgen.org>> <maziz@tgen.org<mailto:maziz@tgen.org>> wrote: > > > Thank you for responding. Somehow I did not receive your reply email and I got to your response to my question when I was searching for a solution online. > > So given the variances are accounted for: > > According to wikipedia: > FPR = FP/(FP+TN) > > Suppose I have 50 wells of "-ve" controls in total across all plates and 20 (TP) show up in the "hit list". > This will give me True Positive Rate (TPR) sensitivity: > > TPR = TP/(TP+FN) > TPR= 20/(20+30) = 0.4 > > I am not sure how to translate that to FPR since I donot know the FPs and TNs. > If we have had done a confirmation screen then we could have found out the false positives and true negatives. > > Am I on the right track? > > meraj > > > > From: Meraj Aziz > Sent: Saturday, June 16, 2012 11:49 PM > To: 'Joseph Barry' > Cc: 'bioconductor@r-project.org<mailto:bioconductor@r-project.org>' > Subject: RE: Question regarding cellhts2 output > > I am using CellHTS2 to calculate Bscores. My experiment has only one replicate. > There are approx 900 genes (x4 siRNA). 
> > From: Meraj Aziz > Sent: Saturday, June 16, 2012 5:05 PM > To: 'Joseph Barry' > Cc: 'bioconductor@r-project.org<mailto:bioconductor@r-project.org>' > Subject: RE: Question regarding cellhts2 output > > Hi, > > Is there a way to calculate the False Discovery Rate (FDR) for an RNAi Experiment. > > Thanks, > Meraj > > > From: Joseph Barry [mailto:joseph.barry@embl.de]<mailto:[mailto:joseph.barry@embl.de]> > Sent: Wednesday, June 13, 2012 2:45 PM > To: Meraj Aziz > Subject: Re: Question regarding cellhts2 output > > Hi Meraj, > > Yes, that would be great. Thanks for being understanding. > > Best wishes, > Joseph > > On Jun 13, 2012, at 7:49 PM, <maziz@tgen.org<mailto:maziz@tgen.org>> <maziz@tgen.org<mailto:maziz@tgen.org>> wrote: > > So next time I ask a question I will include bioconductor@r-project.org<mailto:bioconductor@r-project.org> > In my CC. > > I apologize for this. > > Thanks, > Meraj > > From: Joseph Barry [mailto:joseph.barry@embl.de]<mailto:[mailto:joseph.barry@embl.de]> > Sent: Wednesday, June 13, 2012 2:55 AM > To: Meraj Aziz > Subject: Re: Question regarding cellhts2 output > > Hi Meraj, > > The negative/positive controls are defined by the user, and their "significance" varies greatly from experiment to experiment. Some have no negative controls, others do. It depends on experimental design. Most of the time they are for quality control, as you say. However, the normalization method "negatives" does make use of this information. See the package documentation for further details. > > As regards the assignment of probabilities, I would not interpret the Z or Bscores in this way. Each well can be viewed as being independent from the others (again depending on exp design) so you are not really sampling in the way you are suggesting. The setting of a threshold is usually an arbitrary choice based on the data. It is fine to just state your threshold and present the results directly. > > I am happy to answer any further questions, should you have any. 
However, it would be great if you could send any such questions out through the bioconductor mailing list so that other users may contribute to the discussion and benefit from the commentary. > > Many thanks, > Joseph > > On Jun 13, 2012, at 1:46 AM, <maziz@tgen.org<mailto:maziz@tgen.org>> <maziz@tgen.org<mailto:maziz@tgen.org>> wrote: > > Hi Joseph > > One more question regarding CellHTS2 is the use of negative controls. > When I run CellHTS2 with Bscore normalization, what is the significance of the negative Controls on the plates. Are negative and positive controls only for quality control and visualization purpose or are they actually used somehow in the Bscore calculations. > > Thanks, > Meraj > > > > From: Meraj Aziz > Sent: Tuesday, June 12, 2012 1:12 PM > To: 'Joseph Barry' > Subject: RE: Question regarding cellhts2 output > > Hi Joseph, > > Thanks for your reply. > The reason i was interested in knowing if my screen was normally distributed is using the Bscores (assuming the scores are standard deviations from the median) to assign probability to each siRNA (using something like Zscore to Probability tables/calculators). > > The outcome from Z/Bscore to probability should give the probability that the given siRNA effect observed by chance is x%. > > For example: > > So at threshold "2" (Bscore) the probability is 0.023 or 2.23%. > This 2.23% means that the probability of a siRNA giving you the observed effect by chance Is less than 2.23%. > For threshold "3" the probability is 0.00135 or 0.135%. > > For that I need to be sure we are assumption of normality is true or not. > > I hope I am interpreting the results from CellHTS2 Bscore normalization the right way. > Our aim is to justify why we are using a particular Bscore cutoff. 
> > Thanks, > Meraj > > > From: Joseph Barry [mailto:joseph.barry@embl.de]<mailto:[mailto:joseph.barry@embl.de]> > Sent: Tuesday, June 12, 2012 12:07 PM > To: Meraj Aziz > Subject: Re: Question regarding cellhts2 output > > Hi Meraj, > > The density plot just shows the distribution of scores for your screen and conveniently marks the positions of positive/negative controls. Your screen is not fully normal as it does not have the classical bell shape. However I would not read too much into whether a screen is normally distributed or not. Scores which seem to break the trend (such as your SMG1) tend to lie further from the line on the Q-Q plot but I would not waste too much time looking at this. They are primarily for quality control, to check that the distribution does not look "funny". > > Best wishes, > Joseph > > On Jun 12, 2012, at 8:18 PM, <maziz@tgen.org<mailto:maziz@tgen.org>> wrote: > > Hi Joseph > > The Q-Q plots gives a measure of testing for normality of our RNAi distribution. > Attached is my screens Q-Q plot. > > What does the density plot imply and is my screen normally distributed? > You have been really helpful. > > Thanks, > Meraj > > > > > From: Joseph Barry [mailto:joseph.barry@embl.de]<mailto:[mailto:joseph.barry@embl.de]> > Sent: Monday, June 11, 2012 12:55 PM > To: Meraj Aziz > Subject: Re: Question regarding cellhts2 output > > Hi Meraj, > > My apologies, I had not spotted the line: > > xsc=scoreReplicates(xn, sign = "-", method = Score) > > , which is calculating the zscore at this stage and multiplying by -1. This is absolutely fine. > > Therefore I don't think there is anything wrong with your analysis. I would not be concerned that you get a score of -76 s.d.. This is perfectly reasonable, given that the standard deviation is ~0.07, i.e. the scores seem high simply because you divide by a small number. 
> > Hope this helps, > Joseph > > On Jun 11, 2012, at 9:26 PM, Joseph Barry wrote: > > Hi Meraj, > > I noticed in your output that > VarianceAdjust="none" > so I guess that you have not divided by the MAD (or standard deviation) using cellHTS2, but have rather done this as a post- processing step? > > Can you check that you have not made a mistake in calculating the zscore? In R, I quickly manually divided by MAD and obtained a more conservative range: > > range(x$normalized_r1_ch1/mad(x$normalized_r1_ch1, na.rm=TRUE), na.rm=TRUE) [1] -12.46033 63.93462 > > The median is zero, as it should be, so the subtraction of the median is working fine. > > As a solution, I recommend you reanalyze your data with the VarianceAdjust="byPlate" option turned on. > > Best wishes, > Joseph > > > > On Jun 11, 2012, at 9:04 PM, <maziz@tgen.org<mailto:maziz@tgen.org>> <maziz@tgen.org<mailto:maziz@tgen.org>> wrote: > > Attached is my output from CellHTS2. > So I was interested in gene "SMG1" and at a cutoff of "-2 BScore" > I get all 4 siRNA, which is good. But the score in the negative goes upto > -76.26 Standard deviations which seems a lot. 
> > My parameters are as follows: > > orgDir=getwd() > setwd("/temp/cellHTS2/JOB5676000587616137010") > Indir="/temp/cellHTS2/JOB5676000587616137010" > zz <- file("/temp/cellHTS2/JOB5676000587616137010_RUN137037884330918 2339/R_OUTPUT.TXT", open="w") sink(file=zz,type="message" ) Name="SCNA_with_pos_ctrl" > Outdir_report="/temp/cellHTS2/JOB5676000587616137010_RUN137037884330 9182339" > LogTransform=FALSE > PlateList="Platelist.txt" > Plateconf="PlateConfig.txt" > Description="Description.txt" > NormalizationMethod="Bscore" > NormalizationScaling="additive" > VarianceAdjust="none" > SummaryMethod="mean" > Screenlog="Screenlog.txt" > Score="zscore" > Annotation="GeneIDs.txt" > library(cellHTS2) > x=readPlateList(PlateList, name = Name, path = Indir) x=configure(x, descripFile=Description, confFile=Plateconf, logFile=Screenlog,path=Indir) xn=normalizePlates(x, scale =NormalizationScaling , log =LogTransform,method=NormalizationMethod, varianceAdjust=VarianceAdjust) comp=compare2cellHTS(x, xn) xsc=scoreReplicates(xn, sign = "-", method = Score) xsc=summarizeReplicates(xsc, summary = SummaryMethod) > scores=Data(xsc) > ylim=quantile(scores, c(0.001, 0.999), na.rm = TRUE) xsc=annotate(xsc, geneIDFile = Annotation) out=writeReport(raw = x, normalized = xn, scored = xsc, outdir = Outdir_report, force = TRUE, settings = list(xrange = c(0.5,3),zrange = c(-4, 8), ar = 1)) > setwd(orgDir) > sink() > > Any comments from you will really help guiding me towards the right direction. > > meraj > > From: Joseph Barry [mailto:joseph.barry@embl.de]<mailto:[mailto:joseph.barry@embl.de]> > Sent: Monday, June 11, 2012 11:53 AM > To: Meraj Aziz > Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org> > Subject: Re: Question regarding cellhts2 output > > Hi Meraj, > > One clarification: the Bscore method in cellHTS2 does not automatically divide by the MAD. One must explicitly specify varianceAdjust="byPlate" to enforce this. 
>
> Best wishes,
> Joseph
>
> On Jun 11, 2012, at 8:37 PM, Joseph Barry wrote:
>
> Hi Meraj,
>
> I would recommend that you use the method="median" and varianceAdjust="byPlate" (or alternatively "byExperiment" or "byBatch", depending on the context) options to normalizePlates. This will subtract the median and divide by the median absolute deviation (MAD), which is slightly more robust than the classical zscore, where one subtracts the mean and divides by the standard deviation.
>
> The Bscore normalization method subtracts the plate median and divides by the plate MAD, but also applies a two-way median polish to correct for row and column effects. Thus it is essentially a zscore with a few more bells and whistles attached, if you will. The references at the bottom of the ?Bscore documentation explain this in more detail and will help you to decide whether or not this is appropriate for your data.
>
> (cc'd to the bioconductor mailing list for future googlers :) )
>
> Best wishes,
> Joseph
>
> On Jun 11, 2012, at 8:11 PM, <maziz@tgen.org> wrote:
>
> Hi Joseph
>
> I have a question regarding the scores generated by cellhts2.
> I would really appreciate if you can answer them.
>
> In your paper
> http://genomebiology.com/content/pdf/gb-2006-7-7-r66.pdf
> you mention zscore as the basis of your score. Online
> cellhts2 does not have a zscore normalization mechanism/option.
>
> Question is:
> 1) How can I only choose zscore normalization.
> 2) And if I choose Bscore normalization. Is the score really standard
> deviation from the mean/median.
>
> In the R_OUTPUT file I see:
> NormalizationMethod="Bscore"
> Score="zscore"
> (what exactly does this imply)
>
> Thank you for your help
>
> Meraj
>
> <cellhts2_output_scna_project.xlsx>
> <density.pdf><qqplot.pdf>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
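For readers following the thread, here is a minimal base-R sketch of the "subtract the median, divide by the MAD" scoring that Joseph describes and checks by hand. It is only an illustration under stated assumptions: `robust_z` and `vals` are hypothetical names with made-up numbers, it does not call cellHTS2, and it is not meant to reproduce what normalizePlates does internally (for example, it applies no median polish and no per-plate grouping).

```r
# Robust z-score sketch: centre on the median, scale by the MAD.
# `robust_z` and `vals` are hypothetical, for illustration only;
# the numbers are made up and are not from the screen in this thread.
robust_z <- function(v) {
  centre <- median(v, na.rm = TRUE)
  # stats::mad multiplies by 1.4826 by default, so that it is
  # comparable to a standard deviation for normally distributed data.
  spread <- mad(v, na.rm = TRUE)
  (v - centre) / spread
}

vals <- c(0.9, 1.1, 1.0, NA, 1.2, 0.8, 3.0)
z <- robust_z(vals)

# After scoring, the median should be zero and the MAD should be one.
print(median(z, na.rm = TRUE))
print(mad(z, na.rm = TRUE))
print(range(z, na.rm = TRUE))
```

If the median of the scores is zero but the range is still very wide, as in the -76.26 case above, the division by the MAD is the step to check first.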