backgroundCorrect offset value

Prasad Siddavatam ▴ 150

@prasad-siddavatam-4508

Last seen 9.4 years ago

United States

Hello Users,

I have a question regarding the usage of the backgroundCorrect function in limma.

When I do the following with offset 50, I get 2900 differentially expressed genes:

RG.b <- backgroundCorrect(RG, method = "normexp", offset = 50)

whereas when I do the following with offset 1, I get 1300 differentially expressed genes:

RG.b <- backgroundCorrect(RG, method = "normexp", offset = 1)

Please advise which offset value should be used. Why does the offset value make so much difference?

I am using this for two-channel data, which is read by "genepix".

Greatly appreciate your help.

Prasad

ADD COMMENT • link updated 12.7 years ago by Gordon Smyth 50k • written 12.7 years ago by Prasad Siddavatam ▴ 150

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Hi Prasad,

On 8/26/2011 11:00 AM, Prasad Siddavatam wrote:

> Please advise which offset value to be used? Why is offset value making so much difference?

I can't advise you on the offset to use; that is up to you as the data analyst. But I can explain why you get more genes with a larger offset.

When you do a local background correction of your data, for the set of spots that are fairly dim (not much different from background intensity), the resulting ratios can become unstable because the numerators and/or denominators get small. This gives the characteristic spreading of the MA plot at low intensities after background correction.

An extreme example would be the instance where the R and G channels are nearly identical (say, 200 and 205), so the uncorrected R/G ratio is close to 1. But if the Rb and Gb values are, say, 190 and 185, then the background-corrected intensities are 10 and 20, and the corrected ratio is 0.5, a two-fold difference! Adding an offset of 50 to the corrected R and G values gives 60 and 70, which dampens the ratio to 0.86, likely closer to the truth.

If you make MA plots before background correction and then after, both with and without adding the offset, you will see what I mean.

Best,

Jim
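The arithmetic in that example can be checked directly. The following is a minimal base-R sketch, assuming simple subtraction of the local background and taking the ratio as R/G throughout; the variable names are illustrative and not part of limma.

```r
# Foreground and local background intensities for one dim spot (values from the example)
R  <- 200; G  <- 205
Rb <- 190; Gb <- 185

ratio_raw <- R / G                 # about 0.98: channels look nearly identical

R_corr <- R - Rb                   # 10
G_corr <- G - Gb                   # 20
ratio_bc <- R_corr / G_corr        # 0.5: an apparent two-fold difference

offset <- 50
ratio_offset <- (R_corr + offset) / (G_corr + offset)   # 60 / 70, about 0.86

round(c(raw = ratio_raw, corrected = ratio_bc, with_offset = ratio_offset), 2)
```

The same 10-unit difference between the channels becomes a two-fold ratio when the corrected intensities are small, and a much milder one once the offset moves both values away from zero.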

ADD COMMENT • link 12.7 years ago James W. MacDonald 65k

Prasad Siddavatam ▴ 150

@prasad-siddavatam-4508

Last seen 9.4 years ago

United States

Hi Jim,

Thank you very much for your explanation. Now I understand the reason.

But you said it is up to the analyst to decide on the offset value. Is that decision based on the number of genes I expect to be differentially expressed, or on some other criteria?

This is very critical because I am using several types of arrays (Agilent, cDNA).

Appreciate your help.

Prasad

ADD COMMENT • link 12.7 years ago Prasad Siddavatam ▴ 150

Hi Prasad,

On 8/26/2011 4:30 PM, Prasad Siddavatam wrote:

> But when you said its up to the analyst to decide on the offset value. Is this statement based on the number of genes I am expecting (to be differentially expressed) or on some other criteria.

Yes ;-D.

Seriously though, this is where knowledge of the experiment, exploratory data analysis, etc. come into play. You will likely have to make some assumptions, based on what your collaborators say about their expectations, what the data look like, etc.

It's not easy, and you never know if you made the correct assumptions. All you can do is be aware of the assumptions you have made, and have a reasonable rationale for why you made them.

Best,

Jim

ADD REPLY • link 12.7 years ago James W. MacDonald 65k

Dear Prasad,

The offset added to the data is used to achieve a good balance between precision and bias in the data analysis. A larger offset gives higher precision (smaller variation between replicates), but it also yields larger bias (e.g. dampened fold changes).

The paper below gives a systematic evaluation of the impact of different offsets on precision, bias and false discovery rate for Illumina BeadChip data, but it should be useful for other platforms as well.

http://www.ncbi.nlm.nih.gov/pubmed/20929874

Cheers,

Wei
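The precision/bias trade-off can be illustrated with a small simulation. This is only a sketch in base R under assumed values (a true two-fold change, additive normal noise with SD 8, offsets of 0, 50 and 200); none of these numbers come from the paper or from the poster's data, and `summarise_offset` is a hypothetical helper.

```r
set.seed(42)        # arbitrary seed, for reproducibility only
n <- 1000           # simulated replicate spots
noise_sd <- 8       # assumed additive measurement noise

# Background-corrected intensities with a true two-fold change (log2 ratio = 1)
R <- 100 + rnorm(n, sd = noise_sd)
G <-  50 + rnorm(n, sd = noise_sd)

# Mean and SD of the log-ratio after adding a given offset to both channels
summarise_offset <- function(offset) {
  M <- log2((R + offset) / (G + offset))
  c(offset = offset, mean_M = mean(M), sd_M = sd(M))
}

res <- t(sapply(c(0, 50, 200), summarise_offset))
print(round(res, 3))
```

As the offset grows, `sd_M` shrinks (higher precision) while `mean_M` moves away from the true value of 1 towards 0 (dampened fold change), which is the balance described above.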

ADD REPLY • link 12.7 years ago Wei Shi ★ 3.6k

Prasad Siddavatam ▴ 150

@prasad-siddavatam-4508

Last seen 9.4 years ago

United States

Hi Jim,

I really appreciate your help. But I have a problem here: the datasets were downloaded from NCBI, so I can't get many details about the experiments, hence the trouble. Even the original publications don't say much about the experiments.

Prasad

ADD COMMENT • link 12.7 years ago Prasad Siddavatam ▴ 150

Prasad Siddavatam ▴ 150

@prasad-siddavatam-4508

Last seen 9.4 years ago

United States

Dear Dr. Smyth,

Thank you very much for your detailed response. I am going to read the references. I understand why I am getting more differentially expressed genes with a bigger offset value: basically, we are bringing the variance close to zero (with more uniformity across probes).

Are you suggesting that I delete probes before doing the background correction? If yes, is there any limit on the maximum number of probes to be deleted?

For the variance stabilization, approximately what value of fit$df.prior is considered high enough to say that good variance stabilization has been achieved?

Appreciate your time and help.

Prasad

ADD COMMENT • link 12.7 years ago Prasad Siddavatam ▴ 150

Prasad Siddavatam ▴ 150

@prasad-siddavatam-4508

Last seen 9.4 years ago

United States

Dear Wei, Thank you very much. I will read the reference. Prasad

ADD COMMENT • link 12.7 years ago Prasad Siddavatam ▴ 150

Gordon Smyth 50k

@gordon-smyth

Last seen 9 hours ago

WEHI, Melbourne, Australia

Dear Prasad,

On Mon, 29 Aug 2011, Prasad Siddavatam wrote:

> Are you suggesting to delete the probes before doing the background correct?

No, I'm not. All probes should be retained for background correction. Non-expressed probes should be filtered before using eBayes().

> If yes, is there any limit on the maximum number of probes to be deleted?
>
> In the variance stabilization, which maximum value (approximately) of fit$df.prior is considered high enough to call a good variance stabilization is achieved.

There is no maximum value. Higher is better.

I think you're worrying about this more than is necessary. A decent value like offset=50 will give good results in a wide variety of situations. You can even use

fit <- eBayes(fit, trend=TRUE)

which will make uniformity of the variance less important. Again, use plotSA(fit) to see what this does.

Best wishes

Gordon
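The order of steps described above might look like the following. This is a rough sketch, not the poster's pipeline: it assumes limma is installed, replaces the GenePix files with simulated intensities (`sim_bg` and `sim_fg` are hypothetical helpers), uses a global loess normalization, and filters on an arbitrary intensity quantile.

```r
library(limma)

set.seed(1)
n_probes <- 2000
n_arrays <- 4

# Hypothetical simulated data standing in for the GenePix RGList:
# foreground = background + normal noise + exponential signal
sim_bg <- function() matrix(rnorm(n_probes * n_arrays, mean = 200, sd = 20),
                            n_probes, n_arrays)
sim_fg <- function(bg) bg + rnorm(length(bg), sd = 20) + rexp(length(bg), rate = 1 / 500)

Rb <- sim_bg()
Gb <- sim_bg()
RG <- new("RGList", list(R = sim_fg(Rb), G = sim_fg(Gb), Rb = Rb, Gb = Gb))

# Keep every probe for background correction
RG.b <- backgroundCorrect(RG, method = "normexp", offset = 50)
MA <- normalizeWithinArrays(RG.b, method = "loess")

# Filter low-intensity probes only now, before eBayes(); the 25% cutoff is arbitrary
keep <- rowMeans(MA$A) > quantile(MA$A, 0.25)
fit <- lmFit(MA[keep, ])
fit <- eBayes(fit, trend = TRUE)

fit$df.prior   # prior degrees of freedom; higher is better
plotSA(fit)    # residual SD against average log-intensity
```

With real data, `RG` would come from reading the image-analysis files rather than from simulation, and the filtering rule would need to reflect which probes are plausibly expressed on that platform.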

ADD COMMENT • link 12.7 years ago Gordon Smyth 50k
