External RNA controls on Rat Gene ST 2.0 chip lfc ~ 1 after xps rma??


Matthew Thornton ▴ 380

@matthew-thornton-5564

Last seen 5 weeks ago

USA, Los Angeles, USC

Hello!

I am processing Affymetrix Rat Gene 2.0 ST chips with the Bioconductor package xps, using RMA normalization. I have included the ExFold ERCC external RNA controls, with 2 mixes of different concentrations. I can pull out intensities for the ERCC controls at different points along the processing scheme. If I take the raw ERCC intensities, order them by increasing concentration, and log2-transform both the concentration and the intensity, I see a nice sigmoid curve that I can fit with a cubic polynomial.

However, when I pull out the ERCC controls after summarization, reorder them by concentration, and roughly calculate the log-fold change, the values are all close to 1. My supposition is that I am overfitting the data with RMA and need to find a better normalization scheme. Does anyone have ideas for other normalization and summarization methods that I should look at, such as iter-PLIER or FARMS? Any advice or comments are welcome.

Thanks,
Matt
matthew.thornton at med.usc.edu

Normalization xps • 1.8k views

updated 10.8 years ago by Davis, Wade ▴ 350 • written 10.8 years ago by Matthew Thornton ▴ 380


cstrato ★ 3.9k

@cstrato-908

Last seen 6.5 years ago

Austria

Dear Matt,

If you want to use a different normalization method, I would suggest trying MAS5. Alternatively, you are free to play around with different methods. As you can see in my vignette 'xpsPreprocess.pdf', you can do the calculation stepwise, i.e. use different methods for background correction, normalization and summarization. For example, I would try sector background (mas4), then median normalization and lowess summarization. For FARMS you can use no background correction, quantile normalization and farms summarization; see chapter 5.5. However, if you think that you may be overfitting the data, I would not use quantile normalization.

With respect to PLIER, please see my note in Appendix A.1 of the vignette 'APTvsXPS.pdf'.

Best regards,
Christian

(In reply to Matthew Thornton's message of 6/24/14, 9:38 PM.)
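
As a rough illustration of why a scaling-type step such as median normalization is gentler than quantile normalization, here is a minimal Python sketch. It is not the xps implementation; the function name `median_scale` and the toy intensities are made up for illustration. Each chip gets a single scale factor, so a spike-in that differs between chips still differs afterwards.

```python
from statistics import median

def median_scale(arrays, target=None):
    """Scale each array so that its median equals a common target.

    arrays: list of lists of linear-scale intensities, one list per chip.
    target: common median; defaults to the mean of the per-chip medians.
    """
    medians = [median(a) for a in arrays]
    if target is None:
        target = sum(medians) / len(medians)
    # One multiplicative factor per chip; ranks and within-chip ratios are unchanged
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]

# Made-up intensities: the last value on each chip plays the role of a spike-in
chip1 = [100.0, 200.0, 400.0, 800.0, 30000.0]
chip2 = [110.0, 220.0, 440.0, 880.0, 16500.0]

scaled = median_scale([chip1, chip2])
spike_ratio = scaled[0][-1] / scaled[1][-1]
print("medians after scaling:", median(scaled[0]), median(scaled[1]))
print("spike-in ratio after scaling:", spike_ratio)
```

In this toy case the two chips end up with the same median, while the spike-in still shows a clear between-chip difference.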

written 10.8 years ago by cstrato ★ 3.9k


Davis, Wade ▴ 350

@davis-wade-2803

Last seen 10.6 years ago

Matt,

Here are some comments that may be helpful, but they don't directly address your question...

The "Subgroup B" ERCC spike-ins *should* have lfc=0. I like to look at that group first. I also like to look at the raw data "within subject" across the (log) concentration range and see where the linearity breaks down and the concentrations become indistinguishable (i.e., the asymptotic parts of the sigmoid curve); I am suspicious of any differences among groups for a gene with expression levels falling in those areas.

You might also consider looking at a density plot for each sample with a rug plot showing the values of the ERCC controls. (Non-graphically, use ecdf() to see where they fall in each sample.) Are the upper tails dominated by ERCCs? If so, I would be concerned about using RMA because quantile normalization may be too strong in the presence of such (intentional) differences. For example, Mix 1 has a max concentration of 30,000 while Mix 2 only goes up to 15,000. Based on my understanding, if those controls are indeed the strongest signals in your samples, then by definition they would be equal after RMA. Indeed, Bolstad et al. (2003) mention this in their quantile normalization paper, which is one of the three papers that make up the RMA procedure:

"One possible problem with this method is that it forces the values of quantiles to be equal. This would be most problematic in the tails where it is possible that a probe could have the same value across all the arrays. However, in practice, since probeset expression measures are typically computed using the value of multiple probes, we have not found this to be a problem."

Based on this, I would filter out ERCC controls that are in the nonlinear range or dominate the tails; you want the ERCCs used to be intermingled with "real" data to help avoid these problems.

Just some thoughts!
Wade

(In reply to Matthew Thornton's message of Tuesday, June 24, 2014, 2:39 PM.)
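
To make the point about the tails concrete, here is a toy Python sketch of quantile normalization. It is a simplified version that ignores ties, not the RMA code (which applies this step at the probe level, before summarization), and the numbers are made up. When the spike-in is the strongest signal on both chips, a true log2 difference of 1 becomes 0 after normalization.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (ties are not handled specially)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across all arrays
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices, smallest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference value of its rank
        normalized.append(out)
    return normalized

# Made-up log2 intensities; the last entry is a spike-in that is the
# strongest signal on each chip (higher concentration in Mix 1 than in Mix 2)
mix1_chip = [6.0, 7.5, 9.0, 10.0, 14.0]
mix2_chip = [6.2, 7.4, 9.1, 10.2, 13.0]

norm1, norm2 = quantile_normalize([mix1_chip, mix2_chip])
lfc_before = mix1_chip[-1] - mix2_chip[-1]
lfc_after = norm1[-1] - norm2[-1]
print("lfc before:", lfc_before, "lfc after:", lfc_after)
```

Both top-ranked values are replaced by the same reference value, which is why controls that dominate the upper tail cannot show a difference after this step.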

written 10.8 years ago by Davis, Wade ▴ 350


Thank you for the suggestions! I will look at where the ERCC controls fall in the data. I am thinking of using a pared-down set of the ERCC controls that are in the 'linear' range and fall within my experimental data. I am planning to use the spike-in probes procedure in the vsn package. I will also try MAS5 and iterate through the different processing procedures in xps.

It is good to have an outside metric for assessing normalization. If I can get observed log-fold changes similar to my expected log-fold changes, it will give me a little more confidence in my data. When you process data with the ERCC controls, what normalization methods do you use?

Thanks again!
Matt
matthew.thornton at med.usc.edu

(In reply to Wade Davis's message of Wednesday, June 25, 2014, 8:26 AM.)
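
The plan of keeping only controls that sit within the bulk of the data and then comparing expected with observed log-fold changes could be sketched roughly as follows in Python, assuming log2-scale summarized values. The control names, concentrations, signals, and the 0.2/0.8 ECDF cutoffs are all hypothetical placeholders, not ERCC values or recommended thresholds.

```python
import math
from bisect import bisect_right

def ecdf_position(value, sample_values):
    """Fraction of values in the sample that are <= value (empirical CDF)."""
    ordered = sorted(sample_values)
    return bisect_right(ordered, value) / len(ordered)

def expected_lfc(conc_mix1, conc_mix2):
    """Expected log2 fold change from the spiked-in concentrations."""
    return math.log2(conc_mix1 / conc_mix2)

# Made-up log2 expression values for all probesets on one chip
chip = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]

# Hypothetical controls: log2 signal on a Mix 1 chip and on a Mix 2 chip,
# plus the (made-up) concentration of the control in each mix
controls = {
    "ctrl_low":  {"mix1": 4.5,  "mix2": 4.4,  "conc1": 0.02,  "conc2": 0.01},
    "ctrl_mid":  {"mix1": 8.5,  "mix2": 7.6,  "conc1": 2.0,   "conc2": 1.0},
    "ctrl_high": {"mix1": 13.5, "mix2": 13.4, "conc1": 200.0, "conc2": 100.0},
}

LOWER, UPPER = 0.2, 0.8  # arbitrary cutoffs, chosen only for this toy example

results = {}
for name, c in controls.items():
    # Simplification: the position is checked on the Mix 1 chip only
    position = ecdf_position(c["mix1"], chip)
    if LOWER <= position <= UPPER:
        observed = c["mix1"] - c["mix2"]  # difference of log2 values
        results[name] = (expected_lfc(c["conc1"], c["conc2"]), observed)

for name, (expected, observed) in results.items():
    print(f"{name}: expected lfc {expected:.2f}, observed lfc {observed:.2f}")
```

Only the control that falls in the middle of the chip's distribution survives the filter here; the ones in the tails are dropped before any comparison is made.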

written 10.8 years ago by Matthew Thornton ▴ 380
