Question about patchwork affy pre-processing

Grant Izmirlian ▴ 10

@grant-izmirlian-2206

Hi:

I'm involved in an experiment using Affymetrix hgu133 plus2 arrays. I have affy, gcrma, and other relevant libraries up and running on my Linux system.

I preprocessed using the 'threestep' function in the gcrma library, with the following settings:

    normalize.method = "quantile.robust"
    summary.method = "median.polish"
    background.method = "GCRMA"

My question is this. Someone suggested that their biostatistician usually preprocesses via RMA and then merges MAS-5 present/absent calls into the resulting data frame; genes with MAS-5 absent calls are then omitted from any further analysis.

My feeling is that MAS-5.0 is inferior on the three steps mentioned above, and that if present/absent calls are based on inferior techniques they should not be used. I also believe that people are moving away from what I view as a hidden level of filtering. It is my belief that the best way to do filtering is once, at the analysis stage.

Am I right in thinking that this is a bad idea?

Grant Izmirlian

I have followed the PM-only debate, and in my mind the development of GCRMA now allows an efficient way to model MMs so that background correction can be done without doubling the per-gene noise.

The normalization, background correction, and summary methods of 'threestep' are definitely all the result of research that has applied the best statistical principles, in place of the rather ad hoc techniques contained in MAS-5.

My read of the literature and of best practice tells me that this is not really a preferable way to do things.

Normalization affy gcrma • 1.2k views

updated 16.9 years ago by Ben Bolstad ★ 1.2k • written 16.9 years ago by Grant Izmirlian ▴ 10

Ben Bolstad ★ 1.2k

@ben-bolstad-1494

Note that threestep() is actually part of affyPLM, rather than gcrma.

I want to point out that the P/M/A calls from MAS5 are derived from a slightly different algorithm than the MAS5 expression values. The P/M/A calls seem to do a reasonable job of what they were designed to do, as opposed to the expression values, which are (in my opinion) less desirable.

That said, if you ask about the wisdom of pre-filtering (and how you should do it) you'll get many different answers, and searching the archives of the mailing list will bring up a number of discussion threads on it. My personal feeling is that you don't need to do it with RMA (or, for that matter, GCRMA), but I get asked this question often enough that I tell people who insist on P/A-type filtering that using the MAS5 calls with RMA (or GCRMA) is OK if you must.

Best,
Ben

On Mon, 2007-06-11 at 16:01 -0400, Grant Izmirlian wrote:
> I preprocessed using the 'threestep' function in the
> gcrma library, using the following settings
>
> normalize.method = "quantile.robust"
> summary.method = "median.polish"
> background.method = "GCRMA"
>
> My question is this. Someone suggested that their biostatistician
> usually preprocesses via RMA and then merges MAS-5 present/absent
> calls into the resulting dataframe, which are used to omit genes with MAS-5
> absent calls from any further analysis.
>
> My feeling is that MAS-5.0 is inferior on the three steps mentioned above,
> and if present/absent calls are based upon inferior techniques they should not
> be used. I also believe that people are moving away from what I view as
> a hidden level of filtering. It is my belief that the best way to do
> filtering is once at the stage of the analysis.
>
> Am I right in thinking that this is a bad idea.
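The P/A-type filtering discussed in this reply amounts to dropping probesets from one table (RMA or GCRMA expression values) based on a second table (MAS5 detection calls) that shares the same row names. The following is only a minimal sketch of that logic, written in plain Python rather than R so that it runs without any array data. The expression values and calls are toy numbers, and the `min_present` threshold is an assumption: the thread does not say how many arrays must be called Present for a probeset to be kept.

```python
# Toy data, invented for illustration only: expression values and
# detection calls for three probesets across three arrays.
expression = {
    "1007_s_at": [8.1, 7.9, 8.3],
    "1053_at": [4.2, 4.0, 5.1],
    "117_at": [3.3, 3.6, 3.1],
}
calls = {
    "1007_s_at": ["P", "P", "P"],
    "1053_at": ["A", "A", "P"],
    "117_at": ["A", "M", "A"],
}


def filter_by_present_calls(expression, calls, min_present=1):
    """Keep probesets called Present ("P") on at least `min_present` arrays.

    `min_present` is a hypothetical tuning parameter, not something the
    thread prescribes.
    """
    kept = {}
    for probeset, values in expression.items():
        n_present = sum(1 for call in calls[probeset] if call == "P")
        if n_present >= min_present:
            kept[probeset] = values
    return kept


filtered = filter_by_present_calls(expression, calls, min_present=2)
print(sorted(filtered))
```

The point to notice is that the calls only decide which rows survive; the expression values themselves still come from RMA or GCRMA, which is why the two can be combined even if one dislikes the MAS5 expression measure.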

16.9 years ago • Ben Bolstad ★ 1.2k

Dear list,

This might be a question that has been discussed previously, but I could not find any good solution for it. I have lists of human proteins from various proteomics studies that I want to compare with regard to the GO terms associated with them. I have the RefSeq GI protein ID for each protein, and my question is how best to map those to other identifiers that I can use in a subsequent GO analysis.

It might be that this problem is best solved outside R, but maybe someone can still give me a hint towards the best solution. For me this is a problem that comes up quite often (the need to map between different identifiers) and I have not yet found any really good solution to it. If, for example, I use IPI, I always lose some proteins/genes since the coverage is rather bad, but maybe there is no solution that will give a perfect mapping?!

Any input will be greatly appreciated!

Thank you!

Best regards,
Lina Hultin-Rosenberg
Karolinska Biomics Center
Karolinska Institute

16.9 years ago • Lina Hultin-Rosenberg ▴ 80

Lina Hultin-Rosenberg wrote:
> I have the RefSeq GI protein id for the proteins and my
> questions is how I best map those to other identifiers that I can use in
> subsequent GO analysis?
>
> It might be that this problem is solved best outside R but maybe someone
> still can give me a hint to the best solution. For me this is a problem that
> comes up quite often - the need to map between different identifiers - and I
> have not yet find any really good solution to it. If I for example use IPI I
> always loose some proteins/genes since the coverage is rather bad, but maybe
> there is no solution that will give perfect mapping?!

The file located here:

ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz

and described in detail here:

ftp://ftp.ncbi.nih.gov/gene/DATA/README

maps RefSeq to Entrez Gene ID. Once you have the Entrez Gene ID, you can use the Bioconductor annotation packages to get GO mappings. The file above is a tab-delimited text file, so you should be able to read it into R and do the matching by GI number rather easily.

Hope that helps.

Sean
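Since the question allows for a solution outside R, the matching step Sean describes can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not a tested pipeline: the column positions (taxonomy ID in column 0, Gene ID in column 1, protein GI in column 6) are assumptions that should be checked against the README linked above, and the sample rows are made-up placeholders rather than real records.

```python
import csv
import io

# Toy stand-in for a few lines of gene2refseq. Every value below is made up,
# and the column layout is an assumption to verify against the README.
SAMPLE = (
    "#tax_id\tGeneID\tstatus\tRNA_acc\tRNA_gi\tprotein_acc\tprotein_gi\n"
    "9606\t1001\tREVIEWED\tNM_X1\t11\tNP_X1\t111\n"
    "9606\t1002\tREVIEWED\tNM_X2\t22\tNP_X2\t222\n"
    "10090\t2001\tREVIEWED\tNM_X3\t33\tNP_X3\t333\n"
    "9606\t1003\tREVIEWED\tNM_X4\t44\t-\t-\n"
)


def protein_gi_to_gene_id(handle, tax_id="9606", gene_col=1, gi_col=6):
    """Build a {protein GI: Gene ID} dict for one organism.

    `gene_col` and `gi_col` are hypothetical defaults; 9606 is the NCBI
    taxonomy ID for human.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        if row[0] != tax_id or len(row) <= gi_col or row[gi_col] == "-":
            continue  # other organism, short row, or no protein GI
        mapping[row[gi_col]] = row[gene_col]
    return mapping


mapping = protein_gi_to_gene_id(io.StringIO(SAMPLE))

my_gis = ["111", "222", "999"]  # hypothetical GI numbers from a protein list
mapped = {gi: mapping[gi] for gi in my_gis if gi in mapping}
unmapped = [gi for gi in my_gis if gi not in mapping]
print(mapped)
print(unmapped)
```

For the real download, the `io.StringIO(SAMPLE)` handle would be replaced by something like `gzip.open(path, "rt")`. Keeping the `unmapped` list makes the coverage loss mentioned in the question visible instead of silent.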

16.9 years ago • Sean Davis 21k
