1. Comparing chip information in a meta-analysis with RankProd and 2. two-color normalization

Pekka Kohonen
Sweden

Hi Stefanie,

You could map the Affymetrix identifiers to single Entrez or Ensembl identifiers using the "custom CDFs" from BrainArray. You can do the normalization with, for instance, the "simpleaffy" package. If the Agilent/Illumina chips have duplicate probes for some genes, you can simply take the median of their fold-change values and use those in the RankProd package. It is best to have just one identifier per gene per array, although having more than one is not strictly forbidden.

Custom CDF manuscript: http://www.ncbi.nlm.nih.gov/pubmed/?term=16284200

Another package you could try is RankAggreg, although I have not used it myself: http://www.biomedcentral.com/1471-2105/10/62

Generally, rank-based analysis can produce statistically significant results with very small effect sizes (fold changes), so you should also filter the results by fold change to some extent.

Best,
Pekka

2014-04-30 11:36 GMT+02:00 Stefanie Busch <stefanie.busch2 at web.de>:
> Hello,
>
> I have two questions and I hope you can help me.
>
> I want to compare several studies with a similar design but different arrays. The first step was to quantile normalize all data, which works well except for the two-color experiment with an Agilent chip. I read the limma User's Guide and found out that I must preprocess with the function normalizeBetweenArrays. This gives me M- and A-values, and my question is: which one represents the expression values for this experiment?
>
> To compare the results of the different studies I want to use the package RankProd. For a better comparison between the studies I used Entrez IDs, and I downloaded the latest chip information directly from Affymetrix and Illumina. This revealed a new problem. For example, on the Affymetrix Mouse Genome 430 2.0 Array the ID 1449880_s_at stands for three gene names and Entrez IDs: Bglap /// Bglap2 /// Bglap3 - 12095 /// 12096 /// 12097. On the Illumina chip each gene has its own array ID:
>
> Bglap-rs1 - ILMN_1233122 - 12095
> Bglap1 - ILMN_2610166 - 12096
> Bglap2 - ILMN_2944508 - 12097
>
> So I don't know what I should do to compare the results of these two experiments. When I paste the expression values of 1449880_s_at three times with the three different Entrez IDs, the ranking calculated with the RankProd package changes. Example:
>
> Chip ID       Entrez ID  Control 1  Control 2  etc.
> 1449880_s_at  12095      3.855      4.211      ...
> 1449880_s_at  12096      3.855      4.211      ...
> 1449880_s_at  12097      3.855      4.211      ...
>
> The other possibility is to combine the three expression values of the Illumina chip into one value, but I don't know if that is the right way. What is the better way?
>
> Kind regards
> Stefanie Busch
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
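Pekka's two suggestions, collapsing duplicate probes to one value per gene with the median of their fold changes and then using fold change as an extra filter on rank-based results, can be sketched as follows. This is a minimal Python illustration, not R/Bioconductor code; the probe names, the probe-to-gene mapping, the fold-change values and the threshold of 1 (a two-fold change on the log2 scale) are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def collapse_by_gene(probe_log2fc, probe_to_gene):
    """Collapse probe-level log2 fold changes to one value per gene.

    Probes without a gene mapping are dropped; a gene measured by
    several probes gets the median of those probes' fold changes.
    """
    per_gene = defaultdict(list)
    for probe, fc in probe_log2fc.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            per_gene[gene].append(fc)
    return {gene: median(fcs) for gene, fcs in per_gene.items()}


def filter_by_fold_change(ranked_genes, gene_log2fc, min_abs_log2fc=1.0):
    """Keep genes (in ranked order) whose absolute log2 fold change reaches the threshold."""
    return [g for g in ranked_genes if abs(gene_log2fc[g]) >= min_abs_log2fc]


# Hypothetical data: three probes for Entrez ID 12096, one probe for 12097.
probe_log2fc = {"probe_a": 0.9, "probe_b": 1.3, "probe_c": 1.1, "probe_d": -0.2}
probe_to_gene = {"probe_a": "12096", "probe_b": "12096",
                 "probe_c": "12096", "probe_d": "12097"}

gene_log2fc = collapse_by_gene(probe_log2fc, probe_to_gene)
print(gene_log2fc)  # {'12096': 1.1, '12097': -0.2}

# Suppose a rank-based method called both genes significant:
print(filter_by_fold_change(["12097", "12096"], gene_log2fc))  # ['12096']
```

The median is less sensitive than the mean to a single probe that behaves differently from the others, which is one reason to prefer it for this kind of collapsing.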

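The first quoted question, which of the M and A values to use for the Agilent two-color data, is not answered in this thread. As background, limma defines M as the log2 ratio of the red to the green channel and A as their average log2 intensity, so M is the per-array relative-expression (log fold change) value, while A describes overall spot brightness. The Python sketch below only illustrates these two definitions and their inverse on made-up intensities; it does not reproduce limma's normalization.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from background-corrected red/green intensities.

    M = log2(R) - log2(G)          (log ratio, i.e. relative expression)
    A = (log2(R) + log2(G)) / 2    (average log2 intensity)
    """
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, (log_r + log_g) / 2


def back_to_channels(m, a):
    """Recover (log2 R, log2 G) from M and A; the inverse of ma_values."""
    return a + m / 2, a - m / 2


# Made-up intensities for a single spot.
m, a = ma_values(red=1024.0, green=256.0)
print(m, a)                    # 2.0 9.0
print(back_to_channels(m, a))  # (10.0, 8.0)
```

Because M and A can be converted back into the two log2 channel intensities, the choice between them mostly depends on whether the other studies in the comparison contribute log ratios or single-channel log intensities.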

Stefanie Busch

Hi Pekka,

I have now read about the custom CDFs, and they sound very good. But I still have a few questions.

The first problem is installing the custom CDF package. I downloaded the package from BrainArray and tried to install it with the command

    install.packages("C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz", repos=NULL, type="source")

and I get the following error message:

    Installing package into ‘C:/Users/Julie/Documents/R/win-library/3.0’
    (as ‘lib’ is unspecified)
    * installing *source* package 'CustomCDF' ...
    ** libs
    *** arch - i386
    ERROR: compilation failed for package 'CustomCDF'
    * removing 'C:/Users/Julie/Documents/R/win-library/3.0/CustomCDF'
    Warning messages:
    1: running command '"C:/PROGRA~1/R/R-30~1.2/bin/x64/R" CMD INSTALL -l "C:\Users\Julie\Documents\R\win-library\3.0" "C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz"' had status 1
    2: In install.packages("C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz",  :
      installation of package ‘C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz’ had non-zero exit status

So I tried to download the chip information directly. I took the CDF file version 18 for the Affymetrix Mouse Genome 430 2.0 Array (Mouse4302 [1]). As I mentioned, I want to use Entrez IDs, so I took ENTREZG (mouse4302mmentrezgcdf).
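The gene-level (ENTREZG) CDF is attractive here because each of its probesets corresponds to a single Entrez ID, which avoids the 1449880_s_at → 12095 /// 12096 /// 12097 situation from the first message. When staying with the standard Affymetrix annotation instead, one crude alternative is to split the "///"-separated Entrez field and keep only probesets annotated to exactly one gene. The Python sketch below shows that filtering step only; it is a simple stand-in, not how the BrainArray CDFs are built (they regroup individual probes into new probesets based on updated annotation), and the second probeset and its Entrez ID are hypothetical placeholders.

```python
def parse_entrez_field(field):
    """Split an Affymetrix-style multi-gene annotation such as '12095 /// 12096'."""
    return [part.strip() for part in field.split("///") if part.strip()]


def keep_uniquely_mapped(annotation):
    """Separate probesets annotated to exactly one Entrez ID from the rest.

    annotation: dict probeset ID -> Entrez ID field (possibly '///'-separated).
    Returns (kept, dropped): kept maps probeset -> its single Entrez ID,
    dropped lists probesets with zero or several Entrez IDs.
    """
    kept, dropped = {}, []
    for probeset, field in annotation.items():
        ids = parse_entrez_field(field)
        if len(ids) == 1:
            kept[probeset] = ids[0]
        else:
            dropped.append(probeset)
    return kept, dropped


annotation = {
    "1449880_s_at": "12095 /// 12096 /// 12097",  # from the first message
    "hypothetical_probeset_at": "hypothetical_entrez_id",  # made-up single-gene entry
}
kept, dropped = keep_uniquely_mapped(annotation)
print(kept)     # {'hypothetical_probeset_at': 'hypothetical_entrez_id'}
print(dropped)  # ['1449880_s_at']
```

Dropping multi-gene probesets discards their signal entirely; that is the trade-off against repeating the row once per Entrez ID, which, as the first message notes, changes the RankProd ranking.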
The installation of this package works very well, but I'm puzzled to see that there are only 17607 genes/affyids:

    data <- ReadAffy(verbose=TRUE, filenames=cels, cdfname="mouse4302mmentrezgcdf")
    > data
    AffyBatch object
    size of arrays=1002x1002 features (47 kb)
    cdf=mouse4302mmentrezgcdf (17607 affyids)
    number of samples=96
    number of genes=17607
    annotation=mouse4302mmentrezgcdf
    notes=

When I don't specify a CDF file, I get more affyids:

    data2 <- ReadAffy(verbose=TRUE, filenames=cels)
    > data2
    AffyBatch object
    size of arrays=1002x1002 features (47 kb)
    cdf=Mouse430_2 (45101 affyids)
    number of samples=96
    number of genes=45101
    annotation=mouse4302
    notes=

When I use the new CDF file, isn't information lost?

2. I have a question about the median. The median of what? Until now I have done the following (the control and diet columns are replicates of the same group):

    Gene   Con1  Con2  Con3  diet1  diet2  diet3
    Bglap  2.5   3.2   3.1   3.9    4.8    3.1
    Bglap  1.0   0.7   0.9   1.2    0.7    1.0
    Bglap  4.9   3.3   4.1   4.8    5.5    5.2

Mean values:

    Gene   Con1  Con2  Con3  diet1  diet2  diet3
    Bglap  2.8   2.4   2.7   3.3    3.66   3.1

For these values I calculated the p-value with the Wilcoxon test, and then I want to compare the results of the different experiments with RankProd. So I put all values into a big Excel table and loaded it into R. The table looks like this:

           Experiment 1                       Experiment 2
    Gene   con1 con2 con3 diet1 diet2 diet3   con1 con2 con3 con4 con5 diet1 diet2 diet3 diet4 diet5
    Bglap  2.8  2.4  2.7  3.3   3.66  3.1     5.1  6.6  6.2  6.6  6.3  5.9   6.5   6.4   5.7   6.9
    Copd   5.4  7.2  5.8  4.3   5.0   4.9     3.0  2.7  4.0  3.5  4.2  4.3   3.5   3.9   2.5   3.1
    Sirt1  7.0  6.5  7.2  7.3   7.1   6.7     4.5  3.7  4.2  4.6  4.1  4.2   4.5   4.8   4.5   3.9
    ...

    cl <- c(1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
    origin <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)

Do I need to load different values into RankProd?

Kind regards
Stefanie

Sent: Friday, 02 May 2014, 14:26
From: "Pekka Kohonen" <pkpekka at gmail.com>
To: "Stefanie Busch" <stefanie.busch2 at web.de>
Cc: Bioconductor <bioconductor at r-project.org>
Subject: Re: [BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization

References
1. http://www.affymetrix.com/support/technical/byproduct.affx?product=moe430-20
2. http://www.ncbi.nlm.nih.gov/pubmed/?term=16284200
3. http://www.biomedcentral.com/1471-2105/10/62
4. https://stat.ethz.ch/mailman/listinfo/bioconductor
5. http://news.gmane.org/gmane.science.biology.informatics.conductor
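On the question "Median of what?": one reading of Pekka's advice is that, for each sample, the several probes of one gene are combined into a single value, just as Stefanie did with the mean, only using the median instead (Pekka phrases it in terms of fold changes rather than expression values, but the collapsing step is the same). The Python sketch below applies both summaries to the three Bglap rows from the first table above; the per-sample means match her table up to rounding.

```python
from statistics import mean, median

samples = ["Con1", "Con2", "Con3", "diet1", "diet2", "diet3"]

# The three Bglap probe rows from the table above (one value per sample).
bglap_probes = [
    [2.5, 3.2, 3.1, 3.9, 4.8, 3.1],
    [1.0, 0.7, 0.9, 1.2, 0.7, 1.0],
    [4.9, 3.3, 4.1, 4.8, 5.5, 5.2],
]


def summarize_per_sample(probe_rows, summary):
    """Combine several probe rows of one gene into one value per sample."""
    return [summary(column) for column in zip(*probe_rows)]


per_sample_mean = summarize_per_sample(bglap_probes, mean)
per_sample_median = summarize_per_sample(bglap_probes, median)

print([round(v, 2) for v in per_sample_mean])  # [2.8, 2.4, 2.7, 3.3, 3.67, 3.1]
print(per_sample_median)                       # [2.5, 3.2, 3.1, 3.9, 4.8, 3.1]
```

The low-intensity second probe pulls every mean down, while the median is unaffected by it; that robustness to one deviating probe is the usual argument for the median.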

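For the RankProd step, the idea behind a rank product with several origins can be sketched as below: within each study, every diet sample is compared with every control sample, genes are ranked by the log-scale difference in each comparison (rank 1 = largest increase), and a gene's rank product is the geometric mean of its ranks over all comparisons from all studies. This is a simplified Python illustration of the rank product statistic, not RankProd's actual code, and it computes no p-values or false discovery rates. The class coding (1 = control, 2 = diet) follows the `cl` vector in the message; RankProd's documentation describes two-class labels as 0 and 1, so that coding is worth checking before running the real analysis.

```python
import math
from itertools import product


def ranks_descending(values):
    """Rank values so that the largest gets rank 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions start..end -> ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def rank_product_up(data, cl, origin, control=1, case=2):
    """Rank product for up-regulation in `case` versus `control`.

    data:   dict gene -> list of log-scale values, one per sample.
    cl:     class label per sample (here 1 = control, 2 = diet, as in the message).
    origin: study label per sample; samples are only compared within a study.
    Returns dict gene -> geometric mean of its ranks over all comparisons.
    """
    genes = list(data)
    log_rank_sums = {g: 0.0 for g in genes}
    n_comparisons = 0
    for study in sorted(set(origin)):
        columns = [i for i, o in enumerate(origin) if o == study]
        controls = [i for i in columns if cl[i] == control]
        cases = [i for i in columns if cl[i] == case]
        # Every case sample is compared with every control sample of the same study.
        for c, d in product(controls, cases):
            differences = [data[g][d] - data[g][c] for g in genes]
            for g, r in zip(genes, ranks_descending(differences)):
                log_rank_sums[g] += math.log(r)
            n_comparisons += 1
    return {g: math.exp(s / n_comparisons) for g, s in log_rank_sums.items()}


# Values from the combined table in the message (decimal commas written as points).
table = {
    "Bglap": [2.8, 2.4, 2.7, 3.3, 3.66, 3.1, 5.1, 6.6, 6.2, 6.6, 6.3, 5.9, 6.5, 6.4, 5.7, 6.9],
    "Copd": [5.4, 7.2, 5.8, 4.3, 5.0, 4.9, 3.0, 2.7, 4.0, 3.5, 4.2, 4.3, 3.5, 3.9, 2.5, 3.1],
    "Sirt1": [7.0, 6.5, 7.2, 7.3, 7.1, 6.7, 4.5, 3.7, 4.2, 4.6, 4.1, 4.2, 4.5, 4.8, 4.5, 3.9],
}
cl = [1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
origin = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

# A smaller rank product means more consistently up-regulated in the diet groups.
for gene, rp in sorted(rank_product_up(table, cl, origin).items(), key=lambda kv: kv[1]):
    print(gene, round(rp, 2))
```

With only three genes the ranks can only be 1 to 3, so the output merely illustrates the mechanics; a real analysis would use the full gene list, where the geometric mean of ranks becomes informative.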