1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization
@gordon-smyth
WEHI, Melbourne, Australia
Dear Stefanie,

> Date: 30 April 2014
> From: Pekka Kohonen <pkpekka at gmail.com>
> From: Stefanie Busch <stefanie.busch2 at web.de>
> To: Bioconductor <bioconductor at r-project.org>
> Subject: Re: [BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization
>
> Hello,
>
> I have two questions and I hope you can help me.
>
> I want to compare several studies with similar design but different
> arrays. The first step was to quantile normalize all data which works
> well beside the two color experiment with an Agilent chip.

As you seem to have realized already, quantile normalization is not usually appropriate for a two colour Agilent array. Loess normalization is generally used for two colour arrays, and I recommend a normexp background correction step before that.

> I read the limma User Guide and found out that I must preprocess with the
> function normalizeBetweenArrays. So I get M- and A-values and my
> question is which one shows the expression values for this experiment?

Two colour arrays don't return expression values. Instead they return log-ratios, which are stored in M. When you compare Agilent to Affymetrix chips and Illumina BeadArrays, you need to compare log-fold-changes and DE results, not expression values.

> For comparing the results of the different studies I want to use the
> package: RankProd.

As far as I know, RankProd assesses differential expression and does not in itself help you compare one study to another. The usual methods to compare one study to another are (i) to make a scatterplot of logFC from the two experiments or (ii) to use a gene set test such as roast() in the limma package. The limma package can compute logFC for whatever comparison you are making.

> For a better comparison between the studies I used
> the Entrez IDs and I downloaded the latest chip information directly from
> Affymetrix and Illumina. So this reveals a new problem. For example on
> the chip Affymetrix Mouse Genome 430 2.0 Array the ID 1449880_s_at
> stands for three gene names and Entrez IDs: Bglap /// Bglap2 /// Bglap3 -
> 12095 /// 12096 /// 12097. On the Illumina chip each gene has a single
> array ID:
> Bglap-rs1 - ILMN_1233122 - 12095
> Bglap1 - ILMN_2610166 - 12096
> Bglap2 - ILMN_2944508 - 12097
>
> So I don't know what I should do to compare the results of these two
> experiments. When I paste the expression values of 1449880_s_at three
> times with the three different Entrez IDs, the ranking calculated
> by the RankProd package changes.
> Example:
> Chip ID Entrez-Id Control1 control 2 etc.
> 1449880_s_at - 12095 - 3,855 - 4,211 ...
> 1449880_s_at - 12096 - 3,855 - 4,211 ...
> 1449880_s_at - 12097 - 3,855 - 4,211 ...
>
> The other possibility is to combine the three expression values of the
> Illumina chip into one value. But I don't know if this is the right way.
> What is the better way?

For this purpose, I always recommend that, for each Entrez ID, you use the probe on each platform with the highest overall expression level. The rationale is that you are using the probe that represents the dominant transcript for that gene in the cell type. This method has been used for many published studies by now, the first of which may have been:

http://www.biomedcentral.com/1471-2105/7/511

For example, you can proceed like this for the Agilent data, assuming you have put the Entrez IDs into the object:

   MA <- normalizeWithinArrays(RG, method="loess")  # loess is a within-array method
   A <- rowMeans(MA$A)                               # average log-intensity per probe
   o <- order(A, decreasing=TRUE)                    # most highly expressed probes first
   MA2 <- MA[o,]
   d <- duplicated(MA2$genes$EntrezID)               # computed on the sorted object
   MA2 <- MA2[!d,]

Now you have a data object with a unique probe for each Entrez ID.

Simply averaging the probes or probe-sets is not generally recommended, because different probes for the same gene can have quite different behaviour. A common situation is that one probe successfully probes an expressed transcript while another probe is essentially unexpressed.
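The same selection can be applied to a plain matrix of log-expression values, such as the output of RMA or lumi. What follows is a minimal base-R sketch under stated assumptions, not a limma function: the helper name keep_top_probe, the probe names, the Entrez IDs and all values are invented for illustration.

```r
# Toy log2 expression matrix: rows are probes, columns are samples.
# Probe names, Entrez IDs and values are hypothetical.
expr <- rbind(
  probeA = c(2.5, 3.2, 3.1, 3.9),
  probeB = c(1.0, 0.7, 0.9, 1.2),
  probeC = c(4.9, 3.3, 4.1, 4.8),
  probeD = c(6.0, 6.2, 5.9, 6.1)
)
entrez <- c(probeA = "111", probeB = "111", probeC = "111", probeD = "222")

# Hypothetical helper: keep, for each Entrez ID, the probe with the
# highest mean expression across all samples.
keep_top_probe <- function(expr, entrez) {
  avg <- rowMeans(expr, na.rm = TRUE)
  o <- order(avg, decreasing = TRUE)      # most highly expressed probes first
  expr_sorted <- expr[o, , drop = FALSE]
  id_sorted <- unname(entrez[o])
  # duplicated() is evaluated on the sorted IDs, so the first (highest)
  # occurrence of each ID is the one retained; probes without an ID are dropped
  keep <- !duplicated(id_sorted) & !is.na(id_sorted)
  out <- expr_sorted[keep, , drop = FALSE]
  rownames(out) <- id_sorted[keep]
  out
}

top <- keep_top_probe(expr, entrez)
print(top)
```

The key detail is that the duplicate check runs on the IDs after sorting. Running it on the unsorted IDs would keep whichever probe happens to come first in the annotation rather than the most highly expressed one.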
Best wishes
Gordon

> Kind regards
> Stefanie Busch

@stefanie-busch-6530

Dear Gordon,

Thank you for your answer. I still have a few questions.

1.
> Two colour arrays don't return expression values. Instead they return log-ratios, which are stored in M. When you compare Agilent to Affymetrix Chips and Illumina Beadarrays, you need to compare log-fold-changes and DE results, not expression values.

What does "DE results" mean? And what should I do with the Affymetrix chips or the Illumina BeadArray? I preprocessed the Affymetrix chips with rma, which already applies a log transformation? The Illumina array was background corrected, then log transformed and finally quantile normalized with the lumi package.

2.
> > For comparing the results of the different studies I want to use the
> > package: RankProd.
> As far as I know, RankProd assesses differential expression and does not in itself help you compare one study to another. The usual methods to compare one study to another are (i) to make a scatterplot of logFC from the two experiments or (ii) to use a gene set test such as roast() in the limma package. The limma package can compute logFC for whatever comparison you are making.

I don't want to compare the studies directly. I want to take the results of all experiments and get a list of genes that are up- or downregulated across all studies. I think RankProd was a good choice. For this I made a big Excel table which looks like this. I have seven different experiments, so it is possible that Bglap is not measured on every chip. RankProd will ignore the missing values.
          Experiment1                           Experiment2
          con1  con2  con3  diet1 diet2 diet3   con1  con2  con3  con4  con5  diet1 diet2 diet3 diet4 diet5
Bglap     2,8   2,4   2,7   3,3   3,66  3,1     5,1   6,6   6,2   6,6   6,3   5,9   6,5   6,4   5,7   6,9
Copd      5,4   7,2   5,8   4,3   5     4,9     3     2,7   4     3,5   4,2   4,3   3,5   3,9   2,5   3,1
Sirt1     7     6,5   7,2   7,3   7,1   6,7     4,5   3,7   4,2   4,6   4,1   4,2   4,5   4,8   4,5   3,9
...
cl<-      1     1     1     2     2     2       1     1     1     1     1     2     2     2     2     2
origin<-  1     1     1     1     1     1       2     2     2     2     2     2     2     2     2     2
--> this means two different experiments

My aim is to have a list of up- and downregulated genes for intervention a (7 experiments, intervention a vs. control) and a list of up- and downregulated genes for intervention b (3 experiments, intervention b vs. control), to see if there are genes which are up- or downregulated by both interventions.

3.
> For this purpose, I always recommend that, for each Entrez ID, you use the probe on each platform with the highest overall expression level.

Example (these are replicates of the same group):

          control1 control2 control3 diet1 diet2 diet3
Bglap     2,5      3,2      3,1      3,9   4,8   3,1
Bglap     1        0,7      0,9      1,2   0,7   1
Bglap     4,9      3,3      4,1      4,8   5,5   5,2

So I will only take the last row? Is there an R command to filter for these rows in Affy or Illumina data?

4.
> The rationale of this is that you are using the probe that represents the dominant transcript for that gene in the cell type. This method has been used for many published studies by now, the first of which may have been:
>
> http://www.biomedcentral.com/1471-2105/7/511
>
> For example, you can proceed like this for the Agilent data, assuming you have put the EntrezIDs into the object:
>
>    MA <- normalizeWithinArrays(RG, method="loess")
>    A <- rowMeans(MA$A)
>    o <- order(A, decreasing=TRUE)
>    MA2 <- MA[o,]
>    d <- duplicated(MA2$genes$EntrezID)
>    MA2 <- MA2[!d,]
>
> Now you have a data object with a unique probe for each EntrezID.

This command doesn't work with my example, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23523. When I finished all the steps, there was no value left in my list.
I think the problem could be that MA$EntrezID is missing.

Kind regards
Stefanie

Sent: Sunday, 4 May 2014 at 06:50
From: "Gordon K Smyth" <smyth at wehi.edu.au>
To: "Stefanie Busch" <stefanie.busch2 at web.de>
Cc: "Bioconductor mailing list" <bioconductor at r-project.org>
Subject: 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization
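The logFC comparison between two studies that is suggested in the first reply could be sketched roughly as follows. This is only an illustration in base R: the data frames study1 and study2, their Entrez IDs and all fold changes are invented, and in practice the logFC column would come from a separate differential expression analysis of each study.

```r
# Hypothetical per-study results: one row per Entrez ID with the log2
# fold change (diet vs control) estimated separately within each study.
study1 <- data.frame(EntrezID = c("101", "102", "103", "104"),
                     logFC    = c(0.8, -1.2, 0.1, 1.5),
                     stringsAsFactors = FALSE)
study2 <- data.frame(EntrezID = c("102", "103", "104", "105"),
                     logFC    = c(-0.9, -0.2, 1.1, 0.4),
                     stringsAsFactors = FALSE)

# Keep only the genes that were measured in both studies
both <- merge(study1, study2, by = "EntrezID", suffixes = c(".s1", ".s2"))

# Do the two studies agree on the direction of change, and how strongly
# are the fold changes correlated?
both$same_direction <- sign(both$logFC.s1) == sign(both$logFC.s2)
r <- cor(both$logFC.s1, both$logFC.s2)
print(both)
print(r)

# Scatterplot of logFC from the two studies (only drawn in an interactive session)
if (interactive()) {
  plot(both$logFC.s1, both$logFC.s2,
       xlab = "logFC, study 1", ylab = "logFC, study 2")
  abline(h = 0, v = 0, lty = 2)
}
```

Because only log fold changes are compared, the studies do not need to be on the same platform or share an expression scale; they only need a common gene identifier such as the Entrez ID.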