Comparing Two Affymetrix Arrays Question
3
0
Entering edit mode
@abboud-ramzi-4716
Last seen 10.3 years ago
Hello All, I am somewhat new to bioconductor, and am trying to accomplish what I believe is a simple task. I have 15 AffyMetrix gene signatures from 15 subjects, each in the form of a .cel file. The subjects are grouped into 3 groups of 5 - let's say Group A, Group B, and Wild Type. I would like to compare the average gene expression of subjects in Group A to those in Group B, and separately compare the average gene expression in Group A to Wild Type. In the results I would like the genes most significantly different in expression levels and the p-values for each gene comparison. I have code which I believe does this, but the results do not seem totally correct, and I would like some help. I have two very similar R scripts (to keep things separate and simple). One compares Group A to Group B. The other compares Group A to Wild Type. The results from comparing Group A to Wild Type look correct. However, the results from comparing Group A to Group B give an adjusted p value of 0.999992827573282 for every single gene. Here are the top two lines from the output file (columns are ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) : 42970 1458675_at -0.402322454 4.770279273 -4.522575441 0.000900789 0.999992828 -3.268563539 23121 1438815_at 0.437319401 7.866319701 4.013307606 0.002098968 0.999992828 -3.417357338 Obviously something is not right. All the other numbers from the Group A vs Group B comparison look reasonable, but this adjusted p value is making me doubt the whole thing. Does someone see a glaring and obvious mistake in my code (which is included below)? Is there a better or simpler way to do comparison? Please let me know if I can provide any additional information. I would be happy to provide the excel Thank you, Ramzi The following code compares Group A with Group B. It is my R Script, with notes. ## Load Packages library(affy) # Affymetrix pre-processing library(limma) # two-color pre-processing; differential expression library(Biobase) ## Read targets file. pd <- read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names= 1,as.is=TRUE) ## Read .CEL data. rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) ## Normalize the data eset <- rma(rawAffyData) ## The target file information can be recovered from the eset. targets <- pData(eset) ## Define a design matrix. #designMatrix <- createDesignMatrix(eset) designMatrix <- model.matrix(~ -1 + factor(targets$Target,levels=unique(targets$Target))) colnames(designMatrix) <- unique(targets$Target) numParameters <- ncol(designMatrix) parameterNames <- colnames(designMatrix) ## Define a contrasts matrix. #contrastsMatrix <- createContrastMatrix(eset, design=designMatrix contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_")) contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix)) rownames(contrastsMatrix) <- parameterNames colnames(contrastsMatrix) <- contrastNames ## Fit to linear model. fit <- lmFit(eset,design=designMatrix) ## Empirical Bayes statistics fitNoContrastsMatrix <- eBayes(fit) ## Fit a linear model for the contrasts. fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) ## Empirical Bayes statistics fit2 <- eBayes(fit2) numGenes <- nrow(eset) completeTable_A_vs_B <- topTable(fit2,number=numGenes) write.table(completeTable_A_vs_B ,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA)
• 1.6k views
ADD COMMENT
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 10 days ago
United States
Hi Ramzi, There's nothing "wrong" with your results - there is simply not enough statistical evidence for differential expression between Group A and Group B. Is there a reason you are not following the limmaUserGuide() section 8.6, which goes through how to fit one model on 3 groups and pull out the pairwise comparisons you want? You'll get more power to detect differences between Groups A and B by fitting one model with all 3 groups instead of fitting separate models. Best, Jenny At 03:30 PM 6/23/2011, Abboud, Ramzi wrote: >Hello All, > >I am somewhat new to bioconductor, and am trying to accomplish what >I believe is a simple task. I have 15 AffyMetrix gene signatures >from 15 subjects, each in the form of a .cel file. The subjects are >grouped into 3 groups of 5 - let's say Group A, Group B, and Wild >Type. I would like to compare the average gene expression of >subjects in Group A to those in Group B, and separately compare the >average gene expression in Group A to Wild Type. In the results I >would like the genes most significantly different in expression >levels and the p-values for each gene comparison. > >I have code which I believe does this, but the results do not seem >totally correct, and I would like some help. > >I have two very similar R scripts (to keep things separate and >simple). One compares Group A to Group B. The other compares Group >A to Wild Type. The results from comparing Group A to Wild Type >look correct. However, the results from comparing Group A to Group >B give an adjusted p value of 0.999992827573282 for every single >gene. Here are the top two lines from the output file (columns are >ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) : > >42970 1458675_at -0.402322454 4.770279273 -4.522575441 > 0.000900789 0.999992828 -3.268563539 >23121 1438815_at 0.437319401 7.866319701 4.013307606 > 0.002098968 0.999992828 -3.417357338 > >Obviously something is not right. All the other numbers from the >Group A vs Group B comparison look reasonable, but this adjusted p >value is making me doubt the whole thing. > >Does someone see a glaring and obvious mistake in my code (which is >included below)? Is there a better or simpler way to do comparison? > >Please let me know if I can provide any additional information. I >would be happy to provide the excel > >Thank you, >Ramzi > >The following code compares Group A with Group B. It is my R >Script, with notes. > >## Load Packages >library(affy) # Affymetrix pre-processing >library(limma) # two-color pre-processing; differential expression >library(Biobase) > >## Read targets file. >pd <- >read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,as. is=TRUE) > >## Read .CEL data. >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) > >## Normalize the data >eset <- rma(rawAffyData) > >## The target file information can be recovered from the eset. >targets <- pData(eset) > >## Define a design matrix. >#designMatrix <- createDesignMatrix(eset) >designMatrix <- model.matrix(~ -1 + >factor(targets$Target,levels=unique(targets$Target))) >colnames(designMatrix) <- unique(targets$Target) >numParameters <- ncol(designMatrix) >parameterNames <- colnames(designMatrix) > >## Define a contrasts matrix. >#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_")) >contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix)) >rownames(contrastsMatrix) <- parameterNames >colnames(contrastsMatrix) <- contrastNames > >## Fit to linear model. >fit <- lmFit(eset,design=designMatrix) > >## Empirical Bayes statistics >fitNoContrastsMatrix <- eBayes(fit) > >## Fit a linear model for the contrasts. >fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) > >## Empirical Bayes statistics >fit2 <- eBayes(fit2) > >numGenes <- nrow(eset) >completeTable_A_vs_B <- topTable(fit2,number=numGenes) >write.table(completeTable_A_vs_B >,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA) > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Jenny et al., Thanks again for the help so far. I did some reading about FDR and it makes perfect sense. If you are considering 45,000 genes it is much more likely you will see a difference between the two data sets, and this needs to be corrected for. Following your advice, I reverted my code back to fitting one model on all 3 groups in order to get more power. I then pull out the pairwise comparisons that I want and write them out to files. The comparisons again are Group A vs Group B, Group A vs Wild Type, and Group B vs Wild Type. Below is my code, does it look correct? There are three main things I'd like to confirm are correct. 1 - The statistics calls. 2 - The contrasts matrix. 3 - Pulling out the pairwise comparisons at the end. The adjusted p-value is still the same (.999_) in this 3 group model. So is the final conclusion that Group A and Group B are too similar to show significant difference with this sample size / study power? And if I consider the unadjusted p-value, is the main concern a high false positive rate? Thank you, Ramzi My Code: ## Load Packages library(affy) # Affymetrix pre-processing library(limma) # two-color pre-processing; differential expression library(Biobase) ## Read targets file. pd <- read.AnnotatedDataFrame("Targets.txt",header=TRUE,row.names=1,as .is=TRUE) ## Read .CEL data. rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) ## Normalize the data eset <- rma(rawAffyData) ## The target file information can be recovered from the eset. targets <- pData(eset) ## Define a design matrix. designMatrix <- model.matrix(~ -1 + factor(targets$Target,levels=unique(targets$Target))) colnames(designMatrix) <- unique(targets$Target) numParameters <- ncol(designMatrix) parameterNames <- colnames(designMatrix) ## Define a contrasts matrix. ## Three comparisons: ## 1. Col vs Exp ## 2. Col vs WT ## 3. Exp vs WT contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"), paste(parameterNames[1],parameterNames[3],sep="_vs_"), paste(parameterNames[2],parameterNames[3],sep="_vs_")) contrastsMatrix <- matrix(c(1,-1,0,1,0,-1,0,1,-1),nrow=ncol(designMatrix)) rownames(contrastsMatrix) <- parameterNames colnames(contrastsMatrix) <- contrastNames # This is my contrasts matrix. Collapsed represents Group A, Expanding represents Group B, and WildType is. . . Wild Type. contrastsMatrix Collapsed_vs_Expanding Collapsed_vs_WildType Expanding_vs_WildType Collapsed 1 1 0 Expanding -1 0 1 WildType 0 -1 -1 ## Fit to linear model. fit <- lmFit(eset,design=designMatrix) ## Fit a linear model for the contrasts. fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) ## Empirical Bayes statistics fit2 <- eBayes(fit2) numGenes <- nrow(eset) completeTable_Col_vs_Exp <- topTable(fit2,coef="Collapsed_vs_Expanding",number=numGenes) write.table(completeTable_Col_vs_Exp,file="Col_vs_Exp_genes.xls",sep=" \t",quote=FALSE,col.names=NA) completeTable_Col_vs_WT <- topTable(fit2,coef="Collapsed_vs_WildType",number=numGenes) write.table(completeTable_Col_vs_WT,file="Col_vs_WT_genes.xls",sep="\t ",quote=FALSE,col.names=NA) completeTable_Exp_vs_WT <- topTable(fit2,coef="Expanding_vs_WildType",number=numGenes) write.table(completeTable_Exp_vs_WT,file="Exp_vs_WT_genes.xls",sep="\t ",quote=FALSE,col.names=NA) ________________________________________ From: Jenny Drnevich [drnevich@illinois.edu] Sent: Thursday, June 23, 2011 4:38 PM To: Abboud, Ramzi; bioconductor at r-project.org Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question Hi Ramzi, There's nothing "wrong" with your results - there is simply not enough statistical evidence for differential expression between Group A and Group B. Is there a reason you are not following the limmaUserGuide() section 8.6, which goes through how to fit one model on 3 groups and pull out the pairwise comparisons you want? You'll get more power to detect differences between Groups A and B by fitting one model with all 3 groups instead of fitting separate models. Best, Jenny At 03:30 PM 6/23/2011, Abboud, Ramzi wrote: >Hello All, > >I am somewhat new to bioconductor, and am trying to accomplish what >I believe is a simple task. I have 15 AffyMetrix gene signatures >from 15 subjects, each in the form of a .cel file. The subjects are >grouped into 3 groups of 5 - let's say Group A, Group B, and Wild >Type. I would like to compare the average gene expression of >subjects in Group A to those in Group B, and separately compare the >average gene expression in Group A to Wild Type. In the results I >would like the genes most significantly different in expression >levels and the p-values for each gene comparison. > >I have code which I believe does this, but the results do not seem >totally correct, and I would like some help. > >I have two very similar R scripts (to keep things separate and >simple). One compares Group A to Group B. The other compares Group >A to Wild Type. The results from comparing Group A to Wild Type >look correct. However, the results from comparing Group A to Group >B give an adjusted p value of 0.999992827573282 for every single >gene. Here are the top two lines from the output file (columns are >ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) : > >42970 1458675_at -0.402322454 4.770279273 -4.522575441 > 0.000900789 0.999992828 -3.268563539 >23121 1438815_at 0.437319401 7.866319701 4.013307606 > 0.002098968 0.999992828 -3.417357338 > >Obviously something is not right. All the other numbers from the >Group A vs Group B comparison look reasonable, but this adjusted p >value is making me doubt the whole thing. > >Does someone see a glaring and obvious mistake in my code (which is >included below)? Is there a better or simpler way to do comparison? > >Please let me know if I can provide any additional information. I >would be happy to provide the excel > >Thank you, >Ramzi > >The following code compares Group A with Group B. It is my R >Script, with notes. > >## Load Packages >library(affy) # Affymetrix pre-processing >library(limma) # two-color pre-processing; differential expression >library(Biobase) > >## Read targets file. >pd <- >read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,as. is=TRUE) > >## Read .CEL data. >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) > >## Normalize the data >eset <- rma(rawAffyData) > >## The target file information can be recovered from the eset. >targets <- pData(eset) > >## Define a design matrix. >#designMatrix <- createDesignMatrix(eset) >designMatrix <- model.matrix(~ -1 + >factor(targets$Target,levels=unique(targets$Target))) >colnames(designMatrix) <- unique(targets$Target) >numParameters <- ncol(designMatrix) >parameterNames <- colnames(designMatrix) > >## Define a contrasts matrix. >#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_")) >contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix)) >rownames(contrastsMatrix) <- parameterNames >colnames(contrastsMatrix) <- contrastNames > >## Fit to linear model. >fit <- lmFit(eset,design=designMatrix) > >## Empirical Bayes statistics >fitNoContrastsMatrix <- eBayes(fit) > >## Fit a linear model for the contrasts. >fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) > >## Empirical Bayes statistics >fit2 <- eBayes(fit2) > >numGenes <- nrow(eset) >completeTable_A_vs_B <- topTable(fit2,number=numGenes) >write.table(completeTable_A_vs_B >,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA) > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 10 days ago
United States
Hi Ramzi, Please keep all replies on the BioC list. Yes, it possible to have every single gene have the same adjusted p-value. You should look into how the False Discovery Rate correction works to see why this is the case. In short, it's because 0.999992828 was the lowest p-value of all genes after the first adjustment, but it was at the end of the ranked list, so all the genes above it also got this p-value. Jenny At 03:46 PM 6/23/2011, Abboud, Ramzi wrote: >Jenny, > >Thank you so much for the fast reply. I had used all 3 groups >originally, but simplified it because I was confused about this >p-value thing. So you think it is possible that the adjusted >p-value for every single gene in the data set would have the same >value? Is that 0.999992828 value simply the highest that is >returned by the function? > >I will revert to the 3 group model and repost once I have it working. > >Thanks, >Ramzi >________________________________________ >From: Jenny Drnevich [drnevich at illinois.edu] >Sent: Thursday, June 23, 2011 4:38 PM >To: Abboud, Ramzi; bioconductor at r-project.org >Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question > >Hi Ramzi, > >There's nothing "wrong" with your results - there is simply not >enough statistical evidence for differential expression between Group >A and Group B. Is there a reason you are not following the >limmaUserGuide() section 8.6, which goes through how to fit one model >on 3 groups and pull out the pairwise comparisons you want? You'll >get more power to detect differences between Groups A and B by >fitting one model with all 3 groups instead of fitting separate models. > >Best, >Jenny > >At 03:30 PM 6/23/2011, Abboud, Ramzi wrote: > >Hello All, > > > >I am somewhat new to bioconductor, and am trying to accomplish what > >I believe is a simple task. I have 15 AffyMetrix gene signatures > >from 15 subjects, each in the form of a .cel file. The subjects are > >grouped into 3 groups of 5 - let's say Group A, Group B, and Wild > >Type. I would like to compare the average gene expression of > >subjects in Group A to those in Group B, and separately compare the > >average gene expression in Group A to Wild Type. In the results I > >would like the genes most significantly different in expression > >levels and the p-values for each gene comparison. > > > >I have code which I believe does this, but the results do not seem > >totally correct, and I would like some help. > > > >I have two very similar R scripts (to keep things separate and > >simple). One compares Group A to Group B. The other compares Group > >A to Wild Type. The results from comparing Group A to Wild Type > >look correct. However, the results from comparing Group A to Group > >B give an adjusted p value of 0.999992827573282 for every single > >gene. Here are the top two lines from the output file (columns are > >ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) : > > > >42970 1458675_at -0.402322454 4.770279273 -4.522575441 > > 0.000900789 0.999992828 -3.268563539 > >23121 1438815_at 0.437319401 7.866319701 4.013307606 > > 0.002098968 0.999992828 -3.417357338 > > > >Obviously something is not right. All the other numbers from the > >Group A vs Group B comparison look reasonable, but this adjusted p > >value is making me doubt the whole thing. > > > >Does someone see a glaring and obvious mistake in my code (which is > >included below)? Is there a better or simpler way to do comparison? > > > >Please let me know if I can provide any additional information. I > >would be happy to provide the excel > > > >Thank you, > >Ramzi > > > >The following code compares Group A with Group B. It is my R > >Script, with notes. > > > >## Load Packages > >library(affy) # Affymetrix pre-processing > >library(limma) # two-color pre-processing; differential expression > >library(Biobase) > > > >## Read targets file. > >pd <- > >read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,a > s.is=TRUE) > > > >## Read .CEL data. > >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) > > > >## Normalize the data > >eset <- rma(rawAffyData) > > > >## The target file information can be recovered from the eset. > >targets <- pData(eset) > > > >## Define a design matrix. > >#designMatrix <- createDesignMatrix(eset) > >designMatrix <- model.matrix(~ -1 + > >factor(targets$Target,levels=unique(targets$Target))) > >colnames(designMatrix) <- unique(targets$Target) > >numParameters <- ncol(designMatrix) > >parameterNames <- colnames(designMatrix) > > > >## Define a contrasts matrix. > >#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix > >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_")) > >contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix)) > >rownames(contrastsMatrix) <- parameterNames > >colnames(contrastsMatrix) <- contrastNames > > > >## Fit to linear model. > >fit <- lmFit(eset,design=designMatrix) > > > >## Empirical Bayes statistics > >fitNoContrastsMatrix <- eBayes(fit) > > > >## Fit a linear model for the contrasts. > >fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) > > > >## Empirical Bayes statistics > >fit2 <- eBayes(fit2) > > > >numGenes <- nrow(eset) > >completeTable_A_vs_B <- topTable(fit2,number=numGenes) > >write.table(completeTable_A_vs_B > >,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA) > > > >_______________________________________________ > >Bioconductor mailing list > >Bioconductor at r-project.org > >https://stat.ethz.ch/mailman/listinfo/bioconductor > >Search the archives: > >http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 10 days ago
United States
Hi Ramzi, As far as I can tell, your code looks correct. So yes, group A and group B are too similar to show significant differences after correction for multiple testing. That's not the same as saying there are no genes that are differentially expressed between the two, just that you can't pick them out of the expected false positives. You could also see this visually by doing some sort of hierarchical clustering and/or PCA clustering on your samples - Wild type sample would probably cluster together, but the Group A and Group B would be intermingled. You can still rank your genes in the A vs. B comparison and looks among the top genes to look for interesting candidates - you could do qPCR on additional samples for these potential candidates to confirm them. You could also do something like GSEA that uses the entire ranked list of genes to see if any pathways or GO terms are over-represented at the top of the list; if there were truly no significant differences, then the placement of genes should be random in your lists. HTH, Jenny At 08:44 AM 6/24/2011, Abboud, Ramzi wrote: >Jenny et al., > >Thanks again for the help so far. > >I did some reading about FDR and it makes perfect sense. If you are >considering 45,000 genes it is much more likely you will see a >difference between the two data sets, and this needs to be corrected for. > >Following your advice, I reverted my code back to fitting one model >on all 3 groups in order to get more power. I then pull out the >pairwise comparisons that I want and write them out to files. The >comparisons again are Group A vs Group B, Group A vs Wild Type, and >Group B vs Wild Type. Below is my code, does it look correct? There >are three main things I'd like to confirm are correct. 1 - The >statistics calls. 2 - The contrasts matrix. 3 - Pulling out the >pairwise comparisons at the end. > >The adjusted p-value is still the same (.999_) in this 3 group >model. So is the final conclusion that Group A and Group B are too >similar to show significant difference with this sample size / study >power? And if I consider the unadjusted p-value, is the main >concern a high false positive rate? > >Thank you, >Ramzi > >My Code: > >## Load Packages >library(affy) # Affymetrix pre-processing >library(limma) # two-color pre-processing; differential expression >library(Biobase) > >## Read targets file. >pd <- >read.AnnotatedDataFrame("Targets.txt",header=TRUE,row.names=1,as.is=T RUE) > >## Read .CEL data. >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) > >## Normalize the data >eset <- rma(rawAffyData) > >## The target file information can be recovered from the eset. >targets <- pData(eset) > >## Define a design matrix. >designMatrix <- model.matrix(~ -1 + >factor(targets$Target,levels=unique(targets$Target))) >colnames(designMatrix) <- unique(targets$Target) >numParameters <- ncol(designMatrix) >parameterNames <- colnames(designMatrix) > >## Define a contrasts matrix. >## Three comparisons: >## 1. Col vs Exp >## 2. Col vs WT >## 3. Exp vs WT >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"), > paste(parameterNames[1],parameterNames[3],sep="_vs_"), > paste(parameterNames[2],parameterNames[3],sep="_vs_")) >contrastsMatrix <- matrix(c(1,-1,0,1,0,-1,0,1,-1),nrow=ncol(designMatrix)) >rownames(contrastsMatrix) <- parameterNames >colnames(contrastsMatrix) <- contrastNames > ># This is my contrasts matrix. Collapsed represents Group A, >Expanding represents Group B, and WildType is. . . Wild Type. >contrastsMatrix > Collapsed_vs_Expanding Collapsed_vs_WildType Expanding_vs_WildType >Collapsed 1 1 0 >Expanding -1 0 1 >WildType 0 -1 -1 > > >## Fit to linear model. >fit <- lmFit(eset,design=designMatrix) > >## Fit a linear model for the contrasts. >fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) > >## Empirical Bayes statistics >fit2 <- eBayes(fit2) > >numGenes <- nrow(eset) >completeTable_Col_vs_Exp <- >topTable(fit2,coef="Collapsed_vs_Expanding",number=numGenes) >write.table(completeTable_Col_vs_Exp,file="Col_vs_Exp_genes.xls",sep= "\t",quote=FALSE,col.names=NA) >completeTable_Col_vs_WT <- >topTable(fit2,coef="Collapsed_vs_WildType",number=numGenes) >write.table(completeTable_Col_vs_WT,file="Col_vs_WT_genes.xls",sep="\ t",quote=FALSE,col.names=NA) >completeTable_Exp_vs_WT <- >topTable(fit2,coef="Expanding_vs_WildType",number=numGenes) >write.table(completeTable_Exp_vs_WT,file="Exp_vs_WT_genes.xls",sep="\ t",quote=FALSE,col.names=NA) > > >________________________________________ >From: Jenny Drnevich [drnevich at illinois.edu] >Sent: Thursday, June 23, 2011 4:38 PM >To: Abboud, Ramzi; bioconductor at r-project.org >Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question > >Hi Ramzi, > >There's nothing "wrong" with your results - there is simply not >enough statistical evidence for differential expression between Group >A and Group B. Is there a reason you are not following the >limmaUserGuide() section 8.6, which goes through how to fit one model >on 3 groups and pull out the pairwise comparisons you want? You'll >get more power to detect differences between Groups A and B by >fitting one model with all 3 groups instead of fitting separate models. > >Best, >Jenny > >At 03:30 PM 6/23/2011, Abboud, Ramzi wrote: > >Hello All, > > > >I am somewhat new to bioconductor, and am trying to accomplish what > >I believe is a simple task. I have 15 AffyMetrix gene signatures > >from 15 subjects, each in the form of a .cel file. The subjects are > >grouped into 3 groups of 5 - let's say Group A, Group B, and Wild > >Type. I would like to compare the average gene expression of > >subjects in Group A to those in Group B, and separately compare the > >average gene expression in Group A to Wild Type. In the results I > >would like the genes most significantly different in expression > >levels and the p-values for each gene comparison. > > > >I have code which I believe does this, but the results do not seem > >totally correct, and I would like some help. > > > >I have two very similar R scripts (to keep things separate and > >simple). One compares Group A to Group B. The other compares Group > >A to Wild Type. The results from comparing Group A to Wild Type > >look correct. However, the results from comparing Group A to Group > >B give an adjusted p value of 0.999992827573282 for every single > >gene. Here are the top two lines from the output file (columns are > >ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) : > > > >42970 1458675_at -0.402322454 4.770279273 -4.522575441 > > 0.000900789 0.999992828 -3.268563539 > >23121 1438815_at 0.437319401 7.866319701 4.013307606 > > 0.002098968 0.999992828 -3.417357338 > > > >Obviously something is not right. All the other numbers from the > >Group A vs Group B comparison look reasonable, but this adjusted p > >value is making me doubt the whole thing. > > > >Does someone see a glaring and obvious mistake in my code (which is > >included below)? Is there a better or simpler way to do comparison? > > > >Please let me know if I can provide any additional information. I > >would be happy to provide the excel > > > >Thank you, > >Ramzi > > > >The following code compares Group A with Group B. It is my R > >Script, with notes. > > > >## Load Packages > >library(affy) # Affymetrix pre-processing > >library(limma) # two-color pre-processing; differential expression > >library(Biobase) > > > >## Read targets file. > >pd <- > >read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,a > s.is=TRUE) > > > >## Read .CEL data. > >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) > > > >## Normalize the data > >eset <- rma(rawAffyData) > > > >## The target file information can be recovered from the eset. > >targets <- pData(eset) > > > >## Define a design matrix. > >#designMatrix <- createDesignMatrix(eset) > >designMatrix <- model.matrix(~ -1 + > >factor(targets$Target,levels=unique(targets$Target))) > >colnames(designMatrix) <- unique(targets$Target) > >numParameters <- ncol(designMatrix) > >parameterNames <- colnames(designMatrix) > > > >## Define a contrasts matrix. > >#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix > >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_")) > >contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix)) > >rownames(contrastsMatrix) <- parameterNames > >colnames(contrastsMatrix) <- contrastNames > > > >## Fit to linear model. > >fit <- lmFit(eset,design=designMatrix) > > > >## Empirical Bayes statistics > >fitNoContrastsMatrix <- eBayes(fit) > > > >## Fit a linear model for the contrasts. > >fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) > > > >## Empirical Bayes statistics > >fit2 <- eBayes(fit2) > > > >numGenes <- nrow(eset) > >completeTable_A_vs_B <- topTable(fit2,number=numGenes) > >write.table(completeTable_A_vs_B > >,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA) > > > >_______________________________________________ > >Bioconductor mailing list > >Bioconductor at r-project.org > >https://stat.ethz.ch/mailman/listinfo/bioconductor > >Search the archives: > >http://news.gmane.org/gmane.science.biology.informatics.conductor > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Jenny, Thank you for the reply. I just wanted to post to round things out. I have loaded my data into GSEA and am seeing what I can learn in that environment. It is relatively simple to use, the difficulty is in interpreting the results. Thanks to all for the help. Ramzi ________________________________________ From: Jenny Drnevich [drnevich@illinois.edu] Sent: Wednesday, June 29, 2011 9:37 AM To: Abboud, Ramzi; bioconductor at r-project.org Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question Hi Ramzi, As far as I can tell, your code looks correct. So yes, group A and group B are too similar to show significant differences after correction for multiple testing. That's not the same as saying there are no genes that are differentially expressed between the two, just that you can't pick them out of the expected false positives. You could also see this visually by doing some sort of hierarchical clustering and/or PCA clustering on your samples - Wild type sample would probably cluster together, but the Group A and Group B would be intermingled. You can still rank your genes in the A vs. B comparison and looks among the top genes to look for interesting candidates - you could do qPCR on additional samples for these potential candidates to confirm them. You could also do something like GSEA that uses the entire ranked list of genes to see if any pathways or GO terms are over-represented at the top of the list; if there were truly no significant differences, then the placement of genes should be random in your lists. HTH, Jenny At 08:44 AM 6/24/2011, Abboud, Ramzi wrote: >Jenny et al., > >Thanks again for the help so far. > >I did some reading about FDR and it makes perfect sense. If you are >considering 45,000 genes it is much more likely you will see a >difference between the two data sets, and this needs to be corrected for. > >Following your advice, I reverted my code back to fitting one model >on all 3 groups in order to get more power. I then pull out the >pairwise comparisons that I want and write them out to files. The >comparisons again are Group A vs Group B, Group A vs Wild Type, and >Group B vs Wild Type. Below is my code, does it look correct? There >are three main things I'd like to confirm are correct. 1 - The >statistics calls. 2 - The contrasts matrix. 3 - Pulling out the >pairwise comparisons at the end. > >The adjusted p-value is still the same (.999_) in this 3 group >model. So is the final conclusion that Group A and Group B are too >similar to show significant difference with this sample size / study >power? And if I consider the unadjusted p-value, is the main >concern a high false positive rate? > >Thank you, >Ramzi > >My Code: > >## Load Packages >library(affy) # Affymetrix pre-processing >library(limma) # two-color pre-processing; differential expression >library(Biobase) > >## Read targets file. >pd <- >read.AnnotatedDataFrame("Targets.txt",header=TRUE,row.names=1,as.is=T RUE) > >## Read .CEL data. >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) > >## Normalize the data >eset <- rma(rawAffyData) > >## The target file information can be recovered from the eset. >targets <- pData(eset) > >## Define a design matrix. >designMatrix <- model.matrix(~ -1 + >factor(targets$Target,levels=unique(targets$Target))) >colnames(designMatrix) <- unique(targets$Target) >numParameters <- ncol(designMatrix) >parameterNames <- colnames(designMatrix) > >## Define a contrasts matrix. >## Three comparisons: >## 1. Col vs Exp >## 2. Col vs WT >## 3. Exp vs WT >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"), > paste(parameterNames[1],parameterNames[3],sep="_vs_"), > paste(parameterNames[2],parameterNames[3],sep="_vs_")) >contrastsMatrix <- matrix(c(1,-1,0,1,0,-1,0,1,-1),nrow=ncol(designMatrix)) >rownames(contrastsMatrix) <- parameterNames >colnames(contrastsMatrix) <- contrastNames > ># This is my contrasts matrix. Collapsed represents Group A, >Expanding represents Group B, and WildType is. . . Wild Type. >contrastsMatrix > Collapsed_vs_Expanding Collapsed_vs_WildType Expanding_vs_WildType >Collapsed 1 1 0 >Expanding -1 0 1 >WildType 0 -1 -1 > > >## Fit to linear model. >fit <- lmFit(eset,design=designMatrix) > >## Fit a linear model for the contrasts. >fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) > >## Empirical Bayes statistics >fit2 <- eBayes(fit2) > >numGenes <- nrow(eset) >completeTable_Col_vs_Exp <- >topTable(fit2,coef="Collapsed_vs_Expanding",number=numGenes) >write.table(completeTable_Col_vs_Exp,file="Col_vs_Exp_genes.xls",sep= "\t",quote=FALSE,col.names=NA) >completeTable_Col_vs_WT <- >topTable(fit2,coef="Collapsed_vs_WildType",number=numGenes) >write.table(completeTable_Col_vs_WT,file="Col_vs_WT_genes.xls",sep="\ t",quote=FALSE,col.names=NA) >completeTable_Exp_vs_WT <- >topTable(fit2,coef="Expanding_vs_WildType",number=numGenes) >write.table(completeTable_Exp_vs_WT,file="Exp_vs_WT_genes.xls",sep="\ t",quote=FALSE,col.names=NA) > > >________________________________________ >From: Jenny Drnevich [drnevich at illinois.edu] >Sent: Thursday, June 23, 2011 4:38 PM >To: Abboud, Ramzi; bioconductor at r-project.org >Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question > >Hi Ramzi, > >There's nothing "wrong" with your results - there is simply not >enough statistical evidence for differential expression between Group >A and Group B. Is there a reason you are not following the >limmaUserGuide() section 8.6, which goes through how to fit one model >on 3 groups and pull out the pairwise comparisons you want? You'll >get more power to detect differences between Groups A and B by >fitting one model with all 3 groups instead of fitting separate models. > >Best, >Jenny > >At 03:30 PM 6/23/2011, Abboud, Ramzi wrote: > >Hello All, > > > >I am somewhat new to bioconductor, and am trying to accomplish what > >I believe is a simple task. I have 15 AffyMetrix gene signatures > >from 15 subjects, each in the form of a .cel file. The subjects are > >grouped into 3 groups of 5 - let's say Group A, Group B, and Wild > >Type. I would like to compare the average gene expression of > >subjects in Group A to those in Group B, and separately compare the > >average gene expression in Group A to Wild Type. In the results I > >would like the genes most significantly different in expression > >levels and the p-values for each gene comparison. > > > >I have code which I believe does this, but the results do not seem > >totally correct, and I would like some help. > > > >I have two very similar R scripts (to keep things separate and > >simple). One compares Group A to Group B. The other compares Group > >A to Wild Type. The results from comparing Group A to Wild Type > >look correct. However, the results from comparing Group A to Group > >B give an adjusted p value of 0.999992827573282 for every single > >gene. Here are the top two lines from the output file (columns are > >ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) : > > > >42970 1458675_at -0.402322454 4.770279273 -4.522575441 > > 0.000900789 0.999992828 -3.268563539 > >23121 1438815_at 0.437319401 7.866319701 4.013307606 > > 0.002098968 0.999992828 -3.417357338 > > > >Obviously something is not right. All the other numbers from the > >Group A vs Group B comparison look reasonable, but this adjusted p > >value is making me doubt the whole thing. > > > >Does someone see a glaring and obvious mistake in my code (which is > >included below)? Is there a better or simpler way to do comparison? > > > >Please let me know if I can provide any additional information. I > >would be happy to provide the excel > > > >Thank you, > >Ramzi > > > >The following code compares Group A with Group B. It is my R > >Script, with notes. > > > >## Load Packages > >library(affy) # Affymetrix pre-processing > >library(limma) # two-color pre-processing; differential expression > >library(Biobase) > > > >## Read targets file. > >pd <- > >read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,a > s.is=TRUE) > > > >## Read .CEL data. > >rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd) > > > >## Normalize the data > >eset <- rma(rawAffyData) > > > >## The target file information can be recovered from the eset. > >targets <- pData(eset) > > > >## Define a design matrix. > >#designMatrix <- createDesignMatrix(eset) > >designMatrix <- model.matrix(~ -1 + > >factor(targets$Target,levels=unique(targets$Target))) > >colnames(designMatrix) <- unique(targets$Target) > >numParameters <- ncol(designMatrix) > >parameterNames <- colnames(designMatrix) > > > >## Define a contrasts matrix. > >#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix > >contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_")) > >contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix)) > >rownames(contrastsMatrix) <- parameterNames > >colnames(contrastsMatrix) <- contrastNames > > > >## Fit to linear model. > >fit <- lmFit(eset,design=designMatrix) > > > >## Empirical Bayes statistics > >fitNoContrastsMatrix <- eBayes(fit) > > > >## Fit a linear model for the contrasts. > >fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix) > > > >## Empirical Bayes statistics > >fit2 <- eBayes(fit2) > > > >numGenes <- nrow(eset) > >completeTable_A_vs_B <- topTable(fit2,number=numGenes) > >write.table(completeTable_A_vs_B > >,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA) > > > >_______________________________________________ > >Bioconductor mailing list > >Bioconductor at r-project.org > >https://stat.ethz.ch/mailman/listinfo/bioconductor > >Search the archives: > >http://news.gmane.org/gmane.science.biology.informatics.conductor > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 393 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6