inSilicoMerging and Limma
1
0
Entering edit mode
Ed Siefker ▴ 230
@ed-siefker-5136
Last seen 6 months ago
United States
I am trying to compare find differentially expressed genes between appendix and colon tumor samples, which have been arrayed on different platforms. Namely hgu133a2 and hgu133plus2. hgu133a2 is a subset of hgu133plus2, and Bioconductor provides a package, inSilicoMerging, that's supposed to do this, so I thought it would be straight forward. First read in my CEL files and normalize them: > targets_hsu <- readTargets("Hsu-targets.txt") > targets_kai <- readTargets("kaiser-targets.txt") > ab_hsu <- ReadAffy(filenames=targets_hsu$FileName) > ab_kai <- ReadAffy(filenames=targets_kai$FileName) > eset_hsu <- gcrma(ab_hsu) > eset_kai <- gcrma(ab_kai) So far so good. Now I merge the esets with inSilicoMerging: library(inSilicoMerging) > eset <- merge(list(eset_hsu,eset_kai),method="COMBAT") INSILICOMERGING: Run COMBAT... INSILICOMERGING: => Found 2 batches INSILICOMERGING: => Found 0 covariate(s) > dim(eset_hsu) Features Samples 22277 26 > dim(eset_kai) Features Samples 54675 10 > dim(eset) Features Samples 22277 36 This looks like it worked. I used plotMDS(), and the data are nicely intermixed as one would hope. Now I need to do DE analysis with Limma. Hsu (the first 26 samples) are experimental and Kai (the last 10 samples) are control. So I create a design matrix like this: > design <- cbind(CTL=1, EXPvsCTL=c(rep(1,26),rep(0,10))) > fit <- lmFit(eset, design) > fit <- eBayes(fit) > tt<-topTable(fit, coef="EXPvsCTL",number=100000) > head(tt,n=3) logFC AveExpr t P.Value adj.P.Val B 1007_s_at 1.487227e-15 7.045851 4.223110e-15 1 1 -6.235399 1053_at 7.281216e-16 5.498793 1.127582e-14 1 1 -6.235399 117_at -8.262373e-16 5.047563 -2.068570e-15 1 1 -6.235399 > tail(tt,n=3) logFC AveExpr t P.Value adj.P.Val AFFX-TrpnX-3_at -3.953675e-16 2.257998 -1.507504e-13 1 1 AFFX-TrpnX-5_at 1.024821e-16 2.257637 4.062598e-14 1 1 AFFX-TrpnX-M_at 1.024821e-16 2.257637 4.062598e-14 1 1 B AFFX-TrpnX-3_at -6.235399 AFFX-TrpnX-5_at -6.235399 AFFX-TrpnX-M_at -6.235399 As you can see, the p values and B statistics are the same for every probe. Clearly something is wrong here. Did I do something wrong? Is this sort of thing expected when you merge datasets like this? Any nudges in the right direction would be appreciated. -Ed [[alternative HTML version deleted]]
0
Entering edit mode
Ed Siefker ▴ 230
@ed-siefker-5136
Last seen 6 months ago
United States
I think the problem is with lmFit(). The merged eset definitely contain different data for each samples, since the MDS plot shows data points scattered around the origin. But after feeding this eset to lmFit(), here's what I get: ***************************************************** > fit An object of class "MArrayLM" $coefficients CTL EXPvsCTL 1007_s_at 7.045851 1.487227e-15 1053_at 5.498793 7.281216e-16 117_at 5.047563 -8.262373e-16 121_at 4.350444 4.131187e-16 1255_g_at 2.257968 -2.780886e-16 22272 more rows ...$rank [1] 2 $assign NULL$qr $qr CTL EXPvsCTL [1,] -6.0000000 -4.33333333 [2,] 0.1666667 -2.68741925 [3,] 0.1666667 0.08859624 [4,] 0.1666667 0.08859624 [5,] 0.1666667 0.08859624 31 more rows ...$qraux [1] 1.166667 1.088596 $pivot [1] 1 2$tol [1] 1e-07 $rank [1] 2$df.residual [1] 34 34 34 34 34 22272 more elements ... $sigma 1007_s_at 1053_at 117_at 121_at 1255_g_at 0.954188915 0.174833541 1.082247491 0.346694394 0.001783684 22272 more elements ...$cov.coefficients CTL EXPvsCTL CTL 0.1 -0.1000000 EXPvsCTL -0.1 0.1384615 $stdev.unscaled CTL EXPvsCTL 1007_s_at 0.3162278 0.3721042 1053_at 0.3162278 0.3721042 117_at 0.3162278 0.3721042 121_at 0.3162278 0.3721042 1255_g_at 0.3162278 0.3721042 22272 more rows ...$pivot [1] 1 2 $Amean 1007_s_at 1053_at 117_at 121_at 1255_g_at 7.045851 5.498793 5.047563 4.350444 2.257968 22272 more elements ...$method [1] "ls" $design CTL EXPvsCTL [1,] 1 1 [2,] 1 1 [3,] 1 1 [4,] 1 1 [5,] 1 1 31 more rows ... ***************************************************** For some reason fit$stdev.unscaled gives the same values for every probe. On Fri, Jun 7, 2013 at 12:52 PM, Ed Siefker <ebs15242@gmail.com> wrote: > I am trying to compare find differentially expressed genes between > appendix and colon tumor samples, which have been arrayed > on different platforms. Namely hgu133a2 and hgu133plus2. > hgu133a2 is a subset of hgu133plus2, and Bioconductor provides > a package, inSilicoMerging, that's supposed to do this, so I > thought it would be straight forward. > > First read in my CEL files and normalize them: > > > targets_hsu <- readTargets("Hsu-targets.txt") > > targets_kai <- readTargets("kaiser-targets.txt") > > ab_hsu <- ReadAffy(filenames=targets_hsu$FileName) > > ab_kai <- ReadAffy(filenames=targets_kai$FileName) > > eset_hsu <- gcrma(ab_hsu) > > eset_kai <- gcrma(ab_kai) > > So far so good. Now I merge the esets with inSilicoMerging: > > library(inSilicoMerging) > > eset <- merge(list(eset_hsu,eset_kai),method="COMBAT") > INSILICOMERGING: Run COMBAT... > INSILICOMERGING: => Found 2 batches > INSILICOMERGING: => Found 0 covariate(s) > > dim(eset_hsu) > Features Samples > 22277 26 > > dim(eset_kai) > Features Samples > 54675 10 > > dim(eset) > Features Samples > 22277 36 > > This looks like it worked. I used plotMDS(), and the data are > nicely intermixed as one would hope. Now I need to do DE > analysis with Limma. Hsu (the first 26 samples) are experimental > and Kai (the last 10 samples) are control. So I create a design > matrix like this: > > > > design <- cbind(CTL=1, EXPvsCTL=c(rep(1,26),rep(0,10))) > > fit <- lmFit(eset, design) > > fit <- eBayes(fit) > > tt<-topTable(fit, coef="EXPvsCTL",number=100000) > > head(tt,n=3) > logFC AveExpr t P.Value adj.P.Val B > 1007_s_at 1.487227e-15 7.045851 4.223110e-15 1 1 -6.235399 > 1053_at 7.281216e-16 5.498793 1.127582e-14 1 1 -6.235399 > 117_at -8.262373e-16 5.047563 -2.068570e-15 1 1 -6.235399 > > tail(tt,n=3) > logFC AveExpr t P.Value adj.P.Val > AFFX-TrpnX-3_at -3.953675e-16 2.257998 -1.507504e-13 1 1 > AFFX-TrpnX-5_at 1.024821e-16 2.257637 4.062598e-14 1 1 > AFFX-TrpnX-M_at 1.024821e-16 2.257637 4.062598e-14 1 1 > B > AFFX-TrpnX-3_at -6.235399 > AFFX-TrpnX-5_at -6.235399 > AFFX-TrpnX-M_at -6.235399 > > > As you can see, the p values and B statistics are the same for every > probe. Clearly > something is wrong here. Did I do something wrong? Is this sort of thing > expected > when you merge datasets like this? Any nudges in the right direction > would be > appreciated. > -Ed > [[alternative HTML version deleted]]
0
Entering edit mode
Dear Ed, This behaviour is indeed expected when you merge datasets as you did. The two datasets are completely confounded with your contrast of interest: Hsu corresponds to the EXP condition and Kai to the CTL condition. Using ComBat to merge the two datasets, you actually took care that the mean expression of each gene is equal between the two studies. This explains the log fold changes that are nearly zero in your top table. Because of the confounding, there is no sensible way of merging these datasets. best, Perry --- Perry Moerland, PhD Room J1B-215 Bioinformatics Laboratory, Department of Clinical Epidemiology, Biostatistics and Bioinformatics Academic Medical Centre, University of Amsterdam Postbus 22660, 1100 DD Amsterdam, The Netherlands tel: +31 20 5666945 p.d.moerland at amc.uva.nl, http://www.bioinformaticslaboratory.nl/ -----Original Message----- From: bioconductor-bounces@r-project.org [mailto:bioconductor- bounces@r-project.org] On Behalf Of Ed Siefker Sent: Friday, June 07, 2013 8:52 PM To: bioconductor Subject: Re: [BioC] inSilicoMerging and Limma I think the problem is with lmFit(). The merged eset definitely contain different data for each samples, since the MDS plot shows data points scattered around the origin. But after feeding this eset to lmFit(), here's what I get: ***************************************************** > fit An object of class "MArrayLM" $coefficients CTL EXPvsCTL 1007_s_at 7.045851 1.487227e-15 1053_at 5.498793 7.281216e-16 117_at 5.047563 -8.262373e-16 121_at 4.350444 4.131187e-16 1255_g_at 2.257968 -2.780886e-16 22272 more rows ...$rank [1] 2 $assign NULL$qr $qr CTL EXPvsCTL [1,] -6.0000000 -4.33333333 [2,] 0.1666667 -2.68741925 [3,] 0.1666667 0.08859624 [4,] 0.1666667 0.08859624 [5,] 0.1666667 0.08859624 31 more rows ...$qraux [1] 1.166667 1.088596 $pivot [1] 1 2$tol [1] 1e-07 $rank [1] 2$df.residual [1] 34 34 34 34 34 22272 more elements ... $sigma 1007_s_at 1053_at 117_at 121_at 1255_g_at 0.954188915 0.174833541 1.082247491 0.346694394 0.001783684 22272 more elements ...$cov.coefficients CTL EXPvsCTL CTL 0.1 -0.1000000 EXPvsCTL -0.1 0.1384615 $stdev.unscaled CTL EXPvsCTL 1007_s_at 0.3162278 0.3721042 1053_at 0.3162278 0.3721042 117_at 0.3162278 0.3721042 121_at 0.3162278 0.3721042 1255_g_at 0.3162278 0.3721042 22272 more rows ...$pivot [1] 1 2 $Amean 1007_s_at 1053_at 117_at 121_at 1255_g_at 7.045851 5.498793 5.047563 4.350444 2.257968 22272 more elements ...$method [1] "ls" $design CTL EXPvsCTL [1,] 1 1 [2,] 1 1 [3,] 1 1 [4,] 1 1 [5,] 1 1 31 more rows ... ***************************************************** For some reason fit$stdev.unscaled gives the same values for every probe. On Fri, Jun 7, 2013 at 12:52 PM, Ed Siefker <ebs15242 at="" gmail.com=""> wrote: > I am trying to compare find differentially expressed genes between > appendix and colon tumor samples, which have been arrayed on different > platforms. Namely hgu133a2 and hgu133plus2. > hgu133a2 is a subset of hgu133plus2, and Bioconductor provides a > package, inSilicoMerging, that's supposed to do this, so I thought it > would be straight forward. > > First read in my CEL files and normalize them: > > > targets_hsu <- readTargets("Hsu-targets.txt") targets_kai <- > > readTargets("kaiser-targets.txt") ab_hsu <- > > ReadAffy(filenames=targets_hsu$FileName) > > ab_kai <- ReadAffy(filenames=targets_kai$FileName) > > eset_hsu <- gcrma(ab_hsu) > > eset_kai <- gcrma(ab_kai) > > So far so good. Now I merge the esets with inSilicoMerging: > > library(inSilicoMerging) > > eset <- merge(list(eset_hsu,eset_kai),method="COMBAT") > INSILICOMERGING: Run COMBAT... > INSILICOMERGING: => Found 2 batches > INSILICOMERGING: => Found 0 covariate(s) > > dim(eset_hsu) > Features Samples > 22277 26 > > dim(eset_kai) > Features Samples > 54675 10 > > dim(eset) > Features Samples > 22277 36 > > This looks like it worked. I used plotMDS(), and the data are > nicely intermixed as one would hope. Now I need to do DE > analysis with Limma. Hsu (the first 26 samples) are experimental > and Kai (the last 10 samples) are control. So I create a design > matrix like this: > > > > design <- cbind(CTL=1, EXPvsCTL=c(rep(1,26),rep(0,10))) fit <- > > lmFit(eset, design) fit <- eBayes(fit) tt<-topTable(fit, > > coef="EXPvsCTL",number=100000) > > head(tt,n=3) > logFC AveExpr t P.Value adj.P.Val B > 1007_s_at 1.487227e-15 7.045851 4.223110e-15 1 1 -6.235399 > 1053_at 7.281216e-16 5.498793 1.127582e-14 1 1 -6.235399 > 117_at -8.262373e-16 5.047563 -2.068570e-15 1 1 -6.235399 > > tail(tt,n=3) > logFC AveExpr t P.Value adj.P.Val > AFFX-TrpnX-3_at -3.953675e-16 2.257998 -1.507504e-13 1 1 > AFFX-TrpnX-5_at 1.024821e-16 2.257637 4.062598e-14 1 1 > AFFX-TrpnX-M_at 1.024821e-16 2.257637 4.062598e-14 1 1 > B > AFFX-TrpnX-3_at -6.235399 > AFFX-TrpnX-5_at -6.235399 > AFFX-TrpnX-M_at -6.235399 > > > As you can see, the p values and B statistics are the same for every > probe. Clearly something is wrong here. Did I do something wrong? > Is this sort of thing expected when you merge datasets like this? Any > nudges in the right direction would be appreciated. > -Ed > [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ________________________________ AMC Disclaimer : http://www.amc.nl/disclaimer