gDNA Cy3, cDNA Cy5
1
0
Entering edit mode
Al Ivens ▴ 270
@al-ivens-1646
Last seen 10.2 years ago
Hi, I have a whole pile of 2-colour arrays done with Cy3 genomic DNA, and Cy5 cDNA. There are 7 "test" samples, each done in triplicate against the genomic DNA (so 21 slides). I have seen a previous posting from Jenny about how to analyse these kinds of hybes, and followed her suggestions. The scans are unprocessed BlueFuse, but bg has already been subtracted. > red_green <- read.maimages(datafiles,source="bluefuse",columns=list(G="AMPCH2",R="A MP CH1", + weights="CONFIDENCE",quality="QUALITY")) Although boxplots of the various slides for the two channels indicate a pretty reasonable degree of similarity within a channel, the two channels as a whole were quite difft: > summary(as.vector(log2(red_green$R))) Min. 1st Qu. Median Mean 3rd Qu. Max. 1.157 7.381 8.700 9.016 10.600 16.420 > summary(as.vector(log2(red_green$G))) Min. 1st Qu. Median Mean 3rd Qu. Max. 1.820 7.666 10.380 10.020 12.470 16.370 Despite this, I proceeded anyway: > ## as Cy3 is gDNA in the ref channel, don't do NWA > NBA <- normalizeBetweenArrays(red_green,method="Gquantile") > > ## get back the R and G values after Gq normalisation > NBA.RG.MA <- RG.MA(NBA) All Cy3 distributions were now identical, as expected, and the Cy5 channel had also been modified: > summaryas.vectorlog2NBA.RG.MA$G))) Min. 1st Qu. Median Mean 3rd Qu. Max. 3.038 7.580 10.400 10.020 12.440 15.600 > summaryas.vectorlog2NBA.RG.MA$R))) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.8795 7.3940 8.7480 9.0160 10.5700 17.4500 > ## create a MAList to manipulate > NBA.fake <- NBA > > ## replace the M values with the log2(R) values > NBA.fake$M <- log2NBA.RG.MA$R) I then did fitting, using a design matrix of the ref="gDNA" kind, with a contrast matrix for the cDNAx vs cDNAy comparisons > NBA.fakefit <- lmFit(NBA.fake$M, design=designMATRIX, method="ls") > NBA.fakecontrastsFit <- contrasts.fit(NBA.fakefit,contrasts) > eBNBA.fakecontrastsFit <- eBayes(NBA.fakecontrastsFit) I then looked at the coefficients of the eBayesed contrast fit object: > for(i in 1:length(colnames(contrasts))) + { + cat("Contrast",i," ",summary(eBNBA.fakecontrastsFit$coeff[,i]),"\n") + } ## individual cDNAs (cDNA1, 2 etc) min 25% median mean 75% max Contrast 1 5.417 8.538 9.79 9.873 10.99 16.23 Contrast 2 4.426 6.867 8.623 8.798 10.46 15.58 Contrast 3 4.623 7.26 8.606 8.951 10.37 15.92 Contrast 4 5.175 7.626 8.853 9.323 10.72 16.52 Contrast 5 3.928 7.175 8.864 9.073 10.71 15.63 Contrast 6 4.167 7.269 8.264 8.848 10.29 16.34 Contrast 7 3.948 6.381 7.956 8.244 9.824 15.58 ## the various comparisons (cDNA2-cDNA1 etc) Contrast 8 -5.773 -1.626 -0.9606 -1.074 -0.3976 1.715 Contrast 9 -5.591 -1.427 -0.8069 -0.9218 -0.338 3.098 Contrast 10 -4.329 -1.008 -0.4586 -0.5496 -0.04331 2.093 Contrast 11 -4.698 -1.427 -0.7426 -0.7994 -0.159 2.631 Contrast 12 -6.275 -1.857 -0.9168 -1.025 -0.05517 5.511 Contrast 13 -6.746 -2.3 -1.536 -1.629 -0.9041 4.385 Contrast 14 -5.327 -0.209 0.1646 0.1526 0.5908 5.481 Contrast 15 -2.688 0.1018 0.4627 0.5249 0.9155 6.033 Contrast 16 -3.462 -0.1604 0.2428 0.275 0.7153 6.032 ....... Many of the medians are well below zero, and of course, the output from topTable is almost exclusively negative logFC. This doesnt make sense biologically. Contrast 8 is cDNA2-cDNA1, with cDNA1 the "RNA reference" biologically speaking (cDNA2-7 are all single gene mutants). "Unfortunately", the median value for the cDNA1 coeff is quite a bit larger than the others, so I think this is skewing most of the contrasts (i.e. I dont think there is a wholesale reduction of transcription in the other mutants). What could/should I do about it? I have contemplated normalizeQuantiles of the red channel after the Gquantile has been done (i.e. before fitting), but not sure whether that is a valid thing to do. Would a single channel analysis be more appropriate here? Thoughts/suggestions greatly appreciated. Cheers, al > sessionInfo() R version 2.5.1 (2007-06-27) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] "tools" "stats" "graphics" "grDevices" "utils" [6] "datasets" "methods" "base" other attached packages: geneplotter annotate Biobase gplots gdata gtools "1.14.0" "1.14.1" "1.14.1" "2.3.2" "2.3.1" "2.3.1" lattice MASS statmod sma limma Hmisc "0.16-2" "7.2-35" "1.3.0" "0.5.15" "2.10.0" "3.4-2" > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Transcription Biobase annotate Transcription Biobase annotate • 1.1k views
ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 12 weeks ago
EMBL European Molecular Biology Laborat…
Dear Al, what you have done looks very reasonable, but of course your outcome doesn't seem to be so. Issuse that one might want to investigate, if you haven't already, are: - array quality - variance of the data at the low end, and the background correction (if you have many very small or even negative raw intensities) - do a single color analysis as if there were no Cy3 - compute log(Cy5/Cy3) within each array, then normalize the log- ratios between arrays, and here I would go successively from least aggressive to most aggressive (scaling, affine linear, a stiff loess-like smoother, quantiles) and only go to most aggressive if the data really ask for it. Best wishes Wolfgang ------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber > Hi, > > I have a whole pile of 2-colour arrays done with Cy3 genomic DNA, and > Cy5 cDNA. There are 7 "test" samples, each done in triplicate against > the genomic DNA (so 21 slides). I have seen a previous posting from > Jenny about how to analyse these kinds of hybes, and followed her > suggestions. > > The scans are unprocessed BlueFuse, but bg has already been subtracted. > >> red_green <- > read.maimages(datafiles,source="bluefuse",columns=list(G="AMPCH2",R= "AMP > CH1", > + weights="CONFIDENCE",quality="QUALITY")) > > Although boxplots of the various slides for the two channels indicate a > pretty reasonable degree of similarity within a channel, the two > channels as a whole were quite difft: > >> summary(as.vector(log2(red_green$R))) > Min. 1st Qu. Median Mean 3rd Qu. Max. > 1.157 7.381 8.700 9.016 10.600 16.420 >> summary(as.vector(log2(red_green$G))) > Min. 1st Qu. Median Mean 3rd Qu. Max. > 1.820 7.666 10.380 10.020 12.470 16.370 > > Despite this, I proceeded anyway: > >> ## as Cy3 is gDNA in the ref channel, don't do NWA >> NBA <- normalizeBetweenArrays(red_green,method="Gquantile") >> >> ## get back the R and G values after Gq normalisation >> NBA.RG.MA <- RG.MA(NBA) > > All Cy3 distributions were now identical, as expected, and the Cy5 > channel had also been modified: > >> summary(as.vectorlog2NBA.RG.MA$G))) > Min. 1st Qu. Median Mean 3rd Qu. Max. > 3.038 7.580 10.400 10.020 12.440 15.600 >> summary(as.vectorlog2NBA.RG.MA$R))) > Min. 1st Qu. Median Mean 3rd Qu. Max. > 0.8795 7.3940 8.7480 9.0160 10.5700 17.4500 > >> ## create a MAList to manipulate >> NBA.fake <- NBA >> >> ## replace the M values with the log2(R) values >> NBA.fake$M <- log2NBA.RG.MA$R) > > I then did fitting, using a design matrix of the ref="gDNA" kind, with a > contrast matrix for the cDNAx vs cDNAy comparisons > >> NBA.fakefit <- lmFit(NBA.fake$M, design=designMATRIX, method="ls") >> NBA.fakecontrastsFit <- contrasts.fit(NBA.fakefit,contrasts) >> eBNBA.fakecontrastsFit <- eBayes(NBA.fakecontrastsFit) > > I then looked at the coefficients of the eBayesed contrast fit object: > >> for(i in 1:length(colnames(contrasts))) > + { > + cat("Contrast",i," ",summary(eBNBA.fakecontrastsFit$coeff[,i]),"\n") > + } > > ## individual cDNAs (cDNA1, 2 etc) > min 25% median mean 75% max > Contrast 1 5.417 8.538 9.79 9.873 10.99 16.23 > Contrast 2 4.426 6.867 8.623 8.798 10.46 15.58 > Contrast 3 4.623 7.26 8.606 8.951 10.37 15.92 > Contrast 4 5.175 7.626 8.853 9.323 10.72 16.52 > Contrast 5 3.928 7.175 8.864 9.073 10.71 15.63 > Contrast 6 4.167 7.269 8.264 8.848 10.29 16.34 > Contrast 7 3.948 6.381 7.956 8.244 9.824 15.58 > ## the various comparisons (cDNA2-cDNA1 etc) > Contrast 8 -5.773 -1.626 -0.9606 -1.074 -0.3976 1.715 > Contrast 9 -5.591 -1.427 -0.8069 -0.9218 -0.338 3.098 > Contrast 10 -4.329 -1.008 -0.4586 -0.5496 -0.04331 2.093 > Contrast 11 -4.698 -1.427 -0.7426 -0.7994 -0.159 2.631 > Contrast 12 -6.275 -1.857 -0.9168 -1.025 -0.05517 5.511 > Contrast 13 -6.746 -2.3 -1.536 -1.629 -0.9041 4.385 > Contrast 14 -5.327 -0.209 0.1646 0.1526 0.5908 5.481 > Contrast 15 -2.688 0.1018 0.4627 0.5249 0.9155 6.033 > Contrast 16 -3.462 -0.1604 0.2428 0.275 0.7153 6.032 > ....... > > Many of the medians are well below zero, and of course, the output from > topTable is almost exclusively negative logFC. This doesnt make sense > biologically. Contrast 8 is cDNA2-cDNA1, with cDNA1 the "RNA reference" > biologically speaking (cDNA2-7 are all single gene mutants). > "Unfortunately", the median value for the cDNA1 coeff is quite a bit > larger than the others, so I think this is skewing most of the contrasts > (i.e. I dont think there is a wholesale reduction of transcription in > the other mutants). What could/should I do about it? I have > contemplated normalizeQuantiles of the red channel after the Gquantile > has been done (i.e. before fitting), but not sure whether that is a > valid thing to do. Would a single channel analysis be more appropriate > here? > > Thoughts/suggestions greatly appreciated. > > Cheers, > > al > > >> sessionInfo() > R version 2.5.1 (2007-06-27) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United > Kingdom.1252;LC_MONETARY=English_United > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] "tools" "stats" "graphics" "grDevices" "utils" > [6] "datasets" "methods" "base" > > other attached packages: > geneplotter annotate Biobase gplots gdata gtools > "1.14.0" "1.14.1" "1.14.1" "2.3.2" "2.3.1" "2.3.1" > lattice MASS statmod sma limma Hmisc > "0.16-2" "7.2-35" "1.3.0" "0.5.15" "2.10.0" "3.4-2" >
ADD COMMENT
0
Entering edit mode
Thanks Wolfgang, will explore some of your suggestions. a > -----Original Message----- > From: Wolfgang Huber [mailto:huber at ebi.ac.uk] > Sent: 26 September 2007 21:47 > To: Al Ivens > Cc: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] gDNA Cy3, cDNA Cy5 > > > Dear Al, > > what you have done looks very reasonable, but of course your outcome > doesn't seem to be so. Issuse that one might want to > investigate, if you > haven't already, are: > - array quality > - variance of the data at the low end, and the background > correction (if > you have many very small or even negative raw intensities) > - do a single color analysis as if there were no Cy3 > - compute log(Cy5/Cy3) within each array, then normalize the > log-ratios > between arrays, and here I would go successively from least > aggressive > to most aggressive (scaling, affine linear, a stiff > loess-like smoother, > quantiles) and only go to most aggressive if the data really > ask for it. > > Best wishes > Wolfgang > > ------------------------------------------------------------------ > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber > > > Hi, > > > > I have a whole pile of 2-colour arrays done with Cy3 > genomic DNA, and > > Cy5 cDNA. There are 7 "test" samples, each done in > triplicate against > > the genomic DNA (so 21 slides). I have seen a previous > posting from > > Jenny about how to analyse these kinds of hybes, and followed her > > suggestions. > > > > The scans are unprocessed BlueFuse, but bg has already been > > subtracted. > > > >> red_green <- > > > read.maimages(datafiles,source="bluefuse",columns=list(G="AMPCH2",R="A > > MP > > CH1", > > + > weights="CONFIDENCE",quality="QUALITY")) > > > > Although boxplots of the various slides for the two > channels indicate > > a pretty reasonable degree of similarity within a channel, the two > > channels as a whole were quite difft: > > > >> summary(as.vector(log2(red_green$R))) > > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 1.157 7.381 8.700 9.016 10.600 16.420 > >> summary(as.vector(log2(red_green$G))) > > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 1.820 7.666 10.380 10.020 12.470 16.370 > > > > Despite this, I proceeded anyway: > > > >> ## as Cy3 is gDNA in the ref channel, don't do NWA > >> NBA <- normalizeBetweenArrays(red_green,method="Gquantile") > >> > >> ## get back the R and G values after Gq normalisation NBA.RG.MA <- > >> RG.MA(NBA) > > > > All Cy3 distributions were now identical, as expected, and the Cy5 > > channel had also been modified: > > > >> summary(as.vectorlog2NBA.RG.MA$G))) > > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 3.038 7.580 10.400 10.020 12.440 15.600 > >> summary(as.vectorlog2NBA.RG.MA$R))) > > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 0.8795 7.3940 8.7480 9.0160 10.5700 17.4500 > > > >> ## create a MAList to manipulate > >> NBA.fake <- NBA > >> > >> ## replace the M values with the log2(R) values > >> NBA.fake$M <- log2NBA.RG.MA$R) > > > > I then did fitting, using a design matrix of the ref="gDNA" > kind, with > > a contrast matrix for the cDNAx vs cDNAy comparisons > > > >> NBA.fakefit <- lmFit(NBA.fake$M, design=designMATRIX, method="ls") > >> NBA.fakecontrastsFit <- contrasts.fit(NBA.fakefit,contrasts) > >> eBNBA.fakecontrastsFit <- eBayes(NBA.fakecontrastsFit) > > > > I then looked at the coefficients of the eBayesed contrast > fit object: > > > >> for(i in 1:length(colnames(contrasts))) > > + { > > + cat("Contrast",i," > ",summary(eBNBA.fakecontrastsFit$coeff[,i]),"\n") > > + } > > > > ## individual cDNAs (cDNA1, 2 etc) > > min 25% median mean 75% max > > Contrast 1 5.417 8.538 9.79 9.873 10.99 16.23 > > Contrast 2 4.426 6.867 8.623 8.798 10.46 15.58 > > Contrast 3 4.623 7.26 8.606 8.951 10.37 15.92 > > Contrast 4 5.175 7.626 8.853 9.323 10.72 16.52 > > Contrast 5 3.928 7.175 8.864 9.073 10.71 15.63 > > Contrast 6 4.167 7.269 8.264 8.848 10.29 16.34 > > Contrast 7 3.948 6.381 7.956 8.244 9.824 15.58 > > ## the various comparisons (cDNA2-cDNA1 etc) > > Contrast 8 -5.773 -1.626 -0.9606 -1.074 -0.3976 1.715 > > Contrast 9 -5.591 -1.427 -0.8069 -0.9218 -0.338 3.098 > > Contrast 10 -4.329 -1.008 -0.4586 -0.5496 -0.04331 2.093 > > Contrast 11 -4.698 -1.427 -0.7426 -0.7994 -0.159 2.631 > > Contrast 12 -6.275 -1.857 -0.9168 -1.025 -0.05517 5.511 > > Contrast 13 -6.746 -2.3 -1.536 -1.629 -0.9041 4.385 > > Contrast 14 -5.327 -0.209 0.1646 0.1526 0.5908 5.481 > > Contrast 15 -2.688 0.1018 0.4627 0.5249 0.9155 6.033 > > Contrast 16 -3.462 -0.1604 0.2428 0.275 0.7153 6.032 > > ....... > > > > Many of the medians are well below zero, and of course, the output > > from topTable is almost exclusively negative logFC. This > doesnt make > > sense biologically. Contrast 8 is cDNA2-cDNA1, with cDNA1 the "RNA > > reference" biologically speaking (cDNA2-7 are all single gene > > mutants). "Unfortunately", the median value for the cDNA1 coeff is > > quite a bit larger than the others, so I think this is > skewing most of > > the contrasts (i.e. I dont think there is a wholesale reduction of > > transcription in the other mutants). What could/should I > do about it? > > I have contemplated normalizeQuantiles of the red channel after the > > Gquantile has been done (i.e. before fitting), but not sure whether > > that is a valid thing to do. Would a single channel > analysis be more > > appropriate here? > > > > Thoughts/suggestions greatly appreciated. > > > > Cheers, > > > > al > > > > > >> sessionInfo() > > R version 2.5.1 (2007-06-27) > > i386-pc-mingw32 > > > > locale: > > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United > > Kingdom.1252;LC_MONETARY=English_United > > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > > > > attached base packages: > > [1] "tools" "stats" "graphics" "grDevices" "utils" > > [6] "datasets" "methods" "base" > > > > other attached packages: > > geneplotter annotate Biobase gplots gdata > gtools > > "1.14.0" "1.14.1" "1.14.1" "2.3.2" "2.3.1" > "2.3.1" > > lattice MASS statmod sma limma > Hmisc > > "0.16-2" "7.2-35" "1.3.0" "0.5.15" "2.10.0" > "3.4-2" > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
ADD REPLY

Login before adding your answer.

Traffic: 498 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6