Design/Contrast for Two-Channel Experimental Setup


Guest User ★ 13k

@guest-user-4897

Last seen 9.7 years ago

Hi all,

I'm currently looking at data collected from a two-channel microarray experiment. The experimental design is as follows:

- The data represent the results of a competitive hybridization between control RNA and treatment RNA.
- The data comprise *n* × *m* slides (*n* biological replicates and *m* technical replicates for each biological replicate).
- The control label dye (Cy5) and the treatment label dye (Cy3) remain the same across all slides; hence, **there is no dye-swap aspect to the experiment**.
- The data were generated by ScanArray Express, and the slide data are stored in separate .csv files.

I'm very new to the limma package. Is it possible to use the limma package to identify differentially expressed genes for this experimental setup?

If so,

- How can the design matrix be specified? Will a "dye effect" term still be required even if there is no dye-swap?
- Is a contrast matrix necessary for this procedure?
- Are there any specialist normalisation techniques required for this setup?

My code so far is as follows:

    # Assuming the contents of the targets file have been identified:
    RG<-read.maimages(targets, source="scanarrayexpress", sep=",")
    RGbk <- backgroundCorrect(RG, method="normexp", offset=50)
    MA <- normalizeWithinArrays(RGbk, method="loess")
    MA.b=normalizeBetweenArrays(MA, method="quantile")
    design <- modelMatrix(targets, ref="control") # nmx1 matrix; all elements set to -1.
    fit <- lmFit(MA, design)
    fit <- eBayes(fit)
    topTable(fit, coef=1, adjust="fdr")

Any assistance with the above would be greatly appreciated.

Joseph

-- output of sessionInfo():

    > sessionInfo()
    R version 3.0.2 (2013-09-25)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] limma_3.18.7

-- Sent via the guest posting facility at bioconductor.org.

limma limma • 1.3k views

ADD COMMENT • link updated 10.3 years ago by Gordon Smyth 50k • written 10.3 years ago by Guest User ★ 13k



Hi Joseph,

You cannot include a dye effect term in this design, because the biological effect and the dye effect are completely confounded due to the lack of dye swaps. Hence, I believe this design is incapable of distinguishing between dye effects and biological effects. The only way to proceed would be to make an arbitrary assumption about the dye effects (e.g. assume dye effects are zero).

-Ryan
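The confounding described above can be checked numerically. The following is a minimal base-R sketch, not code from the thread: the four-array layouts and the column names `DyeEffect` and `Treatment` are hypothetical, and M is assumed to be log2(Cy5/Cy3) so that a control-in-Cy5 array gets -1 in the treatment column.

```r
# Hypothetical layout: 4 arrays, M = log2(Cy5/Cy3).
# No dye swap: control is always Cy5 and treatment always Cy3,
# so every array measures (control - treatment), i.e. -1 for Treatment.
no_swap <- cbind(DyeEffect = c(1, 1, 1, 1),
                 Treatment = c(-1, -1, -1, -1))

# With dye swaps on arrays 2 and 4, the sign of Treatment alternates
# while the dye effect (an intercept on M) keeps the same sign.
with_swap <- cbind(DyeEffect = c(1, 1, 1, 1),
                   Treatment = c(-1, 1, -1, 1))

# Rank of each design matrix: a rank below the number of columns means
# the two effects cannot be estimated separately.
rank_no_swap   <- qr(no_swap)$rank    # columns are collinear
rank_with_swap <- qr(with_swap)$rank  # columns are linearly independent

print(c(no_swap = rank_no_swap, with_swap = rank_with_swap))
```

Without dye swaps the two columns are multiples of each other, so only their combined effect is estimable; this is why a separate dye effect term cannot be added to this design.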

ADD REPLY • link 10.3 years ago Ryan C. Thompson ★ 7.9k


Hi Ryan,

Thanks for your reply!

It was my belief that the experimental setup would imply that the dye effect would be confounded with the biological effect. Thanks for clarifying that this is indeed the case. However, I'm still slightly confused about the dye effect term; specifically, shouldn't the loess normalisation (performed by the *normalizeWithinArrays()* function) correct for the dye effect? If this is the case, why is a dye effect term required?

Also, with a view to identifying differentially expressed genes, is the sample code provided in my previous mail otherwise correct? Are there any alterations that I should consider?

Joseph

ADD REPLY • link 10.3 years ago Joseph Shaw ▴ 100


Gordon Smyth 50k

@gordon-smyth

Last seen 6 minutes ago

WEHI, Melbourne, Australia

Dear Joseph,

> Date: Sat, 4 Jan 2014 19:58:32 +0000
> From: Joseph Shaw <josph.sh at gmail.com>
> To: Ryan <rct at thompsonclan.org>
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Design/Contrast for Two-Channel Experimental Setup
>
> It was my belief that the experimental setup would imply that the dye effect would be confounded with the biological effect - thanks for clarifying that this is indeed the case. However, I'm still slightly confused about the dye effect term; specifically, shouldn't the loess normalisation (performed by normalizeWithinArrays() function) correct for the dye effect? If this is the case, why is a dye effect term required?

The loess normalization done by normalizeWithinArrays() accounts for a global dye effect trend. However, it is possible that some of the probes on the array might show special dye effects specific to those probes which don't follow the overall dye effect trend. It is the purpose of a dye effect term in the linear model to allow for the possibility of such probe-specific dye effects.

> Also, with a view to identifying differentially expressed genes, is the sample code provided in my previous mail otherwise correct? Are there any alterations that I should consider?

The line

    MA.b=normalizeBetweenArrays(MA, method="quantile")

is not needed, and is obviously superfluous in your code anyway.

Best wishes
Gordon
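For reference, the pipeline from the question with the between-array step removed might look like the sketch below. This is an illustration under stated assumptions rather than a recommended analysis: the file-reading calls from the question are shown only as comments, and simulated M- and A-values (hypothetical sizes, effect sizes, and column name `treatment`) stand in for the output of normalizeWithinArrays() so that the block runs on its own. It also treats all arrays as independent, which ignores the technical-replicate structure that the thread does not address.

```r
library(limma)

# Steps from the question that need the real slide files (not run here):
#   RG   <- read.maimages(targets, source = "scanarrayexpress", sep = ",")
#   RGbk <- backgroundCorrect(RG, method = "normexp", offset = 50)
#   MA   <- normalizeWithinArrays(RGbk, method = "loess")
# No normalizeBetweenArrays() call follows the loess step.

# Simulated stand-in for the normalized MAList (all values hypothetical).
set.seed(1)
n_probes <- 200
n_arrays <- 6
M <- matrix(rnorm(n_probes * n_arrays, sd = 0.3), n_probes, n_arrays)  # log2(Cy5/Cy3)
A <- matrix(runif(n_probes * n_arrays, 6, 14), n_probes, n_arrays)     # mean log2 intensity
M[1:10, ] <- M[1:10, ] - 2  # probes 1-10: treatment (Cy3) higher than control (Cy5)
MA <- new("MAList", list(M = M, A = A))

# Control is always Cy5 and treatment always Cy3, so every array measures
# control - treatment and the treatment-vs-control coefficient gets -1.
# This mirrors the all -1 matrix that the question reports from modelMatrix().
design <- matrix(-1, nrow = n_arrays, ncol = 1,
                 dimnames = list(NULL, "treatment"))

fit <- lmFit(MA, design)
fit <- eBayes(fit)
top <- topTable(fit, coef = 1, adjust.method = "fdr", number = 10)
print(top)
```

With a single-column design like this one, the coefficient is already the treatment-versus-control log-ratio, so no contrast matrix is involved in the sketch.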

ADD COMMENT • link 10.3 years ago Gordon Smyth 50k


Dear Gordon,

Thank you very much for your response!

I have two brief follow-on questions pertaining to your previous mail.

On Sun, Jan 5, 2014 at 10:39 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:

> The loess normalization done by normalizeWithinArrays() accounts for a global dye effect trend. However it is possible that some of the probes on the array might show special dye effects specific to those probes which don't follow the overall dye effect trend. It is the purpose of a dye effect term in the linear model to allow for the possibility of such probe-specific dye effects.

Am I correct in suggesting that such a dye-effect term (assuming one exists) will be represented by a model parameter (intercept) estimate common to all observations in a given gene model? If this is the case, this dye effect (the intercept estimate) will be applied to all observations (across replicates) for a given gene model as opposed to any single observation within the gene model. As such, the dye-effect term is gene-specific as opposed to observation-specific.

> The line
>
>     MA.b=normalizeBetweenArrays(MA, method="quantile")
>
> is not needed, and is obviously superfluous in your code anyway.

Could you briefly elaborate on why this line is not needed? As I currently understand it, normalization between arrays is advantageous if there exists a disparity between replicate distributions (in which case a scaling procedure such as quantile normalization can be implemented); is this correct?

Kind regards,

Joseph

ADD REPLY • link 10.3 years ago Joseph Shaw ▴ 100


Dear Joseph,

On Mon, 6 Jan 2014, Joseph Shaw wrote:

> Am I correct in suggesting that such a dye-effect term (assuming one exists) will be represented by a model parameter (intercept) estimate common to all observations in a given gene model? If this is the case, this dye effect (the intercept estimate) will be applied to all observations (across replicates) for a given gene model as opposed to any single observation within the gene model.

Yes, each dye effect term will be applied to a row of expression values. Each row of data from a microarray corresponds to a microarray spot or probe, not necessarily a "gene model".

> As such, the dye-effect term is gene-specific as opposed to observation-specific.

It is probe-specific, as I said before. No one claimed it was "observation specific".

> Could you briefly elaborate on why this line is not needed? As I currently understand it, normalization between arrays is advantageous if there exists a disparity between replicate distributions (in which case a scaling procedure such as quantile normalization can be implemented); is this correct?

Well, first off, your code never did between-array normalization, because the between-array command produced a data object that was not used in the subsequent analysis.

Your understanding about between-array normalization might be from experience with single-channel arrays. For two-colour arrays, the loess normalization step already puts the M-values for different arrays on a common scale, so there is (usually) nothing more to do. Loess normalization is superior to quantile normalization because it uses the pairing of channel values from the same spot. A subsequent quantile normalization step is not needed and would mess up the job done by loess normalization.

Also, I would not describe quantile normalization as a "scaling" procedure.

Best wishes
Gordon
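To illustrate what "uses the pairing of channel values from the same spot" means, here is a small base-R sketch of global loess normalization on one simulated array. It is a simplified stand-in, not limma's implementation: the intensity-dependent dye bias, the spot count, and the `span` value are all assumptions chosen for the example, and normalizeWithinArrays() differs in detail.

```r
set.seed(42)
n <- 2000

# Simulated truth for one array (all values hypothetical).
A_true <- runif(n, 6, 14)              # average log2 intensity per spot
M_true <- rnorm(n, sd = 0.2)           # true log-ratios, centred on zero
bias   <- 0.1 * (A_true - 10)^2 - 0.3  # global, intensity-dependent dye trend

# Red and green log-intensities for the same spots.
log2R <- A_true + (M_true + bias) / 2
log2G <- A_true - (M_true + bias) / 2

# M and A are formed spot by spot from the paired channel values.
M <- log2R - log2G
A <- (log2R + log2G) / 2

# Fit the global trend of M against A and subtract it.
trend  <- loess(M ~ A, span = 0.3, degree = 1)
M_norm <- M - fitted(trend)

# Spread of the log-ratios before and after removing the trend.
print(c(sd_before = sd(M), sd_after = sd(M_norm)))
```

Because the trend is estimated from each spot's own pair of channel values, the corrected M-values on different arrays end up centred on a common scale. A probe whose dye behaviour departs from this global trend would not be corrected by the step above, which is the case a probe-specific dye effect term is meant to cover.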

ADD REPLY • link 10.3 years ago Gordon Smyth 50k


Dear Gordon,

> It is probe-specific, as I said before. No one claimed it was "observation specific".

My apologies - I didn't mean to imply that anybody had claimed the above. I was just hoping to eliminate some ambiguities that I had held.

> Well, first off, your code never did between-array normalization, because the between-array command produced a data object that was not used in the subsequent analysis.

Oh, I see. That was actually a typo - it was my intention to use the resulting data object.

> Your understanding about between array normalization might be from experience with single channel arrays. For two colour arrays, the loess normalization step already puts the M-values for different arrays on a common scale so there is (usually) nothing more to do. Loess normalization is superior to quantile normalization because it uses the pairing of channel values from the same spot. A subsequent quantile normalization step is not needed and would mess up the job done by loess normalization.

That makes a lot of sense. Thanks for clearing that up.

> Also, I would not describe quantile normalization as a "scaling" procedure.

I agree. It was definitely a clumsy phrasing; I meant it in terms of inducing a degree of uniformity.

Thank you once again for all your assistance - it's greatly appreciated.

Kind regards,

Joseph

ADD REPLY • link 10.3 years ago Joseph Shaw ▴ 100
