Advice with RemoveBatchEffect and Rank Product package
4
0
Entering edit mode
@y-osee-sanogo-4189
Last seen 9.6 years ago
Dear Members of the list, I have been using Rank Prod to analyze Agilent two-color data. However, I would like to remove the dye effect prior to analysis. I read on the forum that RemoveBatchEffect should be used in the Limma linear model, a type of model that is not in Rank Product. I have two questions: 1) Would it be appropriate to use RemoveBatchEffect to correct for dye effect prior to running the expression data using Rank Prod? 2) a) If no, what other function could I use to do this? b) If yes, I would like a help with the correct design and how to properly indicate the batch. Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are different areas of the brain): > design1 BS C D T 1 0 0 0 1 2 0 0 0 -1 3 0 0 0 1 4 0 0 0 -1 5 0 0 0 1 6 0 0 0 -1 7 0 0 -1 0 8 0 0 1 0 9 0 0 -1 0 10 0 0 1 0 11 0 0 -1 0 12 0 0 1 0 13 0 1 0 0 14 0 -1 0 0 15 0 1 0 0 16 0 -1 0 0 17 0 1 0 0 18 0 -1 0 0 19 -1 0 0 0 20 1 0 0 0 21 -1 0 0 0 22 1 0 0 0 23 -1 0 0 0 24 1 0 0 0 attr(,"assign") [1] 1 1 1 1 I've tried this (Data_RP are my data, the M values of the expression set): > DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, design=design1) but it is returning an error message " Error in contr.sum(levels(batch)) : not enough degrees of freedom to define contrasts" Please help me correct this code. Thank you so much for your help. Osee -- -- -- Y. Osee Sanogo Integrative Biology Institute for Genomic Biology University of Illinois at Urbana 505 S. Goodwin Ave Urbana, IL-61801 Tel: 217-333 2308 (Office) 217-417 9593 (Cell)
limma limma • 2.3k views
ADD COMMENT
0
Entering edit mode
@y-osee-sanogo-4183
Last seen 9.6 years ago
Dear Members of the list, (I apologize for posting this again -I sent it earlier to the list but from another account and I was listed me as non-Member -and Member I am since 2008:-)). I have been using Rank Prod to analyze Agilent two-color data. However, I would like to remove the dye effect prior to analysis. I read on the forum that RemoveBatchEffect should be used in the Limma linear model, a type of modeling that is not in Rank Product. I have two questions: 1) Would it be appropriate to use RemoveBatchEffect to correct for dye effect prior to running the expression data using Rank Prod? 2) a) If no, what other function could I use to do this? b) If yes, I would like a help with the correct design and how to properly indicate the batch. Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are different areas of the brain): > design1 BS C D T 1 0 0 0 1 2 0 0 0 -1 3 0 0 0 1 4 0 0 0 -1 5 0 0 0 1 6 0 0 0 -1 7 0 0 -1 0 8 0 0 1 0 9 0 0 -1 0 10 0 0 1 0 11 0 0 -1 0 12 0 0 1 0 13 0 1 0 0 14 0 -1 0 0 15 0 1 0 0 16 0 -1 0 0 17 0 1 0 0 18 0 -1 0 0 19 -1 0 0 0 20 1 0 0 0 21 -1 0 0 0 22 1 0 0 0 23 -1 0 0 0 24 1 0 0 0 attr(,"assign") [1] 1 1 1 1 I've tried this (Data_RP are my data, the M values of the expression set): > DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, design=design1) but it is returning an error message " Error in contr.sum(levels(batch)) : not enough degrees of freedom to define contrasts" Please help me correct this code. Thank you so much for your help. Osee -- -- -- Y. Osee Sanogo Integrative Biology Institute for Genomic Biology University of Illinois at Urbana 505 S. Goodwin Ave Urbana, IL-61801 Tel: 217-333 2308 (Office) 217-417 9593 (Cell) [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@y-osee-sanogo-4189
Last seen 9.6 years ago
Dear Gordon, >From what you said, it seems that I am oversimplifying my experiment by attempting to analyze it with RankProd, which doesn't offer the option for complex modeling. Could you please explain to me how I could analyze the experiment using Limma? Please let me know if you'd like me to provide further details of the experiment. Thank you so much. Osee On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > Dear Osee, > > No, you can't use removeBatchEffect to control for dye bias. > > Can you ignore the dye effect? Not in general, but who knows? > > Your experiment seems too complex to be properly analysed using RankProd. > For one thing, it seems clear that you have obtained multiple parts of the > brain from the same biological replicates, meaning that your samples are > paired by fish number. > > I could explain how to analyse this experiment using limma. However, if > you are determined that you will use RankProd, it might be best to email > the authors of that package for advice. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > http://www.statsci.org/smyth > > On Sun, 9 Sep 2012, Osee Sanogo wrote: > >> Dear Gordon, >> >> Thank you for getting back to me about my questions. >> >> My experiment is trying to identify differentially expressed genes in four >> regions of the brain in response to a stressor. I have 6 biol. replicates in >> each brain region for the control and experimental groups in each region, >> and the comparison is being done within brain region (i.e., T control vs T >> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >> run in two-color Agilent Array. >> >> You're right that the design I sent was from the separate channel analysis, >> in which I attempted to account for array and dye effect, and then run the >> data in RankProd. Now I know that this is not right. Ok, I will use the >> single channel analysis in Limma. >> >> I still would like to run the two-channel data (ratios) in RankProd, as my >> previous experience found this useful for my dara (low replicate numbers). >> >> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >> before running the two-channel data in RankProd? If yes, how should I do >> this using the RemoveBatch Effect function? >> 2) I found that about 3% of my probes have dye effect. Can I >> omit controlling for dye effect without compromising the result of my >> analysis? >> >> The data were loess/scale normalized into an expression set (Data_RP). >> >> Here is the design of the experiment >> >> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >> 1 1T.gpr 1 -1 1 2 T 39 0.63 >> 2 2T.gpr -1 1 2 1 T 39 0.63 >> 3 3T.gpr 1 -1 3 4 T 39 0.63 >> 4 4T.gpr -1 1 4 3 T 39 0.63 >> 5 5T.gpr 1 -1 5 6 T 39 0.63 >> 6 6T.gpr -1 1 6 5 T NA NA >> 7 1D.gpr -1 1 1 5 D 47 1.21 >> 8 2D.gpr 1 -1 2 4 D 47 1.21 >> 9 3D.gpr -1 1 3 1 D 47 1.21 >> 10 4D.gpr 1 -1 4 6 D 47 1.21 >> 11 5D.gpr -1 1 5 3 D 47 1.21 >> 12 6D.gpr 1 -1 6 2 D NA NA >> 13 1C.gpr 1 -1 1 4 C 47 1.31 >> 14 2C.gpr -1 1 2 3 C 47 1.31 >> 15 3C.gpr 1 -1 3 6 C 47 1.31 >> 16 4C.gpr -1 1 4 5 C 47 1.31 >> 17 5C.gpr 1 -1 5 2 C 47 1.31 >> 18 6C.gpr -1 1 6 1 C NA NA >> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >> 22 4BS.gpr 1 -1 4 4 BS NA NA >> 23 5BS.gpr -1 1 5 5 BS NA NA >> 24 6BS.gpr 1 -1 6 6 BS NA NA >> >> Thank you for your help and please let me know if you need further >> explanation of the experiment. >> >> Best regards, >> >> Osee >> >>> >> >> >> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >> >>> Dear Osee, >>> >>> You are attempting to do a number of things that don't seem correct to me. >>> >>> First, you seem to attempting a separate channel analysis of two color >>> microarray data, but ignoring the pairing of the red and green channels. >>> It isn't correct to do this. I don't see any way to use RankProd, or any >>> other package designed for independent samples, correctly in this context. >>> If you must do a separate channel analysis, you would be better off using >>> the separate channel analysis facilities of the limma package. >>> >>> Second, when you set batch=rep(1,24), you are defining a batch that >>> consists of every array in your data set. Obviously it doesn't make sense >>> to remove batch effects unless there are at least two batches. >>> >>> Third, I don't follow your design matrix. >>> >>> It would be better to go back to the start, and describe in more basic >>> terms what is the nature of your data and what comparison you are trying >>> to make. >>> >>> Best wishes >>> Gordon >>> >>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>> >>>> Dear Members of the list, >>>> >>>> (I apologize for posting this again -I sent it earlier to the list but >>>> from another account and I was listed me as non-Member -and Member I am >>>> since 2008:-)). >>>> >>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>> would like to remove the dye effect prior to analysis. I read on the forum >>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>> modeling that is not in Rank Product. >>>> >>>> I have two questions: >>>> >>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>> effect prior to running the expression data using Rank Prod? >>>> >>>> 2) a) If no, what other function could I use to do this? >>>> b) If yes, I would like a help with the correct design and how to >>>> properly indicate the batch. >>>> >>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>> different areas of the brain): >>>> >>>> design1 >>>> BS C D T >>>> 1 0 0 0 1 >>>> 2 0 0 0 -1 >>>> 3 0 0 0 1 >>>> 4 0 0 0 -1 >>>> 5 0 0 0 1 >>>> 6 0 0 0 -1 >>>> 7 0 0 -1 0 >>>> 8 0 0 1 0 >>>> 9 0 0 -1 0 >>>> 10 0 0 1 0 >>>> 11 0 0 -1 0 >>>> 12 0 0 1 0 >>>> 13 0 1 0 0 >>>> 14 0 -1 0 0 >>>> 15 0 1 0 0 >>>> 16 0 -1 0 0 >>>> 17 0 1 0 0 >>>> 18 0 -1 0 0 >>>> 19 -1 0 0 0 >>>> 20 1 0 0 0 >>>> 21 -1 0 0 0 >>>> 22 1 0 0 0 >>>> 23 -1 0 0 0 >>>> 24 1 0 0 0 >>>> >>>> attr(,"assign") >>>> [1] 1 1 1 1 >>>> >>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>> >>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>> design=design1) >>>> >>>> but it is returning an error message >>>> " Error in contr.sum(levels(batch)) : >>>> not enough degrees of freedom to define contrasts" >>>> >>>> Please help me correct this code. >>>> >>>> Thank you so much for your help. >>>> >>>> Osee >>>> >>>> -- -- -- >>>> Y. Osee Sanogo >>>> Integrative Biology >>>> Institute for Genomic Biology >>>> University of Illinois at Urbana >>>> 505 S. Goodwin Ave >>>> Urbana, IL-61801 >>>> >>>> Tel: 217-333 2308 (Office) >>>> 217-417 9593 (Cell) > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:7}}
ADD COMMENT
0
Entering edit mode
Dear Osee, Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight and Length mean in your experiment? If the first six arrays are from biologically independent samples, why do they have the same weight and length? Best wishes Gordon On Mon, 10 Sep 2012, Osee Sanogo wrote: > Dear Gordon, > > From what you said, it seems that I am oversimplifying my experiment by > attempting to analyze it with RankProd, which doesn't offer the option for > complex modeling. > > Could you please explain to me how I could analyze the experiment using > Limma? > Please let me know if you'd like me to provide further details of the > experiment. > > Thank you so much. > > Osee > > > On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> Dear Osee, >> >> No, you can't use removeBatchEffect to control for dye bias. >> >> Can you ignore the dye effect? Not in general, but who knows? >> >> Your experiment seems too complex to be properly analysed using RankProd. >> For one thing, it seems clear that you have obtained multiple parts of the >> brain from the same biological replicates, meaning that your samples are >> paired by fish number. >> >> I could explain how to analyse this experiment using limma. However, if >> you are determined that you will use RankProd, it might be best to email >> the authors of that package for advice. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> http://www.statsci.org/smyth >> >> On Sun, 9 Sep 2012, Osee Sanogo wrote: >> >>> Dear Gordon, >>> >>> Thank you for getting back to me about my questions. >>> >>> My experiment is trying to identify differentially expressed genes in four >>> regions of the brain in response to a stressor. I have 6 biol. replicates in >>> each brain region for the control and experimental groups in each region, >>> and the comparison is being done within brain region (i.e., T control vs T >>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >>> run in two-color Agilent Array. >>> >>> You're right that the design I sent was from the separate channel analysis, >>> in which I attempted to account for array and dye effect, and then run the >>> data in RankProd. Now I know that this is not right. Ok, I will use the >>> single channel analysis in Limma. >>> >>> I still would like to run the two-channel data (ratios) in RankProd, as my >>> previous experience found this useful for my dara (low replicate numbers). >>> >>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >>> before running the two-channel data in RankProd? If yes, how should I do >>> this using the RemoveBatch Effect function? >>> 2) I found that about 3% of my probes have dye effect. Can I >>> omit controlling for dye effect without compromising the result of my >>> analysis? >>> >>> The data were loess/scale normalized into an expression set (Data_RP). >>> >>> Here is the design of the experiment >>> >>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >>> 1 1T.gpr 1 -1 1 2 T 39 0.63 >>> 2 2T.gpr -1 1 2 1 T 39 0.63 >>> 3 3T.gpr 1 -1 3 4 T 39 0.63 >>> 4 4T.gpr -1 1 4 3 T 39 0.63 >>> 5 5T.gpr 1 -1 5 6 T 39 0.63 >>> 6 6T.gpr -1 1 6 5 T NA NA >>> 7 1D.gpr -1 1 1 5 D 47 1.21 >>> 8 2D.gpr 1 -1 2 4 D 47 1.21 >>> 9 3D.gpr -1 1 3 1 D 47 1.21 >>> 10 4D.gpr 1 -1 4 6 D 47 1.21 >>> 11 5D.gpr -1 1 5 3 D 47 1.21 >>> 12 6D.gpr 1 -1 6 2 D NA NA >>> 13 1C.gpr 1 -1 1 4 C 47 1.31 >>> 14 2C.gpr -1 1 2 3 C 47 1.31 >>> 15 3C.gpr 1 -1 3 6 C 47 1.31 >>> 16 4C.gpr -1 1 4 5 C 47 1.31 >>> 17 5C.gpr 1 -1 5 2 C 47 1.31 >>> 18 6C.gpr -1 1 6 1 C NA NA >>> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >>> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >>> 22 4BS.gpr 1 -1 4 4 BS NA NA >>> 23 5BS.gpr -1 1 5 5 BS NA NA >>> 24 6BS.gpr 1 -1 6 6 BS NA NA >>> >>> Thank you for your help and please let me know if you need further >>> explanation of the experiment. >>> >>> Best regards, >>> >>> Osee >>> >>>> >>> >>> >>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>> >>>> Dear Osee, >>>> >>>> You are attempting to do a number of things that don't seem correct to me. >>>> >>>> First, you seem to attempting a separate channel analysis of two color >>>> microarray data, but ignoring the pairing of the red and green channels. >>>> It isn't correct to do this. I don't see any way to use RankProd, or any >>>> other package designed for independent samples, correctly in this context. >>>> If you must do a separate channel analysis, you would be better off using >>>> the separate channel analysis facilities of the limma package. >>>> >>>> Second, when you set batch=rep(1,24), you are defining a batch that >>>> consists of every array in your data set. Obviously it doesn't make sense >>>> to remove batch effects unless there are at least two batches. >>>> >>>> Third, I don't follow your design matrix. >>>> >>>> It would be better to go back to the start, and describe in more basic >>>> terms what is the nature of your data and what comparison you are trying >>>> to make. >>>> >>>> Best wishes >>>> Gordon >>>> >>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>>> >>>>> Dear Members of the list, >>>>> >>>>> (I apologize for posting this again -I sent it earlier to the list but >>>>> from another account and I was listed me as non-Member -and Member I am >>>>> since 2008:-)). >>>>> >>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>>> would like to remove the dye effect prior to analysis. I read on the forum >>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>>> modeling that is not in Rank Product. >>>>> >>>>> I have two questions: >>>>> >>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>>> effect prior to running the expression data using Rank Prod? >>>>> >>>>> 2) a) If no, what other function could I use to do this? >>>>> b) If yes, I would like a help with the correct design and how to >>>>> properly indicate the batch. >>>>> >>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>>> different areas of the brain): >>>>> >>>>> design1 >>>>> BS C D T >>>>> 1 0 0 0 1 >>>>> 2 0 0 0 -1 >>>>> 3 0 0 0 1 >>>>> 4 0 0 0 -1 >>>>> 5 0 0 0 1 >>>>> 6 0 0 0 -1 >>>>> 7 0 0 -1 0 >>>>> 8 0 0 1 0 >>>>> 9 0 0 -1 0 >>>>> 10 0 0 1 0 >>>>> 11 0 0 -1 0 >>>>> 12 0 0 1 0 >>>>> 13 0 1 0 0 >>>>> 14 0 -1 0 0 >>>>> 15 0 1 0 0 >>>>> 16 0 -1 0 0 >>>>> 17 0 1 0 0 >>>>> 18 0 -1 0 0 >>>>> 19 -1 0 0 0 >>>>> 20 1 0 0 0 >>>>> 21 -1 0 0 0 >>>>> 22 1 0 0 0 >>>>> 23 -1 0 0 0 >>>>> 24 1 0 0 0 >>>>> >>>>> attr(,"assign") >>>>> [1] 1 1 1 1 >>>>> >>>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>>> >>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>>> design=design1) >>>>> >>>>> but it is returning an error message >>>>> " Error in contr.sum(levels(batch)) : >>>>> not enough degrees of freedom to define contrasts" >>>>> >>>>> Please help me correct this code. >>>>> >>>>> Thank you so much for your help. >>>>> >>>>> Osee >>>>> >>>>> -- -- -- >>>>> Y. Osee Sanogo >>>>> Integrative Biology >>>>> Institute for Genomic Biology >>>>> University of Illinois at Urbana >>>>> 505 S. Goodwin Ave >>>>> Urbana, IL-61801 >>>>> >>>>> Tel: 217-333 2308 (Office) >>>>> 217-417 9593 (Cell) >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of the >> sender. >> ______________________________________________________________________ > > > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY
0
Entering edit mode
Dear Gordon, Please ignore the last two columns (weight and length) of the design for now: I think they are screwed up due to some kind of sorting. I will have to double check them (look in the notebook which is not here now). Only different brain regions (T, D, C, BS) belonging to the same fish should have same weight, length. The other columns are: Fish.Number= corresponds to fish ID, the individual fish (biological replicates) Slide= the Agilent 4X44k slides on each the arrays were located Brain.Part= the different brain regions that were dissected and hybridized (within region). They were T (telencephalon), D (diencephalon), C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control and "1" is experimental. FileName Cy3 Cy5 Fish Number Slide Brain Part 1T.gpr 1 -1 1 2 T 2T.gpr -1 1 2 1 T 3T.gpr 1 -1 3 4 T 4T.gpr -1 1 4 3 T 5T.gpr 1 -1 5 6 T 6T.gpr -1 1 6 5 T 1D.gpr -1 1 1 5 D 2D.gpr 1 -1 2 4 D 3D.gpr -1 1 3 1 D 4D.gpr 1 -1 4 6 D 5D.gpr -1 1 5 3 D 6D.gpr 1 -1 6 2 D 1C.gpr 1 -1 1 4 C 2C.gpr -1 1 2 3 C 3C.gpr 1 -1 3 6 C 4C.gpr -1 1 4 5 C 5C.gpr 1 -1 5 2 C 6C.gpr -1 1 6 1 C 1BS.gpr -1 1 1 1 BS 2BS.gpr 1 -1 2 2 BS 3BS.gpr -1 1 3 3 BS 4BS.gpr 1 -1 4 4 BS 6BS.gpr 1 -1 6 6 BS Thank you for your willingness to help. I will stay up a bit so that I can respond faster to your questions given the time difference between us. Thank you so much. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, September 10, 2012 6:34 PM To: Osee Sanogo Cc: Sanogo, Yibayiri O; Bioconductor mailing list Subject: Re: Advice with RemoveBatchEffect and Rank Product package Dear Osee, Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight and Length mean in your experiment? If the first six arrays are from biologically independent samples, why do they have the same weight and length? Best wishes Gordon On Mon, 10 Sep 2012, Osee Sanogo wrote: > Dear Gordon, > > From what you said, it seems that I am oversimplifying my experiment by > attempting to analyze it with RankProd, which doesn't offer the option for > complex modeling. > > Could you please explain to me how I could analyze the experiment using > Limma? > Please let me know if you'd like me to provide further details of the > experiment. > > Thank you so much. > > Osee > > > On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> Dear Osee, >> >> No, you can't use removeBatchEffect to control for dye bias. >> >> Can you ignore the dye effect? Not in general, but who knows? >> >> Your experiment seems too complex to be properly analysed using RankProd. >> For one thing, it seems clear that you have obtained multiple parts of the >> brain from the same biological replicates, meaning that your samples are >> paired by fish number. >> >> I could explain how to analyse this experiment using limma. However, if >> you are determined that you will use RankProd, it might be best to email >> the authors of that package for advice. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> http://www.statsci.org/smyth >> >> On Sun, 9 Sep 2012, Osee Sanogo wrote: >> >>> Dear Gordon, >>> >>> Thank you for getting back to me about my questions. >>> >>> My experiment is trying to identify differentially expressed genes in four >>> regions of the brain in response to a stressor. I have 6 biol. replicates in >>> each brain region for the control and experimental groups in each region, >>> and the comparison is being done within brain region (i.e., T control vs T >>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >>> run in two-color Agilent Array. >>> >>> You're right that the design I sent was from the separate channel analysis, >>> in which I attempted to account for array and dye effect, and then run the >>> data in RankProd. Now I know that this is not right. Ok, I will use the >>> single channel analysis in Limma. >>> >>> I still would like to run the two-channel data (ratios) in RankProd, as my >>> previous experience found this useful for my dara (low replicate numbers). >>> >>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >>> before running the two-channel data in RankProd? If yes, how should I do >>> this using the RemoveBatch Effect function? >>> 2) I found that about 3% of my probes have dye effect. Can I >>> omit controlling for dye effect without compromising the result of my >>> analysis? >>> >>> The data were loess/scale normalized into an expression set (Data_RP). >>> >>> Here is the design of the experiment >>> >>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >>> 1 1T.gpr 1 -1 1 2 T 39 0.63 >>> 2 2T.gpr -1 1 2 1 T 39 0.63 >>> 3 3T.gpr 1 -1 3 4 T 39 0.63 >>> 4 4T.gpr -1 1 4 3 T 39 0.63 >>> 5 5T.gpr 1 -1 5 6 T 39 0.63 >>> 6 6T.gpr -1 1 6 5 T NA NA >>> 7 1D.gpr -1 1 1 5 D 47 1.21 >>> 8 2D.gpr 1 -1 2 4 D 47 1.21 >>> 9 3D.gpr -1 1 3 1 D 47 1.21 >>> 10 4D.gpr 1 -1 4 6 D 47 1.21 >>> 11 5D.gpr -1 1 5 3 D 47 1.21 >>> 12 6D.gpr 1 -1 6 2 D NA NA >>> 13 1C.gpr 1 -1 1 4 C 47 1.31 >>> 14 2C.gpr -1 1 2 3 C 47 1.31 >>> 15 3C.gpr 1 -1 3 6 C 47 1.31 >>> 16 4C.gpr -1 1 4 5 C 47 1.31 >>> 17 5C.gpr 1 -1 5 2 C 47 1.31 >>> 18 6C.gpr -1 1 6 1 C NA NA >>> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >>> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >>> 22 4BS.gpr 1 -1 4 4 BS NA NA >>> 23 5BS.gpr -1 1 5 5 BS NA NA >>> 24 6BS.gpr 1 -1 6 6 BS NA NA >>> >>> Thank you for your help and please let me know if you need further >>> explanation of the experiment. >>> >>> Best regards, >>> >>> Osee >>> >>>> >>> >>> >>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>> >>>> Dear Osee, >>>> >>>> You are attempting to do a number of things that don't seem correct to me. >>>> >>>> First, you seem to attempting a separate channel analysis of two color >>>> microarray data, but ignoring the pairing of the red and green channels. >>>> It isn't correct to do this. I don't see any way to use RankProd, or any >>>> other package designed for independent samples, correctly in this context. >>>> If you must do a separate channel analysis, you would be better off using >>>> the separate channel analysis facilities of the limma package. >>>> >>>> Second, when you set batch=rep(1,24), you are defining a batch that >>>> consists of every array in your data set. Obviously it doesn't make sense >>>> to remove batch effects unless there are at least two batches. >>>> >>>> Third, I don't follow your design matrix. >>>> >>>> It would be better to go back to the start, and describe in more basic >>>> terms what is the nature of your data and what comparison you are trying >>>> to make. >>>> >>>> Best wishes >>>> Gordon >>>> >>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>>> >>>>> Dear Members of the list, >>>>> >>>>> (I apologize for posting this again -I sent it earlier to the list but >>>>> from another account and I was listed me as non-Member -and Member I am >>>>> since 2008:-)). >>>>> >>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>>> would like to remove the dye effect prior to analysis. I read on the forum >>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>>> modeling that is not in Rank Product. >>>>> >>>>> I have two questions: >>>>> >>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>>> effect prior to running the expression data using Rank Prod? >>>>> >>>>> 2) a) If no, what other function could I use to do this? >>>>> b) If yes, I would like a help with the correct design and how to >>>>> properly indicate the batch. >>>>> >>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>>> different areas of the brain): >>>>> >>>>> design1 >>>>> BS C D T >>>>> 1 0 0 0 1 >>>>> 2 0 0 0 -1 >>>>> 3 0 0 0 1 >>>>> 4 0 0 0 -1 >>>>> 5 0 0 0 1 >>>>> 6 0 0 0 -1 >>>>> 7 0 0 -1 0 >>>>> 8 0 0 1 0 >>>>> 9 0 0 -1 0 >>>>> 10 0 0 1 0 >>>>> 11 0 0 -1 0 >>>>> 12 0 0 1 0 >>>>> 13 0 1 0 0 >>>>> 14 0 -1 0 0 >>>>> 15 0 1 0 0 >>>>> 16 0 -1 0 0 >>>>> 17 0 1 0 0 >>>>> 18 0 -1 0 0 >>>>> 19 -1 0 0 0 >>>>> 20 1 0 0 0 >>>>> 21 -1 0 0 0 >>>>> 22 1 0 0 0 >>>>> 23 -1 0 0 0 >>>>> 24 1 0 0 0 >>>>> >>>>> attr(,"assign") >>>>> [1] 1 1 1 1 >>>>> >>>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>>> >>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>>> design=design1) >>>>> >>>>> but it is returning an error message >>>>> " Error in contr.sum(levels(batch)) : >>>>> not enough degrees of freedom to define contrasts" >>>>> >>>>> Please help me correct this code. >>>>> >>>>> Thank you so much for your help. >>>>> >>>>> Osee >>>>> >>>>> -- -- -- >>>>> Y. Osee Sanogo >>>>> Integrative Biology >>>>> Institute for Genomic Biology >>>>> University of Illinois at Urbana >>>>> 505 S. Goodwin Ave >>>>> Urbana, IL-61801 >>>>> >>>>> Tel: 217-333 2308 (Office) >>>>> 217-417 9593 (Cell) >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of the >> sender. >> ______________________________________________________________________ > > > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD REPLY
0
Entering edit mode
Dear Gordon, Attached please find the revised design of my experiment. Thank you for your willingness to help me analyze these data. Best regards, Osee ________________________________________ From: Sanogo, Yibayiri O Sent: Monday, September 10, 2012 9:51 PM To: Gordon K Smyth; Osee Sanogo Cc: Bioconductor mailing list Subject: RE: Advice with RemoveBatchEffect and Rank Product package Dear Gordon, Please ignore the last two columns (weight and length) of the design for now: I think they are screwed up due to some kind of sorting. I will have to double check them (look in the notebook which is not here now). Only different brain regions (T, D, C, BS) belonging to the same fish should have same weight, length. The other columns are: Fish.Number= corresponds to fish ID, the individual fish (biological replicates) Slide= the Agilent 4X44k slides on each the arrays were located Brain.Part= the different brain regions that were dissected and hybridized (within region). They were T (telencephalon), D (diencephalon), C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control and "1" is experimental. FileName Cy3 Cy5 Fish Number Slide Brain Part 1T.gpr 1 -1 1 2 T 2T.gpr -1 1 2 1 T 3T.gpr 1 -1 3 4 T 4T.gpr -1 1 4 3 T 5T.gpr 1 -1 5 6 T 6T.gpr -1 1 6 5 T 1D.gpr -1 1 1 5 D 2D.gpr 1 -1 2 4 D 3D.gpr -1 1 3 1 D 4D.gpr 1 -1 4 6 D 5D.gpr -1 1 5 3 D 6D.gpr 1 -1 6 2 D 1C.gpr 1 -1 1 4 C 2C.gpr -1 1 2 3 C 3C.gpr 1 -1 3 6 C 4C.gpr -1 1 4 5 C 5C.gpr 1 -1 5 2 C 6C.gpr -1 1 6 1 C 1BS.gpr -1 1 1 1 BS 2BS.gpr 1 -1 2 2 BS 3BS.gpr -1 1 3 3 BS 4BS.gpr 1 -1 4 4 BS 6BS.gpr 1 -1 6 6 BS Thank you for your willingness to help. I will stay up a bit so that I can respond faster to your questions given the time difference between us. Thank you so much. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, September 10, 2012 6:34 PM To: Osee Sanogo Cc: Sanogo, Yibayiri O; Bioconductor mailing list Subject: Re: Advice with RemoveBatchEffect and Rank Product package Dear Osee, Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight and Length mean in your experiment? If the first six arrays are from biologically independent samples, why do they have the same weight and length? Best wishes Gordon On Mon, 10 Sep 2012, Osee Sanogo wrote: > Dear Gordon, > > From what you said, it seems that I am oversimplifying my experiment by > attempting to analyze it with RankProd, which doesn't offer the option for > complex modeling. > > Could you please explain to me how I could analyze the experiment using > Limma? > Please let me know if you'd like me to provide further details of the > experiment. > > Thank you so much. > > Osee > > > On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> Dear Osee, >> >> No, you can't use removeBatchEffect to control for dye bias. >> >> Can you ignore the dye effect? Not in general, but who knows? >> >> Your experiment seems too complex to be properly analysed using RankProd. >> For one thing, it seems clear that you have obtained multiple parts of the >> brain from the same biological replicates, meaning that your samples are >> paired by fish number. >> >> I could explain how to analyse this experiment using limma. However, if >> you are determined that you will use RankProd, it might be best to email >> the authors of that package for advice. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> http://www.statsci.org/smyth >> >> On Sun, 9 Sep 2012, Osee Sanogo wrote: >> >>> Dear Gordon, >>> >>> Thank you for getting back to me about my questions. >>> >>> My experiment is trying to identify differentially expressed genes in four >>> regions of the brain in response to a stressor. I have 6 biol. replicates in >>> each brain region for the control and experimental groups in each region, >>> and the comparison is being done within brain region (i.e., T control vs T >>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >>> run in two-color Agilent Array. >>> >>> You're right that the design I sent was from the separate channel analysis, >>> in which I attempted to account for array and dye effect, and then run the >>> data in RankProd. Now I know that this is not right. Ok, I will use the >>> single channel analysis in Limma. >>> >>> I still would like to run the two-channel data (ratios) in RankProd, as my >>> previous experience found this useful for my dara (low replicate numbers). >>> >>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >>> before running the two-channel data in RankProd? If yes, how should I do >>> this using the RemoveBatch Effect function? >>> 2) I found that about 3% of my probes have dye effect. Can I >>> omit controlling for dye effect without compromising the result of my >>> analysis? >>> >>> The data were loess/scale normalized into an expression set (Data_RP). >>> >>> Here is the design of the experiment >>> >>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >>> 1 1T.gpr 1 -1 1 2 T 39 0.63 >>> 2 2T.gpr -1 1 2 1 T 39 0.63 >>> 3 3T.gpr 1 -1 3 4 T 39 0.63 >>> 4 4T.gpr -1 1 4 3 T 39 0.63 >>> 5 5T.gpr 1 -1 5 6 T 39 0.63 >>> 6 6T.gpr -1 1 6 5 T NA NA >>> 7 1D.gpr -1 1 1 5 D 47 1.21 >>> 8 2D.gpr 1 -1 2 4 D 47 1.21 >>> 9 3D.gpr -1 1 3 1 D 47 1.21 >>> 10 4D.gpr 1 -1 4 6 D 47 1.21 >>> 11 5D.gpr -1 1 5 3 D 47 1.21 >>> 12 6D.gpr 1 -1 6 2 D NA NA >>> 13 1C.gpr 1 -1 1 4 C 47 1.31 >>> 14 2C.gpr -1 1 2 3 C 47 1.31 >>> 15 3C.gpr 1 -1 3 6 C 47 1.31 >>> 16 4C.gpr -1 1 4 5 C 47 1.31 >>> 17 5C.gpr 1 -1 5 2 C 47 1.31 >>> 18 6C.gpr -1 1 6 1 C NA NA >>> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >>> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >>> 22 4BS.gpr 1 -1 4 4 BS NA NA >>> 23 5BS.gpr -1 1 5 5 BS NA NA >>> 24 6BS.gpr 1 -1 6 6 BS NA NA >>> >>> Thank you for your help and please let me know if you need further >>> explanation of the experiment. >>> >>> Best regards, >>> >>> Osee >>> >>>> >>> >>> >>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>> >>>> Dear Osee, >>>> >>>> You are attempting to do a number of things that don't seem correct to me. >>>> >>>> First, you seem to attempting a separate channel analysis of two color >>>> microarray data, but ignoring the pairing of the red and green channels. >>>> It isn't correct to do this. I don't see any way to use RankProd, or any >>>> other package designed for independent samples, correctly in this context. >>>> If you must do a separate channel analysis, you would be better off using >>>> the separate channel analysis facilities of the limma package. >>>> >>>> Second, when you set batch=rep(1,24), you are defining a batch that >>>> consists of every array in your data set. Obviously it doesn't make sense >>>> to remove batch effects unless there are at least two batches. >>>> >>>> Third, I don't follow your design matrix. >>>> >>>> It would be better to go back to the start, and describe in more basic >>>> terms what is the nature of your data and what comparison you are trying >>>> to make. >>>> >>>> Best wishes >>>> Gordon >>>> >>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>>> >>>>> Dear Members of the list, >>>>> >>>>> (I apologize for posting this again -I sent it earlier to the list but >>>>> from another account and I was listed me as non-Member -and Member I am >>>>> since 2008:-)). >>>>> >>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>>> would like to remove the dye effect prior to analysis. I read on the forum >>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>>> modeling that is not in Rank Product. >>>>> >>>>> I have two questions: >>>>> >>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>>> effect prior to running the expression data using Rank Prod? >>>>> >>>>> 2) a) If no, what other function could I use to do this? >>>>> b) If yes, I would like a help with the correct design and how to >>>>> properly indicate the batch. >>>>> >>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>>> different areas of the brain): >>>>> >>>>> design1 >>>>> BS C D T >>>>> 1 0 0 0 1 >>>>> 2 0 0 0 -1 >>>>> 3 0 0 0 1 >>>>> 4 0 0 0 -1 >>>>> 5 0 0 0 1 >>>>> 6 0 0 0 -1 >>>>> 7 0 0 -1 0 >>>>> 8 0 0 1 0 >>>>> 9 0 0 -1 0 >>>>> 10 0 0 1 0 >>>>> 11 0 0 -1 0 >>>>> 12 0 0 1 0 >>>>> 13 0 1 0 0 >>>>> 14 0 -1 0 0 >>>>> 15 0 1 0 0 >>>>> 16 0 -1 0 0 >>>>> 17 0 1 0 0 >>>>> 18 0 -1 0 0 >>>>> 19 -1 0 0 0 >>>>> 20 1 0 0 0 >>>>> 21 -1 0 0 0 >>>>> 22 1 0 0 0 >>>>> 23 -1 0 0 0 >>>>> 24 1 0 0 0 >>>>> >>>>> attr(,"assign") >>>>> [1] 1 1 1 1 >>>>> >>>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>>> >>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>>> design=design1) >>>>> >>>>> but it is returning an error message >>>>> " Error in contr.sum(levels(batch)) : >>>>> not enough degrees of freedom to define contrasts" >>>>> >>>>> Please help me correct this code. >>>>> >>>>> Thank you so much for your help. >>>>> >>>>> Osee >>>>> >>>>> -- -- -- >>>>> Y. Osee Sanogo >>>>> Integrative Biology >>>>> Institute for Genomic Biology >>>>> University of Illinois at Urbana >>>>> 505 S. Goodwin Ave >>>>> Urbana, IL-61801 >>>>> >>>>> Tel: 217-333 2308 (Office) >>>>> 217-417 9593 (Cell) >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of the >> sender. >> ______________________________________________________________________ > > > ______________________________________________________________________ The information in this email is confidential and intended solely for the addressee. You must not disclose, forward, print or use it without the permission of the sender. ______________________________________________________________________ -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Design_Brain.txt URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20120911="" 297d0895="" attachment.txt="">
ADD REPLY
0
Entering edit mode
Dear Osee, Thank you for attaching the revised design of your experiment. I found it somewhat tricky to analyse, so my recommendations are more involved than I usually give. I think that you will need to perform a separate channel analysis, otherwise it will not be possible to make baseline comparisons between the brain parts. I also think that you may need to re-normalize your data using the limma package. You say that the data has been "loess/scale normalized into an expression set", but I don't think that scale normalization should be necessary for Agilent data, and an expression set is not fully suitable for two colour data. The following code assumes that you have normalized the data into an MAList object using the limma package. I will call it 'MA'. I strongly recommend that you use normexp background correction and loess normalization as described in the limma User's Guide for GenePix data. The first step is to put your targets data into separate channel format: targets <- read.delim("Design_Brain.txt") Treat <- as.vector(t(targets[,c("Cy3","Cy5")])) Treat <- ifelse(Treat>0,"Exp","Ctrl") Treat <- factor(Treat) Brain <- rep(targets$Brain.Part,each=2,len=48) Fish <- factor(rep(targets$Fish.Number,each=2,len=48)) Dye <- rep(0:1,len=48) data.frame(Fish,Brain,Treat,Dye) Now you can fit a linear model, correcting for probe-specific dye- bias, correcting for any differences between the fish, and computing Treatment vs Control effects for each brain part: design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) fit <- lmscFit(MA,design) fit <- eBayes(fit, trend=TRUE) Now you can extract genes that have a treatment effect. To find genes that have a treatment vs control effect in telencephalon, you would use: topTable(fit, coef="BrainT:TreatExp") For genes DE for treatment vs control in brain stem, it would be topTable(fit, coef="BrainBS:TreatExp") And so on. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote: Dear Gordon, Attached please find the revised design of my experiment. Thank you for your willingness to help me analyze these data. Best regards, Osee ________________________________________ From: Sanogo, Yibayiri O Sent: Monday, September 10, 2012 9:51 PM To: Gordon K Smyth; Osee Sanogo Cc: Bioconductor mailing list Subject: RE: Advice with RemoveBatchEffect and Rank Product package Dear Gordon, Please ignore the last two columns (weight and length) of the design for now: I think they are screwed up due to some kind of sorting. I will have to double check them (look in the notebook which is not here now). Only different brain regions (T, D, C, BS) belonging to the same fish should have same weight, length. The other columns are: Fish.Number= corresponds to fish ID, the individual fish (biological replicates) Slide= the Agilent 4X44k slides on each the arrays were located Brain.Part= the different brain regions that were dissected and hybridized (within region). They were T (telencephalon), D (diencephalon), C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control and "1" is experimental. FileName Cy3 Cy5 Fish Number Slide Brain Part 1T.gpr 1 -1 1 2 T 2T.gpr -1 1 2 1 T 3T.gpr 1 -1 3 4 T 4T.gpr -1 1 4 3 T 5T.gpr 1 -1 5 6 T 6T.gpr -1 1 6 5 T 1D.gpr -1 1 1 5 D 2D.gpr 1 -1 2 4 D 3D.gpr -1 1 3 1 D 4D.gpr 1 -1 4 6 D 5D.gpr -1 1 5 3 D 6D.gpr 1 -1 6 2 D 1C.gpr 1 -1 1 4 C 2C.gpr -1 1 2 3 C 3C.gpr 1 -1 3 6 C 4C.gpr -1 1 4 5 C 5C.gpr 1 -1 5 2 C 6C.gpr -1 1 6 1 C 1BS.gpr -1 1 1 1 BS 2BS.gpr 1 -1 2 2 BS 3BS.gpr -1 1 3 3 BS 4BS.gpr 1 -1 4 4 BS 6BS.gpr 1 -1 6 6 BS Thank you for your willingness to help. I will stay up a bit so that I can respond faster to your questions given the time difference between us. Thank you so much. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, September 10, 2012 6:34 PM To: Osee Sanogo Cc: Sanogo, Yibayiri O; Bioconductor mailing list Subject: Re: Advice with RemoveBatchEffect and Rank Product package Dear Osee, Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight and Length mean in your experiment? If the first six arrays are from biologically independent samples, why do they have the same weight and length? Best wishes Gordon On Mon, 10 Sep 2012, Osee Sanogo wrote: > Dear Gordon, > > From what you said, it seems that I am oversimplifying my experiment by > attempting to analyze it with RankProd, which doesn't offer the option for > complex modeling. > > Could you please explain to me how I could analyze the experiment using > Limma? > Please let me know if you'd like me to provide further details of the > experiment. > > Thank you so much. > > Osee > > > On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> Dear Osee, >> >> No, you can't use removeBatchEffect to control for dye bias. >> >> Can you ignore the dye effect? Not in general, but who knows? >> >> Your experiment seems too complex to be properly analysed using RankProd. >> For one thing, it seems clear that you have obtained multiple parts of the >> brain from the same biological replicates, meaning that your samples are >> paired by fish number. >> >> I could explain how to analyse this experiment using limma. However, if >> you are determined that you will use RankProd, it might be best to email >> the authors of that package for advice. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> http://www.statsci.org/smyth >> >> On Sun, 9 Sep 2012, Osee Sanogo wrote: >> >>> Dear Gordon, >>> >>> Thank you for getting back to me about my questions. >>> >>> My experiment is trying to identify differentially expressed genes in four >>> regions of the brain in response to a stressor. I have 6 biol. replicates in >>> each brain region for the control and experimental groups in each region, >>> and the comparison is being done within brain region (i.e., T control vs T >>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >>> run in two-color Agilent Array. >>> >>> You're right that the design I sent was from the separate channel analysis, >>> in which I attempted to account for array and dye effect, and then run the >>> data in RankProd. Now I know that this is not right. Ok, I will use the >>> single channel analysis in Limma. >>> >>> I still would like to run the two-channel data (ratios) in RankProd, as my >>> previous experience found this useful for my dara (low replicate numbers). >>> >>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >>> before running the two-channel data in RankProd? If yes, how should I do >>> this using the RemoveBatch Effect function? >>> 2) I found that about 3% of my probes have dye effect. Can I >>> omit controlling for dye effect without compromising the result of my >>> analysis? >>> >>> The data were loess/scale normalized into an expression set (Data_RP). >>> >>> Here is the design of the experiment >>> >>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >>> 1 1T.gpr 1 -1 1 2 T 39 0.63 >>> 2 2T.gpr -1 1 2 1 T 39 0.63 >>> 3 3T.gpr 1 -1 3 4 T 39 0.63 >>> 4 4T.gpr -1 1 4 3 T 39 0.63 >>> 5 5T.gpr 1 -1 5 6 T 39 0.63 >>> 6 6T.gpr -1 1 6 5 T NA NA >>> 7 1D.gpr -1 1 1 5 D 47 1.21 >>> 8 2D.gpr 1 -1 2 4 D 47 1.21 >>> 9 3D.gpr -1 1 3 1 D 47 1.21 >>> 10 4D.gpr 1 -1 4 6 D 47 1.21 >>> 11 5D.gpr -1 1 5 3 D 47 1.21 >>> 12 6D.gpr 1 -1 6 2 D NA NA >>> 13 1C.gpr 1 -1 1 4 C 47 1.31 >>> 14 2C.gpr -1 1 2 3 C 47 1.31 >>> 15 3C.gpr 1 -1 3 6 C 47 1.31 >>> 16 4C.gpr -1 1 4 5 C 47 1.31 >>> 17 5C.gpr 1 -1 5 2 C 47 1.31 >>> 18 6C.gpr -1 1 6 1 C NA NA >>> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >>> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >>> 22 4BS.gpr 1 -1 4 4 BS NA NA >>> 23 5BS.gpr -1 1 5 5 BS NA NA >>> 24 6BS.gpr 1 -1 6 6 BS NA NA >>> >>> Thank you for your help and please let me know if you need further >>> explanation of the experiment. >>> >>> Best regards, >>> >>> Osee >>> >>>> >>> >>> >>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>> >>>> Dear Osee, >>>> >>>> You are attempting to do a number of things that don't seem correct to me. >>>> >>>> First, you seem to attempting a separate channel analysis of two color >>>> microarray data, but ignoring the pairing of the red and green channels. >>>> It isn't correct to do this. I don't see any way to use RankProd, or any >>>> other package designed for independent samples, correctly in this context. >>>> If you must do a separate channel analysis, you would be better off using >>>> the separate channel analysis facilities of the limma package. >>>> >>>> Second, when you set batch=rep(1,24), you are defining a batch that >>>> consists of every array in your data set. Obviously it doesn't make sense >>>> to remove batch effects unless there are at least two batches. >>>> >>>> Third, I don't follow your design matrix. >>>> >>>> It would be better to go back to the start, and describe in more basic >>>> terms what is the nature of your data and what comparison you are trying >>>> to make. >>>> >>>> Best wishes >>>> Gordon >>>> >>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>>> >>>>> Dear Members of the list, >>>>> >>>>> (I apologize for posting this again -I sent it earlier to the list but >>>>> from another account and I was listed me as non-Member -and Member I am >>>>> since 2008:-)). >>>>> >>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>>> would like to remove the dye effect prior to analysis. I read on the forum >>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>>> modeling that is not in Rank Product. >>>>> >>>>> I have two questions: >>>>> >>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>>> effect prior to running the expression data using Rank Prod? >>>>> >>>>> 2) a) If no, what other function could I use to do this? >>>>> b) If yes, I would like a help with the correct design and how to >>>>> properly indicate the batch. >>>>> >>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>>> different areas of the brain): >>>>> >>>>> design1 >>>>> BS C D T >>>>> 1 0 0 0 1 >>>>> 2 0 0 0 -1 >>>>> 3 0 0 0 1 >>>>> 4 0 0 0 -1 >>>>> 5 0 0 0 1 >>>>> 6 0 0 0 -1 >>>>> 7 0 0 -1 0 >>>>> 8 0 0 1 0 >>>>> 9 0 0 -1 0 >>>>> 10 0 0 1 0 >>>>> 11 0 0 -1 0 >>>>> 12 0 0 1 0 >>>>> 13 0 1 0 0 >>>>> 14 0 -1 0 0 >>>>> 15 0 1 0 0 >>>>> 16 0 -1 0 0 >>>>> 17 0 1 0 0 >>>>> 18 0 -1 0 0 >>>>> 19 -1 0 0 0 >>>>> 20 1 0 0 0 >>>>> 21 -1 0 0 0 >>>>> 22 1 0 0 0 >>>>> 23 -1 0 0 0 >>>>> 24 1 0 0 0 >>>>> >>>>> attr(,"assign") >>>>> [1] 1 1 1 1 >>>>> >>>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>>> >>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>>> design=design1) >>>>> >>>>> but it is returning an error message >>>>> " Error in contr.sum(levels(batch)) : >>>>> not enough degrees of freedom to define contrasts" >>>>> >>>>> Please help me correct this code. >>>>> >>>>> Thank you so much for your help. >>>>> >>>>> Osee >>>>> >>>>> -- -- -- >>>>> Y. Osee Sanogo >>>>> Integrative Biology >>>>> Institute for Genomic Biology >>>>> University of Illinois at Urbana >>>>> 505 S. Goodwin Ave >>>>> Urbana, IL-61801 >>>>> >>>>> Tel: 217-333 2308 (Office) >>>>> 217-417 9593 (Cell) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY
0
Entering edit mode
Dear Gordon, Many, many thanks for your advice on how to properly analyze my data. I also want to thank you for the extra step you took to provide me with detailed step by step instructions on the analysis. I will use the single channel analysis. I like the fact that I can account for all the confounding factors in the analysis. I really appreciate your assistance and I will keep you informed about how it goes. Have a great day! Best, Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Tuesday, September 11, 2012 8:13 PM To: Sanogo, Yibayiri O Cc: Osee Sanogo; Bioconductor mailing list Subject: limma separate channel analysis of your data Dear Osee, Thank you for attaching the revised design of your experiment. I found it somewhat tricky to analyse, so my recommendations are more involved than I usually give. I think that you will need to perform a separate channel analysis, otherwise it will not be possible to make baseline comparisons between the brain parts. I also think that you may need to re-normalize your data using the limma package. You say that the data has been "loess/scale normalized into an expression set", but I don't think that scale normalization should be necessary for Agilent data, and an expression set is not fully suitable for two colour data. The following code assumes that you have normalized the data into an MAList object using the limma package. I will call it 'MA'. I strongly recommend that you use normexp background correction and loess normalization as described in the limma User's Guide for GenePix data. The first step is to put your targets data into separate channel format: targets <- read.delim("Design_Brain.txt") Treat <- as.vector(t(targets[,c("Cy3","Cy5")])) Treat <- ifelse(Treat>0,"Exp","Ctrl") Treat <- factor(Treat) Brain <- rep(targets$Brain.Part,each=2,len=48) Fish <- factor(rep(targets$Fish.Number,each=2,len=48)) Dye <- rep(0:1,len=48) data.frame(Fish,Brain,Treat,Dye) Now you can fit a linear model, correcting for probe-specific dye- bias, correcting for any differences between the fish, and computing Treatment vs Control effects for each brain part: design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) fit <- lmscFit(MA,design) fit <- eBayes(fit, trend=TRUE) Now you can extract genes that have a treatment effect. To find genes that have a treatment vs control effect in telencephalon, you would use: topTable(fit, coef="BrainT:TreatExp") For genes DE for treatment vs control in brain stem, it would be topTable(fit, coef="BrainBS:TreatExp") And so on. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote: Dear Gordon, Attached please find the revised design of my experiment. Thank you for your willingness to help me analyze these data. Best regards, Osee ________________________________________ From: Sanogo, Yibayiri O Sent: Monday, September 10, 2012 9:51 PM To: Gordon K Smyth; Osee Sanogo Cc: Bioconductor mailing list Subject: RE: Advice with RemoveBatchEffect and Rank Product package Dear Gordon, Please ignore the last two columns (weight and length) of the design for now: I think they are screwed up due to some kind of sorting. I will have to double check them (look in the notebook which is not here now). Only different brain regions (T, D, C, BS) belonging to the same fish should have same weight, length. The other columns are: Fish.Number= corresponds to fish ID, the individual fish (biological replicates) Slide= the Agilent 4X44k slides on each the arrays were located Brain.Part= the different brain regions that were dissected and hybridized (within region). They were T (telencephalon), D (diencephalon), C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control and "1" is experimental. FileName Cy3 Cy5 Fish Number Slide Brain Part 1T.gpr 1 -1 1 2 T 2T.gpr -1 1 2 1 T 3T.gpr 1 -1 3 4 T 4T.gpr -1 1 4 3 T 5T.gpr 1 -1 5 6 T 6T.gpr -1 1 6 5 T 1D.gpr -1 1 1 5 D 2D.gpr 1 -1 2 4 D 3D.gpr -1 1 3 1 D 4D.gpr 1 -1 4 6 D 5D.gpr -1 1 5 3 D 6D.gpr 1 -1 6 2 D 1C.gpr 1 -1 1 4 C 2C.gpr -1 1 2 3 C 3C.gpr 1 -1 3 6 C 4C.gpr -1 1 4 5 C 5C.gpr 1 -1 5 2 C 6C.gpr -1 1 6 1 C 1BS.gpr -1 1 1 1 BS 2BS.gpr 1 -1 2 2 BS 3BS.gpr -1 1 3 3 BS 4BS.gpr 1 -1 4 4 BS 6BS.gpr 1 -1 6 6 BS Thank you for your willingness to help. I will stay up a bit so that I can respond faster to your questions given the time difference between us. Thank you so much. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, September 10, 2012 6:34 PM To: Osee Sanogo Cc: Sanogo, Yibayiri O; Bioconductor mailing list Subject: Re: Advice with RemoveBatchEffect and Rank Product package Dear Osee, Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight and Length mean in your experiment? If the first six arrays are from biologically independent samples, why do they have the same weight and length? Best wishes Gordon On Mon, 10 Sep 2012, Osee Sanogo wrote: > Dear Gordon, > > From what you said, it seems that I am oversimplifying my experiment by > attempting to analyze it with RankProd, which doesn't offer the option for > complex modeling. > > Could you please explain to me how I could analyze the experiment using > Limma? > Please let me know if you'd like me to provide further details of the > experiment. > > Thank you so much. > > Osee > > > On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> Dear Osee, >> >> No, you can't use removeBatchEffect to control for dye bias. >> >> Can you ignore the dye effect? Not in general, but who knows? >> >> Your experiment seems too complex to be properly analysed using RankProd. >> For one thing, it seems clear that you have obtained multiple parts of the >> brain from the same biological replicates, meaning that your samples are >> paired by fish number. >> >> I could explain how to analyse this experiment using limma. However, if >> you are determined that you will use RankProd, it might be best to email >> the authors of that package for advice. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> http://www.statsci.org/smyth >> >> On Sun, 9 Sep 2012, Osee Sanogo wrote: >> >>> Dear Gordon, >>> >>> Thank you for getting back to me about my questions. >>> >>> My experiment is trying to identify differentially expressed genes in four >>> regions of the brain in response to a stressor. I have 6 biol. replicates in >>> each brain region for the control and experimental groups in each region, >>> and the comparison is being done within brain region (i.e., T control vs T >>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >>> run in two-color Agilent Array. >>> >>> You're right that the design I sent was from the separate channel analysis, >>> in which I attempted to account for array and dye effect, and then run the >>> data in RankProd. Now I know that this is not right. Ok, I will use the >>> single channel analysis in Limma. >>> >>> I still would like to run the two-channel data (ratios) in RankProd, as my >>> previous experience found this useful for my dara (low replicate numbers). >>> >>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >>> before running the two-channel data in RankProd? If yes, how should I do >>> this using the RemoveBatch Effect function? >>> 2) I found that about 3% of my probes have dye effect. Can I >>> omit controlling for dye effect without compromising the result of my >>> analysis? >>> >>> The data were loess/scale normalized into an expression set (Data_RP). >>> >>> Here is the design of the experiment >>> >>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >>> 1 1T.gpr 1 -1 1 2 T 39 0.63 >>> 2 2T.gpr -1 1 2 1 T 39 0.63 >>> 3 3T.gpr 1 -1 3 4 T 39 0.63 >>> 4 4T.gpr -1 1 4 3 T 39 0.63 >>> 5 5T.gpr 1 -1 5 6 T 39 0.63 >>> 6 6T.gpr -1 1 6 5 T NA NA >>> 7 1D.gpr -1 1 1 5 D 47 1.21 >>> 8 2D.gpr 1 -1 2 4 D 47 1.21 >>> 9 3D.gpr -1 1 3 1 D 47 1.21 >>> 10 4D.gpr 1 -1 4 6 D 47 1.21 >>> 11 5D.gpr -1 1 5 3 D 47 1.21 >>> 12 6D.gpr 1 -1 6 2 D NA NA >>> 13 1C.gpr 1 -1 1 4 C 47 1.31 >>> 14 2C.gpr -1 1 2 3 C 47 1.31 >>> 15 3C.gpr 1 -1 3 6 C 47 1.31 >>> 16 4C.gpr -1 1 4 5 C 47 1.31 >>> 17 5C.gpr 1 -1 5 2 C 47 1.31 >>> 18 6C.gpr -1 1 6 1 C NA NA >>> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >>> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >>> 22 4BS.gpr 1 -1 4 4 BS NA NA >>> 23 5BS.gpr -1 1 5 5 BS NA NA >>> 24 6BS.gpr 1 -1 6 6 BS NA NA >>> >>> Thank you for your help and please let me know if you need further >>> explanation of the experiment. >>> >>> Best regards, >>> >>> Osee >>> >>>> >>> >>> >>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>> >>>> Dear Osee, >>>> >>>> You are attempting to do a number of things that don't seem correct to me. >>>> >>>> First, you seem to attempting a separate channel analysis of two color >>>> microarray data, but ignoring the pairing of the red and green channels. >>>> It isn't correct to do this. I don't see any way to use RankProd, or any >>>> other package designed for independent samples, correctly in this context. >>>> If you must do a separate channel analysis, you would be better off using >>>> the separate channel analysis facilities of the limma package. >>>> >>>> Second, when you set batch=rep(1,24), you are defining a batch that >>>> consists of every array in your data set. Obviously it doesn't make sense >>>> to remove batch effects unless there are at least two batches. >>>> >>>> Third, I don't follow your design matrix. >>>> >>>> It would be better to go back to the start, and describe in more basic >>>> terms what is the nature of your data and what comparison you are trying >>>> to make. >>>> >>>> Best wishes >>>> Gordon >>>> >>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>>> >>>>> Dear Members of the list, >>>>> >>>>> (I apologize for posting this again -I sent it earlier to the list but >>>>> from another account and I was listed me as non-Member -and Member I am >>>>> since 2008:-)). >>>>> >>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>>> would like to remove the dye effect prior to analysis. I read on the forum >>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>>> modeling that is not in Rank Product. >>>>> >>>>> I have two questions: >>>>> >>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>>> effect prior to running the expression data using Rank Prod? >>>>> >>>>> 2) a) If no, what other function could I use to do this? >>>>> b) If yes, I would like a help with the correct design and how to >>>>> properly indicate the batch. >>>>> >>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>>> different areas of the brain): >>>>> >>>>> design1 >>>>> BS C D T >>>>> 1 0 0 0 1 >>>>> 2 0 0 0 -1 >>>>> 3 0 0 0 1 >>>>> 4 0 0 0 -1 >>>>> 5 0 0 0 1 >>>>> 6 0 0 0 -1 >>>>> 7 0 0 -1 0 >>>>> 8 0 0 1 0 >>>>> 9 0 0 -1 0 >>>>> 10 0 0 1 0 >>>>> 11 0 0 -1 0 >>>>> 12 0 0 1 0 >>>>> 13 0 1 0 0 >>>>> 14 0 -1 0 0 >>>>> 15 0 1 0 0 >>>>> 16 0 -1 0 0 >>>>> 17 0 1 0 0 >>>>> 18 0 -1 0 0 >>>>> 19 -1 0 0 0 >>>>> 20 1 0 0 0 >>>>> 21 -1 0 0 0 >>>>> 22 1 0 0 0 >>>>> 23 -1 0 0 0 >>>>> 24 1 0 0 0 >>>>> >>>>> attr(,"assign") >>>>> [1] 1 1 1 1 >>>>> >>>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>>> >>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>>> design=design1) >>>>> >>>>> but it is returning an error message >>>>> " Error in contr.sum(levels(batch)) : >>>>> not enough degrees of freedom to define contrasts" >>>>> >>>>> Please help me correct this code. >>>>> >>>>> Thank you so much for your help. >>>>> >>>>> Osee >>>>> >>>>> -- -- -- >>>>> Y. Osee Sanogo >>>>> Integrative Biology >>>>> Institute for Genomic Biology >>>>> University of Illinois at Urbana >>>>> 505 S. Goodwin Ave >>>>> Urbana, IL-61801 >>>>> >>>>> Tel: 217-333 2308 (Office) >>>>> 217-417 9593 (Cell) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD REPLY
0
Entering edit mode
Dear Gordon, In the code below, design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) fit <- lmscFit(MA,design) fit <- eBayes(fit, trend=TRUE) shouldn't we include the intracorrelation coefficient in the single channel model? Is the following code correct? corfit <- intraspotCorrelation(MA, design) fit <- lmscFit(MA,design, correlation=corfit$consensus) Please let me know if this is the right way of doing it. Thank you. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Tuesday, September 11, 2012 8:13 PM To: Sanogo, Yibayiri O Cc: Osee Sanogo; Bioconductor mailing list Subject: limma separate channel analysis of your data Dear Osee, Thank you for attaching the revised design of your experiment. I found it somewhat tricky to analyse, so my recommendations are more involved than I usually give. I think that you will need to perform a separate channel analysis, otherwise it will not be possible to make baseline comparisons between the brain parts. I also think that you may need to re-normalize your data using the limma package. You say that the data has been "loess/scale normalized into an expression set", but I don't think that scale normalization should be necessary for Agilent data, and an expression set is not fully suitable for two colour data. The following code assumes that you have normalized the data into an MAList object using the limma package. I will call it 'MA'. I strongly recommend that you use normexp background correction and loess normalization as described in the limma User's Guide for GenePix data. The first step is to put your targets data into separate channel format: targets <- read.delim("Design_Brain.txt") Treat <- as.vector(t(targets[,c("Cy3","Cy5")])) Treat <- ifelse(Treat>0,"Exp","Ctrl") Treat <- factor(Treat) Brain <- rep(targets$Brain.Part,each=2,len=48) Fish <- factor(rep(targets$Fish.Number,each=2,len=48)) Dye <- rep(0:1,len=48) data.frame(Fish,Brain,Treat,Dye) Now you can fit a linear model, correcting for probe-specific dye- bias, correcting for any differences between the fish, and computing Treatment vs Control effects for each brain part: design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) fit <- lmscFit(MA,design) fit <- eBayes(fit, trend=TRUE) Now you can extract genes that have a treatment effect. To find genes that have a treatment vs control effect in telencephalon, you would use: topTable(fit, coef="BrainT:TreatExp") For genes DE for treatment vs control in brain stem, it would be topTable(fit, coef="BrainBS:TreatExp") And so on. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote: Dear Gordon, Attached please find the revised design of my experiment. Thank you for your willingness to help me analyze these data. Best regards, Osee ________________________________________ From: Sanogo, Yibayiri O Sent: Monday, September 10, 2012 9:51 PM To: Gordon K Smyth; Osee Sanogo Cc: Bioconductor mailing list Subject: RE: Advice with RemoveBatchEffect and Rank Product package Dear Gordon, Please ignore the last two columns (weight and length) of the design for now: I think they are screwed up due to some kind of sorting. I will have to double check them (look in the notebook which is not here now). Only different brain regions (T, D, C, BS) belonging to the same fish should have same weight, length. The other columns are: Fish.Number= corresponds to fish ID, the individual fish (biological replicates) Slide= the Agilent 4X44k slides on each the arrays were located Brain.Part= the different brain regions that were dissected and hybridized (within region). They were T (telencephalon), D (diencephalon), C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control and "1" is experimental. FileName Cy3 Cy5 Fish Number Slide Brain Part 1T.gpr 1 -1 1 2 T 2T.gpr -1 1 2 1 T 3T.gpr 1 -1 3 4 T 4T.gpr -1 1 4 3 T 5T.gpr 1 -1 5 6 T 6T.gpr -1 1 6 5 T 1D.gpr -1 1 1 5 D 2D.gpr 1 -1 2 4 D 3D.gpr -1 1 3 1 D 4D.gpr 1 -1 4 6 D 5D.gpr -1 1 5 3 D 6D.gpr 1 -1 6 2 D 1C.gpr 1 -1 1 4 C 2C.gpr -1 1 2 3 C 3C.gpr 1 -1 3 6 C 4C.gpr -1 1 4 5 C 5C.gpr 1 -1 5 2 C 6C.gpr -1 1 6 1 C 1BS.gpr -1 1 1 1 BS 2BS.gpr 1 -1 2 2 BS 3BS.gpr -1 1 3 3 BS 4BS.gpr 1 -1 4 4 BS 6BS.gpr 1 -1 6 6 BS Thank you for your willingness to help. I will stay up a bit so that I can respond faster to your questions given the time difference between us. Thank you so much. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, September 10, 2012 6:34 PM To: Osee Sanogo Cc: Sanogo, Yibayiri O; Bioconductor mailing list Subject: Re: Advice with RemoveBatchEffect and Rank Product package Dear Osee, Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight and Length mean in your experiment? If the first six arrays are from biologically independent samples, why do they have the same weight and length? Best wishes Gordon On Mon, 10 Sep 2012, Osee Sanogo wrote: > Dear Gordon, > > From what you said, it seems that I am oversimplifying my experiment by > attempting to analyze it with RankProd, which doesn't offer the option for > complex modeling. > > Could you please explain to me how I could analyze the experiment using > Limma? > Please let me know if you'd like me to provide further details of the > experiment. > > Thank you so much. > > Osee > > > On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth@wehi.edu.au> wrote: > >> Dear Osee, >> >> No, you can't use removeBatchEffect to control for dye bias. >> >> Can you ignore the dye effect? Not in general, but who knows? >> >> Your experiment seems too complex to be properly analysed using RankProd. >> For one thing, it seems clear that you have obtained multiple parts of the >> brain from the same biological replicates, meaning that your samples are >> paired by fish number. >> >> I could explain how to analyse this experiment using limma. However, if >> you are determined that you will use RankProd, it might be best to email >> the authors of that package for advice. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> http://www.statsci.org/smyth >> >> On Sun, 9 Sep 2012, Osee Sanogo wrote: >> >>> Dear Gordon, >>> >>> Thank you for getting back to me about my questions. >>> >>> My experiment is trying to identify differentially expressed genes in four >>> regions of the brain in response to a stressor. I have 6 biol. replicates in >>> each brain region for the control and experimental groups in each region, >>> and the comparison is being done within brain region (i.e., T control vs T >>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >>> run in two-color Agilent Array. >>> >>> You're right that the design I sent was from the separate channel analysis, >>> in which I attempted to account for array and dye effect, and then run the >>> data in RankProd. Now I know that this is not right. Ok, I will use the >>> single channel analysis in Limma. >>> >>> I still would like to run the two-channel data (ratios) in RankProd, as my >>> previous experience found this useful for my dara (low replicate numbers). >>> >>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >>> before running the two-channel data in RankProd? If yes, how should I do >>> this using the RemoveBatch Effect function? >>> 2) I found that about 3% of my probes have dye effect. Can I >>> omit controlling for dye effect without compromising the result of my >>> analysis? >>> >>> The data were loess/scale normalized into an expression set (Data_RP). >>> >>> Here is the design of the experiment >>> >>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >>> 1 1T.gpr 1 -1 1 2 T 39 0.63 >>> 2 2T.gpr -1 1 2 1 T 39 0.63 >>> 3 3T.gpr 1 -1 3 4 T 39 0.63 >>> 4 4T.gpr -1 1 4 3 T 39 0.63 >>> 5 5T.gpr 1 -1 5 6 T 39 0.63 >>> 6 6T.gpr -1 1 6 5 T NA NA >>> 7 1D.gpr -1 1 1 5 D 47 1.21 >>> 8 2D.gpr 1 -1 2 4 D 47 1.21 >>> 9 3D.gpr -1 1 3 1 D 47 1.21 >>> 10 4D.gpr 1 -1 4 6 D 47 1.21 >>> 11 5D.gpr -1 1 5 3 D 47 1.21 >>> 12 6D.gpr 1 -1 6 2 D NA NA >>> 13 1C.gpr 1 -1 1 4 C 47 1.31 >>> 14 2C.gpr -1 1 2 3 C 47 1.31 >>> 15 3C.gpr 1 -1 3 6 C 47 1.31 >>> 16 4C.gpr -1 1 4 5 C 47 1.31 >>> 17 5C.gpr 1 -1 5 2 C 47 1.31 >>> 18 6C.gpr -1 1 6 1 C NA NA >>> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >>> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >>> 22 4BS.gpr 1 -1 4 4 BS NA NA >>> 23 5BS.gpr -1 1 5 5 BS NA NA >>> 24 6BS.gpr 1 -1 6 6 BS NA NA >>> >>> Thank you for your help and please let me know if you need further >>> explanation of the experiment. >>> >>> Best regards, >>> >>> Osee >>> >>>> >>> >>> >>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth@wehi.edu.au> wrote: >>> >>>> Dear Osee, >>>> >>>> You are attempting to do a number of things that don't seem correct to me. >>>> >>>> First, you seem to attempting a separate channel analysis of two color >>>> microarray data, but ignoring the pairing of the red and green channels. >>>> It isn't correct to do this. I don't see any way to use RankProd, or any >>>> other package designed for independent samples, correctly in this context. >>>> If you must do a separate channel analysis, you would be better off using >>>> the separate channel analysis facilities of the limma package. >>>> >>>> Second, when you set batch=rep(1,24), you are defining a batch that >>>> consists of every array in your data set. Obviously it doesn't make sense >>>> to remove batch effects unless there are at least two batches. >>>> >>>> Third, I don't follow your design matrix. >>>> >>>> It would be better to go back to the start, and describe in more basic >>>> terms what is the nature of your data and what comparison you are trying >>>> to make. >>>> >>>> Best wishes >>>> Gordon >>>> >>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>>> From: "Sanogo, Yibayiri O" <sanogo@illinois.edu> >>>>> To: "bioconductor@r-project.org" <bioconductor@r-project.org> >>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>>> >>>>> Dear Members of the list, >>>>> >>>>> (I apologize for posting this again -I sent it earlier to the list but >>>>> from another account and I was listed me as non-Member -and Member I am >>>>> since 2008:-)). >>>>> >>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>>> would like to remove the dye effect prior to analysis. I read on the forum >>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>>> modeling that is not in Rank Product. >>>>> >>>>> I have two questions: >>>>> >>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>>> effect prior to running the expression data using Rank Prod? >>>>> >>>>> 2) a) If no, what other function could I use to do this? >>>>> b) If yes, I would like a help with the correct design and how to >>>>> properly indicate the batch. >>>>> >>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>>> different areas of the brain): >>>>> >>>>> design1 >>>>> BS C D T >>>>> 1 0 0 0 1 >>>>> 2 0 0 0 -1 >>>>> 3 0 0 0 1 >>>>> 4 0 0 0 -1 >>>>> 5 0 0 0 1 >>>>> 6 0 0 0 -1 >>>>> 7 0 0 -1 0 >>>>> 8 0 0 1 0 >>>>> 9 0 0 -1 0 >>>>> 10 0 0 1 0 >>>>> 11 0 0 -1 0 >>>>> 12 0 0 1 0 >>>>> 13 0 1 0 0 >>>>> 14 0 -1 0 0 >>>>> 15 0 1 0 0 >>>>> 16 0 -1 0 0 >>>>> 17 0 1 0 0 >>>>> 18 0 -1 0 0 >>>>> 19 -1 0 0 0 >>>>> 20 1 0 0 0 >>>>> 21 -1 0 0 0 >>>>> 22 1 0 0 0 >>>>> 23 -1 0 0 0 >>>>> 24 1 0 0 0 >>>>> >>>>> attr(,"assign") >>>>> [1] 1 1 1 1 >>>>> >>>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>>> >>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>>> design=design1) >>>>> >>>>> but it is returning an error message >>>>> " Error in contr.sum(levels(batch)) : >>>>> not enough degrees of freedom to define contrasts" >>>>> >>>>> Please help me correct this code. >>>>> >>>>> Thank you so much for your help. >>>>> >>>>> Osee >>>>> >>>>> -- -- -- >>>>> Y. Osee Sanogo >>>>> Integrative Biology >>>>> Institute for Genomic Biology >>>>> University of Illinois at Urbana >>>>> 505 S. Goodwin Ave >>>>> Urbana, IL-61801 >>>>> >>>>> Tel: 217-333 2308 (Office) >>>>> 217-417 9593 (Cell) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:9}}
ADD REPLY
0
Entering edit mode
Dear Osee, Yes, your code is correct. I forgot to add the intraspotCorrelation estimation. Best wishes Gordon On Thu, 13 Sep 2012, Sanogo, Yibayiri O wrote: > Dear Gordon, > > In the code below, > > design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) > fit <- lmscFit(MA,design) > fit <- eBayes(fit, trend=TRUE) > > shouldn't we include the intracorrelation coefficient in the single channel model? > Is the following code correct? > > corfit <- intraspotCorrelation(MA, design) > fit <- lmscFit(MA,design, correlation=corfit$consensus) > > Please let me know if this is the right way of doing it. > > Thank you. > > Osee > > > ________________________________________ > From: Gordon K Smyth [smyth at wehi.EDU.AU] > Sent: Tuesday, September 11, 2012 8:13 PM > To: Sanogo, Yibayiri O > Cc: Osee Sanogo; Bioconductor mailing list > Subject: limma separate channel analysis of your data > > Dear Osee, > > Thank you for attaching the revised design of your experiment. I found it > somewhat tricky to analyse, so my recommendations are more involved than I > usually give. I think that you will need to perform a separate channel > analysis, otherwise it will not be possible to make baseline comparisons > between the brain parts. > > I also think that you may need to re-normalize your data using the limma > package. You say that the data has been "loess/scale normalized into an > expression set", but I don't think that scale normalization should be > necessary for Agilent data, and an expression set is not fully suitable > for two colour data. The following code assumes that you have normalized > the data into an MAList object using the limma package. I will call it > 'MA'. I strongly recommend that you use normexp background correction and > loess normalization as described in the limma User's Guide for GenePix > data. > > The first step is to put your targets data into separate channel format: > > targets <- read.delim("Design_Brain.txt") > Treat <- as.vector(t(targets[,c("Cy3","Cy5")])) > Treat <- ifelse(Treat>0,"Exp","Ctrl") > Treat <- factor(Treat) > Brain <- rep(targets$Brain.Part,each=2,len=48) > Fish <- factor(rep(targets$Fish.Number,each=2,len=48)) > Dye <- rep(0:1,len=48) > data.frame(Fish,Brain,Treat,Dye) > > Now you can fit a linear model, correcting for probe-specific dye- bias, > correcting for any differences between the fish, and computing Treatment > vs Control effects for each brain part: > > design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) > fit <- lmscFit(MA,design) > fit <- eBayes(fit, trend=TRUE) > > Now you can extract genes that have a treatment effect. To find genes > that have a treatment vs control effect in telencephalon, you would use: > > topTable(fit, coef="BrainT:TreatExp") > > For genes DE for treatment vs control in brain stem, it would be > > topTable(fit, coef="BrainBS:TreatExp") > > And so on. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > http://www.statsci.org/smyth > > > On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote: > > Dear Gordon, > > Attached please find the revised design of my experiment. > > Thank you for your willingness to help me analyze these data. > > Best regards, > > Osee > ________________________________________ > From: Sanogo, Yibayiri O > Sent: Monday, September 10, 2012 9:51 PM > To: Gordon K Smyth; Osee Sanogo > Cc: Bioconductor mailing list > Subject: RE: Advice with RemoveBatchEffect and Rank Product package > > Dear Gordon, > > Please ignore the last two columns (weight and length) of the design for > now: I think they are screwed up due to some kind of sorting. I will have > to double check them (look in the notebook which is not here now). Only > different brain regions (T, D, C, BS) belonging to the same fish should > have same weight, length. > > The other columns are: > > Fish.Number= corresponds to fish ID, the individual fish (biological > replicates) Slide= the Agilent 4X44k slides on each the arrays were > located Brain.Part= the different brain regions that were dissected and > hybridized (within region). They were T (telencephalon), D (diencephalon), > C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control > and "1" is experimental. > > FileName Cy3 Cy5 Fish Number Slide Brain Part > 1T.gpr 1 -1 1 2 T > 2T.gpr -1 1 2 1 T > 3T.gpr 1 -1 3 4 T > 4T.gpr -1 1 4 3 T > 5T.gpr 1 -1 5 6 T > 6T.gpr -1 1 6 5 T > 1D.gpr -1 1 1 5 D > 2D.gpr 1 -1 2 4 D > 3D.gpr -1 1 3 1 D > 4D.gpr 1 -1 4 6 D > 5D.gpr -1 1 5 3 D > 6D.gpr 1 -1 6 2 D > 1C.gpr 1 -1 1 4 C > 2C.gpr -1 1 2 3 C > 3C.gpr 1 -1 3 6 C > 4C.gpr -1 1 4 5 C > 5C.gpr 1 -1 5 2 C > 6C.gpr -1 1 6 1 C > 1BS.gpr -1 1 1 1 BS > 2BS.gpr 1 -1 2 2 BS > 3BS.gpr -1 1 3 3 BS > 4BS.gpr 1 -1 4 4 BS > 6BS.gpr 1 -1 6 6 BS > > Thank you for your willingness to help. I will stay up a bit so that I can > respond faster to your questions given the time difference between us. > > Thank you so much. > > Osee > > ________________________________________ > From: Gordon K Smyth [smyth at wehi.EDU.AU] > Sent: Monday, September 10, 2012 6:34 PM > To: Osee Sanogo > Cc: Sanogo, Yibayiri O; Bioconductor mailing list > Subject: Re: Advice with RemoveBatchEffect and Rank Product package > > Dear Osee, > > Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight > and Length mean in your experiment? If the first six arrays are from > biologically independent samples, why do they have the same weight and > length? > > Best wishes > Gordon > > > On Mon, 10 Sep 2012, Osee Sanogo wrote: > >> Dear Gordon, >> >> From what you said, it seems that I am oversimplifying my experiment by >> attempting to analyze it with RankProd, which doesn't offer the option for >> complex modeling. >> >> Could you please explain to me how I could analyze the experiment using >> Limma? >> Please let me know if you'd like me to provide further details of the >> experiment. >> >> Thank you so much. >> >> Osee >> >> >> On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >> >>> Dear Osee, >>> >>> No, you can't use removeBatchEffect to control for dye bias. >>> >>> Can you ignore the dye effect? Not in general, but who knows? >>> >>> Your experiment seems too complex to be properly analysed using RankProd. >>> For one thing, it seems clear that you have obtained multiple parts of the >>> brain from the same biological replicates, meaning that your samples are >>> paired by fish number. >>> >>> I could explain how to analyse this experiment using limma. However, if >>> you are determined that you will use RankProd, it might be best to email >>> the authors of that package for advice. >>> >>> Best wishes >>> Gordon >>> >>> --------------------------------------------- >>> Professor Gordon K Smyth, >>> Bioinformatics Division, >>> Walter and Eliza Hall Institute of Medical Research, >>> 1G Royal Parade, Parkville, Vic 3052, Australia. >>> http://www.statsci.org/smyth >>> >>> On Sun, 9 Sep 2012, Osee Sanogo wrote: >>> >>>> Dear Gordon, >>>> >>>> Thank you for getting back to me about my questions. >>>> >>>> My experiment is trying to identify differentially expressed genes in four >>>> regions of the brain in response to a stressor. I have 6 biol. replicates in >>>> each brain region for the control and experimental groups in each region, >>>> and the comparison is being done within brain region (i.e., T control vs T >>>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >>>> run in two-color Agilent Array. >>>> >>>> You're right that the design I sent was from the separate channel analysis, >>>> in which I attempted to account for array and dye effect, and then run the >>>> data in RankProd. Now I know that this is not right. Ok, I will use the >>>> single channel analysis in Limma. >>>> >>>> I still would like to run the two-channel data (ratios) in RankProd, as my >>>> previous experience found this useful for my dara (low replicate numbers). >>>> >>>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >>>> before running the two-channel data in RankProd? If yes, how should I do >>>> this using the RemoveBatch Effect function? >>>> 2) I found that about 3% of my probes have dye effect. Can I >>>> omit controlling for dye effect without compromising the result of my >>>> analysis? >>>> >>>> The data were loess/scale normalized into an expression set (Data_RP). >>>> >>>> Here is the design of the experiment >>>> >>>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >>>> 1 1T.gpr 1 -1 1 2 T 39 0.63 >>>> 2 2T.gpr -1 1 2 1 T 39 0.63 >>>> 3 3T.gpr 1 -1 3 4 T 39 0.63 >>>> 4 4T.gpr -1 1 4 3 T 39 0.63 >>>> 5 5T.gpr 1 -1 5 6 T 39 0.63 >>>> 6 6T.gpr -1 1 6 5 T NA NA >>>> 7 1D.gpr -1 1 1 5 D 47 1.21 >>>> 8 2D.gpr 1 -1 2 4 D 47 1.21 >>>> 9 3D.gpr -1 1 3 1 D 47 1.21 >>>> 10 4D.gpr 1 -1 4 6 D 47 1.21 >>>> 11 5D.gpr -1 1 5 3 D 47 1.21 >>>> 12 6D.gpr 1 -1 6 2 D NA NA >>>> 13 1C.gpr 1 -1 1 4 C 47 1.31 >>>> 14 2C.gpr -1 1 2 3 C 47 1.31 >>>> 15 3C.gpr 1 -1 3 6 C 47 1.31 >>>> 16 4C.gpr -1 1 4 5 C 47 1.31 >>>> 17 5C.gpr 1 -1 5 2 C 47 1.31 >>>> 18 6C.gpr -1 1 6 1 C NA NA >>>> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >>>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >>>> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >>>> 22 4BS.gpr 1 -1 4 4 BS NA NA >>>> 23 5BS.gpr -1 1 5 5 BS NA NA >>>> 24 6BS.gpr 1 -1 6 6 BS NA NA >>>> >>>> Thank you for your help and please let me know if you need further >>>> explanation of the experiment. >>>> >>>> Best regards, >>>> >>>> Osee >>>> >>>>> >>>> >>>> >>>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>>> >>>>> Dear Osee, >>>>> >>>>> You are attempting to do a number of things that don't seem correct to me. >>>>> >>>>> First, you seem to attempting a separate channel analysis of two color >>>>> microarray data, but ignoring the pairing of the red and green channels. >>>>> It isn't correct to do this. I don't see any way to use RankProd, or any >>>>> other package designed for independent samples, correctly in this context. >>>>> If you must do a separate channel analysis, you would be better off using >>>>> the separate channel analysis facilities of the limma package. >>>>> >>>>> Second, when you set batch=rep(1,24), you are defining a batch that >>>>> consists of every array in your data set. Obviously it doesn't make sense >>>>> to remove batch effects unless there are at least two batches. >>>>> >>>>> Third, I don't follow your design matrix. >>>>> >>>>> It would be better to go back to the start, and describe in more basic >>>>> terms what is the nature of your data and what comparison you are trying >>>>> to make. >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>>>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>>>> >>>>>> Dear Members of the list, >>>>>> >>>>>> (I apologize for posting this again -I sent it earlier to the list but >>>>>> from another account and I was listed me as non-Member -and Member I am >>>>>> since 2008:-)). >>>>>> >>>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>>>> would like to remove the dye effect prior to analysis. I read on the forum >>>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>>>> modeling that is not in Rank Product. >>>>>> >>>>>> I have two questions: >>>>>> >>>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>>>> effect prior to running the expression data using Rank Prod? >>>>>> >>>>>> 2) a) If no, what other function could I use to do this? >>>>>> b) If yes, I would like a help with the correct design and how to >>>>>> properly indicate the batch. >>>>>> >>>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>>>> different areas of the brain): >>>>>> >>>>>> design1 >>>>>> BS C D T >>>>>> 1 0 0 0 1 >>>>>> 2 0 0 0 -1 >>>>>> 3 0 0 0 1 >>>>>> 4 0 0 0 -1 >>>>>> 5 0 0 0 1 >>>>>> 6 0 0 0 -1 >>>>>> 7 0 0 -1 0 >>>>>> 8 0 0 1 0 >>>>>> 9 0 0 -1 0 >>>>>> 10 0 0 1 0 >>>>>> 11 0 0 -1 0 >>>>>> 12 0 0 1 0 >>>>>> 13 0 1 0 0 >>>>>> 14 0 -1 0 0 >>>>>> 15 0 1 0 0 >>>>>> 16 0 -1 0 0 >>>>>> 17 0 1 0 0 >>>>>> 18 0 -1 0 0 >>>>>> 19 -1 0 0 0 >>>>>> 20 1 0 0 0 >>>>>> 21 -1 0 0 0 >>>>>> 22 1 0 0 0 >>>>>> 23 -1 0 0 0 >>>>>> 24 1 0 0 0 >>>>>> >>>>>> attr(,"assign") >>>>>> [1] 1 1 1 1 >>>>>> >>>>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>>>> >>>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>>>> design=design1) >>>>>> >>>>>> but it is returning an error message >>>>>> " Error in contr.sum(levels(batch)) : >>>>>> not enough degrees of freedom to define contrasts" >>>>>> >>>>>> Please help me correct this code. >>>>>> >>>>>> Thank you so much for your help. >>>>>> >>>>>> Osee >>>>>> >>>>>> -- -- -- >>>>>> Y. Osee Sanogo >>>>>> Integrative Biology >>>>>> Institute for Genomic Biology >>>>>> University of Illinois at Urbana >>>>>> 505 S. Goodwin Ave >>>>>> Urbana, IL-61801 >>>>>> >>>>>> Tel: 217-333 2308 (Office) >>>>>> 217-417 9593 (Cell) > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
Dear Gordon, Again, thank you so much for your assistance. This will get me going. Best regards, Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Thursday, September 13, 2012 5:57 PM To: Sanogo, Yibayiri O Cc: Osee Sanogo; Bioconductor mailing list Subject: RE: limma separate channel analysis of your data Dear Osee, Yes, your code is correct. I forgot to add the intraspotCorrelation estimation. Best wishes Gordon On Thu, 13 Sep 2012, Sanogo, Yibayiri O wrote: > Dear Gordon, > > In the code below, > > design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) > fit <- lmscFit(MA,design) > fit <- eBayes(fit, trend=TRUE) > > shouldn't we include the intracorrelation coefficient in the single channel model? > Is the following code correct? > > corfit <- intraspotCorrelation(MA, design) > fit <- lmscFit(MA,design, correlation=corfit$consensus) > > Please let me know if this is the right way of doing it. > > Thank you. > > Osee > > > ________________________________________ > From: Gordon K Smyth [smyth at wehi.EDU.AU] > Sent: Tuesday, September 11, 2012 8:13 PM > To: Sanogo, Yibayiri O > Cc: Osee Sanogo; Bioconductor mailing list > Subject: limma separate channel analysis of your data > > Dear Osee, > > Thank you for attaching the revised design of your experiment. I found it > somewhat tricky to analyse, so my recommendations are more involved than I > usually give. I think that you will need to perform a separate channel > analysis, otherwise it will not be possible to make baseline comparisons > between the brain parts. > > I also think that you may need to re-normalize your data using the limma > package. You say that the data has been "loess/scale normalized into an > expression set", but I don't think that scale normalization should be > necessary for Agilent data, and an expression set is not fully suitable > for two colour data. The following code assumes that you have normalized > the data into an MAList object using the limma package. I will call it > 'MA'. I strongly recommend that you use normexp background correction and > loess normalization as described in the limma User's Guide for GenePix > data. > > The first step is to put your targets data into separate channel format: > > targets <- read.delim("Design_Brain.txt") > Treat <- as.vector(t(targets[,c("Cy3","Cy5")])) > Treat <- ifelse(Treat>0,"Exp","Ctrl") > Treat <- factor(Treat) > Brain <- rep(targets$Brain.Part,each=2,len=48) > Fish <- factor(rep(targets$Fish.Number,each=2,len=48)) > Dye <- rep(0:1,len=48) > data.frame(Fish,Brain,Treat,Dye) > > Now you can fit a linear model, correcting for probe-specific dye- bias, > correcting for any differences between the fish, and computing Treatment > vs Control effects for each brain part: > > design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) > fit <- lmscFit(MA,design) > fit <- eBayes(fit, trend=TRUE) > > Now you can extract genes that have a treatment effect. To find genes > that have a treatment vs control effect in telencephalon, you would use: > > topTable(fit, coef="BrainT:TreatExp") > > For genes DE for treatment vs control in brain stem, it would be > > topTable(fit, coef="BrainBS:TreatExp") > > And so on. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > http://www.statsci.org/smyth > > > On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote: > > Dear Gordon, > > Attached please find the revised design of my experiment. > > Thank you for your willingness to help me analyze these data. > > Best regards, > > Osee > ________________________________________ > From: Sanogo, Yibayiri O > Sent: Monday, September 10, 2012 9:51 PM > To: Gordon K Smyth; Osee Sanogo > Cc: Bioconductor mailing list > Subject: RE: Advice with RemoveBatchEffect and Rank Product package > > Dear Gordon, > > Please ignore the last two columns (weight and length) of the design for > now: I think they are screwed up due to some kind of sorting. I will have > to double check them (look in the notebook which is not here now). Only > different brain regions (T, D, C, BS) belonging to the same fish should > have same weight, length. > > The other columns are: > > Fish.Number= corresponds to fish ID, the individual fish (biological > replicates) Slide= the Agilent 4X44k slides on each the arrays were > located Brain.Part= the different brain regions that were dissected and > hybridized (within region). They were T (telencephalon), D (diencephalon), > C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control > and "1" is experimental. > > FileName Cy3 Cy5 Fish Number Slide Brain Part > 1T.gpr 1 -1 1 2 T > 2T.gpr -1 1 2 1 T > 3T.gpr 1 -1 3 4 T > 4T.gpr -1 1 4 3 T > 5T.gpr 1 -1 5 6 T > 6T.gpr -1 1 6 5 T > 1D.gpr -1 1 1 5 D > 2D.gpr 1 -1 2 4 D > 3D.gpr -1 1 3 1 D > 4D.gpr 1 -1 4 6 D > 5D.gpr -1 1 5 3 D > 6D.gpr 1 -1 6 2 D > 1C.gpr 1 -1 1 4 C > 2C.gpr -1 1 2 3 C > 3C.gpr 1 -1 3 6 C > 4C.gpr -1 1 4 5 C > 5C.gpr 1 -1 5 2 C > 6C.gpr -1 1 6 1 C > 1BS.gpr -1 1 1 1 BS > 2BS.gpr 1 -1 2 2 BS > 3BS.gpr -1 1 3 3 BS > 4BS.gpr 1 -1 4 4 BS > 6BS.gpr 1 -1 6 6 BS > > Thank you for your willingness to help. I will stay up a bit so that I can > respond faster to your questions given the time difference between us. > > Thank you so much. > > Osee > > ________________________________________ > From: Gordon K Smyth [smyth at wehi.EDU.AU] > Sent: Monday, September 10, 2012 6:34 PM > To: Osee Sanogo > Cc: Sanogo, Yibayiri O; Bioconductor mailing list > Subject: Re: Advice with RemoveBatchEffect and Rank Product package > > Dear Osee, > > Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight > and Length mean in your experiment? If the first six arrays are from > biologically independent samples, why do they have the same weight and > length? > > Best wishes > Gordon > > > On Mon, 10 Sep 2012, Osee Sanogo wrote: > >> Dear Gordon, >> >> From what you said, it seems that I am oversimplifying my experiment by >> attempting to analyze it with RankProd, which doesn't offer the option for >> complex modeling. >> >> Could you please explain to me how I could analyze the experiment using >> Limma? >> Please let me know if you'd like me to provide further details of the >> experiment. >> >> Thank you so much. >> >> Osee >> >> >> On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >> >>> Dear Osee, >>> >>> No, you can't use removeBatchEffect to control for dye bias. >>> >>> Can you ignore the dye effect? Not in general, but who knows? >>> >>> Your experiment seems too complex to be properly analysed using RankProd. >>> For one thing, it seems clear that you have obtained multiple parts of the >>> brain from the same biological replicates, meaning that your samples are >>> paired by fish number. >>> >>> I could explain how to analyse this experiment using limma. However, if >>> you are determined that you will use RankProd, it might be best to email >>> the authors of that package for advice. >>> >>> Best wishes >>> Gordon >>> >>> --------------------------------------------- >>> Professor Gordon K Smyth, >>> Bioinformatics Division, >>> Walter and Eliza Hall Institute of Medical Research, >>> 1G Royal Parade, Parkville, Vic 3052, Australia. >>> http://www.statsci.org/smyth >>> >>> On Sun, 9 Sep 2012, Osee Sanogo wrote: >>> >>>> Dear Gordon, >>>> >>>> Thank you for getting back to me about my questions. >>>> >>>> My experiment is trying to identify differentially expressed genes in four >>>> regions of the brain in response to a stressor. I have 6 biol. replicates in >>>> each brain region for the control and experimental groups in each region, >>>> and the comparison is being done within brain region (i.e., T control vs T >>>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were >>>> run in two-color Agilent Array. >>>> >>>> You're right that the design I sent was from the separate channel analysis, >>>> in which I attempted to account for array and dye effect, and then run the >>>> data in RankProd. Now I know that this is not right. Ok, I will use the >>>> single channel analysis in Limma. >>>> >>>> I still would like to run the two-channel data (ratios) in RankProd, as my >>>> previous experience found this useful for my dara (low replicate numbers). >>>> >>>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias >>>> before running the two-channel data in RankProd? If yes, how should I do >>>> this using the RemoveBatch Effect function? >>>> 2) I found that about 3% of my probes have dye effect. Can I >>>> omit controlling for dye effect without compromising the result of my >>>> analysis? >>>> >>>> The data were loess/scale normalized into an expression set (Data_RP). >>>> >>>> Here is the design of the experiment >>>> >>>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length >>>> 1 1T.gpr 1 -1 1 2 T 39 0.63 >>>> 2 2T.gpr -1 1 2 1 T 39 0.63 >>>> 3 3T.gpr 1 -1 3 4 T 39 0.63 >>>> 4 4T.gpr -1 1 4 3 T 39 0.63 >>>> 5 5T.gpr 1 -1 5 6 T 39 0.63 >>>> 6 6T.gpr -1 1 6 5 T NA NA >>>> 7 1D.gpr -1 1 1 5 D 47 1.21 >>>> 8 2D.gpr 1 -1 2 4 D 47 1.21 >>>> 9 3D.gpr -1 1 3 1 D 47 1.21 >>>> 10 4D.gpr 1 -1 4 6 D 47 1.21 >>>> 11 5D.gpr -1 1 5 3 D 47 1.21 >>>> 12 6D.gpr 1 -1 6 2 D NA NA >>>> 13 1C.gpr 1 -1 1 4 C 47 1.31 >>>> 14 2C.gpr -1 1 2 3 C 47 1.31 >>>> 15 3C.gpr 1 -1 3 6 C 47 1.31 >>>> 16 4C.gpr -1 1 4 5 C 47 1.31 >>>> 17 5C.gpr 1 -1 5 2 C 47 1.31 >>>> 18 6C.gpr -1 1 6 1 C NA NA >>>> 19 1BS.gpr -1 1 1 1 BS 89 1.44 >>>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44 >>>> 21 3BS.gpr -1 1 3 3 BS 89 1.44 >>>> 22 4BS.gpr 1 -1 4 4 BS NA NA >>>> 23 5BS.gpr -1 1 5 5 BS NA NA >>>> 24 6BS.gpr 1 -1 6 6 BS NA NA >>>> >>>> Thank you for your help and please let me know if you need further >>>> explanation of the experiment. >>>> >>>> Best regards, >>>> >>>> Osee >>>> >>>>> >>>> >>>> >>>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>>> >>>>> Dear Osee, >>>>> >>>>> You are attempting to do a number of things that don't seem correct to me. >>>>> >>>>> First, you seem to attempting a separate channel analysis of two color >>>>> microarray data, but ignoring the pairing of the red and green channels. >>>>> It isn't correct to do this. I don't see any way to use RankProd, or any >>>>> other package designed for independent samples, correctly in this context. >>>>> If you must do a separate channel analysis, you would be better off using >>>>> the separate channel analysis facilities of the limma package. >>>>> >>>>> Second, when you set batch=rep(1,24), you are defining a batch that >>>>> consists of every array in your data set. Obviously it doesn't make sense >>>>> to remove batch effects unless there are at least two batches. >>>>> >>>>> Third, I don't follow your design matrix. >>>>> >>>>> It would be better to go back to the start, and describe in more basic >>>>> terms what is the nature of your data and what comparison you are trying >>>>> to make. >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>>>>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>>>>> >>>>>> Dear Members of the list, >>>>>> >>>>>> (I apologize for posting this again -I sent it earlier to the list but >>>>>> from another account and I was listed me as non-Member -and Member I am >>>>>> since 2008:-)). >>>>>> >>>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>>>>> would like to remove the dye effect prior to analysis. I read on the forum >>>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>>>>> modeling that is not in Rank Product. >>>>>> >>>>>> I have two questions: >>>>>> >>>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>>>>> effect prior to running the expression data using Rank Prod? >>>>>> >>>>>> 2) a) If no, what other function could I use to do this? >>>>>> b) If yes, I would like a help with the correct design and how to >>>>>> properly indicate the batch. >>>>>> >>>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>>>>> different areas of the brain): >>>>>> >>>>>> design1 >>>>>> BS C D T >>>>>> 1 0 0 0 1 >>>>>> 2 0 0 0 -1 >>>>>> 3 0 0 0 1 >>>>>> 4 0 0 0 -1 >>>>>> 5 0 0 0 1 >>>>>> 6 0 0 0 -1 >>>>>> 7 0 0 -1 0 >>>>>> 8 0 0 1 0 >>>>>> 9 0 0 -1 0 >>>>>> 10 0 0 1 0 >>>>>> 11 0 0 -1 0 >>>>>> 12 0 0 1 0 >>>>>> 13 0 1 0 0 >>>>>> 14 0 -1 0 0 >>>>>> 15 0 1 0 0 >>>>>> 16 0 -1 0 0 >>>>>> 17 0 1 0 0 >>>>>> 18 0 -1 0 0 >>>>>> 19 -1 0 0 0 >>>>>> 20 1 0 0 0 >>>>>> 21 -1 0 0 0 >>>>>> 22 1 0 0 0 >>>>>> 23 -1 0 0 0 >>>>>> 24 1 0 0 0 >>>>>> >>>>>> attr(,"assign") >>>>>> [1] 1 1 1 1 >>>>>> >>>>>> I've tried this (Data_RP are my data, the M values of the expression set): >>>>>> >>>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>>>>> design=design1) >>>>>> >>>>>> but it is returning an error message >>>>>> " Error in contr.sum(levels(batch)) : >>>>>> not enough degrees of freedom to define contrasts" >>>>>> >>>>>> Please help me correct this code. >>>>>> >>>>>> Thank you so much for your help. >>>>>> >>>>>> Osee >>>>>> >>>>>> -- -- -- >>>>>> Y. Osee Sanogo >>>>>> Integrative Biology >>>>>> Institute for Genomic Biology >>>>>> University of Illinois at Urbana >>>>>> 505 S. Goodwin Ave >>>>>> Urbana, IL-61801 >>>>>> >>>>>> Tel: 217-333 2308 (Office) >>>>>> 217-417 9593 (Cell) > > ______________________________________________________________________ > The information in this email is confidential and intended solely for the addressee. > You must not disclose, forward, print or use it without the permission of the sender. > ______________________________________________________________________ > ______________________________________________________________________ The information in this email is confidential and intended solely for the addressee. You must not disclose, forward, print or use it without the permission of the sender. ______________________________________________________________________
ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 3 minutes ago
WEHI, Melbourne, Australia
Dear Osee, You are attempting to do a number of things that don't seem correct to me. First, you seem to attempting a separate channel analysis of two color microarray data, but ignoring the pairing of the red and green channels. It isn't correct to do this. I don't see any way to use RankProd, or any other package designed for independent samples, correctly in this context. If you must do a separate channel analysis, you would be better off using the separate channel analysis facilities of the limma package. Second, when you set batch=rep(1,24), you are defining a batch that consists of every array in your data set. Obviously it doesn't make sense to remove batch effects unless there are at least two batches. Third, I don't follow your design matrix. It would be better to go back to the start, and describe in more basic terms what is the nature of your data and what comparison you are trying to make. Best wishes Gordon > Date: Sat, 8 Sep 2012 11:40:45 +0000 > From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package > > Dear Members of the list, > > (I apologize for posting this again -I sent it earlier to the list but > from another account and I was listed me as non-Member -and Member I am > since 2008:-)). > > I have been using Rank Prod to analyze Agilent two-color data. However, I > would like to remove the dye effect prior to analysis. I read on the forum > that RemoveBatchEffect should be used in the Limma linear model, a type of > modeling that is not in Rank Product. > > I have two questions: > > 1) Would it be appropriate to use RemoveBatchEffect to correct for dye > effect prior to running the expression data using Rank Prod? > > 2) a) If no, what other function could I use to do this? > b) If yes, I would like a help with the correct design and how to > properly indicate the batch. > > Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are > different areas of the brain): > > design1 > BS C D T > 1 0 0 0 1 > 2 0 0 0 -1 > 3 0 0 0 1 > 4 0 0 0 -1 > 5 0 0 0 1 > 6 0 0 0 -1 > 7 0 0 -1 0 > 8 0 0 1 0 > 9 0 0 -1 0 > 10 0 0 1 0 > 11 0 0 -1 0 > 12 0 0 1 0 > 13 0 1 0 0 > 14 0 -1 0 0 > 15 0 1 0 0 > 16 0 -1 0 0 > 17 0 1 0 0 > 18 0 -1 0 0 > 19 -1 0 0 0 > 20 1 0 0 0 > 21 -1 0 0 0 > 22 1 0 0 0 > 23 -1 0 0 0 > 24 1 0 0 0 > > attr(,"assign") > [1] 1 1 1 1 > > I've tried this (Data_RP are my data, the M values of the expression set): > > DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, > design=design1) > > but it is returning an error message > " Error in contr.sum(levels(batch)) : > not enough degrees of freedom to define contrasts" > > Please help me correct this code. > > Thank you so much for your help. > > Osee > > -- -- -- > Y. Osee Sanogo > Integrative Biology > Institute for Genomic Biology > University of Illinois at Urbana > 505 S. Goodwin Ave > Urbana, IL-61801 > > Tel: 217-333 2308 (Office) > 217-417 9593 (Cell) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 3 minutes ago
WEHI, Melbourne, Australia
Dear Osee, No, you can't use removeBatchEffect to control for dye bias. Can you ignore the dye effect? Not in general, but who knows? Your experiment seems too complex to be properly analysed using RankProd. For one thing, it seems clear that you have obtained multiple parts of the brain from the same biological replicates, meaning that your samples are paired by fish number. I could explain how to analyse this experiment using limma. However, if you are determined that you will use RankProd, it might be best to email the authors of that package for advice. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Sun, 9 Sep 2012, Osee Sanogo wrote: > Dear Gordon, > > Thank you for getting back to me about my questions. > > My experiment is trying to identify differentially expressed genes in four > regions of the brain in response to a stressor. I have 6 biol. replicates in > each brain region for the control and experimental groups in each region, > and the comparison is being done within brain region (i.e., T control vs T > exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were > run in two-color Agilent Array. > > You're right that the design I sent was from the separate channel analysis, > in which I attempted to account for array and dye effect, and then run the > data in RankProd. Now I know that this is not right. Ok, I will use the > single channel analysis in Limma. > > I still would like to run the two-channel data (ratios) in RankProd, as my > previous experience found this useful for my dara (low replicate numbers). > > My questions: 1) Could I use RemoveBatchEffect to control for dye bias > before running the two-channel data in RankProd? If yes, how should I do > this using the RemoveBatch Effect function? > 2) I found that about 3% of my probes have dye effect. Can I > omit controlling for dye effect without compromising the result of my > analysis? > > The data were loess/scale normalized into an expression set (Data_RP). > > Here is the design of the experiment > > FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length > 1 1T.gpr 1 -1 1 2 T 39 0.63 > 2 2T.gpr -1 1 2 1 T 39 0.63 > 3 3T.gpr 1 -1 3 4 T 39 0.63 > 4 4T.gpr -1 1 4 3 T 39 0.63 > 5 5T.gpr 1 -1 5 6 T 39 0.63 > 6 6T.gpr -1 1 6 5 T NA NA > 7 1D.gpr -1 1 1 5 D 47 1.21 > 8 2D.gpr 1 -1 2 4 D 47 1.21 > 9 3D.gpr -1 1 3 1 D 47 1.21 > 10 4D.gpr 1 -1 4 6 D 47 1.21 > 11 5D.gpr -1 1 5 3 D 47 1.21 > 12 6D.gpr 1 -1 6 2 D NA NA > 13 1C.gpr 1 -1 1 4 C 47 1.31 > 14 2C.gpr -1 1 2 3 C 47 1.31 > 15 3C.gpr 1 -1 3 6 C 47 1.31 > 16 4C.gpr -1 1 4 5 C 47 1.31 > 17 5C.gpr 1 -1 5 2 C 47 1.31 > 18 6C.gpr -1 1 6 1 C NA NA > 19 1BS.gpr -1 1 1 1 BS 89 1.44 > 20 2BS.gpr 1 -1 2 2 BS 89 1.44 > 21 3BS.gpr -1 1 3 3 BS 89 1.44 > 22 4BS.gpr 1 -1 4 4 BS NA NA > 23 5BS.gpr -1 1 5 5 BS NA NA > 24 6BS.gpr 1 -1 6 6 BS NA NA > > Thank you for your help and please let me know if you need further > explanation of the experiment. > > Best regards, > > Osee > >> > > > On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> Dear Osee, >> >> You are attempting to do a number of things that don't seem correct to me. >> >> First, you seem to attempting a separate channel analysis of two color >> microarray data, but ignoring the pairing of the red and green channels. >> It isn't correct to do this. I don't see any way to use RankProd, or any >> other package designed for independent samples, correctly in this context. >> If you must do a separate channel analysis, you would be better off using >> the separate channel analysis facilities of the limma package. >> >> Second, when you set batch=rep(1,24), you are defining a batch that >> consists of every array in your data set. Obviously it doesn't make sense >> to remove batch effects unless there are at least two batches. >> >> Third, I don't follow your design matrix. >> >> It would be better to go back to the start, and describe in more basic >> terms what is the nature of your data and what comparison you are trying >> to make. >> >> Best wishes >> Gordon >> >>> Date: Sat, 8 Sep 2012 11:40:45 +0000 >>> From: "Sanogo, Yibayiri O" <sanogo at="" illinois.edu=""> >>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package >>> >>> Dear Members of the list, >>> >>> (I apologize for posting this again -I sent it earlier to the list but >>> from another account and I was listed me as non-Member -and Member I am >>> since 2008:-)). >>> >>> I have been using Rank Prod to analyze Agilent two-color data. However, I >>> would like to remove the dye effect prior to analysis. I read on the forum >>> that RemoveBatchEffect should be used in the Limma linear model, a type of >>> modeling that is not in Rank Product. >>> >>> I have two questions: >>> >>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye >>> effect prior to running the expression data using Rank Prod? >>> >>> 2) a) If no, what other function could I use to do this? >>> b) If yes, I would like a help with the correct design and how to >>> properly indicate the batch. >>> >>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are >>> different areas of the brain): >>> >>> design1 >>> BS C D T >>> 1 0 0 0 1 >>> 2 0 0 0 -1 >>> 3 0 0 0 1 >>> 4 0 0 0 -1 >>> 5 0 0 0 1 >>> 6 0 0 0 -1 >>> 7 0 0 -1 0 >>> 8 0 0 1 0 >>> 9 0 0 -1 0 >>> 10 0 0 1 0 >>> 11 0 0 -1 0 >>> 12 0 0 1 0 >>> 13 0 1 0 0 >>> 14 0 -1 0 0 >>> 15 0 1 0 0 >>> 16 0 -1 0 0 >>> 17 0 1 0 0 >>> 18 0 -1 0 0 >>> 19 -1 0 0 0 >>> 20 1 0 0 0 >>> 21 -1 0 0 0 >>> 22 1 0 0 0 >>> 23 -1 0 0 0 >>> 24 1 0 0 0 >>> >>> attr(,"assign") >>> [1] 1 1 1 1 >>> >>> I've tried this (Data_RP are my data, the M values of the expression set): >>> >>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, >>> design=design1) >>> >>> but it is returning an error message >>> " Error in contr.sum(levels(batch)) : >>> not enough degrees of freedom to define contrasts" >>> >>> Please help me correct this code. >>> >>> Thank you so much for your help. >>> >>> Osee >>> >>> -- -- -- >>> Y. Osee Sanogo >>> Integrative Biology >>> Institute for Genomic Biology >>> University of Illinois at Urbana >>> 505 S. Goodwin Ave >>> Urbana, IL-61801 >>> >>> Tel: 217-333 2308 (Office) >>> 217-417 9593 (Cell) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT

Login before adding your answer.

Traffic: 770 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6