Multifactorial edgeR GLM design question (contrast that I should make)
Zhihao Tan ▴ 20
@zhihao-tan-6364
Last seen 11.4 years ago
Hi there,

I have a question on whether some of the contrasts I am making in a multifactorial experiment should actually be made. I don't have a strong grasp of GLMs, so I might be missing something conceptually, and am hoping someone can advise.

I am basically looking for genes that are differentially expressed in a certain phenotypic state (e.g. fluffy vs. smooth), but have set it up with 2 time-points (Day 2 and Day 5). I have trouble setting up the design using an equation (columns seem to disappear) so have gone ahead and created the design matrix using the method in 3.3.1 of the manual (pasting factors together). The design looks like this (I have removed replicates and many samples to simplify):

   Day2.Fluffy Day2.Smooth Day5.Fluffy Day5.Smooth
1            0           1           0           0
7            1           0           0           0
13           0           0           0           1
16           0           0           1           0
19           0           0           0           1
35           0           0           1           0
36           0           0           1           0

From what I understand, the above design is set up for 2 main effects (phenotype and time), and if I reduce it to 1 main effect (phenotype), I get the design below.

   Fluffy Smooth
1       0      1
7       1      0
13      0      1
16      1      0
19      0      1
35      1      0
36      1      0

The contrast I make in the latter case is basically (Fluffy - Smooth). The contrast that I did for the former case, and this is what I'm unsure of, is ((Day2.Fluffy - Day2.Smooth) + (Day5.Fluffy - Day5.Smooth)). These tests are definitely not equivalent, and I get different number of sig. DE genes for both (more for the 2 effect design). In my mind, it makes sense, because the experiment *is* set up with 2 effects, and accounting for the biological variation in your model should allow you to be more powered to detect DE genes. However, I've never seen a contrast like that before. Does it even make sense to have an addition sign in the equation? What does that actually mean? Should I instead make contrasts of (Day2.Fluffy - Day2.Smooth) and (Day5.Fluffy - Day5.Smooth) and get the union or intersect of them?

Hope someone can help on this, and thanks in advance!
Regards,
Zhihao
Graduate Student
University of Washington
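For readers following along, here is a minimal base-R sketch of the "pasting factors together" design described in the question, next to a formula-based design. The sample annotation is reconstructed from the simplified table in the post, and the variable names (`samples`, `group`, `design`, `design_formula`) are illustrative assumptions, not the poster's actual code. It may also explain the "columns seem to disappear" remark: with a formula that keeps the intercept, the first level of each factor is absorbed into the baseline, so there is no separate column for it.

```r
# Sample annotation reconstructed from the simplified design in the post
# (variable names here are illustrative assumptions).
samples <- data.frame(
  day       = factor(c("Day2", "Day2", "Day5", "Day5", "Day5", "Day5", "Day5")),
  phenotype = factor(c("Smooth", "Fluffy", "Smooth", "Fluffy",
                       "Smooth", "Fluffy", "Fluffy")),
  row.names = c("1", "7", "13", "16", "19", "35", "36")
)

# Group-means parameterisation: paste the factors together and drop the
# intercept, giving one column per day/phenotype combination.
group <- factor(paste(samples$day, samples$phenotype, sep = "."))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
rownames(design) <- rownames(samples)

# Formula parameterisation with an intercept: the baseline levels
# (Day2 and Fluffy) are absorbed into "(Intercept)".
design_formula <- model.matrix(~ day * phenotype, data = samples)
colnames(design_formula)
# "(Intercept)" "dayDay5" "phenotypeSmooth" "dayDay5:phenotypeSmooth"
```

Both matrices have four columns and describe the same four group means; only the meaning of each coefficient differs.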
@james-w-macdonald-5106
Last seen 13 hours ago
United States
Hi Zhihao,

On Tuesday, January 28, 2014 6:56:20 PM, Zhihao Tan wrote:

> However, I've never seen a contrast like that before. Does it even make sense to have an addition sign in the equation? What does that actually mean? Should I instead make contrasts of (Day2.Fluffy - Day2.Smooth) and (Day5.Fluffy - Day5.Smooth) and get the union or intersect of them?

The contrast you are using doesn't really make sense, because a contrast is usually testing the difference between groups, so you subtract rather than sum. If you were to use

(Day2.Fluffy - Day2.Smooth) - (Day5.Fluffy - Day5.Smooth)

then you would be testing the interaction of time and phenotype. In other words, the interaction looks for genes that are different between fluffy and smooth, depending on the day. So if you think the fluffiness of your samples is dependent on time, that is what you would likely want to test.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
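To make the two contrasts discussed in this exchange concrete, the sketch below writes each one as a coefficient vector over the four group-means columns and applies it to a few made-up genes. The group means are hypothetical log-scale values chosen purely for illustration, and all object names are assumptions. With edgeR, a vector like `interaction_con` could be passed as `glmLRT(fit, contrast = interaction_con)`, assuming `fit` came from `glmFit()` on the group-means design.

```r
# Column order of the group-means design from the question.
cols <- c("Day2.Fluffy", "Day2.Smooth", "Day5.Fluffy", "Day5.Smooth")

# Per-day Fluffy-vs-Smooth differences as coefficient vectors.
day2 <- setNames(c(1, -1, 0, 0), cols)
day5 <- setNames(c(0, 0, 1, -1), cols)

summed          <- day2 + day5   # the contrast from the question:  1 -1  1 -1
interaction_con <- day2 - day5   # the interaction contrast:        1 -1 -1  1

# Hypothetical log-scale group means for three illustrative genes.
group_means <- rbind(
  same_effect   = c(6, 4, 7, 5),  # Fluffy is 2 units higher on both days
  day5_only     = c(5, 5, 8, 5),  # Fluffy is higher on Day 5 only
  opposite_sign = c(6, 4, 4, 6)   # the direction flips between days
)
colnames(group_means) <- cols

# Value of each contrast for each gene.
result <- group_means %*% cbind(summed = summed, interaction = interaction_con)
result
```

In this toy example the interaction contrast is 0 for `same_effect` and non-zero for the other two genes. The summed contrast equals twice the average of the two per-day differences, so it is 4 for `same_effect`, 3 for `day5_only`, and 0 for `opposite_sign`, where the per-day effects cancel.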
Hi Jim,

Thanks for your reply. Alright, I'm guessing that contrast really doesn't make sense, and actually going back to look at the additional genes that the (Day2.Fluffy - Day2.Smooth) + (Day5.Fluffy - Day5.Smooth) contrast brought up, it seems like it might mostly be noise, or day-specific effects. (Day2.Fluffy - Day2.Smooth) - (Day5.Fluffy - Day5.Smooth) would indeed be an interesting set of genes to look at, and I will definitely try that.

I do want to try and understand the best (whatever best means...) way to get at the genes that are DE between fluffy and smooth though. I can briefly think of three ways:

- Lump all the samples, build a model with 1 effect (Phenotype), and get a set of DE genes (Fluffy - Smooth). But that seems to be averaging out the values, and not supplying information to the model about a source of biological variation (Time) that we know about.
- Lump all the samples, build a model with 2 effects (Phenotype and Time), test (Day2.Fluffy - Day2.Smooth) and (Day5.Fluffy - Day5.Smooth), get the intersection of the DE genes. This should account for biological variation, though normalization of Day2 and Day5 samples together would add a little noise?
- Separate samples by day, test (Fluffy - Smooth) for each, get the intersect. And only when I want to test for interaction between effects do I build the 2 effect interaction model. This to me seems the cleanest, but I'm not sure if that makes sense in the world of biostatistics...

Hope to get your advice on this... and thanks once again!

Cheers,
Zhihao

On Wed, Jan 29, 2014 at 6:17 AM, James W. MacDonald <jmacdon@uw.edu> wrote:

> The contrast you are using doesn't really make sense, because a contrast is usually testing the difference between groups, so you subtract rather than sum. If you were to use
>
> (Day2.Fluffy - Day2.Smooth) - (Day5.Fluffy - Day5.Smooth)
>
> then you would be testing the interaction of time and phenotype.
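As a rough illustration of the second approach in the list (one combined model, two per-day contrasts, then an intersection), the sketch below builds the two contrast vectors and combines two gene lists. The gene IDs are invented placeholders and the object names are assumptions; in a real analysis each list would come from a separate test on the fitted model, for example one `glmLRT()` call per contrast column.

```r
# Column order of the group-means design from the question.
design_cols <- c("Day2.Fluffy", "Day2.Smooth", "Day5.Fluffy", "Day5.Smooth")

# One contrast per day, stored as columns of a contrast matrix.
contrasts <- cbind(
  Day2 = c(1, -1, 0, 0),   # Day2.Fluffy - Day2.Smooth
  Day5 = c(0, 0, 1, -1)    # Day5.Fluffy - Day5.Smooth
)
rownames(contrasts) <- design_cols

# Hypothetical DE gene IDs for each per-day test (placeholders only).
de_day2 <- c("geneA", "geneB", "geneC")
de_day5 <- c("geneB", "geneC", "geneD")

# Genes called DE on both days, and genes called DE on at least one day.
de_both   <- intersect(de_day2, de_day5)
de_either <- union(de_day2, de_day5)
```

The same `intersect()` / `union()` step would apply to the third approach, where each day is analysed separately; only the source of the two gene lists changes.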