Multifactorial edgeR GLM design question (contrast that I should make)
@james-w-macdonald-5106
Hi Zhihao,

On 1/29/2014 12:07 PM, Zhihao Tan wrote:
> Hi Jim,
>
> Thanks for your reply. Alright, I'm guessing that contrast really
> doesn't make sense, and actually going back to look at the additional
> genes that the (Day2.Fluffy - Day2.Smooth) + (Day5.Fluffy -
> Day5.Smooth) brought up, it seems like it might mostly be noise, or
> day specific effects.
>
> (Day2.Fluffy - Day2.Smooth) - (Day5.Fluffy - Day5.Smooth) would indeed
> be an interesting set of genes to look at, and I will definitely try that.
>
> I do want to try and understand the best (whatever best means...) way
> to get at the genes that are DE between fluffy and smooth though. I
> can briefly think of three ways:
> - Lump all the samples, build a model with 1 effect (Phenotype), and
>   get a set of DE genes (Fluffy - Smooth). But that seems to be
>   averaging out the values, and not supplying information to the model
>   about a source of biological variation (Time) that we know about.
> - Lump all the samples, build a model with 2 effects (Phenotype and
>   Time), test (Day2.Fluffy - Day2.Smooth) and (Day5.Fluffy -
>   Day5.Smooth), get the intersection of the DE genes. This should
>   account for biological variation, though normalization of Day2 and
>   Day5 samples together would add a little noise?
> - Separate samples by day, test (Fluffy - Smooth) for each, get the
>   intersect. And only when I want to test for interaction between
>   effects do I build the 2 effect interaction model. This to me seems
>   the cleanest, but I'm not sure if that makes sense in the world of
>   biostatistics...

The conventional way to do this would be to fit a model like

    design <- model.matrix(~phenotype*time)

where phenotype is a factor with the two levels (fluffy and smooth) and
time is a factor with the two levels (2 and 5).
This will result in a design matrix like this:

    > model.matrix(~phenotype*time)
      (Intercept) phenotypeSmooth time5 phenotypeSmooth:time5
    1           1               0     0                     0
    2           1               0     0                     0

Here phenotypeSmooth is the Smooth - Fluffy difference at day 2 (the
baseline time), time5 is the day5 - day2 difference in Fluffy (the
baseline phenotype), and phenotypeSmooth:time5 tests the interaction.
Note that the second and third coefficients cannot be read as overall
phenotype or time effects for any genes that have a significant
interaction, so you should first look for genes with an interaction, and
then look for genes that are different in Smooth - Fluffy only in the set
of genes that do not have a significant interaction term.

Does that make sense?

Best,

Jim

> Hope to get your advice on this... and thanks once again!
>
> Cheers,
> Zhihao
>
> On Wed, Jan 29, 2014 at 6:17 AM, James W. MacDonald
> <jmacdon at uw.edu> wrote:
>
> > Hi Zhihao,
> >
> > On Tuesday, January 28, 2014 6:56:20 PM, Zhihao Tan wrote:
> >
> > > Hi there,
> > >
> > > I have a question on whether some of the contrasts I am making in a
> > > multifactorial experiment should actually be made. I don't have a
> > > strong grasp of GLMs, so I might be missing something conceptually,
> > > and am hoping someone can advise.
> > >
> > > I am basically looking for genes that are differentially expressed
> > > in a certain phenotypic state (e.g. fluffy vs. smooth), but have set
> > > it up with 2 time-points (Day 2 and Day 5). I have trouble setting
> > > up the design using an equation (columns seem to disappear) so have
> > > gone ahead and created the design matrix using the method in 3.3.1
> > > of the manual (pasting factors together).
> > > The design looks like this (I have removed replicates and many
> > > samples to simplify):
> > >
> > >    Day2.Fluffy Day2.Smooth Day5.Fluffy Day5.Smooth
> > > 1            0           1           0           0
> > > 7            1           0           0           0
> > > 13           0           0           0           1
> > > 16           0           0           1           0
> > > 19           0           0           0           1
> > > 35           0           0           1           0
> > > 36           0           0           1           0
> > >
> > > From what I understand, the above design is set up for 2 main
> > > effects (phenotype and time), and if I reduce it to 1 main effect
> > > (phenotype), I get the design below.
> > >
> > >    Fluffy Smooth
> > > 1       0      1
> > > 7       1      0
> > > 13      0      1
> > > 16      1      0
> > > 19      0      1
> > > 35      1      0
> > > 36      1      0
> > >
> > > The contrast I make in the latter case is basically (Fluffy -
> > > Smooth). The contrast that I did for the former case, and this is
> > > what I'm unsure of, is
> > > ((Day2.Fluffy - Day2.Smooth) + (Day5.Fluffy - Day5.Smooth)).
> > > These tests are definitely not equivalent, and I get different
> > > numbers of sig. DE genes for both (more for the 2 effect design).
> > > In my mind, it makes sense, because the experiment *is* set up with
> > > 2 effects, and accounting for the biological variation in your
> > > model should allow you to be more powered to detect DE genes.
> > > However, I've never seen a contrast like that before. Does it even
> > > make sense to have an addition sign in the equation? What does that
> > > actually mean? Should I instead make contrasts of (Day2.Fluffy -
> > > Day2.Smooth) and (Day5.Fluffy - Day5.Smooth) and get the union or
> > > intersect of them?
> >
> > The contrast you are using doesn't really make sense, because a
> > contrast is usually testing the difference between groups, so you
> > subtract rather than sum. If you were to use
> >
> >     (Day2.Fluffy - Day2.Smooth) - (Day5.Fluffy - Day5.Smooth)
> >
> > then you would be testing the interaction of time and phenotype.
> > In other words the interaction looks for genes that are different
> > between fluffy and smooth, depending on the day. So if you think
> > the fluffiness of your samples is dependent on time, that is what
> > you would likely want to test.
> >
> > Best,
> >
> > Jim
> >
> > > Hope someone can help on this, and thanks in advance!
> > >
> > > Regards,
> > > Zhihao
> > > Graduate Student
> > > University of Washington

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
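
As a rough illustration of how the two parameterizations discussed in this thread relate, here is a minimal sketch in base R. The sample layout (two replicates per group), the simulated values, and all object names (design_group, interaction_contrast, and so on) are assumptions made up for illustration and are not the poster's data. An ordinary least-squares fit via lm.fit() stands in for the edgeR GLM. The sketch checks that the group-means contrast (Day2.Fluffy - Day2.Smooth) - (Day5.Fluffy - Day5.Smooth) gives the same number as the phenotypeSmooth:time5 coefficient from ~phenotype*time.

```r
# Hypothetical 2 x 2 layout, two replicates per group (not the real samples).
phenotype <- factor(rep(c("Fluffy", "Smooth"), times = 4),
                    levels = c("Fluffy", "Smooth"))
time <- factor(rep(c("2", "5"), each = 4), levels = c("2", "5"))

# Factorial parameterization from the reply.
design <- model.matrix(~ phenotype * time)

# Group-means parameterization from the original question
# (factors pasted together, no intercept).
group <- factor(paste0("Day", time, ".", phenotype))
design_group <- model.matrix(~ 0 + group)
colnames(design_group) <- levels(group)

# Simulated values for one "gene" with an assumed mean per group.
set.seed(1)
y <- rnorm(8, mean = c(5, 7, 6, 11)[as.integer(group)])

# Least-squares fits as a stand-in for the GLM fits.
fit_factorial <- lm.fit(design, y)
fit_group <- lm.fit(design_group, y)

# Interaction contrast on the group means, in column order of design_group.
interaction_contrast <- c(Day2.Fluffy = 1, Day2.Smooth = -1,
                          Day5.Fluffy = -1, Day5.Smooth = 1)

from_group <- sum(interaction_contrast * fit_group$coefficients)
from_factorial <- fit_factorial$coefficients[["phenotypeSmooth:time5"]]

# The two numbers should agree up to floating-point error.
print(c(from_group = from_group, from_factorial = from_factorial))
```

In edgeR the same comparison would typically be requested with glmLRT(fit, coef = 4) on the factorial design, or glmLRT(fit, contrast = interaction_contrast) on the group-means design, assuming fit comes from glmFit() with the matching design matrix.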