Question

Design matrix for analyzing time points

Tanner (Canada)

I'm following the edgeR documentation, and I'm unclear on a few things about setting up a design matrix and contrasts with more than one time point. I think I may have set it up correctly, but I'm not confident.

My experiment consists of two time points, 16 hours and 32 hours, and each time point has its own negative control. My design matrix looks like this:

              Condition Time        Group
HPI16_2A_1      control 16hr control.16hr
HPI16_2A_2      control 16hr control.16hr
HPI16_2A_3      control 16hr control.16hr
HPI16_miniT_1     miniT 16hr   miniT.16hr
HPI16_miniT_2     miniT 16hr   miniT.16hr
HPI16_miniT_3     miniT 16hr   miniT.16hr
HPI32_2A_1      control 32hr control.32hr
HPI32_2A_2      control 32hr control.32hr
HPI32_2A_3      control 32hr control.32hr
HPI32_miniT_1     miniT 32hr   miniT.32hr
HPI32_miniT_2     miniT 32hr   miniT.32hr
HPI32_miniT_3     miniT 32hr   miniT.32hr

I want to compare miniT.32hr to miniT.16hr while incorporating their respective controls. Is this the appropriate contrast?

my.contrasts <- makeContrasts(
    miniTvscontrol.32hr = (miniT.32hr - control.32hr) - (miniT.16hr - control.16hr),
    levels = design)

Hopefully I have provided enough information!

Tag: edgeR

Written 21 months ago by Tanner; updated 21 months ago by Gordon Smyth.

Answer 1 (score 1), 2023-02-21

Gordon Smyth (WEHI, Melbourne, Australia)

My design matrix looks like this

No, that is not a design matrix. That is a sample information data frame (also known as a targets frame in the limma documentation). To make the question complete you should show the command used to create the design matrix.
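For reference, here is a minimal base-R sketch of the step that was missing: rebuilding the sample information frame shown in the question and turning it into a design matrix with `model.matrix(~0 + Group)`. The object names (`targets`, `design`) and the explicit level order are illustrative choices, not taken from the original post.

```r
# Rebuild the sample information (targets) frame from the question
targets <- data.frame(
  Condition = rep(c("control", "miniT", "control", "miniT"), each = 3),
  Time = rep(c("16hr", "32hr"), each = 6),
  row.names = c(paste0("HPI16_2A_", 1:3), paste0("HPI16_miniT_", 1:3),
                paste0("HPI32_2A_", 1:3), paste0("HPI32_miniT_", 1:3))
)

# One factor level per condition/time combination (level order set explicitly)
targets$Group <- factor(
  paste(targets$Condition, targets$Time, sep = "."),
  levels = c("control.16hr", "control.32hr", "miniT.16hr", "miniT.32hr")
)

# The design matrix: one indicator column per group, no intercept
design <- model.matrix(~0 + Group, data = targets)
colnames(design) <- levels(targets$Group)
design
```

Each row of `design` has a single 1 in the column of its group, so each coefficient corresponds to exactly one group. That is what lets contrasts such as `miniT.32hr - control.32hr` be written directly in terms of group names.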

Is this the appropriate contrast?

That seems a natural and sensible contrast, and it does do what you say, so it is probably what you want (assuming you have created the design matrix by ~0+Group). But judging the scientific suitability of a contrast is up to you. We provide you with the means to make any comparison you want. But interpreting the biological meaning and appropriateness of the contrast in the context of your own scientific experiment is not a statistical issue and so you have to take responsibility for it yourself.
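Assume the design was built with `~0 + Group` and its columns are, in order, control.16hr, control.32hr, miniT.16hr and miniT.32hr. Under that assumption, the contrast in the question amounts to the weights +1, -1, -1, +1 on those columns. The base-R sketch below writes the weights out by hand and checks, using made-up group means, that they compute the difference between the 32 h and 16 h treatment effects.

```r
# Group levels in the order of the design columns (assumed: ~0 + Group,
# with colnames(design) <- levels(Group))
grp_levels <- c("control.16hr", "control.32hr", "miniT.16hr", "miniT.32hr")

# (miniT.32hr - control.32hr) - (miniT.16hr - control.16hr),
# expanded into one weight per design column
weights <- c(control.16hr = 1, control.32hr = -1, miniT.16hr = -1, miniT.32hr = 1)
weights <- weights[grp_levels]

# Made-up group means on a log scale, for illustration only
group_means <- c(control.16hr = 5, control.32hr = 5.5, miniT.16hr = 6, miniT.32hr = 8)

effect16 <- group_means[["miniT.16hr"]] - group_means[["control.16hr"]]
effect32 <- group_means[["miniT.32hr"]] - group_means[["control.32hr"]]

# Applying the weights to the means gives the change in treatment effect
contrast_value <- sum(weights * group_means[grp_levels])
c(contrast = contrast_value, difference = effect32 - effect16)
```

With limma loaded, `makeContrasts((miniT.32hr - control.32hr) - (miniT.16hr - control.16hr), levels = design)` should return these same weights as a one-column matrix.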

Posted by Gordon Smyth, 21 months ago

Yes, my mistake, thanks for pointing that out. Thank you for the feedback as well.

Reply by Tanner, 21 months ago

I've been thinking about this comparison a bit more, and now I'm not sure I'm interpreting it correctly. I'm worried that I'm not properly normalizing (this probably isn't the right term) or accounting for each time point's control.

How is this contrast (miniT.32hr - control.32hr) - (miniT.16hr - control.16hr) working and how would it differ from (miniT.32hr - miniT.16hr) - (control.32hr - control.16hr)?

Is it possible that a gene is not a DEG at either time point individually (i.e. it's just background noise), but when doing this sort of contrast (32hr vs 16hr) it may be detected as a DEG? This is the scenario I'm most worried about.

Reply by Tanner, 21 months ago

How is this contrast working

The contrast does exactly what it says. I don't know what else to say.

how would it differ from (miniT.32hr - miniT.16hr) - (control.32hr - control.16hr)?

It doesn't differ. The two contrasts are identical.
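Expanding the brackets shows why. Writing μ for a group mean (notation introduced here for illustration), both expressions reduce to the same combination of the four group means, with weights +1 on miniT.32hr and control.16hr and -1 on control.32hr and miniT.16hr:

```latex
\begin{aligned}
(\mu_{\mathrm{miniT},32} - \mu_{\mathrm{ctrl},32}) - (\mu_{\mathrm{miniT},16} - \mu_{\mathrm{ctrl},16})
  &= \mu_{\mathrm{miniT},32} - \mu_{\mathrm{ctrl},32} - \mu_{\mathrm{miniT},16} + \mu_{\mathrm{ctrl},16} \\
  &= (\mu_{\mathrm{miniT},32} - \mu_{\mathrm{miniT},16}) - (\mu_{\mathrm{ctrl},32} - \mu_{\mathrm{ctrl},16})
\end{aligned}
```

The first form reads as "the treatment effect changes over time"; the second reads as "miniT changes over time beyond the change seen in the controls". They are two readings of one interaction contrast.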

Is it possible that a gene is not a DEG at either time point individually (i.e. it's just background noise), but when doing this sort of contrast (32hr vs 16hr) it may be detected as a DEG?

Well, no. I'm not really understanding your concern. Clearly if T32 = miniT.32hr - control.32hr and T16 = miniT.16hr - control.16hr are both small, then the whole contrast must be small as well. The contrast tests whether the treatment effect differs between the two time points. If the treatment effect truly differs between the times, then inevitably the true treatment effect must be nonzero for at least one of the times.
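In symbols, with T32 and T16 the true treatment effects defined above, this argument is the triangle inequality:

```latex
|T_{32} - T_{16}| \le |T_{32}| + |T_{16}|
```

If both true effects are zero, the interaction is zero too. Put the other way round, a nonzero interaction requires a nonzero true treatment effect at one time or the other. This statement concerns the true effects, not their estimates.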

On the other hand, it could be that T32 is positive but just misses out on being statistically significant as a time-specific contrast, and T16 is negative but just misses out on being significant by itself, and the complete contrast T32 - T16 becomes significant by contrasting the positive and negative effects. So the interaction can pick up some genes that are non-significant by individual testing at both time points. This is intended behavior: it's an advantage, not a disadvantage, because the time-specific testing will obviously have some false negatives.
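A toy calculation illustrates this scenario. It uses hypothetical estimates and a plain normal (Wald-type) approximation rather than edgeR's actual likelihood-ratio or quasi-likelihood F-tests. It also assumes the two treatment effects are estimated independently with equal standard errors. That is plausible here because they come from disjoint sets of samples, but it is still an assumption.

```r
# Hypothetical estimates on the log-fold-change scale, both with standard error 1
se  <- 1
T32 <- 1.8   # positive treatment effect at 32 h
T16 <- -1.8  # negative treatment effect at 16 h

# Two-sided p-value from a normal approximation
p_value <- function(estimate, std_error) 2 * pnorm(-abs(estimate / std_error))

p32 <- p_value(T32, se)
p16 <- p_value(T16, se)

# Assuming T32 and T16 are independent, the variance of T32 - T16
# is the sum of their two variances
p_int <- p_value(T32 - T16, sqrt(se^2 + se^2))

round(c(T32 = p32, T16 = p16, interaction = p_int), 3)
```

With these numbers, each time-specific p-value is about 0.07, while the interaction p-value is about 0.01. The two effects point in opposite directions, so their difference is twice as large as either one, while its standard error grows only by a factor of sqrt(2).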

Reply by Gordon Smyth, 21 months ago

Thanks, once again. However, I have more questions!

There is a different number of genes/tags at the two time points (16 and 32 hours). These data were collected from cells infected with a virus, so these are time points post infection. As a result, there are more genes/tags at 32 hours post infection than at 16. In order to process this in edgeR I have merged the data frames; however, this obviously creates NA values. There are only a small number of NA values, maybe 3-4%. I'd rather not merge the results in a way that tosses out genes that are only present at one time point, because these are useful data points. How should I handle the NAs? Can I replace them with 1?
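The base-R sketch below shows how the merge described above produces NAs. A full outer join keeps every gene and leaves NA wherever a gene was absent from one of the tables. The gene and sample names are hypothetical. The snippet only shows how to detect the NAs; it does not suggest a way to fill them.

```r
# Hypothetical count tables for the two time points; gene_C only appears at 32 h
counts16 <- data.frame(gene = c("gene_A", "gene_B"), s16_1 = c(10L, 0L))
counts32 <- data.frame(gene = c("gene_A", "gene_B", "gene_C"), s32_1 = c(12L, 3L, 7L))

# Full outer join: every gene is kept, and the gaps are filled with NA
merged <- merge(counts16, counts32, by = "gene", all = TRUE)

# Convert to a count matrix with genes as row names
counts <- as.matrix(merged[, -1])
rownames(counts) <- merged$gene

# edgeR works with complete counts, so check for NAs before going further
anyNA(counts)
```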

Reply by Tanner, 21 months ago

Your new questions are not follow-ups to the original question above but are about quite different issues. Rather than asking new questions as comments, please open a new question.

When you do so, you need to give proper context. What technology are you using? Why do you have missing values? Is this thread a continuation of your previous question about proteomics data? As it is, your new question is unexpected, to say the least. You cannot possibly be using edgeR in the first place if you have NAs. I cannot possibly advise you how to impute NAs while knowing absolutely nothing about your data type. Replacing NAs with 1 would be really strange and is not recommended by anyone as far as I know. In your previous question you said that you already knew how to impute NAs, so I am quite bamboozled.

Reply by Gordon Smyth, 21 months ago