Design matrix for differential expression analysis
riya (@riya-22831)

Hi, I am pretty new to RNA-seq data analysis and I need some advice on formulating the design matrix.

This is my sample file. I added a GroupID column that combines all the variables into a single label, so that each group of samples can be distinguished from the others and then used in the design matrix:

Sample  Cell  Type       Concentration  Time(hrs)  GroupID
1       AB1   Control    0              5          AB1Control0_5
2       AB1   Control    0              5          AB1Control0_5
3       AB1   Control    0              5          AB1Control0_5
4       AB1   Treatment  5              5          AB1Treatment5_5
5       AB1   Treatment  5              5          AB1Treatment5_5
6       AB1   Treatment  5              5          AB1Treatment5_5
7       ST1   Control    0              5          ST1Control0_5
8       ST1   Control    0              5          ST1Control0_5
9       ST1   Control    0              5          ST1Control0_5
10      ST1   Treatment  5              5          ST1Treatment5_5
11      ST1   Treatment  5              5          ST1Treatment5_5
12      ST1   Treatment  5              5          ST1Treatment5_5
13      AB1   Control    0              8          AB1Control0_8
14      AB1   Control    0              8          AB1Control0_8
15      AB1   Control    0              8          AB1Control0_8
16      AB1   Treatment  8              8          AB1Treatment8_8
17      AB1   Treatment  8              8          AB1Treatment8_8
18      AB1   Treatment  8              8          AB1Treatment8_8
19      ST1   Control    0              8          ST1Control0_8
20      ST1   Control    0              8          ST1Control0_8
21      ST1   Control    0              8          ST1Control0_8
22      ST1   Treatment  8              8          ST1Treatment8_8
23      ST1   Treatment  8              8          ST1Treatment8_8
24      ST1   Treatment  8              8          ST1Treatment8_8

Contrasts I need

For my contrast matrix I want to compare:

1) AB1.Treatment at concentration 5 and time point 5 vs AB1.Control at concentration 0 and time point 5
2) AB1.Treatment at concentration 8 and time point 8 vs AB1.Control at concentration 0 and time point 8
3) ST1.Treatment at concentration 5 and time point 5 vs ST1.Control at concentration 0 and time point 5
4) ST1.Treatment at concentration 8 and time point 8 vs ST1.Control at concentration 0 and time point 8

design matrix I used:

dds = DESeqDataSetFromMatrix(countData = Countdata, colData = Metadata, design = ~ GroupID)

results(dds, contrast = c("GroupID", "AB1Treatment5_5", "AB1Control0_5"))
results(dds, contrast = c("GroupID", "ST1Treatment5_5", "ST1Control0_5"))
results(dds, contrast = c("GroupID", "AB1Treatment8_8", "AB1Control0_8"))
results(dds, contrast = c("GroupID", "ST1Treatment8_8", "ST1Control0_8"))

When I run resultsNames(dds), I see some contrasts I don't need and not the ones I do need. For example, I see GroupID_AB1Treatment5_5_vs_AB1Control0_5 and GroupID_ST1Treatment5_5_vs_AB1Control0_5 (this is not what I want), but I want GroupID_ST1Treatment5_5_vs_ST1Control0_5.

Also, I do get results from this, but in some contrasts the adjusted p-values are 1 for all the genes. So, is my design matrix right? Could somebody help me with this?

Please tell me if I am doing something wrong.

Thanks in advance!


Could you please update your post to fix the formatting? Perhaps mark the table you listed as a "code sample" (the 100110 button in the toolbar), or edit the formatting directly using markdown syntax.

Also, you've shown us the design of your experiment, but you didn't show us the results() call you made to extract the statistics you are after. Note that getting adjusted p-values hammered to all 1 (or close to it) isn't all that uncommon, even though it can be surprising.

Plotting a histogram of your nominal p-values is often a good diagnostic tool after you run your analyses.
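As a rough illustration of that diagnostic (a sketch on simulated values, not the poster's data): when no genes are truly changing, nominal p-values are roughly uniform on [0, 1], and Benjamini-Hochberg adjustment then pushes most of them toward 1. The vector `p_null` is made up for this example, and `res` in the final comment is a hypothetical name for a DESeq2 results table.

```r
# Simulated nominal p-values for 10,000 genes with no true effect
# (an assumption for illustration; real values would come from results()).
set.seed(42)
p_null <- runif(10000)

# Benjamini-Hochberg adjustment, the default method in DESeq2's results().
padj_null <- p.adjust(p_null, method = "BH")

# A flat histogram suggests little or no signal; a spike near 0 suggests
# a set of genes with real differences.
h <- hist(p_null, breaks = 20, main = "Nominal p-values", xlab = "p-value")

# Adjusted p-values are never smaller than the nominal ones.
summary(padj_null)

# With a real DESeq2 results object `res` (hypothetical name):
# hist(res$pvalue, breaks = 20)
```

If the histogram for a given contrast looks flat like this one, adjusted p-values near 1 for every gene are the expected outcome rather than a sign of a broken design.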


Thanks for the tip, Steve! I want to know whether adding another column (here GroupID) that combines all the variables is good enough to then use in the design matrix. As far as I understand, the whole idea of the design matrix is to distinguish one sample from another based on the conditions, and in my case I want to account for all the conditions that vary in the experiment and compare each treatment with its corresponding control.


If you want to compare one subset of samples to another subset of samples, giving them unique labels in the GroupID column is usually the best way to do that. If instead you want to compare all the controls to all the treated samples, and you want the software to model the differences introduced by the two time points in addition to the larger question of differences between treatments, that's when you use a design of ~ treatment + day.
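To make the two options concrete, here is a small base-R sketch that assumes the sample table from the question. The names `meta`, `design_group`, `design_additive`, and `contrast_vec` are invented for this illustration. With `~ GroupID`, every fitted coefficient is a comparison against the first factor level (alphabetically `AB1Control0_5`), which is why resultsNames() lists only comparisons against that level. Any other pair of groups is a difference of two coefficients, which is what the `contrast` argument of results() asks for.

```r
# Rebuild the 24-sample table from the question: 8 groups of 3 replicates.
meta <- data.frame(
  Cell = rep(rep(c("AB1", "ST1"), each = 6), times = 2),
  Type = rep(rep(c("Control", "Treatment"), each = 3), times = 4),
  Time = rep(c(5, 8), each = 12)
)
# Controls have concentration 0; treated samples use 5 at 5 h and 8 at 8 h.
meta$Concentration <- ifelse(meta$Type == "Control", 0, meta$Time)
meta$GroupID <- factor(
  paste0(meta$Cell, meta$Type, meta$Concentration, "_", meta$Time)
)

# Option 1: one label per group. The first level is the reference.
levels(meta$GroupID)[1]
design_group <- model.matrix(~ GroupID, data = meta)
colnames(design_group)  # intercept plus 7 "group vs reference" columns

# ST1Treatment5_5 vs ST1Control0_5 as a difference of two coefficients.
# (model.matrix names columns "GroupID<level>"; DESeq2 renames them to
# "GroupID_<level>_vs_<reference>" in resultsNames().)
contrast_vec <- setNames(numeric(ncol(design_group)), colnames(design_group))
contrast_vec["GroupIDST1Treatment5_5"] <- 1
contrast_vec["GroupIDST1Control0_5"] <- -1

# The same vector is the difference between the design rows of the two groups.
row_trt <- design_group[which(meta$GroupID == "ST1Treatment5_5")[1], ]
row_ctl <- design_group[which(meta$GroupID == "ST1Control0_5")[1], ]
stopifnot(all(row_trt - row_ctl == contrast_vec))

# Option 2: additive design, treatment effect adjusted for time point.
design_additive <- model.matrix(~ Type + factor(Time), data = meta)
colnames(design_additive)

# In DESeq2 (assuming `dds` was built with design = ~ GroupID and DESeq() was run):
# res <- results(dds, contrast = c("GroupID", "ST1Treatment5_5", "ST1Control0_5"))
```

Changing the reference level (for example with relevel()) changes which comparisons show up in resultsNames(), but a results() call that names both levels in `contrast` asks for the same comparison either way.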

@mikelove

Hi Riya,

I'd recommend working with a statistician or someone familiar with linear models. We have a lot of documentation in the vignette and in ?results, but I don't have extra time these days for statistical consulting on the support site. I have to reserve my time for software-related questions.
