multilevel limma analysis
1
1
Entering edit mode
zzjerryzhang ▴ 10
@zzjerryzhang-11066
Last seen 7.8 years ago

Hello,
I have 107 RNAseq (1 is bad from 108 samples) samples for analysis using limma. We start from one line of stem cell, then clone it to 3 replicates lines(represent by 1,2,3), and each replicate we did 2 sister replicates (represent by A,B)again, so totally we have 6 subjects cell line. for each subject we did treatment experiment at 3 Passage time (represent by P8,P23,P32),at each Passage time and each subject, we did 6 treatment experiment with control (represent by T1,T2,T3,T4 and control1, control2).  T1,T2,T3,T4  use different medicine concentration increasingly. control1, control2 are without medicine. 

the target is :

Sample Subject Passage Treatment
433_97 1A P8 Tr1
433_98 1A P8 Tr2
433_99 1A P8 Tr3
433_100 1A P8 Tr4
433_101 1A P8 Ct
433_102 1A P8 Ct
433_103 1B P8 Tr1
433_104 1B P8 Tr2
433_105 1B P8 Tr3
433_106 1B P8 Tr4
433_107 1B P8 Ct
433_108 1B P8 Ct
433_109 2A P8 Tr1
433_110 2A P8 Tr2
433_111 2A P8 Tr3
433_112 2A P8 Tr4
433_113 2A P8 Ct
433_114 2A P8 Ct
433_115 2B P8 Tr1
433_116 2B P8 Tr2
433_117 2B P8 Tr3
433_118 2B P8 Tr4
433_119 2B P8 Ct
433_120 2B P8 Ct
433_121 3A P8 Tr1
433_122 3A P8 Tr2
433_123 3A P8 Tr3
433_124 3A P8 Tr4
433_126 3A P8 Ct
433_127 3B P8 Tr1
433_128 3B P8 Tr2
433_129 3B P8 Tr3
433_130 3B P8 Tr4
433_131 3B P8 Ct
433_132 3B P8 Ct
433_133 1A P23 Tr1
433_134 1A P23 Tr2
433_135 1A P23 Tr3
433_136 1A P23 Tr4
433_137 1A P23 Ct
433_138 1A P23 Ct
433_139 1B P23 Tr1
433_140 1B P23 Tr2
433_141 1B P23 Tr3
433_142 1B P23 Tr4
433_143 1B P23 Ct
433_144 1B P23 Ct
433_145 2A P23 Tr1
433_146 2A P23 Tr2
433_147 2A P23 Tr3
433_148 2A P23 Tr4
433_149 2A P23 Ct
433_150 2A P23 Ct
433_151 2B P23 Tr1
433_152 2B P23 Tr2
433_153 2B P23 Tr3
433_154 2B P23 Tr4
433_155 2B P23 Ct
433_156 2B P23 Ct
433_157 3A P23 Tr1
433_158 3A P23 Tr2
433_159 3A P23 Tr3
433_160 3A P23 Tr4
433_161 3A P23 Ct
433_162 3A P23 Ct
433_163 3B P23 Tr1
433_164 3B P23 Tr2
433_165 3B P23 Tr3
433_166 3B P23 Tr4
433_167 3B P23 Ct
433_168 3B P23 Ct
433_169 1A P32 Tr1
433_170 1A P32 Tr2
433_171 1A P32 Tr3
433_172 1A P32 Tr4
433_173 1A P32 Ct
433_174 1A P32 Ct
433_175 1B P32 Tr1
433_176 1B P32 Tr2
433_177 1B P32 Tr3
433_178 1B P32 Tr4
433_179 1B P32 Ct
433_180 1B P32 Ct
433_181 2A P32 Tr1
433_182 2A P32 Tr2
433_183 2A P32 Tr3
433_184 2A P32 Tr4
433_185 2A P32 Ct
433_186 2A P32 Ct
433_187 2B P32 Tr1
433_188 2B P32 Tr2
433_189 2B P32 Tr3
433_190 2B P32 Tr4
433_191 2B P32 Ct
433_192 2B P32 Ct
433_193 3A P32 Tr1
433_194 3A P32 Tr2
433_195 3A P32 Tr3
433_196 3A P32 Tr4
433_197 3A P32 Ct
433_198 3A P32 Ct
433_199 3B P32 Tr1
433_200 3B P32 Tr2
433_201 3B P32 Tr3
433_202 3B P32 Tr4
433_203 3B P32 Ct
433_204 3B P32 Ct

My question is 

1. whether I can calculate coefficient within each block(subject)

2. how to estimate which subject is more sensitive to passage and treatment?

 

Thanks

 

 

limma • 1.2k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 19 hours ago
The city by the bay

To answer your first question, you need to set up a design matrix with subject-specific coefficients. You haven't described how you intend to model the experimental factors, but I would probably do something like this:

design <- model.matrix(~0 + Passage:Subject + Treatment:Subject)

Looking at colnames(design) gives:

 [1] "PassageP23:Subject1A"   "PassageP32:Subject1A"   "PassageP8:Subject1A"   
 [4] "PassageP23:Subject1B"   "PassageP32:Subject1B"   "PassageP8:Subject1B"   
 [7] "PassageP23:Subject2A"   "PassageP32:Subject2A"   "PassageP8:Subject2A"   
[10] "PassageP23:Subject2B"   "PassageP32:Subject2B"   "PassageP8:Subject2B"   
[13] "PassageP23:Subject3A"   "PassageP32:Subject3A"   "PassageP8:Subject3A"   
[16] "PassageP23:Subject3B"   "PassageP32:Subject3B"   "PassageP8:Subject3B"   
[19] "Subject1A:TreatmentTr1" "Subject1B:TreatmentTr1" "Subject2A:TreatmentTr1"
[22] "Subject2B:TreatmentTr1" "Subject3A:TreatmentTr1" "Subject3B:TreatmentTr1"
[25] "Subject1A:TreatmentTr2" "Subject1B:TreatmentTr2" "Subject2A:TreatmentTr2"
[28] "Subject2B:TreatmentTr2" "Subject3A:TreatmentTr2" "Subject3B:TreatmentTr2"
[31] "Subject1A:TreatmentTr3" "Subject1B:TreatmentTr3" "Subject2A:TreatmentTr3"
[34] "Subject2B:TreatmentTr3" "Subject3A:TreatmentTr3" "Subject3B:TreatmentTr3"
[37] "Subject1A:TreatmentTr4" "Subject1B:TreatmentTr4" "Subject2A:TreatmentTr4"
[40] "Subject2B:TreatmentTr4" "Subject3A:TreatmentTr4" "Subject3B:TreatmentTr4"

The first 18 coefficients represent expression of the control samples for each subject/passage combination. The remaining coefficients represent the log-fold change of the named treatment over the control for each subject. (We assume that, for each subject, the treatment effect is the same for each passage. You don't really have enough residual d.f. to allow for passage/treatment-specific treatment effects within each subject.) In this manner, you can compute coefficients for each level of the Subject blocking factor.

For your second question, I don't know what you mean by "sensitivity", but I'll assume that you want to find which subject has more DE genes across passages or across treatments. To find DE across passages for, say, subject 1A, you could do something like the below (replace colons with underscores in colnames(design), otherwise makeContrasts will complain):

con <- makeContrasts(PassageP23_Subject1A - PassageP32_Subject1A,
   PassageP8_Subject1A - PassageP32_Subject1A, levels=design)

This will test for DE between any of the passages for subject 1A. To test for DE across treatments, you would simply do:

con <- makeContrasts(Subject1A_TreatmentTr1, Subject1A_TreatmentTr2,
    Subject1A_TreatmentTr3, Subject1A_TreatmentTr4, levels=design)

... which will test if any of the treatments have a non-zero effect relative to the control for subject 1A. If you repeat this process for all of the subjects, you can figure out which subject is most "sensitive" based on which one has the most DE genes.

ADD COMMENT
0
Entering edit mode

Thanks Aaron. I just wonder for example the 1st col is 

PassageP23:Subject1A, in design matrix is label number one for six chips which are 
433_133 1A P23 Tr1
433_134 1A P23 Tr2
433_135 1A P23 Tr3
433_136 1A P23 Tr4
433_137 1A P23 Ct
433_138 1A P23

Ct

 

 but the meaning is at control level the coefficient of P23 within 1A block, is it right? If I want to test how many DE between P23 and P8 within 1A block at control level , I should compare 

PassageP23_Subject1A-PassageP8_Subject1A

?  Thanks again.

ADD REPLY
1
Entering edit mode

Yes, and sort of yes. The second answer is a bit complicated; remember, we've assumed that the effect of each treatment is the same at each passage in order to use the additive model. If there was a DE (let's say upregulated) gene between P23 and P8 within 1A at treatment 1 only, this could not be directly handled by the additive model because we don't have passage-/treatment-specific coefficients. Instead, it'll be handled by increasing the PassageP23:Subject1A coefficient and decreasing PassageP8:Subject1A (but only to a small extent, otherwise the fit to the expression values for the other treatments would be compromised). In summary, your contrast above will test for DE between P23 and P8 at all levels of the Treatment factor, not just at the control level. This is slightly non-intuitive as it seems to contradict the interpretation of the coefficients that I stated before; it's caused by the fact that all coefficients are interdependent in an additive model.

ADD REPLY
0
Entering edit mode

OK.

PassageP23_Subject1A-PassageP8_Subject1A means at the average of T1,T2,T3,T4,  two controls, which is difference between P23 and P8 within 1A.  

 

ADD REPLY
0
Entering edit mode

Hi, further question

the meaning of 

Subject1A_TreatmentTr1 coefficient from your answer is within 1A block average of P8,P23,P32 at treatment1 minus average of P8,P23,P32 at control, is it right?

If I want to get every coefficient for each P8, P23 and P32 treatment1 minus control within 1A, can I get that and how?

ADD REPLY
0
Entering edit mode

First question: more or less. It's more complicated as it depends on the other treatments, but close enough.

Second question: the additive model does not allow for this. You're asking for a coefficient for each passage/treatment/subject combination. If you make a model to do this, the fitted coefficients will be exactly equal to the observed expression values because you don't have any residual d.f. for each combination (except for the control samples, where the coefficient will be equal to the average of the two controls for each combination). So, if you want these log-fold changes, you might as well just subtract the observed expression for each treatment from the control average within each subject/passage combination.

ADD REPLY
0
Entering edit mode

Hi Aaron,

Greetings of the day. I need a help from you. Actually, I am doing the same thing as you suggested . This is the code

design <- model.matrix(~0 + sam$Soil_origin:sam$Plant_compartments + sam$Plant_genotype:sam$Plant_compartments)

Here, sam is the sample metadata. This is the overview of sam

enter image description here

But colnames(design) give this output where design for "G1" genotype is missing

[1] "sam$Soil_originC:sam$Plant_compartmentsL"     
 [2] "sam$Soil_originF:sam$Plant_compartmentsL"     
 [3] "sam$Soil_originC:sam$Plant_compartmentsN"     
 [4] "sam$Soil_originF:sam$Plant_compartmentsN"     
 [5] "sam$Soil_originC:sam$Plant_compartmentsR"     
 [6] "sam$Soil_originF:sam$Plant_compartmentsR"     
 [7] "sam$Soil_originC:sam$Plant_compartmentsS"     
 [8] "sam$Soil_originF:sam$Plant_compartmentsS"     
 [9] "sam$Plant_compartmentsL:sam$Plant_genotypeG27"
[10] "sam$Plant_compartmentsN:sam$Plant_genotypeG27"
[11] "sam$Plant_compartmentsR:sam$Plant_genotypeG27"
[12] "sam$Plant_compartmentsS:sam$Plant_genotypeG27"
[13] "sam$Plant_compartmentsL:sam$Plant_genotypeG96"
[14] "sam$Plant_compartmentsN:sam$Plant_genotypeG96"
[15] "sam$Plant_compartmentsR:sam$Plant_genotypeG96"
[16] "sam$Plant_compartmentsS:sam$Plant_genotypeG96"

So, I request you to kindly help me regarding this concern

ADD REPLY

Login before adding your answer.

Traffic: 425 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6