Question

multilevel limma analysis

1

Entering edit mode

zzjerryzhang ▴ 10

@zzjerryzhang-11066

Last seen 7.8 years ago

Hello,
I have 107 RNAseq (1 is bad from 108 samples) samples for analysis using limma. We start from one line of stem cell, then clone it to 3 replicates lines(represent by 1,2,3), and each replicate we did 2 sister replicates (represent by A,B)again, so totally we have 6 subjects cell line. for each subject we did treatment experiment at 3 Passage time (represent by P8,P23,P32),at each Passage time and each subject, we did 6 treatment experiment with control (represent by T1,T2,T3,T4 and control1, control2). T1,T2,T3,T4 use different medicine concentration increasingly. control1, control2 are without medicine.

the target is :

Sample	Subject	Passage	Treatment
433_97	1A	P8	Tr1
433_98	1A	P8	Tr2
433_99	1A	P8	Tr3
433_100	1A	P8	Tr4
433_101	1A	P8	Ct
433_102	1A	P8	Ct
433_103	1B	P8	Tr1
433_104	1B	P8	Tr2
433_105	1B	P8	Tr3
433_106	1B	P8	Tr4
433_107	1B	P8	Ct
433_108	1B	P8	Ct
433_109	2A	P8	Tr1
433_110	2A	P8	Tr2
433_111	2A	P8	Tr3
433_112	2A	P8	Tr4
433_113	2A	P8	Ct
433_114	2A	P8	Ct
433_115	2B	P8	Tr1
433_116	2B	P8	Tr2
433_117	2B	P8	Tr3
433_118	2B	P8	Tr4
433_119	2B	P8	Ct
433_120	2B	P8	Ct
433_121	3A	P8	Tr1
433_122	3A	P8	Tr2
433_123	3A	P8	Tr3
433_124	3A	P8	Tr4
433_126	3A	P8	Ct
433_127	3B	P8	Tr1
433_128	3B	P8	Tr2
433_129	3B	P8	Tr3
433_130	3B	P8	Tr4
433_131	3B	P8	Ct
433_132	3B	P8	Ct
433_133	1A	P23	Tr1
433_134	1A	P23	Tr2
433_135	1A	P23	Tr3
433_136	1A	P23	Tr4
433_137	1A	P23	Ct
433_138	1A	P23	Ct
433_139	1B	P23	Tr1
433_140	1B	P23	Tr2
433_141	1B	P23	Tr3
433_142	1B	P23	Tr4
433_143	1B	P23	Ct
433_144	1B	P23	Ct
433_145	2A	P23	Tr1
433_146	2A	P23	Tr2
433_147	2A	P23	Tr3
433_148	2A	P23	Tr4
433_149	2A	P23	Ct
433_150	2A	P23	Ct
433_151	2B	P23	Tr1
433_152	2B	P23	Tr2
433_153	2B	P23	Tr3
433_154	2B	P23	Tr4
433_155	2B	P23	Ct
433_156	2B	P23	Ct
433_157	3A	P23	Tr1
433_158	3A	P23	Tr2
433_159	3A	P23	Tr3
433_160	3A	P23	Tr4
433_161	3A	P23	Ct
433_162	3A	P23	Ct
433_163	3B	P23	Tr1
433_164	3B	P23	Tr2
433_165	3B	P23	Tr3
433_166	3B	P23	Tr4
433_167	3B	P23	Ct
433_168	3B	P23	Ct
433_169	1A	P32	Tr1
433_170	1A	P32	Tr2
433_171	1A	P32	Tr3
433_172	1A	P32	Tr4
433_173	1A	P32	Ct
433_174	1A	P32	Ct
433_175	1B	P32	Tr1
433_176	1B	P32	Tr2
433_177	1B	P32	Tr3
433_178	1B	P32	Tr4
433_179	1B	P32	Ct
433_180	1B	P32	Ct
433_181	2A	P32	Tr1
433_182	2A	P32	Tr2
433_183	2A	P32	Tr3
433_184	2A	P32	Tr4
433_185	2A	P32	Ct
433_186	2A	P32	Ct
433_187	2B	P32	Tr1
433_188	2B	P32	Tr2
433_189	2B	P32	Tr3
433_190	2B	P32	Tr4
433_191	2B	P32	Ct
433_192	2B	P32	Ct
433_193	3A	P32	Tr1
433_194	3A	P32	Tr2
433_195	3A	P32	Tr3
433_196	3A	P32	Tr4
433_197	3A	P32	Ct
433_198	3A	P32	Ct
433_199	3B	P32	Tr1
433_200	3B	P32	Tr2
433_201	3B	P32	Tr3
433_202	3B	P32	Tr4
433_203	3B	P32	Ct
433_204	3B	P32	Ct

My question is

1. whether I can calculate coefficient within each block(subject)

2. how to estimate which subject is more sensitive to passage and treatment?

Thanks

limma • 1.2k views

ADD COMMENT • link updated 14 months ago by Rishav ▴ 20 • written 7.8 years ago by zzjerryzhang ▴ 10

score 2 · Answer 1 · 2016-07-09

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 19 hours ago

The city by the bay

To answer your first question, you need to set up a design matrix with subject-specific coefficients. You haven't described how you intend to model the experimental factors, but I would probably do something like this:

design <- model.matrix(~0 + Passage:Subject + Treatment:Subject)

Looking at colnames(design) gives:

 [1] "PassageP23:Subject1A"   "PassageP32:Subject1A"   "PassageP8:Subject1A"   
 [4] "PassageP23:Subject1B"   "PassageP32:Subject1B"   "PassageP8:Subject1B"   
 [7] "PassageP23:Subject2A"   "PassageP32:Subject2A"   "PassageP8:Subject2A"   
[10] "PassageP23:Subject2B"   "PassageP32:Subject2B"   "PassageP8:Subject2B"   
[13] "PassageP23:Subject3A"   "PassageP32:Subject3A"   "PassageP8:Subject3A"   
[16] "PassageP23:Subject3B"   "PassageP32:Subject3B"   "PassageP8:Subject3B"   
[19] "Subject1A:TreatmentTr1" "Subject1B:TreatmentTr1" "Subject2A:TreatmentTr1"
[22] "Subject2B:TreatmentTr1" "Subject3A:TreatmentTr1" "Subject3B:TreatmentTr1"
[25] "Subject1A:TreatmentTr2" "Subject1B:TreatmentTr2" "Subject2A:TreatmentTr2"
[28] "Subject2B:TreatmentTr2" "Subject3A:TreatmentTr2" "Subject3B:TreatmentTr2"
[31] "Subject1A:TreatmentTr3" "Subject1B:TreatmentTr3" "Subject2A:TreatmentTr3"
[34] "Subject2B:TreatmentTr3" "Subject3A:TreatmentTr3" "Subject3B:TreatmentTr3"
[37] "Subject1A:TreatmentTr4" "Subject1B:TreatmentTr4" "Subject2A:TreatmentTr4"
[40] "Subject2B:TreatmentTr4" "Subject3A:TreatmentTr4" "Subject3B:TreatmentTr4"

The first 18 coefficients represent expression of the control samples for each subject/passage combination. The remaining coefficients represent the log-fold change of the named treatment over the control for each subject. (We assume that, for each subject, the treatment effect is the same for each passage. You don't really have enough residual d.f. to allow for passage/treatment-specific treatment effects within each subject.) In this manner, you can compute coefficients for each level of the Subject blocking factor.

For your second question, I don't know what you mean by "sensitivity", but I'll assume that you want to find which subject has more DE genes across passages or across treatments. To find DE across passages for, say, subject 1A, you could do something like the below (replace colons with underscores in colnames(design), otherwise makeContrasts will complain):

con <- makeContrasts(PassageP23_Subject1A - PassageP32_Subject1A,
   PassageP8_Subject1A - PassageP32_Subject1A, levels=design)

This will test for DE between any of the passages for subject 1A. To test for DE across treatments, you would simply do:

con <- makeContrasts(Subject1A_TreatmentTr1, Subject1A_TreatmentTr2,
    Subject1A_TreatmentTr3, Subject1A_TreatmentTr4, levels=design)

... which will test if any of the treatments have a non-zero effect relative to the control for subject 1A. If you repeat this process for all of the subjects, you can figure out which subject is most "sensitive" based on which one has the most DE genes.

ADD COMMENT • link 7.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks Aaron. I just wonder for example the 1st col is

PassageP23:Subject1A, in design matrix is label number one for six chips which are

433_133	1A	P23	Tr1
433_134	1A	P23	Tr2
433_135	1A	P23	Tr3
433_136	1A	P23	Tr4
433_137	1A	P23	Ct
433_138	1A	P23	Ct

but the meaning is at control level the coefficient of P23 within 1A block, is it right? If I want to test how many DE between P23 and P8 within 1A block at control level , I should compare

PassageP23_Subject1A-PassageP8_Subject1A

? Thanks again.

ADD REPLY • link 7.8 years ago zzjerryzhang ▴ 10

1

Entering edit mode

Yes, and sort of yes. The second answer is a bit complicated; remember, we've assumed that the effect of each treatment is the same at each passage in order to use the additive model. If there was a DE (let's say upregulated) gene between P23 and P8 within 1A at treatment 1 only, this could not be directly handled by the additive model because we don't have passage-/treatment-specific coefficients. Instead, it'll be handled by increasing the PassageP23:Subject1A coefficient and decreasing PassageP8:Subject1A (but only to a small extent, otherwise the fit to the expression values for the other treatments would be compromised). In summary, your contrast above will test for DE between P23 and P8 at all levels of the Treatment factor, not just at the control level. This is slightly non-intuitive as it seems to contradict the interpretation of the coefficients that I stated before; it's caused by the fact that all coefficients are interdependent in an additive model.

ADD REPLY • link 7.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

OK.

PassageP23_Subject1A-PassageP8_Subject1A means at the average of T1,T2,T3,T4,  two controls, which is difference between P23 and P8 within 1A.

ADD REPLY • link 7.8 years ago zzjerryzhang ▴ 10

0

Entering edit mode

Hi, further question

the meaning of

Subject1A_TreatmentTr1 coefficient from your answer is within 1A block average of P8,P23,P32 at treatment1 minus average of P8,P23,P32 at control, is it right?

If I want to get every coefficient for each P8, P23 and P32 treatment1 minus control within 1A, can I get that and how?

ADD REPLY • link 7.8 years ago zzjerryzhang ▴ 10

0

Entering edit mode

First question: more or less. It's more complicated as it depends on the other treatments, but close enough.

Second question: the additive model does not allow for this. You're asking for a coefficient for each passage/treatment/subject combination. If you make a model to do this, the fitted coefficients will be exactly equal to the observed expression values because you don't have any residual d.f. for each combination (except for the control samples, where the coefficient will be equal to the average of the two controls for each combination). So, if you want these log-fold changes, you might as well just subtract the observed expression for each treatment from the control average within each subject/passage combination.

ADD REPLY • link 7.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron,

Greetings of the day. I need a help from you. Actually, I am doing the same thing as you suggested . This is the code

design <- model.matrix(~0 + sam$Soil_origin:sam$Plant_compartments + sam$Plant_genotype:sam$Plant_compartments)

Here, sam is the sample metadata. This is the overview of sam

enter image description here

But colnames(design) give this output where design for "G1" genotype is missing

[1] "sam$Soil_originC:sam$Plant_compartmentsL"     
 [2] "sam$Soil_originF:sam$Plant_compartmentsL"     
 [3] "sam$Soil_originC:sam$Plant_compartmentsN"     
 [4] "sam$Soil_originF:sam$Plant_compartmentsN"     
 [5] "sam$Soil_originC:sam$Plant_compartmentsR"     
 [6] "sam$Soil_originF:sam$Plant_compartmentsR"     
 [7] "sam$Soil_originC:sam$Plant_compartmentsS"     
 [8] "sam$Soil_originF:sam$Plant_compartmentsS"     
 [9] "sam$Plant_compartmentsL:sam$Plant_genotypeG27"
[10] "sam$Plant_compartmentsN:sam$Plant_genotypeG27"
[11] "sam$Plant_compartmentsR:sam$Plant_genotypeG27"
[12] "sam$Plant_compartmentsS:sam$Plant_genotypeG27"
[13] "sam$Plant_compartmentsL:sam$Plant_genotypeG96"
[14] "sam$Plant_compartmentsN:sam$Plant_genotypeG96"
[15] "sam$Plant_compartmentsR:sam$Plant_genotypeG96"
[16] "sam$Plant_compartmentsS:sam$Plant_genotypeG96"

So, I request you to kindly help me regarding this concern

ADD REPLY • link 14 months ago Rishav ▴ 20