Question: Correct implementation of multifactorial paired analysis in limma software package for DE expression
1
gravatar for svlachavas
4.4 years ago by
svlachavas660
Greece/Athens/National Hellenic Research Foundation
svlachavas660 wrote:

Dear Bioconductor Community,

in a previous thread( https://support.bioconductor.org/p/64372/#64413) i have asked about the implementation of a paired analysis, and Mr. James W. MacDonald kindly provided me an suggestion of a paired limma analysis, which i used on the code below :

library(limma)
conditions <- data.trusted.eset$condition
condition <- factor(conditions, levels(condition)[c(2,1)])
pairs <- factor(rep(1:13, each = 2))
design <- model.matrix(~condition+pairs)
fit <- lmFit(data.trusted.eset, design)
fit2 <- eBayes(fit)

In the next step i wanted to implement a multifactorial analysis in limma to evaluate the interaction between two factors, condition(control & cancer tissue) & meta_factor(metastatic & non-metastatic cancer), in order to evaluate possible DE genes between metastatic cancer and non-metastatic cancer samples:

f <- paste(data.trusted.eset$condition, data.trusted.eset$Meta_factor, sep=".")
f <- factor(f)
pairs <- factor(rep(1:13, each = 2))
design <- model.matrix(~0 +f +pairs)

fit <- lmFit(data.trusted.eset, design)
head(names$coefficients)
contrast.matrix <- makeContrasts(M.CvsC="fCancer.1-fCancer.0", M.CvsControl="fCancer.1-fNormal.0", levels=design)
fitb <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fitb)

My first question is if it is wrong for the statistical model that i didnt use an intercept term(because with the intercept term i couldnt use the coefficient "fCancer.0" for the contrast.matrix) ? And secondly, is it also important as with the first analysis with just one factor, to include the term pairs to get meaningful results ? My last question is that with topTable, i get only 4 genes with adj.p,Val < 0.05. Is that necessarily wrong or it might illustrate possible small differences between metastatic and non-metastatic tumors ?

Thank you for your patience(& please excuse me for any "begginer" questions, as i have only using R & Bioconductor for a few months).

 

ADD COMMENTlink modified 4.4 years ago by Gordon Smyth38k • written 4.4 years ago by svlachavas660

any help or suggestions ?

ADD REPLYlink written 4.4 years ago by svlachavas660

What is data.trusted.eset$condition? What is data.trusted.eset$Meta_factor? We'd be able to give more help if you define it here, rather than us having to dig through an old thread to figure out what you're talking about.

ADD REPLYlink written 4.4 years ago by Aaron Lun24k

Please excuse me, because i forgot to mention it in the start when i was refering to the variables. Data.trusted.eset is my ExpressionSet, and the two other variables are in the phenoData object as above

 

pData(data.trusted.eset) :
                                   condition         Meta_factor
7_Tesch.CEL                    Normal           0
8_Tesch.CEL                   Cancer            0
0687_04_Blauth.CEL        Normal            0
0687_04_Blauth_1.CEL      Cancer           0
0863_03_Schmidt.CEL       Normal           0
0863_03_Schmidt_1.CEL     Cancer          0
0948_04_Leiber.CEL        Normal              0
0948_04_Leiber_1.CEL      Cancer            0
1043_04_Nagel.CEL         Normal             0
1043_04_Nagel_1.CEL      Cancer             0
1103_03_Braun.CEL         Normal             1
1103_03_Braun_1.CEL      Cancer             1
1234_06_Floersch.CEL      Normal             0
1234_06_Floersch_1.CEL    Cancer           0
1235_06_Hey.CEL           Normal              0
1235_06_Hey_1.CEL         Cancer             0
1236_06_Liebich.CEL       Normal              1
1236_06_Liebich_1.CEL     Cancer           1
1410_03_Urbaniak.CEL      Normal           1
1410_03_Urbaniak_1.CEL    Cancer          1
1430_04_Patschke.CEL      Normal           0
1430_04_Patschke_1.CEL    Cancer          0
1518_03_Dege.CEL          Normal              0
1518_03_Dege_1.CEL        Cancer             0
1554_03_Gemmer.CEL        Normal            1
1554_03_Gemmer_1.CEL      Cancer          1

ADD REPLYlink written 4.4 years ago by svlachavas660
Answer: Correct implementation of multifactorial paired analysis in limma software packa
2
gravatar for Gordon Smyth
4.4 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

You can no longer use a paired design, because Meta_factor is a between-subject factor and paired analyses only deal with within-subject comparisons. Instead you have to treat subject (pair) as a random effect, which is done in limma using the duplicateCorrelation function.

The limma User's Guide calls your experiment a "Multi-level experiment". Just look up Section 9.7 in the User's Guide and follow the instructions.

ADD COMMENTlink written 4.4 years ago by Gordon Smyth38k

Thank you for your valuable information and direction to solve this problem. As i checked the vignette in usersguide regarding the specific topic, i would like to ask one more question that is very important: as i want to block on the patients like you mentioned above, the corresponding variable in the vignette was referred as subject. So i suppose i have to create a similar variable called pairs(or subject) like this: 

pairs <- factor(rep(1:13, each = 2)) 

and then enter it into the phenoData of my eset, which is described similarly as "targets" in the section 9.7 ??

ADD REPLYlink written 4.4 years ago by svlachavas660
Answer: Correct implementation of multifactorial paired analysis in limma software packa
1
gravatar for Aaron Lun
4.4 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

Your current design matrix is a bit tricky to interpret. I'd suggest doing something like this:

> f <- relevel(f, ref="Normal.0")
> design <- model.matrix(~0 + pairs + f)
> design <- design[,-ncol(design)]
> colnames(design)
 [1] "pairs1"    "pairs2"    "pairs3"    "pairs4"    "pairs5"    "pairs6"   
 [7] "pairs7"    "pairs8"    "pairs9"    "pairs10"   "pairs11"   "pairs12"  
[13] "pairs13"   "fCancer.0" "fCancer.1"

The first 13 coefficients describe the patient blocking effect. Coefficient 14 describes the fold change of Cancer.0 over Normal.0. Coefficient 15 describes the fold change of Cancer.1 over Normal.1.

Now, you can't directly compare between Cancer.1 and Cancer.0, because any differences will be absorbed into the patient blocking factors (this is also true of your original design). Instead, you can ask whether the log-fold change  between metastatic cancer and paired normal samples is equal to that between non-metastatic cancer and their normal samples, by using:

contrast.matrix <- makeContrasts(M.CvsC="fCancer.1-fCancer.0", level=design)

Of course, you can also compare between each cancer and its normal sample directly.

If you don't choose to include the pairing in design, you're going to be treating different patients as biological replicates. This will likely result in large dispersion estimates, which will confound detection of DE genes. On the other hand, it means that your dataset simplifies to a three-group design, so you can compare between metastatic and non-metastatic cancer samples directly.

As to the small number of DE genes; that depends on the variance of your data, the size of the fold-changes between normal and cancer samples for each patient, and the parametrization of your design and contrast matrices. For example, the contrast between Cancer.1 and Cancer.0 is unlikely to pick up any DE genes, as any changes will be absorbed by the blocking factors.

ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by Aaron Lun24k

thank you also for your comment and useful explanation in how to deal with this specific approach

ADD REPLYlink written 4.4 years ago by svlachavas660
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 153 users visited in the last hour