Question: Correct implementation of multifactorial paired analysis in limma software package for DE expression
4.8 years ago by
svlachavas740
Greece/Athens/National Hellenic Research Foundation
svlachavas740 wrote:

Dear Bioconductor Community,

In a previous thread (https://support.bioconductor.org/p/64372/#64413) I asked about the implementation of a paired analysis, and Mr. James W. MacDonald kindly suggested a paired limma analysis, which I used in the code below:

library(limma)
conditions <- data.trusted.eset$condition
# Reverse the default (alphabetical) level order so that "Normal" is the reference level
condition <- factor(conditions, levels = levels(factor(conditions))[c(2,1)])
pairs <- factor(rep(1:13, each = 2))
design <- model.matrix(~condition+pairs)
fit <- lmFit(data.trusted.eset, design)
fit2 <- eBayes(fit)

As a next step, I wanted to implement a multifactorial analysis in limma to evaluate the interaction between two factors, condition (control & cancer tissue) and Meta_factor (metastatic & non-metastatic cancer), in order to find possible DE genes between metastatic and non-metastatic cancer samples:

f <- paste(data.trusted.eset$condition, data.trusted.eset$Meta_factor, sep=".")
f <- factor(f)
pairs <- factor(rep(1:13, each = 2))
design <- model.matrix(~0 +f +pairs)
fit <- lmFit(data.trusted.eset, design)
head(fit$coefficients)
contrast.matrix <- makeContrasts(M.CvsC="fCancer.1-fCancer.0", M.CvsControl="fCancer.1-fNormal.0", levels=design)
fitb <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fitb)

My first question: is it wrong for the statistical model that I did not use an intercept term (with an intercept I could not use the coefficient "fCancer.0" in the contrast matrix)? Secondly, as in the first analysis with just one factor, is it important to include the pairs term to get meaningful results? Finally, with topTable I get only 4 genes with adj.P.Val < 0.05. Is that necessarily wrong, or might it reflect small differences between metastatic and non-metastatic tumors?

Thank you for your patience (and please excuse any "beginner" questions, as I have only been using R & Bioconductor for a few months).

modified 4.8 years ago by Gordon Smyth39k • written 4.8 years ago by svlachavas740

Any help or suggestions?

What is data.trusted.eset$condition? What is data.trusted.eset$Meta_factor? We'd be able to give more help if you define it here, rather than us having to dig through an old thread to figure out what you're talking about.

Please excuse me, I forgot to define the variables when I first referred to them. data.trusted.eset is my ExpressionSet, and the two other variables are in its phenoData, as shown below.

pData(data.trusted.eset):
                        condition  Meta_factor
7_Tesch.CEL             Normal     0
8_Tesch.CEL             Cancer     0
0687_04_Blauth.CEL      Normal     0
0687_04_Blauth_1.CEL    Cancer     0
0863_03_Schmidt.CEL     Normal     0
0863_03_Schmidt_1.CEL   Cancer     0
0948_04_Leiber.CEL      Normal     0
0948_04_Leiber_1.CEL    Cancer     0
1043_04_Nagel.CEL       Normal     0
1043_04_Nagel_1.CEL     Cancer     0
1103_03_Braun.CEL       Normal     1
1103_03_Braun_1.CEL     Cancer     1
1234_06_Floersch.CEL    Normal     0
1234_06_Floersch_1.CEL  Cancer     0
1235_06_Hey.CEL         Normal     0
1235_06_Hey_1.CEL       Cancer     0
1236_06_Liebich.CEL     Normal     1
1236_06_Liebich_1.CEL   Cancer     1
1410_03_Urbaniak.CEL    Normal     1
1410_03_Urbaniak_1.CEL  Cancer     1
1430_04_Patschke.CEL    Normal     0
1430_04_Patschke_1.CEL  Cancer     0
1518_03_Dege.CEL        Normal     0
1518_03_Dege_1.CEL      Cancer     0
1554_03_Gemmer.CEL      Normal     1
1554_03_Gemmer_1.CEL    Cancer     1

Answer: Correct implementation of multifactorial paired analysis in limma software packa
4.8 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

You can no longer use a paired design, because Meta_factor is a between-subject factor and paired analyses only deal with within-subject comparisons. Instead you have to treat subject (pair) as a random effect, which is done in limma using the duplicateCorrelation function.

The limma User's Guide calls your experiment a "Multi-level experiment". Just look up Section 9.7 in the User's Guide and follow the instructions.
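For readers following along, here is a minimal sketch of that multi-level approach, assuming the layout in the pData listing from the question (13 patients, Normal then Cancer for each, with patients 6, 9, 10 and 13 metastatic). The expression matrix `y` is simulated purely so the example runs; with real data you would pass data.trusted.eset instead. The names `subject`, `treat` and the contrast labels are illustrative choices, not taken from the thread.

```r
library(limma)

set.seed(1)

# Sample annotation reconstructed from the pData listing in the question
subject   <- factor(rep(1:13, each = 2))
condition <- rep(c("Normal", "Cancer"), times = 13)
meta      <- rep(c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1), each = 2)

# One combined factor with four groups, and a design without an intercept
treat  <- factor(paste(condition, meta, sep = "."))
design <- model.matrix(~0 + treat)
colnames(design) <- levels(treat)

# Simulated log-expression values (assumption: stand-in for the real ExpressionSet),
# with a shared per-patient effect so that paired samples are correlated
n_genes <- 200
patient_effect <- matrix(rnorm(n_genes * 13), nrow = n_genes)[, as.integer(subject)]
y <- matrix(rnorm(n_genes * 26), nrow = n_genes) + patient_effect
rownames(y) <- paste0("gene", seq_len(n_genes))

# Treat patient as a random effect: estimate the within-patient correlation,
# then pass it to lmFit together with the blocking variable
corfit <- duplicateCorrelation(y, design, block = subject)
fit <- lmFit(y, design, block = subject,
             correlation = corfit$consensus.correlation)

# Contrasts between and within patients
cm <- makeContrasts(
  MetaVsNonMeta = Cancer.1 - Cancer.0,   # between-patient comparison
  NonMetaCvsN   = Cancer.0 - Normal.0,   # within-patient comparison
  MetaCvsN      = Cancer.1 - Normal.1,   # within-patient comparison
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cm))
topTable(fit2, coef = "MetaVsNonMeta", number = 5)
```

With this setup the metastatic versus non-metastatic comparison stays estimable, because patient is no longer a fixed effect in the design matrix.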

Thank you for your valuable information and direction. After checking the relevant section of the User's Guide, I would like to ask one more important question: since I want to block on the patients as you mentioned above, and the corresponding variable in the guide is referred to as subject, I suppose I have to create a similar variable called pairs (or subject) like this:

pairs <- factor(rep(1:13, each = 2))

and then add it to the phenoData of my eset, which plays the role of "targets" in Section 9.7?

Answer: Correct implementation of multifactorial paired analysis in limma software packa
4.8 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Your current design matrix is a bit tricky to interpret. I'd suggest doing something like this:

> f <- relevel(f, ref="Normal.0")
> design <- model.matrix(~0 + pairs + f)
> design <- design[,-ncol(design)]
> colnames(design)
[1] "pairs1"    "pairs2"    "pairs3"    "pairs4"    "pairs5"    "pairs6"
[7] "pairs7"    "pairs8"    "pairs9"    "pairs10"   "pairs11"   "pairs12"
[13] "pairs13"   "fCancer.0" "fCancer.1"


The first 13 coefficients describe the patient blocking effect. Coefficient 14 describes the log-fold change of Cancer.0 over Normal.0. Coefficient 15 describes the log-fold change of Cancer.1 over Normal.1.
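To see why the last column is dropped, the design can be rebuilt with base R alone. This is only a sketch: `pairs` and `f` are reconstructed here from the pData listing in the question (patients 6, 9, 10 and 13 metastatic), and the name `full` is introduced just for illustration.

```r
# Sample annotation reconstructed from the pData listing in the question
pairs     <- factor(rep(1:13, each = 2))
condition <- rep(c("Normal", "Cancer"), times = 13)
meta      <- rep(c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1), each = 2)
f <- relevel(factor(paste(condition, meta, sep = ".")), ref = "Normal.0")

# Full design: 13 patient columns plus fCancer.0, fCancer.1 and fNormal.1
full <- model.matrix(~0 + pairs + f)
colnames(full)
c(columns = ncol(full), rank = qr(full)$rank)   # one more column than the rank

# Dropping the last column (fNormal.1) leaves a full-rank design
design <- full[, -ncol(full)]
c(columns = ncol(design), rank = qr(design)$rank)

# Samples that each cancer coefficient applies to
which(design[, "fCancer.0"] == 1)   # cancer samples of non-metastatic patients
which(design[, "fCancer.1"] == 1)   # cancer samples of metastatic patients
```

The fNormal.1 column is redundant because it can be written as the sum of the metastatic patients' columns minus fCancer.1, so removing it loses no information.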

Now, you can't directly compare between Cancer.1 and Cancer.0, because any differences will be absorbed into the patient blocking factors (this is also true of your original design). Instead, you can ask whether the log-fold change between metastatic cancer and paired normal samples is equal to that between non-metastatic cancer and their normal samples, by using:

contrast.matrix <- makeContrasts(M.CvsC="fCancer.1-fCancer.0", levels=design)

Of course, you can also compare between each cancer and its normal sample directly.
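As a rough end-to-end sketch of both options, the following fits the paired design and extracts the interaction contrast as well as the two cancer versus normal coefficients. The matrix `y` is simulated noise standing in for data.trusted.eset, the sample layout is reconstructed from the pData listing in the question, and the object names (`fit_int`, `fit_all`, `tt_int`) are hypothetical.

```r
library(limma)

set.seed(42)

# Sample annotation reconstructed from the pData listing in the question
pairs     <- factor(rep(1:13, each = 2))
condition <- rep(c("Normal", "Cancer"), times = 13)
meta      <- rep(c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1), each = 2)
f <- relevel(factor(paste(condition, meta, sep = ".")), ref = "Normal.0")

# Paired design with the redundant fNormal.1 column removed
design <- model.matrix(~0 + pairs + f)
design <- design[, -ncol(design)]

# Simulated log-expression matrix (assumption: stand-in for the real data)
y <- matrix(rnorm(100 * 26), nrow = 100,
            dimnames = list(paste0("gene", 1:100), NULL))

fit <- lmFit(y, design)

# Is the cancer vs normal log-fold change different in metastatic patients?
cm <- makeContrasts(MetaVsNonMeta = fCancer.1 - fCancer.0, levels = design)
fit_int <- eBayes(contrasts.fit(fit, cm))
tt_int  <- topTable(fit_int, coef = "MetaVsNonMeta", number = 5)
tt_int

# Cancer vs paired normal within each group, taken straight from the coefficients
fit_all <- eBayes(fit)
topTable(fit_all, coef = "fCancer.0", number = 5)   # non-metastatic patients
topTable(fit_all, coef = "fCancer.1", number = 5)   # metastatic patients
```

Because the data here are pure noise, no genes should be expected to reach significance; the point is only the shape of the calls.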

If you choose not to include the pairing in the design, you will be treating different patients as biological replicates. This will likely inflate the variance estimates, which will reduce power to detect DE genes. On the other hand, it means that your dataset simplifies to a three-group design, so you can compare between metastatic and non-metastatic cancer samples directly.

As to the small number of DE genes, that depends on the variance of your data, the size of the fold-changes between normal and cancer samples for each patient, and the parametrization of your design and contrast matrices. For example, the contrast between Cancer.1 and Cancer.0 is unlikely to pick up any DE genes, as any changes will be absorbed by the blocking factors.
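The absorption by the blocking factors can be checked with base R. In this small sketch (sample layout reconstructed from the pData listing in the question; `P` is a name introduced for illustration), the metastasis indicator adds nothing to the rank once the patient columns are present, which is why a between-patient effect cannot be estimated separately in a fixed-effects paired design.

```r
# Patient factor and metastasis status, reconstructed from the pData listing
pairs <- factor(rep(1:13, each = 2))
meta  <- rep(c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1), each = 2)

# One indicator column per patient
P <- model.matrix(~0 + pairs)

# Adding the metastasis indicator does not increase the rank
qr(P)$rank
qr(cbind(P, meta))$rank

# The indicator is exactly the sum of the metastatic patients' columns
all(meta == rowSums(P[, c(6, 9, 10, 13)]))
```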