Appropriate selection of DE list in limma before and after implementation of arrayWeights in conjunction with duplicateCorrelation functions

svlachavas ▴ 830

@svlachavas-7225

Germany/Heidelberg/German Cancer Resear…

Dear Bioconductor Community,

After re-evaluating my previous analyses for one of my current projects and discussing them with my lab coordinators (https://support.bioconductor.org/p/71730/#71863), I decided to go back and perform some additional requested statistical comparisons on my ExpressionSet. So far I have done the general paired comparison of cancer vs. adjacent control samples, and also the anatomic tumor location comparison (both with the essential feedback and help of Gordon, Aaron and other members of the group).

I then proceeded to two "separate" analyses: comparing only the primary cancer samples vs. their adjacent controls, and comparing only the "metastatic" samples (primary colorectal cancers that also had synchronous metastases) vs. their respective controls, in order to look for common functional enrichment modules and overlapping genes at the end of the analysis. Because Meta_factor is defined per patient (i.e. the value 1 is assigned to both the "metastatic" and the control sample of a patient, so it is a between-subject factor), while Disease is a within-subject factor, I used duplicateCorrelation to be able to perform the comparisons below:

               Disease  Meta_factor
St_1_WL57.CEL  Normal   0
St_2_WL57.CEL  Cancer   0
St_N_EC59.CEL  Normal   0
St_T_EC59.CEL  Cancer   0
St_N_EJ58.CEL  Normal   0
St_T_EJ58.CEL  Cancer   0
.....

(Just an illustration of the phenotype object.)
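The blocking factor used further down, rep(1:30, each = 2), assumes that the rows of the phenotype table are ordered so that the two samples of each patient are adjacent. A minimal base R sketch of how that assumption could be checked is given here; the toy table, its sample names and the `pair` column are hypothetical and only mirror the layout above.

```r
# Toy phenotype table mirroring the layout above (sample names are hypothetical)
pheno <- data.frame(
  Disease     = rep(c("Normal", "Cancer"), times = 4),
  Meta_factor = rep(c(0, 0, 1, 1), each = 2),
  row.names   = paste0("S", 1:8, ".CEL")
)

# One block per patient, assuming consecutive rows belong to the same patient
pheno$pair <- factor(rep(1:4, each = 2))

# Each patient should contribute exactly one Normal and one Cancer sample
stopifnot(all(table(pheno$pair, pheno$Disease) == 1))

# Meta_factor should be constant within a patient (between-subject factor)
n_values <- tapply(pheno$Meta_factor, pheno$pair, function(x) length(unique(x)))
stopifnot(all(n_values == 1))

# Number of samples in each Disease x Meta_factor cell
group_sizes <- table(pheno$Disease, pheno$Meta_factor)
print(group_sizes)
```

On the real data the same checks would be run on pData(eset.2) with the actual pairing vector; if either stopifnot() fails, the row order does not match the assumed pairing.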

In my first approach, I continued with:

> condition <- factor(eset.2$Disease, levels=c("Normal","Cancer"))
> pairs <- factor(rep(1:30, each = 2))
> metastatic <- factor(eset.2$Meta_factor)
> f <- paste(condition, metastatic, sep=".")
> f <- factor(f)
> design1 <- model.matrix(~0 +f)
> colnames(design1) <- levels(f)
> dupcor <- duplicateCorrelation(eset.2, design1, block=pairs)
> fit <- lmFit(eset.2, design1, block=pairs, correlation=dupcor$consensus)
> cm <- makeContrasts(Meta_Cancer=Cancer.1-Normal.1 , Cancer= Cancer.0-Normal.0, levels=design1)
> colnames(design1)
[1] "Cancer.0" "Cancer.1" "Normal.0" "Normal.1"
> fit2 <- contrasts.fit(fit, cm)
> fit3 <- eBayes(fit2, trend=TRUE)...

Inspecting the number of DE genes from the above comparisons (with two cutoffs), I get 1133 genes for the "non-metastatic" comparison, whereas for the "metastatic" comparison I get no DE genes (no adjusted p-value below 0.05).

Since my samples come from tissue specimens, I then ran arrayWeights, which indeed shows a noticeable variation in array quality (link to the plot below), and so I combined arrayWeights with duplicateCorrelation:

https://www.dropbox.com/s/yr0zzvebqe7s2nm/arrayWeights_new_design.jpeg?dl=0

> aw <- arrayWeights(eset.2, design1)
> w <- asMatrixWeights(aw, dim(eset.2))
> dupcor <- duplicateCorrelation(eset.2, design1, block=pairs, weights=w)
> fit <- lmFit(eset.2, design1, block=pairs, correlation=dupcor$consensus, weights=w)
> cm <- makeContrasts(Meta_Cancer=Cancer.1-Normal.1 , Cancer= Cancer.0-Normal.0, levels=design1)......

Now the metastatic comparison returns 861 DE genes, and the number of DE genes for the non-metastatic comparison also increases, to 1506.
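To make the two pipelines easier to compare side by side, here is a minimal sketch that runs the same design with and without array weights and tabulates the DE counts per contrast. Everything about the data is an assumption: the expression matrix is simulated, the sizes (8 patients, 500 genes) are arbitrary, two arrays are deliberately made noisier, and the 0.05 BH cutoff stands in for whatever cutoffs are actually used. The helpers run_fit and count_de are hypothetical names introduced only for this illustration.

```r
library(limma)   # duplicateCorrelation() also relies on the statmod package being installed

set.seed(42)
n_genes <- 500; n_pat <- 8        # simulated sizes, not the 30 patients of the real data
condition  <- factor(rep(c("Normal", "Cancer"), times = n_pat),
                     levels = c("Normal", "Cancer"))
metastatic <- factor(rep(c(0, 1), each = n_pat))   # patients 1-4: 0, patients 5-8: 1
pairs      <- factor(rep(seq_len(n_pat), each = 2))
f <- factor(paste(condition, metastatic, sep = "."))
design <- model.matrix(~ 0 + f)
colnames(design) <- levels(f)

# Simulated log-expression: noise + patient effect + cancer effect in the first 50 genes
y <- matrix(rnorm(n_genes * 2 * n_pat), n_genes, 2 * n_pat,
            dimnames = list(paste0("g", seq_len(n_genes)),
                            paste0("s", seq_len(2 * n_pat))))
y <- y + matrix(rnorm(n_genes * n_pat, sd = 0.7), n_genes)[, as.integer(pairs)]
y[1:50, condition == "Cancer"] <- y[1:50, condition == "Cancer"] + 1.5
# Make two arrays noisier so that the array weights have something to pick up
y[, c(3, 12)] <- y[, c(3, 12)] + rnorm(n_genes * 2, sd = 2)

cm <- makeContrasts(Meta_Cancer = Cancer.1 - Normal.1,
                    Cancer      = Cancer.0 - Normal.0, levels = design)

# Same steps as in the post; w = NULL gives the unweighted analysis
run_fit <- function(w = NULL) {
  dc  <- duplicateCorrelation(y, design, block = pairs, weights = w)
  fit <- lmFit(y, design, block = pairs, correlation = dc$consensus, weights = w)
  eBayes(contrasts.fit(fit, cm), trend = TRUE)
}

aw <- arrayWeights(y, design)
w  <- asMatrixWeights(aw, dim(y))
fit_plain    <- run_fit()
fit_weighted <- run_fit(w)

# Number of genes called DE (up or down) per contrast at BH-adjusted p < 0.05
count_de <- function(fit) {
  dt <- decideTests(fit, adjust.method = "BH", p.value = 0.05)
  sapply(colnames(dt), function(k) sum(dt[, k] != 0))
}
de_table <- rbind(unweighted = count_de(fit_plain),
                  weighted   = count_de(fit_weighted))
print(round(aw, 2))
print(de_table)
```

Both fits here use one design and one contrast matrix, so the two contrasts share the same weights and the same consensus correlation within each run.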

I can "naively" assume that, possibly due to the smaller number of samples in the metastatic comparison (6 vs 6) compared with the non-metastatic one (24 vs 24), and also for various other reasons (e.g. sample quality), arrayWeights is essentially very beneficial in the metastatic context. For the other comparison, however, I already obtained DE genes before using arrayWeights. So should I keep my second implementation, with arrayWeights applied to both comparisons, rather than making a separate contrast matrix with arrayWeights only for the metastatic contrast? Even the "moderate" increase from 1133 to 1506 might include some genes relevant to the pathophysiology of my system that could not be "detected" before using arrayWeights.
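One way to see what the increase from 1133 to 1506 consists of is to split the two DE lists into genes kept, gained and lost. A minimal base R sketch follows, assuming the lists are available as character vectors of probe IDs; the IDs below are made up, and in practice they could be taken from the row names of the topTable output for each fit.

```r
# Hypothetical probe IDs called DE for one contrast, before and after weighting
de_before <- c("p1", "p2", "p3", "p4")
de_after  <- c("p2", "p3", "p4", "p5", "p6")

overlap <- list(
  kept   = intersect(de_before, de_after),  # DE in both analyses
  gained = setdiff(de_after, de_before),    # DE only with array weights
  lost   = setdiff(de_before, de_after)     # DE only without array weights
)
print(lengths(overlap))
```

If most of the unweighted list is kept and the weighted list mainly adds genes, the two analyses agree and the weights simply add sensitivity; a large "lost" set would be worth a closer look.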

Thank you for your consideration on this matter!

Any opinions on this subject?

Efstathios

limma microarray multifactorial design duplicatecorrelation arrayweights