Question

Limma-Voom - Time series experiment with one group (duplicateCorrelation() function)

1

Entering edit mode

heikki.sarin ▴ 10

@heikkisarin-13379

Last seen 3.4 years ago

I have problems performing a time-series experiment with Limma-Voom while accounting between-subject variability and age as a covariate. We're especially interested in finding out the effect of age on the time series results, but as I'm new to the Limma-interface, I'm rather hesitant, whether I can trust the pipeline and results.

We have one group with each individual having three time point measurements (PRE, MID, POST). We're interested in detecting differential effects between time points while accounting for between individual variability and age as a covariates. As I've understood in one group time series analysis the duplicateCorrelation() function should be used but not quite sure how to implement in the pipeline based on the vignettes.

The following code runs just fine without clitches and results are somewhat expected i) with and ii) without the age variable in design matrix, but not sure whether the between subject variability is now accounted for in the pipeline.

Thank you so much in advance!

########################### Predata processing ########################### 
##########################################################################
#Create a edgeR object for analysis
dge <- DGEList(counts=countdata)

################Check how many genes have zero counts across samples ############
table(rowSums(dge$counts==0)==72) #24 individuals, 3 time points

####################### Filter out lowly expressed genes ######################## 
keep.exprs <- filterByExpr(dge, group=coldata$PRE_MID_POST)
dge <- dge[keep.exprs,, keep.lib.sizes=FALSE]
dim(dge)

######################## Normalize the data ##################
dge <- calcNormFactors(dge, method = "TMM")
dge$samples$norm.factors

######################## Desing matrix generation ######################
design <- model.matrix(~ time + age, coldata)
voom_dge = voom(dge, design, plot=TRUE)

################### Duplicate correlation calculation ##################
dupcor <- duplicateCorrelation(voom_dge, design, block=coldata$subject_id) #cor between samples of the same clone
dupcor$consensus.correlation

#### Transform the data object for differential expression analysis#####
logCPM <- cpm(dge, log=TRUE, prior.count=3)

##########################################################################
########### Differential expression: Limma-Voom ##########################
##########################################################################
#Run the analysis
fit <- lmFit(logCPM, design, block=coldata$subject_id, correlation=dupcor$consensus.correlation)
fit <- eBayes(fit, trend=TRUE)

voom duplicateCorrelation() timeseries limma • 1.4k views

ADD COMMENT • link updated 3.4 years ago by Gordon Smyth 50k • written 3.4 years ago by heikki.sarin ▴ 10

score 0 · Answer 1 · 2020-12-26

Your code is roughly correct and will probably give ok results but you are mixing two different pipelines (limma-voom and limma-trend). Your basic idea of running voom(), then duplicateCorrelation() then lmFit() is correct.

See Chapter 15 of the limma User's Guide. limma-voom and limma-trend are two different pipelines for RNA-seq data. If you use voom() then you don't need cpm() and logCPM. If you use cpm() and logCPM then you don't need voom().

Another issue is the filtering. The group that you use for filterByExpr() should be the same that you use for the design matrix rather than the one you use duplicateCorrelationo(). Just use

keep.expr <- filterByExpr(dge, design)