Question: Incorporating factors of unwanted variation from RUVr into EdgeR cell means model for DE
0
6 months ago by
garbuzov0
garbuzov0 wrote:

Hi there, I'm new to working with EdgeR's generalized linear models, but I have a data set where I'm trying to get complex comparisons between groups, times, and treatments, so here I am...

I've figured out that the best approach for me is to use the cell means model. Right now I have two groups and two treatments (at each time point, but ignore that for now).

Here is my code:

#countsDGE_d3 is my count matrix
>key_3d
sample    group condition
F.3dayInjury.IP1 F.3dayInjury forelimb    injury
F.3dayInjury.IP2 F.3dayInjury forelimb    injury
F.3dayInjury.IP3 F.3dayInjury forelimb    injury
F.Intact.IP1         F.Intact forelimb no_lesion
F.Intact.IP2         F.Intact forelimb no_lesion
F.Intact.IP3         F.Intact forelimb no_lesion
H.3dayInjury.IP1 H.3dayInjury hindlimb    injury
H.3dayInjury.IP2 H.3dayInjury hindlimb    injury
H.3dayInjury.IP3 H.3dayInjury hindlimb    injury
H.Intact.IP1         H.Intact hindlimb no_lesion
H.Intact.IP2         H.Intact hindlimb no_lesion
H.Intact.IP3         H.Intact hindlimb no_lesion
H.Intact.IP4         H.Intact hindlimb no_lesion

design3d <- model.matrix(~0+key_3d$sample) colnames(design3d) <- levels(key_3d$sample)

countsDGE_d3 <- calcNormFactors(countsDGE_d3)
countsDGE_d3 <- estimateDisp(countsDGE_d3, design3d, robust = TRUE)
countsDGE_d3$common.dispersion fit <- glmQLFit(countsDGE_d3, design3d) cont.matrix <- makeContrasts(NvsINJinFL=F.3dayInjury-F.Intact, NvsINJinHL=H.3dayInjury-H.Intact, Diff=(F.3dayInjury-F.Intact)-(H.3dayInjury-H.Intact), levels=design3d) qlf <- glmQLFTest(fit, contrast=cont.matrix[,"NvsINJinFL"]) topTags(qlf, n = 20) table <- topTags(qlf, n = Inf) write.csv( as.data.frame(table), file="All_FLDay3_InjuredvsIntact_results.csv")  I then repeat running glmQLFTest on all three contrasts and export the topTags table. My question is, due to the data being pretty noisy, I used the RUVseq package to calculate the factors of unwanted variation for my samples. I would like to incorporate these into my design matrix and into my contrasts. How do I construct this model matrix and contrast matrix? Does this make sense? design3d <- model.matrix(~0+key_3d$sample+W_1, data=pData_3d)


I'm not sure what my contrasts should be. I tried:

cont.matrixW1 <- makeContrasts(NvsINJinFL=(F.3dayInjury-F.Intact)-W_1,
NvsINJinHL=(H.3dayInjury-H.Intact)-W_1,
Diff=((F.3dayInjury-F.Intact)-(H.3dayInjury-H.Intact))-W_1,
levels=design3dW1)


Where pData_3d is a matrix of the factors of unwanted variation from:

seq_set <- newSeqExpressionSet(as.matrix(dge_counts, phenoData = key$sample)) design <- model.matrix(~0 + key$sample)
colnames(design) <- levels(key$sample) dge <- DGEList(counts = dge_counts, group = key$sample)
dge <- calcNormFactors(dge, method="upperquartile", rubust = TRUE)
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)

fit <- glmFit(dge, design)
res <- residuals(fit, type = "deviance")

#use all the genes to estimate the factors of unwanted variation.
seqUQ <- betweenLaneNormalization(seq_set, which="upper")
controls <- rownames(seq_set)
seqRUVr <- RUVr(seqUQ, controls, k=1, res)


Thank you!

rnaseq edger ruvseq glm ruvr • 134 views