Question

RUVSeq DE analysis for data with replicate samples and 4 conditions

1

Entering edit mode

sharvari gujja ▴ 40

@sharvari-gujja-6614

Last seen 14 months ago

United States

Hello,

I have a question on using RUVSeq for removing batch effects for 8 samples of RNA-Seq data - 2 batches with 4 samples each.

The example data in the tutorial shows 2 conditions with 3 replicates each:
https://www.bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf

Following the steps in section 3 to estimate factors of unwanted variation using replicate samples, I am trying to define the matrix as:
differences <- matrix(data=c(1:2,3:4,5:6,7:8), byrow=TRUE, nrow=2)
PCA plot for set3 doesn't group the replicates:

However, if I define matrix for pairwise condition as:

differences1 <- matrix(data=c(1:2,3:4), byrow=TRUE, nrow=2)
set3_1 <- RUVs(set, genes, k=1, differences1)
plotPCA(set3_1, col=colors[x], cex=1.2)
and
differences2 <- matrix(data=c(5:6,7:8), byrow=TRUE, nrow=2)
set3_2 <- RUVs(set, genes, k=1, differences2)
plotPCA(set3_2, col=colors[x], cex=1.2)

Each PCA plot grouping of replicates for each condition.

Could you please let me know what is the correct approach to define the matrix and do DE analysis for pairs of conditions

Thanks

Sharvari

RUVSeq • 3.1k views

ADD COMMENT • link updated 7.8 years ago by anthonycolombo60 • 0 • written 8.7 years ago by sharvari gujja ▴ 40

score 1 · Answer 1 · 2015-09-01

If you want to test pairwise comparisons with edgeR, you should use contrasts. See Section 3.2 of the edgeR Users guide. (You can access it within R via the edgeRUsersGuide() function.)This doesn't change with / without RUV factors, except that you need one more coefficient (0) in the contrast.

In your example, if you want to compare say, the second and third levels, you can do it with:

lrt <- glmLRT(fit, contrast=c(0, -1, 1, 0, 0))

Note that the contrast coefficients refer to the columns of the design matrix, which should have 5 columns: 1 intercept, three columns for the x factor and one column for the RUV factor. On the other hand, if you want to compare, say, the first and the second level, using the coef=2 option will work, which is what you're doing in your code. If you want an ANOVA-like test for the differences among all the groups, you should use coef=2:4.

Again I recommend reading Section 3.2 of the edgeR users guide, which explains these concepts much better than my answer. Another useful reading is the limma users guide that has a general section on linear models and on contrasts, most of which applies to GLMs as well.

score 0 · Answer 2 · 2015-08-28

Hi Sharvari,

maybe I'm missing something, but your PCA plot looks fine to me. The samples don't cluster by batch (which I'm assuming is _1 vs _2 in the sample names) but by biological conditions (which I'm assuming are represented by the colors).

Am I completely misunderstanding what are your batches and what are your conditions?

I imagine that your two replicates per condition are biological replicates. If that's the case, you should expect some biological variability between them and you should not expect the replicates to be projected one on top of the other in the PCA plot.

score 0 · Answer 3 · 2015-09-01

You are right. The clustering looks fine. The batches are _1 and _2 and the replicates are in the same color. I have 4 conditions with 2 replicates each.

I would like to perform DE analysis on pairwise conditions:

differences <- matrix(data=c(1:2,3:4,5:6,7:8), byrow=TRUE, nrow=4)

After defining set1 as :

set1 <- RUVs(set, genes, k=1, differences)

DE analysis: how to define a subset for set1 for pairwise comparison? For ex: Scrambled vs. CNOT7KD

design <- model.matrix(˜x + W_1, data=pData(set1))

y <- DGEList(counts=counts(set), group=x)

y <- calcNormFactors(y, method="upperquartile")

y <- estimateGLMCommonDisp(y, design)

y <- estimateGLMTagwiseDisp(y, design)

fit <- glmFit(y, design)

lrt <- glmLRT(fit, coef=2)

topTags(lrt)

Thanks for all the help.

Sharvari

score 0 · Answer 4 · 2015-09-02

0

Entering edit mode

sharvari gujja ▴ 40

@sharvari-gujja-6614

Last seen 14 months ago

United States

Thank you so much for your comments. This was very helpful.

Sharvari

ADD COMMENT • link 8.7 years ago sharvari gujja ▴ 40

score 0 · Answer 5 · 2016-06-24

0

Entering edit mode

anthonycolombo60 • 0

@anthonycolombo60-8475

Last seen 2.7 years ago

United States

Hi.

so for contrasts with the RUVSeq factors, do you set them to zero? in this example above

lrt <- glmLRT(fit, contrast=c(0, -1, 1, 0, 0)) the fifth column is the W_1 from RUVSeq, but does this take into account the unwanted variance factor? it appears the fit is contrasting Cov_3/Cov_2 but don't you want (Cov_3 + W_1) vs Cov_2 (the control) ?

thank you very much. For my contrasts, I'm not sure how to set the contrasts with the RUV factors using the group-means parameterization (intercept=0) and not the factor-means parameterization (intercept=1)

thank you

ADD COMMENT • link 7.8 years ago anthonycolombo60 • 0

0

Entering edit mode

Yes, you should set the contrast coefficient for W_1 to zero, because that's not the comparison of interest. The W_1 variable is used only to adjust for unwanted variation.

There is a very nice Section on contrasts in the limma vignette that I suggest you read.

ADD REPLY • link 7.8 years ago davide risso ▴ 950