Question

DESeq2 results with RUVSeq values

2

Entering edit mode

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Last seen 18 days ago

Germany

Hi,

I have a data set, where we assume one of the conditions to have transcriptional amplification toward a set of genes. For that reason I decide to test the data with the RUVSeq package.

I have ran the analysis as explained in the vignette and calculated the estimated factors of unwanted variation. I have got the following factors

> pData(set1)[,2]
[1] -0.4209557 -0.3984871 -0.4039097  0.4393171  0.4079229  0.3761126

I wanted to use these factors to analyse the dataset using DESeq2, but DESeq accepts only positive values as sizeFactors.

> sizeFactors(cds.postRUV) <- pData(set1)$W_1
Error in .local(object, ..., value) : size factors must be positive

Is there a way to re-calculate these RUVSeq factors in to positive values to fit for the DESeq2 analysis?

thanks,

Assa

deseq2 ruvseq sizefacotrs • 5.5k views

ADD COMMENT • link updated 9.5 years ago by davide risso ▴ 980 • written 9.5 years ago by Assa Yeroslaviz ★ 1.5k

score 1 · Answer 1 · 2015-05-13

1

Entering edit mode

davide risso ▴ 980

@davide-risso-5075

Last seen 8 months ago

University of Padova

Mix 1 and Mix 2 are supposed to be different in the two samples, so I'm not surprised that you capture the group difference with the first factor of UV based on them. You can either use only group B or subtract the expected fold-change from the expression matrix to use all the spike-ins. From our paper:

"Interestingly, one can relax the negative control gene assumption by requiring instead the identification of a set of Jc positive or negative controls, for which the value of βc is known a priori but need not be zero. Then, Xβc is known and one can perform the singular value decomposition of logYc − Xbc − Oc to estimate W as in step 3 of RUVg above. Steps 4 and 5 remain the same. This allows us to make full use of all 92 ERCC spike-in controls for the SEQC data set. "

I hope this helps.

ADD COMMENT • link 9.5 years ago davide risso ▴ 980

0

Entering edit mode

I have been struggling with this paragraph in the paper for sometime, I am still unsure how it is possible to use positive controls. How/where does one specify the expected values for the fold changes.

ADD REPLY • link 8.9 years ago Pietà Schofield • 0

0

Entering edit mode

Let me omit the offset for simplicity. The model for the control genes becomes log Y_c ~ X b_c + W a_c.

If b_c is known (and different from zero), you can estimate W a_c by SVD on the matrix log Y_c - X b_c.

In R, it would be something like:

logY <- log1p(Y)
logY[controls,] <- logY[controls,] - X * b_c[controls]

Assuming that Y is the full matrix of counts, controls is an indicator of the positive controls and X is a one dimensional indicator variable (i.e., two-class comparison).

ADD REPLY • link 8.8 years ago davide risso ▴ 980

score 0 · Answer 2 · 2015-05-12

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 15 hours ago

United States

Add the factor(s) of unwanted variation to the column data:

dds$fuv1 <- fuv1

Then include the factor(s) in the design:

design(dds) <- ~ fuv1 + fuv2 + condition

ADD COMMENT • link 9.5 years ago Michael Love 43k

score 0 · Answer 3 · 2015-05-12

Thanks Michael,

somehow it doesn't really look right.

these are my samples and factors:

> pData(set1)
               x        W_1
ctrl1       ctrl -0.4209557
ctrl2       ctrl -0.3984871
ctrl3       ctrl -0.4039097
strain1 strained  0.4393171
strain2 strained  0.4079229
strain3 strained  0.3761126

and this the analysis:

col.d$fuv1 <- pData(set1)$W_1

cds.postRUV <- DESeqDataSetFromMatrix(
  countData = counts_deseq_filt,
  colData   = col.d,
  design    = ~ fuv1 + conditions)

dds.postRUV <- DESeq(cds.postRUV)

dds.postRUV$conditions <- relevel(dds.postRUV$conditions, "ctrl")
res.postRUV <- results(object = dds.postRUV)

But than, checking the adjusted p-values i have in almost all the genes the same value:

> table(res.postRUV$padj)

0.999903715587095 0.999947942273539 0.999989492263923 0.999997165182665
            28703                 1                 5                 1

Did I do something wrong here?

score 0 · Answer 4 · 2015-05-12

Hi Assa,

the reason you get these results is because your factor of UV is almost perfectly correlated with the biology: all negative values for your controls and all positive values for your "strained" samples. So you're testing and correcting for basically the same variable, and this causes troubles in the DE test.

Either your negative controls are not really negative controls and capture the biological difference between your samples, or the unwanted variation is truly perfectly correlated with the biology. If the latter is true, than I don't think RUV can help you here.

What are you using as negative controls?

score 0 · Answer 5 · 2015-05-12

0

Entering edit mode

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Last seen 18 days ago

Germany

I am using the ercc Spike In as a data set.

ADD COMMENT • link 9.5 years ago Assa Yeroslaviz ★ 1.5k

score 0 · Answer 6 · 2015-05-12

It looks like the expression of the spike-ins is different in the two groups. Are you using the same mix for both controls and treated samples, or are you using Mix 1 vs Mix 2?

If you're using the same mix, it might be worth looking at the distribution of the spike-in reads across samples (e.g., box plots, MAplots, ...) as well as plotting the first principal components of the spike-in expressions. If you see the samples cluster by biology, removing the variation inferred from the spike-ins will likely remove your signal of interest.

score 0 · Answer 7 · 2015-05-13

Hi Davide,

I am using a mix1-mix2 mix of the ERCC spikeIns. I have tested this and it came out quite good for the two conditions. Where I wasn't sure how to proceed was whether or not to use only group B of the spikeIn mix, where in both samples I have the same concentration. But as far as I understand it from your above explanation, this will also remove the signal of interest I'm looking for.

Is this assumption correct?