DESeq2 results with RUVSeq values
7
2
Entering edit mode
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 18 days ago
Germany

Hi,

I have a data set, where we assume one of the conditions to have transcriptional amplification toward a set of genes. For that reason I decide to test the data with the RUVSeq package.

I have ran the analysis as explained in the vignette and calculated the estimated factors of unwanted variation. I have got the following factors

> pData(set1)[,2]
[1] -0.4209557 -0.3984871 -0.4039097  0.4393171  0.4079229  0.3761126

I wanted to use these factors to analyse the dataset using DESeq2, but DESeq accepts only positive values as sizeFactors.

> sizeFactors(cds.postRUV) <- pData(set1)$W_1
Error in .local(object, ..., value) : size factors must be positive


Is there a way to re-calculate these RUVSeq factors in to positive values to fit for the DESeq2 analysis?

thanks,

Assa

 

deseq2 ruvseq sizefacotrs • 5.5k views
ADD COMMENT
1
Entering edit mode
davide risso ▴ 980
@davide-risso-5075
Last seen 8 months ago
University of Padova

Mix 1 and Mix 2 are supposed to be different in the two samples, so I'm not surprised that you capture the group difference with the first factor of UV based on them. You can either use only group B or subtract the expected fold-change from the expression matrix to use all the spike-ins. From our paper:

"Interestingly, one can relax the negative control gene assumption by requiring instead the identification of a set of Jc positive or negative controls, for which the value of βc is known a priori but need not be zero. Then, Xβc is known and one can perform the singular value decomposition of logYc Xbc Oc to estimate W as in step 3 of RUVg above. Steps 4 and 5 remain the same. This allows us to make full use of all 92 ERCC spike-in controls for the SEQC data set. "

I hope this helps.

ADD COMMENT
0
Entering edit mode

I have been struggling with this paragraph in the paper for sometime, I am still unsure how it is possible to use positive controls. How/where does one specify the expected values for the fold changes.

ADD REPLY
0
Entering edit mode

Let me omit the offset for simplicity. The model for the control genes becomes log Y_c ~ X b_c + W a_c.

If b_c is known (and different from zero), you can estimate W a_c by SVD on the matrix log Y_c - X b_c.

In R, it would be something like:

logY <- log1p(Y)
logY[controls,] <- logY[controls,] - X * b_c[controls]

Assuming that Y is the full matrix of counts, controls is an indicator of the positive controls and X is a one dimensional indicator variable (i.e., two-class comparison).

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 15 hours ago
United States

Add the factor(s) of unwanted variation to the column data:

dds$fuv1 <- fuv1

Then include the factor(s) in the design:

design(dds) <- ~ fuv1 + fuv2 + condition
ADD COMMENT
0
Entering edit mode
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 18 days ago
Germany

Thanks Michael,

somehow it doesn't really look right.

these are my samples and factors:

> pData(set1)
               x        W_1
ctrl1       ctrl -0.4209557
ctrl2       ctrl -0.3984871
ctrl3       ctrl -0.4039097
strain1 strained  0.4393171
strain2 strained  0.4079229
strain3 strained  0.3761126

and this the analysis:

col.d$fuv1 <- pData(set1)$W_1

cds.postRUV <- DESeqDataSetFromMatrix(
  countData = counts_deseq_filt,
  colData   = col.d,
  design    = ~ fuv1 + conditions)

dds.postRUV <- DESeq(cds.postRUV)

dds.postRUV$conditions <- relevel(dds.postRUV$conditions, "ctrl")
res.postRUV <- results(object = dds.postRUV)


But than, checking the adjusted p-values i have in almost all the genes the same value:

> table(res.postRUV$padj)

0.999903715587095 0.999947942273539 0.999989492263923 0.999997165182665
            28703                 1                 5                 1

 

Did I do something wrong here?

 

 

ADD COMMENT
0
Entering edit mode
davide risso ▴ 980
@davide-risso-5075
Last seen 8 months ago
University of Padova

Hi Assa,

 

the reason you get these results is because your factor of UV is almost perfectly correlated with the biology: all negative values for your controls and all positive values for your "strained" samples. So you're testing and correcting for basically the same variable, and this causes troubles in the DE test.

Either your negative controls are not really negative controls and capture the biological difference between your samples, or the unwanted variation is truly perfectly correlated with the biology. If the latter is true, than I don't think RUV can help you here.

What are you using as negative controls?

 

ADD COMMENT
0
Entering edit mode
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 18 days ago
Germany

I am using the ercc Spike In as a data set.

ADD COMMENT
0
Entering edit mode
davide risso ▴ 980
@davide-risso-5075
Last seen 8 months ago
University of Padova

It looks like the expression of the spike-ins is different in the two groups. Are you using the same mix for both controls and treated samples, or are you using Mix 1 vs Mix 2?

If you're using the same mix, it might be worth looking at the distribution of the spike-in reads across samples (e.g., box plots, MAplots, ...) as well as plotting the first principal components of the spike-in expressions. If you see the samples cluster by biology, removing the variation inferred from the spike-ins will likely remove your signal of interest.

ADD COMMENT
0
Entering edit mode
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 18 days ago
Germany

Hi Davide,

I am using a mix1-mix2 mix of the ERCC spikeIns. I have tested this and it came out quite good for the two conditions. Where I wasn't sure how to proceed was whether or not to use only group B of the spikeIn mix, where in both samples I have the same concentration. But as far as I understand it from your above explanation, this will also remove the signal of interest I'm looking for.

Is this assumption correct?

ADD COMMENT

Login before adding your answer.

Traffic: 535 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6