**15k**wrote:

I received a question about RIP-Seq by email, and wanted to post a reminder about how to test for ratio of ratios using DESeq2.

Suppose we have two assays: Input and IP, and we have two conditions: A and B.

Then using a design: '~ assay + condition + assay:condition', the interaction term 'assay:condition' represents the ratio of the ratios: (IP for B / Input for B) / (IP for A / Input for A)... which with some algebra you can see is equivalent to (IP for B / IP for A) / (Input for B / Input for A).

Using a likelihood ratio test which removes the interaction term in the reduced model, we test whether the IP enrichment is different in condition B than in A:

dds <- DESeqDataSet(se, design= ~ assay + condition + assay:condition) dds <- DESeq(dds, test="LRT", reduced= ~ assay + condition) results(dds)

(Or similarly, using another DESeqDataSet constructor function).

Note: if there are only 4 samples total (no replicates), then there are not enough residual degrees of freedom to estimate the dispersion, and DESeq2 will automatically use a design of ~ 1 for estimating the dispersion (and will print a message about this). This means that the analysis is underpowered to detect differences, because we are overestimating the dispersion in ignoring the differences due to experimental design. See ?DESeq for more on this case.

Further questions:

what if the depth is lower for the IP samples? This is handled by the size factors in the model.

what if the distribution of counts is different for the IP samples? Suppose first that the counts are still decently modeled with a Negative Binomial. DESeq2 estimates a single dispersion parameter per gene. If the IP samples have a higher dispersion than the Input samples, then the dispersion estimate from all samples will be in-between the dispersion estimate you would get from the assays separately. I would guess that this wouldn't be too problematic, but it's hard to say without seeing the actual data, and will depend on how far apart the dispersions of the two assays are. Secondly, what if the IP counts are be poorly fit with a Negative Binomial? Again, it's hard to say exactly how this would effect the performance without knowing how far off from a Negative Binomial they are.

Hello Michael,

First of all thank you very much for the quick reply!

I have an additional question regarding to the normalization: I know that the normalization corrects for differences between sequencing depth, but what if all samples receive approximately the same total amount of reads, but in the IP samples we only have a small fraction of transcripts represented, yielding much more reads for those transcripts as apposed to the input samples, where all the expressed genes appear in the data. will the normalization still correct for the differences, even though the total amount of reads is identical for all samples?

In addition, after i will receive the data, what is the best way for me to check whether the IP samples can be modeled with the negative binomial distribution? and what is the best way for me to see how far apart are the dispersions of both assays?

Thank you very much,

Olga.

50"what if all samples receive approximately the same total amount of reads, but in the IP samples we only have a small fraction of transcripts represented, yielding much more reads for those transcripts as apposed to the input samples, where all the expressed genes appear in the data"This will be taken care of by the 'assay' term in the design specified above.

"after i will receive the data, what is the best way for me to check whether the IP samples can be modeled with the negative binomial distribution"If you don't have any biological replicates, then it's hard to make any comparison of how well different

~~parameterizations~~distributions fit. The Negative Binomial allows us to fit the mean and the variance, so it is fairly flexible. For data with many replicates for each individual condition, extreme bimodality within one condition (half the samples at 0 and half large counts) would be something not handled well by the Negative Binomial. With many replicates, a useful diagnostic is looking at histograms of the normalized counts for a single gene and a single condition. Also if you have many replicates, you can estimate the dispersion within each (i.e. build a dataset with only one condition and ~1) and compare them.15kThank you very much for the suggestions.

I would like to raise another question regarding the biological replicates.

As it seems, the intention of the researcher is to sequence 2 biological replicates for the IP samples and sequence only one input replicate for each condition. I know that it is not much and it is better to use more replicates, but will it be OK to work with input samples without replicates at all?

502 vs 1 has lower statistical power than 2 vs 2, but large consistent fold changes should still be detected.

15kHey Micheal,

I was having problems setting colData for this type of analysis. My analysis is not RipSeq, but theoretically something very similar to RipSeq. I have two conditions "A" & "B". Each condition has a "fraction" samples and a "total" sample. I have 5 replicates for each condition. The comparison that I want to make is "factionA/totalA" vs "fractionB/totalB", which essentially looks like a RipSeq analysis. The other catch here is that, the "total" sample for both conditions are the same. So "totalA" & "totalB" are the same samples. So, in all I have 15 samples : 5 "fractionA", 5 "fractionB" & 5 "total" samples which are common for conditions A & B.

The colData that I currently have is as follows:

Samples condition assay

F1 A fraction

F2 A fraction

F3 A fraction

F4 A fraction

F5 A fraction

T1 A T

T2 A T

T3 A T

T4 A T

T5 A T

F6 B fraction

F7 B fraction

F8 B fraction

F9 B fraction

F10 B fraction

T6 B T

T7 B T

T8 B T

T9 B T

T10 B T

T1-T5 are the same as T6-T10. I added them as separate to keep the order of the analysis. I keep getting the following error:

"the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others"

which I assume is because of the incorrect order of the colData. I have tried a couple of other ways to set up the colData, but to no use. Any help on this would be greatly appreciated.

Thanks!

Shashank

0hi Shashank,

You would use a different design if you only have 3 groups. Note that:

(fraction A / total) / (fraction B / total) = fraction A / fraction B.

Because the total samples are the same for both A and B, these fall out of the equation.

You should combine condition and assay into a new variable:

dds$group <- factor(paste0(dds$condition, dds$assay))

Then use a design of ~group.

Then you can investigate three contrasts: fraction A vs total, fraction B vs total, or fraction B vs fraction A.

15kHey Micheal,

thanks for redirecting me to this post.

I tried to apply DESeq2 in this way to my polysome profiling data. However I am running into an issue and a question.

First of all the issue, it has to do with extracting the contrast of interest. Since it did not work with my data I tried to run the examples given when I write ?results in R, specifically shown here example 2 of the results help section.

Do you have an idea what might cause this error? A search on the internet regarding this error in DESeq2 did not bring me any hints. (session info posted below)

Regarding my question, I use the following setup

This should give me the contrast as from the example you of shashank fractionA/fractionB, if I am not mistaken... is this right?

Currently this gives me the same error as stated in my Issue.

Thanks in advance,

Christian

sessioninfo:

0Something is funny here because DESeq2 is tested against that example. If DESeq2 threw an error with that example, it wouldn't be available on the Bioconductor page.

Can you try the example again in a fresh R session?

Your package versions seem fine, but can you try:

15kI tried to restart the R session and it worked. There were phantom packages, which showed up in the sessioninfo() but were not actually loaded... after detaching them all worked fine.

Do I apply the right lists in the contrast to get the contrast for fractionA/fractionB or in my case

Polysome/Cytosolic data? (see previous post second section)Thanks for the help!

Cheers,

Christian

015kThat is what I was trying but I had the exact same problem with the "Model matrix not full rank". I found an issue in my colData and now it worked as far as I can tell.

Thanks for the help and patience!

0