Hello, I am new to DESeq2/RNAseq and would like a little help. I have performed a RIPSeq experiment with the conditions described below (mouse neural tissue harvested after drug injections). I am aware that there are alternatives such as RipSeeker; however, I am treating my samples as RNAseq for the time being. I have imported my raw counts into R and analyzed them with DESeq2, but I am unsure how to take my knockouts into account. In this case, the knockouts essentially represent background bead binding. Since RIPSeq tends to be noisy, I need to "subtract" out any RNAs that might bind at a background level. I have read online that I cannot subtract counts, so I am not sure how to compare my DESeq2 results data frames between WT and KO.
Experimental design:

Category | Condition | Replicates |
---|---|---|
Control | Solvent 1 | 2 |
Control | Solvent 2 | 3 |
KO | Solvent 2 KO | 3 |
Experimental | Drug 1 | 3 |
KO | Drug 1 KO | 1 |
Experimental | Drug 1 + Drug 2 | 3 |
KO | Drug 1 + Drug 2 KO | 3 |
Input | Total RNA | 3 |
It is also important to mention that since drug 1 and drug 2 have different solvents, I have controls for each separate solvent. The hope is that the two solvents alone will not show a huge difference in what I immunoprecipitated.
Here is my analysis strategy:
- Collapse technical replicates (there are two technicals for each biological replicate)
- Slice up DESeq2 object into subsets that include the condition as well as the total RNA
- Run results on each of these comparing the condition against the total RNA to find enrichment
- Compare each condition and its knockout pair to see if any top differentially expressed genes showed up in both experimental and KO
- If so, remove gene from experimental condition list
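For the first step in the list above, DESeq2 provides `collapseReplicates()`, which sums the counts of technical replicates belonging to the same biological sample. Below is a minimal sketch on simulated counts. The column names `sample` and `run`, the sample labels, and the count matrix are all made up for illustration and are not taken from the actual experiment.

```r
library(DESeq2)

set.seed(1)

# Hypothetical sample table: 4 biological samples, each sequenced as 2 technical runs
coldata <- data.frame(
  sample    = factor(rep(paste0("bio", 1:4), each = 2)),
  run       = paste0("run", 1:8),
  condition = factor(rep(c("totalRNA", "drug1"), each = 4),
                     levels = c("totalRNA", "drug1")),
  row.names = paste0("run", 1:8)
)

# Simulated counts stand in for the real count matrix (assumption)
counts <- matrix(rnbinom(200 * 8, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), rownames(coldata)))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ condition)

# Sum technical runs within each biological sample
dds_coll <- collapseReplicates(dds, groupby = dds$sample, run = dds$run)

ncol(dds_coll)                   # one column per biological sample
colData(dds_coll)$runsCollapsed  # which runs were merged into each column
```

Collapsing is only appropriate for technical replicates; biological replicates should stay as separate columns so that DESeq2 can estimate dispersion from them.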
So my question is whether or not this is valid and if there is a better way to do this kind of comparison to deal with the knockouts given my experimental conditions.
First, just adding a comment: usually when people want to "subtract" out background expression, the best approach is to use a design with interaction terms. Interaction terms help you test for differences of differences, for example: is the ratio of KO/WT different with drug 1 compared to control, or drug 1+2 compared to control, or drug 1+2 compared to drug 1?
This is fairly easy to set up, and does not require subsetting the DESeq2 object. You just run DESeq() once and get multiple results tables.
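To make the "differences of differences" idea concrete, here is a small base-R sketch of what an interaction design encodes. The factor names (`genotype`, `condition`), the level labels, and the choice of KO and CTRL as reference levels are assumptions for illustration only.

```r
# Hypothetical sample table: 2 genotypes x 3 conditions, one row per group
coldata <- data.frame(
  genotype  = factor(rep(c("KO", "WT"), times = 3), levels = c("KO", "WT")),
  condition = factor(rep(c("CTRL", "drug1", "drug1_2"), each = 2),
                     levels = c("CTRL", "drug1", "drug1_2"))
)

# Main effects plus the genotype-by-condition interaction
design <- ~ genotype + condition + genotype:condition

# Inspect the columns of the model matrix that this formula produces
mm <- model.matrix(design, coldata)
colnames(mm)
```

With KO and CTRL as reference levels, the `genotypeWT` column is the WT vs KO difference in the control condition, the `condition*` columns are the drug effects in KO, and the two interaction columns capture how much the WT vs KO difference changes under each drug treatment. The same formula can be passed as the `design` of a DESeq2 object.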
Does this sound like what you are interested in testing?
There are some samples, though, whose role in the analysis I'm not sure about: the solvent 1 control and the total RNA.
I would be interested in setting up something similar to what you described. Would this be like testing the difference between WT and KO within each condition? I think I will have to try this out and see the data before I'd be sure that I am understanding exactly what I would get back. Essentially, I want the DGE for the experimental sample to take into account the fact that the KO gives you a sense of the amount of noise that may exist in the experimental sample.
For solvent 1 control, I think something that could be done is to compare drug1+2 against solvent 1 and then drug1+2 against solvent 2. I was not able to dissolve the two drugs in the same solvent for the drug injections (each was separately dissolved in a different solvent then combined together in the syringe) so I need a way to measure similarity in the results between the fold change relative to each control to show that one solvent wasn't biasing the results I see. This design may need to be different from the KO one described above.
Is there anything wrong with comparing directly to my total RNA as I described above? This kind of analysis is done with qPCR for example where you see relative enrichment against the entire population. Is there no way to factor this into the analysis of WT and KO?
Okay, I think I figured out how to use interactions as you said. In my condition column I added in either CTRL (solvent 2) or drug1 or drug1+2. Then in a genotype column I listed which were WT or KO. I removed the solvent 1 and totals data for now. After running it, I got the following result tables back:
Then I can use:
This should give me the differential expression of either drug1 or drug1+2 vs. control, controlling for the interaction of WT vs. KO. Is this correct?
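As a point of comparison, here is a hedged sketch of how the different results tables can be pulled from an interaction design, run on simulated counts. It is not a reproduction of the tables or call from the post. The level names, the choice of KO and CTRL as reference levels, and the replicate numbers are assumptions, and the coefficient names below follow from those assumed levels.

```r
library(DESeq2)

set.seed(1)

# Hypothetical sample table: 3 conditions x 2 genotypes, 3 replicates each
coldata <- data.frame(
  genotype  = factor(rep(rep(c("KO", "WT"), each = 3), times = 3),
                     levels = c("KO", "WT")),
  condition = factor(rep(c("CTRL", "drug1", "drug1_2"), each = 6),
                     levels = c("CTRL", "drug1", "drug1_2")),
  row.names = paste0("s", 1:18)
)

# Simulated counts stand in for the real count matrix (assumption)
counts <- matrix(rnbinom(500 * 18, mu = 200, size = 10), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), rownames(coldata)))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ genotype + condition + genotype:condition)
dds <- DESeq(dds)
resultsNames(dds)

# drug1 vs CTRL in the reference genotype only (KO here)
res_drug1_KO <- results(dds, name = "condition_drug1_vs_CTRL")

# Interaction: is the drug1 effect different in WT than in KO?
res_int_drug1 <- results(dds, name = "genotypeWT.conditiondrug1")

# drug1 vs CTRL in WT: main effect plus the interaction term
res_drug1_WT <- results(dds, contrast = list(
  c("condition_drug1_vs_CTRL", "genotypeWT.conditiondrug1")))
```

One thing worth checking against your own output: in a design like this, the main-effect coefficient for a drug is the drug effect in the reference genotype only, so it is not by itself "drug vs. control adjusted for KO". If the goal is the part of the drug response that exceeds the KO background, the interaction term is the one closest to that question.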
I feel the use of total RNA is still valid, but it is really just telling me which genes change in expression relative to the whole background population. It doesn't tell me anything about how they changed relative to a condition.