Overwhelmed by DESeq2 options and how to answer my questions: two conditions and three families
lgspeight • 0
United States

I have been working with DESeq2 for the past couple of months to analyze my data. I have read the vignette many times, found other workshops, and read message boards, but I still second-guess my decisions and the options I have chosen.

Basically, my design has multiple clam lines, let's say 3 (A, B, C), and two salinities that I am comparing (35 ppt vs 15 ppt). Salinity 35 ppt is my control, but I do not have a control or reference clam line.

The questions I am asking are: 1) How does the hard clam respond to low salinity (15 vs 35 ppt)? That is, regardless of clam line, how does this species respond to 15 ppt? 2) Do different clam lines respond differently to low salinity, i.e. which genes are differentially expressed between (A & B), (B & C), and (C & A) at 15 ppt?

I have come up with many different ways to approach these questions, but which approach is best or the right one?

I have had suggestions that I need to flip these questions and first ask question number 2 then 1.

I have struggled with if I need an interaction term or just groups. Do I just put salinity in the model and leave clam line out, and vice versa, to answer these different questions? When I put multiple variables or an interaction term in, the coefficients start to get very confusing, especially since I don't have a reference clam line, but DESeq2 makes one of my clam lines the reference.

Then there is the choice of shrinkage estimator. I have decided that apeglm is best for my data; ashr leaves dispersion outliers among my significant genes. However, contrast statements cannot be used with apeglm.

Do I need to run multiple models or can I use one? What is ethical? I am defending my thesis in January and am in the process of creating my results. But I am terrified of doing something wrong and in the very end, when I go to defend or publish, all my results are incorrect.

If you have any guidance or suggestions, that would be great. Please don't just point me to a link or the vignette, because I have most likely read it and feel like everyone has a different solution to similar problems.

My advisors have limited experience with RNA-Seq and neither have used DESeq2, so I have been figuring this all out on my own.

I appreciate your time reading this and responding.



DESeq2 • 137 views
swbarnes2 ★ 1.1k
San Diego

I have struggled with if I need an interaction term or just groups

Well, you use different things for different questions.

Use the grouping approach described at the beginning of the "Interactions" section of the DESeq2 vignette (groups such as A_15, C_35, etc.) to do a simple comparison of one subset of samples to another. While you could do this with interaction terms, it's much easier to understand with the grouping approach. And while you could make a subset of dds with just one line, in general it's better not to, and you don't have to, because you can specify the exact groups you want to contrast, no matter what is set as the reference level.
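To make the grouping idea concrete, here is a minimal NumPy sketch of what a group contrast means numerically. It is only an illustration: it fits ordinary least squares to made-up log2 expression values for one gene, whereas DESeq2 fits a negative binomial GLM to counts. The sample layout, the toy means, and the helper name `contrast` are all assumptions for the example. In DESeq2 itself the analogous call would be along the lines of `results(dds, contrast=c("group", "A_15", "B_15"))`.

```python
import numpy as np

# Hypothetical layout: 3 clam lines x 2 salinities x 2 replicates.
clam_line = ["A", "A", "B", "B", "C", "C"] * 2
salinity = ["35"] * 6 + ["15"] * 6

# Combine the two factors into a single grouping factor, e.g. "A_15".
group = [f"{l}_{s}" for l, s in zip(clam_line, salinity)]
levels = sorted(set(group))

# Cell-means design: one indicator column per group, no reference level.
X = np.array([[1.0 if g == lev else 0.0 for lev in levels] for g in group])

# Made-up log2 expression for one gene (assumption, not real data).
toy_mean = {"A_35": 5.0, "A_15": 6.0, "B_35": 5.0,
            "B_15": 8.0, "C_35": 4.0, "C_15": 4.5}
y = np.array([toy_mean[g] for g in group])

# Least-squares fit: each coefficient is simply that group's mean.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def contrast(numerators, denominators):
    """Weights that average the numerator groups minus the denominator groups."""
    v = np.zeros(len(levels))
    for g in numerators:
        v[levels.index(g)] += 1.0 / len(numerators)
    for g in denominators:
        v[levels.index(g)] -= 1.0 / len(denominators)
    return v

# Question 2 style: line A vs line B, both at 15 ppt.
lfc_a_vs_b_at_15 = contrast(["A_15"], ["B_15"]) @ beta

# Question 1 style: 15 ppt vs 35 ppt, averaged over the three lines.
lfc_15_vs_35 = contrast(["A_15", "B_15", "C_15"],
                        ["A_35", "B_35", "C_35"]) @ beta

print(lfc_a_vs_b_at_15, lfc_15_vs_35)
```

Because every group has its own coefficient, no clam line has to act as the reference, and each comparison is just a set of weights over the group means.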

To see whether different lines respond differently to the salinity challenge, use interactions. Here, you are pretty much going to have to use names from resultsNames(), which means you might have to relevel to make sure that you get a coefficient name that specifies the two lines you want to compare.
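As a rough illustration of why releveling matters, the sketch below builds a reference-coded design with an interaction by hand and shows that each interaction coefficient is a "difference of differences" relative to the reference line. Again this is plain least squares on made-up log2 values, not the DESeq2 model, and the coefficient names (`lineB.sal15` and so on) are hypothetical stand-ins for whatever resultsNames() reports in a real analysis.

```python
import numpy as np

lines = np.array(["A", "A", "B", "B", "C", "C"] * 2)
low = np.array([0.0] * 6 + [1.0] * 6)  # 1 = 15 ppt, 0 = 35 ppt (control)

# Made-up log2 expression for one gene (assumption, not real data).
toy_mean = {("A", 0): 5.0, ("A", 1): 6.0, ("B", 0): 5.0,
            ("B", 1): 8.0, ("C", 0): 4.0, ("C", 1): 4.5}
y = np.array([toy_mean[(l, int(s))] for l, s in zip(lines, low)])

def fit(reference):
    """Fit line + salinity + line:salinity with the given reference line."""
    others = [l for l in ["A", "B", "C"] if l != reference]
    cols = {"Intercept": np.ones(len(lines))}
    for l in others:
        cols[f"line{l}"] = (lines == l).astype(float)
    cols["sal15"] = low
    for l in others:
        cols[f"line{l}.sal15"] = (lines == l) * low
    X = np.column_stack(list(cols.values()))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(cols, beta))

ref_a = fit("A")
ref_b = fit("B")

# With A as reference: (B_15 - B_35) - (A_15 - A_35)
b_vs_a = ref_a["lineB.sal15"]

# C vs B is not a single coefficient when A is the reference; it is the
# difference of two interaction coefficients ...
c_vs_b_from_a = ref_a["lineC.sal15"] - ref_a["lineB.sal15"]

# ... but it becomes a single coefficient after releveling to B.
c_vs_b_releveled = ref_b["lineC.sal15"]

print(b_vs_a, c_vs_b_from_a, c_vs_b_releveled)
```

The last two numbers agree, which is the point: releveling does not change the fitted model, only which comparisons appear as single named coefficients.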

I think you should just try to tackle this empirically. Get all your normalized counts and compute averages for the different groupings. The log2 fold changes from DESeq2 should be fairly close to the log2 ratios of the averages you work out in Excel. It's important to make sure that the answer DESeq2 is giving you matches the question you intended to ask.
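The same sanity check can be scripted instead of done in Excel. Below is a minimal sketch, assuming you have exported normalized counts for one gene and know which samples belong to which group; the numbers and the helper name `naive_lfc` are made up for illustration. Unshrunken DESeq2 log2 fold changes will not match this exactly, since they come from a fitted model, but the sign and rough size should agree for well-expressed genes.

```python
import numpy as np

# Hypothetical normalized counts for one gene, keyed by group.
norm_counts = {
    "A_35": [100.0, 120.0, 80.0],
    "A_15": [400.0, 380.0, 420.0],
    "B_35": [90.0, 110.0, 100.0],
    "B_15": [95.0, 105.0, 100.0],
}

def naive_lfc(numerator, denominator, pseudocount=0.5):
    """log2 ratio of group means; the pseudocount guards against zeros."""
    num = np.mean(norm_counts[numerator]) + pseudocount
    den = np.mean(norm_counts[denominator]) + pseudocount
    return float(np.log2(num / den))

# Response of line A to low salinity (15 vs 35 ppt).
lfc_a = naive_lfc("A_15", "A_35")

# Response of line B to low salinity.
lfc_b = naive_lfc("B_15", "B_35")

# Rough counterpart of the interaction: how differently A and B respond.
diff_of_responses = lfc_a - lfc_b

print(round(lfc_a, 2), round(lfc_b, 2), round(diff_of_responses, 2))
```

If a DESeq2 result has the opposite sign from the hand-computed value, that usually means the contrast is testing a different comparison from the one intended.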


Thank you @swbarnes2. I very much agree with your comment: "It's important to make sure that the answer DESeq2 is giving you matches the question you intended to ask."

I think using an interaction term is good and does allow me to answer my questions (at least question 2). The issue I run into is that ashr is the only shrinkage estimator that allows me to use contrast or combine multiple coefficients. However, with ashr I still get LFCs in the 20s, which seems outrageous. apeglm takes the same data and gives me an LFC of 8 or 10 at the largest.

I have found a very nice page that explains interactions and how to use different coefficients to answer different questions, but it strictly uses results() and not lfcShrink(): https://rstudio-pubs-static.s3.amazonaws.com/329027_593046fb6d7a427da6b2c538caf601e1.html

