Search
0
23 months ago by
mat.lesche40
Germany
mat.lesche40 wrote:

Hello,

I want to do an analysis of some mice data with DESeq2. We have a total 24 samples and three factors. Factor one is the genotyp (6 wildtype, 6x knockout 1, 6x knockout2, 6x knockout3) and factor two are two different cell lines (cellA and cellB). Factor three is the sex (male and female

Samples Genotyp Cell Sex
1 wt

cellA

m
7 wt cellA f
5 wt cellA m
2 wt cellB m
8 wt cellB f
6 wt cellB m
9 ko1 cellA m
11 ko1 cellA m
13 ko1 cellA m
10 ko1 cellB m
12 ko1 cellB m
14 ko1 cellB m
17 ko2 cellA m
21 ko2 cellA m
23 ko2 cellA f
18 ko2 cellB m
22 ko2 cellB m
24 ko2 cellB f
25 ko3 cellA f
29 ko3 cellA m
31 ko3 cellA m
26 ko3 cellB f
30 ko3 cellB m
32 ko3 cellB m

Here is the PCA

We want to compare the subgroups to each other, for example wt-cell1 vs wt-cell2, ko1-cell2 vs ko1-cell2 but also wt-cell1 vs ko1-cell1. Therefore I we wanted to group both factors into one factor and don't use interactions.

dds$group <- factor(paste0(dds$Genotyp, dds$Cell)) Then we realised that 6 of the samples didn't cluster very well. Further investigations have shown that these are female mice which is why the want to chose the following design ~ Sex + group I this the correct way? We will still be able to do all the comparison and would account for the sex. Or would it be better to remove the samples because it affects 6 out of 8 subgroups (ko1 of cellA and cellB don't have female mice). The inner variance of these two groups which don't have female mouse should be different compared to the others. If we leave the female mice in could this lead to a different set of DEGs compared to removing the 6 samples? Thanks for your feedback Best Mathias ADD COMMENTlink modified 23 months ago by Michael Love19k • written 23 months ago by mat.lesche40 How can you have the same cell line from both a female and a male mouse? I thought a cell line was a culture of cells derived from a single common ancestor. ADD REPLYlink written 23 months ago by Ryan C. Thompson6.8k Hey Ryan, Sorry this was my mistake and was not precise. It's not a cell line, it's two different mouse line. One is our standard mouse line, the other one is infected with a disease. ADD REPLYlink written 23 months ago by mat.lesche40 0 23 months ago by Michael Love19k United States Michael Love19k wrote: "~sex + group: Is this the correct way?" Yes. A fixed sex effect on gene expression for each gene can be estimated because it is not totally confounded with group. So it doesn't matter that some groups don't have female mice. ADD COMMENTlink written 23 months ago by Michael Love19k I'd recommend you make the PCA plot again with a subset of the data excluding sex chromosomes. It could be that many of the top 500 genes are on the sex chromosomes. (This doesn't change the recommended design, just would give you a picture of variation in gene expression from the autosome.) ADD REPLYlink written 23 months ago by Michael Love19k Dear Michael, Thanks for the answer and the confirmation. I have found this thread in which you answered a question regarding pairwise comparison (How to perform multiple pairwise comparisons ) Does your answer apply to my task as well. I have 28 comparisons because I have 8 pairs. Thanks Mathias ADD REPLYlink written 22 months ago by mat.lesche40 Yes you should correct globally. The default performed by DESeq2 (and edgeR and limma) is to correct the multiple testing across genes when you produce a single results table. DESeq2 does not have something like the decideTests() function in limma which helps to arrange multiple contrasts at once, so we don't have the functionality to keep track of how many tests you are doing. But it's simple enough to correct globally by following those instructions. As I see it, if one is making a small set of comparisons, e.g. B vs A, C vs A, C vs B, and one is then going to present the result of all of these comparisons in the paper, then correcting each result table separately is fine. However, you can see that running 28 or 45 contrasts is a different situation. One is not going to put the results of 45 contrasts into the paper, but instead highlight the contrasts that had the most number of DEG. And here is where FPR or FDR control would be lost, because one has performed more tests than are presented in the paper and not corrected for that multiplicity. ADD REPLYlink modified 22 months ago • written 22 months ago by Michael Love19k Hey Michael, Thanks for mentioning the global corrections. I wasn't aware of it. Maybe it would be helpful to add this to the Q&A of the DESeq2 vignette and to refer to the section in the limma vignette? For how many pairwise comparisons do you recommend to do the global corrections? Should one do it with 6 or 10 or more than 10? I read the section in the vignette and understand why and how to do it. I just want to outline it to have a confirmation. In this example and only use 6 results but I still have 28. Just want to keep it short. All my results are in a list and I extract six of them. res1 <- deresults[[1]] res2 <- deresults[[2]] res3 <- deresults[[3]] res4 <- deresults[[4]] res5 <- deresults[[5]] res6 <- deresults[[6]] # combine them to a vector together <- c(res1$pvalue,res2$pvalue,res3$pvalue,res4$pvalue,res5$pvalue,res6\$pvalue)
newtogether <- p.adjust(together,method = 'BH')

Afterwards I would take this vector and replace the padj column in all my results with the corresponding entries in 'newtogether'

Lastly, I had 22,711 DEGs in the original padj with a FDR of <0.1. Now, I have 19,505 DEGs with lower than 0.1. p.adjust doesn't offer any setting for the value (e.g 0.1 for 10% or 0.05 for 5%). For example if I want to have the padj for 0.05, do I have to set it in the results() function to 0.05, then run the global p.adjust and look at genes which have <0.05 or can I stick to 0.1 in the results() function. I know that results uses FDR for the independent filtering and I want to make sure it is a stringent as possible.

Thanks

Mathias

That's a good point. I should add it to FAQ. It's not a common setup that users do so many comparisons, but anyway I should add it.

Yes your code looks correct, you just need to combine them and then split them back up.

You can do something like split(newtogether, rep(1:6, each=x)) where x is the number of rows of the dds. This will break the long vector into a list again.

You should use the same alpha in results() and when you do thresholding of the global adjusted p-values.

Thanks for your answer, I really appreciate your help! Because we have another project with 10 comparisons coming up, I'd really like to know at how many comparisons one should do the global corrections? Sorry for all the questions but I always learn something new here.

I don't have a number per se.

I think it comes down to my comment (C: Grouping a multifactor design in DESeq2 and accouting for sex) above:

"One is not going to put the results of 45 contrasts into the paper, but instead highlight the contrasts that had the most number of DEG."

If you are picking among all of the comparisons you did and only mentioning those comparisons with the most DEG in the paper, you need to do global correction. Any kind of highlighting of individual comparisons based on the number of DEG, when many comparisons were done is problematic.

Hey Michael,

I did the PCA again without the sex chromosomes and PC1 went from 50 to 55% and PC2 from 26 to 23%. A plot of genes with the highest variance doesn't show Xist anymore. So I stick to ~sex+group and that's all I can do.