1
0
Entering edit mode
Biologist ▴ 90
@biologist-9801
Last seen 2.4 years ago

Hello,

For the Differential analysis I'm using "limma" R package. I'm using counts data. A matrix "U" with genes as rows and samples as columns. Initially I have done some filtering steps like following:

mycpm <- cpm(U)
thresh <- mycpm > 2
keep <- rowSums(thresh) >= 3

table(keep)

keep
FALSE  TRUE
41720 18768
counts.keep <- U[keep,]

dim(counts.keep)
[1] 18768   366

y <- DGEList(counts.keep)
y <- calcNormFactors(y, method = "TMM")

design2 <- model.matrix(~type -1)

GP1n3 GP2
1     1   0
2     1   0
3     1   0
4     1   0
5     1   0

contrast.matrix <- makeContrasts(GP1n3-GP2, levels=design2)

contrast.matrix
Contrasts
Levels  GP1n3 - GP2
GP1n3           1
GP2              -1

v <- voom(y, design2, plot=TRUE)

fit <- lmFit(v,design2)

fitC <- contrasts.fit(fit,contrast.matrix)
fitC <- eBayes(fitC,robust=TRUE,trend=TRUE)

summary(decideTests(fitC))

GP1n3 - GP2
-1           0
0        18768
1            0

There are no upregulated and downregulated genes between GP1n3 and GP2. But when I looked for significant differential expressed genes:

sig.deg2 = subset(tab, abs(logFC)>0.5 & tab$P.Value < 0.01) head(sig.deg2) logFC AveExpr t P.Value adj.P.Val B ENSG00000092621.10 0.6884128 4.7057954 3.367474 0.0008388354 0.9999277 -1.308562 ENSG00000204610.11 0.6203649 2.5270859 3.050272 0.0024522414 0.9999277 -2.310364 ENSG00000143195.11 0.8738004 0.5791553 3.080063 0.0022253302 0.9999277 -2.828140 ENSG00000204613.9 0.5080240 0.5260141 2.756266 0.0061375019 0.9999277 -3.245375 ENSG00000254521.5 -0.6332099 -1.9619206 -3.072489 0.0022808742 0.9999277 -3.488171 ENSG00000100625.8 0.5970696 -1.3266248 2.713731 0.0069658381 0.9999277 -3.650128 I don't know for every comparison I get adj.P.Val same for all the genes and decideTests give "-1" and "1" as zeros. Am I doing wrong some where? Can anyone help me in this. Thank you !! limma r differential gene expression • 1.5k views ADD COMMENT 1 Entering edit mode @gordon-smyth Last seen 1 hour ago WEHI, Melbourne, Australia You should not use voom() together with eBayes(trend=TRUE). You should use one or the other, but not both (as explained in Chapter 15 of the limma User's Guide). Apart from that, I don't see any clear problems. You just don't have any differential expression. There's no contradiction in the results. ADD COMMENT 0 Entering edit mode Dear Gordon, Thanks for the reply. As you said I used only eBayes(trend=TRUE). I have done another comparison. MB1 - MB2 -1 1 0 16904 1 1 sig.deg = subset(tab, abs(logFC)>0.5 & tab$P.Value < 0.01)

logFC  AveExpr         t             P.Value          adj.P.Val          B
PRR18   0.7536122 5.163854  3.396209 0.0007577785 0.3583424 -0.7996981
DACH2  -0.6016372 0.845996 -3.375142 0.0008163282 0.3583424 -0.8561809
HELLS  -0.5986486 5.800163 -3.353323 0.0008813889 0.3583424 -0.9143339
OSBPL6 -0.6416259 5.887980 -3.226050 0.0013674819 0.4128330 -1.2465415
POLQ   -0.5932946 5.515293 -3.212282 0.0014328446 0.4131078 -1.2817619
MYH1    0.6999817 1.053430  3.211477 0.0014367504 0.4131078 -1.2838148

Does this mean that this comparison shows only one upregulated and one downregulated genes? May I know one what cutoff this gives up and down regulated info.

And with adj.P.val is same for three genes and two genes. What does this mean?

I must admit that my statistical knowledge is pretty low. If adj.P.val less than 0.05 then those genes are differentially expressed?

0
Entering edit mode

There are a few things wrong here.

• Don't do post hoc filtering on the log-fold change. This will invalidate the p-values. If you have a  change threshold below which the DE is uninteresting, use it in treat instead.
• Don't select genes based on the p-value, use the adjusted p-value to correct for multiple testing.
• Yes, you only have 2 genes that are detected at a FDR of 5% (the default for decideTests).
• The adjusted p-value can be the same for several genes due to enforced monotonicity in the BH correction, i.e., to ensure that lower p-values result in lower adjusted p-values. This is expected.
• The idea is to define all genes with adjusted p-values below X% as being DE. Within that set of of DE genes, the expected proportion of false positives should be less than X%.
0
Entering edit mode

Dear Aaron,

Thanks for the reply. I didn't understand it. Could you please tell me clearly. You mean I need to select siginificant differential genes like following?

sig.deg = subset(tab, abs(logFC)>0.5 & tab$adj.P.Val < 0.05) What is post hoc filtering on log FC? ADD REPLY 0 Entering edit mode No. Don't filter in the log-fold change manually. Use treat. ADD REPLY 0 Entering edit mode I'm really not aware about this. Could you please tell me how to give that? ADD REPLY 0 Entering edit mode Read the documentation in ?treat. ADD REPLY 0 Entering edit mode You mean instead of eBayes and topTable like mentioned above in the code, I should use treat? fitC <- treat(fitC,lfc=log2(1.1),trend=TRUE, robust=TRUE) top_treat <- topTreat(fitC,number=nrow(fitC),adjust.method="fdr") Do you think this is right? ADD REPLY 0 Entering edit mode Yes, that's good. ADD REPLY 0 Entering edit mode OK, now you have 2 DE genes. The adjusted p-value represents false discovery rate FDR, and you have 2 genes that are significantly DE at FDR < 0.05. To see which genes these are, please use head(tab). Don't use subset -- there's just no point in that because topTable has already ranked the most important genes for you. ADD REPLY 0 Entering edit mode Dear Gordon, You mean 2 DE genes are the one showing "1" and "-1" with summary(decideTests(fitC)) ? May I know on what cutoff this is decided? Ok. I'm not using subset now. So, as you said To see the two differential expressed genes I gave head(tab) logFC AveExpr t P.Value adj.P.Val B GINS3 0.4986436 6.588254 4.708722 3.536275e-06 0.03412768 3.342665 C17orf103 -0.4577375 8.099347 -4.680004 4.037345e-06 0.03412768 3.239338 POLE3 0.2610686 10.143267 4.325623 1.961349e-05 0.10254673 2.009803 MNT -0.2703702 8.353955 -4.223649 3.033044e-05 0.10254673 1.671833 SUDS3 0.2435457 9.462002 4.167922 3.835098e-05 0.10254673 1.490172 NHSL1 0.4976037 9.729718 4.165704 3.870899e-05 0.10254673 1.482981 But what does this mean? From this which are differential expressed genes? Here for the first two genes adj.p.values are less than 0.05. could you please tell me clearly. my statistical knowledge is pretty low ADD REPLY 1 Entering edit mode For your first question, the documentation is your friend. If you read ?decideTests, it will tell you that the threshold on the adjusted p-value is 0.05. For your second question, look at the all the rows with adjusted p-values below 0.05. This will give you a set of DE genes at a FDR of 5%. In this case, your set of DE genes would contain 2 genes, GINS3 and C17orf103. ADD REPLY 0 Entering edit mode Thank you very much !! But I see some people selecting DEG's based on logFC and P.value like I mentioned above sig.deg = subset(tab, abs(logFC)>0.5 & tab$P.Value < 0.05)

So, I'm confused with this. But Gordon said not to use subset.

2
Entering edit mode

Those people are misguided. Your subset approach is incorrect; for starters, there is no correction for multiple testing, but even if there was, filtering on a minimum log-fold change can inflate the actual false discovery rate. This is because the log-fold change statistic does not consider the variability of the data. In the worst case, your log-fold change filter will enrich for false positive genes that have high variances and thus are more likely to achieve large log-fold changes just by chance. This is why we recommend treat.

0
Entering edit mode

Dear Aaron,

In this post (C: Possible ways of performing differential gene expression and analysis of RNA-Seq) Gordon gave a code for Differential analysis with RSEM values. In that he used a filtering step keeping genes that have about 10 counts or more in atleast 14 samples. Which means there are 14 samples in the smallest "experimental group".

In my case:

table(targets\$Sample.type)

MB1   MB2
286     80

So, on what sample number should I filter now?

Do I need to filter like this?

keep <- rowSums(y > log2(11)) >= 80

0
Entering edit mode

Your question is unrelated to the current post. Post a new question.

0
Entering edit mode