Search
Question: Appropriate use of the function treat of the limma package
3
3.4 years ago by
svlachavas610
Greece/Athens/National Hellenic Research Foundation
svlachavas610 wrote:

Dear ALL,

i would like to use the function treat implemented in limma, in order to test simultaneously the test significance of my microarray linear model fit and my design. Because i have never used in the past, the argument fit in treat refers to the output of lmFit or from the ebayes fuction ? Moreover, if i would like to put an absolut log fold change > 1 i use lfc=1 ?

And finally, i can use the function topTreat to get the output from treat ? Please excuse me for my "naive" questions but im a beginner in R.

modified 3.4 years ago • written 3.4 years ago by svlachavas610
3
3.4 years ago by
Aaron Lun20k
Cambridge, United Kingdom
Aaron Lun20k wrote:

The output from lmFit (or contrasts.fit) can be used as input into treat. You don't need to run eBayes beforehand, though there's no problem in doing so; all the EB statistics from eBayes will just be ignored within treat.

For your second question - yes, you can set lfc=1 to get an absolute log-fold change above 1. Note that the "effective" threshold will be larger, as treat tests for significant differences from the specified lfc. A gene with a log-fold change that is just above 1 will probably not be detected.

Finally, you can use topTreat to obtain output from treat. The interpretation of the output should be the same as that from topTable. You could also use topTable itself, so long as you set sort.by to something other than "B".

0
3.4 years ago by
svlachavas610
Greece/Athens/National Hellenic Research Foundation
svlachavas610 wrote:

Dear Aaron, i just implement your above directions, but the topTreat it gives me the below output:

201492_s_at 15.38098 15.21090 99.51217 3.215498e-22 1.023059e-18
212284_x_at 14.88396 14.91979 96.78134 4.773642e-22 1.023059e-18
213583_x_at 14.75143 14.69249 95.69347 5.611040e-22 1.023059e-18
212869_x_at 14.98989 15.12222 95.51570 5.786827e-22 1.023059e-18
206559_x_at 14.98286 14.95147 94.10926 7.173611e-22 1.023059e-18
204892_x_at 14.99753 15.00060 93.45458 7.939056e-22 1.023059e-18
213614_x_at 15.21086 15.10123 92.62842 9.060488e-22 1.023059e-18
211943_x_at 14.60429 14.81567 90.02475 1.355923e-21 1.242940e-18
211970_x_at 14.50193 14.48519 89.69634 1.427240e-21 1.242940e-18
213477_x_at 15.19064 15.04754 89.16746 1.572545e-21 1.242940e-18

it is something wrong with my logFCs being so big? or something wrong with my code?

1

It's okay, treat doesn't shrink the log-fold changes so they end up being very large if there's any zero counts involved. Check out A: treatDGE vs glmLRT for more details.

I understand. Also in the paper it is mentioned that the p-values(& so adjusted p-values) returned by treat are larger than those of the moderated t-thus it also makes sense from the output above. In your opinion it is better to keep the top ranked genes from the output of the topTreat than to filter the output from topTable for simultaneously adj.p value and logFC cutoff ? I need also to mention that my design matrix includes a paired analysis between cancer and control samples

1

I'd use the top-ranked genes from topTreat. There are statistical issues with filtering simultaneously on the p-value and log-fold change, as this may select for variable genes that are more likely to be false positives. Your experimental design shouldn't matter at this stage of the analysis, so long as you can fit an appropriate linear model.

Basically i found a mistake from above: i leave by accident the default coef=1--but design table looks like this:

(Intercept) conditionCancer pairs2 pairs3 pairs4 pairs5 pairs6 pairs7 pairs8 pairs9 pairs10 pairs11

So when i put coef=2 in treat(which is what i want to compare---cancer vs control samples), the output changes:

206209_s_at -7.728190 8.438558 -5.511732 3.566947e-05 0.07797065
209612_s_at -6.102062 6.009863 -5.469030 3.736761e-05 0.07797065
208399_s_at -4.557960 7.965032 -5.441256 3.856201e-05 0.07797065
206784_at   -8.451506 8.467254 -5.466068 3.945883e-05 0.07797065
204719_at   -5.489989 5.589615 -5.233019 5.740157e-05 0.08771005
207502_at   -6.597180 6.788759 -5.164605 6.658152e-05 0.08771005
209687_at   -5.317351 6.320637 -5.018424 8.587035e-05 0.09177763
205950_s_at -7.047032 9.629112 -4.948804 1.014201e-04 0.09177763
207003_at   -6.409210 9.641009 -4.925924 1.045039e-04 0.09177763
203797_at    4.144089 6.873026  4.792469 1.305887e-04 0.09576808

1

Setting coef=2 would be correct, assuming that the second coefficient represents the change in cancer above normal. In this case, you don't get any DE genes at a FDR of 5%. You may want to consider reducing lfc to, e.g., log2(1.5), as a DE gene with a log-FC close to 1 will not be detected for reasons previously discussed.

Your notification is very important--although the p-values returned from TREAT are usually larger than the moderated t-test as i mentioned---but TREAT is testing a different hypothesis and needs stronger evidence for DE if i understand well. So in your opinion should i use lfc=0.5 or even lfc=0.3 in order to keep the FDR < 0.05 ?

2

Yes, the higher the threshold, the stronger the evidence needed for DE. As for the threshold value, it depends on your research question, as well as how different your cancer samples are from the normal ones. If there isn't much differential expression, it might be better not to use TREAT at all.