Appropriate use of Limma Treat function
1
0
Entering edit mode
@daniellenewby-12503
Last seen 6.8 years ago

Hi everyone,

I am carrying out DE analysis on case versus control microarray datasets from array express. I have been reading lots of posts relating to obtaining DE genes from limma. A lot of people seem to carry out DE in limma and then select DE by using both logFC (absolute > 2 /1.5) and FDR pvalue (<0.05). However i have read that this might not be suitable and that by using the treat function in limma this takes into account the fold change when computing the p value.

The code i have been using for the DE involving treat is (as defined by the publication for TREAT):

resultmulti1 <- treat( resultmulti1, lfc=log2(1.1),trend=TRUE , robust = TRUE)

Question1: are there instances where people would advise not to use treat? For example where there is low fold change for many genes?

Question 2: If i use treat in my DE analysis to define DE genes i will then only need to filter on FDR pvalues as treat has already taken into account Fold change when calculating the pvalue? I was going to use a cutoff of 0.05 but can i use higher values if my list of DE is low?

Question 3: I am planning on using the pvalues obtained from DE in limma using the treat function for pathway enrichment (Kolmogorov–Smirnov testing the distribution of p values of genes in a KEGG pathway versus p values from genes not in a KEGG pathway) i was wondering if this is ok and that the adjustment of the p value that takes into account the fold change is a suitable for pathway enrichment?

Thanks for any help it is much appreciated!

Danielle

limma treat • 2.2k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 2 hours ago
The city by the bay
  1. Well, if you don't have a lot of DE or if your DE log-fold changes are weak, you would be better off with a standard DE analysis if you want to get some results that you can follow up. treat is intended to cull large DE lists to focus on genes that have with strong changes (and thus are more likely to be interesting).
  2. That's right, in the sense that the absolute value of the log-fold change of DE genes will be significantly larger than whatever threshold you set in lfc. No additional filtering on the log-fold change is required. As for the other question, you can use any FDR threshold you want, but obviously the list will become less reliable as you increase the threshold. There's no point getting a larger list if it's filled with false positives.
  3. I've never seen the KS test applied to the p-values before. I guess it's okay, though there are some potential issues with power (e.g., underpowered if you don't have a lot of DE genes, sensitive to irrelevant fluctuations in significance at low p-values). Unless there's a good reason, I would use the inbuilt functions goana and kegga. In any case, if you still want to use the KS test, I would at least use the uncorrected p-values - the enforced monotonicity in the BH correction may result in many ties, which could do funny things with the KS calculation.
ADD COMMENT
0
Entering edit mode

Hi Aaron,

Thank you so much for your reply it is very helpful.

Just a bit more explanation of what i am doing. I have about 20 gene expression datasets and i want to see if the gene signatures extracted from these relate and/or can separate case versus control for another disease (bit of a long shot so it is very exploratory at the moment!). So i was hoping for a single workflow that i could analyse these datasets however from what i have read and your response i dont think it is that simple!

For some of my datasets there is minimal logfold change even when i use a threshold of 1 so the DE signal is weak so i am unsure how to proceed with this as i thought i could apply a one threshold for all my datasets but now i see that is not really possible as the threshold can really depend on a dataset, I was thinking of ranking the genes based on logfold change and p value and taking the top genes and go from there- your thoughts and expertise on this would be great.

I have not much experience with using the KS test however it seems to work ok! When i used it on a disease case control dataset the disease in question came up as a significant pathway. I have been using the uncorrected p values so i am glad you pointed that out as well!

Thank you for the suggestions for the inbuilt packages i am definitely going to have a play with them for my datasets

ADD REPLY

Login before adding your answer.

Traffic: 779 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6