Hi everyone,
I am carrying out DE analysis on case versus control microarray datasets from array express. I have been reading lots of posts relating to obtaining DE genes from limma. A lot of people seem to carry out DE in limma and then select DE by using both logFC (absolute > 2 /1.5) and FDR pvalue (<0.05). However i have read that this might not be suitable and that by using the treat function in limma this takes into account the fold change when computing the p value.
The code i have been using for the DE involving treat is (as defined by the publication for TREAT):
resultmulti1 <- treat( resultmulti1, lfc=log2(1.1),trend=TRUE , robust = TRUE)
Question1: are there instances where people would advise not to use treat? For example where there is low fold change for many genes?
Question 2: If i use treat in my DE analysis to define DE genes i will then only need to filter on FDR pvalues as treat has already taken into account Fold change when calculating the pvalue? I was going to use a cutoff of 0.05 but can i use higher values if my list of DE is low?
Question 3: I am planning on using the pvalues obtained from DE in limma using the treat function for pathway enrichment (Kolmogorov–Smirnov testing the distribution of p values of genes in a KEGG pathway versus p values from genes not in a KEGG pathway) i was wondering if this is ok and that the adjustment of the p value that takes into account the fold change is a suitable for pathway enrichment?
Thanks for any help it is much appreciated!
Danielle
Hi Aaron,
Thank you so much for your reply it is very helpful.
Just a bit more explanation of what i am doing. I have about 20 gene expression datasets and i want to see if the gene signatures extracted from these relate and/or can separate case versus control for another disease (bit of a long shot so it is very exploratory at the moment!). So i was hoping for a single workflow that i could analyse these datasets however from what i have read and your response i dont think it is that simple!
For some of my datasets there is minimal logfold change even when i use a threshold of 1 so the DE signal is weak so i am unsure how to proceed with this as i thought i could apply a one threshold for all my datasets but now i see that is not really possible as the threshold can really depend on a dataset, I was thinking of ranking the genes based on logfold change and p value and taking the top genes and go from there- your thoughts and expertise on this would be great.
I have not much experience with using the KS test however it seems to work ok! When i used it on a disease case control dataset the disease in question came up as a significant pathway. I have been using the uncorrected p values so i am glad you pointed that out as well!
Thank you for the suggestions for the inbuilt packages i am definitely going to have a play with them for my datasets