I have a question about the correct way to use camera (from limma) with a filtered gene expression matrix.
Let's assume that I have an n * m expression matrix (array or RNA-Seq, it doesn't matter) and that, after filtering out the features below a certain intensity/CPM threshold, I am left with an n' * m matrix, where k = n - n' is the number of features removed. Presumably most of these filtered features will not be significantly associated with the phenotype.
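To make the n, n' and k bookkeeping concrete, here is a minimal sketch of a generic CPM filter in plain Python. The helpers `cpm` and `filter_by_cpm` are hypothetical and are not edgeR's `cpm()` or `filterByExpr()`. The thresholds (at least 10 CPM in at least 2 samples) and the toy counts are assumptions chosen only for illustration.

```python
from typing import List, Sequence


def cpm(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Counts per million: scale each sample (column) by its library size."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)] for row in counts]


def filter_by_cpm(ids: Sequence[str], counts: Sequence[Sequence[int]],
                  min_cpm: float = 10.0, min_samples: int = 2):
    """Hypothetical filter: keep features with CPM >= min_cpm in >= min_samples samples."""
    keep = [sum(v >= min_cpm for v in row) >= min_samples for row in cpm(counts)]
    kept_ids = [g for g, k in zip(ids, keep) if k]
    kept_counts = [list(row) for row, k in zip(counts, keep) if k]
    return kept_ids, kept_counts


# Toy n = 5 by m = 3 count matrix; every sample has a library size of 1e6,
# so CPM equals the raw count here.
ids = ["G1", "G2", "G3", "G4", "G5"]
counts = [
    [500000, 400000, 600000],
    [499970, 599985, 399970],
    [20, 5, 25],  # >= 10 CPM in 2 samples: kept
    [6, 0, 4],    # never >= 10 CPM: removed
    [4, 10, 1],   # >= 10 CPM in only 1 sample: removed
]

kept_ids, kept_counts = filter_by_cpm(ids, counts)
n, n_prime = len(ids), len(kept_ids)
k = n - n_prime
print(kept_ids, n, n_prime, k)  # ['G1', 'G2', 'G3'] 5 3 2
```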
Now, if I understand correctly, when I use 'ids2indices' to map the elements of a given gene set to the features of the expression matrix, the elements with no match will contain an NA and will be excluded from the rest of the analysis. This means that if only 10% of the genes in a gene set are present in the filtered expression matrix, the gene set actually tested will consist of that 10%. In my (very possibly incorrect) understanding, this makes perfect sense if the non-matching features are genuinely not testable (for example, if the array does not contain probe sets mapping to them). In the case of filtered features, however, I am a bit confused.
In that example, if the 10% of the genes in the gene set were associated with the phenotype and the remaining 90% had been removed, I would probably see a significant association of the gene set with the phenotype. If instead I kept the other 90% of the genes, which are not significantly associated with the phenotype, in the gene set, I would probably obtain a non-significant result. My questions therefore are:
1. Is my understanding correct?
2. If so, what would be the best way to retain the information from the k weak, mostly non-significant features in the analysis?
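The dilution argument in the scenario above can be sketched in plain Python. This is a toy under stated assumptions, not limma code. `set_to_indices` is a hypothetical stand-in for the index mapping: it keeps only the set members that match a row, so the effective set size equals the number of matches. `mean_set_stat` is a deliberately simplified set score (the mean per-gene statistic over the set) and is not camera, which also compares the set against the remaining features and adjusts for inter-gene correlation. All statistics are made-up values.

```python
from statistics import mean
from typing import List, Sequence


def set_to_indices(gene_set: Sequence[str], identifiers: Sequence[str]) -> List[int]:
    """Hypothetical mapping: row indices of `identifiers` that belong to `gene_set`.

    Set members with no matching row contribute nothing, so the
    effective set size is the number of matches.
    """
    members = set(gene_set)
    return [i for i, gene_id in enumerate(identifiers) if gene_id in members]


def mean_set_stat(stats: Sequence[float], index: Sequence[int]) -> float:
    """Simplified set score (NOT camera): mean per-feature statistic over the set."""
    return mean(stats[i] for i in index)


# Hypothetical 10-member gene set in which only G1 is associated with the phenotype.
gene_set = [f"G{i}" for i in range(1, 11)]

# Unfiltered matrix: all 10 set members plus 5 unrelated features (toy statistics).
all_ids = gene_set + [f"X{i}" for i in range(1, 6)]
all_stats = [4.0] + [0.0] * 9 + [0.5, -0.5, 1.0, -1.0, 0.0]

# After filtering, only G1 (10% of the set) and the X features remain.
keep = [i for i, g in enumerate(all_ids) if g == "G1" or g.startswith("X")]
filt_ids = [all_ids[i] for i in keep]
filt_stats = [all_stats[i] for i in keep]

idx_full = set_to_indices(gene_set, all_ids)
idx_filt = set_to_indices(gene_set, filt_ids)

print(len(idx_full), mean_set_stat(all_stats, idx_full))   # 10 0.4
print(len(idx_filt), mean_set_stat(filt_stats, idx_filt))  # 1 4.0
```

With the nine weak members present, the set score is diluted from 4.0 to 0.4. That is the effect the question asks about, shown here only with a toy statistic.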
Apologies for the somewhat lengthy question, and many thanks in advance.