Hi everyone!
I'm working with a gene expression file from an RNA-seq experiment and trying to compare two groups of samples. I'm pretty new to gene expression analysis and need some help. I have strong biological reasons to believe that the two groups should differ. When I run a differential expression analysis I get many differentially expressed genes, but when I cluster the samples on these genes, I don't see a clear separation between the two groups. Is there a way to filter the genes I get from my differential expression algorithm so that the clustering improves? I am interested in selecting, from the list of differentially expressed genes, those that separate the two groups best; however, so many genes are differentially expressed that I don't know how to do this effectively. Also, I have tried two different differential expression methods (t-test and limma), and the genes I get from limma don't appear in the gene list I get from the t-test. Which one should I use?
Thanks a lot!
If you get lots of DE genes from limma, and you cluster the samples on those genes, then the samples will separate by group. That is always true. There may be other subgroups as well, but the main groups should certainly separate. If this doesn't happen, then something must have gone wrong, either in the DE analysis or in the clustering. We can't tell what's wrong, though, from the limited information you've given. You might find it helpful to read through the posting guide to see the sort of information that a question on this support site should provide.
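As a rough illustration of the point above, here is a minimal Python sketch. It is not limma: it uses simulated log-expression values, a plain Welch t-statistic as a stand-in for a DE test, and a simple nearest-neighbour check in place of a full clustering. All function names, group labels and parameter values are hypothetical and chosen only for the example.

```python
import random
import statistics

def welch_t(a, b):
    """Welch t-statistic comparing two lists of values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

def simulate(n_genes=500, n_de=50, n_per_group=5, shift=3.0, seed=1):
    """Simulated log-expression matrix (genes x samples); the first n_de genes are shifted in group B."""
    rng = random.Random(seed)
    groups = ["A"] * n_per_group + ["B"] * n_per_group
    expr = []
    for g in range(n_genes):
        base = rng.gauss(8.0, 1.0)  # gene-specific baseline
        expr.append([
            base + (shift if g < n_de and grp == "B" else 0.0) + rng.gauss(0.0, 1.0)
            for grp in groups
        ])
    return expr, groups

def same_group_neighbour_fraction(expr, groups, gene_idx):
    """Fraction of samples whose nearest other sample (on the chosen genes) is in the same group."""
    n = len(groups)
    profiles = [[expr[g][s] for g in gene_idx] for s in range(n)]
    hits = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: euclidean(profiles[i], profiles[j]))
        hits += groups[nearest] == groups[i]
    return hits / n

expr, groups = simulate()
a_idx = [i for i, g in enumerate(groups) if g == "A"]
b_idx = [i for i, g in enumerate(groups) if g == "B"]

# Rank genes by absolute t-statistic and keep the top 30 as the "DE genes".
t_stats = [welch_t([row[i] for i in a_idx], [row[i] for i in b_idx]) for row in expr]
top_genes = sorted(range(len(expr)), key=lambda g: abs(t_stats[g]), reverse=True)[:30]

fraction = same_group_neighbour_fraction(expr, groups, top_genes)
print(f"samples whose nearest neighbour is in the same group: {fraction:.2f}")
```

Because the genes are selected precisely for differing between the groups, distances computed on them are dominated by the group difference. If real data do not behave like this, that points to a problem in the analysis steps rather than to a need for further gene filtering.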
Thanks!
To be more specific, I am using an expression file that was output by Kallisto, and the values in the expression matrix are TPM, not raw counts. I transformed the expression values to log(TPM+1) and ran limma on them.
Could that be (at least in part) the source of my problem?
Thanks
In my opinion there is no good way to conduct a DE analysis of TPM values from Kallisto. TPM values throw away too much of the information that DE software needs.
I would be surprised though if that is the entire source of the problem.
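To make the point about TPM discarding information concrete, here is a small Python sketch. It uses the textbook TPM formula on made-up counts and transcript lengths; Kallisto itself works with estimated counts and effective lengths, so treat this as a simplified assumption rather than a description of its internals. The names `tpm`, `shallow` and `deep` are hypothetical.

```python
def tpm(counts, lengths_kb):
    """Textbook TPM: length-normalise the counts, then rescale so the sample sums to one million."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Made-up transcript lengths (in kb) and read counts for four transcripts.
lengths_kb = [1.0, 2.0, 0.5, 4.0]
shallow = [10, 40, 5, 80]            # a shallowly sequenced sample
deep = [c * 100 for c in shallow]    # same proportions, 100x the sequencing depth

tpm_shallow = tpm(shallow, lengths_kb)
tpm_deep = tpm(deep, lengths_kb)

for name, values in (("shallow", tpm_shallow), ("deep", tpm_deep)):
    print(name, [round(v, 1) for v in values])
```

The two samples end up with the same TPM values even though one is based on 100 times as many reads. A count of 10 is far less precise than a count of 1000, and that relationship between count size and precision is the kind of information that is no longer recoverable once only TPM (or log(TPM+1)) is kept.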