Dear all,
We are two biologists (so very new to the bioinformatics field...) working with RNA-seq data and having some trouble with pathway analysis. We performed mRNA sequencing on 4 distinct cell populations to compare their transcriptional profiles (Illumina HiSeq 2000 platform). Raw reads were mapped using TopHat, and differential expression analysis was performed with the edgeR+voom+limma packages. Our final output is a table (.txt file) for each contrast containing our 16058 expressed genes with their respective log fold changes, normalized expression values and adjusted p-values.
We wish to perform pathway enrichment analysis (first GO for a global view, then KEGG for a more precise analysis) to determine which pathways are enriched/depleted in a specific cell population compared to the others, in order to infer cell-type-specific functional signatures. However, we are having difficulty finding an optimal method to do this. We tried several packages (e.g. gage, goseq) and web-based tools (e.g. GeneGO, AmiGO) and obtained different outputs (sometimes opposite results). What would be the most "validated" method/package for this kind of analysis?
In addition, the tools differ in the input data they expect (raw reads, log FC, a list of differentially expressed genes?), and in the end we do not understand what the input data for the analysis should be (we assume it depends on the package/method used?). So, does anyone have experience with GO/KEGG analysis for RNA-seq who could help us choose a good quantification method, please?
Thank you very much for your help.
Best,
Nicolas
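To make the input question above more concrete, here is a minimal sketch of the "gene list" style of analysis (over-representation testing), assuming a per-contrast table like the one described, with a gene ID and an adjusted p-value per gene. It is written in Python purely for illustration and is not the method used by any of the packages mentioned: the function names, the toy gene IDs and the 0.05 cut-off are all hypothetical. The input here is a thresholded list of differentially expressed genes tested against the universe of all expressed genes, whereas a tool such as gage works from per-gene statistics (e.g. log fold changes) for all genes, and goseq additionally corrects for gene length bias, which this plain hypergeometric test does not.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_de, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: number of expressed genes tested (the background)
    n_in_set:   number of background genes annotated to the pathway/GO term
    n_de:       number of differentially expressed (DE) genes
    n_overlap:  number of DE genes annotated to the pathway/GO term
    """
    max_overlap = min(n_in_set, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def enrichment_from_table(rows, gene_set, alpha=0.05):
    """Test one gene set against a table of per-gene results.

    rows: list of dicts with (hypothetical) keys "gene" and "adj_p".
    gene_set: set of gene IDs annotated to one pathway/GO term.
    alpha: illustrative cut-off on the adjusted p-value (an assumption).
    """
    universe = {r["gene"] for r in rows}
    de_genes = {r["gene"] for r in rows if r["adj_p"] < alpha}
    # Only annotated genes that were actually expressed/tested count.
    in_set = gene_set & universe
    overlap = de_genes & in_set
    return hypergeom_enrichment_p(
        len(universe), len(in_set), len(de_genes), len(overlap)
    )


# Toy data: 10 expressed genes, 3 of them DE; the gene set holds 4 expressed
# genes (plus "gX", which is not expressed and is therefore ignored).
toy_rows = [{"gene": f"g{i}", "adj_p": 0.01 if i <= 3 else 0.5}
            for i in range(1, 11)]
toy_set = {"g1", "g2", "g3", "g4", "gX"}

p = enrichment_from_table(toy_rows, toy_set)
print(round(p, 4))  # 0.0333, i.e. 4/120
```

The result depends on the choice of background (here, all expressed genes) and on the significance cut-off, which is one reason different tools can give different answers for the same experiment.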
Dear Steve, Julie and Gordon,
Thank you for your advice. We will first go through the goseq analysis in detail to try to fully understand the statistical concepts behind this kind of analysis. In our case, it should be sufficient, as we are just interested in illustrating some pathways/ontologies overrepresented in specific cell populations. We will come back to this post if we encounter problems...
Thank you all.
Very best,
Nicolas