I'm new to bioinformatics and ran into a block while trying to run GSEA on ChIRP-seq data from a murine lncRNA-overexpression vs control experiment.
I received two BED files with raw counts, one for treated and one for control. I also received a BED file with peaks and their enrichment scores. I used bedmap (BEDOPS) to match the peaks that have the highest enrichment scores with the max count scores from the sequences. I thought I would input the max count scores from these peak-aligned sequences into GSEA for treated vs control, but I'm starting to realize this is not the proper way.
Can someone help me figure out which count data I would input for each gene? At the moment I have something like the following expression dataset (gct):
How do I pick the peaks and their corresponding counts? My understanding is that there is only one peak/count per gene.
Don't make it too complicated. A differential analysis starts from a count matrix, which for ChIP-seq and similar assays is usually the number of reads intersecting each region of interest. Get the BAM files and use something like featureCounts to extract counts over the regions in your peak file. Personally, I would not even start an analysis if the first step were to fiddle with non-standard files such as enrichment scores, "highest enrichment", "max counts", etc., so do it like everyone else and start from the alignments (BAM files). Generally, for general questions like this a community such as biostars.org is helpful, as Bioconductor is for technical support with packages, where the developers' expertise with the codebase or underlying methodology is needed.
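To make the "count matrix" idea concrete, here is a minimal Python sketch, not a replacement for featureCounts. It assumes the peaks come as plain BED text (0-based, half-open) and, purely for illustration, that reads are already available as `(chrom, start, end)` tuples; in practice they would come from the BAM files. The helper names (`parse_bed`, `bed_to_saf`, `count_overlaps`) and the toy data are hypothetical. The SAF rows show the 1-based, inclusive layout that featureCounts accepts for custom regions.

```python
def parse_bed(bed_text):
    """Parse BED text into (name, chrom, start, end) tuples (0-based, half-open)."""
    peaks = []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Use the BED name column if present, else build an ID from the coordinates.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        peaks.append((name, chrom, start, end))
    return peaks


def bed_to_saf(peaks):
    """Build SAF rows (GeneID, Chr, Start, End, Strand).

    SAF coordinates are 1-based and inclusive, so only the start shifts by one.
    Strand is set to "+" as a placeholder; it is ignored when counting unstranded.
    """
    return [(name, chrom, start + 1, end, "+") for name, chrom, start, end in peaks]


def count_overlaps(peaks, reads):
    """Count reads overlapping each peak by at least one base (half-open intervals)."""
    counts = {name: 0 for name, _, _, _ in peaks}
    for r_chrom, r_start, r_end in reads:
        for name, chrom, start, end in peaks:
            if r_chrom == chrom and r_start < end and r_end > start:
                counts[name] += 1
    return counts


# Toy data (made up): two peaks and a handful of reads per sample.
bed_text = "chr1\t100\t200\tpeakA\nchr1\t300\t400\tpeakB\n"
treated_reads = [("chr1", 90, 140), ("chr1", 150, 190), ("chr1", 390, 440), ("chr2", 100, 150)]
control_reads = [("chr1", 199, 250)]

peaks = parse_bed(bed_text)
saf_rows = bed_to_saf(peaks)
treated = count_overlaps(peaks, treated_reads)
control = count_overlaps(peaks, control_reads)

# One row per peak, one column per sample: this is the count matrix.
count_matrix = {name: (treated[name], control[name]) for name, _, _, _ in peaks}

print("GeneID\tChr\tStart\tEnd\tStrand")
for row in saf_rows:
    print("\t".join(str(value) for value in row))
print(count_matrix)
```

With the SAF rows written to a file, the real counting step would look something like `featureCounts -F SAF -a peaks.saf -o counts.txt treated.bam control.bam` (file names are placeholders), which produces the same kind of peak-by-sample matrix from the alignments.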