Question

Guidance Requested on GCN Network Analysis and Module Enrichment in R

Sabina • 0

Hello,

I am currently working on gene co-expression network (GCN) analysis, specifically focusing on the inference and analysis of GCNs using R. I have generated a SummarizedExperiment object using an RPKM expression matrix, a GTF file for gene annotation, and associated sample metadata. However, I am uncertain whether the processing steps were correctly implemented.

To verify the consistency of my processed expression matrices (final_exp1 and final_exp2), I used the following comparison:

identical(
    assay(final_exp1)[1:5, 1:5],
    assay(final_exp2)[1:5, 1:5]
)

This returned FALSE, indicating a discrepancy between the two matrices, despite both having the same dimensions. I am unsure what may have caused this difference.

Additionally, I attempted to perform functional enrichment analysis using the module_enrichment() function with MapMan annotations:

sea_mapman <- module_enrichment(
    net = gcn,
    background_genes = rownames(final_exp1),
    annotation = gma_annotation$MapMan
)

However, the analysis did not return any enriched terms. I am uncertain whether this is due to an issue with the annotation object, the network structure, or the format of background genes.

Could you please advise me on:

Possible reasons for the discrepancy between final_exp1 and final_exp2, even when dimensions match.

How to ensure that module_enrichment() works correctly with MapMan annotations, and what checks I should perform on the inputs.

Thank you in advance for your guidance.

Best regards, Sabina Tiwari

BioNERO Bioconductor Arabidopsis_thaliana_Data • 497 views

updated 3 months ago by Kevin Blighe ★ 4.0k • written 7 months ago by Sabina • 0

score 0 · Answer 1 · 2025-11-21

Matching dimensions do not guarantee matching content. identical() returns TRUE only when two objects are exactly equal in values, storage type, and attributes, so any of the following will produce FALSE: small floating-point differences introduced by processing steps such as normalization or imputation; differences in row or column names, or in their order; one matrix stored as integer and the other as double; or genuinely different values caused by different filtering thresholds or annotation merging. Note also that your call compares only the first five rows and columns, so it says nothing about whether the rest of the two matrices agree.

To investigate, use:

all.equal(assay(final_exp1), assay(final_exp2))

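all.equal() compares numeric values within a small tolerance (about 1.5e-8 by default) and, when the objects differ, returns a description of the differences (for example mismatched dimnames or a mean relative difference) rather than a bare FALSE. Also check: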

identical(rownames(final_exp1), rownames(final_exp2))
identical(colnames(final_exp1), colnames(final_exp2))
str(assay(final_exp1))
str(assay(final_exp2))
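To see how these checks behave, here is a minimal sketch in base R. The matrices m1 and m2 are toy stand-ins (my assumption, not your data) for assay(final_exp1) and assay(final_exp2); the tolerance tol is also an arbitrary choice for illustration.

```r
# Toy matrices standing in for assay(final_exp1) and assay(final_exp2)
m1 <- matrix(c(0.1 + 0.2, 1, 2, 3), nrow = 2,
             dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
m2 <- matrix(c(0.3, 1, 2, 3.5), nrow = 2,
             dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# identical() fails on any exact difference, including rounding error
exact_match <- identical(m1, m2)

# Locate cells that differ by more than a small tolerance
tol <- 1e-8
diff_cells <- which(abs(m1 - m2) > tol, arr.ind = TRUE)
print(diff_cells)

# Storage type matters to identical() but not to all.equal()
type_match <- identical(1:3, c(1, 2, 3))
value_match <- isTRUE(all.equal(1:3, c(1, 2, 3)))
```

Here the rounding difference in the first cell is ignored by the tolerance check, while the real difference in the last cell is reported with its row and column index. On real data, the same which(..., arr.ind = TRUE) call shows whether the mismatches are scattered (suggesting rounding or normalization) or concentrated in particular genes or samples.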

For module_enrichment() from BioNERO (the package this post is tagged with), an empty result most often comes from mismatched gene identifiers between the network, the background genes, and the MapMan annotation. As far as I recall from the BioNERO documentation, the annotation argument should be a data frame with gene IDs in the first column and the annotation terms (here, MapMan bins) in the remaining columns, so make sure gma_annotation$MapMan has that shape. If gma_annotation is itself a data frame, gma_annotation$MapMan extracts a single column and drops the gene IDs.

Perform these checks:

Verify the structure: str(gma_annotation$MapMan). It should show a data frame whose first column holds gene IDs, not a bare vector.
Confirm overlap: length(intersect(rownames(final_exp1), gma_annotation$MapMan[[1]])), assuming the gene IDs are in the first column, should be a large fraction of your genes; with little or no overlap, no term can be enriched.
Inspect network modules: Extract the genes per module from gcn (the exp2gcn() output stores them in gcn$genes_and_modules) and check how many in each module appear in both the background and the annotation.
Test significance: Rerun with a more permissive cutoff via the p argument (default 0.05) to see whether terms are being tested but filtered out after multiple-testing correction.
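The overlap checks above can be sketched in base R as follows. All objects here are hypothetical toy data: background stands in for rownames(final_exp1), genes_and_modules mimics the Genes/Modules table I would expect in gcn$genes_and_modules, and annotation is an assumed two-column gene-to-bin table.

```r
# Hypothetical stand-ins; replace with your own objects
background <- c("g1", "g2", "g3", "g4", "g5", "g6")
genes_and_modules <- data.frame(
    Genes = c("g1", "g2", "g3", "g4", "g5", "g6"),
    Modules = c("blue", "blue", "blue", "brown", "brown", "grey"),
    stringsAsFactors = FALSE
)
annotation <- data.frame(
    Gene = c("g1", "g2", "g4", "G5", "g9"),
    MapMan = c("PS.lightreaction", "PS.lightreaction", "cell wall",
               "cell wall", "RNA.processing"),
    stringsAsFactors = FALSE
)

# Fraction of background genes that carry any annotation
annotated_bg <- intersect(background, annotation$Gene)
bg_coverage <- length(annotated_bg) / length(background)

# Number of annotated genes in each module
module_genes <- split(genes_and_modules$Genes, genes_and_modules$Modules)
annotated_per_module <- vapply(
    module_genes,
    function(g) length(intersect(g, annotation$Gene)),
    integer(1)
)
print(bg_coverage)
print(annotated_per_module)
```

Note that "G5" in the toy annotation does not match "g5" in the background, because intersect() is case-sensitive. A module with zero or only a handful of annotated genes cannot produce an enriched term regardless of the p-value cutoff.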

If annotations are from Mercator or GoMapMan, validate gene IDs match your expression data (e.g., Ensembl or TAIR format).
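If the IDs differ only in letter case or in a transcript suffix, a small normalization step may recover the overlap. This is a rough sketch: normalize_ids() is a hypothetical helper of mine, not a BioNERO function, and the example IDs are made up in TAIR style. Check that the suffix pattern fits your own identifiers before using it.

```r
# Hypothetical helper: trim whitespace, drop a trailing ".<digits>"
# transcript suffix, and convert to upper case
normalize_ids <- function(ids) {
    ids <- trimws(ids)
    ids <- sub("\\.[0-9]+$", "", ids)   # "AT1G01010.1" -> "AT1G01010"
    toupper(ids)
}

# Made-up example IDs: gene-level in the expression data,
# transcript-level and mixed case in the annotation
expr_ids <- c("AT1G01010", "AT1G01020", "AT1G01030")
annot_ids <- c("at1g01010.1", "At1g01020.2", "AT1G01040.1")

raw_overlap <- length(intersect(expr_ids, annot_ids))
norm_overlap <- length(intersect(normalize_ids(expr_ids),
                                 normalize_ids(annot_ids)))
print(c(raw = raw_overlap, normalized = norm_overlap))
```

Collapsing transcripts to genes can create duplicate gene-to-bin rows in the annotation, so it is worth running unique() on the normalized table before passing it on. The same normalization has to be applied to the expression matrix row names, the network, and the annotation so that all three stay consistent.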

Kevin