sigDF object in sesame
Ramiro • 0

I am interested in obtaining the methylated and unmethylated signal intensities. I am using the sesame package and obtained the SigDF objects as follows:

sdfs = openSesame(baseDirectory,prep="QCDPB",func=NULL,BPPARAM = BiocParallel::MulticoreParam(4),platform = "EPICv2")

For the first sample, I get something like this:

Probe_ID         MG  MR  UG        UR         col  mask
cg00000029_TC21  NA  NA  2709.775  1046.9992  2    FALSE
cg00000109_TC21  NA  NA  2770.632  288.9767   2    FALSE
cg00000155_BC21  NA  NA  3668.775  230.5298   2    FALSE

I can't find the definitions anywhere, but I am guessing that MG, MR, UG and UR are what I am looking for (methylated green, methylated red, unmethylated green and unmethylated red). However, MG and MR seem to contain far fewer values than UG:

sum(!is.na(sdfs[["sample1"]]$MG))

[1] 128295

sum(!is.na(sdfs[["sample1"]]$UG))

[1] 937688

Is there something that we are missing or does this look normal and correspond to the methylated and unmethylated intensities?

sesame sesameData • 182 views
Kevin Blighe ★ 4.0k

The output you observe from the SigDF object is normal and corresponds to the methylated and unmethylated intensities, separated by color channel as per the Infinium probe design.

The columns represent the following:

  • MG: Methylated signal intensity in the green channel.
  • MR: Methylated signal intensity in the red channel.
  • UG: Unmethylated signal intensity in the green channel.
  • UR: Unmethylated signal intensity in the red channel.

These names apply literally only to Infinium I probes. Which columns are populated depends on the probe design, recorded in the 'col' column: G or R for Infinium I probes (green or red design) and 2 for Infinium II probes. Infinium II probes, the large majority on EPICv2, use a single bead type, so sesame leaves MG and MR as NA and stores both signals in the U columns: UG (green) is the methylated signal and UR (red) is the unmethylated signal. That is what your three example rows show. Infinium I probes use two beads that are read in the same channel. For these, all four columns are filled: a green-design probe has its methylated and unmethylated signals in MG and UG, a red-design probe has them in MR and UR, and the remaining two columns hold the out-of-band signal from the other channel.

The imbalance you noted (128,295 non-NA values in MG versus 937,688 in UG) is therefore expected: MG is populated only for Infinium I probes, whereas UG is populated for every probe. You should check MR and UR as well; MR should have about as many non-NA values as MG, and UR about as many as UG. Tabulating the 'col' column should show the G and R counts adding up to the MG count.
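The checks suggested above can be sketched on a toy data frame that only mimics the SigDF columns. The first three rows are copied from the question; the two Infinium I rows (`toy_typeI_green`, `toy_typeI_red`) and their values are invented for illustration, and `sdf_toy` is a hypothetical name, not a sesame object.

```r
# Toy stand-in for one SigDF. Rows 1-3 come from the question;
# rows 4-5 (Infinium I) are made-up values for illustration only.
sdf_toy <- data.frame(
  Probe_ID = c("cg00000029_TC21", "cg00000109_TC21", "cg00000155_BC21",
               "toy_typeI_green", "toy_typeI_red"),
  MG   = c(NA, NA, NA, 1500, 200),
  MR   = c(NA, NA, NA, 150, 3200),
  UG   = c(2709.775, 2770.632, 3668.775, 400, 180),
  UR   = c(1046.9992, 288.9767, 230.5298, 120, 900),
  col  = c("2", "2", "2", "G", "R"),
  mask = FALSE,
  stringsAsFactors = FALSE
)

# Number of probes per design ("2" = Infinium II, "G"/"R" = Infinium I)
design_counts <- table(sdf_toy$col)

# Number of non-NA entries in each signal column
non_na <- colSums(!is.na(sdf_toy[, c("MG", "MR", "UG", "UR")]))

print(design_counts)
print(non_na)
```

On a real sample the same two calls can be applied to the per-sample data frame in place of `sdf_toy`.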

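To obtain a single methylated and a single unmethylated intensity per probe, do not take the maximum across the M or U columns: for Infinium II probes MG and MR are both NA, and for Infinium I probes the maximum would mix in-band and out-of-band signal. Instead, use signalMU(), which selects the appropriate columns for each probe design:

signalMU(sdfs[["sample1"]])

This returns one M and one U value per probe (see ?signalMU for how masked probes are handled). If beta values are what you need in the end, use:

getBetas(sdfs[["sample1"]])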
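To show what a per-design selection of methylated and unmethylated signal looks like, here is a minimal base-R sketch on a toy data frame. It assumes the sesame convention that Infinium II probes (col == "2") carry the methylated signal in UG and the unmethylated signal in UR. The Infinium I rows and the name `sdf_toy` are invented, and the plain M / (M + U) ratio is used only for illustration; sesame's own beta calculation may differ in detail.

```r
# Toy stand-in for one SigDF. Rows 1-3 come from the question;
# rows 4-5 (Infinium I) are made-up values for illustration only.
sdf_toy <- data.frame(
  Probe_ID = c("cg00000029_TC21", "cg00000109_TC21", "cg00000155_BC21",
               "toy_typeI_green", "toy_typeI_red"),
  MG   = c(NA, NA, NA, 1500, 200),
  MR   = c(NA, NA, NA, 150, 3200),
  UG   = c(2709.775, 2770.632, 3668.775, 400, 180),
  UR   = c(1046.9992, 288.9767, 230.5298, 120, 900),
  col  = c("2", "2", "2", "G", "R"),
  mask = FALSE,
  stringsAsFactors = FALSE
)

is_g <- sdf_toy$col == "G"   # Infinium I, green design
is_r <- sdf_toy$col == "R"   # Infinium I, red design

# Methylated signal: MG (type I green), MR (type I red), UG (type II)
M <- ifelse(is_g, sdf_toy$MG, ifelse(is_r, sdf_toy$MR, sdf_toy$UG))

# Unmethylated signal: UG (type I green), UR (type I red and type II)
U <- ifelse(is_g, sdf_toy$UG, sdf_toy$UR)

# Plain beta ratio, for illustration only
beta <- M / (M + U)

print(data.frame(Probe_ID = sdf_toy$Probe_ID, M = M, U = U, beta = beta))
```

The sketch ignores the mask column; for real data, masked probes would normally be excluded or set to NA before any downstream analysis.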

Kevin
