Question

MethylKit::calculateDiffMeth with two experimental factors

Andrés • 0

Hi, I'm working with whole-genome EM-seq data from an experiment with two factors:

Two genotypes (with contrasting response to a pathogen)
Infected and non-infected

Only one tissue and one time point were analyzed, with two biological replicates for each treatment combination.

I would like to know:

1. Which sites/regions are differentially methylated in response to infection independently of genotype (Figure 1).
2. Which sites/regions are differentially methylated in response to infection and between genotypes (Figure 2).
3. Which sites/regions are differentially methylated in response to infection in a genotype-dependent manner (Figure 3).
4. Which sites/regions are differentially methylated between genotypes independently of infection (Figure 4).

Figures 1-4

I followed the Bismark alignment and extraction pipeline, and now I'm following the methylKit package tutorials.

According to them, in cases like mine with two factors and two biological replicates, I can apply logistic regression to test for differential methylation. In the calculateDiffMeth function, one of the factors should be the "treatment" and the other should be a "covariate". I assume that to get sites differentially methylated in response to infection (goals 1-3), I need to load the infection factor as "treatment" and the genotype as "covariate".

However, the resulting object is simply a list of sites/regions with their pvalue, qvalue and meth.diff, without specifying the significance of each term in the regression. This means that, for example, I can't tell whether the interaction between factors was significant (the case in Figure 3).

My question is,

How do I access the full information of the tested logistic regressions?
And if I can't, how do I work around it?
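
A minimal sketch of one possible workaround, fitting the regression directly in base R instead of through methylKit. It uses the counts of the first region printed in the example further down. The quasibinomial family is an assumption on my part: it is only roughly comparable to overdispersion = "MN", not a reproduction of what calculateDiffMeth does internally. The variable names (full, additive, coef_table) are illustrative.

```r
# Counts for the first promoter region in the example (HanXRQCP:97-1597), in sample order:
# infected_var1_r1, infected_var1_r2, control_var1_r1, control_var1_r2,
# infected_var2_r1, infected_var2_r2, control_var2_r1, control_var2_r2
numCs <- c(936, 1047, 1041, 628, 3333, 1084, 1418, 2110)
numTs <- c(40839, 39049, 48623, 39984, 66193, 74080, 52050, 51840)

infection <- factor(c(1, 1, 0, 0, 1, 1, 0, 0), levels = c(0, 1),
                    labels = c("control", "infected"))
genotype <- factor(rep(c("var1", "var2"), each = 4))

# Logistic regression with both main effects and their interaction.
# quasibinomial estimates a dispersion parameter (assumption: a rough
# analogue of methylKit's "MN" overdispersion correction).
full <- glm(cbind(numCs, numTs) ~ infection * genotype,
            family = quasibinomial(link = "logit"))

# One row per term: estimate, standard error, t value and p-value
coef_table <- summary(full)$coefficients
print(coef_table)

# F test of the interaction: additive model versus full model
additive <- update(full, . ~ infection + genotype)
interaction_test <- anova(additive, full, test = "F")
print(interaction_test)
```

Looping this over all regions would give one p-value per term per region, which would then need its own multiple-testing adjustment. With only two replicates per combination there are just four residual degrees of freedom, so these tests have limited power.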

I understand that to get the sites differentially methylated due to genotype (goal 4) I should set genotype as "treatment" and infection as "covariate". But would this somehow alter the correction I should apply to the p-values?
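
A sketch of what that swap might look like, assuming methylKit's reorganize() and getSampleID() behave as documented for methylBase objects. The helper name swap_to_genotype_test is hypothetical, and the function is only defined here, not run, since it needs methylKit and a methylBase object such as meth.CHG_promoters.

```r
# Hypothetical helper (not part of methylKit): re-labels the samples so that
# genotype becomes the "treatment" and infection becomes the covariate.
# Assumes the eight samples are in the order shown under sample.ids.
swap_to_genotype_test <- function(meth) {
  ids <- methylKit::getSampleID(meth)

  # Same samples, new treatment vector: var1 = 0, var2 = 1
  meth_by_genotype <- methylKit::reorganize(
    meth,
    sample.ids = ids,
    treatment  = c(0, 0, 0, 0, 1, 1, 1, 1)
  )

  # Infection status now enters the model as the covariate
  infection <- data.frame(
    infection = c("infected", "infected", "control", "control",
                  "infected", "infected", "control", "control")
  )

  methylKit::calculateDiffMeth(meth_by_genotype,
                               covariates     = infection,
                               overdispersion = "MN",
                               adjust         = "BH")
}
```

Something like myDiff.by_genotype <- swap_to_genotype_test(meth.CHG_promoters) would then be the call. As far as I can tell, the adjust argument only sees the p-values of that single call, so the correction for this run would be computed independently of the infection run.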

And again, how do I identify sites differentially methylated in response to infection depending on genotype?

I thought about testing diff. meth. against infection separately for each genotype and then contrast the resulting lists, but once again, would this somehow alter the correction I must apply to the p-values?
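
To make the correction question concrete, here is a rough base R sketch of the per-genotype approach on the six regions printed in the example further down. It uses plain quasibinomial glm fits as a stand-in for calculateDiffMeth (an assumption, not methylKit's exact test), and the helper infection_p is hypothetical. It contrasts adjusting each genotype's p-values separately with pooling all of them before adjusting. Which of the two is appropriate is exactly what I am unsure about.

```r
# numCs / numTs for the six regions in the example; columns follow the sample order
# infected_var1_r1, infected_var1_r2, control_var1_r1, control_var1_r2,
# infected_var2_r1, infected_var2_r2, control_var2_r1, control_var2_r2
numCs <- rbind(
  c(936, 1047, 1041, 628, 3333, 1084, 1418, 2110),
  c(866, 1025, 1054, 669, 3657, 1419, 1657, 2511),
  c(419,  497,  540, 374, 2021,  812,  934, 1300),
  c(420,  465,  481, 354, 1889,  783,  883, 1186),
  c(838,  716,  756, 552, 1798,  700,  883, 1227),
  c(979,  974, 1055, 782, 3094, 1236, 1503, 2263))
numTs <- rbind(
  c(40839, 39049, 48623, 39984, 66193, 74080, 52050, 51840),
  c(45765, 41924, 53372, 44830, 72272, 85033, 58792, 57298),
  c(28797, 24542, 32497, 27690, 42795, 52253, 36290, 33993),
  c(28706, 24174, 32209, 27219, 41900, 52369, 36122, 33768),
  c(33420, 27385, 37232, 32452, 47663, 63967, 44261, 40785),
  c(42580, 36353, 48148, 41233, 61459, 82982, 55972, 53759))

infection <- c(1, 1, 0, 0, 1, 1, 0, 0)
genotype  <- rep(c("var1", "var2"), each = 4)

# Hypothetical helper: p-value of the infection effect for one region,
# using only the samples selected by the logical vector `keep`
infection_p <- function(region, keep) {
  fit <- glm(cbind(numCs[region, keep], numTs[region, keep]) ~ infection[keep],
             family = quasibinomial(link = "logit"))
  summary(fit)$coefficients[2, "Pr(>|t|)"]
}

p_var1 <- sapply(1:6, infection_p, keep = genotype == "var1")
p_var2 <- sapply(1:6, infection_p, keep = genotype == "var2")

# Option A: adjust within each genotype (what two separate runs would give)
q_separate <- c(p.adjust(p_var1, method = "BH"), p.adjust(p_var2, method = "BH"))

# Option B: pool both sets of tests and adjust them together
q_pooled <- p.adjust(c(p_var1, p_var2), method = "BH")

print(data.frame(genotype = rep(c("var1", "var2"), each = 6),
                 region = rep(1:6, 2), q_separate, q_pooled))
```

Either way, a region being significant in one genotype and not in the other would not by itself be a test of the interaction, which is why I would prefer to get at the interaction term directly.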

Here's an example of my code and my results:

> meth.CHG_promoters
methylBase object with 61192 rows
--------------
    chr start  end strand coverage1 numCs1 numTs1 coverage2 numCs2 numTs2 coverage3 numCs3 numTs3 coverage4 numCs4 numTs4 coverage5 numCs5 numTs5 coverage6 numCs6 numTs6 coverage7 numCs7 numTs7 coverage8 numCs8 numTs8
1 HanXRQCP    97 1597      -     41775    936  40839     40096   1047  39049     49664   1041  48623     40612    628  39984     69526   3333  66193     75164   1084  74080     53468   1418  52050     53950   2110  51840
2 HanXRQCP  1266 2766      +     46631    866  45765     42949   1025  41924     54426   1054  53372     45499    669  44830     75929   3657  72272     86452   1419  85033     60449   1657  58792     59809   2511  57298
3 HanXRQCP  1524 3024      -     29216    419  28797     25039    497  24542     33037    540  32497     28064    374  27690     44816   2021  42795     53065    812  52253     37224    934  36290     35293   1300  33993
4 HanXRQCP  1876 3376      -     29126    420  28706     24639    465  24174     32690    481  32209     27573    354  27219     43789   1889  41900     53152    783  52369     37005    883  36122     34954   1186  33768
5 HanXRQCP  4681 6181      -     34258    838  33420     28101    716  27385     37988    756  37232     33004    552  32452     49461   1798  47663     64667    700  63967     45144    883  44261     42012   1227  40785
6 HanXRQCP  5495 6995      -     43559    979  42580     37327    974  36353     49203   1055  48148     42015    782  41233     64553   3094  61459     84218   1236  82982     57475   1503  55972     56022   2263  53759
--------------
sample.ids: infected_var1_r1 infected_var1_r2 control_var1_r1 control_var1_r2 infected_var2_r1 infected_var2_r2 control_var2_r1 control_var2_r2 
destranded FALSE 
assembly: HanXRQ-2.0 
context: CHG 
treament: 1 1 0 0 1 1 0 0 
resolution: region 

> myDiff.CHG_promoters <- calculateDiffMeth(meth.CHG_promoters,
+ covariates=data.frame(genotype=c('var1','var1','var1','var1','var2','var2','var2','var2')),
+ overdispersion = "MN",adjust="BH",mc.cores=30)

> myDiff.CHG_promoters
methylDiff object with 61192 rows
--------------
    chr start  end strand    pvalue qvalue    meth.diff
1 HanXRQCP    97 1597      - 0.9161677      1  0.196036107
2 HanXRQCP  1266 2766      + 0.9603097      1  0.089608732
3 HanXRQCP  1524 3024      - 0.9686611      1  0.108272516
4 HanXRQCP  1876 3376      - 0.9622652      1  0.163918331
5 HanXRQCP  4681 6181      - 0.8713753      1  0.134653151
6 HanXRQCP  5495 6995      - 0.9290056      1 -0.001156726
--------------
sample.ids:  infected_var1_r1 infected_var1_r2 control_var1_r1 control_var1_r2 infected_var2_r1 infected_var2_r2 control_var2_r1 control_var2_r2  
destranded FALSE 
assembly: HanXRQ-2.0 
context: CHG 
treament: 1 1 0 0 1 1 0 0 
resolution: region

PS1: I understand the three classic cytosine contexts (CpG, CHG and CHH) are analyzed independently from each other.

PS2: I have posted this same question on Biostars; I hope this isn't a problem: https://www.biostars.org/p/9571357/#9571357

> sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default
BLAS:   /shared/software/lib/R/lib/libRblas.so
LAPACK: /shared/software/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8         LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] BiocParallel_1.28.3  Biobase_2.54.0       genomation_1.32.0
[4] methylKit_1.26.0     GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[7] IRanges_2.28.0       S4Vectors_0.32.4     BiocGenerics_0.40.0
