Question: qsea non-TCGA 450K Human Methylation Calibration Dataset
wildist wrote, 15 months ago:

Hello,

I am using qsea to study the methylome in paediatric cancer. Since TCGA does not provide calibration data for this, I have to create it myself. Following the tutorial example, I created a GRanges object containing all 450k probe data. After executing qseaSet=addEnrichmentParameters(qseaSet, enrichmentPattern="CpG", windowIdx=wd, signal=signal), I get the following error:

   Error in estimateEnrichmentLM(qs, windowIdx = windowIdx, signal = signal,  : number of samples (20) does not match number of columns (6) in "signal".

I ran the tutorial again separately and cross-checked the variable "signal"; the format is the same as mine. I have no idea what the number of columns (6) refers to. Could anyone help me with this problem?

Thank you,

Walter

modified 15 months ago by Matthias Lienhard • written 15 months ago by wildist

Hello,

The error has been solved, but I have encountered another problem. I successfully added the normalization file for the calculation. I then compared the data with and without normalization, and both give me the same number of windows (500bp each). I used nfpkm instead of beta as the norm_method. Is this normal? I was expecting the total number of windows to change after using 450k data to normalize.

 

Thank you

Walter

modified 15 months ago • written 15 months ago by wildist
Answer: qsea non-TCGA 450K Human Methylation Calibration Dataset
Matthias Lienhard (Max Planck Institute for molecular Genetics, Berlin, Germany) wrote, 15 months ago:

Hi Walter,

"signal" can either be

  • a single numeric value (e.g. 1 for 100% methylation) which is used for all selected windows and all samples, or
  • a numeric vector of the same length as "windowIdx" (the selected windows), or
  • a matrix, where each row corresponds to a selected window and each column to a sample.

It seems to me that you provided a matrix, which now (judging by your comment) has the correct dimensions.
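
To make the three accepted shapes concrete, here is a minimal base R sketch. Note that check_signal_shape is a hypothetical helper written for this illustration and is not part of qsea; it only mirrors the rules listed above, so a 6-column matrix with 20 samples fails in the same spirit as the reported error.

```r
# Hypothetical helper (NOT a qsea function): checks that `signal` has one of
# the three shapes described above, given the selected windows and the number
# of samples in the qseaSet.
check_signal_shape <- function(signal, windowIdx, n_samples) {
  n_windows <- length(windowIdx)
  if (is.matrix(signal)) {
    # matrix: one row per selected window, one column per sample
    if (nrow(signal) != n_windows)
      stop("signal has ", nrow(signal), " rows but ", n_windows,
           " windows are selected")
    if (ncol(signal) != n_samples)
      stop("signal has ", ncol(signal), " columns but there are ",
           n_samples, " samples")
    return("matrix")
  }
  if (!is.numeric(signal)) stop("signal must be numeric")
  if (length(signal) == 1L) return("scalar")          # same value everywhere
  if (length(signal) == n_windows) return("vector")   # one value per window
  stop("signal has length ", length(signal), " but ", n_windows,
       " windows are selected")
}

wd <- c(10L, 25L, 40L)  # toy window indices, for illustration only

shape_scalar <- check_signal_shape(1, wd, n_samples = 20)
shape_vector <- check_signal_shape(c(0.9, 0.5, 0.1), wd, n_samples = 20)
shape_matrix <- check_signal_shape(matrix(0.5, nrow = 3, ncol = 20), wd,
                                   n_samples = 20)

# A matrix with 6 columns for 20 samples is rejected
bad <- tryCatch(
  check_signal_shape(matrix(0.5, nrow = 3, ncol = 6), wd, n_samples = 20),
  error = function(e) conditionMessage(e)
)
```

Running such a check before calling addEnrichmentParameters can show quickly whether the matrix needs one column per sample rather than one per array or group.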

The calibration (either "blind" or with additional data) enables the estimation of beta-values, which, for example, allow comparison with 450k data. It does not change the number of genomic windows. The number of windows is determined by the window size (specified in the first step, createQseaSet) and the genome size.
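
As a rough illustration of why the window count cannot depend on the calibration, the sketch below counts windows from chromosome lengths and window size alone. It assumes non-overlapping, fixed-size windows per chromosome with the last partial window kept; the chromosome names and lengths are made up, and count_windows is a hypothetical helper, not a qsea function.

```r
# Hypothetical helper: number of fixed-size, non-overlapping windows,
# assuming each chromosome is tiled separately and a trailing partial
# window is kept.
count_windows <- function(chr_lengths, window_size) {
  sum(ceiling(chr_lengths / window_size))
}

# Toy chromosome lengths (made up for illustration)
toy_lengths <- c(chrA = 10000, chrB = 2750)

n_win_500 <- count_windows(toy_lengths, window_size = 500)
n_win_250 <- count_windows(toy_lengths, window_size = 250)
```

Nothing about the samples, the normalization method or the 450k data enters this calculation; only a different window size or a different set of chromosomes changes the result.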

Best, Matthias

modified 15 months ago • written 15 months ago by Matthias Lienhard

Hello Matthias,

Thank you for your reply. We have finally completed the first set of analyses using qsea, and bisulfite sequencing has been used to confirm some of the DMRs. It really works. In the next stage, I want to use a clustering method to see whether the disease has any molecular subtypes. Is there a way to get the beta values of all regions for cluster analysis?

Best,

Walter

 

written 13 months ago by wildist

Hi Walter,

Great that you were able to validate the QSEA results! Sure, you can extract all beta values using the makeTable function. Just leave the parameters "ROIs" and "keep" empty, and make sure to select norm_methods="beta" and samples=getSampleNames(qseaSet).

Note that for regions with low enrichment signal (due to low CpG density in the region), estimates become unreliable and are set to NA. You can control the threshold with the minEnrichment parameter, e.g. minEnrichment=0 gives you values for all regions, although some with low confidence. You may also want to consider the credibility interval (CI) of the estimates, which can be produced by setting norm_methods=c("beta", "betaLB", "betaUB") for a 95% CI or norm_methods=c("beta", "q5", "q95") for a 90% CI. You can adapt the quantile numbers to change the interval boundaries. betaLB is an alias for q2.5 and betaUB for q97.5.

see also: ?normMethod
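
For the clustering step itself, a minimal base R sketch is given below. It does not call qsea: a simulated table stands in for the makeTable output, and the column naming "<sample>_beta" is an assumption that should be checked against your real table. Dropping windows with any NA and using average-linkage hierarchical clustering are illustrative choices, not a recommendation from the package.

```r
# In a real analysis the table would come from the call described above, e.g.
#   beta_tab <- makeTable(qseaSet, norm_methods = "beta",
#                         samples = getSampleNames(qseaSet))
# Here a simulated table is used so that the sketch runs without qsea.
set.seed(1)
n_windows <- 50
samples <- paste0("S", 1:6)                        # made-up sample names
toy <- matrix(runif(n_windows * length(samples)), nrow = n_windows,
              dimnames = list(NULL, paste0(samples, "_beta")))  # assumed naming
toy[sample(length(toy), 20)] <- NA                 # mimic low-enrichment NAs

beta_tab <- cbind(
  data.frame(chr = "chr1",
             window_start = seq(1, by = 500, length.out = n_windows),
             window_end = seq(500, by = 500, length.out = n_windows)),
  as.data.frame(toy)
)

# Pick the beta columns and keep only windows with estimates in all samples
beta_cols <- grep("_beta$", colnames(beta_tab), value = TRUE)
beta_mat <- as.matrix(beta_tab[, beta_cols])
complete <- beta_mat[complete.cases(beta_mat), , drop = FALSE]

# Cluster samples (columns), so transpose before computing distances
hc <- hclust(dist(t(complete)), method = "average")
groups <- cutree(hc, k = 2)                        # k = 2 is arbitrary here
```

With real data, how many windows survive the NA filter depends on minEnrichment, so it is worth checking nrow(complete) before interpreting any clusters.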

written 13 months ago by Matthias Lienhard

Solved. Thank you

Best, Walter

modified 4 months ago • written 4 months ago by wildist

Hello Matthias,

After updating qsea to 1,10, I ran into a problem loading BAM files into the analysis. It happens in the addCoverage step, which gives the following error:

    [W::bam_hdr_read] bgzf_check_EOF: No error
    Error in lapply(ReadsL, "names<-", fields) : 
    'names' attribute [4] must be the same length as the vector [0]

I tried both newly generated and old BAM files (previously OK); this error randomly happened. Could you help me figure it out?

    Thank you for your help.

Regards, Walter

written 3 months ago by wildist

Hi Walter, thanks for letting me know. However, I cannot reproduce the error here. Can you please confirm that you updated all packages to Bioconductor 3.6? By "this error randomly happened" you mean that it does not always occur, but only with specific data?

Anyway, if this error persists, please open another thread as this seems to be unrelated to the previous issues. Best, Matthias

written 3 months ago by Matthias Lienhard

Hello Matthias,

Thank you for your reply. The problem persists. I will start another thread after summarising what we find.

Best, Walter

written 3 months ago by wildist