Question

cn.mops encounter negative width problem

0

Entering edit mode

thestaroceanster • 0

@thestaroceanster-10072

Last seen 6.0 years ago

The problem occurred when I run referencecn.mops on paired exome sequencing data.

Here is the command:

resRef <- referencecn.mops(cases=X[,1],controls=X[,4],classes=c("CN0", "CN1", "CN2", "CN3", "CN4", "CN5", "CN6","CN7","CN8","CN16","CN32","CN64","CN128"),I = c(0.025, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 8, 16, 32, 64),segAlgorithm="DNAcopy")

and the error messasge said:

Starting local modeling, please be patient...

Reference sequence: chr1

Reference sequence: chr2

Reference sequence: chr3

Reference sequence: chr4

Reference sequence: chr5

Reference sequence: chr6

Reference sequence: chr7

Reference sequence: chr8

Reference sequence: chr9

Reference sequence: chrM

Reference sequence: chrX

Reference sequence: chrY

Reference sequence: chr10

Reference sequence: chr11

Reference sequence: chr12

Reference sequence: chr13

Reference sequence: chr14

Reference sequence: chr15

Reference sequence: chr16

Reference sequence: chr17

Reference sequence: chr18

Reference sequence: chr19

Reference sequence: chr20

Reference sequence: chr21

Reference sequence: chr22

Starting segmentation algorithm...

Using "DNAcopy" for segmentation.

Analyzing: Sample.1

Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") :

solving row 4: negative widths are not allowed

SO I try to run the algorithm with part of the segments information, which located the problem in line 2100:

However the the 2100 line seems to be without negative width:

X[2100,1]

GRanges object with 1 range and 1 metadata column:

seqnames ranges strand |

<Rle> <IRanges> <Rle> |

[1] chr1 [237060305, 237060418] * |

P1194_5_cancer_reclapse.final.bam

[1] 112

-------

seqinfo: 25 sequences from an unspecified genome; no seqlengths

> X[2100,4]

GRanges object with 1 range and 1 metadata column:

seqnames ranges strand | P1194_NC.final.bam

[1] chr1 [237060305, 237060418] * | 93

-------

seqinfo: 25 sequences from an unspecified genome; no seqlengths

I don't know which part goes wrong ,could anyone help me please?

cn.mops referencecn.mops • 1.9k views

ADD COMMENT • link updated 8.7 years ago by Günter Klambauer ▴ 540 • written 8.7 years ago by thestaroceanster • 0

score 0 · Answer 1 · 2017-04-24

0

Entering edit mode

Günter Klambauer ▴ 540

@gunter-klambauer-5426

Last seen 4.9 years ago

Austria

Hello again,

This error occurs if there is some inconsistency in the BED file (the one that defines the segments in which the reads are counted) that you use to run "getSegmentReadCountsFromBAM" (I assume you used that function, right?). Please check, if your BED file is correct, i.e. segment ends are larger than segment starts.

If that does not help, could you please run "traceback()" right after the error occurs and post the result?

Regards,

Günter

ADD COMMENT • link 8.7 years ago Günter Klambauer ▴ 540

0

Entering edit mode

I've checked my bed files, which showed that the starting position and end position are correct, with segments end always larger than segments start point.

However, when I reviewed the whole bed files, I found that some segments are slightly disordered and sometimes overlapped (maybe because I used table browser from UCSC to get exon position, with different transcript corresponding to the same gene). Since I only remove duplicate entries, some overlapping still remained in my data. So I sort the bed files and merge those overlapping segments, the problem seemed to be gone!

The result looks fine. But there is a slight problem. I noticed that the result only give CNV regions information, with some of this regions showed "CN2". Then what is the copy number of those regions that do not present in the result matrix? Are they also with CN2?

Here is part of the results:

	seqnames	start	end	width	strand	P1194_5_cancer_reclapse.final.bam
1	chr1	154213988	172414298	18200311	*	CN3
2	chr1	244780830	249213345	4432516	*	CN3
3	chr3	92827	43318713	43225887	*	CN128
4	chr4	44407849	44465355	57507	*	CN2

ADD REPLY • link 8.7 years ago thestaroceanster • 0

score 0 · Answer 2 · 2017-04-25

Hello,

Great that you figured out the problem with your BED file!

About your second question: "CN2" is the symbol cn.MOPS uses for the normal state (non-CNV state). If you check the slot "cnvr(result)", then you will see a lot of "CN2", because a CNV region in the genome is displayed, in which at least one individual has a CNV. Since CNVs are typically rare, you will find a few individuals with a CNV, such as "CN1", "CN3", but most of them will still have normal copy number ("CN2").

There is another slot "cnvs(result)" which displays the individuals' CNVs as a kind of list. In this list, there should be predominantly integer copy numbers other than 2. "CN2" could still appear in the list due to the following: The copy number call is not based on the integer copy number estimate, but on a threshold on the I/NI call. If the I/NI call exceeds a the threshold for amplification or is below the threshold for deletion, a CNV is called and the segment appears in the CNV list. However, the most probable integer copy number can -- in rare cases -- still be copy number 2 ("CN2"). In these rare cases, CN3 or CN1 also have high probabilities. You can check the probability of each copy number in the "posterior probabilities" array (I think it is only returned, if you set "returnPosterior=TRUE" or something like that). What you can also check is the values for segment's mean (column "mean" the "cnvs" slot) and segment's median I/NI call (column "median" of the "cnvs" slot). They should be above or below the specified thresholds.

I hope this helps!

Regards,

Günter