Calling amplifications and deletions from CopywriteR output
4
0
Entering edit mode
smcnulty • 0
@smcnulty-9671
Last seen 8.8 years ago

I’m very interested in using CopywriteR to detect copy number variation in target capture sequencing data. I’m looking to select a tool to implement in a high-throughput bioinformatic workflow. Therefore, I won’t be able to manually assess each output plot or IGV file. I was hoping to get some advice as to how I could access the results in a more streamlined way. I assume the per-chromosome plots are generated from data in the segment.Rdata file? Would I need to set a cutoff and determine amplified/depleted loci from that file?

CopywriteR copy number • 2.8k views
ADD COMMENT
0
Entering edit mode
t.kuilman ▴ 170
@tkuilman-6868
Last seen 2.5 years ago
Netherlands

The typical output from a CoywriteR run looks as follows:

-rw-r--r--  1 t.kuilman domain users  18K Jan 22 14:19 CopywriteR.log
-rw-r--r--  1 t.kuilman domain users  492 Jan 22 14:10 input.Rdata
-rw-r--r--  1 t.kuilman domain users  23M Jan 22 14:10 log2_read_counts.igv
drwxr-xr-x 16 t.kuilman domain users 2.0K Jan 22 14:19 plots/
drwxr-xr-x  2 t.kuilman domain users 1.6K Jan 22 14:10 qc/
-rw-r--r--  1 t.kuilman domain users  22M Jan 22 14:08 read_counts.txt
-rw-r--r--  1 t.kuilman domain users  11M Jan 22 14:18 segment.Rdata

You can read in the vignette of the tool (which you can get by typing browseVignette(“CopywriteR”)) what they are in detail.
With regards to copy number profiling, one can in general divide analysis into several stages, each requiring their own tools. These are the actual acquisition of the raw data (this is what CopywriteR does), segmentation of the data (to see whether there are consecutive copy number data points that have a similar value), and finally the ‘calling' of the data, where one determines which regions are gained or lost (and sometimes even an estimation of the actual copy number). The raw data processed by CopywriteR are in the log2_read_counts.igv file, while the segmentation values that are used for plotting can be found in the segment.Rdata file. The per-chromosome plots are based upon the raw (black dots) and segmented (red lines) data.
I presume what you want is to call amplifications and deletions, i.e. perform the third ’shell’ of analysis. You could simply use fixed cutoffs (for instance log2(3 / 2) = 0.5850 for gain and log2(1 / 2) = -1 for loss. Alternatively, you could use a dedicated tool for the actual calling. We have mainly experience with the Bioconductor package CGHcall. This package performs segmentation and calling (the last two steps in the analysis) and therefore you could use the data in the log2_read_counts.igv file as input. Alternatively, if you are not interested in single samples but repetitively gained / lost regions, you could try the ADMIRE algorithm (http://www.ncbi.nlm.nih.gov/pubmed/23476020).

I am sure there are more packages that you could try, also in Bioconductor (see BiocViews term “CopyNumberVariation”), but my experience with any of these is limited.

Best, Thomas

 

ADD COMMENT
0
Entering edit mode
smcnulty • 0
@smcnulty-9671
Last seen 8.8 years ago

You said that you have experience with CGHcall. Would you mind providing some example code as to how you would use the log2_read_counts.igv file as input with CGHcall? I'm reading the vignette now, but I'm a bit confused as to where the CopywriteR output intersects the CGHcall workflow. I'm sorry to ask such a rudimentary question, but I'm quite new to this kind of analysis.

ADD COMMENT
0
Entering edit mode

You could continue from the output of CopywriteR using the log2_read_counts.igv file. An easy way to continue can be found below (courtesy of Oscar Krijgsman who co-developed the tool with me). I hope this gets you on the way.

Best, Thomas

### Import data from CopywriteR
data<-read.table(file="log2_read_counts.igv", sep="\t", header=T, quote="", fill=T, skip=1)

### CGHcall analysis
library(CGHcall)

# Change order of annotation colums (IGV to Stanford format)
Corrected_logratio<-data[,c(4,1,2,3,5:ncol(data))]

## The following code is similar to the examples provided in the CGHcall manual:
## http://www.bioconductor.org/packages/release/bioc/vignettes/CGHcall/inst/doc/CGHcall.pdf

# PreProcessing of raw copy number data
raw <- make_cghRaw(Corrected_logratio)
prep <- preprocess(raw, maxmiss = 30, nchrom = 22)
nor <-  normalize(prep,method = "median", smoothOutliers = TRUE)  

# Segmentation using Circular Binary Segmentation (CBS)
seg <-  segmentData(nor, method = "DNAcopy",nperm=2000,undo.splits="sdundo",min.width=5,undo.SD=2, clen=25, relSDlong=5)

segnorm <- postsegnormalize(seg,inter=c(-0.4,0.4))
listcalls <- CGHcall(segnorm,nclass=5,robustsig="yes",cellularity=1,ncpus=4)
results <- ExpandCGHcall(listcalls,segnorm, divide=5, memeff=FALSE)
save(results, file="CGHcall_output.Rdata")  

# Plot sample 1
plot(results[,1])

# Frequency plot:
summaryPlot(result)

# To extract calls (-2, -1, 0, 1, 2) from the output:
calls(results)

# Segmented values:
segmented(results)

# Raw copy number values:
copynumber(results)

# Annotation info:
chromosomes(results)
ADD REPLY
0
Entering edit mode
@matthewgfield-9786
Last seen 8.8 years ago

Is there a way to extract calls (-2,-1,0,1,2) and segmented values directly from the segment.Rdata file, or do we have to go from the log2_read_counts.igv file?  I would like to create a heatmap of gains and losses across all the chromosomes of a bunch of samples combined.  Could you provide example code on how to extract the necessary info from the segment.Rdata file if it is possible?

ADD COMMENT
0
Entering edit mode

First of all, there are no 'calls' created in the CopywriteR package, but only the raw data and (when using plotCNA) the segmented data. For calling you would need a tool alike CGHcall discussed above (see the previous post also for how to run this tool).

The segmented values you can get straight out of the segment.Rdata file as follows:

load(segment.Rdata)
library(DNAcopy)
segment.CNA.object

Please note that these values are in run length encoding (rle) format. I have written another small function (OverviewPlot) to plot straight from a DNAcopy object. You can find it on our GitHub page; let me know if you find this useful.

ADD REPLY
0
Entering edit mode

Excellent tool! And this thread is extremely helpful! On a similar note as the previous inquiry, would you be able to offer guidance on how one could convert the data within segment.CNA.object to something that could be saved and visualized as a *.igv file (or any other IGV compatible file format)? (Similar to the *.igv file for the raw log2 transformed read counts)  

ADD REPLY
0
Entering edit mode

Thanks! I put a quick script together for that. For reasons of 'browsability' I feel however that it would be better to start a new post, would you be so kind to do that so that I can reply and share the script in the new post? Thomas

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 467 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6