Question: Preparing CopywriteR output for GISTIC 2.0 analysis
0
gravatar for jcarter
3.6 years ago by
jcarter0
jcarter0 wrote:

I'm interested in looking for recurrent somatic copy number aberrations (RCNAs) amongst the CNVs detected by CopywriteR using GISTIC 2.0 (which is arguably the standard tool used for identifying RCNAs). As per the GISTIC 2.0 documentation ( ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm ), I imagine I can transform CopywriteR's DNAcopy object, segment.CNA.object, into a GISTIC 2.0 compatible Segmentation File while also using the log2_read_counts.igv file to generate the required Markers File. Not sure about the best approach to accomplish this though. Any ideas?

copywriter • 2.0k views
ADD COMMENTlink modified 3.6 years ago by t.kuilman140 • written 3.6 years ago by jcarter0
Answer: Preparing CopywriteR output for GISTIC 2.0 analysis
1
gravatar for t.kuilman
3.6 years ago by
t.kuilman140
Netherlands
t.kuilman140 wrote:

GISTIC is probably written for analyzing DNAcopy objects since it requires exactly the output format of DNAcopy objects. The segment.Rdata file contains all the information that you need. By loading this file and accessing the data structure using str(segment.CNA.object), you can see that both the original data (in segment.CNA.object$data), as well as the segmentation values (in segment.CNA.object$output) are within the DNAcopy object:

library(DNAcopy)
load("/PATH/TO/segment.Rdata")
str(segment.CNA.object)

## Code above yields following output:
List of 4
 $ data   :Classes ‘CNA’ and 'data.frame':    137807 obs. of  6 variables:
  ..$ chrom                     :Class 'AsIs'  int [1:137807] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ maploc                    : num [1:137807] 750000 770000 790000 810000 850000 ...
  ..$ log2.2M.bam.vs.log2.2M.bam: num [1:137807] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ log2.4M.bam.vs.log2.4M.bam: num [1:137807] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ log2.2M.bam.vs.none       : num [1:137807] -1.811 -1.99 -1.587 1.525 -0.464 ...
  ..$ log2.4M.bam.vs.none       : num [1:137807] -0.178 -2.459 -1.293 1.864 -1.254 ...
  ..- attr(*, "data.type")= chr "logratio"
 $ output :'data.frame':    4204 obs. of  6 variables:
  ..$ ID       : chr [1:4204] "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" ...
  ..$ chrom    : int [1:4204] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ loc.start: num [1:4204] 750000 30000 70000 50000 30000 ...
  ..$ loc.end  : num [1:4204] 2.49e+08 2.43e+08 1.98e+08 1.91e+08 1.81e+08 ...
  ..$ num.mark : num [1:4204] 10830 11534 9667 9280 8655 ...
  ..$ seg.mean : num [1:4204] 0 0 0 0 0 0 0 0 0 0 ...
 $ segRows:'data.frame':    4204 obs. of  2 variables:
  ..$ startRow: int [1:4204] 1 10836 22379 32048 41331 49989 58262 65685 72647 78019 ...
  ..$ endRow  : int [1:4204] 10835 22378 32047 41330 49988 58261 65684 72646 78018 84317 ...
 $ call   : language segment(x = CNA.object, verbose = 1)
 - attr(*, "class")= chr "DNAcopy"

Here's how you can reformat the data:

library(DNAcopy)

load("/PATH/TO/segment.Rdata")
segmentation.values <- segment.CNA.object$output
colnames(segmentation.values) <- c("Sample", "Chromosome", "Start Position", "End Position",
                                   "Num markers", "Seg.CN")
write.table(segmentation.values, file = "/PATH/TO/segmentation_values.tsv", quote = FALSE,
            row.names = FALSE, sep = "\t")

markers <- data.frame(paste(segment.CNA.object$data$chrom, segment.CNA.object$data$maploc,
                            sep = ":"),
                      segment.CNA.object$data$chrom, segment.CNA.object$data$maploc)
colnames(markers) <- c("Marker Name", "Chromosome", "Marker Position")
write.table(markers, file = "/PATH/TO/markers.tsv", quote = FALSE, row.names = FALSE,
            sep = "\t")

Let me know if this works for you.

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by t.kuilman140

Thank you! I took out the floor function for the markers part since it was causing a mismatch error in GISTIC (secondary to my segmentation_values start positions ending in ".5"), but after that it ran perfectly!

ADD REPLYlink written 3.6 years ago by jcarter0

Thanks for pointing that out, I will adjust the original answer to incorporate your comment.

ADD REPLYlink written 3.6 years ago by t.kuilman140
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 171 users visited in the last hour