Question

Preparing CopywriteR output for GISTIC 2.0 analysis

0

Entering edit mode

jcarter • 0

@jcarter-9812

Last seen 9.8 years ago

I'm interested in looking for recurrent somatic copy number aberrations (RCNAs) amongst the CNVs detected by CopywriteR using GISTIC 2.0 (which is arguably the standard tool used for identifying RCNAs). As per the GISTIC 2.0 documentation ( ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm ), I imagine I can transform CopywriteR's DNAcopy object, segment.CNA.object, into a GISTIC 2.0 compatible Segmentation File while also using the log2_read_counts.igv file to generate the required Markers File. Not sure about the best approach to accomplish this though. Any ideas?

copywriter • 4.8k views

ADD COMMENT • link updated 9.8 years ago by t.kuilman ▴ 170 • written 9.8 years ago by jcarter • 0

score 3 · Answer 1 · 2016-02-29

GISTIC is probably written for analyzing DNAcopy objects since it requires exactly the output format of DNAcopy objects. The segment.Rdata file contains all the information that you need. By loading this file and accessing the data structure using str(segment.CNA.object), you can see that both the original data (in segment.CNA.object$data), as well as the segmentation values (in segment.CNA.object$output) are within the DNAcopy object:

library(DNAcopy)
load("/PATH/TO/segment.Rdata")
str(segment.CNA.object)

## Code above yields following output:
List of 4
 $ data   :Classes ‘CNA’ and 'data.frame':    137807 obs. of  6 variables:
  ..$ chrom                     :Class 'AsIs'  int [1:137807] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ maploc                    : num [1:137807] 750000 770000 790000 810000 850000 ...
  ..$ log2.2M.bam.vs.log2.2M.bam: num [1:137807] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ log2.4M.bam.vs.log2.4M.bam: num [1:137807] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ log2.2M.bam.vs.none       : num [1:137807] -1.811 -1.99 -1.587 1.525 -0.464 ...
  ..$ log2.4M.bam.vs.none       : num [1:137807] -0.178 -2.459 -1.293 1.864 -1.254 ...
  ..- attr(*, "data.type")= chr "logratio"
 $ output :'data.frame':    4204 obs. of  6 variables:
  ..$ ID       : chr [1:4204] "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" ...
  ..$ chrom    : int [1:4204] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ loc.start: num [1:4204] 750000 30000 70000 50000 30000 ...
  ..$ loc.end  : num [1:4204] 2.49e+08 2.43e+08 1.98e+08 1.91e+08 1.81e+08 ...
  ..$ num.mark : num [1:4204] 10830 11534 9667 9280 8655 ...
  ..$ seg.mean : num [1:4204] 0 0 0 0 0 0 0 0 0 0 ...
 $ segRows:'data.frame':    4204 obs. of  2 variables:
  ..$ startRow: int [1:4204] 1 10836 22379 32048 41331 49989 58262 65685 72647 78019 ...
  ..$ endRow  : int [1:4204] 10835 22378 32047 41330 49988 58261 65684 72646 78018 84317 ...
 $ call   : language segment(x = CNA.object, verbose = 1)
 - attr(*, "class")= chr "DNAcopy"

Here's how you can reformat the data:

library(DNAcopy)

load("/PATH/TO/segment.Rdata")
segmentation.values <- segment.CNA.object$output
colnames(segmentation.values) <- c("Sample", "Chromosome", "Start Position", "End Position",
                                   "Num markers", "Seg.CN")
write.table(segmentation.values, file = "/PATH/TO/segmentation_values.tsv", quote = FALSE,
            row.names = FALSE, sep = "\t")

markers <- data.frame(paste(segment.CNA.object$data$chrom, segment.CNA.object$data$maploc,
                            sep = ":"),
                      segment.CNA.object$data$chrom, segment.CNA.object$data$maploc)
colnames(markers) <- c("Marker Name", "Chromosome", "Marker Position")
write.table(markers, file = "/PATH/TO/markers.tsv", quote = FALSE, row.names = FALSE,
            sep = "\t")

Let me know if this works for you.