Preparing CopywriteR output for GISTIC 2.0 analysis
1
0
Entering edit mode
jcarter • 0
@jcarter-9812
Last seen 5.6 years ago

I'm interested in looking for recurrent somatic copy number aberrations (RCNAs) amongst the CNVs detected by CopywriteR using GISTIC 2.0 (which is arguably the standard tool used for identifying RCNAs). As per the GISTIC 2.0 documentation ( ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm ), I imagine I can transform CopywriteR's DNAcopy object, segment.CNA.object, into a GISTIC 2.0 compatible Segmentation File while also using the log2_read_counts.igv file to generate the required Markers File. Not sure about the best approach to accomplish this though. Any ideas?

copywriter • 2.6k views
3
Entering edit mode
@tkuilman-6868
Last seen 17 months ago
Netherlands

GISTIC is probably written for analyzing DNAcopy objects since it requires exactly the output format of DNAcopy objects. The segment.Rdata file contains all the information that you need. By loading this file and accessing the data structure using str(segment.CNA.object), you can see that both the original data (in segment.CNA.object$data), as well as the segmentation values (in segment.CNA.object$output) are within the DNAcopy object:

library(DNAcopy)
str(segment.CNA.object)

## Code above yields following output:
List of 4
$data :Classes ‘CNA’ and 'data.frame': 137807 obs. of 6 variables: ..$ chrom                     :Class 'AsIs'  int [1:137807] 1 1 1 1 1 1 1 1 1 1 ...
..$maploc : num [1:137807] 750000 770000 790000 810000 850000 ... ..$ log2.2M.bam.vs.log2.2M.bam: num [1:137807] 0 0 0 0 0 0 0 0 0 0 ...
..$log2.4M.bam.vs.log2.4M.bam: num [1:137807] 0 0 0 0 0 0 0 0 0 0 ... ..$ log2.2M.bam.vs.none       : num [1:137807] -1.811 -1.99 -1.587 1.525 -0.464 ...
..$log2.4M.bam.vs.none : num [1:137807] -0.178 -2.459 -1.293 1.864 -1.254 ... ..- attr(*, "data.type")= chr "logratio"$ output :'data.frame':    4204 obs. of  6 variables:
..$ID : chr [1:4204] "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" ... ..$ chrom    : int [1:4204] 1 2 3 4 5 6 7 8 9 10 ...
..$loc.start: num [1:4204] 750000 30000 70000 50000 30000 ... ..$ loc.end  : num [1:4204] 2.49e+08 2.43e+08 1.98e+08 1.91e+08 1.81e+08 ...
..$num.mark : num [1:4204] 10830 11534 9667 9280 8655 ... ..$ seg.mean : num [1:4204] 0 0 0 0 0 0 0 0 0 0 ...
$segRows:'data.frame': 4204 obs. of 2 variables: ..$ startRow: int [1:4204] 1 10836 22379 32048 41331 49989 58262 65685 72647 78019 ...
..$endRow : int [1:4204] 10835 22378 32047 41330 49988 58261 65684 72646 78018 84317 ...$ call   : language segment(x = CNA.object, verbose = 1)
- attr(*, "class")= chr "DNAcopy"

Here's how you can reformat the data:

library(DNAcopy)

segmentation.values <- segment.CNA.object$output colnames(segmentation.values) <- c("Sample", "Chromosome", "Start Position", "End Position", "Num markers", "Seg.CN") write.table(segmentation.values, file = "/PATH/TO/segmentation_values.tsv", quote = FALSE, row.names = FALSE, sep = "\t") markers <- data.frame(paste(segment.CNA.object$data$chrom, segment.CNA.object$data$maploc, sep = ":"), segment.CNA.object$data$chrom, segment.CNA.object$data\$maploc)
colnames(markers) <- c("Marker Name", "Chromosome", "Marker Position")
write.table(markers, file = "/PATH/TO/markers.tsv", quote = FALSE, row.names = FALSE,
sep = "\t")

Let me know if this works for you.

0
Entering edit mode

Thank you! I took out the floor function for the markers part since it was causing a mismatch error in GISTIC (secondary to my segmentation_values start positions ending in ".5"), but after that it ran perfectly!

0
Entering edit mode

Thanks for pointing that out, I will adjust the original answer to incorporate your comment.