GISTIC is probably written for analyzing DNAcopy objects since it requires exactly the output format of DNAcopy objects. The segment.Rdata file contains all the information that you need. By loading this file and accessing the data structure using str(segment.CNA.object)
, you can see that both the original data (in segment.CNA.object$data
), as well as the segmentation values (in segment.CNA.object$output
) are within the DNAcopy object:
library(DNAcopy)
load("/PATH/TO/segment.Rdata")
str(segment.CNA.object)
## Code above yields following output:
List of 4
$ data :Classes ‘CNA’ and 'data.frame': 137807 obs. of 6 variables:
..$ chrom :Class 'AsIs' int [1:137807] 1 1 1 1 1 1 1 1 1 1 ...
..$ maploc : num [1:137807] 750000 770000 790000 810000 850000 ...
..$ log2.2M.bam.vs.log2.2M.bam: num [1:137807] 0 0 0 0 0 0 0 0 0 0 ...
..$ log2.4M.bam.vs.log2.4M.bam: num [1:137807] 0 0 0 0 0 0 0 0 0 0 ...
..$ log2.2M.bam.vs.none : num [1:137807] -1.811 -1.99 -1.587 1.525 -0.464 ...
..$ log2.4M.bam.vs.none : num [1:137807] -0.178 -2.459 -1.293 1.864 -1.254 ...
..- attr(*, "data.type")= chr "logratio"
$ output :'data.frame': 4204 obs. of 6 variables:
..$ ID : chr [1:4204] "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" "log2.2M.bam.vs.log2.2M.bam" ...
..$ chrom : int [1:4204] 1 2 3 4 5 6 7 8 9 10 ...
..$ loc.start: num [1:4204] 750000 30000 70000 50000 30000 ...
..$ loc.end : num [1:4204] 2.49e+08 2.43e+08 1.98e+08 1.91e+08 1.81e+08 ...
..$ num.mark : num [1:4204] 10830 11534 9667 9280 8655 ...
..$ seg.mean : num [1:4204] 0 0 0 0 0 0 0 0 0 0 ...
$ segRows:'data.frame': 4204 obs. of 2 variables:
..$ startRow: int [1:4204] 1 10836 22379 32048 41331 49989 58262 65685 72647 78019 ...
..$ endRow : int [1:4204] 10835 22378 32047 41330 49988 58261 65684 72646 78018 84317 ...
$ call : language segment(x = CNA.object, verbose = 1)
- attr(*, "class")= chr "DNAcopy"
Here's how you can reformat the data:
library(DNAcopy)
load("/PATH/TO/segment.Rdata")
segmentation.values <- segment.CNA.object$output
colnames(segmentation.values) <- c("Sample", "Chromosome", "Start Position", "End Position",
"Num markers", "Seg.CN")
write.table(segmentation.values, file = "/PATH/TO/segmentation_values.tsv", quote = FALSE,
row.names = FALSE, sep = "\t")
markers <- data.frame(paste(segment.CNA.object$data$chrom, segment.CNA.object$data$maploc,
sep = ":"),
segment.CNA.object$data$chrom, segment.CNA.object$data$maploc)
colnames(markers) <- c("Marker Name", "Chromosome", "Marker Position")
write.table(markers, file = "/PATH/TO/markers.tsv", quote = FALSE, row.names = FALSE,
sep = "\t")
Let me know if this works for you.
Thank you! I took out the floor function for the markers part since it was causing a mismatch error in GISTIC (secondary to my segmentation_values start positions ending in ".5"), but after that it ran perfectly!
Thanks for pointing that out, I will adjust the original answer to incorporate your comment.