Hi,

Thanks for reading. Any help or advice on this issue would be much appreciated, including telling me if my approach does not make sense. I have all my CNV calls/segments in .txt files (one per sample) with the following structure:

For SAMPLE 1 it would be:

Chromosome Start End Value

chr1 754192 151015495 -0.02005069889128208

chr1 151016790 151150857 -0.2238580733537674

chr1 151174772 243812552 0.02483091503381729

chr1 243818465 243918083 0.16757509112358093

chr1 243919773 249212878 0.06885097920894623

chr2 21494 243052331 -0.0025195078924298286

chr3 63411 69846904 -0.050300538539886475

chr3 69847460 70004208 -0.126520037651062

....

For SAMPLE 2 it would be:

Chromosome Start End Value

chr1 754192 186557453 0.0036580897867679596

chr1 186577925 186639485 -0.08182021975517273

chr1 186642429 189369841 -0.006529499311000109

chr1 189378725 189721806 -0.09558720141649246

chr1 189731338 197300995 0.02319585345685482

....

And so on. My question is: is it possible to merge all this information into a data frame in R, where every row is a sample and every column is a segment? I cannot figure out how to do it, because most samples have different segments, some segments overlap between samples, and the total number of segments varies from sample to sample.

The aim of constructing this data frame is to perform PCA and clustering. I also have numerical and categorical variables for every sample. What is the best way of combining them with the CNV data? Any help will be much appreciated. Thanks

Kind regards

IOM

Hi,

I don't think the vignette is bad. I added 3 lines of code to the vignette example to do a PCA (untested code):

require("CNTools")

data(sampleData)

head(sampleData)

###################################################

### code chunk number 2: HowTo.Rnw:65-70

###################################################

cnseg <- CNSeg(sampleData[which(is.element(sampleData[, "ID"], sample(unique(sampleData[, "ID"]), 20))), ])

rdseg <- getRS(cnseg, by = "region", imput = FALSE, XY = FALSE, what = "mean")

data("geneInfo")

geneInfo <- geneInfo[sample(1:nrow(geneInfo), 2000), ]

rdByGene <- getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")

# remove gene information from the copy number data.frame

m <- rs(rdByGene)[,-(1:6)]

pca <- prcomp(t(m))

plot(pca$x[,1], pca$x[,2])
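To get per-sample files like the ones in the question into one long table that can be passed to CNSeg(), something along these lines might work. This is only a sketch: the two small files are made up so that the code runs on its own, the helper read_one() is hypothetical, and the column names (ID, chrom, loc.start, loc.end, seg.mean) are assumed to match the layout of sampleData. sampleData also has a num.mark column, which the files in the question do not provide, so it is not filled in here. Stripping the "chr" prefix is likewise an assumption about how chromosomes are named in the annotation.

```r
# Hypothetical input: two tiny per-sample files written to a temp directory.
# Replace `tmp` with the folder that holds your own .txt files.
tmp <- tempfile("cnv_")
dir.create(tmp)
writeLines(c("Chromosome Start End Value",
             "chr1 754192 151015495 -0.0200",
             "chr1 151016790 151150857 -0.2238"),
           file.path(tmp, "SAMPLE1.txt"))
writeLines(c("Chromosome Start End Value",
             "chr1 754192 186557453 0.0036",
             "chr1 186577925 186639485 -0.0818"),
           file.path(tmp, "SAMPLE2.txt"))

files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# Read one file and rename its columns; the sample ID is taken from the file name.
read_one <- function(f) {
  x <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
  data.frame(ID        = sub("\\.txt$", "", basename(f)),
             chrom     = sub("^chr", "", x$Chromosome),  # assumption: no "chr" prefix wanted
             loc.start = x$Start,
             loc.end   = x$End,
             seg.mean  = x$Value,
             stringsAsFactors = FALSE)
}

# Stack all samples into one long data frame, one row per segment.
seg <- do.call(rbind, lapply(files, read_one))
head(seg)
```

The resulting seg would then take the place of sampleData in the code above.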

If you cluster raw log-ratios, tumor purity will probably confound your clustering. If you do not expect a lot of variance in purity, you can also categorize the log-ratios GISTIC-like (deep loss, loss, normal, gain, amplification), which might or might not improve clustering.
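A minimal sketch of such a categorization, on a made-up gene-by-sample matrix. The cutoffs below are hypothetical placeholders, not GISTIC's own values, and would need tuning for your platform and purity range:

```r
# Hypothetical cutoffs on the log-ratio scale (assumption, tune for your data).
cutoffs <- c(-Inf, -1, -0.2, 0.2, 0.8, Inf)
states  <- c("deep_loss", "loss", "normal", "gain", "amplification")

# Toy log-ratio matrix: genes in rows, samples in columns.
m <- matrix(c(-1.5, -0.3, 0.02,
               0.5,  1.2, 0.0),
            nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2")))

# cut() returns interval codes 1..5; subtracting 3 gives -2..2
# (deep loss = -2, loss = -1, normal = 0, gain = 1, amplification = 2).
codes <- cut(m, breaks = cutoffs, labels = FALSE) - 3L

# cut() drops the matrix shape, so restore dimensions and names.
m_cat <- matrix(codes, nrow = nrow(m), dimnames = dimnames(m))
m_cat

# Optional: how many calls fall into each state.
table(factor(codes, levels = -2:2, labels = states))
```

The integer-coded matrix can go into prcomp() or a distance-based clustering in the same way as m above.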

Good luck with your data,

Markus

Hi Markus,

Thanks very much for your reply. I am going to have a deeper look at it and will come back to you. Thanks

IOM