Question: How to create a data frame with overlapping segments from CNV oncoscan arrays?
2.9 years ago by
IOM20
Birmingham
IOM20 wrote:

Hi,

Thanks for reading. Any help or advice on this issue would be much appreciated. Please also tell me if my approach does not make sense. I have all CNV calls/segments in .txt files (one per sample) with the following structure:

For SAMPLE 1 it would be:

Chromosome    Start    End    Value
chr1    754192    151015495    -0.02005069889128208
chr1    151016790    151150857    -0.2238580733537674
chr1    151174772    243812552    0.02483091503381729
chr1    243818465    243918083    0.16757509112358093
chr1    243919773    249212878    0.06885097920894623
chr2    21494    243052331    -0.0025195078924298286
chr3    63411    69846904    -0.050300538539886475
chr3    69847460    70004208    -0.126520037651062

....

For SAMPLE 2 it would be:

Chromosome    Start    End    Value
chr1    754192    186557453    0.0036580897867679596
chr1    186577925    186639485    -0.08182021975517273
chr1    186642429    189369841    -0.006529499311000109
chr1    189378725    189721806    -0.09558720141649246
chr1    189731338    197300995    0.02319585345685482

....

And so on. My question is: is it possible to merge all this information into a data frame using R, where every row is a sample and every column is a segment? I cannot figure out how to do it, as most samples will have different segments, some of them overlapping between samples, and the total number of segments varies between samples.

The aim of constructing this data frame is to perform PCA and clustering. I also have numerical and categorical variables for every sample. What is the best way of combining them with the CNV data? Any help will be much appreciated. Thanks

Kind regards

IOM

Answer: How to create a data frame with overlapping segments from CNV oncoscan arrays?
2.9 years ago by
markus.riester110 wrote:

Take a look at the CNTools package.
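
To illustrate the general idea of turning per-sample segments into a common matrix, here is a minimal base R sketch. It is not the CNTools implementation, only an approximation of the concept: take the union of all breakpoints per chromosome as shared regions, then look up each sample's segment value for every region. The toy segments, the column names (ID, chrom, start, end, value) and the helper segments_to_matrix are made up for the illustration. Regions that fall in a gap between two segments of a sample are left as NA.

```r
# Toy segments in long format: one row per segment, with a sample ID column.
# (Illustrative values only, not taken from real arrays.)
seg <- data.frame(
  ID    = c("S1", "S1", "S2", "S2"),
  chrom = "chr1",
  start = c(1, 101, 1, 151),
  end   = c(100, 200, 150, 200),
  value = c(-0.2, 0.3, 0.1, -0.4)
)

# Hypothetical helper: build a region-by-sample matrix from the union of
# all breakpoints on each chromosome.
segments_to_matrix <- function(seg) {
  samples <- unique(seg$ID)
  pieces <- lapply(unique(seg$chrom), function(chr) {
    s <- seg[seg$chrom == chr, ]
    bp <- sort(unique(c(s$start, s$end)))   # all breakpoints on this chromosome
    reg_start <- head(bp, -1)
    reg_end <- tail(bp, -1)
    vals <- sapply(samples, function(id) {
      si <- s[s$ID == id, ]
      sapply(seq_along(reg_start), function(i) {
        # a segment covers the region if it spans it completely
        hit <- which(si$start <= reg_start[i] & si$end >= reg_end[i])
        if (length(hit) == 0) NA_real_ else si$value[hit[1]]
      })
    })
    matrix(vals, nrow = length(reg_start),
           dimnames = list(paste0(chr, ":", reg_start, "-", reg_end), samples))
  })
  do.call(rbind, pieces)
}

region_matrix <- segments_to_matrix(seg)

# Samples as rows, regions as columns, as asked in the question.
sample_by_region <- t(region_matrix)
print(sample_by_region)
```

With real data you would first read each .txt file, add a sample ID column, and rbind everything into one long data frame like seg above.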

Answer: How to create a data frame with overlapping segments from CNV oncoscan arrays?
2.9 years ago by
IOM20
Birmingham
IOM20 wrote:

Hi Markus,

Thanks for your advice. As the vignette was not very good, I looked for something else at the beginning, but it works now. Do you have any idea how I can perform PCA on the object I get from applying getRS? Thanks!

IOM

Hi,

I don't think the vignette is bad. I added 3 lines of code to the vignette example to do a PCA (untested code):

library(CNTools)
data(sampleData)
###################################################
### code chunk number 2: HowTo.Rnw:65-70
###################################################
cnseg <- CNSeg(sampleData[which(is.element(sampleData[, "ID"], sample(unique(sampleData[, "ID"]), 20))), ])
rdseg <- getRS(cnseg, by = "region", imput = FALSE, XY = FALSE, what = "mean")
data("geneInfo")
geneInfo <- geneInfo[sample(1:nrow(geneInfo), 2000), ]
rdByGene <- getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")

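# remove gene information from the copy number data.frame
# (this assumes the first six columns hold the gene annotation)
m <- rs(rdByGene)[,-(1:6)]
# transpose so that samples are rows and genes are columns, then run the PCA
pca <- prcomp(t(m))
# plot the samples on the first two principal components
plot(pca$x[,1], pca$x[,2])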
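
On the question of combining the CNV data with other per-sample variables, one possible approach is to run the PCA first and then merge the scores with the sample annotation by sample ID. The following is a minimal base R sketch on simulated data. The matrix m, the clinical table and its group column are placeholders invented for the illustration. Note that prcomp() stops on missing values, so rows with NA are dropped here. Whether dropping or imputing is more appropriate depends on your data.

```r
set.seed(1)

# Simulated region-by-sample matrix of log-ratios (placeholder data).
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("region", 1:8), paste0("S", 1:5)))
m[2, 3] <- NA  # pretend one region is missing in one sample

# prcomp() cannot handle NA, so keep only complete rows.
m_complete <- m[complete.cases(m), , drop = FALSE]

# Transpose so that samples are rows, then run the PCA.
pca <- prcomp(t(m_complete))

# First two principal components per sample, with the sample ID as a column.
scores <- data.frame(ID = rownames(pca$x), pca$x[, 1:2])

# Hypothetical sample annotation table.
clinical <- data.frame(ID = paste0("S", 1:5),
                       group = c("A", "A", "B", "B", "A"))

# Join PCA scores and annotation by sample ID.
combined <- merge(scores, clinical, by = "ID")
print(combined)
```

The combined table can then be used to colour the PCA plot by group, or to check whether clusters line up with a clinical variable.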

If you cluster raw log-ratios, tumor purity will probably confound your clustering. If you do not expect a lot of variance in purity, you can also categorize log-ratios GISTIC-like (deep loss, loss, normal, gain, amplification), which might or might not improve clustering.
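
A GISTIC-like categorization could look roughly like the sketch below, using cut() from base R. The thresholds (-1, -0.2, 0.2, 0.8) are arbitrary values chosen for the illustration and are not GISTIC's own cutoffs. Sensible values depend on the platform, purity and noise level. The function name categorize_log_ratio is also made up.

```r
# Hypothetical helper: bin log-ratios into five ordered categories.
# The break points are illustrative assumptions, not recommended cutoffs.
categorize_log_ratio <- function(x,
                                 breaks = c(-Inf, -1, -0.2, 0.2, 0.8, Inf)) {
  cut(x, breaks = breaks,
      labels = c("deep loss", "loss", "normal", "gain", "amplification"))
}

log_ratios <- c(-1.5, -0.3, 0.02, 0.4, 1.2)
cats <- categorize_log_ratio(log_ratios)
print(cats)

# Integer coding from -2 (deep loss) to 2 (amplification), e.g. for clustering.
codes <- as.integer(cats) - 3L
print(codes)
```

Applied to a whole matrix, cut() returns a plain vector, so the dimensions need to be restored afterwards, for example with matrix(codes, nrow = nrow(m), dimnames = dimnames(m)).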

Markus

Hi Markus,

Thanks very much for your reply. I am going to have a deeper look at it and will come back to you. Thanks

IOM