Question

How to create a data frame with overlapping segments from CNV oncoscan arrays?

IOM ▴ 20

@iom-7548

Last seen 7.8 years ago

Birmingham

Hi,

Thanks for reading. Any help or advice on this issue would be much appreciated, including pointing out if my approach does not make sense. I have all CNV calls/segments in .txt files (one per sample) with the following structure:

For SAMPLE 1 it would be:

Chromosome   Start   End   Value
chr1   754192   151015495   -0.02005069889128208
chr1   151016790   151150857   -0.2238580733537674
chr1   151174772   243812552   0.02483091503381729
chr1   243818465   243918083   0.16757509112358093
chr1   243919773   249212878   0.06885097920894623
chr2   21494   243052331   -0.0025195078924298286
chr3   63411   69846904   -0.050300538539886475
chr3   69847460   70004208   -0.126520037651062

....

For SAMPLE 2 it would be:

Chromosome   Start   End   Value
chr1   754192   186557453   0.0036580897867679596
chr1   186577925   186639485   -0.08182021975517273
chr1   186642429   189369841   -0.006529499311000109
chr1   189378725   189721806   -0.09558720141649246
chr1   189731338   197300995   0.02319585345685482

....

And so on. My question is: is it possible to merge all this information into a data frame in R, where every row is a sample and every column is a segment? I cannot figure out how to do it, because most samples have different segments, some segments overlap between samples, and the total number of segments varies between samples.

The aim of constructing this data frame is to perform PCA and clustering. I also have numerical and categorical variables for every sample. What is the best way to combine them with the CNV data? Any help will be much appreciated. Thanks

Kind regards

IOM

oncoscan R data frame CNVs overlapping segments • 1.7k views

ADD COMMENT • link 8.2 years ago IOM ▴ 20

score 0 · Answer 1 · 2016-09-28

markus.riester ▴ 130

@markusriester-9875

Last seen 2.5 years ago

United States

Take a look at the CNTools package.
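
For reference, here is a minimal base-R sketch of the underlying idea: stack the per-sample tables into one long table, split each chromosome at the union of all breakpoints, and look up each sample's value in every resulting region. The column names (ID, chrom, loc.start, loc.end, seg.mean) follow the DNAcopy-style layout and are an assumption, as are the helper names read_segments() and region_matrix() and the inline text standing in for the real .txt files. This is only an illustration of the approach, not what CNTools does internally; check the CNTools documentation for the exact input that CNSeg() expects.

```r
# Inline text stands in for the per-sample .txt files (a subset of the
# values shown in the question). With real files, use read.table(file, header = TRUE).
sample_text <- list(
  SAMPLE1 = "Chromosome Start End Value
chr1 754192 151015495 -0.02005069889128208
chr1 151016790 151150857 -0.2238580733537674
chr2 21494 243052331 -0.0025195078924298286",
  SAMPLE2 = "Chromosome Start End Value
chr1 754192 186557453 0.0036580897867679596
chr1 186577925 186639485 -0.08182021975517273"
)

# Hypothetical helper: read one sample and rename columns (assumed layout).
read_segments <- function(id, txt) {
  seg <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
  data.frame(ID = id, chrom = seg$Chromosome,
             loc.start = seg$Start, loc.end = seg$End,
             seg.mean = seg$Value, stringsAsFactors = FALSE)
}

# One long table with a sample ID column.
segs <- do.call(rbind, Map(read_segments, names(sample_text), sample_text))
rownames(segs) <- NULL

# Hypothetical helper: samples x regions matrix. Regions are the intervals
# between consecutive breakpoints pooled over all samples; a sample gets NA
# where none of its segments covers the region midpoint.
region_matrix <- function(segs) {
  ids <- unique(segs$ID)
  cols <- list()
  for (chr in unique(segs$chrom)) {
    s <- segs[segs$chrom == chr, ]
    bp <- as.numeric(sort(unique(c(s$loc.start, s$loc.end))))
    if (length(bp) < 2) next
    mid <- (head(bp, -1) + tail(bp, -1)) / 2
    for (i in seq_along(mid)) {
      hit <- s[s$loc.start <= mid[i] & s$loc.end >= mid[i], ]
      name <- sprintf("%s:%.0f-%.0f", chr, bp[i], bp[i + 1])
      cols[[name]] <- hit$seg.mean[match(ids, hit$ID)]
    }
  }
  m <- do.call(cbind, cols)
  rownames(m) <- ids
  m
}

m <- region_matrix(segs)
```

The result has one row per sample and one column per region. Regions that fall in a gap between a sample's segments are NA, so they need to be dropped or imputed before PCA.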

ADD COMMENT • link 8.2 years ago markus.riester ▴ 130

score 0 · Answer 2 · 2016-09-29

IOM ▴ 20

@iom-7548

Last seen 7.8 years ago

Birmingham

Hi Markus,

Thanks for your advice. I did not find the vignette very helpful, so I initially looked for something else, but it works now. Do you have any idea how I can perform PCA on the object returned by getRS? Thanks!

IOM

ADD COMMENT • link 8.2 years ago IOM ▴ 20

Hi,

I don't think the vignette is bad. I added 3 lines of code to the vignette example to do a PCA (untested code):

library(CNTools)
data(sampleData)
head(sampleData)

# take a random subset of 20 samples from the vignette's example data
ids <- sample(unique(sampleData[, "ID"]), 20)
cnseg <- CNSeg(sampleData[sampleData[, "ID"] %in% ids, ])
rdseg <- getRS(cnseg, by = "region", imput = FALSE, XY = FALSE, what = "mean")
data("geneInfo")
geneInfo <- geneInfo[sample(seq_len(nrow(geneInfo)), 2000), ]
rdByGene <- getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")

# remove gene information from the copy number data.frame
# (check head(rs(rdByGene)) to confirm which leading columns are annotation)
m <- rs(rdByGene)[, -(1:6)]
# transpose so that samples are rows and genes are columns
pca <- prcomp(t(m))
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")
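
Two practical points around the PCA step: prcomp() stops on missing values, and the question also asked how to bring in per-sample variables. Here is a minimal base-R sketch of both, using a small made-up matrix `m` and a made-up annotation table `pheno` (both hypothetical, as are their column names):

```r
set.seed(1)

# Hypothetical features x samples matrix (rows = genes/regions, columns = samples).
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("g", 1:8), paste0("S", 1:5)))
m[2, 3] <- NA  # simulate a missing value

# Keep only features observed in every sample.
keep <- complete.cases(m)

# PCA with samples as rows.
pca <- prcomp(t(m[keep, ]), center = TRUE)

# PC scores with an explicit sample ID column.
scores <- data.frame(ID = rownames(pca$x), pca$x[, 1:2])

# Hypothetical per-sample annotation, deliberately in a different order.
pheno <- data.frame(ID = paste0("S", 5:1),
                    group = c("A", "B", "A", "B", "A"),
                    stringsAsFactors = FALSE)

# Join on the sample ID rather than relying on row order.
merged <- merge(scores, pheno, by = "ID")
```

The merged table can then be used to colour the PCA plot by a categorical variable, for example plot(merged$PC1, merged$PC2, col = as.integer(factor(merged$group))).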

If you cluster raw log-ratios, tumor purity will probably confound your clustering. If you do not expect much variance in purity, you can also categorize the log-ratios in a GISTIC-like way (deep loss, loss, normal, gain, amplification), which might or might not improve clustering.
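
A minimal sketch of such a categorization in base R with cut(). The cut-offs below are hypothetical placeholders, not GISTIC's own values, and would need tuning for the platform, purity and noise level; the helper name categorize() is likewise only illustrative.

```r
# Hypothetical log-ratio cut-offs (assumption; tune for your data).
breaks <- c(-Inf, -1, -0.2, 0.2, 1, Inf)
labels <- c("deep loss", "loss", "normal", "gain", "amplification")

# Map log-ratios to ordered categories; intervals are closed on the left.
categorize <- function(x) {
  cut(x, breaks = breaks, labels = labels, right = FALSE, ordered_result = TRUE)
}

x <- c(-1.5, -0.5, 0, 0.5, 1.5, NA)
cats <- categorize(x)

# Integer codes from -2 (deep loss) to 2 (amplification), NA preserved,
# which are convenient as input for clustering.
codes <- as.integer(cats) - 3L
```

For a whole matrix, apply the same function to the values and restore the dimensions, for example matrix(as.integer(categorize(m)) - 3L, nrow = nrow(m), dimnames = dimnames(m)).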

Good luck with your data,

Markus

ADD REPLY • link 8.2 years ago markus.riester ▴ 130

Hi Markus,

Thanks very much for your reply. I am going to have a deeper look at it and will come back to you. Thanks

IOM

ADD REPLY • link 8.2 years ago IOM ▴ 20