How to create a data frame with overlapping segments from CNV oncoscan arrays?
IOM ▴ 20
@iom-7548
Birmingham

Hi,

 

Thanks for reading. Any help or advice on this issue would be much appreciated, including if my approach does not make sense. I have all CNV calls/segments in .txt files (one per sample) with the following structure:

For SAMPLE 1 it would be:

Chromosome    Start    End    Value
chr1    754192    151015495    -0.02005069889128208
chr1    151016790    151150857    -0.2238580733537674
chr1    151174772    243812552    0.02483091503381729
chr1    243818465    243918083    0.16757509112358093
chr1    243919773    249212878    0.06885097920894623
chr2    21494    243052331    -0.0025195078924298286
chr3    63411    69846904    -0.050300538539886475
chr3    69847460    70004208    -0.126520037651062

....

For SAMPLE 2 it would be:

Chromosome    Start    End    Value
chr1    754192    186557453    0.0036580897867679596
chr1    186577925    186639485    -0.08182021975517273
chr1    186642429    189369841    -0.006529499311000109
chr1    189378725    189721806    -0.09558720141649246
chr1    189731338    197300995    0.02319585345685482

....

And so on. My question is: is it possible to merge all this information into a data frame in R, where every row is a sample and every column is a segment? I cannot figure out how to do it, because most samples have different segments, some segments overlap between samples, and the total number of segments varies between samples.

The aim of constructing this data frame is to perform PCA and clustering. I also have numerical and categorical variables for every sample. What is the best way to combine them with the CNV data? Any help will be much appreciated. Thanks

 

Kind regards

 

IOM

@markusriester-9875
United States

Take a look at the CNTools package.
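
As a rough illustration of the first step, the per-sample files from the question can be stacked into one long table with a sample ID column, which is the general shape of the `sampleData` example used further down in this thread. This is only a minimal base-R sketch: the column names `ID`, `chrom`, `loc.start`, `loc.end` and `seg.mean` are assumed to match what `CNSeg()` looks for by default (check `?CNSeg`), stripping the `chr` prefix is an assumption about the chromosome naming the package expects, and the helper `to_long` is hypothetical.

```r
# Two toy per-sample tables in the layout from the question (values copied
# from the post). In practice each would come from read.delim() on a .txt file.
sample1 <- data.frame(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start = c(754192, 151016790, 21494),
  End = c(151015495, 151150857, 243052331),
  Value = c(-0.02005069889128208, -0.2238580733537674, -0.0025195078924298286)
)
sample2 <- data.frame(
  Chromosome = c("chr1", "chr1"),
  Start = c(754192, 186577925),
  End = c(186557453, 186639485),
  Value = c(0.0036580897867679596, -0.08182021975517273)
)
per_sample <- list(SAMPLE1 = sample1, SAMPLE2 = sample2)

# Hypothetical helper: rename columns and attach the sample ID.
to_long <- function(id, seg) {
  data.frame(
    ID = id,
    chrom = sub("^chr", "", seg$Chromosome),  # assumption: "1" rather than "chr1"
    loc.start = seg$Start,
    loc.end = seg$End,
    seg.mean = seg$Value,
    stringsAsFactors = FALSE
  )
}

# Stack all samples into one long segment table.
long_segs <- do.call(rbind, Map(to_long, names(per_sample), per_sample))
rownames(long_segs) <- NULL

# Untested, requires CNTools; default column names are assumed:
# cnseg <- CNTools::CNSeg(long_segs)
```

The long table keeps every segment of every sample on its own row, so samples with different numbers of segments are not a problem at this stage.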

IOM ▴ 20
@iom-7548
Birmingham

Hi Markus,

Thanks for your advice. I found the vignette hard to follow, so I looked for something else at first, but it works now. Do you have any idea how I can perform PCA on the object I get from applying getRS? Thanks!

IOM


Hi,

I don't think the vignette is bad. I added three lines of code to the vignette example to do a PCA (untested code):

library(CNTools)
data(sampleData)
head(sampleData)

# Build a CNSeg object from 20 randomly chosen samples
ids <- sample(unique(sampleData[, "ID"]), 20)
cnseg <- CNSeg(sampleData[sampleData[, "ID"] %in% ids, ])
rdseg <- getRS(cnseg, by = "region", imput = FALSE, XY = FALSE, what = "mean")

# Summarize copy number by gene, using a random subset of 2000 genes
data("geneInfo")
geneInfo <- geneInfo[sample(seq_len(nrow(geneInfo)), 2000), ]
rdByGene <- getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")

# remove gene information from the copy number data.frame
# (inspect head(rs(rdByGene)) to confirm how many leading annotation columns there are)
m <- rs(rdByGene)[, -(1:6)]
# samples become rows, genes become columns
pca <- prcomp(t(m))
plot(pca$x[, 1], pca$x[, 2])

If you cluster raw log-ratios, tumor purity will probably confound your clustering. If you do not expect a lot of variance in purity, you can also categorize log-ratios in a GISTIC-like way (deep loss, loss, normal, gain, amplification), which might or might not improve clustering.
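
A minimal sketch of that kind of categorization, using base R `cut()` on a small made-up matrix of log-ratios. The thresholds and the object names (`breaks`, `log_ratios`, `codes`) are purely illustrative assumptions, not GISTIC's own cutoffs, and would need tuning for the platform and the expected purity.

```r
# Hypothetical log2-ratio thresholds for the five categories (assumed values).
breaks <- c(-Inf, -1.0, -0.2, 0.2, 0.8, Inf)
labels <- c("deep_loss", "loss", "normal", "gain", "amplification")

# Toy gene-by-sample matrix of log-ratios (made-up numbers).
log_ratios <- matrix(
  c(-1.4, -0.3, 0.02,
     0.35, 1.1, 0.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3"))
)

# cut() returns a factor in column-major order and drops the matrix shape.
calls <- cut(log_ratios, breaks = breaks, labels = labels)

# Restore the shape and recode the five levels as integers from -2 to 2.
codes <- matrix(
  as.integer(calls) - 3L,
  nrow = nrow(log_ratios),
  dimnames = dimnames(log_ratios)
)
```

The integer-coded matrix can then be passed to clustering or PCA in place of the raw log-ratios.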

Good luck with your data,

Markus


Hi Markus,

Thanks very much for your reply. I am going to have a deeper look at it and will come back to you. Thanks

IOM
