Question: Gaia not returning results - load_cnv problem
0
3 months ago by
rina0
rina0 wrote:

Hello,

I am performing CNV analysis using TCGA data and the gaia package. I have followed the gaia reference manual and thoroughly checked the format of my inputs, but I still do not get any results back. The run_gaia function runs without any problem, but it always returns an empty data frame.

Here is some information about the data I am using.

> str(cnvMatrix)
'data.frame': 28270 obs. of  6 variables:
 $ Chromosome    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Start         : int  25266637 40881535 48892229 72290430 94665210 103697067 109690415 112153343 152583230 193187149 ...
 $ End           : int  25318693 40908771 49360721 72343068 94669551 103717410 109697556 112154591 152614118 193187218 ...
 $ Num.of.Markers: int  19 16 232 49 16 3 12 22 35 4 ...
 $ Sample.Name   : chr  "TCGA-AA-A02R-01A-01D-A008-01" "TCGA-AA-A02R-01A-01D-A008-01" "TCGA-AA-A02R-01A-01D-A008-01" "TCGA-AA-A02R-01A-01D-A008-01" ...
 $ Aberration    : int  1 1 0 1 1 1 0 0 0 1 ...

> str(markersMatrix)
'data.frame': 1874147 obs. of  3 variables:
 $ Probe.Name: int  3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 ...
 $ Chromosome: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Start     : int  99897690 99904180 99942312 99973037 99984524 99990722 100006891 100027241 100031071 100083855 ...

The problem, and most probably the reason why I do not get any results back, seems to be the matrix returned by load_cnv(cnvMatrix, markers_obj, nbsamples), where the observations of each aberration (in my case 0s and 1s) in each chromosome are provided three times.

I cannot see why I get this result back. Hope someone can help me.

Thank you in advance.

R.

modified 3 months ago by Sandro Morganella30 • written 3 months ago by rina0

Having the same problem here!

library(TCGAbiolinks)
library(GAIA)

project = "TCGA-PRAD"
data.category = "Copy Number Variation"
data.type = "Masked Copy Number Segment"
legacy = FALSE
sample.type = c("Primary solid Tumor")

datQuery <- GDCquery(project = project, data.category = data.category, data.type = data.type, legacy = legacy, sample.type = sample.type)
GDCdownload(datQuery)
prad <- GDCprepare(datQuery, save = TRUE, save.filename = "prad.cnv.hg38.rda", summarizedExperiment = TRUE)

## Marker Descriptor Matrix
# https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files
url <- "https://gdc.cancer.gov/files/public/file/snp6.na35.liftoverhg38.txt.zip"
temp <- tempfile()
download.file(url = url, temp)
unzip(temp)
probes_metadata <- read.csv("snp6.na35.liftoverhg38.txt", sep = "\t", as.is = TRUE)
probes_metadata = probes_metadata[probes_metadata[,"freqcnv"]==FALSE,]
colnames(probes_metadata)[1:3] <- c("Probe.Name", "Chromosome", "Start")
probes_metadata[probes_metadata$Chromosome == "X","Chromosome"] <- 23
probes_metadata[probes_metadata$Chromosome == "Y","Chromosome"] <- 24
probes_metadata$Chromosome <- as.integer(probes_metadata$Chromosome)
markerID <- paste(probes_metadata$Chromosome, probes_metadata$Start, sep = ":")
# Removed duplicates
probes_metadata <- probes_metadata[!duplicated(markerID),]
# Filter probes_metadata for common CNV
markerID <- paste(probes_metadata$Chromosome, probes_metadata$Start, sep = ":")
markers_obj <- load_markers(probes_metadata)

## Aberrant Region Descriptor Matrix
load("prad.cnv.hg38.rda")
synthCNV_Matrix <- data
synthCNV_Matrix <- cbind(synthCNV_Matrix, Label=NA)
synthCNV_Matrix[synthCNV_Matrix[,"Segment_Mean"] < -0.2,"Label"] <- 0
synthCNV_Matrix[synthCNV_Matrix[,"Segment_Mean"] > 0.2,"Label"] <- 1
synthCNV_Matrix <- synthCNV_Matrix[!is.na(synthCNV_Matrix$Label),]
synthCNV_Matrix <- synthCNV_Matrix[,c(7,2,3,4,5,8)]
colnames(synthCNV_Matrix) <- c("Sample.Name", "Chromosome", "Start", "End", "Num.of.Markers", "Aberration")

# Replace x and y chromosome names
xidx <- which(synthCNV_Matrix$Chromosome=="X")
yidx <- which(synthCNV_Matrix$Chromosome=="Y")
synthCNV_Matrix[xidx,"Chromosome"] <- 23
synthCNV_Matrix[yidx,"Chromosome"] <- 24
synthCNV_Matrix$Chromosome <- sapply(synthCNV_Matrix$Chromosome, as.integer)

cnv_obj <- load_cnv(synthCNV_Matrix, markers_obj, length(selected))
#Loading Copy Number Data
#.Error in start_index:end_index : argument of length 0



This is related to the cnvMatrix: the column 'Num.of.Markers' must report the size of the aberrant region (End-Start). I can see that the manual/help is misleading in calling this Num.of.Markers; it should be 'Region Size' (or something like that). I need to fix this problem in the manual/help.

Anyway, you can fix the problem with your data by using the following command:

cnvMatrix1$Num.of.Markers <- cnvMatrix1$End-cnvMatrix1$Start+1

On Thu, Aug 16, 2018 at 8:39 PM, martinguerrerog89 [bioc] wrote:
> Having the same problem here!

-- - Sandro Morganella -
I have double-checked, and the problem reported here is actually caused by the markersMatrix not containing all the probes reported in the cnvMatrix. Note that the markersMatrix must have a probe for each Start and End reported in the cnvMatrix. You have two options:

1) Add the positions of the missing probes to the markersMatrix
2) Remove from the cnvMatrix the aberrant regions that are not covered by the markersMatrix

On Mon, Aug 20, 2018 at 1:54 PM, Morganella Sandro wrote:
> This is related to the cnvMatrix: the column 'Num.of.Markers' must report the size of the aberrant region (End-Start).

-- - Sandro Morganella -
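For readers who want to try option 2, a minimal base R sketch follows. It runs on made-up toy data, not on the data from this thread, and the matching key built with paste() is an assumption about how to compare positions rather than anything taken from the gaia source. It also assumes that the position columns are integers in both data frames, so that paste() renders them identically.

```r
# Toy marker matrix (Probe.Name, Chromosome, Start); values are hypothetical
markersMatrix <- data.frame(
  Probe.Name = 1:5,
  Chromosome = c(1L, 1L, 1L, 2L, 2L),
  Start      = c(100L, 200L, 300L, 100L, 400L)
)

# Toy CNV matrix, columns in the positional order described in this thread
cnvMatrix <- data.frame(
  Sample.Name    = c("S1", "S1", "S2", "S2"),
  Chromosome     = c(1L, 1L, 2L, 2L),
  Start          = c(100L, 200L, 100L, 150L),
  End            = c(200L, 350L, 400L, 400L),
  Num.of.Markers = c(2L, 2L, 2L, 2L),
  Aberration     = c(1L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Build "chromosome:position" keys for the probes and for both region ends
marker_id <- paste(markersMatrix$Chromosome, markersMatrix$Start, sep = ":")
start_id  <- paste(cnvMatrix$Chromosome, cnvMatrix$Start, sep = ":")
end_id    <- paste(cnvMatrix$Chromosome, cnvMatrix$End, sep = ":")

# A region is kept only if both its Start and its End coincide with a probe
covered     <- start_id %in% marker_id & end_id %in% marker_id
cnvFiltered <- cnvMatrix[covered, ]
n_dropped   <- sum(!covered)
```

Inspecting n_dropped before filtering real data shows how many regions would be lost; if that number is large, option 1 (adding the missing positions to the marker matrix) may be the better route.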
1
3 months ago by
United Kingdom
Sandro Morganella30 wrote:
Hi,

I think the issue may be related to the column order of your matrix. Try to order your matrix so that the sample name is in the first column. In particular, your matrix should have exactly this format:

$ Sample.Name
$ Chromosome
$ Start
$ End
$ Num.of.Markers
$ Aberration

The function that loads the CNV matrix does not use the column names; it is positional. Please try this and let me know if it works.

Best,
Sandro

On Mon, Aug 13, 2018 at 1:50 PM, rina [bioc] wrote:
> Hello,
> I perform CNV analysis using TCGA data and the gaia package.

-- - Sandro Morganella -

written 3 months ago by Sandro Morganella30

Hi

Thanks a lot for your help. I had considered that, but for some reason whenever I change the order, by:

setcolorder(cnvMatrix, c("Sample.Name","Chromosome", "Start", "End", "Num.of.Markers","Aberration"))

I get this error:

> cnv_obj <- load_cnv(cnvMatrix, markers_obj, nbsamples)
Loading Copy Number Data
.Error in start_index:end_index : argument of length 0

but inspecting the cnvMatrix, there does not seem to be any problem with the columns.

> head(cnvMatrix)
                    Sample.Name Chromosome     Start       End Num.of.Markers Aberration
3  TCGA-AA-A02R-01A-01D-A008-01          1  25266637  25318693             19          1
5  TCGA-AA-A02R-01A-01D-A008-01          1  40881535  40908771             16          1
7  TCGA-AA-A02R-01A-01D-A008-01          1  48892229  49360721            232          0
9  TCGA-AA-A02R-01A-01D-A008-01          1  72290430  72343068             49          1
11 TCGA-AA-A02R-01A-01D-A008-01          1  94665210  94669551             16          1
13 TCGA-AA-A02R-01A-01D-A008-01          1 103697067 103717410              3          1

> str(cnvMatrix)
'data.frame': 28270 obs. of  6 variables:
 $ Sample.Name   : chr  "TCGA-AA-A02R-01A-01D-A008-01" "TCGA-AA-A02R-01A-01D-A008-01" "TCGA-AA-A02R-01A-01D-A008-01" "TCGA-AA-A02R-01A-01D-A008-01" ...
 $ Chromosome    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Start         : int  25266637 40881535 48892229 72290430 94665210 103697067 109690415 112153343 152583230 193187149 ...
 $ End           : int  25318693 40908771 49360721 72343068 94669551 103717410 109697556 112154591 152614118 193187218 ...
 $ Num.of.Markers: int  19 16 232 49 16 3 12 22 35 4 ...
 $ Aberration    : int  1 1 0 1 1 1 0 0 0 1 ...

written 3 months ago by rina0

Are the positions of the probes in your markers matrix consistent with the CNV matrix? It is quite hard for me to understand what the problem is without looking at the data.

On Mon, Aug 13, 2018 at 2:17 PM, rina [bioc] wrote:
> Hi
> Thanks a lot for your help. I had considered that, but for some reason whenever I change the order [...] I get this error

-- - Sandro Morganella -
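Since the answer above says the loader reads the CNV matrix by position rather than by name, the columns can also be reordered by name in plain base R before loading. The sketch below uses the first two rows shown in the str() output from this thread; the variable names expected and cnvOrdered are only illustrative.

```r
# Two rows in the original (wrong) column order, values from the str() output above
cnvMatrix <- data.frame(
  Chromosome     = c(1L, 1L),
  Start          = c(25266637L, 40881535L),
  End            = c(25318693L, 40908771L),
  Num.of.Markers = c(19L, 16L),
  Sample.Name    = c("TCGA-AA-A02R-01A-01D-A008-01",
                     "TCGA-AA-A02R-01A-01D-A008-01"),
  Aberration     = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Column order described in the answer above
expected <- c("Sample.Name", "Chromosome", "Start", "End",
              "Num.of.Markers", "Aberration")

# Fail early if a column is missing or misspelled
stopifnot(all(expected %in% colnames(cnvMatrix)))

# Select the columns by name in the required order
cnvOrdered <- cnvMatrix[, expected]
```

Selecting by name also drops any extra columns, which may matter for a loader that reads columns by position.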
So this error suggests that there is a problem with the markers themselves and not with the structure of one of the two matrices? Segment data and markers are both mapped to hg38, so theoretically they should be consistent, right? Is there something specific that I could upload here so you could get an overview? Excuse my naive questions as a newbie! Really appreciate the help!
As a last attempt, could you try to run the analysis only on the probes/CNVs of chromosome 1?

On Mon, Aug 13, 2018 at 8:38 PM, rina [bioc] wrote:
> So this error suggests that there is a problem with the markers themselves and not the structure of one of the two matrices?

-- - Sandro Morganella -
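A chromosome 1 test of this kind could be set up roughly as below. The data frames are small made-up stand-ins for cnvMatrix and markersMatrix. The gaia calls are left commented out and only mirror the calls quoted earlier in this thread. Treating the third argument of load_cnv as the number of distinct samples is an assumption.

```r
# Toy stand-ins for the real matrices; all values are hypothetical
cnvMatrix <- data.frame(
  Sample.Name    = c("S1", "S1", "S2"),
  Chromosome     = c(1L, 2L, 1L),
  Start          = c(100L, 100L, 200L),
  End            = c(200L, 400L, 300L),
  Num.of.Markers = c(2L, 2L, 2L),
  Aberration     = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)
markersMatrix <- data.frame(
  Probe.Name = 1:5,
  Chromosome = c(1L, 1L, 1L, 2L, 2L),
  Start      = c(100L, 200L, 300L, 100L, 400L)
)

# Keep only chromosome 1 in both inputs
cnvChr1     <- cnvMatrix[cnvMatrix$Chromosome == 1, ]
markersChr1 <- markersMatrix[markersMatrix$Chromosome == 1, ]

# Recount the samples, since some may have no regions on chromosome 1
nbsamplesChr1 <- length(unique(cnvChr1$Sample.Name))

# With gaia loaded, the subset would then be passed on as in the thread:
# markers_obj <- load_markers(markersChr1)
# cnv_obj     <- load_cnv(cnvChr1, markers_obj, nbsamplesChr1)
```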
This didn't seem to work either. I get the same error regarding the start and end index.
As I said, it is quite hard for me to understand where the problem is without looking at the data. Anyway, if you could send me the data for chr1 (both CNV and probe data), I will try to understand how to fix the issue. Please use this email: morganellaalx [at] gmail.com

On Wed, Aug 15, 2018 at 11:05 AM, rina [bioc] wrote:
> This didn't seem to work either. I get the same error regarding the start and end index.

-- - Sandro Morganella -
