Calculate heterozygosity % using SNP genotype data
gowtham ▴ 210
@gowtham-5301
Last seen 9.6 years ago
Hi Everyone,

I am attempting to do the same from a VCF file that contains thousands of variant positions across 4 samples (two groups with 2 replicates each). The VCF files were generated using GATK.

I would like to know the heterozygosity level for each sample. I read the VCF file with readVcf and created snpmat using MatrixToSnpMatrix:

    tab <- TabixFile("./Allsamples_gatk.q_50_30.vcf.gz", "Allsamples_gatk.q_50_30.vcf.gz.tbi")
    vcf_all <- readVcf(tab, "LdoB")
    snpmat <- MatrixToSnpMatrix( geno(vcf_all)$GT, values(ref(vcf_all))[["REF"]],
    library(snpStats)
    library(hexbin)
    summary(snpmat)

But summary(snpmat) gives different output from that shown in the snpStats vignette. How do I generate heterozygosity values for each sample here?

    > summary(snpmat)
              Length Class     Mode
    genotypes 294028 SnpMatrix raw
    map            4 DataFrame S4

The snpmat object seems to hold a SnpMatrix with 4 rows and 73507 columns. Any further advice will be very much appreciated.

Thanks,
Gowthaman

On Fri, Jun 1, 2012 at 2:50 PM, Yadav Sapkota <ysapkota@ualberta.ca> wrote:
> Hello,
>
> I am trying to validate a few LOH regions using SNP genotype data. I am assuming that if a region is an LOH, it will contain predominantly homozygous genotypes. For simplicity, I chose 15 SNPs per ~70 kb LOH region.
>
> Now I need to calculate the heterozygosity for the LOH regions in each sample using the genotype data of the 15 SNPs.
>
> Does anyone know a way to calculate the heterozygous % per sample using a set of SNP genotype data?
>
> Your help will be greatly appreciated.
>
> --Yadav

--
Gowthaman
Bioinformatics Systems Programmer.
SBRI, 307 West lake Ave N Suite 500
Seattle, WA. 98109-5219
Phone : LAB 206-256-7188 (direct).
gowtham ▴ 210
@gowtham-5301
Last seen 9.6 years ago
Oops! I think summary(snpmat$genotypes) gives the summary information shown in the vignette. It also has a column summarising heterozygosity, which I assume summarises the data over all the samples. But if I would like to know the heterozygosity for each of the samples, how do I get that?

And how do I print the data in snpmat? Sorry for asking such a naive question.

Thanks,
Gowthaman
@vincent-j-carey-jr-4
Last seen 5 weeks ago
United States
On Fri, Jul 20, 2012 at 3:19 PM, gowtham <ragowthaman@gmail.com> wrote:
> But summary(snpmat) gives different output from that shown in the snpStats vignette. How do I generate heterozygosity values for each sample here?

Use col.summary, not summary.
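For the per-sample figure asked about in this thread, a minimal sketch follows. It assumes snpStats is installed and uses the Autosomes example SnpMatrix from the package's testdata as a stand-in for snpmat$genotypes; it is not the poster's data. In snpStats, col.summary returns one row per SNP, while row.summary returns one row per sample and includes a Heterozygosity column. Scaling that column by 100 to get a percentage is an illustrative choice.

```r
library(snpStats)

# Example data shipped with snpStats; 'Autosomes' is a SnpMatrix
# (rows = samples, columns = SNPs). It stands in for snpmat$genotypes.
data(testdata)

# Per-SNP summaries: one row per SNP (call rate, allele frequencies, ...).
snp_stats <- col.summary(Autosomes)

# Per-sample summaries: one row per sample, with a Heterozygosity column
# (proportion of called SNPs that are heterozygous in that sample).
sample_stats <- row.summary(Autosomes)

# Heterozygosity expressed as a percentage for each sample.
het_pct <- 100 * sample_stats$Heterozygosity
head(het_pct)
```

With the object from the question, the same call would presumably be row.summary(snpmat$genotypes), since the SnpMatrix sits in the genotypes element of the returned list.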
> use col.summary, not summary

Thanks Vincent. But I don't seem to have the col.summary function available in my installation, nor row.summary. Is it part of snpStats or some basic R package?

I managed to get the following table. Is it okay to calculate the % of heterozygosity (A/B) from here? Or does the calculation originally involve a more statistical approach?

Thanks once again,
Gowthaman
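On the question of whether a simple ratio is enough: the plain proportion of heterozygous calls among non-missing calls can be computed directly in base R. The sketch below uses a small made-up genotype matrix in the same "allele/allele" form as geno(vcf)$GT; the matrix, the function name het_percent, and the choice to exclude missing calls from the denominator are all assumptions for illustration, not something taken from the thread.

```r
# Hypothetical genotype matrix: rows are variants, columns are samples,
# written the way VCF GT fields look ("0/1", "1|0", "./." for missing).
gt <- matrix(
  c("0/0", "0/1", "1/1", "./.",
    "0/1", "0/1", "0/0", "1/1",
    "1/1", "1|0", "0/0", "0/1"),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("snp", 1:3), paste0("sample", 1:4))
)

# Percentage of heterozygous calls per sample, ignoring missing calls.
het_percent <- function(gt) {
  # Split every call into its two alleles (phased or unphased separator).
  alleles <- strsplit(as.vector(gt), "[/|]")
  # A call is missing if it is not diploid or contains a "." allele.
  missing <- vapply(alleles,
                    function(a) length(a) != 2 || any(a == "."),
                    logical(1))
  # A call is heterozygous if its two alleles differ.
  het <- vapply(alleles,
                function(a) length(a) == 2 && a[1] != a[2],
                logical(1))
  het[missing] <- NA
  het <- matrix(het, nrow = nrow(gt), dimnames = dimnames(gt))
  100 * colMeans(het, na.rm = TRUE)
}

het_percent(gt)
```

For a region covered by only 15 SNPs, as in the original LOH question, each heterozygous call shifts the percentage by several points, so the number of called SNPs per sample is worth reporting alongside the ratio.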
    > library(snpStats)
    Loading required package: survival
    Loading required package: splines
    Loading required package: Matrix
    Loading required package: lattice

    Attaching package: 'Matrix'

    The following object(s) are masked from 'package:IRanges':

        expand

    > args(col.summary)
    function (object, rules = NULL, uncertain = TRUE)
    NULL

Give sessionInfo() if questions persist.
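When a function such as col.summary appears to be unavailable, it can help to separate "not installed" from "installed but not attached" before posting sessionInfo(). The following is a small diagnostic sketch using only base R; the variable names are illustrative.

```r
pkg <- "snpStats"

# TRUE if the package is installed. This loads its namespace
# but does not attach it to the search path.
installed <- requireNamespace(pkg, quietly = TRUE)

# TRUE only after library(snpStats) has been called in this session.
attached <- paste0("package:", pkg) %in% search()

# TRUE if a function named col.summary can be found from here.
visible <- exists("col.summary", mode = "function")

c(installed = installed, attached = attached, visible = visible)
```

If installed is TRUE but attached and visible are FALSE, calling library(snpStats) is the likely fix.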
My bad, Vincent. I had installed snpStats in the current session but forgot to load it. I can't believe I was bothering everyone without having bothered to check this.

Thanks so much for the help,
Gowthaman