Question: VCF file does not include snp.id, can I still run SNPRelate for Relatedness Analysis? Data is output from STACKS for mangroves
7 months ago, cav3gh wrote:

I am following the tutorials for the R/Bioconductor package SNPRelate to run a relatedness analysis. I have a VCF output file from STACKS for mangrove (Avicennia germinans) populations. The VCF includes the following information:

##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele Depth">
##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihood">
##INFO=<ID=locori,Number=1,Type=Character,Description="Orientation the corresponding Stacks locus aligns in">

The first two rows of the VCF file are:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BC-B39.all BC102.all .............
un 1105 16_53 A G . PASS NS=2;AF=0.500;locori=p GT:DP:AD ./.:0:.,. ./.:0:.,. ..............

Based on the tutorial, I have run the following code:

setwd("~/Desktop")
vcf_test1 <- read.vcf("/Users/allisavincent/Desktop/Full_Study_Current.vcf")
vcf.fn <- "/Users/allisavincent/Desktop/Full_Study_Current.vcf"
seqarray_test2 <- snpgdsVCF2GDS(vcf.fn, "Full_Study.gds")
snpgdsSummary("/Users/allisavincent/Desktop/Full_Study.gds")
genofile <- snpgdsOpen("/Users/allisavincent/Desktop/Full_Study.gds")
pop_code <- scan("/Users/allisavincent/Desktop/pop.txt", what=character())
set.seed(100)
snp.id <- sample(snpset.id, 1500)  # random 1500 SNPs
    Error in sample(snpset.id, 1500) : object 'snpset.id' not found
ibd <- snpgdsIBDMLE(genofile, sample.id=YRI.id, snp.id=snp.id,
+                     maf=0.05, missing.rate=0.05, num.thread=2)
Error in stopifnot(is.null(sample.id) | is.vector(sample.id) | is.factor(sample.id)) : 
  object 'YRI.id' not found
Answer: VCF file does not include snp.id, can I still run SNPRelate for Relatedness Anal
7 months ago, Stephanie M. Gogarten (University of Washington) wrote:

When the GDS file is created, snpgdsVCF2GDS automatically generates a unique integer ID for each variant. This is what you would use to identify variants in SNPRelate functions.

snp.id <- read.gdsn(index.gdsn(genofile, "snp.id"))
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

Your code has several errors that are not related to the contents of the VCF file: you have not defined the objects snpset.id or YRI.id.
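
For illustration, here is a minimal sketch of how the two undefined objects could be replaced by IDs read from the GDS file. It is not the tutorial's exact workflow. It assumes the example GDS file bundled with SNPRelate so that it runs without your data; for your study you would open the file written by snpgdsVCF2GDS instead. The names snp.subset and sample.subset, the choice of the first 20 samples, and the population label in the comment are hypothetical. Setting autosome.only=FALSE is likely needed for STACKS output, because variants on a non-numeric chromosome such as "un" would otherwise be dropped.

```r
library(SNPRelate)

# Bundled example GDS so the sketch runs anywhere; for real data, open the
# file written by snpgdsVCF2GDS (e.g. "Full_Study.gds") instead.
genofile <- snpgdsOpen(snpgdsExampleFileName())

# IDs stored in the GDS file
snp.id    <- read.gdsn(index.gdsn(genofile, "snp.id"))
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

# Random subset of SNPs, drawn from the IDs that actually exist in the file
set.seed(100)
snp.subset <- sample(snp.id, min(1500, length(snp.id)))

# Subset of samples (assumption: the first 20, only to keep the run short).
# With a population vector aligned to sample.id, one could instead use
# sample.id[pop_code == "some_population"].
sample.subset <- head(sample.id, 20)

# autosome.only=FALSE keeps variants on non-numeric chromosomes such as "un"
ibd <- snpgdsIBDMLE(genofile, sample.id=sample.subset, snp.id=snp.subset,
                    autosome.only=FALSE, maf=0.05, missing.rate=0.05,
                    num.thread=2)

# Pairwise table with IBD coefficients (k0, k1) and kinship
ibd.coeff <- snpgdsIBDSelection(ibd)
head(ibd.coeff)

snpgdsClose(genofile)
```

To run the analysis on all samples, omit the sample.id argument (its default is NULL, which uses every sample in the file).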
