VCF file does not include snp.id: can I still run SNPRelate for relatedness analysis? (Data are STACKS output for mangroves)
cav3gh (@cav3gh-15680)

I am following the tutorials for the R/Bioconductor package SNPRelate to run a relatedness analysis. I have a VCF file output by STACKS for mangrove (Avicennia germinans) populations. The VCF header includes the following information:

```
INFO ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data"
INFO ID=AF,Number=.,Type=Float,Description="Allele Frequency"
FORMAT ID=GT,Number=1,Type=String,Description="Genotype"
FORMAT ID=DP,Number=1,Type=Integer,Description="Read Depth"
FORMAT ID=AD,Number=1,Type=Integer,Description="Allele Depth"
FORMAT ID=GL,Number=.,Type=Float,Description="Genotype Likelihood"
INFO ID=locori,Number=1,Type=Character,Description="Orientation the corresponding Stacks locus aligns in"
```

The first two rows of the VCF file (column header and first variant) are:

```
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BC-B39.all BC102.all .............
un 1105 16_53 A G . PASS NS=2;AF=0.500;locori=p GT:DP:AD ./.:0:.,. ./.:0:.,. ..............
```

Based on the tutorial, I have run the following code:

```r
setwd("~/Desktop")
vcf_test1 <- read.vcf("/Users/allisavincent/Desktop/Full_Study_Current.vcf")
vcf.fn <- "/Users/allisavincent/Desktop/Full_Study_Current.vcf"
seqarray_test2 <- snpgdsVCF2GDS(vcf.fn, "Full_Study.gds")
snpgdsSummary("/Users/allisavincent/Desktop/Full_Study.gds")
genofile <- snpgdsOpen("/Users/allisavincent/Desktop/Full_Study.gds")
pop_code <- scan("/Users/allisavincent/Desktop/pop.txt", what=character())
set.seed(100)
snp.id <- sample(snpset.id, 1500)  # random 1500 SNPs
Error in sample(snpset.id, 1500) : object 'snpset.id' not found
ibd <- snpgdsIBDMLE(genofile, sample.id=YRI.id, snp.id=snp.id,
+                     maf=0.05, missing.rate=0.05, num.thread=2)
Error in stopifnot(is.null(sample.id) | is.vector(sample.id) | is.factor(sample.id)) : 
  object 'YRI.id' not found
```
@stephanie-m-gogarten-5121
University of Washington

When the GDS file is created, snpgdsVCF2GDS automatically generates a unique integer ID for each variant. This is what you would use to identify variants in SNPRelate functions.

```r
snp.id <- read.gdsn(index.gdsn(genofile, "snp.id"))
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))
```
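A minimal sketch of this behaviour, using the small example VCF shipped with SNPRelate (extdata/sequence.vcf) in place of the STACKS file: the variant IDs are read from the GDS "snp.id" node after conversion, so they exist whether or not the VCF's ID column is filled, and a random subset of them can be passed to the snp.id argument. The output path is a hypothetical temporary file.

```r
library(SNPRelate)  # attaches gdsfmt, which provides read.gdsn() and index.gdsn()

# Small example VCF bundled with SNPRelate; substitute the path to your STACKS VCF
vcf.fn <- system.file("extdata", "sequence.vcf", package = "SNPRelate")
gds.fn <- file.path(tempdir(), "example.gds")  # hypothetical output file

# Convert; integer variant IDs are assigned during this step
snpgdsVCF2GDS(vcf.fn, gds.fn, method = "biallelic.only", verbose = FALSE)
genofile <- snpgdsOpen(gds.fn)

snp.id    <- read.gdsn(index.gdsn(genofile, "snp.id"))     # generated variant IDs
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))  # sample names from the VCF header

# Random subset of at most 1500 variant IDs; indexing via sample.int() avoids
# sample()'s special case when x has length 1
set.seed(100)
snp.subset <- snp.id[sample.int(length(snp.id), min(1500L, length(snp.id)))]

snpgdsClose(genofile)
```

With a large STACKS VCF the same code yields a subset of exactly 1500 IDs; with the tiny example file it simply returns all of them.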

Your code has several errors that are not related to the contents of the VCF file: you have not defined the objects snpset.id or YRI.id.
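Both undefined objects come from earlier steps of the SNPRelate tutorial: snpset.id is the result of LD-based pruning with snpgdsLDpruning(), and YRI.id is the subset of sample IDs whose population code is "YRI". The sketch below reproduces those steps on the HapMap example file bundled with SNPRelate (snpgdsExampleFileName()) so that it runs as written. For your own data it assumes that pop.txt lists one population label per sample, in the same order as sample.id; the thresholds are the tutorial's values, not recommendations for RAD-seq data.

```r
library(SNPRelate)

# HapMap example GDS bundled with SNPRelate; for your data use "Full_Study.gds"
genofile  <- snpgdsOpen(snpgdsExampleFileName())
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

# Population labels in the same order as sample.id; for your data, e.g.
# pop_code <- scan("pop.txt", what = character())
pop_code <- read.gdsn(index.gdsn(genofile, "sample.annot/pop.group"))
stopifnot(length(pop_code) == length(sample.id))

# LD-based pruning defines snpset.id (a per-chromosome list of SNP IDs, flattened)
set.seed(1000)
snpset    <- snpgdsLDpruning(genofile, ld.threshold = 0.2, verbose = FALSE)
snpset.id <- unlist(unname(snpset))

# Samples from one population define YRI.id ("YRI" exists only in the example data)
YRI.id <- sample.id[pop_code == "YRI"]

# Random 1500 pruned SNPs, then maximum-likelihood IBD estimation
set.seed(100)
snp.id <- snpset.id[sample.int(length(snpset.id), min(1500L, length(snpset.id)))]
ibd <- snpgdsIBDMLE(genofile, sample.id = YRI.id, snp.id = snp.id,
                    maf = 0.05, missing.rate = 0.05, num.thread = 2,
                    verbose = FALSE)

# One row per sample pair, including ID1, ID2, k0, k1 and kinship columns
ibd.coeff <- snpgdsIBDSelection(ibd)
snpgdsClose(genofile)
```

For the mangrove data, open your own GDS file, read pop_code from pop.txt, and replace "YRI" with one of your population labels.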
