georgewwp
Hi there,
I have a list of variants (in VCF format) called across two samples. I want to use multiple criteria to select a subset of these variants.
For example, I want to select "G" to "A" or "C" to "T" changes, since I'm only interested in these two specific types of SNPs.
Also, I want these SNPs to have certain GT call combinations in the two samples: a "0/1" GT call for sample 1 and a "1/1" GT call for sample 2; or, alternatively, the GT call cannot be "1/1" for both samples at the same time.
What's the best way to achieve this? I had difficulty combining these criteria.
Thanks!
George
class: CollapsedVCF
dim: 309482 2
rowRanges(vcf):
GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
DataFrame with 17 columns: INDEL, IDV, IMF, DP, VDB, RPB, MQB, BQB, MQSB, SGB, MQ0F, ICB, HOB, AC, AN, DP4, MQ
info(header(vcf)):
Number Type Description
INDEL 0 Flag Indicates that the variant is an INDEL.
IDV 1 Integer Maximum number of reads supporting an indel
IMF 1 Float Maximum fraction of reads supporting an indel
DP 1 Integer Raw read depth
VDB 1 Float Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better),Version
RPB 1 Float Mann-Whitney U test of Read Position Bias (bigger is better)
MQB 1 Float Mann-Whitney U test of Mapping Quality Bias (bigger is better)
BQB 1 Float Mann-Whitney U test of Base Quality Bias (bigger is better)
MQSB 1 Float Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)
SGB 1 Float Segregation based metric.
MQ0F 1 Float Fraction of MQ0 reads (smaller is better)
ICB 1 Float Inbreeding Coefficient Binomial test (bigger is better)
HOB 1 Float Bias in the number of HOMs number (smaller is better)
AC A Integer Allele count in genotypes for each ALT allele, in the same order as listed
AN 1 Integer Total number of alleles in called genotypes
DP4 4 Integer Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases
MQ 1 Integer Average mapping quality
geno(vcf):
SimpleList of length 2: GT, PL
geno(header(vcf)):
Number Type Description
GT 1 String Genotype
PL G Integer List of Phred-scaled genotype likelihoods

It would be helpful to know what you tried and where it failed. The first step would be to expand() the VCF so that the variants can be selected on a per-ALT basis.
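
As a rough sketch of how the criteria might be combined after expanding, something along these lines could work. The toy VCF records, the sample names s1 and s2, and the variable names are made up for illustration; in practice vcf would be the CollapsedVCF object shown in the question. One caveat: as far as I know, expand() does not rewrite the GT strings, so at multi-allelic sites the allele indices in GT still refer to the original ALT order.

```r
library(VariantAnnotation)

# Toy two-sample VCF written to a temporary file (made-up records, illustration only).
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "##contig=<ID=chr1,length=1000>",
  "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
  "chr1\t10\t.\tG\tA\t50\t.\t.\tGT\t0/1\t1/1",
  "chr1\t20\t.\tC\tT\t50\t.\t.\tGT\t1/1\t1/1",
  "chr1\t30\t.\tA\tG\t50\t.\t.\tGT\t0/1\t1/1",
  "chr1\t40\t.\tC\tT,G\t50\t.\t.\tGT\t0/1\t1/1"
)
tmp <- tempfile(fileext = ".vcf")
writeLines(vcf_lines, tmp)
vcf <- readVcf(tmp)

# One row per ALT allele, so REF and ALT can be compared element-wise.
evcf <- expand(vcf)
ref_base <- as.character(ref(evcf))
alt_base <- as.character(alt(evcf))

# Criterion 1: only G>A or C>T changes.
is_target <- (ref_base == "G" & alt_base == "A") |
             (ref_base == "C" & alt_base == "T")

# Criterion 2: genotype combination across the two samples.
# %in% is used instead of == so that missing calls give FALSE rather than NA.
gt <- geno(evcf)$GT
het_then_hom <- gt[, 1] %in% "0/1" & gt[, 2] %in% "1/1"

# Alternative criterion 2: exclude sites that are "1/1" in both samples.
not_both_hom <- !(gt[, 1] %in% "1/1" & gt[, 2] %in% "1/1")

# Combine the logical vectors and subset the expanded VCF.
selected     <- evcf[is_target & het_then_hom]
selected_alt <- evcf[is_target & not_both_hom]
```

Each criterion is an ordinary logical vector with one element per expanded row, so further filters (for example on QUAL or on info(evcf)$DP) can be added with & in the same way.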