Question: predictCoding with multiple samples
luong.jeff (Canada) wrote, 4.3 years ago:

Hi BioC,

I'm using the predictCoding function in the VariantAnnotation package on a VCF file with multiple samples. predictCoding reports many variants at the same location, but I see no way to tell which variant occurred in which sample. Is it possible to retrieve this information?

Answer: predictCoding with multiple samples
Valerie Obenchain (United States) wrote, 4.3 years ago:

Hi,

You'll need to look at the GT data to determine which samples have the variant. The genotypeToSnpMatrix() function converts the genotypes to a SnpMatrix object where rows are samples and columns are SNPs. See ?genotypeToSnpMatrix for details and information about the warnings.

> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") 
> vcf <- readVcf(fl, "hg19")
> mat <- genotypeToSnpMatrix(vcf)
Warning messages:
1: In .local(x, ...) : variants with >1 ALT allele are set to NA
2: In .local(x, ...) : non-single nucleotide variations are set to NA

> as(mat$genotype, "character")
        rs6054257 20:17330_T/A rs6040355 20:1230237_T/. microsat1
NA00001 "A/A"     "A/A"        "NA"      "NA"           "NA"     
NA00002 "A/B"     "A/B"        "NA"      "NA"           "NA"     
NA00003 "B/B"     "A/A"        "NA"      "NA"           "NA"     
> as(mat$genotype, "matrix")
        rs6054257 20:17330_T/A rs6040355 20:1230237_T/. microsat1
NA00001        01           01        00             00        00
NA00002        02           02        00             00        00
NA00003        03           01        00             00        00

 

You probably know that predictCoding() returns results for coding variants only, and that if a variant falls in multiple transcripts there will be one row for each variant-transcript match. The QUERYID column in the output maps back to the row of the original query. Using QUERYID together with the data from the SnpMatrix, you can identify which samples had a particular variant reported by predictCoding().
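The lookup described above can be sketched in base R. This is a minimal illustration, not the package's own workflow: the genotype matrix reuses the calls printed earlier in this answer, while the `coding` data frame is a HYPOTHETICAL stand-in for predictCoding() output (its QUERYID and TXID values are made up). It also assumes the same VCF object was passed to both genotypeToSnpMatrix() and predictCoding(), so that QUERYID indexes the columns of the genotype matrix.

```r
# Genotype calls as printed by as(mat$genotype, "character") above:
# rows are samples, columns are variants in query order.
gt <- matrix(
  c("A/A", "A/B", "B/B",   # rs6054257
    "A/A", "A/B", "A/A"),  # 20:17330_T/A
  nrow = 3,
  dimnames = list(c("NA00001", "NA00002", "NA00003"),
                  c("rs6054257", "20:17330_T/A"))
)

# HYPOTHETICAL stand-in for predictCoding() output: variant 1 overlaps
# two transcripts (two rows), variant 2 overlaps one. Values are illustrative.
coding <- data.frame(QUERYID = c(1L, 1L, 2L),
                     TXID    = c("tx1", "tx2", "tx3"))

# Collapse the variant-transcript rows to one entry per distinct variant.
query_ids <- unique(coding$QUERYID)

# For each variant, list the samples carrying at least one ALT ("B") allele.
carriers <- lapply(query_ids, function(i) {
  calls <- gt[, i]
  names(calls)[calls %in% c("A/B", "B/B")]
})
names(carriers) <- colnames(gt)[query_ids]

print(carriers)
```

Variants set to NA by genotypeToSnpMatrix() (multi-allelic sites and non-SNVs) never match "A/B" or "B/B" here, so for those the GT field of the VCF would have to be inspected directly.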

Valerie

luong.jeff replied, 4.3 years ago:

Hi Valerie,

I’m still having difficulties interpreting my GT data using the genotypeToSnpMatrix function. I have some sample data below to illustrate my situation. The predictCoding function output suggests that there are 4 coding variants at row 4116 of the original query. But the SnpMatrix data suggests that there are many more than 4 samples with at least one risk allele (“B”). How would I interpret this data? How would I find out which 4 out of my 209 samples are being represented in the predictCoding output?

query_id gene  cancer_type  sample_id       position wt_residue mut_residue
4116     CECR1 breastcancer 22:17669306_T/C 94       H          R
4116     CECR1 breastcancer 22:17669306_T/C 335      H          R
4116     CECR1 breastcancer 22:17669306_T/C 335      H          R
4116     CECR1 breastcancer 22:17669306_T/C 293      H          R

SnpMatrix
"D66001" "A/B"  "D66002" "A/A"  "D66003" "A/B"  "D66004" "A/A"  "D66005" "A/A"
"D66006" "A/B"  "D66007" "A/A"  "D66008" "A/B"  "D66009" "B/B"  "D66010" "A/B"
"D66011" "A/B"  "D66012" "A/B"  "D66013" "A/A"  "D66014" "A/A"  "D66015" "A/A"
"D66016" "A/B"  "D66017" "A/B"  "D66018" "A/B"  "D66019" "A/A"  "D66020" "B/B"
"D66021" "A/A"  "D66022" "A/A"  "D66023" "A/B"  "D66024" "A/A"  "D66025" "A/A"
"D66026" "A/A"  "D66027" "A/B"  "D66028" "A/B"  "D66029" "A/A"  "D66030" "B/B"
"D66031" "A/B"  "D66032" "A/A"  "D66033" "A/B"  "D66034" "A/A"  "D66035" "A/B"
"D66036" "B/B"  "D66037" "B/B"  "D66038" "A/A"  "D66039" "A/A"  "D66040" "A/B"
"D66041" "A/B"  "D66042" "A/B"  "D66043" "A/A"  "D66044" "A/A"  "D66045" "A/A"
"D66046" "B/B"  "D66047" "A/B"  "D66048" "A/A"  "D66049" "A/B"  "D66050" "A/B"
"D66051" "A/B"  "D66052" "A/B"  "D66053" "A/B"  "D66054" "A/B"  "D66055" "A/A"
"D66056" "A/A"  "D66057" "A/B"  "D66058" "A/A"  "D66059" "A/B"  "D66060" "A/B"
"D66061" "A/A"  "D66062" "A/B"  "D66063" "A/A"  "D66064" "A/A"  "D66065" "A/A"
"D66066" "A/A"  "D66067" "A/B"  "D66068" "B/B"  "D66069" "A/A"  "D66070" "A/A"
"D66071" "A/A"  "D66072" "A/A"  "D66073" "A/B"  "D66074" "B/B"  "D66075" "A/B"
"D66076" "A/A"  "D66077" "A/A"  "D66078" "A/A"  "D66079" "A/A"  "D66080" "A/A"
"D66081" "A/A"  "D66082" "A/A"  "D66083" "A/A"  "D66084" "B/B"  "D66085" "A/A"
"D66086" "A/A"  "D66087" "A/B"  "D66088" "A/A"  "D66089" "A/B"  "D66090" "A/B"
"D66091" "A/A"  "D66092" "A/A"  "D66093" "A/B"  "D66094" "A/A"  "D66095" "B/B"
"D66096" "A/A"  "D66097" "A/A"  "D66098" "A/B"  "D66099" "A/B"  "D66100" "A/B"
"D66101" "A/B"  "D66102" "A/B"  "D66103" "A/A"  "D66104" "A/A"  "D66200" "A/A"
"D66201" "A/B"  "D66202" "A/B"  "D66203" "A/A"  "D66204" "A/B"  "D66205" "A/B"
"D66206" "A/B"  "D66207" "A/A"  "D66208" "A/B"  "D66209" "A/B"  "D66210" "A/B"
"D66211" "A/B"  "D66212" "B/B"  "D66213" "A/A"  "D66214" "A/A"  "D66215" "A/A"
"D66216" "A/A"  "D66217" "A/A"  "D66218" "A/A"  "D66219" "A/A"  "D66220" "A/A"
"D66221" "B/B"  "D66222" "A/B"  "D66223" "A/B"  "D66224" "A/B"  "D66225" "A/B"
"D66226" "A/B"  "D66227" "A/A"  "D66228" "A/B"  "D66229" "A/B"  "D66230" "A/B"
"D66231" "A/A"  "D66232" "A/A"  "D66233" "A/A"  "D66234" "A/B"  "D66235" "A/A"
"D66236" "A/B"  "D66237" "A/A"  "D66238" "A/B"  "D66239" "A/B"  "D66240" "A/B"
"D66241" "A/A"  "D66242" "A/B"  "D66243" "A/A"  "D66244" "A/B"  "D66245" "A/B"
"D66246" "A/B"  "D66247" "A/A"  "D66248" "B/B"  "D66249" "A/A"  "D66250" "A/A"
"D66251" "A/A"  "D66252" "B/B"  "D66253" "A/A"  "D66254" "A/B"  "D66255" "B/B"
"D66256" "A/B"  "D66257" "A/B"  "D66258" "A/A"  "D66259" "A/A"  "D66260" "A/A"
"D66261" "A/B"  "D66262" "A/B"  "D66263" "A/A"  "D66264" "A/A"  "D66265" "A/A"
"D66266" "A/A"  "D66267" "A/A"  "D66268" "A/B"  "D66269" "A/A"  "D66270" "A/B"
"D66271" "A/A"  "D66272" "A/B"  "D66273" "A/A"  "D66274" "A/A"  "D66275" "B/B"
"D66276" "A/A"  "D66277" "A/A"  "D66278" "A/A"  "D66279" "A/B"  "D66280" "A/B"
"D66281" "A/B"  "D66282" "A/B"  "D66283" "A/A"  "D66284" "A/B"  "D66285" "B/B"
"D66286" "A/B"  "D66287" "A/A"  "D66288" "A/B"  "D66289" "A/A"  "D66290" "A/A"
"D66291" "A/B"  "D66292" "A/A"  "D66293" "A/A"  "D66294" "A/A"  "D66295" "A/A"
"D66296" "A/B"  "D66297" "A/B"  "D66298" "A/B"  "D66299" "A/A"  "D66300" "B/B"
"D66301" "A/B"  "D66302" "A/A"  "D66303" "A/A"  "D66304" "A/B"

Thanks,
Jeff

> On Aug 26, 2015, at 3:41 PM, Valerie Obenchain [bioc] <noreply@bioconductor.org> wrote:
> Answer: predictCoding with multiple samples
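One reading of this data that is consistent with the answer above: the four predictCoding rows share a single query_id, so they may be four variant-transcript matches for one variant rather than four samples. Under that assumption, the samples carrying the variant are simply those called "A/B" or "B/B". The base R sketch below uses the four rows and the first ten genotype calls copied from the post; the object names (`pc_rows`, `gt`, `carriers`) are illustrative only.

```r
# The four predictCoding rows from the post (column names as posted).
pc_rows <- data.frame(
  query_id = c(4116L, 4116L, 4116L, 4116L),
  position = c(94L, 335L, 335L, 293L)
)

# All four rows point back to the same row of the query, i.e. one variant.
n_variants <- length(unique(pc_rows$query_id))

# First ten genotype calls copied from the SnpMatrix output in the post.
gt <- c(D66001 = "A/B", D66002 = "A/A", D66003 = "A/B", D66004 = "A/A",
        D66005 = "A/A", D66006 = "A/B", D66007 = "A/A", D66008 = "A/B",
        D66009 = "B/B", D66010 = "A/B")

# Tabulate the genotype classes, then keep samples with at least one B allele.
counts   <- table(gt)
carriers <- names(gt)[gt %in% c("A/B", "B/B")]

print(n_variants)
print(counts)
print(carriers)
```

Applied to the full 209-sample vector, the same two lines would list every carrier of 22:17669306_T/C, which would explain why far more than four samples show a "B" allele.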