Question: Create protein sequences including variants from a VCF file
7 weeks ago, daniel.magnus.bader wrote:

Dear all,

I am investigating the proteome of human cancer samples and want to insert their genetic variations into the reference proteome fasta sequences to increase the sensitivity of my peptide/protein quantification.

Could you implement something like a "proteomeVariantInsertion()" function in the VariantAnnotation package?

The VariantAnnotation::predictCoding() function already translates codons at variant positions from a reference BSgenome object to assess the consequences of a variant. I would like to take all coding variants (or just non-synonymous SNVs for a start) and insert them into the reference proteome, then save the modified fasta file.

On customProDB: In principle, the customProDB package already does this job. However, of the ~40k non-synonymous SNVs in 11,000 genes that I extracted using VariantAnnotation::predictCoding(), only ~2k proteins end up changed by at least one variant, which is too much loss. The customProDB package also works mostly on custom data.frames and could make much better use of the maintained Bioconductor objects for variants and sequences.

I would highly appreciate a "Bioconductor-native" solution for the customized proteome challenge.

Thanks, Daniel

modified 6 weeks ago • written 7 weeks ago by daniel.magnus.bader

This sounds like a feature request. Could you please open it as an issue on the GitHub page for the package: https://github.com/Bioconductor/VariantAnnotation/issues

written 6 weeks ago by shepherl

This is potentially doable with BSgenome::injectSNPs().

written 6 weeks ago by Michael Lawrence

Thanks for the reply, shepherl. I did not want to start right away with an issue, but I have now posted it here: https://github.com/Bioconductor/VariantAnnotation/issues/24

Meanwhile, I will look at BSgenome::injectSNPs(), which indeed sounds very interesting. Thanks Michael!

modified 6 weeks ago • written 6 weeks ago by daniel.magnus.bader

FWIW you might also want to take a look at Biostrings::replaceAt(). It's lower level and more flexible than BSgenome::injectSNPs() (the former works on AAString/AAStringSet/DNAString/DNAStringSet objects while the latter only works on a BSgenome object).
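
As a rough illustration of what position-based replacement on a protein sequence means, here is a minimal sketch in plain Python. It is not the Biostrings API: the function name replace_at, the (ref, alt) input format, and the toy sequence are all hypothetical and chosen only for this example.

```python
def replace_at(seq, substitutions):
    """Return a copy of seq with single-residue substitutions applied.

    substitutions maps a 1-based position to a (ref, alt) pair. The reference
    residue is checked first, so a variant is never applied to the wrong site.
    (Hypothetical helper for illustration; not Biostrings::replaceAt().)
    """
    residues = list(seq)
    for pos, (ref, alt) in substitutions.items():
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != ref:
            raise ValueError(
                f"expected {ref} at position {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = alt
    return "".join(residues)


# Toy protein with two made-up substitutions: T3S and K8R.
protein = "MKTAYIAK"
variant_protein = replace_at(protein, {3: ("T", "S"), 8: ("K", "R")})
print(variant_protein)  # MKSAYIAR
```

Checking the reference residue before substituting is a design choice of this sketch: it makes a mismatch between variant annotation and sequence version fail loudly instead of silently producing a wrong protein.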

written 6 weeks ago by Hervé Pagès

Thanks Herve,

However, I think injectSNPs should be sufficient, since I start from genome coordinates in a VCF.

Best, Daniel

written 6 weeks ago by daniel.magnus.bader
Answer: Create protein sequences including variants from a VCF file
6 weeks ago, daniel.magnus.bader wrote:

Thanks to the suggestions from Michael Lawrence and Herve Pages, I guess it should work as follows:

  1. Identify all coding SNVs, e.g. via VariantAnnotation::predictCoding()
  2. Inject the coding SNVs into the genome, e.g. via BSgenome::injectSNPs()
  3. For each protein isoform of a gene harboring a coding SNV, concatenate its exons to obtain the relevant coding sequences (which then already contain the variants)
  4. Translate these into AAString, e.g. via Biostrings::translate()
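
Steps 2 to 4 can be sketched on a toy example in plain Python. This only illustrates the logic and does not use the Bioconductor functions named above: the helper names (inject_snvs, spliced_cds, translate), the 22-base "genome", the exon coordinates and the SNV are all made up, and only a plus-strand transcript is handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def inject_snvs(genome, snvs):
    """Apply (1-based position, ref, alt) SNVs to a genome string (step 2)."""
    seq = list(genome)
    for pos, ref, alt in snvs:
        if seq[pos - 1] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def spliced_cds(genome, exons):
    """Concatenate 1-based inclusive exon ranges of a plus-strand CDS (step 3)."""
    return "".join(genome[start - 1:end] for start, end in exons)


def translate(cds):
    """Translate complete codons into one-letter amino acids (step 4)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Toy genome: 2 flanking bases, exon 1 (3-8), intron (9-14), exon 2 (15-20).
genome = "CCATGGCTGTAAGTAAATGAGG"
exons = [(3, 8), (15, 20)]
snvs = [(7, "C", "A")]  # hypothetical coding SNV: codon GCT (A) becomes GAT (D)

reference_protein = translate(spliced_cds(genome, exons))
variant_protein = translate(spliced_cds(inject_snvs(genome, snvs), exons))
print(reference_protein)  # MAK*
print(variant_protein)    # MDK*
```

Because the variants are injected at the genome level before splicing, every isoform that overlaps the SNV picks it up automatically, which is the point of doing step 2 before step 3.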

What is your opinion?

written 6 weeks ago by daniel.magnus.bader

Should work. Use GenomicFeatures::extractTranscriptSeqs() for #3.
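
One detail that step #3 has to get right is strand: for a minus-strand transcript the exons must be reverse-complemented after concatenation. A minimal plain-Python sketch of that idea follows. It is not the GenomicFeatures implementation; the function extract_transcript_seq, its arguments and the toy genome are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_transcript_seq(genome, exons, strand):
    """Splice 1-based inclusive exon ranges and orient them by strand.

    Exons are joined in ascending genomic order; for a minus-strand
    transcript the joined sequence is then reverse-complemented, so the
    result always reads 5' to 3' along the transcript.
    """
    spliced = "".join(genome[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(spliced) if strand == "-" else spliced


# Toy minus-strand transcript: two exons separated by a six-base intron.
minus_genome = "CCTCATTTACTTACAGCCATGG"
transcript = extract_transcript_seq(minus_genome, [(3, 8), (15, 20)], "-")
print(transcript)  # ATGGCTAAATGA
```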

written 6 weeks ago by Michael Lawrence

Thanks. Looks like the perfect fit!

written 6 weeks ago by daniel.magnus.bader