Dear all,
I am investigating the proteome of human cancer samples and want to insert their genetic variations into the reference proteome fasta sequences to increase the sensitivity of my peptide/protein quantification.
Can you implement this "proteomeVariantInsertion()" in the VariantAnnotation package?
The VariantAnnotation::predictCoding() function already translates codons at variant positions from a reference BSgenome object to assess the consequences of a variant. I would like to take all coding variants (or just non-synonymous SNVs for a start) and insert them into the reference proteome, then save the modified fasta file.
On customProDB:
In principle, the customProDB package already does this job. However, of 11,000 genes with ~40k non-synonymous SNVs extracted using VariantAnnotation::predictCoding(), only ~2k proteins end up changed by at least one variant. That is too much loss.
The customProDB package works mostly on custom data.frames and could make much more use of the maintained Bioconductor objects for variants and sequences.
I would highly appreciate a "Bioconductor-native" solution for the customized proteome challenge.
Thanks, Daniel
This sounds like a feature request. Could you please open it as an issue on the GitHub page for the package: https://github.com/Bioconductor/VariantAnnotation/issues
This is potentially doable with BSgenome::injectSNPs().
Thanks for the reply sheperl. I did not want to start right away with an issue, but now I posted it here: https://github.com/Bioconductor/VariantAnnotation/issues/24
Meanwhile, I will look at BSgenome::injectSNPs(), which does sound very interesting. Thanks Michael!
FWIW you might also want to take a look at Biostrings::replaceAt(). It's lower level and more flexible than BSgenome::injectSNPs() (the former works on AAString/AAStringSet/DNAString/DNAStringSet objects while the latter only works on a BSgenome object).
Thanks Herve,
However, I think injectSNPs should be sufficient, since I start from genome coordinates in a VCF.
Best, Daniel
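For anyone finding this thread later, here is a minimal sketch of the Biostrings::replaceAt() route mentioned above. It assumes the variants have already been mapped to protein positions (for example from predictCoding() output); the toy sequence, the positions, the alternate residues, and the sequence name are all made up for illustration and are not from the original question.

```r
library(Biostrings)  # also attaches IRanges

# Hypothetical toy reference protein; real input would come from a proteome FASTA.
ref <- AAString("MKTAYIAKQR")

# Hypothetical non-synonymous variants: protein positions and alternate residues.
at  <- IRanges(start = c(3, 8), width = 1)
alt <- c("S", "R")

# replaceAt() substitutes one replacement per range in 'at'.
mutated <- replaceAt(ref, at, value = alt)

# Save the modified sequence as FASTA (written to a temporary file here).
out <- AAStringSet(mutated)
names(out) <- "toyProtein_variant"
fasta <- tempfile(fileext = ".fasta")
writeXStringSet(out, fasta)
```

Note that this works in protein coordinates, so it only fits if the amino-acid positions are already known. When starting from genome coordinates in a VCF, as in the question, BSgenome::injectSNPs() may remain the more direct choice.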