hoonhuiyi
@hoonhuiyi-23342
Hi, I'm using DECIPHER in R to design group-specific primers that target one group out of other, non-target groups. According to the DECIPHER tutorial, before designing the primers we need to run the command that defines groups in our sequence database. How is DECIPHER able to define the groups correctly? Should a phylogenetic classification be carried out first so that the groups are defined correctly?
The vignette for DesignPrimers(), "Design Group-Specific Primers", has a section on "Defining Groups". As it says, it is up to users how they wish to define non-target groups. For example, this can be done automatically with IdClusters() or assigned manually based on a taxonomy.
I would like to design primers to target the Accumulibacter group out of other non-target bacterial groups. In the section on defining groups, how do I create one identifier for all sequences belonging to the Accumulibacter group? At the moment, a separate identifier is given to each sequence instead.
My fasta files have these sequence descriptions:
The sequence descriptions are stored in the description field within the sequence database. So it is possible to query that field, extract the genus name, and then use that as the identifier:
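The original description strings are not shown in this thread, so the following is only a minimal sketch of the idea in base R. It assumes hypothetical descriptions of the form "accession [Candidatus] Genus species ..."; the strings, the accession placeholders, and the position of the genus are all assumptions. In practice desc would hold the result of the "select description from Seqs" query rather than a hand-typed vector.

```r
# Hypothetical description strings (placeholders, not real records); the real
# ones would come from the description field of the sequence database.
desc <- c(
  "seq001 Candidatus Accumulibacter phosphatis clone A",
  "seq002 Candidatus Accumulibacter phosphatis clone B",
  "seq003 Dechloromonas agitata strain X",
  "seq004 Rhodocyclus tenuis strain Y"
)

# Split each description into words and pick out the genus.
words <- strsplit(desc, " ", fixed = TRUE)
genus <- vapply(words, function(w) {
  w <- w[-1]                  # drop the leading accession
  w <- w[w != "Candidatus"]   # drop the provisional-status prefix, if present
  w[1]                        # first remaining word is taken as the genus
}, character(1))

# Every Accumulibacter sequence now shares one identifier.
print(table(genus))
```

The genus vector could then be stored as the identifier column of the database, for example with Add2DB(data.frame(identifier = genus), dbConn) as in the vignette, where dbConn is the open database connection.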
Do I replace the text in "select description from Seqs" with Accumulibacter? It has a syntax error when I try to do so:
No, the code should work directly as written. Have you tried it?
Do I insert Accumulibacter as the identifier? I get the error below:
primers <- DesignPrimers(tiles, identifier="Accumulibacter", minCoverage=1, minGroupCoverage=1)
Accumulibacter
Error in system(paste("hybrid-min -n DNA -t", temp, "-T", temp, "-N", : 'CreateProcess' failed to run 'C:\PROGRA~2\OLIGOA~1\bin\HYBRID~1.EXE -n DNA -t 64 -T 64 -N 0.224783720074173 -E -q TCTGTGAGCAGGAAAGC GCTTTCCTGCTCACAGA CTGTGAGCAGGAAAGCA TGCTTTCCTGCTCACAG TGTGAGCAGGAAAGCAG CTGCTTTCCTGCTCACA GTGAGCAGGAAAGCAGG CCTGCTTTCCTGCTCAC TGAGCAGGAAAGCAGGG CCCTGCTTTCCTGCTCA GAGCAGGAAAGCAGGGG CCCCTGCTTTCCTGCTC AGCAGGAAAGCAGGGGA TCCCCTGCTTTCCTGCT GCAGGAAAGCAGGGGAT ATCCCCTGCTTTCCTGC CAGGAAAGCAGGGGATC GATCCCCTGCTTTCCTG AGGAAAGCAGGGGATCG CGATCCCCTGCTTTCCT GGAAAGCAGGGGATCGC GCGATC
This looks like an issue with accessing OligoArrayAux from R. What happens when you run?:
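A check along the following lines can show whether R is able to find and launch OligoArrayAux. This is a sketch rather than the exact command from the original reply: it assumes the executable is named hybrid-min (as in the error message above) and that it accepts a -V flag to print its version.

```r
# Look up the hybrid-min executable (part of OligoArrayAux) on the PATH.
# Sys.which() returns an empty string when the program cannot be found.
path <- Sys.which("hybrid-min")

if (nzchar(path)) {
  message("hybrid-min found at: ", path)
  # Assumption: -V prints the program version and exits.
  system("hybrid-min -V")
} else {
  message("hybrid-min not found on the PATH; ",
          "R cannot launch OligoArrayAux from this session.")
}
```

If the executable is not found, the PATH seen by the R session is the likely culprit, which is consistent with the advice to restart R after installation.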
Try restarting R after installing OligoArrayAux. Also, try specifying batchSize=100 in DesignPrimers().
Thank you, I have managed to design primers for my sample file of 100 sequences.
However, when aligning the SILVA SSU NR Ref database with 500,000 sequences, my computer's 8 GB of RAM is insufficient to align the RNA sequences:
You can download the aligned version of the SILVA database.
I have imported the 25 GB SILVA database and defined the groups!
At the tile-creation step, the following error message is displayed:
Also, what are the usually recommended minCoverage and minGroupCoverage values when targeting a particular genus (e.g. Escherichia) or a particular kingdom (e.g. fungi) out of the 500,000 sequences of Eukaryota, Archaea, and Bacteria in the database?
How were your groups defined? Family level groups should not be too large to process.
I have never observed that error for such a small amount of memory. Could you provide the output of .traceback()?
The defaults are recommended unless you have a specific reason to change them.
Defined the groups like this:
Then created tiles:
What is the distribution of group sizes? That is, how many sequences are there per identifier? For example: sort(table(x))
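As a small illustration of what sort(table(x)) reports, here is a sketch using a hypothetical vector x of identifiers, one per sequence. In practice x would hold the identifier column read from the sequence database; the values below are made up.

```r
# Hypothetical identifiers, one entry per sequence in the database.
x <- c("Bacillus", "Bacillus", "Bacillus",
       "uncultured", "uncultured", "uncultured", "uncultured",
       "Centropages")

# Count sequences per identifier and order the groups from smallest to largest.
sizes <- sort(table(x))
print(sizes)

# A quick numeric overview of the group-size distribution.
print(summary(as.vector(sizes)))
```

The last entries of the sorted table are the largest groups, which are the ones most likely to strain memory during tiling.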
According to sort(table(x)), group sizes range from about 4 sequences (e.g. Centropages) to 14,000 (e.g. Bacillus) and up to 28,000 (e.g. uncultured) per identifier. The identifiers in the table are clearly genus to species names.
The SILVA sequences that I have look like this:
Also, I would like to define groups so that I can design primers not only for genus-level targets but also for fungi as a whole!
It is difficult to investigate the issue without a reproducible example. Could you please send a reproducible example? Thanks.
I guess I shouldn't have used such a huge database in the first place. I have now obtained some primers using a smaller set of sequences.
Hi Erik,
I have the following issues with the DesignProbes() function:
probes <- DesignProbes(tiles, identifier="Streptococcus",start=120, end=1450, batchSize=100,numProbeSets=5)
Streptococcus
Warning message: In DesignProbes(tiles, identifier = "Streptococcus", start = 120, : No target sites met the specified constraints: Streptococcus
probes <- DesignProbes(tiles, identifier="Pseudomonas",start=120, end=1450, batchSize=100,numProbeSets=5)
Pseudomonas
Warning message: In DesignProbes(tiles, identifier = "Streptococcus", start = 120, : All target sites have too many permutations: Pseudomonas
What do these warnings mean, and how can I resolve them?
See my reply in your other posting. Please post the same comment only once to avoid confusion.