See also the Stack Exchange post, which elaborates more.
I have a question. I have already tried a lot of options, but none gave the desired result.
I want to simulate some insertions. Because I don't want the inserted sequences to be random, I made two DNA sequences: chr1 (100,000 bp of similar DNA sequence) and chr2 (the sequence I want my insertions in).
So I want to take DNA segments from chr1 and put them into chr2 as inserts.
If I simulate SVs, the segments are either copied/cut from chr2 to chr2 (which essentially makes a duplication or DEL+INS) or, if I don't specify anything, the direction is random: sometimes chr1 -> chr2 and sometimes chr2 -> chr1. This is really annoying. Setting chrA="chr1" and chrB="chr2" doesn't work, and the examples pasted below do not give the desired result either.
Using translocations also won't give me the effect I want, because you cannot generate more translocations than the number of genomes/chromosomes you add.
When processing INS, I found that it just copies a DNA segment from one part of a sequence and puts it somewhere else in the same sequence (basically a DUP), or, if you supply an extra DNA sequence to cut from, it randomly cuts from either chr1 or chr2 and puts the segment in the other. This creates deletions in the sequence I want to investigate, which is not what I want.
Does anyone know how to get this kind of result?
Input:
width seq                                      names
   40 AAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTT chr1
   40 GGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCC chr2

Desired result:
width seq                                              names
   32 AAAAAAAAAAAAAAAATTTTTTTTTTTTTTTT                 chr1
   48 GGGGGGGGGGGGGGAAAATTTTGGGGGGCCCCCCCCCCCCCCCCCCCC chr2
I have tried several things over the past hours; I don't even have all the examples anymore. Could someone help with this? I need 30 INS of varying sizes at random positions in chr2.
# Some things I tried; see the Stack Exchange post for more examples
library(RSVSim)

seq_random <- readDNAStringSet("/path/random.fasta", "fasta")
seq_ref <- readDNAStringSet("/path/reference1MB.fasta", "fasta")
genome2 = DNAStringSet(c(seq_random, seq_ref))
names(genome2) = c("chr1","chr2")
length_seq = width(genome2)

# Different definitions of knownInsertion that I tried (one at a time)
knownInsertion = GRanges(IRanges(0,length_seq), seqnames="chr2")
knownInsertion = GRanges(IRanges(0,length_seq), seqnames="chr1", chrB="chr2", chrA="chr1")
knownInsertion = GRanges(seqnames="chr1", chrB="chr2")

# y holds the insertion sizes (its definition is not shown here)
sim = simulateSV(output='/folder/of/output/', genome=genome2, ins = 30, sizeIns=y, regionsIns=knownInsertion)
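
To make the desired behaviour concrete, here is a minimal base-R sketch of what I am after. It is not RSVSim code: `insert_from_donor` and its arguments are hypothetical names made up for this illustration, and it works on plain character strings rather than on a DNAStringSet. It copies segments from the donor (chr1) into random positions of the target (chr2), so chr2 never loses bases. Unlike the toy result above, chr1 is left unchanged (copy instead of cut).

```r
# Hypothetical helper (not part of RSVSim): copy segments of the given sizes
# from random places in `donor` into random positions of `target`.
insert_from_donor <- function(donor, target, sizes, seed = 1) {
  set.seed(seed)
  n_ins <- length(sizes)
  stopifnot(all(sizes >= 1), all(sizes <= nchar(donor)),
            n_ins <= nchar(target) + 1)

  # Random donor start for each segment, chosen so the segment fits in the donor
  donor_start <- vapply(sizes, function(s) {
    sample.int(nchar(donor) - s + 1, 1)
  }, integer(1))
  segments <- substring(donor, donor_start, donor_start + sizes - 1)

  # Insertion points in original target coordinates (0 = before the first base),
  # sorted from right to left so earlier insertions do not shift later ones
  target_pos <- sort(sample(0:nchar(target), n_ins), decreasing = TRUE)

  for (i in seq_len(n_ins)) {
    p <- target_pos[i]
    target <- paste0(substr(target, 1, p),
                     segments[i],
                     substr(target, p + 1, nchar(target)))
  }

  list(sequence = target,
       insertions = data.frame(donor_start = donor_start,
                               size = sizes,
                               target_pos = target_pos,
                               segment = segments))
}

# Toy sequences from the example above
donor  <- paste0(strrep("A", 20), strrep("T", 20))   # chr1
target <- paste0(strrep("G", 20), strrep("C", 20))   # chr2

res <- insert_from_donor(donor, target, sizes = c(4, 6, 8))
res$insertions        # where each segment came from and where it went
nchar(res$sequence)   # 40 original bases + 18 inserted bases
```

For the real data this would presumably be called with `as.character(genome2[["chr1"]])` and `as.character(genome2[["chr2"]])` and a vector of 30 sizes, e.g. `sample(50:500, 30, replace = TRUE)`; the size range here is only a placeholder.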