Subset center regions of a set of different length sequences
1
0
Entering edit mode
ferbecneu • 0
@ferbecneu-16188
Last seen 3.5 years ago

Hey! I have a set of sequences that I got from differential ChIP-seq analysis. I would like to use MEME-ChIP for motif discovery on this set of sequences, as input it requires sequences of the same length, however, for this trimming I would like to take the 400bp on the center of each of the sequences from my list of different length sequences (ranging from 450 to 1400 approx). Could anybody help me figure out how can I do this in Biostrings or some other R package? Thanks a lot!

Biostrings • 777 views
ADD COMMENT
1
Entering edit mode
lahuuki ▴ 10
@lahuuki-24238
Last seen 23 months ago
Baltimore MD

Biostrings allows you to subset stings using the index of the characters (or bp). Here is a function that finds center n characters of a string. I've added a "wobble" for cases when the exact center doesn't exist (selecting an odd amount of characters out out an even length string and vice-versa). Hope this helps :) -Louise

    library(Biostrings)

get_center <- function(bstr, center_n){
  outer_n = length(bstr) - center_n
  if(outer_n < 0) stop( "Center is longer than string")

  half_outer_n <- outer_n %/% 2
  wobble = 0
  if (outer_n %% 2 == 1){  
    # if remainder cannot be split evenly 'wobble' start index
    wobble = sample(c(0,1), 1)
  }
  start_i <- half_outer_n + 1 + wobble
  end_i <- half_outer_n + center_n + wobble
  return(bstr[start_i:end_i])
}

long_bs <- BString("aaaaaaaaaatttttttttcttttttttttgggggggggg")
short_bs_even <- BString("attg")
short_bs_odd <- BString("atctg")

get_center(long_bs, 20)
# 20-letter BString object
# seq: tttttttttctttttttttt

get_center(short_bs_even, 2)
# 2-letter BString object
# seq: tt
get_center(short_bs_odd, 2)
# 2-letter BString object
# seq: ct
# OR
# 2-letter BString object
# seq: tc
get_center(ss, 5)
# Error in get_center(ss, 5) : Center is longer than string
ADD COMMENT

Login before adding your answer.

Traffic: 992 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6