GC content track on GVIZ
1
0
Entering edit mode
@15877814
Last seen 7 days ago
Italy

Hi everyone, I'm trying to obtain a GC content track for the M. tuberculosis genome to plot on IGV. Below there is the code I have used to do this. However, when calculating the GC content, R blocks without giving an output. How can I solve the problem?

library(Biostrings)
library(Gviz)
# Calculate genome length
genome_length <- sum(width(genome_seq))

# GC content for a certain window size
window_size <- 10000
gc_content <- numeric(genome_length)
for (i in 1:(genome_length - window_size + 1)) {
window_seq <- subseq(genome_seq, start = i, end = i + window_size - 1)
gc_content[i] <- sum(unlist(strsplit(as.character(window_seq), "")) %in% c("G", "C")) / window_size * 100
}

Thank everyone for help

Gviz • 123 views
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

This has nothing to do with Gviz. You shouldn't pick an unrelated package for questions because you are A.) directly asking that maintainer to answer an unrelated question and B.) not asking the relevant package's maintainer for help. But anyway.

> library(Biostrings)
## get windows
> starts <- seq(1, width(z), 1000)
> ends <- starts + 1000
## last one is too long, so fix
> ends[length(ends)] <- width(z)
## get the alphabet frequency for each 1000 bp window and rbind into a matrix
> zz <- do.call(rbind, lapply(seq(along = starts), function(x) alphabetFrequency(subseq(z, starts[x], ends[x]))))
## obvs
> zzz <- rowSums(zz[,2:3])/rowSums(zz)
[1] 0.5884116 0.5764236 0.6073926 0.5844156 0.6473526 0.6213786