Question

GC content track on GVIZ

0

Entering edit mode

ilaria.maurizio • 0

@15877814

Last seen 13 months ago

Italy

Hi everyone, I'm trying to obtain a GC content track for the M. tuberculosis genome to plot on IGV. Below there is the code I have used to do this. However, when calculating the GC content, R blocks without giving an output. How can I solve the problem?

library(Biostrings)
library(Gviz)
#load FASTA file
genome_seq<-readDNAStringSet("M.tuberculosis_NC000962.3.fasta")
# Calculate genome length 
genome_length <- sum(width(genome_seq))

# GC content for a certain window size 
window_size <- 10000  
gc_content <- numeric(genome_length)
for (i in 1:(genome_length - window_size + 1)) {
  window_seq <- subseq(genome_seq, start = i, end = i + window_size - 1)
  gc_content[i] <- sum(unlist(strsplit(as.character(window_seq), "")) %in% c("G", "C")) / window_size * 100
}

Thank everyone for help

Gviz • 503 views

ADD COMMENT • link updated 13 months ago by James W. MacDonald 68k • written 13 months ago by ilaria.maurizio • 0

score 0 · Answer 1 · 2024-02-09

This has nothing to do with Gviz. You shouldn't pick an unrelated package for questions because you are A.) directly asking that maintainer to answer an unrelated question and B.) not asking the relevant package's maintainer for help. But anyway.

> library(Biostrings)
## I downloaded the relevant FASTQ from NCBI
> z <- readDNAStringSet("c:/Users/jmacdon/Downloads/sequence.fasta")
## get windows
> starts <- seq(1, width(z), 1000)
> ends <- starts + 1000
## last one is too long, so fix
> ends[length(ends)] <- width(z)
## get the alphabet frequency for each 1000 bp window and rbind into a matrix
> zz <- do.call(rbind, lapply(seq(along = starts), function(x) alphabetFrequency(subseq(z, starts[x], ends[x]))))
## obvs
> zzz <- rowSums(zz[,2:3])/rowSums(zz)
> head(zzz)
[1] 0.5884116 0.5764236 0.6073926 0.5844156 0.6473526 0.6213786