Question

gcContent error with shortread

0

Entering edit mode

danova_fr ▴ 20

@danova_fr-7502

Last seen 9.1 years ago

Andorra

I´m processing a fastq file of reads that have already paired. Trying to get the GC content i get this error ???

> fq <- readFastq(file)
> fq
class: ShortReadQ
length: 50545 reads; width: 79..503 cycles
> gc <- gcContent(fq)
Warning messages:
1: In if (nbins > .Machine$integer.max) stop("attempt to make a table with >= 2^31 elements") :
  the condition has length > 1 and only the first element will be used
2: In if (is.na(nbins)) stop("invalid value of 'nbins'") :
  the condition has length > 1 and only the first element will be used

Looking at the fastqfile

> qual <- as(quality(fq), "matrix")
> qual
    [,501] [,502] [,503]
    [1,]     NA     NA     NA
    [2,]     NA     NA     NA

I´ve noticed that the qual object contains NA values at the end of the sequence. Any idea ??

I have merged pairs using bbmerge first but don´t see any problem in the fastq file. I have tried other programs and fastq file look ok.

shortread • 1.6k views

ADD COMMENT • link 9.0 years ago danova_fr ▴ 20

0

Entering edit mode

gcContent is not a function in the ShortRead package. Where did you get it from? Type gcContent at the command line (no parentheses()) to see its source. Add this information to your question.

The NA's are present because you have asked to see the quality scores as a matrix. A matrix has to be rectangular (each row with same number of reads) so the quality scores of short reads have been 'padded' with NA values.

ADD REPLY • link 9.0 years ago Martin Morgan 25k

score 0 · Answer 1 · 2015-05-07

0

Entering edit mode

danova_fr ▴ 20

@danova_fr-7502

Last seen 9.1 years ago

Andorra

Sorry,

Got it from the manual.

gcContent <- function(x)
{
  abc <- alphabetFrequency(sread(x), baseOnly=TRUE)
  gcPerRead <- rowSums(abc[,c("G", "C")])
  wd <- unique(width(sread(x)))
  tabulate(1 + gcPerRead, 1 + wd)
}

tabulate(1 + gcPerRead, 1 + wd) seems to returning the previous error

ADD COMMENT • link 9.0 years ago danova_fr ▴ 20

0

Entering edit mode

'The manual' is not really descriptive enough! I think you can change

wd <- unique(width(sread(x)))

to

wd <- max(width(sread(x)), 0L)

Since your reads are different widths, it doesn't necessarily make sense to count the number of G and C nucleotides; maybe it is more helpful to calculate GC frequency of each read

gc <- letterFrequency(sread(x), "GC", as.prob=TRUE)
hist(gc)

ADD REPLY • link 9.0 years ago Martin Morgan 25k

score 0 · Answer 2 · 2015-05-07

0

Entering edit mode

danova_fr ▴ 20

@danova_fr-7502

Last seen 9.1 years ago

Andorra

great,

thanks so much,

david

ADD COMMENT • link 9.0 years ago danova_fr ▴ 20