Search
Question: gcContent error with shortread
0
gravatar for danova_fr
2.6 years ago by
danova_fr20
Andorra
danova_fr20 wrote:

I´m processing a fastq file of reads that have already paired. Trying to get the GC content i get this error ???

> fq <- readFastq(file)
> fq
class: ShortReadQ
length: 50545 reads; width: 79..503 cycles
> gc <- gcContent(fq)
Warning messages:
1: In if (nbins > .Machine$integer.max) stop("attempt to make a table with >= 2^31 elements") :
  the condition has length > 1 and only the first element will be used
2: In if (is.na(nbins)) stop("invalid value of 'nbins'") :
  the condition has length > 1 and only the first element will be used

 

Looking at the fastqfile

> qual <- as(quality(fq), "matrix")
> qual
    [,501] [,502] [,503]
    [1,]     NA     NA     NA
    [2,]     NA     NA     NA
   

I´ve noticed that the qual object contains NA values at the end of the sequence. Any idea ??

I have merged pairs using bbmerge first but don´t see any problem in the fastq file. I have tried other programs and fastq file look ok.

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by danova_fr20

gcContent is not a function in the ShortRead package. Where did you get it from? Type gcContent at the command line (no parentheses()) to see its source. Add this information to your question.

The NA's are present because you have asked to see the quality scores as a matrix. A matrix has to be rectangular (each row with same number of reads) so the quality scores of short reads have been 'padded' with NA values.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Martin Morgan ♦♦ 20k
0
gravatar for danova_fr
2.6 years ago by
danova_fr20
Andorra
danova_fr20 wrote:

Sorry,

Got it from the manual.

gcContent <- function(x)
{
  abc <- alphabetFrequency(sread(x), baseOnly=TRUE)
  gcPerRead <- rowSums(abc[,c("G", "C")])
  wd <- unique(width(sread(x)))
  tabulate(1 + gcPerRead, 1 + wd)
}

 

  tabulate(1 + gcPerRead, 1 + wd) seems to returning the previous error

 

 

 

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by danova_fr20

'The manual' is not really descriptive enough! I think you can change

wd <- unique(width(sread(x)))

to

wd <- max(width(sread(x)), 0L)

Since your reads are different widths, it doesn't necessarily make sense to count the number of G and C nucleotides; maybe it is more helpful to calculate GC frequency of each read

gc <- letterFrequency(sread(x), "GC", as.prob=TRUE)
hist(gc)

 

ADD REPLYlink written 2.6 years ago by Martin Morgan ♦♦ 20k
0
gravatar for danova_fr
2.6 years ago by
danova_fr20
Andorra
danova_fr20 wrote:

great,

thanks so much,

 

david

 

ADD COMMENTlink written 2.6 years ago by danova_fr20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 259 users visited in the last hour