gcContent error with shortread
2
0
Entering edit mode
danova_fr ▴ 20
@danova_fr-7502
Last seen 9.1 years ago
Andorra

I´m processing a fastq file of reads that have already paired. Trying to get the GC content i get this error ???

> fq <- readFastq(file)
> fq
class: ShortReadQ
length: 50545 reads; width: 79..503 cycles
> gc <- gcContent(fq)
Warning messages:
1: In if (nbins > .Machine$integer.max) stop("attempt to make a table with >= 2^31 elements") :
  the condition has length > 1 and only the first element will be used
2: In if (is.na(nbins)) stop("invalid value of 'nbins'") :
  the condition has length > 1 and only the first element will be used

 

Looking at the fastqfile

> qual <- as(quality(fq), "matrix")
> qual
    [,501] [,502] [,503]
    [1,]     NA     NA     NA
    [2,]     NA     NA     NA
   

I´ve noticed that the qual object contains NA values at the end of the sequence. Any idea ??

I have merged pairs using bbmerge first but don´t see any problem in the fastq file. I have tried other programs and fastq file look ok.

shortread • 1.6k views
ADD COMMENT
0
Entering edit mode

gcContent is not a function in the ShortRead package. Where did you get it from? Type gcContent at the command line (no parentheses()) to see its source. Add this information to your question.

The NA's are present because you have asked to see the quality scores as a matrix. A matrix has to be rectangular (each row with same number of reads) so the quality scores of short reads have been 'padded' with NA values.

ADD REPLY
0
Entering edit mode
danova_fr ▴ 20
@danova_fr-7502
Last seen 9.1 years ago
Andorra

Sorry,

Got it from the manual.

gcContent <- function(x)
{
  abc <- alphabetFrequency(sread(x), baseOnly=TRUE)
  gcPerRead <- rowSums(abc[,c("G", "C")])
  wd <- unique(width(sread(x)))
  tabulate(1 + gcPerRead, 1 + wd)
}

 

  tabulate(1 + gcPerRead, 1 + wd) seems to returning the previous error

 

 

 

ADD COMMENT
0
Entering edit mode

'The manual' is not really descriptive enough! I think you can change

wd <- unique(width(sread(x)))

to

wd <- max(width(sread(x)), 0L)

Since your reads are different widths, it doesn't necessarily make sense to count the number of G and C nucleotides; maybe it is more helpful to calculate GC frequency of each read

gc <- letterFrequency(sread(x), "GC", as.prob=TRUE)
hist(gc)

 

ADD REPLY
0
Entering edit mode
danova_fr ▴ 20
@danova_fr-7502
Last seen 9.1 years ago
Andorra

great,

thanks so much,

 

david

 

ADD COMMENT

Login before adding your answer.

Traffic: 872 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6