Question: edgeR: Error in taglist[[i]] : subscript out of bounds
Nick N (United Kingdom) wrote, 2.4 years ago:

I have RNA-seq data (18 samples) from which I produced the raw counts using htseq-count. Now I want to analyze the data using edgeR. I've done this dozens of times and had no issues. Not this time. When I execute

samples <- read.csv(file="metadata_composite.csv",header=TRUE,sep=",")
counts <- readDGE(samples$CountFiles)$counts

I get:

Error in taglist[[i]] : subscript out of bounds

I've analyzed the same data using DESeq2 and had no issues whatsoever, so it is something related to edgeR. Here is my data: it consists of 18 directories, each containing a count file (accepted_hits.count), and a directory called edgeR, which contains "metadata_composite.csv", the metadata file that I load in the code snippet above.

Can you tell what the problem is?
Aaron Lun (Cambridge, United Kingdom) wrote, 2.4 years ago:

You've got a couple of duplicated entries in samples$CountFiles. Each entry of taglist is named according to the count file, so if you have the same count file for different samples, information from the later samples will overwrite that of earlier samples instead of forming a new entry in taglist. The offending samples seem to be WT-H2 and WT-H3 in your CSV file, both of which refer to the WT-H1 directory to extract the count file. Redirect these to get the count files from their appropriate directories, and you should be fine.
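A quick way to spot this kind of problem is to look for repeated paths before calling readDGE. The following is a minimal sketch in base R; the data frame is a hypothetical stand-in for metadata_composite.csv, and the Sample column name, the KO-H1 row, and the exact paths are assumptions made for illustration.

```r
# Hypothetical metadata table standing in for metadata_composite.csv.
# Sample names and paths are illustrative only.
samples <- data.frame(
  Sample = c("WT-H1", "WT-H2", "WT-H3", "KO-H1"),
  CountFiles = c("WT-H1/accepted_hits.count",
                 "WT-H1/accepted_hits.count",
                 "WT-H1/accepted_hits.count",
                 "KO-H1/accepted_hits.count"),
  stringsAsFactors = FALSE
)

# Flag every row whose count file path also appears in another row.
is_dup <- duplicated(samples$CountFiles) |
  duplicated(samples$CountFiles, fromLast = TRUE)

# Rows that share a count file with at least one other sample.
dup_rows <- samples[is_dup, ]
print(dup_rows)
```

If dup_rows has any rows, those samples point at the same file and their paths need to be corrected in the CSV.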

I got the same error as Nick N.

> s=read.csv("Samples.csv")

> counts=readDGE(s$cf)$counts

Error in taglist[[i]] : subscript out of bounds

I can't open the link that he provided, and I couldn't find an error in my CSV file. Could you please point out what changes I need to make to fix the error?

LibraryName LibraryLayout fastq1 fastq2 condition shortname
ATCC26_30 PAIRED AT30.left.fastq AT30.right.fastq Saphrophyre AT_30
ATCC26_37 PAIRED AT37.left.fastq AT37.right.fastq Saphrophyre AT_37
CI1123_30 PAIRED CI30.left.fastq CI30.right.fastq Surgery CI_30
CI1123_37 PAIRED CI37.left.fastq CI37.right.fastq Surgery CI_37
CI1698_30 PAIRED YS30.left.fastq YS30.right.fastq Healed YS_30
CI1698_37 PAIRED YS37.left.fastq YS37.right.fastq Healed YS_37

The above is the metadata table that I am using to run the commands.

Written 19 months ago by muthubioinfotech

Well, for starters, there's no cf column in your metadata table.

Written 19 months ago by Aaron Lun

I used the paste function to add a cf column to the Samples.csv file. In the terminal window I could see the cf column with its values, but I could not find the cf column in the CSV file when I opened it directly. I then entered the cf column manually in the CSV file, but a different error occurred:

counts=readDGE(s$cf)$counts
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  no lines available in input.

I have one question: without a variable name we can't read the CSV file in R, am I right? I ask because the protocol I am following reads the CSV file directly into R and manipulates its columns, but I have not been able to do that.

Any suggestions for fixing this error?

Written 19 months ago by muthubioinfotech

It's not clear to me what you're actually doing. All I can say is to check that:

  • there is a column named cf in your data frame s.
  • s$cf is a character vector that contains paths to count files.
  • the count files aren't empty and are properly formatted.
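These checks can be scripted. The sketch below uses only base R; check_count_files is a hypothetical helper written for this illustration, not an edgeR function, and it only tests for missing or zero-length files rather than validating the file format.

```r
# Hypothetical helper (not part of edgeR): returns a character vector of
# problems found, or character(0) if none of the basic checks fail.
check_count_files <- function(s, column = "cf") {
  # 1. The column must exist in the data frame.
  if (!column %in% names(s)) {
    return(sprintf("no column named '%s' in the data frame", column))
  }
  problems <- character(0)
  paths <- s[[column]]

  # 2. The column should be a character vector of paths (not a factor).
  if (!is.character(paths)) {
    problems <- c(problems,
                  sprintf("'%s' is a %s, not a character vector",
                          column, class(paths)[1]))
    paths <- as.character(paths)
  }

  # 3. Every path should point to an existing, non-empty file.
  found <- file.exists(paths)
  if (any(!found)) {
    problems <- c(problems, paste("file not found:", paths[!found]))
  }
  existing <- paths[found]
  empty <- existing[file.size(existing) == 0]
  if (length(empty) > 0) {
    problems <- c(problems, paste("file is empty:", empty))
  }
  problems
}

# Example: a metadata table without a cf column.
s <- data.frame(shortname = c("AT_30", "AT_37"), stringsAsFactors = FALSE)
print(check_count_files(s))
```

An empty count file is one plausible cause of the "no lines available in input" error from read.table, so the third check is worth running first when that message appears.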

I would also suggest that you find someone local to help you, since it seems like you're new to this. This support site is not meant to be a place to learn R or Bioconductor.

Written 19 months ago by Aaron Lun

Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 2.4 years ago:

To add to Aaron's post, there are also a couple of tweaks that you need to make to accommodate htseq output. First, htseq output has no headers, so

dge <- readDGE(samples$CountFiles, header=FALSE)

Second, the last 5 lines of htseq output are not real genes, so you need to remove them:

realgene <- grep("^ENS",rownames(dge))
dge <- dge[realgene,]
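
The grep("^ENS", ...) filter assumes Ensembl gene IDs. For other annotations, one alternative is to drop the htseq-count summary rows by name. The sketch below runs on a plain matrix so that it works without edgeR; the gene IDs and counts are made up, and it assumes the summary rows carry the names that htseq-count normally writes (no_feature, ambiguous, too_low_aQual, not_aligned, alignment_not_unique), with or without the leading double underscore used by newer versions.

```r
# Toy count matrix standing in for the counts read by readDGE.
# Gene IDs and values are made up for illustration.
counts <- matrix(
  c(10, 0, 7, 120, 30, 5, 2, 9,
    12, 1, 3, 110, 28, 4, 1, 8),
  ncol = 2,
  dimnames = list(
    c("geneA", "geneB", "geneC",
      "__no_feature", "__ambiguous", "__too_low_aQual",
      "__not_aligned", "__alignment_not_unique"),
    c("sample1", "sample2")
  )
)

# Names of the summary rows that htseq-count appends after the genes.
summary_rows <- c("no_feature", "ambiguous", "too_low_aQual",
                  "not_aligned", "alignment_not_unique")

# Strip an optional leading "__" before comparing, so both the older and
# the newer naming styles are matched.
is_summary <- sub("^__", "", rownames(counts)) %in% summary_rows

# Keep only the real genes.
gene_counts <- counts[!is_summary, , drop = FALSE]
print(rownames(gene_counts))
```

The same logical index should also work for subsetting the rows of the DGEList itself, in the same way as dge[realgene,] above, though that has not been tested here.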