Question: edgeR: Error in taglist[[i]] : subscript out of bounds
3.4 years ago by
Nick N60
United Kingdom
Nick N60 wrote:

I have RNA-seq data (18 samples) from which I produced the raw counts using htseq-count. Now I want to analyze the data using edgeR. I've done this dozens of times and had no issues. Not this time. When I execute

samples <- read.csv(file="metadata_composite.csv",header=TRUE,sep=",")
counts <- readDGE(samples$CountFiles)$counts


I get:

Error in taglist[[i]] : subscript out of bounds

I've analyzed the same data using DESeq2 and had no issues whatsoever, so it is something related to edgeR. Here is my data: it consists of 18 directories, each containing a count file (accepted_hits.count), and a directory called edgeR that contains "metadata_composite.csv", the metadata file that I load in the code snippet above.

Can you tell what the problem is?

modified 3.4 years ago by Gordon Smyth35k • written 3.4 years ago by Nick N60
3.4 years ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

You've got a couple of duplicated entries in samples$CountFiles. Each entry of taglist is named according to the count file, so if you have the same count file for different samples, information from the later samples will overwrite that of earlier samples instead of forming a new entry in taglist. The offending samples seem to be WT-H2 and WT-H3 in your CSV file, both of which refer to the WT-H1 directory to extract the count file. Redirect these to get the count files from their appropriate directories, and you should be fine.

I got the same error as Nick N.

> s=read.csv("Samples.csv")
> counts=readDGE(s$cf)$counts
Error in taglist[[i]] : subscript out of bounds

I can't open the link that he provided, and I couldn't find an error in my CSV file. Could you please point out what changes I need to make to fix the error?

LibraryName  LibraryLayout  fastq1           fastq2            condition    shortname
ATCC26_30    PAIRED         AT30.left.fastq  AT30.right.fastq  Saphrophyre  AT_30
ATCC26_37    PAIRED         AT37.left.fastq  AT37.right.fastq  Saphrophyre  AT_37
CI1123_30    PAIRED         CI30.left.fastq  CI30.right.fastq  Surgery      CI_30
CI1123_37    PAIRED         CI37.left.fastq  CI37.right.fastq  Surgery      CI_37
CI1698_30    PAIRED         YS30.left.fastq  YS30.right.fastq  Healed       YS_30
CI1698_37    PAIRED         YS37.left.fastq  YS37.right.fastq  Healed       YS_37

The above is the metadata table that I am using to run the commands.

written 2.5 years ago by muthubioinfotech0

Well, for starters, there's no cf column in your metadata table.

written 2.5 years ago by Aaron Lun21k

I used the paste function to add a cf column to the Samples.csv file. In the terminal window I can see the cf column with its values, but I can't find the cf column when I open the CSV file directly.

I then entered the cf column in the CSV file manually, but a different error occurred:

counts=readDGE(s$cf)$counts
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input

I have one question: without a variable name we can't read the CSV file in R, am I right? I ask because the protocol I am following reads the CSV file directly into R and manipulates the columns, but I am not able to do that. Any suggestions for fixing this error?

written 2.5 years ago by muthubioinfotech0

It's not clear to me what you're actually doing. All I can say is to check that:

• there is a column named cf in your data frame s.
• s$cf is a character vector that contains paths to count files.
• the count files aren't empty and are properly formatted.

I would also suggest that you find someone local to help you, since it seems like you're new to this. This support site is not meant to be a place to learn R or Bioconductor.

modified 2.5 years ago • written 2.5 years ago by Aaron Lun21k
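
The checks suggested in this thread (no duplicated count-file paths, a path column that actually exists, and count files that exist and are not empty) can be sketched in base R before calling readDGE(). This is only an illustration: check_count_files() is a hypothetical helper, not an edgeR function, and the temporary files below stand in for real htseq-count output.

```r
# Hypothetical helper (not part of edgeR): sanity-check a vector of
# count-file paths before handing it to readDGE().
check_count_files <- function(files) {
  # A missing column (e.g. s$cf when there is no cf column) gives NULL here.
  files <- as.character(files)  # also converts factors, the read.csv() default before R 4.0
  if (length(files) == 0)
    return("no paths supplied (is the column name correct?)")

  problems <- character(0)

  # The same path listed for more than one sample.
  dup <- unique(files[duplicated(files)])
  if (length(dup) > 0)
    problems <- c(problems, paste("duplicated path:", dup))

  # Paths that do not point to an existing file.
  u <- unique(files)
  absent <- u[!file.exists(u)]
  if (length(absent) > 0)
    problems <- c(problems, paste("file not found:", absent))

  # Files that exist but contain nothing.
  present <- u[file.exists(u)]
  empty <- present[file.size(present) == 0]
  if (length(empty) > 0)
    problems <- c(problems, paste("empty file:", empty))

  problems
}

# Demonstration with throwaway files standing in for count files.
good_file <- tempfile(fileext = ".count")
writeLines("geneA\t10", good_file)
empty_file <- tempfile(fileext = ".count")
invisible(file.create(empty_file))

demo_problems <- check_count_files(
  c(good_file, good_file, empty_file, "no_such_file.count")
)
print(demo_problems)
```

With a real metadata table, the call would be something like check_count_files(samples$CountFiles); an empty result means none of these three problems was detected, not that the files are correctly formatted.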
3.4 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

To add to Aaron's post, there are also a couple of tweaks that you need to make to accommodate htseq-count output. First, htseq-count output has no headers, so

dge <- readDGE(samples$CountFiles, header=FALSE)

Second, the last 5 lines of htseq-count output are summary counters rather than real genes, so you need to remove them:

realgene <- grep("^ENS",rownames(dge))
dge <- dge[realgene,]
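
Note that the grep("^ENS", ...) filter above assumes Ensembl-style gene IDs. As an alternative sketch, the summary rows themselves can be matched instead. This assumes a newer htseq-count release, which prefixes those rows with two underscores (for example __no_feature); older releases wrote them without the prefix, so it is worth inspecting tail(rownames(dge)) first. The toy matrix below is made up and merely stands in for dge$counts.

```r
# Toy count matrix standing in for dge$counts; the values and IDs are
# invented, and the last two row names mimic htseq-count summary rows.
counts <- matrix(
  c(10, 0, 5, 120, 30,    # sample1
    12, 1, 7, 110, 25),   # sample2
  ncol = 2,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "YAL001C",
      "__no_feature", "__ambiguous"),
    c("sample1", "sample2")
  )
)

# Flag the summary rows by their "__" prefix (an assumption about the
# htseq-count version) rather than by the gene ID prefix.
special <- grepl("^__", rownames(counts))

# Keep only the real genes; drop = FALSE preserves the matrix shape.
genes_only <- counts[!special, , drop = FALSE]
print(genes_only)
```

The same logical index could be applied to a DGEList in the way the answer does, e.g. dge[!special, ] with special computed from rownames(dge).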