Question

edgeR: Error in taglist[[i]] : subscript out of bounds

0

Nick N ▴ 60

@nick-n-6370

Last seen 8.6 years ago

United Kingdom

I have RNA-seq data (18 samples) from which I produced raw counts using htseq-count. Now I want to analyze the data using edgeR. I've done this dozens of times without any issues, but not this time. When I execute

samples <- read.csv(file="metadata_composite.csv",header=TRUE,sep=",")
counts <- readDGE(samples$CountFiles)$counts

I get:

Error in taglist[[i]] : subscript out of bounds

I've analyzed the same data using DESeq2 and had no issues whatsoever, so the problem seems to be related to edgeR. Here is my data: it consists of 18 directories, each containing a count file (accepted_hits.count), and a directory called edgeR, which contains "metadata_composite.csv", the metadata file that I load in the code snippet above.

Can you tell what the problem is?

edger rna-seq • 3.8k views

ADD COMMENT • link updated 8.8 years ago by Gordon Smyth 50k • written 8.8 years ago by Nick N ▴ 60

1

Gordon Smyth 50k

@gordon-smyth

Last seen 4 hours ago

WEHI, Melbourne, Australia

To add to Aaron's post, there are also a couple of tweaks that you need to make to accommodate htseq output. First, htseq output has no headers, so

dge <- readDGE(samples$CountFiles, header=FALSE)

Second, the last 5 lines of htseq output are not real genes, so you need to remove them:

realgene <- grep("^ENS",rownames(dge))
dge <- dge[realgene,]
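
As a rough illustration of that filtering step, here is a sketch on a toy count matrix rather than real data. The counts and sample names are invented. The two summary rows use the double-underscore naming of recent htseq-count versions, which is an assumption about your files (older versions omit the underscores), so check the tail of your own count files first.

```r
# Toy count matrix; counts and sample names are made up for illustration.
counts <- matrix(
  c(10, 0, 5, 100, 20,
    12, 3, 7,  90, 25),
  ncol = 2,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419",
      "__no_feature", "__ambiguous"),
    c("sampleA", "sampleB")
  )
)

# Keep only rows whose names look like Ensembl IDs.
realgene <- grep("^ENS", rownames(counts))
filtered <- counts[realgene, , drop = FALSE]

# Alternative for non-Ensembl annotations: drop the "__" summary rows instead.
filtered_alt <- counts[!grepl("^__", rownames(counts)), , drop = FALSE]

print(filtered)
```

The same row subsetting applies to a DGEList, as in the two lines above; the plain matrix is used here only so the example runs without any count files.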

ADD COMMENT • link 8.8 years ago Gordon Smyth 50k

score 2 · Accepted Answer · 2015-07-02

2

Aaron Lun ★ 28k

@alun

Last seen 15 hours ago

The city by the bay

You've got a couple of duplicated entries in samples$CountFiles. Each entry of taglist is named according to the count file, so if you have the same count file for different samples, information from the later samples will overwrite that of earlier samples instead of forming a new entry in taglist. The offending samples seem to be WT-H2 and WT-H3 in your CSV file, both of which refer to the WT-H1 directory to extract the count file. Redirect these to get the count files from their appropriate directories, and you should be fine.
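
A quick way to spot this kind of problem before calling readDGE is to look for repeated paths in the file column. The following is a minimal sketch in base R. The data frame is a made-up stand-in for the real metadata: the Sample column name and the directory paths are assumptions, and only CountFiles is taken from the question.

```r
# Toy stand-in for the metadata table; paths and the Sample column are hypothetical.
samples <- data.frame(
  Sample = c("WT-H1", "WT-H2", "WT-H3"),
  CountFiles = c("../WT-H1/accepted_hits.count",
                 "../WT-H1/accepted_hits.count",
                 "../WT-H1/accepted_hits.count"),
  stringsAsFactors = FALSE
)

# as.character() guards against the column having been read in as a factor.
files <- as.character(samples$CountFiles)

# Flag every row whose path occurs more than once, not just the later copies.
is_dup <- duplicated(files) | duplicated(files, fromLast = TRUE)
dup_rows <- samples[is_dup, ]

print(dup_rows)
```

If dup_rows has any rows, those samples share a count file and their paths need correcting before the counts are read in.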

ADD COMMENT • link 8.8 years ago Aaron Lun ★ 28k

0

I got the same error as Nick N.

> s=read.csv("Samples.csv")

> counts=readDGE(s$cf)$counts

Error in taglist[[i]] : subscript out of bounds

I can't open the link he provided, and I couldn't find an error in my CSV file. Could you please point out what changes I need to make to fix the error?

LibraryName	LibraryLayout	fastq1	fastq2	condition	shortname
ATCC26_30	PAIRED	AT30.left.fastq	AT30.right.fastq	Saphrophyre	AT_30
ATCC26_37	PAIRED	AT37.left.fastq	AT37.right.fastq	Saphrophyre	AT_37
CI1123_30	PAIRED	CI30.left.fastq	CI30.right.fastq	Surgery	CI_30
CI1123_37	PAIRED	CI37.left.fastq	CI37.right.fastq	Surgery	CI_37
CI1698_30	PAIRED	YS30.left.fastq	YS30.right.fastq	Healed	YS_30
CI1698_37	PAIRED	YS37.left.fastq	YS37.right.fastq	Healed	YS_37

Above is the metadata table that I am using to run the commands.

ADD REPLY • link 8.0 years ago muthubioinfotech • 0

0

Well, for starters, there's no cf column in your metadata table.

ADD REPLY • link 8.0 years ago Aaron Lun ★ 28k

0

I used the paste function to add a cf column to the Samples.csv file. In the terminal window I can see the cf column and its values, but the column is not there when I open the CSV file directly. I then entered the cf column in the CSV file manually, but a different error occurred:

counts=readDGE(s$cf)$counts
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input.

I have one question: we can't read a CSV file in R without assigning it to a variable, am I right? I ask because the protocol I am following reads the CSV file directly into R and manipulates the columns, but I could not do that.

Any suggestions for fixing this error?

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input.

ADD REPLY • link 8.0 years ago muthubioinfotech • 0

0

It's not clear to me what you're actually doing. All I can say is to check that:

- there is a column named cf in your data frame s.
- s$cf is a character vector that contains paths to count files.
- the count files aren't empty and are properly formatted.
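
The first two checks, and the "not empty" part of the third, can be sketched in base R as below. check_count_files is a hypothetical helper written for this illustration, not part of edgeR, and it does not verify that the files are properly formatted. The temporary files exist only to make the example runnable.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
check_count_files <- function(s, column = "cf") {
  if (!column %in% names(s)) {
    return(sprintf("no column named '%s' in the data frame", column))
  }
  files <- s[[column]]
  problems <- character(0)
  if (!is.character(files)) {
    problems <- c(problems,
                  sprintf("column '%s' is %s, not character", column, class(files)[1]))
    files <- as.character(files)
  }
  # Paths that do not point at an existing file.
  not_found <- files[!file.exists(files)]
  if (length(not_found) > 0) {
    problems <- c(problems, paste("file not found:", not_found))
  }
  # Files that exist but contain no data.
  present <- files[file.exists(files)]
  zero_len <- present[file.size(present) == 0]
  if (length(zero_len) > 0) {
    problems <- c(problems, paste("file is empty:", zero_len))
  }
  problems
}

# Usage on a throwaway example: one good file, one empty file, one missing file.
good_file <- tempfile(fileext = ".count")
writeLines("ENSG00000000003\t10", good_file)
empty_file <- tempfile(fileext = ".count")
invisible(file.create(empty_file))
s <- data.frame(
  cf = c(good_file, empty_file, file.path(tempdir(), "does_not_exist.count")),
  stringsAsFactors = FALSE
)

print(check_count_files(s))
```

An empty result means none of these particular checks failed; an empty count file is one plausible cause of the "no lines available in input" error reported above.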

I would also suggest that you find someone local to help you, since it seems like you're new to this. This support site is not meant to be a place to learn R or Bioconductor.

ADD REPLY • link 8.0 years ago Aaron Lun ★ 28k