@humberto_munoz-10903

I plan to use the readDGE function with two CSV files containing gene counts from different samples. One has 2814 genes and the other 2809 genes. The files are on my Desktop and this is the error that I get:

> files <- dir(pattern="*\\.csv$")
> RG <- readDGE(files)
Error in `[.data.frame`(d[[i]], , columns[2]) : undefined columns selected

How can I fix the error?

Tags: readDGE, edgeR

How many columns does each of the two files have? What are the column headings?

Each file has two columns: the first contains the gene IDs and the second the gene read counts. Does readDGE create a DGEList that includes all genes that have at least one count in one of the samples?

Aaron, I followed your comment and got the results below. How can I see all the rows in data2, or compute its size? I also want to do TMM normalization, but calcNormFactors gives the error shown at the end.

> readDGE(data2, sep=",")
An object of class "DGEList"
$samples
                            files group lib.size norm.factors
Dark Aerobic     Dark Aerobic.csv     1   481909            1
Dark Anaerobic Dark Anaerobic.csv     1  1033135            1

$counts
          Samples
Tags      Dark Aerobic Dark Anaerobic
641610012           17             28
641610013           55             36
641610014          331           1551
641610015         1005           2292
641610016           96            136
2816 more rows ...

> y<-calcNormFactors(data2, method=c("TMM","RLE","upperquartile","none"),
+                    refColumn=NULL, logratioTrim=.3, sumTrim=0.05, doWeighting=TRUE,
+                    Acutoff=-1e10, p=0.75)
Error in colSums(x) : 'x' must be numeric

Aaron Lun
@alun
The city by the bay

readDGE expects each file to be tab-separated and to contain at least two columns (one of gene names/IDs and another of gene counts). It seems that your files are being read with fewer columns than expected. This is probably because they use a different separator: for CSV files, you should set sep="," in the readDGE call, as mentioned in the documentation for the function. Also see the columns argument in ?readDGE if your files have more than two columns and the first two do not correspond to the IDs and counts.
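To see why the separator matters, here is a minimal base-R sketch. It assumes, as the `[.data.frame`(d[[i]], , columns[2]) part of the error suggests, that readDGE reads each file with a tab-delimited reader such as read.delim and then selects column columns[2]. The file and its contents are made up for illustration.

```r
# Hypothetical two-column count file, comma-separated like the poster's CSVs
csv_file <- tempfile(fileext = ".csv")
writeLines(c("GeneID,Count",
             "641610012,17",
             "641610013,55",
             "641610014,331"), csv_file)

# read.delim() splits on tabs by default, so each comma-separated line
# ends up as a single column and there is no column 2 to select
as_tab <- read.delim(csv_file)
print(ncol(as_tab))

# Selecting the non-existent second column gives the same error message
err <- tryCatch(as_tab[, 2], error = function(e) conditionMessage(e))
print(err)

# With sep = "," the file splits into the expected ID and count columns
as_csv <- read.delim(csv_file, sep = ",")
print(ncol(as_csv))
print(is.numeric(as_csv[, 2]))
```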

@gordon-smyth
WEHI, Melbourne, Australia

There are a few problems here:

First, as already noted, you need to specify sep="," because you have a comma-separated file.

Second, there is a problem with your files. Somewhere in one of your data files you have a character entered where you should have a number. Have a look especially at the last row of your files, as that is often the culprit. Check that your files don't contain any unnecessary spaces, because a space will be read as a character.

Third, you only have two samples in total, meaning the sample size is n=1 in each group. In other words you have no replication. So there isn't much analysis that edgeR will be able to do for you, because edgeR is designed to work with biological replicates.
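For the second problem (a character where a count should be), one quick check is to read each file with every column kept as text and flag count entries that do not convert to a number. This is a base-R sketch; the helper find_bad_counts() and the example file are hypothetical, and it assumes the counts are in the second column.

```r
# Hypothetical helper: return the rows whose count entry is not a valid number
find_bad_counts <- function(path, count_col = 2) {
  # Read every column as character so nothing is coerced before checking
  tab <- read.csv(path, colClasses = "character")
  counts <- tab[[count_col]]
  # as.numeric() gives NA for entries such as "12a", "n/a" or ""
  bad <- which(is.na(suppressWarnings(as.numeric(counts))))
  tab[bad, , drop = FALSE]
}

# Example file whose last row contains text instead of a count
csv_file <- tempfile(fileext = ".csv")
writeLines(c("GeneID,Count",
             "641610012,17",
             "641610013,55",
             "Total,n/a"), csv_file)

bad_rows <- find_bad_counts(csv_file)
print(bad_rows)
```

Running the helper on each real file (for example with lapply over the file names) lists the offending rows, which can then be fixed or removed before calling readDGE.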

Actually, I have samples from an experiment with 11 different conditions and no biological replicates. My intention is to apply TMM normalization using the first sample (Dark Aerobic) as the reference. To start, I'm trying just the first two samples (Dark Aerobic and Dark Anaerobic). According to your last comment, TMM normalization is not applicable to these data sets.

You are misinterpreting my answer. Your difficulties, and my answer, have nothing to do with the TMM method.
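For reference, a minimal edgeR sketch of the intended steps follows, assuming the Bioconductor package edgeR is installed. In the transcript, the DGEList returned by readDGE() is printed but never assigned, and calcNormFactors() is then called on data2, the table of file names that was given to readDGE(); a non-numeric table like that would by itself trigger the "'x' must be numeric" error. The file names and counts below are hypothetical. Setting refColumn = 1 makes the first sample the TMM reference, as the poster wanted.

```r
# Assumes the Bioconductor package edgeR is installed
library(edgeR)

# Hypothetical example data: two small comma-separated count files
count_dir <- tempfile("counts")
dir.create(count_dir)
writeLines(c("GeneID,Count", "g1,17", "g2,55", "g3,331", "g4,1005", "g5,96"),
           file.path(count_dir, "DarkAerobic.csv"))
writeLines(c("GeneID,Count", "g1,28", "g2,36", "g3,1551", "g4,2292", "g5,136"),
           file.path(count_dir, "DarkAnaerobic.csv"))

# File names only; the directory is supplied through the path argument
files <- dir(count_dir, pattern = "\\.csv$")

# Keep the returned DGEList; sep = "," is passed on to the file reader
y <- readDGE(files, path = count_dir, sep = ",")

# Number of genes (rows) and samples (columns) in the count matrix
print(dim(y))

# TMM factors computed on the DGEList itself, with sample 1 as the reference
y <- calcNormFactors(y, method = "TMM", refColumn = 1)
print(y$samples)
```

dim(y) answers the "how many rows" question without printing every gene; y$counts holds the full matrix if all rows are needed.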