readDGE function error in R
@shraddhaadamane-11285

Hello there,

A very good morning. 

I am using edgeR to carry out a comparative gene expression analysis on raw RNA-seq count data. I am using the readDGE function to compile the count data and library sizes.

The table I get shows that this has worked for most of the samples, but it returns a library size of NA for 2 samples.

             Files            group   lib.size   norm.factors
12345_exn    12345_exn.txt        1         NA              1
23456_Inv    23456_Inv.txt        1   164265.2              1
34567_DCIS   34567_DCIS.txt       1   172467.7              1
45678_exn    45678_exn.txt        1         NA              1
56789_exn    56789_exn.txt        1   168533.8              1

I have checked the following:

a. All the counts are numbers and not text (and not NA), b. the files have the correct headings, c. the samples are named correctly.

I am at a loss as to what the problem could be; any suggestions or advice would be greatly appreciated. Eagerly awaiting an answer.

Shradha.

Aaron Lun ★ 29k
@alun

I would guess that the 12345_exn.txt and 45678_exn.txt files are missing some genes that are present in the other files. This causes an NA value to be generated when the counts are collated - which makes sense, because if the gene is missing, the function can't know its count. Make sure each file has the same number of genes with the same names. If not, you can either remove the offending rows and recalculate the library sizes:

dge <- dge[rowSums(is.na(dge$counts))==0, , keep.lib.size=FALSE]

... or you can set the counts to zero, but only if you know that the missingness represents a count of zero:

dge$counts[is.na(dge$counts)] <- 0
dge$samples$lib.size <- colSums(dge$counts)

Whether or not that is the case depends on the process you used to generate the counts.
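To see whether the files really do differ in their gene sets, something like the following base R sketch may help. It assumes two-column, tab-delimited files with a header and the gene ID in the first column; the file names, gene names and temporary directory are made up for illustration, so substitute your own paths.

```r
# Hypothetical count files written to a temporary directory so that the
# sketch runs on its own; replace 'tmp' with your real directory.
tmp <- tempfile("counts_")
dir.create(tmp)
write.table(data.frame(gene = c("g1", "g2", "g3"), count = c(5, 0, 12)),
            file.path(tmp, "a.txt"), sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(gene = c("g1", "g3"), count = c(7, 3)),
            file.path(tmp, "b.txt"), sep = "\t", quote = FALSE, row.names = FALSE)

files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# Read the gene ID column (assumed to be column 1) from every file.
ids <- lapply(files, function(f) {
  as.character(read.delim(f, header = TRUE, stringsAsFactors = FALSE)[[1]])
})
names(ids) <- basename(files)

# Genes seen in any file, and which of those each file lacks.
all.ids <- unique(unlist(ids))
missing.ids <- lapply(ids, function(x) setdiff(all.ids, x))
print(missing.ids)
```

An empty character vector for every file means all files list the same genes.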

P.S. I notice that your library sizes aren't integers. While this is not a problem in and of itself, edgeR is intended to work with counts - either integer read counts, or something like the expected counts from RSEM. You had better not be using CPMs or RPKMs as inputs.
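A quick way to check whether a count matrix is integer-valued is sketched below. The matrices are toy stand-ins for dge$counts and the helper name looks.like.counts is hypothetical. Note that a FALSE result is only a hint: RSEM expected counts are also non-integer, so this cannot by itself distinguish them from CPMs or RPKMs.

```r
# Toy matrices standing in for dge$counts (hypothetical values).
raw.counts <- matrix(c(10, 0, 25, 3), nrow = 2)
cpm.like   <- matrix(c(10.7, 0, 25.2, 3.1), nrow = 2)

# TRUE if every non-missing value is a non-negative whole number.
looks.like.counts <- function(x) {
  x <- x[!is.na(x)]
  all(x >= 0 & abs(x - round(x)) < 1e-8)
}

looks.like.counts(raw.counts)
looks.like.counts(cpm.like)
```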

Edit: Actually, ignore what I said above. readDGE will automatically assign a count of zero to any gene that is not present in a file. So, the only possible reason for getting NA values would be to have them in the file itself.
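If you want to look for NA values in the files themselves, a rough base R sketch is shown here. It assumes tab-delimited files with a header and the counts in the second column; the file names and contents are invented for the example. It reads every column as text and counts the entries that cannot be parsed as numbers, which catches literal NA, blank cells and stray text.

```r
# Hypothetical count files written to a temporary directory so that the
# sketch runs on its own; replace 'tmp' with your real directory.
tmp <- tempfile("counts_")
dir.create(tmp)
writeLines(c("gene\tcount", "g1\t5", "g2\tNA", "g3\t12"), file.path(tmp, "a.txt"))
writeLines(c("gene\tcount", "g1\t7", "g2\t1", "g3\t3"), file.path(tmp, "b.txt"))

files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# For each file, count entries in the count column (assumed to be column 2)
# that are NA or cannot be converted to a number.
n.bad <- vapply(files, function(f) {
  x <- read.delim(f, header = TRUE, colClasses = "character")
  vals <- suppressWarnings(as.numeric(x[[2]]))
  sum(is.na(vals))
}, integer(1))
names(n.bad) <- basename(files)
print(n.bad)
```

Any file with a non-zero entry here would be expected to produce an NA library size.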


Thanks, Aaron Lun, for your comment. I have checked and rechecked that the number and names of the genes are the same in all of my samples. Just a couple of seconds ago, I recreated the files, saved them in a separate folder, and ran them through the script again. And guess what: the NA error has gone and I can see the numbers in my library sizes.

Thanks for your comment and time though.
