DESeq not recognizing row.names
@alicia-r-perez-porro-5953
Hi,

I'm trying to use DESeq to find the differentially expressed genes in my datasets, but DESeq is not recognizing my row.names, so I can't create my cds.

My .csv input file looks like:

    transcript_id,C4,CRL_2APR10,CRL_1_15JUL11,CRL_2_15JUL11
    comp1000201_c0_seq1,5.00,0.00,0.00,0.00
    comp1000297_c0_seq1,7.00,0.00,0.00,0.00
    comp100036_c0_seq1,0.00,0.00,0.00,0.00
    comp10003_c1_seq1,2.00,0.00,0.00,0.00
    comp100041_c0_seq1,3.00,0.00,0.00,0.00
    comp100041_c0_seq2,0.00,0.00,0.00,0.00
    comp100041_c0_seq3,0.00,0.00,0.00,0.00
    comp100051_c0_seq1,0.00,0.00,0.00,0.00
    comp1000890_c0_seq1,3.00,0.00,0.00,0.00

This is what I'm running:

    > spercysts_vs_embryos = read.csv(
    +     file.choose(),
    +     header = TRUE,
    +     row.names = 1,
    +     sep = ",",
    +     dec = ".")
    > head(spercysts_vs_embryos)
                        C4 CRL_2APR10 CRL_1_15JUL11 CRL_2_15JUL11
    comp1000201_c0_seq1  5          0             0             0
    comp1000297_c0_seq1  7          0             0             0
    comp100036_c0_seq1   0          0             0             0
    comp10003_c1_seq1    2          0             0             0
    comp100041_c0_seq1   3          0             0             0
    comp100041_c0_seq2   0          0             0             0
    > cond = factor(c("SP", "SP", "EB", "EB"))
    > spercysts_vs_embryosDesign = data.frame(
    +     row.names = colnames(spercysts_vs_embryos),
    +     condition = c("SP", "SP", "EB", "EB"),
    +     libType = c("paired-end", "paired-end", "paired-end", "paired-end"))
    > spercysts_vs_embryosDesign
                  condition    libType
    C4                   SP paired-end
    CRL_2APR10           SP paired-end
    CRL_1_15JUL11        EB paired-end
    CRL_2_15JUL11        EB paired-end
    > str(spercysts_vs_embryos)
    'data.frame':   307048 obs. of  4 variables:
     $ C4           : num  5 7 0 2 3 0 0 0 3 0 ...
     $ CRL_2APR10   : num  0 0 0 0 0 0 0 0 0 0 ...
     $ CRL_1_15JUL11: num  0 0 0 0 0 0 0 0 0 10 ...
     $ CRL_2_15JUL11: num  0 0 0 0 0 0 0 0 0 3 ...

So everything looks fine to me. But when I try to create my cds:

    > cds <- newCountDataSet(spercysts_vs_embryos, cond)
    Error in newCountDataSet(spercysts_vs_embryos, cond) :
      The countData is not integer.

So, if I check what is happening:

    > which(is.na(spercysts_vs_embryos), arr.ind = TRUE)
         row col

Any suggestions? Thanks!

--
Alicia R. Pérez-Porro
PhD candidate
Giribet lab
Department of Organismic and Evolutionary Biology
MCZ labs, Harvard University
26 Oxford St, Cambridge MA 02138
phone: +1 617-496-5308
fax: +1 617-495-5667
www.oeb.harvard.edu/faculty/giribet/

Department of Marine Ecology
Center for Advanced Studies of Blanes (CEAB-CSIC)
C/Accés Cala St. Francesc 14
17300 Blanes, Girona, SPAIN
phone: +34 972 336 101
fax: +34 972 337 806
www.ceab.csic.es
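A minimal way to check the row-name part of the question, assuming a few rows copied from the CSV above are read from an inline string (`text =`) instead of `file.choose()`: `row.names = 1` turns `transcript_id` into the row names, and values written as `5.00` are parsed as doubles, which `str()` reports as `num` even when every value is a whole number.

```r
# A few rows copied from the CSV in the question; read from a string
# instead of file.choose() so the example runs on its own.
csv_text <- "transcript_id,C4,CRL_2APR10,CRL_1_15JUL11,CRL_2_15JUL11
comp1000201_c0_seq1,5.00,0.00,0.00,0.00
comp1000297_c0_seq1,7.00,0.00,0.00,0.00
comp100036_c0_seq1,0.00,0.00,0.00,0.00"

counts <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# The transcript_id column became the row names.
print(rownames(counts))

# "5.00" is read as a double ("num" in str()), not as an R integer ...
str(counts)

# ... yet every value in these rows is a whole number.
m <- as.matrix(counts)
print(all(m == round(m)))
```

So the row names are picked up correctly, and a `num` column is not by itself the problem; what matters is whether any value has a fractional part.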
@steve-lianoglou-2771
Denali
Hi Alicia,

On Fri, May 24, 2013 at 2:38 PM, Alicia R. Pérez-Porro wrote:
> [...]
>> str(spercysts_vs_embryos)
> 'data.frame':   307048 obs. of  4 variables:
>  $ C4           : num  5 7 0 2 3 0 0 0 3 0 ...
>  $ CRL_2APR10   : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ CRL_1_15JUL11: num  0 0 0 0 0 0 0 0 0 10 ...
>  $ CRL_2_15JUL11: num  0 0 0 0 0 0 0 0 0 3 ...
>
> So, everything looks fine to me. But when i try to create my cds:

Not everything is fine :-) Your columns should hold integer counts, not just any "numeric" values.
If you look at the source code of newCountDataSet, you'll see right at the very top:

    countData <- as.matrix(countData)
    if (any(round(countData) != countData))
      stop("The countData is not integer.")

which matches the error you are getting here:

>> cds <-newCountDataSet(spercysts_vs_embryos, cond )
> Error in newCountDataSet(spercysts_vs_embryos, cond) :
>   The countData is not integer.

So it's not that you have NAs in your data.frame. Your first problem is that some of the numbers in your count matrix are not equal to their rounded values. That is a quick/easy way to check that they aren't whole numbers, as you would expect of count data, and DESeq requires count data.

So, instead of checking for NA here:

> So, if i check what is happening:
>
>> which( is.na(spercysts_vs_embryos), arr.ind=TRUE )
>      row col

you might try to check which numbers are suspect:

    R> which(round(spercysts_vs_embryos) != spercysts_vs_embryos), arr.ind=TRUE)

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
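The check Steve quotes can be run on its own before calling `newCountDataSet`. Below is a minimal sketch with a hypothetical helper, `is_integer_counts()`, and a made-up toy table: it tests for whole numbers, so doubles such as `5` pass and only fractional values such as `2.5` fail. The last lines also show `which()` with `arr.ind = TRUE` placed inside the call; in the suggested command above, an extra `)` closes `which()` before `arr.ind`, which makes R stop with `unexpected ','`.

```r
# Hypothetical helper mirroring the check quoted from newCountDataSet():
# coerce to a matrix, then require every value to equal its rounded value.
is_integer_counts <- function(countData) {
  m <- as.matrix(countData)
  !any(round(m) != m)
}

# Toy count table with made-up values; one entry (2.5) is fractional.
toy <- data.frame(
  C4         = c(5, 7, 2.5),
  CRL_2APR10 = c(0, 0, 1),
  row.names  = c("compA_seq1", "compB_seq1", "compC_seq1")
)

print(is_integer_counts(toy))          # FALSE: 2.5 is not a whole number
print(is_integer_counts(toy[1:2, ]))   # TRUE: doubles like 5 and 7 pass

# Locate the offending entries. arr.ind = TRUE is an argument of which(),
# so it must come before which()'s own closing parenthesis.
m <- as.matrix(toy)
bad <- which(round(m) != m, arr.ind = TRUE)
print(bad)
```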
If I do:

    > which( spercysts_vs_embryos != round (spercysts_vs_embryos), arr.ind=TRUE )

I get:

    comp203811_c0_seq3 44259   1
    comp203811_c0_seq4 44260   1
    comp203818_c0_seq2 44266   1
    comp203818_c0_seq4 44268   1
    comp203827_c0_seq2 44281   1
    comp203827_c0_seq3 44282   1
    comp203827_c0_seq7 44286   1
    comp203828_c0_seq1 44287   1
     [ reached getOption("max.print") -- omitted 166743 rows ]

If I do:

    > which(round(spercysts_vs_embryos) != spercysts_vs_embryos), arr.ind=TRUE)

I get:

    Error: unexpected ',' in "which(round(spercysts_vs_embryos) != spercysts_vs_embryos),"

On Fri, May 24, 2013 at 4:51 PM, Steve Lianoglou <lianoglou.steve@gene.com> wrote:
> [...]
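Each line of that output is one flagged entry: the row name, then its row index and its column index in the table (here column 1, `C4`). With this many hits the printout is cut off at `getOption("max.print")`, so summaries are easier to read. A minimal sketch on a made-up stand-in table (the column names are copied from the question; the values are invented):

```r
# Made-up stand-in for spercysts_vs_embryos (same column names).
counts <- data.frame(
  C4            = c(5, 7, 0.5, 2, 3.25),
  CRL_2APR10    = c(0, 0, 0, 1.5, 0),
  CRL_1_15JUL11 = c(0, 0, 0, 0, 10),
  CRL_2_15JUL11 = c(0, 0, 0, 0, 3),
  row.names     = paste0("comp", 1:5, "_c0_seq1")
)

idx <- which(counts != round(counts), arr.ind = TRUE)

# One line per flagged entry: row name, then its "row" and "col" indices.
print(idx)

# Summaries instead of the full printout.
n_bad <- nrow(idx)                                    # number of non-integer entries
per_column <- table(colnames(counts)[idx[, "col"]])   # how many per sample column
bad_rows <- unique(rownames(counts)[idx[, "row"]])    # affected transcripts
print(n_bad)
print(per_column)
print(head(bad_rows))
```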
@steve-lianoglou-2771
Denali
Hi Alicia,

Unfortunately I replied to the previous email off list, but let's get back on list now:

> Sorry for my stupidity, I'm completely new to R and DESeq.
>
> I identified the problem, but only because I opened my file in Excel and saw that I have some decimal numbers. I couldn't identify my problem at all from what I got with which( spercysts_vs_embryos != round (spercysts_vs_embryos), arr.ind=TRUE )
>
> Can you please explain what I got?
>
> comp203811_c0_seq3 44259 1
> comp203811_c0_seq4 44260 1
> comp203818_c0_seq2 44266 1
> comp203818_c0_seq4 44268 1
> comp203827_c0_seq2 44281 1
> comp203827_c0_seq3 44282 1
> comp203827_c0_seq7 44286 1
> comp203828_c0_seq1 44287 1
> [ reached getOption("max.print") -- omitted 166743 rows ]

You've just identified the (row, col) indices of the values in your count matrix that do not look like integers. As Simon and I tried to explain in our initial responses, this command:

    R> which( spercysts_vs_embryos != round (spercysts_vs_embryos), arr.ind=TRUE )

finds the elements in spercysts_vs_embryos that are not equal to their rounded values. As you included in the previous email, one of these elements is (44259, 1), so what is it?

    R> spercysts_vs_embryos[44259, 1]

The point is that DESeq needs *integers*, and you have values in your data.frame that are not integers; my guess is that they are decimal numbers. You need to input raw count data to DESeq, not some mean count, RPKM, or whatever.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
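Rather than looking up flagged entries one at a time, the `(row, col)` matrix returned by `which(..., arr.ind = TRUE)` can index the count matrix directly. A minimal sketch on a made-up table; fractional values like these are the kind Steve attributes to something other than raw counts (a mean count, RPKM, and so on), though the toy numbers say nothing about where the real ones came from.

```r
# Toy table with made-up values standing in for spercysts_vs_embryos.
counts <- data.frame(
  C4         = c(5, 7, 12.37, 2),
  CRL_2APR10 = c(0, 0.5, 0, 1),
  row.names  = c("compA_seq1", "compB_seq1", "compC_seq1", "compD_seq1")
)
m <- as.matrix(counts)

idx <- which(m != round(m), arr.ind = TRUE)

# One entry at a time, in the style of spercysts_vs_embryos[44259, 1]:
print(m[idx[1, "row"], idx[1, "col"]])

# A two-column (row, col) matrix can itself index a matrix, returning
# every flagged value in one go.
flagged_values <- m[idx]
flagged <- data.frame(
  transcript = rownames(idx),
  sample     = colnames(m)[idx[, "col"]],
  value      = flagged_values,
  stringsAsFactors = FALSE
)
print(flagged)
```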
Simon Anders
@simon-anders-3855
Zentrum für Molekularbiologie, Universi…
> So, if i check what is happening:
>
>> which( is.na(spercysts_vs_embryos), arr.ind=TRUE )

Maybe the issue is not an NA but some non-integer number that sneaked in. Try

    which( spercysts_vs_embryos != round (spercysts_vs_embryos), arr.ind=TRUE )
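Simon's point, sketched on a made-up table: an `is.na()` check and a whole-number check look for different problems, so an empty result from the first says nothing about the second.

```r
# Toy table with made-up values: no NAs, but one fractional entry.
counts <- data.frame(
  C4         = c(5, 7, 0, 2.4),
  CRL_2APR10 = c(0, 0, 0, 0),
  row.names  = paste0("comp", 1:4, "_c0_seq1")
)

# Looking for NAs finds nothing, as in the original post ...
na_hits <- which(is.na(counts), arr.ind = TRUE)
print(nrow(na_hits))

# ... while comparing each value with its rounded value finds the culprit.
frac_hits <- which(counts != round(counts), arr.ind = TRUE)
print(frac_hits)
```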