DESeq2 colData creation from featureCounts and row.names length error
@997c6d9a
Last seen 2.2 years ago
France

Hi all, I have two issues with DESeq2. The first is related to its use in Galaxy (I did not get an answer on the Galaxy forum, so I thought I would ask here). The second is about how to supply the colData information alongside the count matrix before running DESeq2 on featureCounts data.

1) I am using Galaxy with 2 factors (2 batches, i.e. 2 distinct studies from the literature) and 3 levels in each factor, which are not the same between the two factors. In the first batch I have non-treated, treated 1h, and a selected population treated 1h; in the second batch I have 3 selected populations that I think could contribute to the 1h treatment of the first batch/study. The first study has duplicates and the second has triplicates. I end up with the following error: "Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-". I tried changing the names of the factors and of the factor levels, and using duplicates everywhere, but it did not work. However, with only one factor and everything entered as factor levels, it works, although my batch effect is then not taken into account. Could you tell me where the error could lie, or at least what this row.names length error refers to?

2) I then tried to retrieve my featureCounts datasets from Galaxy so that I can run DESeq2 myself in R (I'm a beginner in R). I will merge my different featureCounts files using join in the terminal, so that the gene names are in the first column and the counts for each replicate are in one column each, then import the result into R and convert it to a matrix. I read the Bioconductor documentation for DESeq2, but I'm not sure I understand how to create the colData information that describes the factors. After some searching, I propose:

(condition <- factor(c(rep("cond1", 2), rep("cond2", 2), rep("cond3", 2), rep("cond4", 3), rep("cond5", 3), rep("cond6", 3))))
(batch <- factor(c(rep("batch1", 6), rep("batch2", 9))))
(coldata <- data.frame(row.names=colnames(countdata), condition, batch))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition, batch)
dds <- DESeq(dds)

and then I can go on. Could you tell me if this is correct? Where can I find more explanation of how colData is attached to the count matrix? And if I have only one factor with several factor levels, what should I do: keep only the "condition" column?

Thanks for any help you could provide, and let me know if you need any more information. Best regards

DESeq2 • 2.0k views
@mikelove
Last seen 1 day ago
United States

I'm not sure about the error in (1) without the underlying R code. Could you find a way to provide that? It looks like the wrong number of samples is being provided across the inputs.
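For reference, the message itself can be reproduced in plain R. The sketch below is not what the Galaxy wrapper does internally (that code is not shown in this thread); it only illustrates what "invalid 'row.names' length" means: the number of names being assigned does not match the number of rows in the data.frame. The toy data.frame and sample names are made up.

```r
# Toy data.frame with 3 rows (3 samples); names are hypothetical.
df <- data.frame(condition = c("a", "a", "b"))

# Assigning only 2 row names to 3 rows triggers the same error message.
msg <- tryCatch({
  rownames(df) <- c("s1", "s2")
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)

# With one name per row, the assignment succeeds.
rownames(df) <- c("s1", "s2", "s3")
```

So a mismatch between the number of sample labels and the number of count files passed to the tool would be one plausible place to look.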

For 2, yes, you can create a data.frame like so. (Note that you can use e.g. rep(c("cond1","cond2"), c(2,2)), where you give the repeated values and then the number of repeats for each.)

Or you can write a CSV file using a text editor and read it into R as a data.frame with read.csv.
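A minimal sketch of the data.frame route, under stated assumptions: the sample names and counts are made up (in practice the row names would come from colnames(countdata)), and the 2+2+2+3+3+3 layout is the one described in the question. It also uses base R's model.matrix to check a design before handing it to DESeq2. A two-factor design is written as a single formula such as `~ batch + condition`, not as `design=~condition, batch`. In this particular layout, though, every condition occurs in only one batch, so batch is fully determined by condition and the two-factor model matrix is not full rank.

```r
# Hypothetical sample names; in practice use colnames(countdata).
samples <- paste0("sample", 1:15)

# rep(values, times): each value is repeated the matching number of times.
condition <- factor(rep(paste0("cond", 1:6), times = c(2, 2, 2, 3, 3, 3)))
batch     <- factor(rep(c("batch1", "batch2"), times = c(6, 9)))
coldata   <- data.frame(condition, batch, row.names = samples)

# Toy count matrix (random numbers, for illustration only).
set.seed(1)
countdata <- matrix(rpois(100 * 15, lambda = 50), ncol = 15,
                    dimnames = list(paste0("gene", 1:100), samples))

# colData rows must describe the count columns, in the same order.
stopifnot(identical(rownames(coldata), colnames(countdata)))

# Compare the rank of each model matrix with its number of columns.
mm_two <- model.matrix(~ batch + condition, data = coldata)
mm_one <- model.matrix(~ condition, data = coldata)
full_rank_two_factor <- qr(mm_two)$rank == ncol(mm_two)
full_rank_one_factor <- qr(mm_one)$rank == ncol(mm_one)

# Build the dataset only if DESeq2 is installed; DESeq(dds) would follow
# on real counts.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = countdata,
                                        colData = coldata,
                                        design = ~ condition)
}
```

With only one factor, the design is simply `~ condition` and colData needs only that column. Because the batch effect here cannot be separated from the condition differences between the two studies, comparisons across studies should be read with that in mind.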


Hi, thanks a lot for your answer.

For 2), could I ask you about the format of the CSV? I guess the first column would be the sample ID, the second column the first factor, then the second factor, etc.

For 1), I can tell you the structure. In Galaxy I created 2 "factors":

factorname1 => 3 levels (3 conditions of the first paper): FactorLevel1_WT, FactorLevel2_injured, FactorLevel3_celltype1_injured, with 2 replicates in each factor level

factorname2 => 3 levels (3 sorted cell types from another tissue that may contribute to cells in the injured condition): FactorLevel4_celltype2, FactorLevel5_celltype3, FactorLevel6_celltype4, with 3 replicates each

Putting everything under one unique factor works fine. (The example below does not have the 3rd level in factor one, but I also tried it with that level, and also tried using only duplicates for factor 2.) I've also tried a different structure in which factor one is "condition" and factor two is "batch", but then I get a duplicates error, because the same featureCounts files (all of them) appear in both factors.

I can attach a screenshot if that helps. The bug report tells me this:

Rscript '/cvmfs/main.galaxyproject.org/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/deseq2/71bacea10eee/deseq2/deseq2.R' --cores ${GALAXY_SLOTS:-1} -o '/galaxy-repl/main/files/048/404/dataset_48404416.dat' -p '/galaxy-repl/main/files/048/404/dataset_48404417.dat'                                     -H  -f '[["FactorName1", [{"FactorLevel2_inj": ["/galaxy-repl/main/files/047/664/dataset_47664221.dat", "/galaxy-repl/main/files/047/664/dataset_47664223.dat"]}, {"FactorLevel1_WT": ["/galaxy-repl/main/files/047/553/dataset_47553086.dat", "/galaxy-repl/main/files/047/553/dataset_47553088.dat"]}]], ["FactorName2", [{"FactorLevel6_CD142": ["/galaxy-repl/main/files/048/247/dataset_48247834.dat", "/galaxy-repl/main/files/048/247/dataset_48247845.dat", "/galaxy-repl/main/files/048/247/dataset_48247890.dat"]}, {"FactorLevel5_ICAM1": ["/galaxy-repl/main/files/048/243/dataset_48243217.dat", "/galaxy-repl/main/files/048/247/dataset_48247822.dat", "/galaxy-repl/main/files/048/247/dataset_48247840.dat"]}, {"FactorLevel4_DPP4": ["/galaxy-repl/main/files/048/247/dataset_48247894.dat", "/galaxy-repl/main/files/048/247/dataset_48247901.dat", "/galaxy-repl/main/files/048/247/dataset_48247903.dat"]}]]]' -l '{"dataset_47553086.dat": "Counts_WT_Malecova_Rep1", "dataset_47553088.dat": "Counts_WT_Malecova_rep2", "dataset_47664221.dat": "Counts_Inj_d1_rep1", "dataset_47664223.dat": "Counts_Inj_d1", "dataset_48247894.dat": "featureCounts on data 234 and data 515: Counts", "dataset_48247901.dat": "featureCounts on data 234 and data 516: Counts", "dataset_48247903.dat": "featureCounts on data 234 and data 517: Counts", "dataset_48243217.dat": "featureCounts on data 234 and data 456: Counts", "dataset_48247822.dat": "featureCounts on data 234 and data 497: Counts", "dataset_48247840.dat": "featureCounts on data 234 and data 499: Counts", "dataset_48247834.dat": "featureCounts on data 234 and data 498: Counts", "dataset_48247845.dat": "featureCounts on data 234 and data 500: Counts", 
"dataset_48247890.dat": "featureCounts on data 234 and data 514: Counts"}' -t 1

stderr

Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-

Thank you again for your help.


Re: format of the CSV: reading a CSV is basic R input, so I'd look through the online R guides on how to read CSV data into R. Also, if you feel more comfortable doing this with data.frame and factor, go ahead.
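As one possible layout (an assumption, not a required format): a header line, then one row per sample, with the sample name in the first column and one column per factor. The sketch below reads such a table with read.csv from an inline string so that it is self-contained; with a real file you would pass the file path instead (for example a hypothetical "coldata.csv"). The sample and level names are made up.

```r
# Inline stand-in for the contents of a hypothetical coldata.csv file.
csv_text <- "sample,condition,batch
sample1,cond1,batch1
sample2,cond1,batch1
sample3,cond4,batch2
sample4,cond4,batch2"

# row.names = 1 uses the first column as row names;
# stringsAsFactors = TRUE turns the text columns into factors.
coldata_csv <- read.csv(text = csv_text, row.names = 1,
                        stringsAsFactors = TRUE)

print(coldata_csv)
```

The row names should then match colnames(countdata), in the same order, before the table is passed as colData.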

I won't be able to debug the Galaxy bit, sorry, due to time pressure. It just may not be possible to do all types of analyses within the Galaxy plugin.
