Hello,
This is my first question on Bioconductor support and I am completely new to R and bioinformatics, so I apologize if the answer to my question is obvious.
I am trying to compare RNA-seq data from two conditions, LG (5 replicates) vs HG (3 replicates), using RUVSeq to remove unwanted variation between batches.
The data look like this: the columns are the samples (HG1, HG2, HG3, HG5, HG4, LG1, LG2, LG5) and the rows are the genes (NM000014, NM000015... up to 18000 genes).
The code I wrote is the following:
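library(RUVSeq)  # also attaches EDASeq, which provides newSeqExpressionSet()
count_tab <- read.table("Human_islets_counts_Refseq_HG_vs_LG.csv", header = TRUE, row.names = 1, sep = ",")
# Keep genes with more than 5 reads in at least 2 samples
filter <- apply(count_tab, 1, function(x) length(x[x > 5]) >= 2)
filtered <- count_tab[filter, ]
# RefSeq mRNA identifiers start with "NM"
genes <- rownames(filtered)[grep("^NM", rownames(filtered))]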
x <- as.factor(rep(c("HG", "LG"), each=5,3))
set <- newSeqExpressionSet(as.matrix(filtered),phenoData = data.frame(x, row.names=colnames(filtered)))
The last line fails with this error:
Error in data.frame(x, row.names = colnames(filtered)) : 
  row names supplied are of the wrong length
Could anyone give me a hint as to why this fails?
Thanks in advance,
Cecilia
> sessionInfo()
R version 3.6.0 alpha (2019-04-08 r76348)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] RUVSeq_1.17.1               edgeR_3.25.4                limma_3.39.15               EDASeq_2.17.4              
 [5] ShortRead_1.41.0            GenomicAlignments_1.19.1    SummarizedExperiment_1.13.0 DelayedArray_0.9.9         
 [9] matrixStats_0.54.0          Rsamtools_1.99.6            GenomicRanges_1.35.1        GenomeInfoDb_1.19.3        
[13] lattice_0.20-38             locfit_1.5-9.1              zebrafishRNASeq_1.3.0       Biostrings_2.51.5          
[17] XVector_0.23.2              IRanges_2.17.5              S4Vectors_0.21.23           BiocParallel_1.17.18       
[21] Biobase_2.43.1              BiocGenerics_0.29.2        
loaded via a namespace (and not attached):
Error in x[["Version"]] : subscript out of bounds
In addition: Warning messages:
1: In FUN(X[[i]], ...) :
  DESCRIPTION file of package 'RCurl' is missing or broken
2: In FUN(X[[i]], ...) :
  DESCRIPTION file of package 'bitops' is missing or broken
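
A likely cause of the error above is the `rep()` call: the unnamed `3` is matched to the `times` argument, so the factor ends up with 2 x 5 x 3 = 30 entries while the count table has only 8 columns. The base-R sketch below reproduces the length mismatch and shows one possible fix. It assumes the column order listed in the question (five HG columns followed by three LG columns); the `samples` vector is a stand-in for `colnames(filtered)`, and the group sizes should be adjusted if the real file is ordered differently.

```r
# Stand-in for colnames(filtered), copied from the question (assumption:
# the real file has exactly these 8 columns in this order).
samples <- c("HG1", "HG2", "HG3", "HG5", "HG4", "LG1", "LG2", "LG5")

# Original call: the unnamed 3 is matched to `times`, so each label is
# repeated 5 times and the resulting vector is then repeated 3 times.
x_bad <- as.factor(rep(c("HG", "LG"), each = 5, 3))
length(x_bad)    # 30, but there are only 8 samples

# Possible fix: give one repeat count per label, so that the factor has
# exactly one entry per column.
x <- factor(rep(c("HG", "LG"), times = c(5, 3)))
length(x)        # 8

# With matching lengths, the phenotype data frame can be built.
pheno <- data.frame(x, row.names = samples)
pheno
```

With `x` defined this way, the `newSeqExpressionSet()` call from the question should receive a `phenoData` data frame with one row per sample.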

I am having a similar problem with my data. Can RUVSeq handle unequal numbers of replicates across sample groups?
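
On the unequal-replicates question: as far as I recall from the RUVSeq documentation, the replicate-based method (`RUVs`) takes a matrix of sample indices with one row per replicate group, built by the package's `makeGroups()` helper, where rows for smaller groups are padded with -1. If that is right, unequal group sizes are not a problem in themselves. The sketch below only imitates that layout in base R for illustration, using the 5 vs 3 design from the question; it is not the package's code, and in practice the package's own helper should be used.

```r
# Group labels for 5 HG and 3 LG samples (design taken from the question).
x <- factor(rep(c("HG", "LG"), times = c(5, 3)))

# Column indices of the samples belonging to each group.
idx <- split(seq_along(x), x)

# Pad every group to the size of the largest one with -1, giving one row
# per group (illustrative imitation of the layout described above).
width <- max(lengths(idx))
groups <- t(sapply(idx, function(i) c(i, rep(-1, width - length(i)))))
groups
```

Here the HG row holds indices 1 to 5 and the LG row holds 6, 7, 8 followed by two -1 padding values.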