merge error: 'by.x' and 'by.y' specify different numbers of columns


Tran, Nhu Quynh T ▴ 50

@tran-nhu-quynh-t-6628

Last seen 9.6 years ago

Hi Bioconductor group,

I'm working on a ChIP-seq dataset and trying to annotate my peaks. After getting the gene symbols using biomaRt, I want to merge them back into my BED file, but merge() gives me the error below. Any help is appreciated. Thanks.

QT

bed_file_orig <- read.delim(file_name, header=FALSE, skip=1)
bed_file <- bed_file_orig[!(stri_sub(bed_file_orig$V1, 1, 2)%in%c("HG", "MT", "GL")),]
peakList <- BED2RangedData(bed_file)
annotatedPeak = annotatePeakInBatch(peakList, AnnotationData=hs_annotation_tss)

#add gene ids to the peak: using addGeneIDs gives error if the database does not contain the feature. So use biomart
#annotatedPeak_tss <- addGeneIDs(annotatedPeak_tss,"org.Hs.eg.db",c("symbol", "genename"))
feature_ids <- unique(annotatedPeak$feature)
feature_ids<-feature_ids[!is.na(feature_ids)]
feature_ids<-feature_ids[feature_ids!=""]
IDs2Add<-getBM(attributes=c("ensembl_gene_id","external_gene_id"),filters = "ensembl_gene_id", values = feature_ids, mart=mart)

out_file_name <- paste("../data/processed/",patient,"_",TF,"_",cond_out, ".csv", sep="")
write.csv(annotatedPeak, file=out_file_name)
annotatedPeak <- read.csv(out_file_name)
annotatedPeak_reorder <- annotatedPeak[,c(9, 1:8, 10:15)]
annotatedPeak_tss <- merge(annotatedPeak_reorder, IDs2Add, by.x="feature", b.y="ensembl_gene_id")

Error in merge.data.frame(annotatedPeak_reorder, IDs2Add, by.x = "feature", :
  'by.x' and 'by.y' specify different numbers of columns

_______________________________
Nhu Quynh T. Tran, Ph.D.
Assistant Professor of Preventive Medicine
University of Tennessee Health Science Center
66 N. Pauline, Suite 633
Memphis, TN 38105
Phone: 901-448-1361
Fax: 901-448-7041
Email: qtran1@uthsc.edu

ChIPSeq biomaRt chipseq • 3.0k views

updated 9.8 years ago by James W. MacDonald 65k • written 9.8 years ago by Tran, Nhu Quynh T ▴ 50


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen just now

United States

Hi Nhu Quynh Tran,

You are having problems with merge(), which is a base R function, so this question is better asked on R-help.

On 6/26/2014 6:35 PM, Tran, Nhu Quynh T wrote:
> [...]
> annotatedPeak_tss <- merge(annotatedPeak_reorder, IDs2Add, by.x="feature", b.y="ensembl_gene_id")

Is that the actual code you used? If so, note that you used the argument 'b.y', not 'by.y'. So that is likely to be the reason.

Best,

Jim

> Error in merge.data.frame(annotatedPeak_reorder, IDs2Add, by.x = "feature", :
> 'by.x' and 'by.y' specify different numbers of columns

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
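To illustrate why the misspelled argument would produce this particular message, here is a minimal sketch using two small made-up data frames (the names `peaks` and `ids` and all values are hypothetical stand-ins, not the poster's data). As far as base R's merge.data.frame is documented, an unrecognized argument such as 'b.y' is absorbed by '...', so 'by.y' falls back to its default, the column names shared by both tables. When the tables share no column names, that default is empty while 'by.x' names one column, hence "different numbers of columns".

```r
# Hypothetical toy tables standing in for annotatedPeak_reorder and IDs2Add.
peaks <- data.frame(feature = c("ENSG01", "ENSG02", "ENSG03"),
                    start   = c(100L, 200L, 300L),
                    stringsAsFactors = FALSE)
ids <- data.frame(ensembl_gene_id  = c("ENSG01", "ENSG02"),
                  external_gene_id = c("GENE_A", "GENE_B"),
                  stringsAsFactors = FALSE)

# Misspelled argument: 'b.y' is not a formal of merge.data.frame, so it is
# swallowed by '...' and by.y keeps its default (the shared column names,
# which is empty here). Capture the error message instead of stopping.
typo_msg <- tryCatch(
  merge(peaks, ids, by.x = "feature", b.y = "ensembl_gene_id"),
  error = function(e) conditionMessage(e)
)
print(typo_msg)

# Correct spelling: an inner join on feature == ensembl_gene_id.
merged <- merge(peaks, ids, by.x = "feature", by.y = "ensembl_gene_id")
print(merged)

# Optional: all.x = TRUE keeps peaks that have no matching gene ID
# (their external_gene_id becomes NA).
merged_all <- merge(peaks, ids, by.x = "feature", by.y = "ensembl_gene_id",
                    all.x = TRUE)
print(merged_all)
```

Whether to use all.x = TRUE depends on whether peaks without a biomaRt match should stay in the annotated table; the default drops them.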

written 9.8 years ago by James W. MacDonald 65k
