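Chen Wanzhi, Song Jian, Wang Dejian, Wang Xing. Improved structural repairing algorithm based on regular expression[J]. Journal of Electronic Measurement and Instrumentation, 2017, 31(12): 2036-2041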
Improved structural repairing algorithm based on regular expression
  
DOI:10.13382/j.jemi.2017.12.022
Keywords: data cleaning; structural repairing; regular expression; edit distance
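Funding: Supported by the Natural Science Foundation of Liaoning Province (2015020098) and the Doctoral Start-up Fund of Liaoning Technical University (2015 1147)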
Author: Institution
Chen Wanzhi: School of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China
Song Jian: School of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China
Wang Dejian: China Petroleum Liaohe Equipment Company, Panjin 124010, China
Wang Xing: School of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China
Abstract:
      To address the cleaning of structured data, an improved algorithm is proposed that builds on the regular-expression-based structural repair (RSR) algorithm and borrows the idea of computing the edit distance between strings. First, the edges that violate the partial order are extracted from the edge set of the nondeterministic finite automaton. A priority queue is then introduced only for these extracted edges, to correct the edit distances associated with them. All other edges satisfy the partial order, so their edit distances can be computed directly by the recurrence formula without the more expensive priority queue. Tests and analysis show that the improved RSR algorithm has a clear advantage in time complexity, and that its improvement over the original algorithm is significant and stable.
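The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `regex_edit_distance`, the edge-list representation of the automaton, the unit cost for every edit, and the reading of "satisfies the partial order" as "goes from a lower-numbered state to a higher-numbered state" are all assumptions made for illustration. The sketch computes the minimum edit distance from a string to the language of an automaton without epsilon transitions, using the plain recurrence for order-respecting edges and a priority queue only to repair values affected by order-violating edges.

```python
import heapq

INF = float("inf")

def regex_edit_distance(s, n_states, edges, accepting, start=0):
    """Minimum number of unit-cost edits turning s into an accepted word.

    edges is a list of (source_state, character, target_state) triples.
    Assumption: an edge "satisfies the partial order" when source < target.
    """
    forward = [[] for _ in range(n_states)]  # order-respecting successors
    out = [[] for _ in range(n_states)]      # all successors
    back = []                                # order-violating edges
    for p, _, q in edges:
        out[p].append(q)
        if p < q:
            forward[p].append(q)
        else:
            back.append((p, q))

    def close_insertions(col):
        # Inserting a character follows an edge without consuming input.
        # Order-respecting edges: one pass in state order is enough.
        for p in range(n_states):
            for q in forward[p]:
                if col[p] + 1 < col[q]:
                    col[q] = col[p] + 1
        # Order-violating edges: repair with a priority queue, seeded only
        # with the states that such an edge actually improves.
        heap = []
        for p, q in back:
            if col[p] + 1 < col[q]:
                col[q] = col[p] + 1
                heapq.heappush(heap, (col[q], q))
        while heap:
            d, p = heapq.heappop(heap)
            if d > col[p]:
                continue  # stale queue entry
            for q in out[p]:
                if d + 1 < col[q]:
                    col[q] = d + 1
                    heapq.heappush(heap, (col[q], q))

    prev = [INF] * n_states
    prev[start] = 0
    close_insertions(prev)
    for a in s:
        cur = [d + 1 for d in prev]  # delete the input character a
        for p, ch, q in edges:       # match (cost 0) or substitute (cost 1)
            cost = prev[p] + (0 if ch == a else 1)
            if cost < cur[q]:
                cur[q] = cost
        close_insertions(cur)
        prev = cur
    return min(prev[q] for q in accepting)
```

For example, the expression (ab)*c can be encoded with three states as `[(0, "a", 1), (1, "b", 0), (0, "c", 2)]` and accepting set `{2}`; the edge from state 1 back to state 0 is the only one that goes through the priority queue. When the automaton has few such edges, most of the work is done by the linear pass, which is the source of the speed-up the abstract reports.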