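Chen Wanzhi, Song Jian, Wang Dejian, Wang Xing. Improved structural repairing algorithm based on regular expression[J]. Journal of Electronic Measurement and Instrumentation, 2017, 31(12): 2036-2041 |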
Improved structural repairing algorithm based on regular expression |
|
DOI:10.13382/j.jemi.2017.12.022 |
Keywords: data cleaning; structural repairing; regular expression; edit distance |
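Funding: Supported by the Natural Science Foundation of Liaoning Province (2015020098) and the Doctoral Start-up Fund of Liaoning Technical University (2015 1147) |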
|
Author | Institution |
Chen Wanzhi | School of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China |
Song Jian | School of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China |
Wang Dejian | China Petroleum Liaohe Equipment Company, Panjin 124010, China |
Wang Xing | School of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China |
|
Abstract: |
To address the cleaning of structured data, an improved algorithm is proposed that builds on the regular-expression-based structural repair (RSR) algorithm and borrows the idea used to compute the edit distance between strings. First, the edges that violate the partial order are extracted from the edge set of the nondeterministic finite automaton, and a priority queue is used only for those edges to correct the corresponding edit distances. The remaining edges satisfy the partial order, so their edit distances can be computed directly from a recurrence instead of the more expensive priority queue. Tests and analysis show that the improved RSR algorithm has a clear advantage in time complexity, and that its improvement over the original algorithm is significant and stable. |
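The idea summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' published pseudocode. It assumes an epsilon-free automaton whose states are numbered so that an edge `p -> q` "satisfies the partial order" when `p < q` and "violates" it otherwise (back edges and self-loops), and it assumes unit costs for insertion, deletion, and substitution. The function name `regex_edit_distance` and the example automaton for `(abc)*d` are hypothetical. Forward edges are handled by a plain recurrence in state order; a priority queue is used only when a violating edge actually lowers a distance.

```python
import heapq
from math import inf


def regex_edit_distance(s, n_states, edges, start, accepting):
    """Minimum number of unit edits turning s into a string accepted by the automaton.

    edges is a list of (source_state, character, target_state) triples.
    Assumption: an edge p -> q with p < q respects the state order; others violate it.
    """
    forward = [[] for _ in range(n_states)]  # successors via order-respecting edges
    back = []                                # order-violating edges (p >= q)
    out = [[] for _ in range(n_states)]      # all successors, used by the queue phase
    for p, _, q in edges:
        out[p].append(q)
        if p < q:
            forward[p].append(q)
        else:
            back.append((p, q))

    def close_insertions(row):
        # Recurrence: one sweep in state order settles every forward edge.
        for p in range(n_states):
            for q in forward[p]:
                if row[p] + 1 < row[q]:
                    row[q] = row[p] + 1
        # Priority queue: seeded only by violating edges that improve a distance.
        heap = []
        for p, q in back:
            if row[p] + 1 < row[q]:
                row[q] = row[p] + 1
                heapq.heappush(heap, (row[q], q))
        while heap:
            d, p = heapq.heappop(heap)
            if d > row[p]:
                continue  # stale queue entry
            for q in out[p]:
                if d + 1 < row[q]:
                    row[q] = d + 1
                    heapq.heappush(heap, (d + 1, q))
        return row

    row = [inf] * n_states
    row[start] = 0
    row = close_insertions(row)
    for ch_s in s:
        cur = [d + 1 for d in row]           # delete ch_s
        for p, ch, q in edges:               # match (cost 0) or substitute (cost 1)
            cost = row[p] + (0 if ch == ch_s else 1)
            if cost < cur[q]:
                cur[q] = cost
        row = close_insertions(cur)          # insert characters along edges
    return min(row[q] for q in accepting)


# Hypothetical automaton for (abc)*d; the edge 2 -> 0 violates the state order.
EDGES = [(0, "a", 1), (1, "b", 2), (2, "c", 0), (0, "d", 3)]

if __name__ == "__main__":
    print(regex_edit_distance("abcabcd", 4, EDGES, 0, {3}))  # 0: already matches
    print(regex_edit_distance("abd", 4, EDGES, 0, {3}))      # 1: insert the missing "c"
```

In the second call the missing `c` can only be inserted along the violating edge `2 -> 0`, so that step goes through the queue; when no violating edge improves a distance, the queue stays empty and each row costs a single linear sweep.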
|
|
|