Abstract: To address structural data cleaning, an improved structural repair algorithm based on regular expressions is proposed, built on computing the edit distance between strings. First, the edges that violate the partial order are extracted from the edge set of the nondeterministic finite automaton (NFA), and only the edit distances of these edges are updated through a priority queue. The remaining edges, which satisfy the partial order relation, are computed by a recursive formula instead of the more expensive priority queue. The experimental results show that the improved algorithm has a clear advantage in time complexity, and that its improvement rate over the original algorithm is significant and stable.
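To make the priority-queue baseline concrete, the sketch below computes the minimum edit distance from a string to the language of a small NFA using a Dijkstra-style search. It illustrates only the general technique, not the improved algorithm: the abstract does not specify the partial-order partition or the recursive formula, so neither is reproduced here. The function name `edit_distance_to_nfa`, the `(src, symbol, dst)` edge encoding, the integer state labels, and the unit edit costs are all assumptions made for illustration.

```python
import heapq


def edit_distance_to_nfa(s, start, accepting, edges):
    """Minimum number of insert/delete/substitute edits that turn `s`
    into some string accepted by the NFA.

    Assumed encoding (hypothetical, not from the paper):
      edges     -- iterable of (src, symbol, dst); symbol None = epsilon
      start     -- integer start state
      accepting -- set of integer accepting states
    """
    # Group outgoing transitions by source state.
    out = {}
    for src, sym, dst in edges:
        out.setdefault(src, []).append((sym, dst))

    # Search node (i, q): i characters of s consumed, NFA in state q.
    best = {(0, start): 0}
    heap = [(0, 0, start)]  # (cost, i, q)
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > best.get((i, q), float("inf")):
            continue  # stale queue entry
        if i == len(s) and q in accepting:
            return d  # first goal popped is optimal (non-negative costs)

        moves = []
        if i < len(s):
            moves.append((d + 1, i + 1, q))  # delete s[i]
        for sym, r in out.get(q, []):
            if sym is None:
                moves.append((d, i, r))  # epsilon move, free
            else:
                moves.append((d + 1, i, r))  # insert sym
                if i < len(s):
                    # match (cost 0) or substitute (cost 1)
                    moves.append((d + (s[i] != sym), i + 1, r))

        for nd, ni, nr in moves:
            if nd < best.get((ni, nr), float("inf")):
                best[(ni, nr)] = nd
                heapq.heappush(heap, (nd, ni, nr))
    return None  # no accepting state is reachable


# NFA for the regular expression ab*c (states 0, 1, 2; state 2 accepts).
AB_STAR_C = [(0, "a", 1), (1, "b", 1), (1, "c", 2)]
```

With this NFA, `edit_distance_to_nfa("abbd", 0, {2}, AB_STAR_C)` evaluates the cost of repairing `abbd` against `ab*c`. Every search node in this sketch goes through the priority queue, which is the cost that the abstract's approach aims to avoid for edges that already satisfy the partial order.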