Analyzing and Experimenting Open Source OCR Engines in RPA with Levenshtein Distance Algorithm

Authors

  • Malathi T, Assistant Professor, Department of Computer Science & Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamilnadu
  • Diwaan Chandar C S, Second Year, Department of Information Science & Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamilnadu
  • Nithish S, Second Year, Department of Information Science & Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamilnadu
  • Niranjan V, Second Year, Department of Computer Technology, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamilnadu
  • Swashthika A K, Second Year, Department of Computer Technology, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamilnadu

DOI:

https://doi.org/10.47392/irjash.2020.269

Keywords:

Optical Character Recognition (OCR), Robotic Process Automation (RPA), Google Tesseract OCR, Microsoft OCR

Abstract

Robotic Process Automation (RPA) is a platform for automating boring, repetitive computer processes with software bots, so that humans can focus on tasks that require creativity and decision-making, which robots cannot perform. Optical Character Recognition (OCR) extracts printed characters from an image and converts them into text. Google Tesseract OCR and Microsoft OCR are the OCR engines commonly used in UiPath, a tool for Robotic Process Automation. In our previous research comparing these two open source OCR engines, we evaluated basic factors including speed, hardware requirements, and accuracy. However, accuracy in that study was calculated manually by substituting the scraped data into the formulas by hand, which produced results with less precision. In this research, we obtain more precise results by applying a string comparison algorithm, the Levenshtein Distance Algorithm, deployed in UiPath.
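The abstract does not state the exact formula used to turn a Levenshtein distance into an accuracy figure, and the UiPath workflow itself is not shown here. The following is a minimal Python sketch, assuming accuracy is taken as 1 minus the edit distance divided by the length of the longer string. This normalisation, the function names `levenshtein` and `ocr_accuracy`, and the sample strings are illustrative assumptions, not the authors' method or data from the study.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `source` into `target`."""
    # Classic dynamic programming, keeping only two rows of the table.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def ocr_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Hypothetical accuracy score: 1 - distance / length of the longer string.
    The distance never exceeds that length, so the score lies in [0, 1]."""
    longest = max(len(ground_truth), len(ocr_output))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    return 1.0 - levenshtein(ground_truth, ocr_output) / longest


if __name__ == "__main__":
    truth = "Invoice No: 4521"              # made-up reference text
    engine_a = "Invoice N0: 4521"           # one substitution ('o' -> '0')
    engine_b = "lnvoice No 4521"            # one substitution, one deletion
    print(ocr_accuracy(truth, engine_a))    # 0.9375
    print(ocr_accuracy(truth, engine_b))    # 0.875
```

Normalising by the longer string keeps the score bounded even when an engine outputs extra characters; dividing by the reference length alone (another common choice) can produce negative values in that case.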


Published

2020-12-01