Abstract
Determining how a document has changed from one version to another is not always a simple task. We encountered the problem of comparing two PDF documents that had been edited with different editing tools. When we compared these PDFs using existing comparison tools, the results were not satisfactory. After analysis, we found that if a document had been edited with any tool other than Acrobat (a non-native tool), the existing comparison tools were unable to detect the proper layout of the document (paragraphs, headers, footers, columns, tables, etc.) and therefore could not sequence its contents in the correct order, resulting in false comparison output. To overcome this problem, we applied recent developments in computer vision to detect the layout of the document. Using this layout information, the contents were arranged in the correct order and then compared, which resulted in better comparison output. Moreover, using AI for layout detection made the approach independent of how the document was created and edited. We built a complete framework that covers reading the information, detecting the layout, arranging the information, comparing it, and visualizing the differences. This framework can be applied to build a document comparison tool irrespective of document type.
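
As a rough illustration of the arrange-and-compare stages, the following Python sketch assumes that a layout detector has already produced labeled blocks with bounding-box coordinates. The `Block` type, the fixed `column_width` heuristic, and the function names are hypothetical and are not taken from the framework itself; the standard-library `difflib` module stands in for whatever comparison method the framework actually uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Block:
    """One detected layout region (hypothetical structure)."""
    kind: str   # layout label, e.g. "header", "para", "footer"
    x0: float   # left edge of the bounding box
    y0: float   # top edge of the bounding box
    text: str   # text content read from the region


def reading_order(blocks, column_width=300.0):
    """Order blocks as header, then body, then footer.

    Body blocks are sorted by column (left to right), then top to bottom.
    The fixed column width is an assumption made for this sketch.
    """
    rank = {"header": 0, "footer": 2}
    return sorted(
        blocks,
        key=lambda b: (rank.get(b.kind, 1), int(b.x0 // column_width), b.y0),
    )


def compare(old_blocks, new_blocks):
    """Return the non-equal diff operations between two block lists."""
    old = [b.text for b in reading_order(old_blocks)]
    new = [b.text for b in reading_order(new_blocks)]
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The same page as extracted from two tools, in different internal order.
old_page = [
    Block("header", 0, 0, "Title"),
    Block("para", 0, 50, "A"),
    Block("para", 320, 50, "B"),
    Block("footer", 0, 700, "p1"),
]
new_page = [
    Block("footer", 0, 700, "p1"),
    Block("para", 320, 50, "B2"),
    Block("header", 0, 0, "Title"),
    Block("para", 0, 50, "A"),
]
changes = compare(old_page, new_page)
```

Because both pages are put into reading order before the diff, the different extraction order does not produce spurious differences; only the edited paragraph is reported.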