Table OCR tool created with OpenCV and Tesseract-OCR
2020/10/23 · Categories: Python · Tags: Python, OpenCV, Tesseract-OCR
Update (2021/4/6): a revised version of this tool is available as [OCR Tool Rev1](https://ymt-lab.com/en/post/2021/ocr-tool-rev1/).
I built an OCR tool that uses OpenCV to detect the positions of table cells in a PNG image or PDF, and then runs Tesseract-OCR on each detected cell.
Open file
Click Open and select a file to load it. Supported formats are PNG and PDF.
Image display
Click a file name in the table to display its image. Scroll the mouse wheel to zoom in and out, and drag while holding down the wheel (middle) button to move the displayed area.
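Zooming and panning come down to a little geometry. This sketch assumes the view draws image point p at screen position offset + scale * p; `zoom_at` and `pan` are hypothetical helpers, not the tool's code. Zooming chooses a new offset so that the pixel under the cursor stays where it is.

```python
def zoom_at(scale, offset, cursor, wheel_steps, factor=1.25,
            min_scale=0.05, max_scale=20.0):
    """Return (new_scale, new_offset) after zooming by `factor` per wheel
    step, keeping the image point under the cursor fixed on screen.

    The view maps an image point p to the screen point offset + scale * p.
    """
    new_scale = min(max_scale, max(min_scale, scale * factor ** wheel_steps))
    # Image point currently under the cursor.
    px = (cursor[0] - offset[0]) / scale
    py = (cursor[1] - offset[1]) / scale
    # Pick the new offset so that the same image point lands on the cursor again.
    new_offset = (cursor[0] - new_scale * px, cursor[1] - new_scale * py)
    return new_scale, new_offset


def pan(offset_at_press, press_pos, current_pos):
    """Offset while dragging with the wheel (middle) button held down."""
    return (offset_at_press[0] + current_pos[0] - press_pos[0],
            offset_at_press[1] + current_pos[1] - press_pos[1])
```

If the viewer is built on Qt's `QGraphicsView`, setting its transformation anchor to `AnchorUnderMouse` gives the same zoom-under-cursor behaviour without doing the arithmetic by hand.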
Recognize cells in the table
Click Recognize to detect the table cells in the image; the detected cells are drawn over the image. Click an item such as Rectxx in the list on the right to highlight that cell's position with a red frame.
Divide the image into cells
Click Split by rects to split the image of the selected file into cells.
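Once the cell rectangles are known, splitting is plain NumPy slicing. A minimal sketch, assuming rectangles are stored as (x, y, w, h) in pixel coordinates; `split_by_rects` and its `padding` option are hypothetical.

```python
import numpy as np


def split_by_rects(image, rects, padding=0):
    """Crop one sub-image per (x, y, w, h) rectangle, in the order given.

    `padding` trims that many pixels from every side so leftover ruling
    lines are less likely to end up in the OCR input. Rectangles are
    clipped to the image; a rectangle with nothing left yields an empty
    crop so that indices still line up with `rects`.
    """
    image = np.asarray(image)  # accept any array-like, e.g. a PIL image
    height, width = image.shape[:2]
    crops = []
    for x, y, w, h in rects:
        x0, y0 = max(0, x + padding), max(0, y + padding)
        x1, y1 = min(width, x + w - padding), min(height, y + h - padding)
        if x1 <= x0 or y1 <= y0:
            crops.append(image[:0, :0].copy())
        else:
            # .copy() so each crop owns its pixels instead of viewing `image`
            crops.append(image[y0:y1, x0:x1].copy())
    return crops
```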
Manually specify the areas to split
To specify a split area by hand, click Draw rect and drag a rectangle on the image.
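A dragged rectangle has to be normalised before it can be used as a split area: the drag may go in any direction, and when the view is zoomed or panned the mouse positions are in screen coordinates rather than image pixels. This sketch assumes a view that draws image pixels at offset + scale * position; `screen_to_image` and `rect_from_drag` are hypothetical helpers.

```python
def screen_to_image(point, scale, offset):
    """Map a screen point back to image pixels for a view drawn at
    screen = offset + scale * image."""
    return ((point[0] - offset[0]) / scale, (point[1] - offset[1]) / scale)


def rect_from_drag(start, end, image_size, min_size=3):
    """Turn a drag (press point, release point) in image coordinates into an
    (x, y, w, h) rectangle clipped to an image of size (width, height).

    Returns None for drags too small to be a real area (e.g. a plain click).
    """
    img_w, img_h = image_size
    # Sort the corners so dragging up or to the left works too.
    x0, x1 = sorted((start[0], end[0]))
    y0, y1 = sorted((start[1], end[1]))
    x0, y0 = max(0, int(round(x0))), max(0, int(round(y0)))
    x1, y1 = min(img_w, int(round(x1))), min(img_h, int(round(y1)))
    if x1 - x0 < min_size or y1 - y0 < min_size:
        return None
    return (x0, y0, x1 - x0, y1 - y0)
```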
OCR the split images
Click OCR to run Tesseract-OCR on the cell images shown in the rectx columns of the file table; the recognized text is displayed in the corresponding rectx_text columns.
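A sketch of the OCR step using pytesseract, a Python wrapper around Tesseract-OCR. The post does not say which binding the tool uses, so treat this as an assumption; `ocr_cells` is a hypothetical helper, and its `engine` parameter lets it run without Tesseract installed (for example in tests).

```python
import numpy as np


def ocr_cells(cells, engine=None, lang="eng", config="--psm 6"):
    """OCR each cell image and return one stripped string per cell.

    `engine` is any callable taking an image and returning text. When it is
    None, pytesseract is used, which needs the Tesseract-OCR binary installed.
    """
    if engine is None:
        import pytesseract  # imported lazily so the sketch runs without it

        def engine(img):
            return pytesseract.image_to_string(img, lang=lang, config=config)

    texts = []
    for img in cells:
        if img is None or np.asarray(img).size == 0:
            texts.append("")  # empty or missing crop: nothing to read
        else:
            texts.append(engine(img).strip())
    return texts
```

`--psm 6` tells Tesseract to treat each crop as a single uniform block of text; `--psm 7` (a single text line) can work better for one-line cells. OpenCV images are BGR, so colour crops are best converted to grayscale or RGB before OCR, and a language other than English (e.g. `lang="jpn"`) needs the matching Tesseract language data installed.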
Source code
The source code is available on [GitHub](https://github.com/ymtlab/table_recognition_tool).