Table OCR tool created with OpenCV and Tesseract-OCR

2020/10/23 | Categories: Python | Tags: Python, OpenCV, Tesseract-OCR

Update (2021/4/6): a revised version is available as [OCR Tool Rev1](https://ymt-lab.com/en/post/2021/ocr-tool-rev1/).

I made a tool that uses OpenCV to locate the cells of a table in a PNG image or PDF, and then runs Tesseract-OCR on each recognized cell.

Open file

Click Open and select a file to load it. Supported formats are PNG and PDF.

Image display

Click a file name in the table to display its image. Turn the mouse wheel to zoom in and out, and hold down the mouse wheel while dragging to pan the view.
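Wheel zooming usually keeps the image point under the cursor fixed, while a middle-button drag just shifts the view. The sketch below shows that arithmetic with a hypothetical `ViewTransform` class that maps `screen = offset + scale * image`, assuming the common Qt convention of 120 delta units per wheel notch; the tool's actual GUI code may handle this differently.

```python
from dataclasses import dataclass


@dataclass
class ViewTransform:
    """Hypothetical view state: screen = offset + scale * image."""

    scale: float = 1.0
    offset_x: float = 0.0
    offset_y: float = 0.0

    def to_image(self, sx, sy):
        # Invert the mapping to find which image pixel is under a screen point.
        return ((sx - self.offset_x) / self.scale, (sy - self.offset_y) / self.scale)

    def zoom_at(self, sx, sy, wheel_delta, step=1.25):
        # Assumption: 120 delta units per wheel notch (the usual Qt convention).
        factor = step ** (wheel_delta / 120)
        # Move the offset so the image point under the cursor stays put on screen.
        self.offset_x = sx - (sx - self.offset_x) * factor
        self.offset_y = sy - (sy - self.offset_y) * factor
        self.scale *= factor

    def pan(self, dx, dy):
        # Middle-button drag: shift the view by the mouse movement in screen pixels.
        self.offset_x += dx
        self.offset_y += dy
```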

Recognize cells from table

Click Recognize to detect the table cells in the image and overlay them on it. Click a Rectxx entry on the right to highlight the selected cell with a red frame.

Divide the image into cells

Click Split by rects to split the image of the selected file into cells.
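Splitting amounts to cropping the image with each rectangle. A minimal NumPy sketch follows; the function name `split_by_rects` (echoing the button label) and the `margin` parameter, which trims leftover ruled-line pixels from the cell edges before OCR, are assumptions.

```python
import numpy as np


def split_by_rects(image, rects, margin=0):
    """Crop one sub-image per (x, y, w, h) rect; a positive margin trims that many pixels inward."""
    height, width = image.shape[:2]
    crops = []
    for x, y, w, h in rects:
        # Clip every edge to the image so rects that overrun it still work.
        x0, y0 = max(x + margin, 0), max(y + margin, 0)
        x1, y1 = min(x + w - margin, width), min(y + h - margin, height)
        # max() yields an empty crop instead of a reversed slice if the margin swallows the rect.
        crops.append(image[y0:max(y0, y1), x0:max(x0, x1)].copy())
    return crops
```

Copying each crop detaches it from the source array, so later preprocessing of one cell cannot modify the page image.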

Manually specify areas to split

To specify an area manually, click Draw rect and then drag over the image.
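A drag produces two corner points that can arrive in any order and may extend past the image. The hypothetical helper below, assuming both points have already been converted to image coordinates, normalizes them into an `(x, y, w, h)` rectangle in the same form as the recognized cells.

```python
def drag_to_rect(start, end, image_size):
    """Turn a drag (two image-space points) into an (x, y, w, h) rect clipped to the image."""
    width, height = image_size
    # Sorting the coordinates lets the user drag up or to the left as well.
    x0, x1 = sorted((start[0], end[0]))
    y0, y1 = sorted((start[1], end[1]))
    x0, x1 = (int(min(max(v, 0), width)) for v in (x0, x1))
    y0, y1 = (int(min(max(v, 0), height)) for v in (y0, y1))
    if x1 == x0 or y1 == y0:
        return None  # a click without a drag, or a drag entirely outside the image
    return (x0, y0, x1 - x0, y1 - y0)
```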

OCR the split images

Click OCR to run OCR on the cell images in the file's rectx columns; each result is displayed in the corresponding rectx_text column.

Source code

The source code is available on [GitHub](https://github.com/ymtlab/table_recognition_tool).
