Port of OCRopus 0.1.1 from Linux source to Visual Studio 2005 (V8)
An article from Oliv's wiki.
This how-to describes how to port OCRopus 0.1.1 from the Linux source to a Visual Studio 2005 (V8) solution. You can also download the OCRopus Visual Studio project directly from ???
SCOPE
OBJECTIVE
Describe, step by step, how to port OCRopus 0.1.1 from Linux to the Windows platform to obtain an executable file and a DLL, and explain how to use this DLL through its API.
Environment specification
- Microsoft Visual Studio 2005 8.0.
- Microsoft Windows XP SP2
Hardware specification
- Intel 32-bit single-core processor
Image input format
- PNM image
- PGM image
- JPEG and PNG images cause a runtime error (FIXME)
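Since only PNM and PGM input is known to work, it can help to check the file header before handing an image to the recognizer. The sketch below is only an illustration: `looks_like_pnm` is a hypothetical helper, not part of OCRopus, and it assumes the standard Netpbm magic numbers "P1" to "P6" at the start of the file.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not part of OCRopus): returns true when the file
// starts with a Netpbm magic number "P1".."P6" (PBM, PGM or PPM, in
// ASCII or raw form). It reads only the first two bytes of the file.
bool looks_like_pnm(const char *path)
{
    std::FILE *f = std::fopen(path, "rb");
    if (f == NULL)
        return false;                 // unreadable file: treat as unsupported

    char magic[2] = {0, 0};
    const std::size_t n = std::fread(magic, 1, 2, f);
    std::fclose(f);

    return n == 2 && magic[0] == 'P' && magic[1] >= '1' && magic[1] <= '6';
}
```

A caller could refuse the file, or convert it to PNM first, whenever this check returns false.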
Data output format
A vector of strings in hOCR format. The specification of this format is available at http://docs.google.com/View?docid=dfxcv4vc_67g844kf
APPLICABLE AND REFERENCE DOCUMENTS
Applicable documents
[A2] OCRopus website: http://code.google.com/p/ocropus/
The OCRopus project website. It provides the documentation and source code of OCRopus.
Reference documents
[R1] Brief documentation and files used for a previous port:
http://groups.google.com/group/ocropus/msg/a5c61374e62ac3f3
HOW TO COMPILE OCROPUS USING VISUAL STUDIO
CREATE WORKING DIRECTORIES
- Create OCRopus_project
- Create OCRopus_project\OCRopus
- Create OCRopus_project\source
- Create OCRopus_project\tesseract-2.01\lib
- Create OCRopus_project\libpng-1.2.8-lib
- Create OCRopus_project\zlib123-dll
- Create OCRopus_project\jpeglib
GENERATE THE TESSERACT LIBRARY
- Download tesseract-2.01.tar.gz and decompress it into the source directory
- Open the tesseract.dsp file with Visual Studio
- Modify all projects to use the standard Windows libraries
- Modify all projects to build as Multi-threaded Debug DLL
- Modify the properties of the tessdll project to compile it statically
- Build the tessdll project
- Copy the lib file (OCRopus_project\source\tesseract-2.01\bin.dbg\tessdll.lib) into OCRopus_project\tesseract-2.01\lib
INSTALL OCROPUS 0.1.1
- Download OCRopus 0.1.1 from http://code.google.com/p/ocropus/downloads/list
- Extract it to OCRopus_project\OCRopus
- Download OCROpus.zip from http://groups.google.com/group/ocropus/files and extract it to the same directory
- Modify the header file directories of the ocr-tesseract project to include all directories of the Tesseract source
- Modify all projects to use the standard Windows libraries
- Modify all projects to build as Multi-threaded Debug DLL
- Add the tessdata directory to OCRopus_project\OCRopus\Debug
INSTALL LIBPNG 1.2.8 DLL
- Download libpng1.2.8.exe from http://sourceforge.net/project/showfiles.php?group_id=23617&package_id=16183 into the source directory
- Install it to OCRopus_project\libpng-1.2.8-lib
- Copy libpng13.dll to OCRopus_project\OCRopus\debug
INSTALL ZLIB123 DLL
- Download zlib123-dll.zip
- Extract it to OCRopus_project\zlib123-dll
- Copy zlib1.dll to OCRopus_project\OCRopus\debug
INSTALL JPEGLIB
- Download jpeglib from http://gnuwin32.sourceforge.net/packages/jpeg.htm
- Install it to OCRopus_project\jpeglib
COMPILATION
Compilation instructions
- Open ocropus.sln with Visual Studio and compile
Possible compilation errors
Cannot create the ocrocmd command:
- Delete OCRopus_project\OCRopus\ocrocmd\debug
hardcoded_version_string or set_version_string error:
- Delete line 237 of file ocrocmd.cc ("set_version_string(hardcoded_version_string());")
tessdll.lib(dawg.obj) : error LNK2019: unresolved external symbol _ntohl@4 referenced in function "void __cdecl read_squished_dawg(char const *,unsigned __int64 *,int)"
- Add the ws2_32.lib library to the tessdll project in the tesseract solution (Visual Studio project options).
- Rebuild tessdll and copy the lib to the ocrodll project.
- Add the ws2_32.lib library to the ocrodll and ocrocmd projects in the ocropus solution.
- Rebuild the ocropus solution.
EXECUTION
Execution instructions
Run the ocrocmd.exe executable followed by the path of the PNM image.
Possible execution errors
The OCR analysis stops inexplicably just before it starts.
Check that the tessdata directory is present in the directory of the ocrocmd executable.
The result vector is empty.
Your project should be compiled as Multi-threaded Debug DLL and use the standard Windows libraries.
The ocropus_recognize_dll function runs correctly only once (read access error)
This bug occurs because the Tesseract library is not reinitialized on the second call, owing to the static variable “first_time” in the file “msmenus.cpp” of the tesseract project. To fix it, change the code so that this variable always stays true (line 69 of “msmenus.cpp”).
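To see why a static flag has this effect, here is a small generic illustration. It is not the actual code of msmenus.cpp: `init_engine`, `init_count` and both `recognize_*` functions are hypothetical names used only to show the pattern.

```cpp
// Hypothetical illustration of a "first_time" guard; the real code in
// msmenus.cpp is different.
static bool first_time = true;
static int init_count = 0;      // counts how often the setup ran

void init_engine() { ++init_count; }

// Pattern that causes the problem: the setup runs on the first call only.
void recognize_once_only()
{
    if (first_time) {
        init_engine();
        first_time = false;     // every later call skips the setup
    }
}

// Pattern after the fix: the flag is never cleared, so the setup
// runs again on each call.
void recognize_every_call()
{
    if (first_time) {
        init_engine();
        // first_time deliberately left at true
    }
}
```

A static variable keeps its value for the lifetime of the process, which is why the second call into the DLL finds the flag already cleared.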
CRT assert error
The error is caused by a value higher than 0xFF being passed to the isalpha function in dawg.cpp (tesseract solution). To fix it, add a test at line 144 of dawg.cpp that returns false unless the variable “dummy_word [char_index]” is between 0 (0x00) and 255 (0xFF).
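A range check of this kind can be sketched as follows, assuming the intent is to reject any value outside 0 to 0xFF before isalpha is called. `safe_isalpha` is a hypothetical wrapper, not a function of Tesseract; in dawg.cpp the test would be written inline around the existing call.

```cpp
#include <cctype>

// Hypothetical wrapper: isalpha() is only defined for values that fit in
// an unsigned char (or EOF), and the debug C runtime asserts on anything
// else, so out-of-range values are rejected before the call.
bool safe_isalpha(int c)
{
    if (c < 0 || c > 0xFF)
        return false;               // out of range: not a letter
    return std::isalpha(c) != 0;
}
```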
Other errors
Check the tesseract.log file for errors.
IMPLICIT DLL CREATION
- Create a new DLL project named ocrodll in the solution
- Copy the contents of ocrocmd.cc to ocrodll.cpp and delete the real_main and main functions
- Link the project dependencies of ocrodll using the solution properties
- Add the needed directories to the ocrodll project
- Add tessdll.lib to the linker inputs of the ocrodll project.
- Add the appropriate headers.
- Compile.
HOW TO USE THE OCROPUS DLL (OCRODLL)
INTEGRATION INTO ANOTHER PROJECT
Integrate it like any other DLL: link against the library (ocrodll.lib) and include the header file (ocrodll.h). Add ocrodll.dll and the tessdll directory to the executable directory.
API
Only one function is accessible:
//--------------------------------------------------------------
// ocropus_recognize
//
//
// Analyzes the file "complete_file_path" and stores the result in res_vec
// input_spec: char*: should be a valid path
//--------------------------------------------------------------
std::vector<std::string> ocropus_recognize(const char *complete_file_path, std::vector<std::string>& res_vec)
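A call from client code might look like the sketch below. It assumes only the signature shown above. `run_ocr` and the `recognize_fn` typedef are hypothetical names, and the recognizer is passed as a function pointer so that the sketch compiles without ocrodll.h.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pointer type matching the signature of ocropus_recognize given above.
typedef std::vector<std::string> (*recognize_fn)(const char *,
                                                 std::vector<std::string> &);

// Hypothetical wrapper: runs the recognizer on one file and returns the
// lines it stored in the output vector. Returns an empty vector when no
// recognizer or no path is given.
std::vector<std::string> run_ocr(recognize_fn recognize, const char *path)
{
    std::vector<std::string> lines;
    if (recognize == NULL || path == NULL)
        return lines;
    recognize(path, lines);         // result is stored in "lines"
    return lines;
}
```

In a project linked against ocrodll.lib, the call would be something like `run_ocr(&ocropus_recognize, "page.pnm")`, where the file name is only an example.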
RETURNED DATA FORMAT
The analysis result is formatted as a vector of strings. Each element of this vector is one line as it would originally be written to the result file, as described in [A2].
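Because hOCR is HTML-based markup, a caller that only needs the recognized text has to remove the tags from each element of the vector. Here is a minimal sketch, assuming simple tags with no '>' character inside attribute values. `strip_tags` and `plain_text` are hypothetical helpers, not part of the DLL, and HTML entities are left untouched.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: drops everything between '<' and '>' so that only
// the text content of one hOCR line remains.
std::string strip_tags(const std::string &line)
{
    std::string text;
    bool in_tag = false;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '<')
            in_tag = true;
        else if (c == '>')
            in_tag = false;
        else if (!in_tag)
            text += c;
    }
    return text;
}

// Hypothetical helper: converts the whole result vector to plain text,
// one output line per vector element.
std::string plain_text(const std::vector<std::string> &hocr_lines)
{
    std::string out;
    for (std::vector<std::string>::size_type i = 0; i < hocr_lines.size(); ++i) {
        out += strip_tags(hocr_lines[i]);
        out += '\n';
    }
    return out;
}
```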

