Elastic search and OCR
Revision as of 23:14, 28 November 2016 by old>Admin (→Understanding integrated search)
Understanding integrated search
The integrated full-text search using Elastic search takes an internal, active approach to indexing content. Content is added to an indexing queue every time it is updated. This keeps the index up to date, but consumes CPU resources on the indexing server.
Because file indexing is very CPU intensive, the file indexing functionality is separated into a service that can run on a server separate from the main application server. In either setup, the file indexer works from a database queue.
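The queue-driven flow described above can be illustrated with a minimal sketch. It is not the actual TS implementation: the table name `index_queue`, its columns, and the functions `create_queue`, `enqueue` and `process_next` are hypothetical, and SQLite stands in for whatever database the application really uses.

```python
import sqlite3


def create_queue(conn):
    # Hypothetical queue table: one row per content item waiting to be indexed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "content_id TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()


def enqueue(conn, content_id):
    # Called by the main application every time content is created or updated.
    conn.execute("INSERT INTO index_queue (content_id) VALUES (?)", (content_id,))
    conn.commit()


def process_next(conn, index_fn):
    # Called by the indexing service: take the oldest pending entry,
    # index it, and mark it as done. Returns None when the queue is empty.
    row = conn.execute(
        "SELECT id, content_id FROM index_queue "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    queue_id, content_id = row
    index_fn(content_id)  # e.g. extract text and send it to the search engine
    conn.execute("UPDATE index_queue SET status = 'done' WHERE id = ?", (queue_id,))
    conn.commit()
    return content_id
```

Because the application only writes rows and the indexer only reads and updates them, the CPU-heavy work can run on a different server as long as both can reach the same database.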
The basic search service requires
- TS file indexing service (queue handler)
- Elastic search server (search engine)
If PDF OCR functionality is needed, the following components must be installed as well
- Ghostscript (PDF to TIFF conversion)
- Tesseract (OCR library)
The above components for OCR must be installed on the file indexing server.
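As a rough illustration of how the two OCR components fit together, the sketch below builds the command lines for converting a PDF to TIFF with Ghostscript and then running Tesseract on the result. The function name `build_ocr_commands`, the file layout, the `tiffg4` output device, the 300 dpi resolution and the `eng` language are assumptions chosen for the example; how the TS file indexing service actually invokes these tools is not documented here.

```python
from pathlib import Path


def build_ocr_commands(pdf_path, work_dir, language="eng", dpi=300):
    # Hypothetical helper: returns the two commands without running them.
    pdf = Path(pdf_path)
    work = Path(work_dir)
    tiff = work / (pdf.stem + ".tif")   # intermediate image written by Ghostscript
    text_base = work / pdf.stem         # Tesseract appends ".txt" to this base name

    # Step 1: Ghostscript renders the PDF pages into a TIFF file.
    gs_cmd = [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiffg4", f"-r{dpi}",
        f"-sOutputFile={tiff}", str(pdf),
    ]

    # Step 2: Tesseract reads the TIFF and writes the recognized text.
    tesseract_cmd = ["tesseract", str(tiff), str(text_base), "-l", language]
    return gs_cmd, tesseract_cmd
```

On the file indexing server, each command could then be executed with `subprocess.run(cmd, check=True)`, and the resulting text file passed on to the search engine for indexing.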