Elastic search and OCR

From TempusServa wiki
Revision as of 23:08, 28 November 2016 by old>Admin

Elastic search overview

The integrated full-text search using Elastic search is an internal/active approach to indexing the content. Content is added to an indexing queue every time it is updated. This ensures that the index is always up to date, at the cost of some performance.

Because file indexing is very CPU intensive, the file indexing functionality is separated into a service that can run on a server separate from the main application server. In either case, the file indexer works from a database queue.
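
The queue-based design described above can be sketched as follows. This is only a minimal illustration, not the TempusServa implementation: the table name `index_queue`, its columns, and the functions `enqueue` and `process_pending` are hypothetical, and SQLite stands in for whatever database the application actually uses.

```python
import sqlite3


def create_queue(conn):
    # Hypothetical queue table: one row per record waiting to be indexed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " record_id INTEGER NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()


def enqueue(conn, record_id):
    # Called by the application whenever a record or file is updated.
    conn.execute("INSERT INTO index_queue (record_id) VALUES (?)", (record_id,))
    conn.commit()


def process_pending(conn, index_fn, batch_size=10):
    # Called by the indexing service, possibly on a separate server.
    # Fetches the oldest pending entries, indexes them, and marks them done.
    rows = conn.execute(
        "SELECT id, record_id FROM index_queue"
        " WHERE status = 'pending' ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for queue_id, record_id in rows:
        index_fn(record_id)
        conn.execute(
            "UPDATE index_queue SET status = 'done' WHERE id = ?", (queue_id,)
        )
    conn.commit()
    return len(rows)
```

Because the application only writes rows and the indexer only reads and updates them, the two sides can run on different machines as long as both can reach the database.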

If PDF OCR functionality is needed, the following components must also be installed:

  • Ghostscript (PDF to TIFF conversion)
  • Tesseract (OCR library)

The above components for OCR must be installed on the file indexing server.
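
The two-step OCR flow (Ghostscript renders the PDF to TIFF, Tesseract extracts the text) might look roughly like the sketch below. The function names, file names, resolution, and language are assumptions for illustration; the exact commands used by the file indexer are not documented here. The executable name `gs` is also an assumption (on Windows the Ghostscript console binary is typically `gswin64c`).

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path, tiff_path, gs_binary="gs", dpi=300):
    # Render all pages of the PDF into one black-and-white TIFF file.
    return [
        gs_binary,
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=tiffg4",
        f"-r{dpi}",
        f"-sOutputFile={tiff_path}",
        str(pdf_path),
    ]


def tesseract_command(tiff_path, output_base, language="eng"):
    # Tesseract writes the recognized text to "<output_base>.txt".
    return ["tesseract", str(tiff_path), str(output_base), "-l", language]


def ocr_pdf(pdf_path, work_dir):
    # Assumed file layout: intermediate files are placed in work_dir.
    work_dir = Path(work_dir)
    tiff_path = work_dir / "pages.tif"
    output_base = work_dir / "ocr"
    subprocess.run(ghostscript_command(pdf_path, tiff_path), check=True)
    subprocess.run(tesseract_command(tiff_path, output_base), check=True)
    return output_base.with_suffix(".txt").read_text(encoding="utf-8")
```

Calling `ocr_pdf("scan.pdf", "/tmp/ocr")` requires both tools to be available on the file indexing server's PATH and the work directory to exist.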