Difference between revisions of "Elastic search and OCR"
Revision as of 23:20, 28 November 2016
Understanding integrated search
The integrated full-text search using Elastic search takes an internal/active approach to indexing content. Content is added to an indexing queue every time it is updated, which keeps the index up to date at all times but consumes CPU resources on the indexing server.
Because file indexing is very CPU intensive, the file indexing functionality is separated into a service that can run on a server separate from the main application server. The file indexer works from a database queue, however, so in most cases separation is not strictly required.
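The queue-driven flow described above can be sketched as follows. This is only an illustration, not the TS file indexing service itself: the `index_queue` table, its columns, and the `index_document` callback are hypothetical names, and an in-memory SQLite database stands in for the real application database.

```python
import sqlite3


def make_queue():
    """Create an in-memory database with a hypothetical index_queue table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE index_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "doc_id TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    return conn


def enqueue(conn, doc_id, content):
    """Application side: record that a document changed and needs indexing."""
    conn.execute(
        "INSERT INTO index_queue (doc_id, content) VALUES (?, ?)",
        (doc_id, content),
    )
    conn.commit()


def process_queue(conn, index_document, batch_size=10):
    """Indexer side: index up to batch_size entries, oldest first.

    index_document is a placeholder for the CPU-heavy step that would
    extract text and send it to the search engine. Returns the number
    of entries processed.
    """
    rows = conn.execute(
        "SELECT id, doc_id, content FROM index_queue ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, doc_id, content in rows:
        index_document(doc_id, content)
        conn.execute("DELETE FROM index_queue WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)
```

Because the application only inserts rows and the indexer only reads and deletes them, the two sides can run on the same server or on different ones without changing either side.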
The basic search service requires:
- TS file indexing service (queue handler)
- Elastic search server (search engine)
For multitenant setups, a single TS file indexing service can serve multiple instances, as long as they write requests to the same queue (using DB views). The Elastic search server can also handle multiple applications.
If PDF OCR functionality is needed, the following components must also be installed:
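One possible way to let several instances write to the same queue through views is sketched below. It is an assumption about how such a setup could look, not a description of the actual schema: the shared `index_queue` table, the per-tenant `<tenant>_queue` views, and the tenant names are all hypothetical, and SQLite `INSTEAD OF` triggers are used only to make the example self-contained.

```python
import sqlite3


def build_shared_queue(tenants):
    """Create one shared queue table plus a writable view per tenant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE index_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "tenant TEXT NOT NULL, "
        "doc_id TEXT NOT NULL)"
    )
    for tenant in tenants:
        # Tenant names are interpolated into SQL, so restrict them to
        # plain identifiers.
        if not tenant.isidentifier():
            raise ValueError(f"invalid tenant name: {tenant!r}")
        view = f"{tenant}_queue"
        # Each tenant sees only its own rows through its view.
        conn.execute(
            f"CREATE VIEW {view} AS "
            f"SELECT doc_id FROM index_queue WHERE tenant = '{tenant}'"
        )
        # Inserts into the view are redirected to the shared table.
        conn.execute(
            f"CREATE TRIGGER {view}_insert INSTEAD OF INSERT ON {view} "
            f"BEGIN "
            f"INSERT INTO index_queue (tenant, doc_id) "
            f"VALUES ('{tenant}', NEW.doc_id); "
            f"END"
        )
    return conn


def pending(conn):
    """Return all queued requests as (tenant, doc_id), oldest first."""
    return conn.execute(
        "SELECT tenant, doc_id FROM index_queue ORDER BY id"
    ).fetchall()
```

With this arrangement each instance inserts into its own view, while a single indexing service reads the shared table and can tell from the `tenant` column which application a request belongs to.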
- Ghostscript (PDF to TIFF conversion)
- Tesseract (OCR library)
The above components for OCR must be installed on the file indexing server.
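As a rough sketch of how the two OCR components fit together, the helper below builds the command lines for converting a PDF to TIFF with Ghostscript and then running Tesseract on the result. The function name, the working directory layout, the `tiffg4` output device, the 300 dpi resolution, and the `eng` language are assumptions made for illustration; the executable names also vary by platform (for example `gswin64c` on 64-bit Windows), and the TS file indexing service may invoke the tools differently.

```python
from pathlib import Path


def ocr_commands(pdf_path, work_dir, lang="eng", dpi=300, gs_binary="gs"):
    """Build the Ghostscript and Tesseract command lines for one PDF.

    Returns (gs_cmd, tess_cmd) as argument lists. Nothing is executed
    here, so the function works even where the binaries are missing.
    """
    pdf = Path(pdf_path)
    tiff = Path(work_dir) / (pdf.stem + ".tif")
    text_base = Path(work_dir) / pdf.stem  # Tesseract appends ".txt"

    # Step 1: render all PDF pages into one multi-page TIFF.
    gs_cmd = [
        gs_binary,
        "-q",
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=tiffg4",
        f"-r{dpi}",
        f"-sOutputFile={tiff}",
        str(pdf),
    ]

    # Step 2: run OCR on the TIFF and write the recognized text.
    tess_cmd = ["tesseract", str(tiff), str(text_base), "-l", lang]
    return gs_cmd, tess_cmd
```

Each command could then be run in order with `subprocess.run(cmd, check=True)`, after which the recognized text would be in `<work_dir>/<name>.txt`, ready to be passed to the indexer.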
Setting up basic search service
Note that the Elastic search server can be installed on a separate server (neither the TS file indexing service nor the application server needs to be installed on it).
Install: TS file indexing service
Install: Elastic search server
Multi application setup
Adding OCR capability
Both OCR components must be installed on the same server as the TS file indexing service.