Elasticsearch
Introduction
Install
Fulltext search on Tomcat/Linux installations
To index records and files, complete the following steps:
- Install a standalone Elasticsearch server
- Install and configure Tempus Serva file indexing
- Configure the Tempus Serva installation
Install Elasticsearch
Java 8 / Elasticsearch 6
This is the recommended version but requires Java 8.
Follow this guide:
https://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html
Java 7 / Elasticsearch 1.7
This is an alternative version for servers running Java 7.
Download and unpack the files
sudo wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.6.tar.gz
tar -xvf elasticsearch-1.7.6.tar.gz
sudo rm elasticsearch-1.7.6.tar.gz
Run as a daemon
elasticsearch-1.7.6/bin/elasticsearch -d
Test that the service is running
curl 'http://localhost:9200/?pretty'
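The same check can be scripted, for example from a monitoring job. This is a minimal Python sketch, assuming Elasticsearch listens on the default http://localhost:9200 and that the root endpoint returns its usual JSON document with a version.number field. The function names are hypothetical helpers, not part of Tempus Serva.

```python
import json
import urllib.request


def elasticsearch_version(payload: str) -> str:
    """Return version.number from the JSON that Elasticsearch serves on GET /."""
    return json.loads(payload)["version"]["number"]


def check_elasticsearch(base_url: str = "http://localhost:9200", timeout: float = 5.0) -> str:
    """Query the root endpoint and return the reported version.

    Raises an exception (typically urllib.error.URLError) if the service is not reachable.
    """
    with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
        return elasticsearch_version(resp.read().decode("utf-8"))
```

On the server, check_elasticsearch() should return the installed version string, e.g. "1.7.6" for the Java 7 installation above.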
Install TS indexing service
Install war file
cd /usr/share/tomcat6/webapps/
sudo wget https://www.tempusserva.dk/install/tsFileIndexingService.war
A few seconds later, once Tomcat has deployed the WAR file, you can configure the data connection and the paths to the OCR libraries:
sudo nano /usr/share/tomcat6/conf/Catalina/localhost/tsFileIndexingService.xml
Restart the server after making changes
tstomcatrestart
Enable and test indexing in Tempus Serva
Set the following configuration values to true:
- fulltextIndexData
- fulltextIndexFile
Also add port 8080 to the URL in the following configuration:
- fulltextFileHandlerURL
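What "adding port 8080" means can be illustrated with a small Python helper. This is a sketch only: the example value of fulltextFileHandlerURL below is hypothetical, and the helper assumes the URL carries no user:password part.

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(url: str, port: int = 8080) -> str:
    """Return url with its port set (or replaced); path and query are kept.

    Assumes no user:password part. urlsplit lower-cases the host name,
    and IPv6 literals are re-bracketed.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: urlsplit strips the brackets
        host = f"[{host}]"
    return urlunsplit(parts._replace(netloc=f"{host}:{port}"))


# Hypothetical current value of fulltextFileHandlerURL
print(with_port("http://myserver/tsFileIndexingService/"))
# http://myserver:8080/tsFileIndexingService/
```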
Update any record in the TS installation.
Check that the index has been created and that it contains a mapping for the solution:
curl 'http://localhost:9200/tempusserva/?pretty'
Next, validate that records are found when searching (replace * with a valid search string):
curl 'http://localhost:9200/tempusserva/_search?pretty&q=*'
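The search check can also be scripted. The Python sketch below assumes the index name tempusserva used above and Elasticsearch on localhost:9200. It URL-encodes the search string, so terms with spaces or special characters work, and reads the hit count, which Elasticsearch 1.x and 6.x report as a plain number and 7.x and later as an object with a value field. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def search_url(term: str, base_url: str = "http://localhost:9200", index: str = "tempusserva") -> str:
    """Build the _search URL with the term URL-encoded as the q parameter."""
    query = urllib.parse.urlencode({"q": term, "pretty": "true"})
    return f"{base_url.rstrip('/')}/{index}/_search?{query}"


def total_hits(payload: str) -> int:
    """Read hits.total from a search response (number in 1.x/6.x, object in 7.x+)."""
    total = json.loads(payload)["hits"]["total"]
    return int(total["value"]) if isinstance(total, dict) else int(total)


def count_hits(term: str, timeout: float = 10.0) -> int:
    """Run the search against the default index and return the number of matches."""
    with urllib.request.urlopen(search_url(term), timeout=timeout) as resp:
        return total_hits(resp.read().decode("utf-8"))
```

Calling count_hits with a word from the record you just updated should return a number greater than zero.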
Finally, validate that the Tempus Serva wrapper also works:
http://<server>/TempusServa/fulltextsearch?subtype=4&term=*
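When testing the wrapper from a script, the search term should be URL-encoded in the same way. A minimal sketch: the plain http scheme and the server name are assumptions, and the helper only checks the HTTP status because the format of the wrapper's response is not described here.

```python
import urllib.parse
import urllib.request


def wrapper_search_url(server: str, term: str, subtype: int = 4) -> str:
    """Build the Tempus Serva full-text search URL with an encoded term."""
    query = urllib.parse.urlencode({"subtype": subtype, "term": term})
    return f"http://{server}/TempusServa/fulltextsearch?{query}"


def wrapper_responds(server: str, term: str, timeout: float = 10.0) -> bool:
    """Return True if the wrapper answers with HTTP 200.

    HTTP error statuses raise urllib.error.HTTPError instead.
    """
    with urllib.request.urlopen(wrapper_search_url(server, term), timeout=timeout) as resp:
        return resp.status == 200
```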
Optional OCR components
Some libraries must be installed (Ghostscript is probably already installed):
sudo yum install ImageMagick
sudo yum install ghostscript
Also install Tesseract:
CentOS/Fedora
sudo yum install tesseract-ocr
Amazon linux
sudo yum --enablerepo=epel --disablerepo=amzn-main install libwebp
sudo yum --enablerepo=epel --disablerepo=amzn-main install tesseract
Afterwards, change the configuration of the file indexer:
sudo nano /usr/share/tomcat6/conf/Catalina/localhost/tsFileIndexingService.xml
The values should be
- /usr/bin/tesseract
- /usr/bin/convert
- /usr/bin/ghostscript
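Before restarting, it can help to confirm that these paths point at executables. A minimal Python sketch: the dictionary keys are only labels, not the parameter names used in tsFileIndexingService.xml. The Ghostscript executable is commonly installed as gs, so the sketch also looks up candidate names on the PATH with shutil.which to show where the installed binaries actually live.

```python
import os
import shutil

# Paths from the list above; the keys are illustrative labels only.
OCR_PATHS = {
    "tesseract": "/usr/bin/tesseract",
    "convert": "/usr/bin/convert",
    "ghostscript": "/usr/bin/ghostscript",
}

# Executable names to look up on PATH (assumption: Ghostscript may be installed as "gs").
CANDIDATES = {
    "tesseract": ["tesseract"],
    "convert": ["convert"],
    "ghostscript": ["ghostscript", "gs"],
}


def is_executable(path: str) -> bool:
    """True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def check_ocr_paths(paths=OCR_PATHS, candidates=CANDIDATES) -> dict:
    """Map each tool to (configured path OK?, first match found on PATH or None)."""
    report = {}
    for name, path in paths.items():
        found = next((p for p in map(shutil.which, candidates.get(name, [])) if p), None)
        report[name] = (is_executable(path), found)
    return report


for tool, (ok, found) in check_ocr_paths().items():
    print(f"{tool}: configured path {'OK' if ok else 'MISSING'}; on PATH: {found}")
```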
After changing the values, restart the server.
Troubleshooting
If you are unsure whether the indexer has been executed, run it manually by opening:
<server>/tsFileIndexingService/execute
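The same request can be issued from a script, for example to start indexing after a bulk change. A minimal sketch: the http scheme is an assumption, and server should include the Tomcat port (for example myserver:8080) when the service is not reachable on port 80. The helper names are hypothetical.

```python
import urllib.request


def execute_url(server: str) -> str:
    """URL that triggers a run of the file indexing service."""
    return f"http://{server}/tsFileIndexingService/execute"


def trigger_indexer(server: str, timeout: float = 120.0) -> int:
    """Request the execute URL and return the HTTP status code.

    The timeout is generous on the assumption that the request may not
    return until the run has finished; adjust as needed.
    """
    with urllib.request.urlopen(execute_url(server), timeout=timeout) as resp:
        return resp.status
```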
Note that the configuration file (context.xml) contains a switch that can disable file cleanup on the server:
<Parameter name="DisableFileCleanup" value=""/>
Reindexing files
Before reindexing starts, you may clean up the index (this is optional):
DELETE FROM lucenedatastore WHERE FieldID > 0;
To reindex, execute the statement below, substituting the following parameters:
- schema of the database (example: "tslive")
- file table of the solution (example: "data_solution_file")
INSERT INTO lucenefilequeue (application,tablename,FileID) SELECT 'tslive', 'data_solution_file', f.ID as FileID FROM data_solution_file as f WHERE f.IsDeleted = 0;
After executing the statement, execute the indexing service and wait patiently while the queued files are indexed.
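When several solutions need reindexing, the INSERT statement can be generated per file table. The Python sketch below is demonstrated with sqlite3 so it runs stand-alone; against your actual database the driver's placeholder style may differ (for example %s instead of ?). Table names cannot be bound as query parameters, so the hypothetical helper validates the file table name before putting it into the SQL, while the two string values are passed as parameters. The demo tables mimic only the columns the statement uses.

```python
import re
import sqlite3

# Accept only plain identifiers such as data_solution_file, since the table
# name is interpolated into the SQL text rather than bound as a parameter.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def reindex_statement(file_table: str) -> str:
    """Build the INSERT ... SELECT that queues every non-deleted file of a solution."""
    if not _IDENTIFIER.fullmatch(file_table):
        raise ValueError(f"unexpected table name: {file_table!r}")
    return (
        "INSERT INTO lucenefilequeue (application, tablename, FileID) "
        f"SELECT ?, ?, f.ID AS FileID FROM {file_table} AS f WHERE f.IsDeleted = 0"
    )


def queue_reindex(conn: sqlite3.Connection, schema: str, file_table: str) -> None:
    """Queue all files of file_table for indexing under the given schema name."""
    conn.execute(reindex_statement(file_table), (schema, file_table))
    conn.commit()


# Stand-alone demo with throwaway in-memory tables.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE data_solution_file (ID INTEGER PRIMARY KEY, IsDeleted INTEGER)")
demo.execute("CREATE TABLE lucenefilequeue (application TEXT, tablename TEXT, FileID INTEGER)")
demo.executemany("INSERT INTO data_solution_file (ID, IsDeleted) VALUES (?, ?)", [(1, 0), (2, 1), (3, 0)])
queue_reindex(demo, "tslive", "data_solution_file")
print(demo.execute("SELECT * FROM lucenefilequeue ORDER BY FileID").fetchall())
# [('tslive', 'data_solution_file', 1), ('tslive', 'data_solution_file', 3)]
```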
Controlling timeouts
<Parameter name="TimeoutTesseract" value="600"/>
<Parameter name="TimeoutGhostscript" value="60"/>
<Parameter name="SuppressCommandOutput" value="0"/>
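These parameters presumably sit in the indexer's context file alongside the other settings. To review the current values, the file can be read with a short Python sketch. The units of the timeout values are not stated here, so the sketch only converts them to integers; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

TIMEOUT_NAMES = ("TimeoutTesseract", "TimeoutGhostscript")


def read_parameters(xml_text: str) -> dict:
    """Collect every <Parameter name=... value=...> element into a dict."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.iter("Parameter")}


def read_timeouts(xml_text: str) -> dict:
    """Return the timeout parameters that are present, as integers."""
    params = read_parameters(xml_text)
    return {name: int(params[name]) for name in TIMEOUT_NAMES if name in params}


sample = """<Context>
  <Parameter name="TimeoutTesseract" value="600"/>
  <Parameter name="TimeoutGhostscript" value="60"/>
  <Parameter name="SuppressCommandOutput" value="0"/>
</Context>"""
print(read_timeouts(sample))  # {'TimeoutTesseract': 600, 'TimeoutGhostscript': 60}
```

On the server, pass the contents of /usr/share/tomcat6/conf/Catalina/localhost/tsFileIndexingService.xml instead of the sample.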