Difference between revisions of "Constellio search/indexing"

From TempusServa wiki



Revision as of 14:31, 24 June 2014


Constellio search / indexing

Activate the search servlet in your installation

The search servlet is deactivated by default.

  1. Edit the <tomcat>/webapps/<Tempus Serva>/WEB-INF/web.xml
  2. Remove comments from the search servlet
  3. Remove comments from the search filter

If you are using web container security, remove it from the search servlet. The servlet filter handles authentication of crawler robots using a specialized form of basic authentication (normal users are redirected to the main servlet instead).


Option: Create a user for crawling

You will need at least one user for crawling the content in Tempus Serva, possibly more if content restrictions apply to different search user groups. All groups and policies will be respected during indexing.

Prepare Constellio

Install Constellio

  1. Download the 1.3 installer
  2. Run the installer by double-clicking the .jar file
  3. Install to a MySQL database
  4. Run Start constellio

Setting up a connector

Before setting up a connector, create or choose a valid search scope.

  1. Choose connector type: auth-http-conector
  2. Ensure that Use security is checked
  3. Set start URL to: http://<server name>/TempusServa/search
  4. Include the same URL in include patterns
  5. Enter username for the crawler user (a valid TS user)
  6. Enter password for the crawler user (a valid TS user)
  7. After submitting the new connector, crawling/indexing will start by itself
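
Before submitting the connector it can be useful to check the crawler credentials by hand. The following is a minimal sketch, assuming the servlet filter accepts the standard HTTP Basic scheme; the "specialized form" mentioned earlier may differ, in which case the header will need adjusting. The function name, server name, and credentials are hypothetical placeholders.

```python
import base64
import urllib.request


def build_crawler_request(server_name: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the search servlet start URL."""
    # Start URL as given in the connector steps; server_name replaces <server name>.
    url = f"http://{server_name}/TempusServa/search"
    # Standard HTTP Basic scheme: base64 of "username:password".
    # Assumption: the servlet filter accepts this plain form.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical values; replace with a real server and a valid TS user.
    req = build_crawler_request("tempus.example.com", "crawler", "secret")
    print(req.full_url)
    print(req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen sends it. Since normal users are redirected to the main servlet, comparing the final URL of the response with the start URL gives a rough indication of whether the request was treated as a crawler.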

Option: Tweak search results

The search servlet will automatically deliver content in a crude form, without any extra HTML such as wrappers. It will also tell the crawler when the content was last updated, and the document Title will be set to the current record's Resume value.

You might consider excluding the command=list pages for better (less redundant) search results.
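
In Constellio the exclusion itself would be configured on the connector, and the exact pattern syntax is not covered here. To preview which crawled URLs such an exclusion would affect, a small sketch like the following can be used. It assumes that list pages are identified by a command=list query parameter; the function names and example URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def is_list_page(url: str) -> bool:
    """Return True if the URL's query string contains command=list."""
    query = urlsplit(url).query
    return "list" in parse_qs(query).get("command", [])


def drop_list_pages(urls):
    """Keep only the URLs that are not command=list pages."""
    return [u for u in urls if not is_list_page(u)]


if __name__ == "__main__":
    # Hypothetical URLs; only the command=list parameter comes from the text above.
    crawled = [
        "http://tempus.example.com/TempusServa/search?command=list",
        "http://tempus.example.com/TempusServa/search?command=other&id=7",
        "http://tempus.example.com/TempusServa/search",
    ]
    for url in drop_list_pages(crawled):
        print(url)
```

Parsing the query string rather than searching for the raw text avoids false matches on values that merely start with "list".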