Unicode Enabling a Natural Language Search Engine: Ask Jeeves Answers
John Lowe - Ask Jeeves, Inc. & Tammy Young - Basis Technology Corporation
Intended Audience: Software Engineer
Session Level: Intermediate
Ask Jeeves' search technology combines natural-language processing
techniques with an editorial staff to provide end-users with a focused,
highly relevant search experience. After opening their first European site
in March 2000, Ask Jeeves International consulted Basis Technology for help
in readying their product suite for Asian markets. Since one of the
explicit criteria for this development was to enable the same code base to
support multiple languages, the logical choice for them was Unicode-enabled
multilingual executables. Along the way the teams faced several challenges.
These included processing URLs in a variety of encodings, and upgrading the
existing Jeeves tokenizer to handle Unicode characters. In addition to
these basic internationalization tasks, Ask Jeeves and Basis worked together
to identify appropriate dictionaries, handle Japanese morphology and
orthography, build a demonstrable Japanese knowledge base, and test this
complex software. The newly established Ask Jeeves Japan KK is poised for a
successful launch of the Unicode-enabled Japanese Answers software in early
2001.