| Creating custom break iterators for ICU (International Components for Unicode)
| Edward Batutis - Batutis Internationalization Consulting
| Intended Audience: | Software Engineers |  
| Session Level: | Intermediate, Advanced |  This paper will discuss creating custom break iterators for International Components 
for Unicode (ICU), a popular internationalization toolkit. ICU for Java and ICU for C/C++ 
provide break iterators for character, word, and line breaking. These iterators 
are useful for parsing text: for example, extracting words for a search engine or 
implementing a word-wrap feature in a text editor. The break iterators supplied are 
sufficient for many purposes, but some implementors may wish to use their own customized 
iterators. This paper will first discuss the default break iterators supplied by ICU for 
Java and C/C++ and how they are implemented. Next, the paper will cover how the existing 
iterators can be extended or replaced to meet an application-specific requirement.
 |
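
As a rough illustration of the word-extraction use case mentioned above, the following sketch walks the boundaries reported by a default word break iterator and keeps only the segments that contain a letter or digit. It is an assumption-laden example, not code from the paper: it uses the JDK's `java.text.BreakIterator` so that it runs without extra libraries, and `WordExtractor` and `extractWords` are hypothetical names chosen for this sketch. ICU for Java offers a similarly shaped `BreakIterator` class in the `com.ibm.icu.text` package, but its boundary results may differ from the JDK's.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; uses the JDK break iterator,
// not ICU4J, so that the example is self-contained.
class WordExtractor {

    // Returns the word-like segments of the text, skipping segments that
    // consist only of whitespace or punctuation.
    static List<String> extractWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> words = new ArrayList<>();
        int start = it.first();
        // Each call to next() returns the following boundary, or DONE at the end.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Keep the segment only if it contains at least one letter or digit.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords("Break iterators find word boundaries.", Locale.ENGLISH));
    }
}
```

The filtering step is a design choice of this sketch: a word break iterator reports every boundary, including those around spaces and punctuation, so the caller decides which segments count as words. An application with different requirements, such as treating hyphenated terms or product codes as single words, is where a customized iterator becomes relevant.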