Simple tokenizer Homepage, Documentation and Downloads – SQLite fts5 Plugin Supporting Chinese and Pinyin Search – Development details

simple is a sqlite3 fts5 extension that supports Chinese and Pinyin. It fully implements Solution 4 from the article "Solution to the problem of multi-phonetic characters in full-text search on WeChat mobile terminal", and supports Chinese and Pinyin search simply and efficiently.

Implementation-related introduction:

On this basis, more precise phrase matching is also supported through cppjieba; see the introduction article.


  1. The simple tokenizer supports Chinese and Pinyin word segmentation; a switch controls whether Pinyin support is enabled.
  2. simple_query() automatically assembles the match query, so the user does not need to learn the fts5 query syntax.
  3. simple_highlight() highlights matched words. It is similar to the highlight() function that comes with sqlite, but it groups consecutive matched words into a single highlighted span, which is usually closer to what users want.
  4. simple_highlight_pos() returns the positions of the matched words, and users can decide how to use them.
  5. simple_snippet() extracts a fragment around the match. It is similar to the snippet() function that comes with sqlite, with the same enhancement of grouping consecutive matched words together.
  6. jieba_query() applies jieba word segmentation to the query, giving more accurate matching while the index remains unchanged. The jieba segmentation feature can be turned off by passing -DSIMPLE_WITH_JIEBA=OFF at build time (#35).
  7. jieba_dict() specifies the directory of the dict. It only needs to be called once, and must be called before jieba_query().
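
The grouping behaviour described for simple_highlight() and simple_snippet() can be illustrated with a small sketch. This is not the extension's source code: the function names group_consecutive and highlight_grouped are hypothetical, and treating each Chinese character as one token with known match positions is an assumption made only for illustration.

```python
from typing import List, Sequence, Tuple


def group_consecutive(positions: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge token positions into inclusive (start, end) runs of adjacent positions."""
    groups: List[Tuple[int, int]] = []
    for pos in sorted(set(positions)):
        if groups and pos == groups[-1][1] + 1:
            # Adjacent to the previous run: extend it instead of starting a new one.
            groups[-1] = (groups[-1][0], pos)
        else:
            groups.append((pos, pos))
    return groups


def highlight_grouped(tokens: Sequence[str], matched: Sequence[int],
                      open_tag: str = "[", close_tag: str = "]") -> str:
    """Wrap every run of consecutive matched tokens in ONE pair of tags."""
    runs = group_consecutive(matched)
    starts = {start for start, _ in runs}
    ends = {end for _, end in runs}
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(open_tag)
        out.append(tok)
        if i in ends:
            out.append(close_tag)
    return "".join(out)


# Assumed example: one token per character, with tokens 0, 1, 6, 7, 8 matched.
tokens = list("中华人民共和国国歌")
print(highlight_grouped(tokens, [0, 1, 6, 7, 8]))  # [中华]人民共和[国国歌]
```

Tagging each matched token separately would instead produce [中][华]人民共和[国][国][歌]; merging adjacent positions first is what yields one span per run of consecutive matches.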
