SHOGUN 4.1.0
NGramTokenizer.h
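/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 3 of the License, or
 * (at your option) any later version.
 *
 * Written (W) 2013 Evangelos Anagnostopoulos
 * Copyright (C) 2013 Evangelos Anagnostopoulos
 */

#ifndef _NGRAMTOKENIZER__H__
#define _NGRAMTOKENIZER__H__

#include <shogun/lib/config.h>

#include <shogun/lib/Tokenizer.h>

namespace shogun
{
template <class T> class SGVector;

/** @brief The class CNGramTokenizer is used to tokenize
 * a SGVector<char> into n-grams.
 */
class CNGramTokenizer : public CTokenizer
{

public:
    /** Constructor
     *
     * @param ns the size n of the n-grams (default 3)
     */
    CNGramTokenizer(int32_t ns=3);

    /** Copy constructor
     *
     * @param orig the tokenizer to copy
     */
    CNGramTokenizer(const CNGramTokenizer& orig);

    /** Destructor */
    virtual ~CNGramTokenizer() {}

    /** Set the text that is to be tokenized
     *
     * @param txt the text to tokenize
     */
    virtual void set_text(SGVector<char> txt);

    /** @return whether another n-gram remains in the text */
    virtual bool has_next();

    /** Locate the next n-gram in the text
     *
     * @param start set to the index where the next n-gram starts
     * @return the index where the next n-gram ends
     */
    virtual index_t next_token_idx(index_t& start);

    /** @return the name of this class */
    virtual const char* get_name() const;

    /** @return a newly allocated copy of this tokenizer */
    virtual CNGramTokenizer* get_copy();

private:
    /** Initialize the members */
    void init();

protected:

    /** the n of the n-grams */
    int32_t n;

};
}
#endif /* _NGRAMTOKENIZER__H__ */
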
The class CNGramTokenizer is used to tokenize a SGVector into n-grams.
index_t is a typedef for int32_t (defined in common.h:62).
The class CTokenizer (defined in Tokenizer.h:29) acts as a base class in order to implement tokenizers. Sub-classes must implemen...
All classes and functions are contained in the shogun namespace (defined in class_list.h:18).

SHOGUN Machine Learning Toolbox - Project Documentation