SHOGUN
v2.0.0
CInputParser is a templated class that manages the reading, parsing, and delivery of examples.
Parsing is done in a thread separate from the learner.
Note that parsing is not done directly by this class but by the Streaming*File classes. This class only calls the required get_vector* functions on the StreamingFile object; exactly which function is called is set through the set_read_vector* functions.
The template type should be the type of feature vector the parser returns. For example, CInputParser<float32_t> expects a float32_t* vector to be returned from the get_vector function. The other values returned are the length of the feature vector and, if applicable, the label.
If the vectors cannot be represented directly as, say, float32_t*, one can instantiate, for example, CInputParser<VwExample>. It then looks for a get_vector function that returns a VwExample, which may contain any kind of data, including label, vector, weights, etc. It is then up to the external algorithm to handle such objects.
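The dispatch described above relies on C++ pointers to member functions. The following is a minimal sketch, not Shogun's implementation: MockStreamingFile, MiniParser, and read_one() are hypothetical names, float and double stand in for float32_t and float64_t, and choosing between the two readers with a labelled/unlabelled flag is an assumption modelled on example_type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CStreamingFile that serves one fixed vector.
struct MockStreamingFile {
    float data[3] = {1.0f, 2.0f, 3.0f};

    void get_vector(float*& vec, int32_t& len) {
        vec = data;
        len = 3;
    }
    void get_vector_and_label(float*& vec, int32_t& len, double& label) {
        get_vector(vec, len);
        label = -1.0;
    }
};

enum ExampleType { LABELLED, UNLABELLED };

template <class T>
class MiniParser {
public:
    // Pointer-to-member types mirroring read_vector / read_vector_and_label.
    using ReadVec = void (MockStreamingFile::*)(T*&, int32_t&);
    using ReadVecLabel = void (MockStreamingFile::*)(T*&, int32_t&, double&);

    MiniParser(MockStreamingFile* src, bool is_labelled)
        : source(src), example_type(is_labelled ? LABELLED : UNLABELLED) {}

    void set_read_vector(ReadVec f) { read_vector = f; }
    void set_read_vector_and_label(ReadVecLabel f) { read_vector_and_label = f; }

    // Reads one example through whichever member function matches example_type.
    void read_one(T*& vec, int32_t& len, double& label) {
        if (example_type == LABELLED) {
            assert(read_vector_and_label != nullptr);
            (source->*read_vector_and_label)(vec, len, label);
        } else {
            assert(read_vector != nullptr);
            (source->*read_vector)(vec, len);  // label is left untouched
        }
    }

private:
    MockStreamingFile* source;
    ExampleType example_type;
    ReadVec read_vector = nullptr;
    ReadVecLabel read_vector_and_label = nullptr;
};
```

The parser never parses anything itself; it only remembers which member of the file object to call, which is why the actual parsing logic can live entirely in the Streaming*File classes.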
The parser should first be started with a call to start_parser(), which starts a new thread for continuous parsing of examples.
Parsed examples are stored in the CParseBuffer object, which in its current implementation is a ring of a specified number of examples. CInputParser is responsible for keeping this ring filled with newly parsed examples.
CInputParser mainly provides the get_next_example() function, which returns the next example from the CParseBuffer object to the caller (usually a StreamingFeatures object). When the caller is done with the example, it should call finalize_example(), which frees the slot so that a new example can be loaded into it.
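The ring's hand-off rule (a slot can be refilled only after finalize_example() releases it) can be illustrated with a small single-threaded sketch. MiniRing, MiniExample, and the slot states are hypothetical; the real CParseBuffer is shared between the parse thread and the caller and is guarded by examples_state_lock, which this sketch omits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-slot states for the sketch.
enum SlotState { EMPTY, READY, IN_USE };

struct MiniExample {
    float* fv = nullptr;
    int32_t length = 0;
    double label = 0.0;
};

// Single-threaded ring: the writer fills slots in order, the reader takes
// them in the same order, and a slot is only writable again once finalized.
class MiniRing {
public:
    explicit MiniRing(int32_t capacity)
        : slots(capacity), states(capacity, EMPTY) {}

    // Returns false while the next write position is still occupied.
    bool copy_example_into_buffer(const MiniExample& ex) {
        if (states[write_pos] != EMPTY) return false;
        slots[write_pos] = ex;
        states[write_pos] = READY;
        write_pos = (write_pos + 1) % size();
        return true;
    }

    // Returns nullptr if the next example has not been written yet
    // (or has already been handed out and not finalized).
    MiniExample* retrieve_example() {
        if (states[read_pos] != READY) return nullptr;
        states[read_pos] = IN_USE;
        return &slots[read_pos];
    }

    // Releases the example handed out by retrieve_example().
    void finalize_example() {
        assert(states[read_pos] == IN_USE);
        states[read_pos] = EMPTY;
        read_pos = (read_pos + 1) % size();
    }

private:
    int32_t size() const { return static_cast<int32_t>(slots.size()); }

    std::vector<MiniExample> slots;
    std::vector<SlotState> states;
    int32_t write_pos = 0;
    int32_t read_pos = 0;
};
```

Because the writer and reader advance through the same positions in order, examples come out in the order they were parsed, and a slow consumer stalls the writer instead of having unread examples overwritten.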
The parsing thread should be joined with a call to end_parser(). exit_parser() may be used to cancel the parse thread if needed.
Options are provided for automatically SG_FREEing example objects after each finalize_example() call and on CInputParser destruction. They are set through the set_free_vector* functions. Do not free vectors on finalize_example() if you intend to reuse the same vector memory locations for different examples. Do not free vectors on destruction if you intend to free them manually later.
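The two ownership options can be pictured as flags on the buffer. In this hypothetical sketch, SlotOwner, free_vector(), and the g_frees counter are illustrative names, and new[]/delete[] stand in for Shogun's allocation and SG_FREE(). Turning off free-after-release keeps the pointer in its slot so the memory can be reused; the destructor flag decides whether anything still held is released.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical counter so the sketch can show when memory is released.
static int g_frees = 0;

// Stand-in for SG_FREE(): releases the vector and clears the pointer.
static void free_vector(float*& v) {
    if (v != nullptr) {
        delete[] v;
        v = nullptr;
        ++g_frees;
    }
}

// Buffer slots plus the two ownership flags described above.
struct SlotOwner {
    bool free_after_release = true;  // cf. set_free_vector_after_release()
    bool free_on_destruct = true;    // cf. set_free_vectors_on_destruct()
    std::vector<float*> slots;

    explicit SlotOwner(int32_t n) : slots(n, nullptr) {}
    SlotOwner(const SlotOwner&) = delete;             // avoid double frees
    SlotOwner& operator=(const SlotOwner&) = delete;

    // Called when the consumer is done with slot i (cf. finalize_example()).
    void finalize(int32_t i) {
        assert(i >= 0 && i < static_cast<int32_t>(slots.size()));
        if (free_after_release) free_vector(slots[i]);
        // Otherwise the pointer stays, so the memory can be refilled later.
    }

    ~SlotOwner() {
        if (free_on_destruct)
            for (float*& v : slots) free_vector(v);
    }
};
```

If both flags are turned off, nothing is released automatically, which is only correct when the caller frees the vectors itself.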
Definition at line 80 of file InputParser.h.
Public Member Functions
  CInputParser()
  ~CInputParser()
  void init(CStreamingFile *input_file, bool is_labelled=true, int32_t size=PARSER_DEFAULT_BUFFSIZE)
  bool is_running()
  int32_t get_number_of_features()
  void set_read_vector(void (CStreamingFile::*func_ptr)(T *&vec, int32_t &len))
  void set_read_vector_and_label(void (CStreamingFile::*func_ptr)(T *&vec, int32_t &len, float64_t &label))
  int32_t get_vector_and_label(T *&feature_vector, int32_t &length, float64_t &label)
  int32_t get_vector_only(T *&feature_vector, int32_t &length)
  void set_free_vector_after_release(bool free_vec)
  void set_free_vectors_on_destruct(bool destroy)
  void start_parser()
  void *main_parse_loop(void *params)
  void copy_example_into_buffer(Example<T> *ex)
  Example<T> *retrieve_example()
  int32_t get_next_example(T *&feature_vector, int32_t &length, float64_t &label)
  int32_t get_next_example(T *&feature_vector, int32_t &length)
  void finalize_example()
  void end_parser()
  void exit_parser()
  int32_t get_ring_size()
Public Attributes
  bool parsing_done
  bool reading_done
  E_EXAMPLE_TYPE example_type
Protected Attributes
  void (CStreamingFile::*read_vector)(T *&vec, int32_t &len)
  void (CStreamingFile::*read_vector_and_label)(T *&vec, int32_t &len, float64_t &label)
  CStreamingFile *input_source
      Input source, a CStreamingFile object.
  pthread_t parse_thread
      Thread in which the parser runs.
  CParseBuffer<T> *examples_ring
      The ring of examples, stored as they are parsed.
  int32_t number_of_features
      Number of features in the dataset (maximum number of features seen up to the point of access).
  int32_t number_of_vectors_parsed
      Number of vectors parsed.
  int32_t number_of_vectors_read
      Number of vectors used by the external algorithm.
  Example<T> *current_example
      Example currently being used.
  T *current_feature_vector
      Feature vector of the current example.
  float64_t current_label
      Label of the current example.
  int32_t current_len
      Number of features in the current example.
  bool free_after_release
      Whether to SG_FREE() the vector after it is used.
  int32_t ring_size
      Size of the ring of examples.
  pthread_mutex_t examples_state_lock
      Mutex used when getting/setting the state of examples (whether a new example is ready).
  pthread_cond_t examples_state_changed
      Condition variable that signals a change in the state of examples.
CInputParser()
Constructor.
Definition at line 370 of file InputParser.h.
~CInputParser()
Destructor.
Definition at line 376 of file InputParser.h.
void copy_example_into_buffer(Example<T> *ex)
Copies an example into the buffer.
Parameters:
  ex: Example to be copied.
Definition at line 496 of file InputParser.h.
void end_parser()
Ends the parser, waiting for the parse thread to complete.
Definition at line 648 of file InputParser.h.
void exit_parser()
Terminates the parsing thread.
Definition at line 653 of file InputParser.h.
void finalize_example()
Finalize the current example, indicating that the buffer position it occupies may be overwritten by the parser.
Should be called when the example has been processed by the external algorithm.
Definition at line 643 of file InputParser.h.
int32_t get_next_example(T *&feature_vector, int32_t &length, float64_t &label)
Gets the next example, assuming it is labelled.
Waits until retrieve_example() returns a valid example, or returns immediately if reading is already done.
Parameters:
  feature_vector: Feature vector pointer
  length: Length of feature vector
  label: Label of example
Definition at line 585 of file InputParser.h.
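The blocking behaviour of get_next_example() is a classic condition-variable wait. This sketch uses std::mutex and std::condition_variable in place of the pthread mutex and condition variable (examples_state_lock, examples_state_changed), and a std::deque in place of the fixed-size ring. BlockingSource and run_demo() are hypothetical, and returning 1 for an example and 0 once input is exhausted is an assumption, since this page does not document the return value.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>

// Producer/consumer sketch; only labels are passed, for brevity.
class BlockingSource {
public:
    // Producer side: publish one parsed example.
    void push(double label) {
        {
            std::lock_guard<std::mutex> lock(m);
            ready.push_back(label);
        }
        changed.notify_one();
    }
    // Producer side: mark the input as fully parsed.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(m);
            parsing_done = true;
        }
        changed.notify_one();
    }

    // Consumer side: blocks until an example is ready or parsing is done.
    // Returns 1 with an example, 0 when nothing is left (assumed convention).
    int32_t get_next_example(double& label) {
        std::unique_lock<std::mutex> lock(m);
        changed.wait(lock, [this] { return !ready.empty() || parsing_done; });
        if (ready.empty()) return 0;
        label = ready.front();
        ready.pop_front();
        return 1;
    }

private:
    std::mutex m;
    std::condition_variable changed;
    std::deque<double> ready;
    bool parsing_done = false;
};

// Runs a producer thread, drains it, and returns the sum of consumed labels.
double run_demo(int n) {
    BlockingSource src;
    std::thread parser([&src, n] {
        for (int i = 1; i <= n; ++i) src.push(static_cast<double>(i));
        src.finish();
    });
    double label = 0.0, sum = 0.0;
    while (src.get_next_example(label)) sum += label;
    parser.join();  // analogous to end_parser()
    return sum;
}
```

The predicate form of wait() re-checks the condition after every wakeup, so spurious wakeups and notifications sent before the consumer started waiting are both handled.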
int32_t get_next_example(T *&feature_vector, int32_t &length)
Gets the next example, assuming it is unlabelled.
Parameters:
  feature_vector: Feature vector pointer
  length: Length of feature vector
Definition at line 635 of file InputParser.h.
int32_t get_number_of_features()
Gets the number of features from an example. Currently reads the first line of input to infer it.
Definition at line 122 of file InputParser.h.
int32_t get_ring_size()
Returns the size of the examples ring.
Definition at line 277 of file InputParser.h.
int32_t get_vector_and_label(T *&feature_vector, int32_t &length, float64_t &label)
Gets the feature vector, length, and label, setting their values by reference. Uses the CStreamingFile reading method set through set_read_vector_and_label().
Parameters:
  feature_vector: Pointer to feature vector
  length: Number of features in vector
  label: Label of example
Definition at line 465 of file InputParser.h.
int32_t get_vector_only(T *&feature_vector, int32_t &length)
Gets the feature vector and length by reference, assuming examples are unlabelled. Uses the CStreamingFile reading method set through set_read_vector().
Parameters:
  feature_vector: Pointer to feature vector
  length: Number of features in vector
Definition at line 481 of file InputParser.h.
void init(CStreamingFile *input_file, bool is_labelled = true, int32_t size = PARSER_DEFAULT_BUFFSIZE)
Initializer.
Sets initial or default values for members. is_example_used is initialized to EMPTY. example_type is LABELLED by default.
Parameters:
  input_file: CStreamingFile object
  is_labelled: Whether examples are labelled (optional)
  size: Size of the buffer in number of examples
Definition at line 387 of file InputParser.h.
bool is_running()
Tests whether the parser is running.
Definition at line 446 of file InputParser.h.
void *main_parse_loop(void *params)
Main parsing loop. Reads examples from the source and stores them in the buffer.
Parameters:
  params: The 'this' object
Definition at line 501 of file InputParser.h.
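main_parse_loop() receives the 'this' object as its void* argument, which matches the usual way of running a member function in a POSIX thread: pthread_create() accepts only a plain void *(*)(void *) function, so a static trampoline casts the argument back to the object. The sketch below is hypothetical (MiniThreadedParser and entry_point are illustrative names) and shows only the pattern, not Shogun's actual start_parser().

```cpp
#include <cassert>
#include <cstdint>
#include <pthread.h>

class MiniThreadedParser {
public:
    // Starts main_parse_loop() in a new thread (cf. start_parser()).
    void start_parser() {
        int rc = pthread_create(&parse_thread, nullptr,
                                &MiniThreadedParser::entry_point, this);
        assert(rc == 0);
        (void)rc;  // silence unused-variable warnings when NDEBUG is set
    }

    // Waits for the parse thread to finish (cf. end_parser()).
    void end_parser() { pthread_join(parse_thread, nullptr); }

    // params is the same object as this; kept only to mirror the signature.
    void* main_parse_loop(void* params) {
        MiniThreadedParser* self = static_cast<MiniThreadedParser*>(params);
        for (int32_t i = 0; i < 5; ++i)
            self->number_of_vectors_parsed++;  // stand-in for parsing work
        return nullptr;
    }

    int32_t number_of_vectors_parsed = 0;

private:
    // A static member function has the plain function type pthread_create()
    // expects; it forwards to the member loop on the right object.
    static void* entry_point(void* params) {
        return static_cast<MiniThreadedParser*>(params)->main_parse_loop(params);
    }

    pthread_t parse_thread{};
};
```

Reading number_of_vectors_parsed after end_parser() is safe because pthread_join() guarantees the thread's writes are visible to the joining thread.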
Example<T> *retrieve_example()
Retrieves the next example from the buffer.
Definition at line 555 of file InputParser.h.
void set_free_vector_after_release(bool free_vec)
Sets whether to explicitly SG_FREE() the vector after it has been used.
Parameters:
  free_vec: Whether to SG_FREE() the vector
Definition at line 415 of file InputParser.h.
void set_free_vectors_on_destruct(bool destroy)
Sets whether to free all vectors allocated in the ring when the ring is destroyed.
Parameters:
  destroy: Whether to free all vectors on destruction
Definition at line 421 of file InputParser.h.
void set_read_vector(void (CStreamingFile::*func_ptr)(T *&vec, int32_t &len))
Sets the function used for reading a vector from the file.
The function must be a member of CStreamingFile that takes a T*& for the vector and an int32_t& for the length, setting both by reference, and returns void.
The argument is a pointer to that member function.
Definition at line 356 of file InputParser.h.
void set_read_vector_and_label(void (CStreamingFile::*func_ptr)(T *&vec, int32_t &len, float64_t &label))
Sets the function used for reading a vector and label from the file.
The function must be a member of CStreamingFile that takes a T*& for the vector, an int32_t& for the length, and a float64_t& for the label, setting all three by reference, and returns void.
The argument is a pointer to that member function.
Definition at line 363 of file InputParser.h.
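Because the parameter of set_read_vector() has a fully specified pointer-to-member type, passing the address of an overloaded member function selects the matching overload automatically. The sketch assumes a hypothetical OverloadedFile with one get_vector() per element type; whether CStreamingFile is organised exactly this way is not stated on this page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical file class whose reader is overloaded per element type.
struct OverloadedFile {
    float fbuf[2] = {0.5f, 1.5f};
    double dbuf[3] = {2.0, 4.0, 6.0};

    void get_vector(float*& vec, int32_t& len) { vec = fbuf; len = 2; }
    void get_vector(double*& vec, int32_t& len) { vec = dbuf; len = 3; }
};

template <class T>
struct ReaderSlot {
    void (OverloadedFile::*read_vector)(T*&, int32_t&) = nullptr;

    // The parameter's type picks the get_vector overload whose signature matches T.
    void set_read_vector(void (OverloadedFile::*func_ptr)(T*&, int32_t&)) {
        read_vector = func_ptr;
    }

    // Calls the stored member function and returns the vector length.
    int32_t read(OverloadedFile& f, T*& vec) {
        assert(read_vector != nullptr);
        int32_t len = 0;
        (f.*read_vector)(vec, len);
        return len;
    }
};
```

The same expression, &OverloadedFile::get_vector, resolves to the float overload for ReaderSlot<float> and to the double overload for ReaderSlot<double>, so no explicit cast is needed.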
void start_parser()
Starts the parser by creating a new thread.
main_parse_loop() is the parsing method run in that thread.
Definition at line 427 of file InputParser.h.
Example<T> *current_example [protected]
Example currently being used.
Definition at line 331 of file InputParser.h.
T *current_feature_vector [protected]
Feature vector of current example.
Definition at line 334 of file InputParser.h.
float64_t current_label [protected]
Label of current example.
Definition at line 337 of file InputParser.h.
int32_t current_len [protected]
Number of features in current example.
Definition at line 340 of file InputParser.h.
E_EXAMPLE_TYPE example_type
Either LABELLED or UNLABELLED.
Definition at line 293 of file InputParser.h.
CParseBuffer<T> *examples_ring [protected]
The ring of examples, stored as they are parsed.
Definition at line 319 of file InputParser.h.
pthread_cond_t examples_state_changed [protected]
Condition variable to indicate change of state of examples.
Definition at line 352 of file InputParser.h.
pthread_mutex_t examples_state_lock [protected]
Mutex used when getting/setting the state of examples (whether a new example is ready).
Definition at line 349 of file InputParser.h.
bool free_after_release [protected]
Whether to SG_FREE() vector after it is used.
Definition at line 343 of file InputParser.h.
CStreamingFile *input_source [protected]
Input source, CStreamingFile object.
Definition at line 313 of file InputParser.h.
int32_t number_of_features [protected]
Number of features in the dataset (maximum number of features seen up to the point of access).
Definition at line 322 of file InputParser.h.
int32_t number_of_vectors_parsed [protected]
Number of vectors parsed.
Definition at line 325 of file InputParser.h.
int32_t number_of_vectors_read [protected]
Number of vectors used by the external algorithm.
Definition at line 328 of file InputParser.h.
pthread_t parse_thread [protected]
Thread in which the parser runs.
Definition at line 316 of file InputParser.h.
bool parsing_done
True if all input has been parsed.
Definition at line 290 of file InputParser.h.
void (CStreamingFile::*read_vector)(T *&vec, int32_t &len) [protected]
Pointer to the member function used to read a vector from the input.
It is called when reading a vector.
Definition at line 302 of file InputParser.h.
void (CStreamingFile::*read_vector_and_label)(T *&vec, int32_t &len, float64_t &label) [protected]
Pointer to the member function used to read a vector and label from the input.
It is called when reading a vector and a label.
Definition at line 310 of file InputParser.h.
bool reading_done
True if all examples have been fetched.
Definition at line 291 of file InputParser.h.
int32_t ring_size [protected]
Size of the ring of examples.
Definition at line 346 of file InputParser.h.