jaulib v1.3.6
Jau Support Library (C++, Java, ..)
A lexical analyzer (tokenizer) using a tabular finite-state machine (FSM), known in German as an endlicher Automat (EA).
#include <token_fsm.hpp>
Classes | |
struct | result_t |
Result type for token_fsm::find(). | |
struct | token_value_t |
Terminal token name and ASCII string value pair, provided by user. | |
Public Types | |
typedef State_type | uint_t |
Unsigned int symbol for token-value type. | |
Public Member Functions | |
token_fsm (alphabet alphabet, const std::string_view separators="\040\011\012\015") | |
Constructs an empty instance. | |
token_fsm (const alphabet &alphabet, const std::vector< token_value_t > &key_words, const std::string_view separators="\040\011\012\015") | |
Constructs a new instance w/ given token_value_t name and value pairs. | |
token_fsm (const token_fsm &src) noexcept=default | |
token_fsm (token_fsm &&src) noexcept=default | |
bool | add (const token_value_t &tkey_word) |
Adds given token_value_t name and value pair. | |
void | clear () noexcept |
Clears the FSM. | |
bool | contains (uint_t token_name) const noexcept |
Returns true if this FSM contains the given token name. | |
size_t | count () const noexcept |
Returns the number of contained tokens. | |
bool | empty () const noexcept |
result_t | find (const std::string_view &haystack, int start=0) noexcept |
Find a token within the given haystack, starting from given start position. | |
std::string | fsm_to_string (const int token_per_row) const noexcept |
uint_t | get (const std::string_view &word) noexcept |
Returns the token numerical name (terminal symbol) if found, otherwise token_error. | |
bool | is_separator (const char c) const noexcept |
Returns true if the given char is listed as a separator. | |
uint_t | next_state () const noexcept |
token_fsm & | operator= (const token_fsm &x) noexcept=default |
token_fsm & | operator= (token_fsm &&x) noexcept=default |
uint_t | state_count () const noexcept |
std::string | to_string () const noexcept |
Static Public Member Functions | |
static constexpr uint_t | to_symbol (char c) noexcept |
Static Public Attributes | |
static constexpr const uint_t | token_error = std::numeric_limits<uint_t>::max() |
token_error value, denoting an invalid token or alphabet code-point. | |
A lexical analyzer (tokenizer) using a tabular finite-state machine (FSM), known in German as an endlicher Automat (EA).
Initially implemented by Sven Gothel in July 1992 using early C++, and since brought to a clean C++17 template.
State_type | used for token name and internal FSM, hence memory sensitive. Must be an unsigned integral type with minimum size of sizeof(alphabet::code_point_t), i.e. uint16_t. |
Definition at line 217 of file token_fsm.hpp.
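The State_type requirement above can be expressed as a compile-time check. The following is a minimal sketch, not code from jaulib: `code_point_t` is a hypothetical stand-in for `alphabet::code_point_t` (assumed here to be `uint16_t`, as the description suggests), and `is_valid_state_type_v` is an illustrative name.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for alphabet::code_point_t (assumed to be uint16_t).
using code_point_t = std::uint16_t;

// True if State_type is an unsigned integral type at least as wide as a code point.
template <typename State_type>
constexpr bool is_valid_state_type_v =
    std::is_integral_v<State_type> &&
    std::is_unsigned_v<State_type> &&
    sizeof(State_type) >= sizeof(code_point_t);

static_assert(is_valid_state_type_v<std::uint16_t>);   // minimum size
static_assert(is_valid_state_type_v<std::uint32_t>);   // wider is fine, but costs memory
static_assert(!is_valid_state_type_v<std::uint8_t>);   // too narrow
static_assert(!is_valid_state_type_v<std::int32_t>);   // signed is rejected
```

Because the state type is used for every cell of the FSM table, choosing the narrowest type that satisfies the check keeps the table small.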
typedef State_type jau::lang::token_fsm< State_type >::uint_t |
Unsigned int symbol for token-value type.
Definition at line 222 of file token_fsm.hpp.
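jau::lang::token_fsm< State_type >::token_fsm (const token_fsm &src) noexcept=default |
default, noexcept |
jau::lang::token_fsm< State_type >::token_fsm (token_fsm &&src) noexcept=default |
default, noexcept |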
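jau::lang::token_fsm< State_type >::token_fsm (alphabet alphabet, const std::string_view separators="\040\011\012\015") |
inline |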
Constructs an empty instance.
alphabet | the used alphabet |
separators | separator, defaults to SPACE, TAB, LF, CR |
Definition at line 323 of file token_fsm.hpp.
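The default separator string is written with octal escapes. As a small illustration (the names `default_separators` and `is_listed` are hypothetical, not part of jaulib), the escapes can be checked against the characters named above, assuming an ASCII execution character set:

```cpp
#include <string_view>

// The default separators as given in the constructor signature.
constexpr std::string_view default_separators = "\040\011\012\015";

static_assert(default_separators.size() == 4);
static_assert(default_separators[0] == ' ');   // \040 = SPACE
static_assert(default_separators[1] == '\t');  // \011 = TAB
static_assert(default_separators[2] == '\n');  // \012 = LF
static_assert(default_separators[3] == '\r');  // \015 = CR

// Sketch of a separator test: true if c occurs in the separator list.
constexpr bool is_listed(std::string_view separators, char c) noexcept {
    return separators.find(c) != std::string_view::npos;
}
```

A caller who wants, say, commas to split tokens as well would pass a longer separator string instead of relying on the default.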
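jau::lang::token_fsm< State_type >::token_fsm (const alphabet &alphabet, const std::vector< token_value_t > &key_words, const std::string_view separators="\040\011\012\015") |
inline |
Constructs a new instance w/ given token_value_t name and value pairs.
In case of an error, the method calls clear() and aborts; the user can validate the result via empty().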
Reasons for failures could be
alphabet | the used alphabet |
key_words | vector of to be added token_value_t name and values |
separators | separator, defaults to SPACE, TAB, LF, CR |
Definition at line 347 of file token_fsm.hpp.
static constexpr uint_t jau::lang::token_fsm< State_type >::to_symbol (char c) noexcept |
inline, static, constexpr, noexcept |
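token_fsm & jau::lang::token_fsm< State_type >::operator= (const token_fsm &x) noexcept=default |
default, noexcept |
token_fsm & jau::lang::token_fsm< State_type >::operator= (token_fsm &&x) noexcept=default |
default, noexcept |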
|
inlinenoexcept |
|
inlinenoexcept |
|
inlinenoexcept |
Definition at line 272 of file token_fsm.hpp.
bool jau::lang::token_fsm< State_type >::contains (uint_t token_name) const noexcept |
inline, noexcept |
Returns true if this FSM contains the given token name.
Definition at line 275 of file token_fsm.hpp.
size_t jau::lang::token_fsm< State_type >::count () const noexcept |
inline, noexcept |
Returns the number of contained tokens.
Definition at line 280 of file token_fsm.hpp.
bool jau::lang::token_fsm< State_type >::is_separator (const char c) const noexcept |
inline, noexcept |
Returns true if the given char is listed as a separator.
Definition at line 283 of file token_fsm.hpp.
void jau::lang::token_fsm< State_type >::clear () noexcept |
inline, noexcept |
Clears the FSM.
Afterwards, the FSM can be refilled from scratch.
Definition at line 311 of file token_fsm.hpp.
bool jau::lang::token_fsm< State_type >::add (const token_value_t &tkey_word) |
inline |
Adds given token_value_t name and value pair.
In case of an error, the method calls clear() and aborts; the user can validate the result via empty().
Reasons for failures could be
tkey_word | the given token name and value pair |
Definition at line 378 of file token_fsm.hpp.
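result_t jau::lang::token_fsm< State_type >::find (const std::string_view &haystack, int start=0) noexcept |
inline, noexcept |
Find a token within the given haystack, starting from given start position.
This method reads over all characters until a token has been found or the end of the view is reached.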
This method considers given separators.
haystack | string view to search for tokens |
start | start position, allowing to reuse the view |
Definition at line 469 of file token_fsm.hpp.
uint_t jau::lang::token_fsm< State_type >::get (const std::string_view &word) noexcept |
inline, noexcept |
Returns the token numerical name (terminal symbol) if found, otherwise token_error.
This method does not consider given separators and expects given word to match a token 1:1.
word | the key word to lookup |
Definition at line 524 of file token_fsm.hpp.
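To make the difference between find() and get() concrete, here is a minimal, self-contained sketch of a tabular FSM tokenizer. It is not jaulib's implementation: the class `tiny_fsm`, its `result_t` fields, the fixed lowercase `a`..`z` alphabet, and the rule that a token must be delimited by separators are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>
#include <vector>

// Hypothetical sketch: one table row per state, one column per alphabet symbol.
class tiny_fsm {
public:
    using uint_t = std::uint16_t;
    static constexpr uint_t token_error = std::numeric_limits<uint_t>::max();
    static constexpr std::string_view separators = "\040\011\012\015";
    struct result_t { uint_t token; std::size_t begin; std::size_t length; };

    tiny_fsm() { add_row(); }  // row 0 is the start state

    static constexpr uint_t to_symbol(char c) noexcept {
        return (c >= 'a' && c <= 'z') ? uint_t(c - 'a') : token_error;
    }
    static bool is_separator(char c) noexcept {
        return separators.find(c) != std::string_view::npos;
    }

    // Adds a key word; returns false on an invalid name, symbol, or duplicate word.
    bool add(uint_t name, std::string_view word) {
        if (word.empty() || name == token_error) return false;
        for (char c : word) if (to_symbol(c) == token_error) return false;
        uint_t state = 0;
        for (char c : word) {
            const uint_t sym = to_symbol(c);
            uint_t next = cell(state, sym);
            if (next == no_state) {      // no transition yet: append a new state
                next = add_row();
                cell(state, sym) = next;
            }
            state = next;
        }
        if (accept_[state] != token_error) return false;  // word already present
        accept_[state] = name;
        return true;
    }

    // Exact 1:1 lookup, ignoring separators.
    uint_t get(std::string_view word) const noexcept {
        uint_t state = 0;
        for (char c : word) {
            const uint_t sym = to_symbol(c);
            if (sym == token_error) return token_error;
            state = cell(state, sym);
            if (state == no_state) return token_error;
        }
        return accept_[state];
    }

    // Scans separator-delimited words from start until one matches a token.
    result_t find(std::string_view hay, std::size_t start = 0) const noexcept {
        std::size_t i = start;
        while (i < hay.size()) {
            while (i < hay.size() && is_separator(hay[i])) ++i;  // skip separators
            const std::size_t begin = i;
            uint_t state = 0;
            bool dead = false;  // set once the current word left the table
            for (; i < hay.size() && !is_separator(hay[i]); ++i) {
                if (dead) continue;
                const uint_t sym = to_symbol(hay[i]);
                const uint_t next = (sym == token_error) ? no_state : cell(state, sym);
                if (next == no_state) dead = true; else state = next;
            }
            if (!dead && i > begin && accept_[state] != token_error)
                return { accept_[state], begin, i - begin };
        }
        return { token_error, hay.size(), 0 };
    }

private:
    static constexpr std::size_t row_len = 26;  // symbols 'a'..'z'
    static constexpr uint_t no_state = 0;       // state 0 is never a transition target
    uint_t& cell(uint_t s, uint_t sym) { return matrix_[std::size_t(s) * row_len + sym]; }
    uint_t cell(uint_t s, uint_t sym) const { return matrix_[std::size_t(s) * row_len + sym]; }
    uint_t add_row() {  // appends an empty state row and returns its index
        matrix_.insert(matrix_.end(), row_len, no_state);
        accept_.push_back(token_error);
        return static_cast<uint_t>(accept_.size() - 1);
    }
    std::vector<uint_t> matrix_;  // state transition table
    std::vector<uint_t> accept_;  // token name per state, or token_error
};
```

In this sketch, `get("one")` walks the table for exactly that word, while `find("xx  one on", 0)` skips the unknown word `xx` and the separators before reporting the first token. Passing `begin + length` of one result as the next `start` allows the same view to be reused for the following token.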
|
inlinenoexcept |
Definition at line 567 of file token_fsm.hpp.
|
inlinenoexcept |
Definition at line 593 of file token_fsm.hpp.
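constexpr const uint_t jau::lang::token_fsm< State_type >::token_error = std::numeric_limits<uint_t>::max() |
static, constexpr |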
token_error value, denoting an invalid token or alphabet code-point.
Definition at line 227 of file token_fsm.hpp.
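Since token_error is the maximum value of uint_t, that value is reserved and cannot serve as a regular token name. A short sketch of the resulting sentinel, using the hypothetical names `token_error_v` and `is_usable_token_name`:

```cpp
#include <cstdint>
#include <limits>

// Sentinel as documented above: the largest value of the chosen unsigned type.
template <typename uint_t>
constexpr uint_t token_error_v = std::numeric_limits<uint_t>::max();

// Hypothetical helper: a token name is usable if it differs from the sentinel.
template <typename uint_t>
constexpr bool is_usable_token_name(uint_t name) noexcept {
    return name != token_error_v<uint_t>;
}

static_assert(token_error_v<std::uint16_t> == 65535u);
static_assert(is_usable_token_name<std::uint16_t>(42));
static_assert(!is_usable_token_name<std::uint16_t>(65535));
```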