jaulib v1.3.6
Jau Support Library (C++, Java, ..)
jau::lang::token_fsm< State_type > Class Template Reference

A lexical analyzer (tokenizer) using a tabular finite-state-machine (FSM), aka endlicher automat (EA).

#include <token_fsm.hpp>


Classes

struct  result_t
 Result type for token_fsm::find()
 
struct  token_value_t
 Terminal token name and ASCII string value pair, provided by user.
 

Public Types

typedef State_type uint_t
 Unsigned int symbol for token-value type.
 

Public Member Functions

 token_fsm (alphabet alphabet, const std::string_view separators="\040\011\012\015")
 Constructs an empty instance.
 
 token_fsm (const alphabet &alphabet, const std::vector< token_value_t > &key_words, const std::string_view separators="\040\011\012\015")
 Constructs a new instance w/ given token_value_t name and value pairs.
 
 token_fsm (const token_fsm &src) noexcept=default
 
 token_fsm (token_fsm &&src) noexcept=default
 
bool add (const token_value_t &tkey_word)
 Adds given token_value_t name and value pair.
 
void clear () noexcept
 Clears the FSM.
 
bool contains (uint_t token_name) const noexcept
 Returns true if this FSM contains the given token name.
 
size_t count () const noexcept
 Returns the number of contained tokens.
 
bool empty () const noexcept
 
result_t find (const std::string_view &haystack, int start=0) noexcept
 Find a token within the given haystack, starting from given start position.
 
std::string fsm_to_string (const int token_per_row) const noexcept
 
uint_t get (const std::string_view &word) noexcept
 Returns the token numerical name (terminal symbol) if found, otherwise token_error.
 
bool is_separator (const char c) const noexcept
 Returns true if the given char is listed as a separator.
 
uint_t next_state () const noexcept
 
token_fsm & operator= (const token_fsm &x) noexcept=default
 
token_fsm & operator= (token_fsm &&x) noexcept=default
 
uint_t state_count () const noexcept
 
std::string to_string () const noexcept
 

Static Public Member Functions

static constexpr uint_t to_symbol (char c) noexcept
 

Static Public Attributes

static constexpr const uint_t token_error = std::numeric_limits<uint_t>::max()
 token_error value, denoting an invalid token or alphabet code-point.
 

Detailed Description

template<typename State_type>
class jau::lang::token_fsm< State_type >

A lexical analyzer (tokenizer) using a tabular finite-state-machine (FSM), aka endlicher automat (EA).

Initially implemented by Sven Gothel in July 1992 using early C++, and later brought to a clean C++17 template.

Template Parameters
State_type  Used for the token name and the internal FSM, hence memory sensitive. Must be an unsigned integral type with a minimum size of sizeof(alphabet::code_point_t), i.e. uint16_t.

Definition at line 217 of file token_fsm.hpp.
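
To illustrate what a tabular FSM tokenizer does, here is a minimal self-contained sketch. It is not jaulib's implementation: the class `tiny_fsm`, its row layout and the fixed alphabet `'a'..'z'` (assuming an ASCII-like contiguous encoding) are assumptions made purely for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>
#include <vector>

// Hypothetical sketch, not jaulib code: a table-driven key word matcher.
// Each state is one table row with one column per alphabet symbol ('a'..'z')
// holding the next state, plus the token name if a key word ends in that state.
template <typename State_type>
class tiny_fsm {
  public:
    static constexpr State_type token_error = std::numeric_limits<State_type>::max();

    tiny_fsm() { rows_.emplace_back(); } // row 0 is the start state

    // Returns false for an invalid name, an empty or non 'a'..'z' value,
    // a duplicate value, or if the table could outgrow State_type.
    bool add(State_type name, std::string_view value) {
        if (name == 0 || name == token_error || value.empty()) { return false; }
        for (char c : value) {
            if (c < 'a' || c > 'z') { return false; }
        }
        // Worst case is one new row per character; row indices must stay below token_error.
        if (rows_.size() + value.size() >= static_cast<std::size_t>(token_error)) { return false; }
        State_type s = 0;
        for (char c : value) {
            const std::size_t col = static_cast<std::size_t>(c - 'a');
            if (rows_[s].next[col] == token_error) {
                rows_[s].next[col] = static_cast<State_type>(rows_.size());
                rows_.emplace_back();
            }
            s = rows_[s].next[col];
        }
        if (rows_[s].token != token_error) { return false; } // value already present
        rows_[s].token = name;
        return true;
    }

    // Walks the table once per character; returns token_error if the word is no key word.
    State_type get(std::string_view word) const noexcept {
        State_type s = 0;
        for (char c : word) {
            if (c < 'a' || c > 'z') { return token_error; }
            s = rows_[s].next[static_cast<std::size_t>(c - 'a')];
            if (s == token_error) { return token_error; }
        }
        return rows_[s].token;
    }

  private:
    struct row {
        std::array<State_type, 26> next;
        State_type token = token_error;
        row() { for (State_type& n : next) { n = token_error; } }
    };
    std::vector<row> rows_;
};
```

Every table cell is a `State_type`, which is why the choice of that type drives the memory footprint: a lookup costs one table access per input character, regardless of how many key words are stored.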

Member Typedef Documentation

◆ uint_t

template<typename State_type>
typedef State_type jau::lang::token_fsm< State_type >::uint_t

Unsigned int symbol for token-value type.

Definition at line 222 of file token_fsm.hpp.
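
The constraints on State_type (unsigned, integral, at least as wide as the alphabet code point) can be checked at compile time. The following is a sketch only: `code_point_sketch_t` and `is_valid_state_type_v` are hypothetical names, and the 16-bit code-point width is an assumption derived from the "i.e. uint16_t" remark.

```cpp
#include <cstdint>
#include <type_traits>

// Assumption for this sketch: the alphabet's code-point type is 16 bits wide.
// This alias is hypothetical and stands in for alphabet::code_point_t.
using code_point_sketch_t = std::uint16_t;

// True if T satisfies the documented State_type constraints.
template <typename T>
constexpr bool is_valid_state_type_v =
    std::is_integral_v<T> && std::is_unsigned_v<T> &&
    sizeof(T) >= sizeof(code_point_sketch_t);

static_assert(is_valid_state_type_v<std::uint16_t>);
static_assert(is_valid_state_type_v<std::uint32_t>);
static_assert(!is_valid_state_type_v<std::uint8_t>);  // too narrow
static_assert(!is_valid_state_type_v<std::int32_t>);  // signed
```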

Constructor & Destructor Documentation

◆ token_fsm() [1/4]

template<typename State_type>
jau::lang::token_fsm< State_type >::token_fsm ( const token_fsm< State_type > & src)
default noexcept

◆ token_fsm() [2/4]

template<typename State_type>
jau::lang::token_fsm< State_type >::token_fsm ( token_fsm< State_type > && src)
default noexcept

◆ token_fsm() [3/4]

template<typename State_type>
jau::lang::token_fsm< State_type >::token_fsm ( alphabet alphabet,
const std::string_view separators = "\040\011\012\015" )
inline

Constructs an empty instance.

Parameters
alphabet  the alphabet to use
separators  separator characters, defaults to SPACE, TAB, LF, CR
See also
add()

Definition at line 323 of file token_fsm.hpp.

◆ token_fsm() [4/4]

template<typename State_type>
jau::lang::token_fsm< State_type >::token_fsm ( const alphabet & alphabet,
const std::vector< token_value_t > & key_words,
const std::string_view separators = "\040\011\012\015" )
inline

Constructs a new instance w/ given token_value_t name and value pairs.

In case of an error, the method will clear() and abort; the user may validate the result via empty().

Reasons for failures could be

  • invalid token name, e.g. 0
  • duplicate token name in input key_words
  • invalid token value
    • empty string
    • invalid character according to given alphabet or a separator
Parameters
alphabet  the alphabet to use
key_words  vector of token_value_t name and value pairs to be added
separators  separator characters, defaults to SPACE, TAB, LF, CR
See also
add()

Definition at line 347 of file token_fsm.hpp.
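
The "clear on error, then check empty()" contract described above can be sketched with a small stand-in class. `keyword_table` and its members are hypothetical and only mimic the documented behaviour; they are not part of jaulib, and only a subset of the failure reasons is checked.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in, not jaulib code: a key word table whose bulk
// constructor either accepts every pair or ends up empty.
class keyword_table {
  public:
    using pair_t = std::pair<std::uint16_t, std::string>; // token name, token value

    explicit keyword_table(const std::vector<pair_t>& key_words) {
        for (const pair_t& kw : key_words) {
            if (!add(kw)) { return; } // add() has already cleared the table
        }
    }

    // Rejects name 0, empty values and duplicate names; clears the table on failure.
    bool add(const pair_t& kw) {
        if (kw.first == 0 || kw.second.empty() ||
            !map_.emplace(kw.first, kw.second).second) {
            map_.clear();
            return false;
        }
        return true;
    }

    bool empty() const noexcept { return map_.empty(); }
    std::size_t count() const noexcept { return map_.size(); }

  private:
    std::unordered_map<std::uint16_t, std::string> map_;
};
```

Since a constructor cannot return a status, the caller tests `empty()` right after construction; a non-empty input that yields an empty table signals that one of the pairs was rejected.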

Member Function Documentation

◆ to_symbol()

template<typename State_type>
static constexpr uint_t jau::lang::token_fsm< State_type >::to_symbol ( char c)
inline static constexpr noexcept

Definition at line 229 of file token_fsm.hpp.


◆ operator=() [1/2]

template<typename State_type>
token_fsm & jau::lang::token_fsm< State_type >::operator= ( const token_fsm< State_type > & x)
default noexcept

◆ operator=() [2/2]

template<typename State_type>
token_fsm & jau::lang::token_fsm< State_type >::operator= ( token_fsm< State_type > && x)
default noexcept

◆ state_count()

template<typename State_type>
uint_t jau::lang::token_fsm< State_type >::state_count ( ) const
inline noexcept

Definition at line 269 of file token_fsm.hpp.


◆ next_state()

template<typename State_type>
uint_t jau::lang::token_fsm< State_type >::next_state ( ) const
inline noexcept

Definition at line 270 of file token_fsm.hpp.


◆ empty()

template<typename State_type>
bool jau::lang::token_fsm< State_type >::empty ( ) const
inline noexcept

Definition at line 272 of file token_fsm.hpp.

◆ contains()

template<typename State_type>
bool jau::lang::token_fsm< State_type >::contains ( uint_t token_name) const
inline noexcept

Returns true if this FSM contains the given token name.

Definition at line 275 of file token_fsm.hpp.


◆ count()

template<typename State_type>
size_t jau::lang::token_fsm< State_type >::count ( ) const
inline noexcept

Returns the number of contained tokens.

Definition at line 280 of file token_fsm.hpp.


◆ is_separator()

template<typename State_type>
bool jau::lang::token_fsm< State_type >::is_separator ( const char c) const
inline noexcept

Returns true if the given char is listed as a separator.

Definition at line 283 of file token_fsm.hpp.
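
The default separator string is written with octal escapes. A short sketch shows which characters it contains and how a membership test could look; `default_separators` and `is_separator_sketch` are hypothetical names, not jaulib symbols.

```cpp
#include <string_view>

// The documented default separator set, written with octal escapes:
// \040 = SPACE, \011 = TAB, \012 = LF, \015 = CR.
constexpr std::string_view default_separators = "\040\011\012\015";

// Hypothetical free-function equivalent of a separator lookup.
constexpr bool is_separator_sketch(char c,
                                   std::string_view separators = default_separators) noexcept {
    return separators.find(c) != std::string_view::npos;
}

static_assert(default_separators == " \t\n\r");
static_assert(is_separator_sketch(' ') && is_separator_sketch('\t'));
static_assert(!is_separator_sketch('a'));
```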


◆ clear()

template<typename State_type>
void jau::lang::token_fsm< State_type >::clear ( )
inline noexcept

Clears the FSM.

Afterwards, the FSM can be filled over again from scratch.

Definition at line 311 of file token_fsm.hpp.


◆ add()

template<typename State_type>
bool jau::lang::token_fsm< State_type >::add ( const token_value_t & tkey_word)
inline

Adds given token_value_t name and value pair.

In case of an error, the method will clear() and abort; the user may validate the result via empty().

Reasons for failures could be

  • invalid token name, e.g. 0 or token_error
  • duplicate token name in input key_words
  • invalid token value
    • empty string
    • invalid character according to given alphabet or a separator
Parameters
tkey_word  the given token name and value pair
Returns
true if successful, otherwise false

Definition at line 378 of file token_fsm.hpp.
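
The token value rules listed above (non-empty, every character in the alphabet, no separator) can be expressed as a small predicate. This is a sketch under assumptions: `is_valid_token_value` is a hypothetical helper, and the alphabet is modelled as a plain string of allowed characters rather than jau's alphabet type.

```cpp
#include <string_view>

// Hypothetical validator, not jaulib code.
// 'alphabet_chars' stands in for the alphabet: simply the set of allowed characters.
constexpr bool is_valid_token_value(std::string_view value,
                                    std::string_view alphabet_chars,
                                    std::string_view separators = "\040\011\012\015") noexcept {
    if (value.empty()) { return false; }                    // empty string
    for (char c : value) {
        if (separators.find(c) != std::string_view::npos) { // separator inside the value
            return false;
        }
        if (alphabet_chars.find(c) == std::string_view::npos) { // not in the alphabet
            return false;
        }
    }
    return true;
}

static_assert(is_valid_token_value("ab", "abc"));
static_assert(!is_valid_token_value("", "abc"));
static_assert(!is_valid_token_value("ad", "abc"));
```

The separator check comes first so that a separator is rejected even when the alphabet happens to list the same character.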


◆ find()

template<typename State_type>
result_t jau::lang::token_fsm< State_type >::find ( const std::string_view & haystack,
int start = 0 )
inline noexcept

Find a token within the given haystack, starting from given start position.

This method reads over all characters until a token has been found or the end of the view is reached.

This method considers given separators.

Parameters
haystack  string view to search for tokens
start  start position, allowing the view to be reused
Returns
result_t denoting the found token, where result_t::token_name == token_error denotes not found.
See also
get()

Definition at line 469 of file token_fsm.hpp.
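
How a separator-aware search with a reusable start position might be driven is sketched below. This is a simplification and not jaulib's algorithm: it splits the haystack into separator-delimited words and looks each one up in a hash map. `find_result`, `keyword_map`, `find_sketch` and `token_error_sketch` are hypothetical names, and the result layout is an assumption that may differ from result_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>
#include <unordered_map>

constexpr std::uint16_t token_error_sketch = std::numeric_limits<std::uint16_t>::max();

// Hypothetical result layout; jaulib's result_t may differ.
struct find_result {
    std::uint16_t token_name; // token_error_sketch if nothing was found
    std::size_t begin;        // index of the first character of the match
    std::size_t end;          // index one past the last character of the match
};

using keyword_map = std::unordered_map<std::string_view, std::uint16_t>;

// Returns the first separator-delimited word at or after 'start' that is a key word.
inline find_result find_sketch(const keyword_map& key_words, std::string_view haystack,
                               std::size_t start = 0,
                               std::string_view separators = "\040\011\012\015") {
    std::size_t pos = start;
    while (pos < haystack.size()) {
        const std::size_t b = haystack.find_first_not_of(separators, pos);
        if (b == std::string_view::npos) { break; } // only separators left
        std::size_t e = haystack.find_first_of(separators, b);
        if (e == std::string_view::npos) { e = haystack.size(); }
        const auto it = key_words.find(haystack.substr(b, e - b));
        if (it != key_words.end()) { return { it->second, b, e }; }
        pos = e; // skip the unknown word and continue
    }
    return { token_error_sketch, haystack.size(), haystack.size() };
}
```

Passing the previous result's `end` as the next `start` walks through all tokens of one view without copying it, which is the reuse the start parameter is meant to allow.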

◆ get()

template<typename State_type>
uint_t jau::lang::token_fsm< State_type >::get ( const std::string_view & word)
inline noexcept

Returns the token numerical name (terminal symbol) if found, otherwise token_error.

This method does not consider given separators and expects given word to match a token 1:1.

Parameters
word  the key word to look up
See also
find()

Definition at line 524 of file token_fsm.hpp.
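
The 1:1 matching rule means that surrounding separators are not stripped for the caller. A minimal sketch of that behaviour, using a hash map instead of the FSM table; `get_sketch` and `not_found` are hypothetical names, not jaulib symbols.

```cpp
#include <cstdint>
#include <limits>
#include <string_view>
#include <unordered_map>

constexpr std::uint16_t not_found = std::numeric_limits<std::uint16_t>::max();

// Hypothetical exact-match lookup, not jaulib code:
// the word must equal a key word character for character.
inline std::uint16_t get_sketch(
        const std::unordered_map<std::string_view, std::uint16_t>& key_words,
        std::string_view word) {
    const auto it = key_words.find(word);
    return it != key_words.end() ? it->second : not_found;
}
```

With key words `"on"` and `"off"`, `get_sketch(kw, "off")` yields the token name, while `get_sketch(kw, " off")` yields `not_found` because of the leading space.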

◆ fsm_to_string()

template<typename State_type>
std::string jau::lang::token_fsm< State_type >::fsm_to_string ( const int token_per_row) const
inline noexcept

Definition at line 567 of file token_fsm.hpp.

◆ to_string()

template<typename State_type>
std::string jau::lang::token_fsm< State_type >::to_string ( ) const
inline noexcept

Definition at line 593 of file token_fsm.hpp.

Member Data Documentation

◆ token_error

template<typename State_type>
const uint_t jau::lang::token_fsm< State_type >::token_error = std::numeric_limits<uint_t>::max()
static constexpr

token_error value, denoting an invalid token or alphabet code-point.

Definition at line 227 of file token_fsm.hpp.
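
Because token_error is the largest value of uint_t, its concrete value depends on the chosen State_type. The sketch below mirrors the documented definition in a standalone variable template; `token_error_v` is a hypothetical name used only here.

```cpp
#include <cstdint>
#include <limits>

// Mirrors the documented definition for a given unsigned State_type.
template <typename State_type>
constexpr State_type token_error_v = std::numeric_limits<State_type>::max();

static_assert(token_error_v<std::uint16_t> == 65535u);
static_assert(token_error_v<std::uint32_t> == 4294967295u);
```

Together with the rule that 0 is an invalid token name, this leaves the range 1 to max - 1 for user token names.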


The documentation for this class was generated from the following file:
token_fsm.hpp