Trie Maps

Example

The following code example illustrates how to populate a trie_map instance and perform prefix searches. The later part of the example also shows how you can create a packed version of the content for faster lookups and reduced memory footprint.

#include <mdds/global.hpp>  // for MDDS_ASCII
#include <mdds/trie_map.hpp>
#include <iostream>

using namespace std;

typedef mdds::trie_map<mdds::trie::std_string_trait, int> trie_map_type;

int main()
{
    // Cities in North Carolina and their populations in 2013.
    trie_map_type nc_cities;

    // Insert key-value pairs.
    nc_cities.insert(MDDS_ASCII("Charlotte"),     792862);
    nc_cities.insert(MDDS_ASCII("Raleigh"),       431746);
    nc_cities.insert(MDDS_ASCII("Greensboro"),    279639);
    nc_cities.insert(MDDS_ASCII("Durham"),        245475);
    nc_cities.insert(MDDS_ASCII("Winston-Salem"), 236441);
    nc_cities.insert(MDDS_ASCII("Fayetteville"),  204408);
    nc_cities.insert(MDDS_ASCII("Cary"),          151088);
    nc_cities.insert(MDDS_ASCII("Wilmington"),    112067);
    nc_cities.insert(MDDS_ASCII("High Point"),    107741);
    nc_cities.insert(MDDS_ASCII("Greenville"),    89130);
    nc_cities.insert(MDDS_ASCII("Asheville"),     87236);
    nc_cities.insert(MDDS_ASCII("Concord"),       83506);
    nc_cities.insert(MDDS_ASCII("Gastonia"),      73209);
    nc_cities.insert(MDDS_ASCII("Jacksonville"),  69079);
    nc_cities.insert(MDDS_ASCII("Chapel Hill"),   59635);
    nc_cities.insert(MDDS_ASCII("Rocky Mount"),   56954);
    nc_cities.insert(MDDS_ASCII("Burlington"),    51510);
    nc_cities.insert(MDDS_ASCII("Huntersville"),  50458);
    nc_cities.insert(MDDS_ASCII("Wilson"),        49628);
    nc_cities.insert(MDDS_ASCII("Kannapolis"),    44359);
    nc_cities.insert(MDDS_ASCII("Apex"),          42214);
    nc_cities.insert(MDDS_ASCII("Hickory"),       40361);
    nc_cities.insert(MDDS_ASCII("Goldsboro"),     36306);

    cout << "Cities that start with 'Cha' and their populations:" << endl;
    auto results = nc_cities.prefix_search(MDDS_ASCII("Cha"));
    for (const trie_map_type::key_value_type& kv : results)
    {
        cout << "  " << kv.first << ": " << kv.second << endl;
    }

    cout << "Cities that start with 'W' and their populations:" << endl;
    results = nc_cities.prefix_search(MDDS_ASCII("W"));
    for (const trie_map_type::key_value_type& kv : results)
    {
        cout << "  " << kv.first << ": " << kv.second << endl;
    }

    // Create a compressed version of the container.  It works nearly identically.
    auto packed = nc_cities.pack();

    cout << "Cities that start with 'C' and their populations:" << endl;
    auto packed_results = packed.prefix_search(MDDS_ASCII("C"));
    for (const trie_map_type::key_value_type& kv : packed_results)
    {
        cout << "  " << kv.first << ": " << kv.second << endl;
    }

    // Individual search.
    auto it = packed.find(MDDS_ASCII("Wilmington"));
    cout << "Population of Wilmington: " << it->second << endl;

    // You get an end position iterator when the container doesn't have the
    // specified key.
    it = packed.find(MDDS_ASCII("Asheboro"));

    cout << "Population of Asheboro: ";

    if (it == packed.end())
        cout << "not found";
    else
        cout << it->second;

    cout << endl;

    return EXIT_SUCCESS;
}

One thing to note in the above example is the use of MDDS_ASCII macro, which expands a literal string definition into a literal string and its length as two parameters. This macro comes in handy when you need to define a literal and immediately pass it to a function that expects a pointer to a string and its length.

You’ll get the following output when compiling the above code and executing it:

Cities that start with 'Cha' and their populations:
  Chapel Hill: 59635
  Charlotte: 792862
Cities that start with 'W' and their populations:
  Wilmington: 112067
  Wilson: 49628
  Winston-Salem: 236441
Cities that start with 'C' and their populations:
  Cary: 151088
  Chapel Hill: 59635
  Charlotte: 792862
  Concord: 83506
Population of Wilmington: 112067
Population of Asheboro: not found

Here is a version that uses packed_trie_map:

#include <mdds/global.hpp>  // for MDDS_ASCII and MDDS_N_ELEMENTS
#include <mdds/trie_map.hpp>
#include <iostream>

using namespace std;

typedef mdds::packed_trie_map<mdds::trie::std_string_trait, int> trie_map_type;

int main()
{
    // Entries must be known prior to creating the instance, and they must be
    // sorted by the key in ascending order.
    trie_map_type::entry entries[] = {
        { MDDS_ASCII("Apex"),           42214 },
        { MDDS_ASCII("Asheville"),      87236 },
        { MDDS_ASCII("Burlington"),     51510 },
        { MDDS_ASCII("Cary"),          151088 },
        { MDDS_ASCII("Chapel Hill"),    59635 },
        { MDDS_ASCII("Charlotte"),     792862 },
        { MDDS_ASCII("Concord"),        83506 },
        { MDDS_ASCII("Durham"),        245475 },
        { MDDS_ASCII("Fayetteville"),  204408 },
        { MDDS_ASCII("Gastonia"),       73209 },
        { MDDS_ASCII("Goldsboro"),      36306 },
        { MDDS_ASCII("Greensboro"),    279639 },
        { MDDS_ASCII("Greenville"),     89130 },
        { MDDS_ASCII("Hickory"),        40361 },
        { MDDS_ASCII("High Point"),    107741 },
        { MDDS_ASCII("Huntersville"),   50458 },
        { MDDS_ASCII("Jacksonville"),   69079 },
        { MDDS_ASCII("Kannapolis"),     44359 },
        { MDDS_ASCII("Raleigh"),       431746 },
        { MDDS_ASCII("Rocky Mount"),    56954 },
        { MDDS_ASCII("Wilmington"),    112067 },
        { MDDS_ASCII("Wilson"),         49628 },
        { MDDS_ASCII("Winston-Salem"), 236441 },
    };

    // Cities in North Carolina and their populations in 2013.
    trie_map_type nc_cities(entries, MDDS_N_ELEMENTS(entries));

    cout << "Cities that start with 'Cha' and their populations:" << endl;
    auto results = nc_cities.prefix_search(MDDS_ASCII("Cha"));
    for (const trie_map_type::key_value_type& kv : results)
    {
        cout << "  " << kv.first << ": " << kv.second << endl;
    }

    cout << "Cities that start with 'W' and their populations:" << endl;
    results = nc_cities.prefix_search(MDDS_ASCII("W"));
    for (const trie_map_type::key_value_type& kv : results)
    {
        cout << "  " << kv.first << ": " << kv.second << endl;
    }

    cout << "Cities that start with 'C' and their populations:" << endl;
    results = nc_cities.prefix_search(MDDS_ASCII("C"));
    for (const trie_map_type::key_value_type& kv : results)
    {
        cout << "  " << kv.first << ": " << kv.second << endl;
    }

    // Individual search.
    auto it = nc_cities.find(MDDS_ASCII("Wilmington"));
    cout << "Population of Wilmington: " << it->second << endl;

    // You get an end position iterator when the container doesn't have the
    // specified key.
    it = nc_cities.find(MDDS_ASCII("Asheboro"));

    cout << "Population of Asheboro: ";

    if (it == nc_cities.end())
        cout << "not found";
    else
        cout << it->second;

    cout << endl;

    return EXIT_SUCCESS;
}

This code generates exactly the same output as the first example that uses trie_map. The only difference is that you need to provide the list of entries pre-sorted prior to instantiating the map object.

This example uses another useful macro MDDS_N_ELEMENTS which computes the length of an array by dividing the size of the whole array by the size of its first element. This macro is useful when the array definition is given in the same compilation unit and therefore its size is known at the call site where the macro is used.

API Reference

Trie Map

template <typename _KeyTrait, typename _ValueT>
class trie_map

Trie map provides storage for multiple key-value pairs where keys are either strings, or otherwise consist of arrays of characters. The keys are stored in an ordered tree structure known as trie, or sometimes referred to as prefix tree.

Public Types

typedef packed_trie_map<_KeyTrait, _ValueT> packed_type
typedef _KeyTrait key_trait_type
typedef key_trait_type::key_type key_type
typedef key_trait_type::key_buffer_type key_buffer_type
typedef key_trait_type::key_unit_type key_unit_type
typedef _ValueT value_type
typedef size_t size_type
typedef std::pair<key_type, value_type> key_value_type
typedef trie::detail::iterator_base<trie_map> const_iterator
typedef trie::detail::search_results<trie_map> search_results

Public Functions

trie_map()

Default constructor.

const_iterator begin() const
const_iterator end() const
void insert(const key_type &key, const value_type &value)

Insert a new key-value pair.

Parameters
  • key: key value.
  • value: value to associate with the key.

void insert(const key_unit_type *key, size_type len, const value_type &value)

Insert a new key-value pair.

Parameters
  • key: pointer to the first character of a character array that stores key value.
  • len: length of the character array storing the key.
  • value: value to associate with the key.

bool erase(const key_unit_type *key, size_type len)

Erase a key and the value associated with it.

Return
true if a key is erased, false otherwise.
Parameters
  • key: pointer to the first character of a character array that stores key value.
  • len: length of the character array storing the key.

const_iterator find(const key_type &key) const

Find a value associated with a specified key.

Return
iterator that references a value associated with the key, or the end position in case the key is not found.
Parameters
  • key: key to match.

const_iterator find(const key_unit_type *input, size_type len) const

Find a value associated with a specified key.

Return
iterator that references a value associated with the key, or the end position in case the key is not found.
Parameters
  • input: pointer to an array whose value represents the key to match.
  • len: length of the matching key value.

search_results prefix_search(const key_type &prefix) const

Retrieve all key-value pairs whose keys start with specified prefix. You can also retrieve all key-value pairs by passing a null prefix and a length of zero.

Return
results object containing all matching key-value pairs.
Parameters
  • prefix: prefix to match.

search_results prefix_search(const key_unit_type *prefix, size_type len) const

Retrieve all key-value pairs whose keys start with specified prefix. You can also retrieve all key-value pairs by passing a null prefix and a length of zero.

Return
results object that contains all matching key-value pairs. The results are sorted by the key in ascending order.
Parameters
  • prefix: pointer to an array whose value represents the prefix to match.
  • len: length of the prefix value to match.

size_type size() const

Return the number of entries in the map.

Return
the number of entries in the map.

void clear()

Empty the container.

packed_type pack() const

Create a compressed and immutable version of the container which provides better search performance and requires much less memory footprint.

Return
an instance of mdds::packed_trie_map with the same content.

Friends

friend mdds::trie_map::trie::detail::iterator_base< trie_map >
friend mdds::trie_map::trie::detail::search_results< trie_map >

Packed Trie Map

template <typename _KeyTrait, typename _ValueT>
class packed_trie_map

An immutable trie container that packs its content into a contiguous array to achieve both space efficiency and lookup performance. The user of this data structure must provide a pre-constructed list of key-value entries that are sorted by the key in ascending order, or construct from an instance of mdds::trie_map.

Note that, since this container is immutable, the content of the container cannot be modified once constructed.

Public Types

typedef _KeyTrait key_trait_type
typedef key_trait_type::key_type key_type
typedef key_trait_type::key_buffer_type key_buffer_type
typedef key_trait_type::key_unit_type key_unit_type
typedef _ValueT value_type
typedef size_t size_type
typedef std::pair<key_type, value_type> key_value_type
typedef trie::detail::packed_iterator_base<packed_trie_map> const_iterator
typedef trie::detail::packed_search_results<packed_trie_map> search_results

Public Functions

packed_trie_map()

Not implemented.

packed_trie_map(const entry *entries, size_type entry_size)

Constructor that initializes the content from a static list of key-value entries. The caller must ensure that the key-value entries are sorted in ascending order, else the behavior is undefined.

Parameters
  • entries: pointer to the array of key-value entries.
  • entry_size: size of the key-value entry array.
  • null_value: null value to return when the find method fails to find a matching entry.

packed_trie_map(const trie_map<key_trait_type, value_type> &other)

Constructor to allow construction of an instance from the content of a mdds::trie_map instance.

Parameters

const_iterator begin() const
const_iterator end() const
const_iterator find(const key_type &key) const

Find a value associated with a specified key.

Return
iterator that references a value associated with the key, or the end position in case the key is not found.
Parameters
  • key: key to match.

const_iterator find(const key_unit_type *input, size_type len) const

Find a value associated with a specified key.

Return
iterator that references a value associated with the key, or the end position in case the key is not found.
Parameters
  • input: pointer to an array whose value represents the key to match.
  • len: length of the matching key value.

search_results prefix_search(const key_type &prefix) const

Retrieve all key-value pairs whose keys start with specified prefix. You can also retrieve all key-value pairs by passing a null prefix and a length of zero.

Return
results object containing all matching key-value pairs.
Parameters
  • prefix: prefix to match.

search_results prefix_search(const key_unit_type *prefix, size_type len) const

Retrieve all key-value pairs whose keys start with specified prefix. You can also retrieve all key-value pairs by passing a null prefix and a length of zero.

Return
results object that contains all matching key-value pairs. The results are sorted by the key in ascending order.
Parameters
  • prefix: pointer to an array whose value represents the prefix to match.
  • len: length of the prefix value to match.

size_type size() const

Return the number of entries in the map.

Return
the number of entries in the map.

Friends

friend mdds::packed_trie_map::trie::detail::packed_iterator_base< packed_trie_map >
friend mdds::packed_trie_map::trie::detail::packed_search_results< packed_trie_map >
struct entry

Single key-value entry. Caller must provide at compile time a static array of these entries.

Public Functions

template<>
entry(const key_unit_type *_key, size_type _keylen, value_type _value)

Public Members

template<>
const key_unit_type *key
template<>
size_type keylen
template<>
value_type value

String Trait

struct std_string_trait

String trait that uses std::string as the key type. This trait can be used with mdds::trie_map or mdds::packed_trie_map.

Public Types

typedef std::string key_type

type used to store a key value.

typedef std::string key_buffer_type

type used to build an intermediate key value, from which a final key value is to be created. It is expected to be an array structure whose content is contiguous in memory. Its elements must be of key_unit_type.

typedef char key_unit_type

type that represents a single character inside a key or a key buffer object. A key object is expected to store a series of elements of this type.

Public Static Functions

static key_buffer_type to_key_buffer(const key_unit_type *str, size_t length)

Function called to create and initialize a buffer object from a given initial key value.

Return
buffer object containing the specified key value.
Parameters
  • str: pointer to the first character of the key value.
  • length: length of the key value.

static key_buffer_type to_key_buffer(const key_type &key)

Function called to create and initialize a buffer object from a given initial key value.

Return
buffer object containing the specified key value.
Parameters
  • key: key value

static const key_unit_type *buffer_data(const key_buffer_type &buf)
static size_t buffer_size(const key_buffer_type &buf)
static void push_back(key_buffer_type &buffer, key_unit_type c)

Function called to append a single character to the end of a key buffer.

Parameters
  • buffer: buffer object to append character to.
  • c: character to append to the buffer.

static void pop_back(key_buffer_type &buffer)

Function called to remove a single character from the tail of an existing key buffer.

Parameters
  • buffer: buffer object to remove character from.

static key_type to_key(const key_buffer_type &buf)

Function called to create a final string object from an existing buffer.

Return
string object whose content is created from the buffer object.
Parameters
  • buf: buffer object to create a final string object from.