Experimental type-safe easy_parser router
The problem
The express-like router was added to RESTinio in v.0.2. It’s an easy-to-use tool that is familiar to many developers. But the express-like router has several fundamental drawbacks that lead to various kinds of errors. The worst thing is that those errors can be detected only at run-time.
The propensity to errors
Let’s see a simple code snippet:
router->http_get("/api/v1/books/:id",
[](const auto & req, auto params) {
const auto book_id = restinio::cast_to<std::uint64_t>(params["Id"]);
...
});
There are several problems here and all of them will be detected only at run-time.
There is no explicit requirement on the format of the “id” parameter: it can be a number or a sequence of non-digit symbols. If “id” is not a number, an exception will be thrown by cast_to.
And there is a silly typo in the extraction of the value of the “id” parameter: “Id” instead of “id”. This is just a typo, but such typos are very common.
It seems that those errors can be easily fixed. For example:
router->http_get(R"(/api/v1/books/:id(\d+))",
[](const auto & req, auto params) {
const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
...
});
But there is still a more subtle bug: there is no limit on the number of digits in the “id” parameter. So “id” can contain a value that doesn’t fit into std::uint64_t. Because of that, a more accurate fix should look like:
router->http_get(R"(/api/v1/books/:id(\d{1,10}))",
[](const auto & req, auto params) {
const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
...
});
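Why ten digits? The largest std::uint64_t value, 18446744073709551615, has 20 decimal digits, so every string of up to 19 digits fits (std::numeric_limits<std::uint64_t>::digits10 is 19), while a 20-digit string may not. A digit-count limit is therefore only an approximation of “fits into std::uint64_t”. The sketch below shows the check a typed number parser has to perform instead: a conversion that detects overflow. The function parse_non_negative_decimal is a hypothetical illustration, not part of RESTinio.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of RESTinio: converts a string of decimal
// digits into an unsigned integer type T. Returns std::nullopt if the string
// is empty, contains a non-digit, or the value does not fit into T.
template <typename T>
std::optional<T> parse_non_negative_decimal(std::string_view s) {
    if (s.empty())
        return std::nullopt;
    T result = 0;
    for (const char ch : s) {
        if (ch < '0' || ch > '9')
            return std::nullopt;  // not a decimal digit
        const T digit = static_cast<T>(ch - '0');
        // result * 10 + digit must not exceed max(); check before computing.
        if (result > (std::numeric_limits<T>::max() - digit) / 10)
            return std::nullopt;  // overflow
        result = static_cast<T>(result * 10 + digit);
    }
    return result;
}
```

With such a check the exact boundary is handled: "18446744073709551615" is accepted and "18446744073709551616" is rejected, regardless of how many digits a regular expression allows.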
Unfortunately, we have no help from the compiler in the detection of such problems. It’s a pity.
A C++ compiler could help here, but the fundamental design of the express router prevents such help. That’s because the express router was borrowed from a dynamically typed language (JavaScript), where there is no such thing as type-checking by a compiler before execution.
The opacity and absence of type-safety
Let’s see this example:
// Type of actual requests handler.
class api_v1_handler {
...
public:
auto on_get_book(
const restinio::request_handle_t & req,
restinio::router::route_params_t params) {
const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
...
}
auto on_get_book_version(
const restinio::request_handle_t & req,
restinio::router::route_params_t params) {
const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
const auto ver_id = restinio::cast_to<std::string>(params["version"]);
...
}
auto on_get_author_books(
const restinio::request_handle_t & req,
restinio::router::route_params_t params) {
const auto author = restinio::cast_to<std::string>(params["author"]);
...
}
...
};
// The definition of routes and handlers.
auto handler = std::make_shared<api_v1_handler>(...);
router->http_get(R"(/api/v1/books/:id(\d{1,10}))",
[handler](const auto & req, auto params) {
return handler->on_get_book_version(req, params);
});
router->http_get(R"(/api/v1/books/:id(\d{1,10})/versions/:version)",
[handler](const auto & req, auto params) {
return handler->on_get_author_books(req, params);
});
router->http_get(R"(/api/v1/:author)",
[handler](const auto & req, auto params) {
return handler->on_get_book(req, params);
});
Nothing prevents calling the wrong handler method for a particular route. Thus the express router allows calling on_get_book where on_get_author_books is expected. That’s because all handlers have the same prototype and restinio::router::route_params_t plays the role of an untyped key-value map.
Unfortunately, we can’t get help from the compiler here because of a problem in the fundamental design of the express router: the parameters of a parsed route are provided in the form of an untyped key-value map. So a map instance can easily be passed to the wrong handler, and that mistake can only be detected at run-time.
Another problem is the opacity of a request handler’s prototype. We just see route_params_t in the prototype but don’t know which parameters the handler actually needs or what their types are. That information can only be obtained from the body of the handler, which makes the maintenance and extension of request handlers harder.
A type-safe router as a solution
Since v.0.6.6 RESTinio provides a type-safe router that is similar to the express router but allows working with typed parameters.
As a very simple example, this express router-based code:
router->http_get(R"(/api/v1/books/:id(\d{1,10}))",
[](const auto & req, auto params) {
const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
...
});
can be expressed with the new type-safe router this way:
namespace epr = restinio::router::easy_parser_router;
auto book_id_p = epr::non_negative_decimal_number_p<std::uint64_t>();
router->http_get(
epr::path_to_params("/api/v1/books/", book_id_p),
[](const auto & req, std::uint64_t book_id) {
...
});
And the example with api_v1_handler above can be rewritten this way:
// Type of actual requests handler.
class api_v1_handler {
...
public:
using book_id_type = std::uint64_t;
auto on_get_book(
const restinio::request_handle_t & req,
book_id_type book_id) {
...
}
auto on_get_book_version(
const restinio::request_handle_t & req,
book_id_type book_id,
const std::string & ver_id) {
...
}
auto on_get_author_books(
const restinio::request_handle_t & req,
const std::string & author) {
...
}
...
};
// The definition of routes and handlers.
namespace epr = restinio::router::easy_parser_router;
auto book_id_p = epr::non_negative_decimal_number_p<api_v1_handler::book_id_type>();
auto ver_id_p = epr::path_fragment_p();
auto author_p = epr::path_fragment_p();
auto handler = std::make_shared<api_v1_handler>(...);
router->http_get(
epr::path_to_params("/api/v1/books/", book_id_p),
[handler](const auto & req, auto book_id) {
return handler->on_get_book(req, book_id);
});
router->http_get(
epr::path_to_params("/api/v1/books/", book_id_p, "/versions/", ver_id_p),
[handler](const auto & req, auto book_id, const auto & ver_id) {
return handler->on_get_book_version(req, book_id, ver_id);
});
router->http_get(
epr::path_to_params("/api/v1/", author_p),
[handler](const auto & req, const auto & author) {
return handler->on_get_author_books(req, author);
});
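In that variant we can’t call on_get_book for a route where on_get_author_books is expected: that route produces a std::string while on_get_book requires a book_id_type, so such a mistake is rejected by the compiler.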
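The compile-time guarantee comes from ordinary C++ overload checking: a method whose parameters don’t match the values produced by a route simply can’t be called with them. The sketch below makes that check explicit with std::is_invocable_v. The api_handler struct and the fits_route variable template are hypothetical illustrations, not RESTinio types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical handler class mirroring api_v1_handler: every method states
// the parameters it needs, and their types, in its prototype.
struct api_handler {
    using book_id_type = std::uint64_t;
    std::string on_get_book(book_id_type book_id) {
        return "book #" + std::to_string(book_id);
    }
    std::string on_get_author_books(const std::string & author) {
        return "books of " + author;
    }
};

// Hypothetical compile-time check: can Method be called with the values a
// route produces (Params...)? A typed router gets the same effect when it
// instantiates the handler call.
template <auto Method, typename... Params>
inline constexpr bool fits_route =
    std::is_invocable_v<decltype(Method), api_handler &, Params...>;

// "/api/v1/books/<number>" produces a book_id_type: on_get_book fits...
static_assert(fits_route<&api_handler::on_get_book, api_handler::book_id_type>);
// ...but "/api/v1/<fragment>" produces a std::string: on_get_book doesn't.
static_assert(!fits_route<&api_handler::on_get_book, std::string>);
static_assert(fits_route<&api_handler::on_get_author_books, std::string>);
```

Swapping the handlers between routes turns into a compilation error instead of a run-time surprise.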
The easy_parser_router
The type-safe router mentioned above is represented by the restinio::router::easy_parser_router_t class and a set of helper functions from the restinio::router::easy_parser_router namespace (see more about generic_easy_parser_router_t and easy_parser_router_t below).
To use easy_parser_router, the following steps are necessary.
Include the restinio/router/easy_parser_router.hpp header file. Please note that this header is not included automatically by restinio/core.hpp, so it is necessary to write something like this:
#include <restinio/core.hpp>
#include <restinio/router/easy_parser_router.hpp>
Then the type restinio::router::easy_parser_router_t should be set as the request_handler_t type in the server’s traits:
struct my_traits : public restinio::default_traits_t {
using request_handler_t = restinio::router::easy_parser_router_t;
};
// Or:
using my_traits = restinio::traits_t<
restinio::asio_timer_manager_t,
restinio::single_threaded_ostream_logger_t,
restinio::router::easy_parser_router_t >;
Then an instance of easy_parser_router_t should be created and tuned:
namespace epr = restinio::router::easy_parser_router;
auto router = std::make_unique<restinio::router::easy_parser_router_t>();
router->http_get(epr::path_to_params(...), ...);
router->http_post(epr::path_to_params(...), ...);
...
And then this instance should be passed to RESTinio server:
restinio::run(
restinio::on_this_thread<my_traits>()
.request_handler(std::move(router))
...
);
Setting up handlers for routes
The easy_parser_router_t class has a set of methods similar to that of the express_router_t class:
- http_get for handlers of the HTTP GET method;
- http_head for handlers of the HTTP HEAD method;
- http_post for handlers of the HTTP POST method;
- http_put for handlers of the HTTP PUT method;
- http_delete for handlers of the HTTP DELETE method;
- add_handler for the case when the http_* methods mentioned above can’t be used;
- non_matched_request_handler for the case when a handler for a particular request is not found.
So the definition of route handlers for easy_parser_router looks similar to that for express-router:
auto router = std::make_unique<restinio::router::easy_parser_router_t>();
...
router->http_get(route1, handler1);
router->http_post(route1, handler2);
router->http_delete(route1, handler3);
router->add_handler(restinio::http_method_lock(), route1, handler4);
...
router->http_get(route2, handlerN);
router->http_post(route2, handlerM);
...
router->non_matched_request_handler(non_matched_handler);
The main difference with express-router is the description of routes. The express-router requires that a route be described as a string with a regular expression inside. The easy_parser-router uses a special DSL based on easy_parser helper.
generic_easy_parser_router_t and easy_parser_router_t classes
Since v.0.6.13, after the addition of support for extra-data in request objects (see Extra-data in request object for more details), the name easy_parser_router_t is just an alias for the more generic template class generic_easy_parser_router_t:
template< typename User_Data_Factory >
class generic_easy_parser_router_t {...};
using easy_parser_router_t =
generic_easy_parser_router_t< no_extra_data_factory_t >;
If we don’t need any specific data incorporated into a request object, we can still use the simple easy_parser_router_t name. But once we want to add some of our data into a request, we have to switch to generic_easy_parser_router_t. That’s important because we have to specify the type of extra-data-factory in the description of the request-handler type:
struct my_extra_data_factory {
... // The usual stuff for extra-data-factory.
};
// Type of router has to have relation to extra-data-factory type.
using my_router = restinio::router::generic_easy_parser_router_t<
my_extra_data_factory >;
struct my_traits : public restinio::default_traits_t {
using extra_data_factory_t = my_extra_data_factory;
using request_handler_t = my_router;
};
When a custom extra-data-factory is used, the first parameter of a request-handler will have the type restinio::generic_request_handle_t<User_Data>:
struct my_extra_data_factory {
... // The usual stuff for extra-data-factory.
};
using my_router = restinio::router::generic_easy_parser_router_t<
my_extra_data_factory >;
struct my_traits : public restinio::default_traits_t {
using extra_data_factory_t = my_extra_data_factory;
using request_handler_t = my_router;
};
auto router = std::make_unique<my_router>();
router->http_get(
epr::path_to_params("/api/v1/books/", book_id_p),
[handler]( restinio::generic_request_handle_t<my_extra_data_factory::data_t> req,
std::uint64_t book_id)
{
return handler->on_get_book(req, book_id);
});
easy_parser_router DSL
There are two functions in the restinio::router::easy_parser_router namespace that should be used for describing routes: path_to_params and path_to_tuple. Both have the same format but require request-handlers with different prototypes.
The path_to_params and path_to_tuple functions are variadic-template functions that return an implementation-specific type. Each of them accepts a list of arguments where every argument is either a string (a string literal, an object of type std::string or std::string_view) or a value producer. Every value producer gives a single parameter extracted from a route. For example:
namespace epr = restinio::router::easy_parser_router;
// Handler for a route without parameters inside.
router->http_get(
epr::path_to_params("/api/v1/books"),
[](const auto & req) {...});
// Handler for a route with one parameter inside.
router->http_get(
epr::path_to_params("/api/v1/books/",
// Producer for a value of the single parameter.
epr::non_negative_decimal_number_p<std::uint64_t>()),
[](const auto & req, std::uint64_t book_id) {...});
// Handler for a route with two parameters inside.
router->http_get(
epr::path_to_params("/api/v1/books/",
// Producer for a value of the first parameter.
epr::non_negative_decimal_number_p<std::uint64_t>(),
"/title/",
// Producer for a value of the second parameter.
epr::path_fragment_p()),
[](const auto & req, std::uint64_t book_id, const std::string & title) {...});
When path_to_params is used to describe a route, the request-handler receives every parameter from the route as a separate argument. If there are no parameters in a route, the request-handler receives just one argument: a request handle. Those cases are shown above.
When path_to_tuple is used, a request-handler receives all parameters from the route as a single argument of type std::tuple<Vs...>, where Vs... is a list of parameter types. If there are no parameters in the route, there will be a single argument of type std::tuple<>:
namespace epr = restinio::router::easy_parser_router;
// Handler for a route without parameters inside.
router->http_get(
epr::path_to_tuple("/api/v1/books"),
[](const auto & req, std::tuple<>) {...});
// Handler for a route with one parameter inside.
router->http_get(
epr::path_to_tuple("/api/v1/books/",
// Producer for a value of the single parameter.
epr::non_negative_decimal_number_p<std::uint64_t>()),
[](const auto & req, std::tuple<std::uint64_t> params) {...});
// Handler for a route with two parameters inside.
router->http_get(
epr::path_to_tuple("/api/v1/books/",
// Producer for a value of the first parameter.
epr::non_negative_decimal_number_p<std::uint64_t>(),
"/title/",
// Producer for a value of the second parameter.
epr::path_fragment_p()),
[](const auto & req, std::tuple<std::uint64_t, std::string> params) {...});
Note. Of course, generic lambdas can be used here and parameters can be accepted by a const reference:
router->http_get(
epr::path_to_tuple("/api/v1/books/",
// Producer for a value of the first parameter.
epr::non_negative_decimal_number_p<std::uint64_t>(),
"/title/",
// Producer for a value of the second parameter.
epr::path_fragment_p()),
[](const auto & req, const auto & params) {...});
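The two calling conventions differ only in how the parsed values reach the handler. The sketch below assumes the route has already been parsed into a std::tuple and shows both conventions with std::apply; call_params_style and call_tuple_style are hypothetical helpers, not RESTinio functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical helpers, not RESTinio API. Both assume the route has already
// been parsed into a tuple of typed values.

// path_to_params style: each value becomes a separate argument after the
// request; an empty tuple results in a call handler(req).
template <typename Req, typename Handler, typename... Vs>
auto call_params_style(const Req & req, Handler && handler, std::tuple<Vs...> values) {
    return std::apply(
        [&](auto &&... vs) {
            return handler(req, std::forward<decltype(vs)>(vs)...);
        },
        std::move(values));
}

// path_to_tuple style: the whole tuple is passed as a single argument.
template <typename Req, typename Handler, typename... Vs>
auto call_tuple_style(const Req & req, Handler && handler, std::tuple<Vs...> values) {
    return handler(req, std::move(values));
}
```

The parsed values are the same in both cases; only the handler prototype changes, which is exactly the difference between path_to_params and path_to_tuple described above.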
Performance
Here are the results of the cmp_route_bench benchmark described in the Performance section for express-router (as of 2020.04.10):
| # of threads | hardcoded | easy_parser_router | express-router (std) | express-router (PCRE) |
|---|---|---|---|---|
| 1 | 115,083.86 | 105,205.42 (91.42%) | 88,115.27 (76.57%) | 102,601.51 (89.15%) |
| 2 | 159,301.80 | 152,842.62 (95.95%) | 131,806.19 (82.74%) | 143,969.74 (90.38%) |
| 3 | 192,849.04 | 187,092.28 (97.01%) | 161,748.54 (83.87%) | 177,840.71 (92.22%) |
| 4 | 210,509.90 | 207,102.15 (98.38%) | 176,486.59 (83.84%) | 193,072.76 (91.72%) |
Benchmark environment:
- CPU: 8x Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz;
- Memory: 16343MB;
- Operating System: Ubuntu 16.04.2 LTS;
- Compiler: gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1).
easy_parser
Intro
The easy_parser is an internal part of RESTinio introduced in 0.6.1 for parsing HTTP fields. It is a template-based implementation of a PEG recursive-descent parser. It supports the main functionality of PEG and can express rather complex grammars in ordinary C++ code.
The easy_parser was tested inside RESTinio and looks pretty usable for real-world tasks. However, it’s not the simplest part of RESTinio, so if you encounter some problems with easy_parser or have some ideas of how to make easy_parser more expressive and easy to use, please let us know.
easy_parser and easy_parser_router namespaces
All easy_parser-related stuff is defined in the restinio::easy_parser namespace, but it is also available via the restinio::router::easy_parser_router namespace. Because this section is about easy_parser features useful for easy_parser_router, we will use the namespace alias epr defined as:
namespace epr = restinio::router::easy_parser_router;
The usage of easy_parser alone, without easy_parser_router, is out of the scope of this document. If you want to use easy_parser’s functionality in your project directly and have some questions, feel free to ask us.
The main easy_parser principle
The easy_parser is a rather complex tool, but there is just one main principle in easy_parser’s design: easy_parser gets a set of rules, tries to apply them to an input string and produces a single value in the case of successful parsing.
It means that the result of successful parsing is always a single value. And that value is produced by a special entity called producer.
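The “single value” principle can be pictured with a simplified model of a producer: an object whose try_parse either returns exactly one value and advances the input, or fails and leaves the input untouched. This is a sketch under that assumption, not easy_parser’s real interface (the sketch of epr::produce later in this document uses expected_t with a parse error instead of std::optional); the digit_producer name is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical, simplified producer model: on success exactly one value is
// returned and the matched symbol is removed from the input; on failure
// nothing is returned and the input is left as it was.
struct digit_producer {
    std::optional<char> try_parse(std::string_view & input) const {
        if (input.empty() || input.front() < '0' || input.front() > '9')
            return std::nullopt;  // no match, nothing consumed
        const char ch = input.front();
        input.remove_prefix(1);   // consume the matched digit
        return ch;
    }
};
```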
Producers, consumers, transformers and clauses
The easy_parser has several ready-to-use basic producers with the _p suffix in their names, like symbol_p, digit_p, decimal_number_p, path_fragment_p. So for trivial syntax rules it’s enough to use one of easy_parser’s producers. For example, for a rule like:
version = ['-'|'+'] NUMBER
it’s enough to call epr::decimal_number_p<int>() to make a corresponding parser.
But the vast majority of practical cases are more complex and require handling of several produced values. For example, let’s see this very simple grammar:
indicators = LETTER DIGIT
This grammar describes two-symbol strings like “c1”, “B2”, “D4” and so on. After parsing such a string, we want to have a pair of values: the letter and the digit. This pair can be represented by a simple struct like this:
struct indicators {
char class_;
char level_;
};
There are any_symbol_p and digit_p producers in easy_parser, so we can extract the letter and the digit from the input string. But our task is to make an instance of type indicators. So we should create that instance somehow and store the values produced by any_symbol_p and digit_p into it.
In the easy_parser DSL this is encoded the following way:
epr::produce<indicators>(
epr::any_symbol_p() >> &indicators::class_,
epr::digit_p() >> &indicators::level_
);
We’ll discuss the interpretation of that code a bit later; for now let’s concentrate on the meaning of a construct like producer >> dest.
Every produced value should be consumed somehow. An expression like digit_p() >> &indicators::level_ tells that a value produced by the digit_p producer should be consumed as the value of the indicators::level_ member.
There are several ready-to-use consumers in easy_parser and we’ll see some of them later.
Sometimes a produced value has to be transformed somehow before the consumption. This is possible in easy_parser with the help of transformers. A transformer is a function that takes an input value and produces a new value from it, maybe with a different type.
Let’s make our example with the indicators struct a bit harder. In the first version shown above, indicators::class_ can contain both upper- and lower-case letters. In some circumstances that’s not convenient, and it’s better to store only lower-case letters in the indicators::class_ member. In easy_parser we can do that by using the to_lower transformer:
epr::produce<indicators>(
epr::any_symbol_p() >> epr::to_lower() >> &indicators::class_,
epr::digit_p() >> &indicators::level_
);
In this example a symbol produced by any_symbol_p will be transformed into a new symbol by the to_lower transformer, and that new symbol will be stored in the indicators::class_ member.
There can be any number of transformers in expressions like producer >> transformer1 >> transformer2 >> ... >> transformerN >> consumer. But it’s important to note that transformers in such expressions can only be used between a producer and a consumer.
An expression like producer >> consumer or producer >> transformer >> consumer defines a clause. A clause is a part of the DSL that doesn’t produce a value by itself. Some values can be produced and consumed inside a clause, but a clause by itself is not a value producer.
A producer can be seen as a function with non-void return type. A clause, in that case, is a void-returning function.
Clauses play a very important role because some parts of real-world grammars can’t be expressed only by producers+consumers, and clauses help in such cases.
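The distinction between producers, transformers, consumers and clauses can be modelled in a few lines. In the simplified sketch below (hypothetical names, not easy_parser’s code) a producer’s try_parse returns std::optional, combining a producer with a transformer gives another producer, and combining a producer with a member pointer gives a clause whose try_process only reports success or failure. The real DSL spells these combinations with operator>>; the named functions transform_with and store_to are used here only to keep the sketch small.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

struct indicators {
    char class_;
    char level_;
};

// Producer: yields the next symbol, if any.
struct any_symbol_producer {
    std::optional<char> try_parse(std::string_view & in) const {
        if (in.empty()) return std::nullopt;
        const char ch = in.front();
        in.remove_prefix(1);
        return ch;
    }
};

// Transformer: maps one value to another.
struct to_lower_transformer {
    char operator()(char ch) const {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
};

// producer + transformer = another producer (it still yields a value).
template <typename P, typename F>
struct transformed_producer {
    P producer;
    F transform;
    auto try_parse(std::string_view & in) const {
        const auto v = producer.try_parse(in);
        using value_t = decltype(transform(*v));
        if (!v) return std::optional<value_t>{};
        return std::optional<value_t>{transform(*v)};
    }
};

// producer + member pointer = clause: the value is stored in target.*member
// and only success or failure is reported (like a void function).
template <typename P, typename T, typename M>
struct member_clause {
    P producer;
    M T::*member;
    bool try_process(std::string_view & in, T & target) const {
        const auto v = producer.try_parse(in);
        if (!v) return false;
        target.*member = *v;
        return true;
    }
};

template <typename P, typename F>
transformed_producer<P, F> transform_with(P p, F f) { return {p, f}; }

template <typename P, typename T, typename M>
member_clause<P, T, M> store_to(P p, M T::*member) { return {p, member}; }
```

In this model, store_to(transform_with(any_symbol_producer{}, to_lower_transformer{}), &indicators::class_) plays the role of any_symbol_p() >> to_lower() >> &indicators::class_.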
The skip() consumer and ready-to-use clauses
Sometimes it’s necessary to skip a produced value. Let’s see the following grammar:
indicators = LETTER SPACE DIGIT
As a result of applying this rule, we should get a compound value that consists of a letter and a digit. We don’t need to store the space character between them. But every produced value in easy_parser has to be consumed somehow, and produce accepts only clauses, so we can’t write this simple code:
epr::produce<indicators>(
epr::any_symbol_p() >> &indicators::class_,
epr::space_p(),
epr::digit_p() >> &indicators::level_);
The expression epr::space_p() is a producer here, not a clause. To make it a clause we have to throw away the value produced by epr::space_p. This can be done by using the special skip consumer:
epr::produce<indicators>(
epr::any_symbol_p() >> &indicators::class_,
epr::space_p() >> epr::skip(),
epr::digit_p() >> &indicators::level_);
The skip consumer just ignores any previously produced value. It was added to easy_parser specifically to handle cases like this one.
The combination producer >> skip() is so widely used that easy_parser has a set of ready-to-use clauses that just use skip() under the hood. Thus the expression space_p() >> skip() can be replaced by the space clause:
epr::produce<indicators>(
epr::any_symbol_p() >> &indicators::class_,
epr::space(),
epr::digit_p() >> &indicators::level_);
Here space() is just a predefined shorthand for space_p() >> skip().
There are other ready-to-use clauses like space in easy_parser: symbol, caseless_symbol, digit, hexdigit, exact.
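In a simplified model (hypothetical names, not easy_parser’s code), skip() is a consumer that throws the value away, so producer >> skip() becomes a clause that only reports whether the producer matched. The space() function below mimics the shorthand described above.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical producer of a single space character.
struct space_producer {
    std::optional<char> try_parse(std::string_view & in) const {
        if (in.empty() || in.front() != ' ') return std::nullopt;
        in.remove_prefix(1);
        return ' ';
    }
};

// Hypothetical model of producer >> skip(): the value is produced and then
// discarded, so the clause leaves the target object untouched.
template <typename P>
struct skip_clause {
    P producer;
    template <typename Target>
    bool try_process(std::string_view & in, Target &) const {
        return producer.try_parse(in).has_value();  // value is ignored
    }
};

// Hypothetical shorthand in the spirit of epr::space(): space_p() >> skip().
inline skip_clause<space_producer> space() { return {}; }
```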
just() transformer
Sometimes it’s necessary to replace a value of one type with a specific value of another type. For example, let’s see the following simple grammar:
speed = ("in" | "out") SPACE NUMBER
This grammar describes strings like “in 4096” and “out 16000”.
We may want to parse those strings into values of the type:
enum class direction { in, out };
struct speed {
direction dir_;
unsigned int value_;
};
It means that we have to replace the substring “in” with the value direction::in, and the substring “out” with the value direction::out. One way to do so is to use the convert transformer plus a lambda function:
auto parse = epr::produce<speed>(
epr::alternatives(
epr::exact_p("in")
>> epr::convert([](const auto &) // Ignore actual value.
{ return direction::in; })
>> &speed::dir_,
epr::exact_p("out")
>> epr::convert([](const auto &) // Ignore actual value.
{ return direction::out; })
>> &speed::dir_
),
epr::space(),
epr::non_negative_decimal_number_p<unsigned int>() >> &speed::value_);
Writing such simple convert transformers is a boring, time-consuming and error-prone task, so easy_parser has a special just transformer that allows writing the same thing more compactly and precisely:
auto parse = epr::produce<speed>(
epr::alternatives(
epr::exact_p("in") >> epr::just(direction::in) >> &speed::dir_,
epr::exact_p("out") >> epr::just(direction::out) >> &speed::dir_
),
epr::space(),
epr::non_negative_decimal_number_p<unsigned int>() >> &speed::value_);
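Conceptually, just(v) is the hand-written convert() lambda from the previous example with the constant captured. A sketch of that idea follows; this just() is a hypothetical stand-alone function, not the easy_parser implementation.

```cpp
#include <cassert>
#include <string_view>

enum class direction { in, out };

// Hypothetical stand-in for just(v): a transformer that ignores whatever
// value it receives and always returns a copy of v.
template <typename V>
auto just(V value) {
    return [value](const auto & /*ignored*/) { return value; };
}
```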
How epr::produce() works
Let’s see this simple example again:
epr::produce<indicators>(
epr::any_symbol_p() >> &indicators::class_,
epr::digit_p() >> &indicators::level_
);
It’s important to understand what happens inside epr::produce. Understanding this will help when we discuss the as_result and to_container consumers.
The easy_parser implements a recursive-descent parser. Because of that, the epr::produce shown above expands to something like this:
epr::expected_t<indicators, epr::parse_error_t>
try_produce__indicators__(epr::impl::input_t & source)
{
indicators result; // An empty value of expected result type is created.
// Try to handle the first part of expression.
{
epr::impl::any_symbol_producer_t producer;
const auto r = producer.try_parse(source);
if(!r) // Parsing failed.
return make_unexpected(r.error());
epr::impl::field_value_setter_t<&indicators::class_> consumer;
consumer.consume(*r, result); // Consume the produced value.
}
// Try to handle the second part of expression.
{
epr::impl::digit_producer_t producer;
const auto r = producer.try_parse(source);
if(!r) // Parsing failed.
return make_unexpected(r.error());
epr::impl::field_value_setter_t<&indicators::level_> consumer;
consumer.consume(*r, result); // Consume the produced value.
}
// No errors. Actual value can be returned.
return result;
}
This is just a sketch, not the actual code behind epr::produce<T>, but it shows the whole principle of epr::produce<T>:
- create an instance of T with the default constructor;
- try to handle all nested clauses in order, passing that instance of T to every clause for modification; if a clause fails, the whole produce() fails with that error;
- if all nested clauses are handled without errors, return that instance of T as the result.
The main point is the presence of an instance of type T that is created inside produce() and is passed to every nested clause of that produce().
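Here is a runnable reduction of the same principle. produce<T> is modelled as a function that takes clauses (callables of the form bool(std::string_view &, T &)), creates a value-initialized T, runs the clauses left to right and returns the T only if all of them succeed. Everything here is a hypothetical sketch rather than RESTinio’s code; in particular, rolling the input back on failure is an assumption added so that a failed parser can be retried on the same input.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical, simplified model of produce<T>(clauses...).
template <typename T, typename... Clauses>
auto produce(Clauses... clauses) {
    return [=](std::string_view & in) -> std::optional<T> {
        const std::string_view saved = in;  // for rollback on failure
        T result{};                          // 1) create an instance of T
        // 2) pass it to every clause in order; && stops at the first failure.
        const bool ok = (... && clauses(in, result));
        if (!ok) {
            in = saved;
            return std::nullopt;
        }
        return result;                       // 3) all clauses succeeded
    };
}

struct indicators {
    char class_;
    char level_;
};

// Clause: any symbol stored into indicators::class_.
inline bool class_clause(std::string_view & in, indicators & r) {
    if (in.empty()) return false;
    r.class_ = in.front();
    in.remove_prefix(1);
    return true;
}

// Clause: a decimal digit stored into indicators::level_.
inline bool level_clause(std::string_view & in, indicators & r) {
    if (in.empty() || in.front() < '0' || in.front() > '9') return false;
    r.level_ = in.front();
    in.remove_prefix(1);
    return true;
}
```

produce<indicators>(class_clause, level_clause) corresponds to the epr::produce<indicators>(...) expression at the start of this section.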
as_result() consumer
Sometimes we have to parse a string with several terms inside but only one term should be treated as the result value. For example, let’s see a grammar:
limit = "limit" [SPACE] ":" [SPACE] NUMBER SPACE "bytes"
This grammar describes strings like “limit:4096 bytes”, “limit : 4096 bytes”, “limit: 4096 bytes”, and so on.
To parse those strings we might try to define a parser like this:
auto parser = epr::produce<unsigned int>(
epr::exact("limit"),
epr::maybe(epr::space()),
epr::symbol(':'),
epr::maybe(epr::space()),
epr::non_negative_decimal_number_p<unsigned int>(),
epr::space(),
epr::exact("bytes"));
But this code won’t compile because produce expects a set of clauses that do not produce values, while non_negative_decimal_number_p() is a producer here. So we have to transform it into a clause.
The value returned by that non_negative_decimal_number_p() should be used as the return value of the whole produce call, so in this case we can use the special as_result() consumer:
auto parser = epr::produce<unsigned int>(
epr::exact("limit"),
epr::maybe(epr::space()),
epr::symbol(':'),
epr::maybe(epr::space()),
epr::non_negative_decimal_number_p<unsigned int>() >> epr::as_result(),
epr::space(),
epr::exact("bytes"));
The as_result consumer can be used not only with trivial types like int or char, but also with structs:
// A parser for grammar:
//
// communicator = "port=" ("default" | port_params)
// port_params = '(' NUMBER ':' NUMBER ',' NUMBER ')'
//
struct port_params {
unsigned short port_index_;
unsigned int in_speed_;
unsigned int out_speed_;
};
auto parser = epr::produce<port_params>(
epr::exact("port="),
epr::alternatives(
epr::exact("default")
>> epr::just(port_params{10u, 4096u, 4096u})
>> epr::as_result(),
epr::produce<port_params>(
epr::symbol('('),
epr::non_negative_decimal_number_p<unsigned short>()
>> &port_params::port_index_,
epr::symbol(':'),
epr::non_negative_decimal_number_p<unsigned int>()
>> &port_params::in_speed_,
epr::symbol(','),
epr::non_negative_decimal_number_p<unsigned int>()
>> &port_params::out_speed_,
epr::symbol(')')
) >> epr::as_result()
)
);
The as_result consumer can even be used with containers (we’ll discuss repeat and to_container below):
// A parser for grammar:
//
// communication_ports = "ports=" ("none" | port_params (',' port_params)*)
//
// port_params = '(' NUMBER ':' NUMBER ',' NUMBER ')'
//
struct port_params {
unsigned short port_index_;
unsigned int in_speed_;
unsigned int out_speed_;
};
auto port_params_p = epr::produce<port_params>(
epr::symbol('('),
epr::non_negative_decimal_number_p<unsigned short>()
>> &port_params::port_index_,
epr::symbol(':'),
epr::non_negative_decimal_number_p<unsigned int>()
>> &port_params::in_speed_,
epr::symbol(','),
epr::non_negative_decimal_number_p<unsigned int>()
>> &port_params::out_speed_,
epr::symbol(')')
);
auto parser = epr::produce<std::vector<port_params>>(
epr::exact("ports="),
epr::alternatives(
epr::exact("none")
>> epr::just(std::vector<port_params>{})
>> epr::as_result(),
epr::produce<std::vector<port_params>>(
port_params_p >> epr::to_container(),
epr::repeat(0, epr::N,
epr::symbol(','),
port_params_p >> epr::to_container())
) >> epr::as_result()
)
);
just_result() consumer
The just_result(v) consumer is just a shorthand for:
epr::just(v) >> epr::as_result()
So just_result() allows writing:
epr::alternatives(
epr::exact("none")
>> epr::just_result(std::vector<port_params>{}),
instead of:
epr::alternatives(
epr::exact("none")
>> epr::just(std::vector<port_params>{})
>> epr::as_result(),
Support of PEG features
Alternatives
Alternatives in PEG grammars are supported via the alternatives clause. So the following grammar:
demo = A | B | C
can be expressed in easy_parser’s DSL as:
epr::produce<SomeType>(
epr::alternatives(
parse_A_clause,
parse_B_clause,
parse_C_clause)
);
For example:
// Grammar is:
//
// duration = NUMBER ("seconds" | "sec" | "s")
//
epr::produce<int>(
epr::non_negative_decimal_number_p<int>() >> epr::as_result(),
epr::alternatives(
epr::exact("seconds"),
epr::exact("sec"),
epr::exact("s"))
);
Please note that alternatives() takes a list of clauses. It means that if there is a value producer in some clause, that producer should be connected with a consumer.
// Grammar:
//
// book_id = (NUMBER | STRING)
//
using book_identity = std::variant<int, std::string>;
epr::produce<book_identity>(
epr::alternatives(
epr::non_negative_decimal_number_p<int>() >> epr::as_result(),
epr::path_fragment_p() >> epr::as_result()
)
);
Sometimes it may be necessary to pass a complex clause as an alternative in alternatives(). In that case, such a complex clause can be expressed via the sequence() helper function:
// A parser for:
//
// rev-id = ("hash-" STRING | "tag/" STRING)
//
epr::produce<std::string>(
epr::alternatives(
epr::sequence(
epr::exact("hash-"),
epr::path_fragment_p() >> epr::as_result()),
epr::sequence(
epr::exact("tag/"),
epr::path_fragment_p() >> epr::as_result())
)
);
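alternatives() follows PEG’s ordered choice: alternatives are tried in the order they are written, the input is restored before each new attempt, and the first one that succeeds wins. This is why the order in the duration example above matters: with “s” listed first, the input “seconds” would match only its first letter. The sketch below models ordered choice with hypothetical exact() and alternatives() functions working on std::string_view; it is not easy_parser’s implementation.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical clause: matches the literal and consumes it.
inline auto exact(std::string_view literal) {
    return [literal](std::string_view & in) {
        if (in.substr(0, literal.size()) != literal)
            return false;
        in.remove_prefix(literal.size());
        return true;
    };
}

// Hypothetical ordered choice: try alternatives left to right, backtracking
// to the saved position after each failure; the first success wins.
template <typename... Alternatives>
auto alternatives(Alternatives... alts) {
    return [=](std::string_view & in) {
        const std::string_view saved = in;
        const auto attempt = [&](const auto & alt) {
            if (alt(in))
                return true;
            in = saved;  // backtrack before the next alternative
            return false;
        };
        return (attempt(alts) || ...);
    };
}
```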
Optional clauses
Optional clauses in PEG grammars are supported via the maybe clause. So the following grammar:
demo = A [B] C
can be expressed in easy_parser’s DSL as:
epr::produce<SomeType>(
parse_A_clause,
epr::maybe(parse_B_clause),
parse_C_clause);
For example:
// Grammar is:
//
// limit = "limit" [SPACE] ':' [SPACE] NUMBER [SPACE "bytes"]
//
auto parser = epr::produce<unsigned int>(
epr::exact("limit"),
epr::maybe(epr::space()),
epr::symbol(':'),
epr::maybe(epr::space()),
epr::non_negative_decimal_number_p<unsigned int>() >> epr::as_result(),
epr::maybe(epr::space(), epr::exact("bytes"))
);
Please note that maybe() takes a list of clauses. It means that if there is a value producer in some clause, that producer should be connected with a consumer.
// Grammar is
//
// duration = NUMBER ['.' NUMBER] ["s"]
//
struct duration {
unsigned short integer_{0u};
unsigned short fractional_{0u};
};
auto parser = epr::produce<duration>(
epr::non_negative_decimal_number_p<unsigned short>()
>> &duration::integer_,
epr::maybe(
epr::symbol('.'),
epr::non_negative_decimal_number_p<unsigned short>()
>> &duration::fractional_),
epr::maybe(epr::symbol('s'))
);
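Under PEG semantics an optional part never fails: if the nested clauses don’t all match, the input is restored and parsing continues after the optional part. The sketch below models this with hypothetical symbol() and maybe() functions (not easy_parser’s code). In the model, the clauses inside one maybe() form a unit, so a partial match such as a '.' without the expected digit after it is rolled back as a whole.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical clause: matches a single symbol and consumes it.
inline auto symbol(char ch) {
    return [ch](std::string_view & in) {
        if (in.empty() || in.front() != ch)
            return false;
        in.remove_prefix(1);
        return true;
    };
}

// Hypothetical PEG optional [A B ...]: try the whole sequence and restore
// the input if any part of it fails. maybe() itself always succeeds.
template <typename... Clauses>
auto maybe(Clauses... clauses) {
    return [=](std::string_view & in) {
        const std::string_view saved = in;
        if (!(... && clauses(in)))
            in = saved;  // partial match: roll back everything
        return true;
    };
}
```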
Repetitions
Repetitions are supported via the repeat clause. For example, PEG’s A+ (one or more repetitions) is expressed as epr::repeat(1, epr::N, parse_A_clause) and A* (zero or more repetitions) is expressed as epr::repeat(0, epr::N, parse_A_clause).
// This expression allows to parse sequences like
//
// group-group-group-group
//
// where each group can contain from 2 to 8 hexadecimal digits.
//
epr::sequence(
// For the first three groups.
epr::repeat(3u, 3u,
epr::repeat(2u, 8u, epr::hexdigit()),
epr::symbol('-')
),
// For the last group.
epr::repeat(2u, 8u, epr::hexdigit())
);
The value epr::N is a special value that means an unlimited maximum number of occurrences.
The repeat clause expects a set of subclauses. It means that a producer can’t be used inside a repeat clause if the produced value is not consumed somehow. So this code is invalid and won’t compile:
// This expression parses sequences like
//
// group-group-group-group
//
// where each group can contain from 2 to 8 hexadecimal digits.
//
epr::sequence(
// For the first three groups.
epr::repeat(3u, 3u,
// NOTE: hexdigit_p is a producer!
epr::repeat(2u, 8u, epr::hexdigit_p()),
epr::symbol('-')
),
// For the last group.
// NOTE: hexdigit_p is a producer!
epr::repeat(2u, 8u, epr::hexdigit_p())
);
So the main trick in using the repeat clause is the consumption of produced values. Usually, the special to_container consumer is used inside a repeat clause:
auto parser = epr::produce<std::vector<std::uint32_t>>(
epr::repeat(3u, 3u,
epr::hexadecimal_number_p<std::uint32_t>() >> epr::to_container(),
epr::symbol('-')
),
epr::hexadecimal_number_p<std::uint32_t>() >> epr::to_container()
);
The main difference between the as_result() and to_container() consumers is that as_result() sets the whole result value of the enclosing producer, while to_container() adds another value to the result (so the result is expected to be some kind of container, like std::vector, std::map, or std::string).
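The difference between the two consumers can be sketched without RESTinio: with to_container() semantics every parsed number is appended to the result, whereas with as_result() semantics each value would replace the whole result. The following plain C++17 sketch (read_hex and parse_hex_list are hypothetical names) mirrors the produce<std::vector<std::uint32_t>> parser above; unlike hexadecimal_number_p, read_hex here simply stops after 8 digits.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Reads 1 to 8 hex digits (so the value always fits into std::uint32_t).
// On failure `pos` is left unchanged.
std::optional<std::uint32_t> read_hex(std::string_view in, std::size_t & pos) {
    const std::size_t start = pos;
    std::uint32_t v = 0;
    while (pos < in.size() && pos - start < 8) {
        const char c = in[pos];
        std::uint32_t d = 0;
        if (c >= '0' && c <= '9') d = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') d = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = static_cast<std::uint32_t>(c - 'A' + 10);
        else break;
        v = v * 16u + d;
        ++pos;
    }
    if (pos == start) return std::nullopt;
    return v;
}

// number '-' number '-' number '-' number, collected to_container()-style:
// each parsed value is appended with push_back. With as_result()-style
// semantics the assignment would be `result = value` instead, so only one
// value could survive.
std::optional<std::vector<std::uint32_t>> parse_hex_list(std::string_view in) {
    std::vector<std::uint32_t> result;
    std::size_t pos = 0;
    for (int i = 0; i < 3; ++i) {  // repeat(3u, 3u, number, symbol('-'))
        const auto n = read_hex(in, pos);
        if (!n || pos >= in.size() || in[pos] != '-') return std::nullopt;
        result.push_back(*n);
        ++pos;
    }
    const auto last = read_hex(in, pos);  // the final number
    if (!last || pos != in.size()) return std::nullopt;
    result.push_back(*last);
    return result;
}
```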
and_clause as and-predicate
PEG’s and-predicate is expressed via and_clause in easy_parser. For example, this simple grammar:
duration = NUMBER &(SPACE "sec")
can be represented as:
auto parser = epr::produce<unsigned int>(
epr::non_negative_decimal_number_p<unsigned int>(),
epr::and_clause(epr::space(), epr::exact("sec"))
);
Please note that and_clause accepts a set of clauses. If a producer is used inside and_clause, it should be connected to a consumer. But there is little sense in using producers inside and_clause, because and_clause doesn’t consume the matched input.
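A hand-written sketch of the NUMBER &(SPACE "sec") grammar in plain C++17 makes the key property visible: the lookahead inspects the input after the number but does not advance the position, so “ sec” is left unconsumed. The names are hypothetical and SPACE is assumed to be one space or tab.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

struct number_match {
    unsigned value;
    std::size_t consumed;  // how many characters the parser consumed
};

// True if SPACE "sec" starts at `pos`. `pos` is taken by value:
// looking ahead never moves the caller's position.
constexpr bool space_sec_at(std::string_view in, std::size_t pos) {
    return pos < in.size() && (in[pos] == ' ' || in[pos] == '\t')
        && in.substr(pos + 1, 3) == "sec";
}

// duration = NUMBER &(SPACE "sec")
constexpr std::optional<number_match> parse_duration_sec(std::string_view in) {
    std::size_t pos = 0;
    unsigned v = 0;
    while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
        const unsigned d = static_cast<unsigned>(in[pos] - '0');
        if (v > (~0u - d) / 10u) return std::nullopt;  // would overflow
        v = v * 10u + d;
        ++pos;
    }
    if (pos == 0) return std::nullopt;
    if (!space_sec_at(in, pos)) return std::nullopt;  // and-predicate
    return number_match{v, pos};
}
```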
not_clause as not-predicate
PEG’s not-predicate is expressed via not_clause in easy_parser. For example, this simple grammar:
milliseconds = NUMBER !(SPACE "sec")
can be represented as:
auto parser = epr::produce<unsigned int>(
epr::non_negative_decimal_number_p<unsigned int>(),
epr::not_clause(epr::space(), epr::exact("sec"))
);
Please note that not_clause accepts a set of clauses. If a producer is used inside not_clause, it should be connected to a consumer. But there is little sense in using producers inside not_clause, because not_clause doesn’t consume the matched input.
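The not-predicate is the mirror image: the same kind of lookahead, but parsing fails if the lookahead matches. Again a plain C++17 sketch with hypothetical names and SPACE assumed to be one space or tab; this sketch does not check that the rest of the input (for example “ ms”) is consumed.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// True if SPACE "sec" starts at `pos`; `pos` is passed by value,
// so the check never consumes input.
constexpr bool space_sec_at(std::string_view in, std::size_t pos) {
    return pos < in.size() && (in[pos] == ' ' || in[pos] == '\t')
        && in.substr(pos + 1, 3) == "sec";
}

// milliseconds = NUMBER !(SPACE "sec")
constexpr std::optional<unsigned> parse_milliseconds(std::string_view in) {
    std::size_t pos = 0;
    unsigned v = 0;
    while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
        const unsigned d = static_cast<unsigned>(in[pos] - '0');
        if (v > (~0u - d) / 10u) return std::nullopt;  // would overflow
        v = v * 10u + d;
        ++pos;
    }
    if (pos == 0) return std::nullopt;
    if (space_sec_at(in, pos)) return std::nullopt;  // not-predicate
    return v;
}
```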
Where to find information about easy_parser’s ready-to-use stuff?
The easy_parser and easy_parser_router already contain a set of ready-to-use tools that can’t be discussed in this section due to lack of space. Information about those tools can be found in the API Reference Manual: see the contents of the restinio::easy_parser and restinio::easy_parser_router namespaces.
Some more complex examples
A router from long_output example
The long_output example shows how to create big responses by using chunked_output. This example uses a router that handles the following routes:
routes = '/' chunk-size [multiplier] '/' chunk-count
| '/' chunk-size [multiplier]
| '/'
chunk-size = NUMBER
chunk-count = NUMBER
multiplier = ('b'|'B') | ('k'|'K') | ('m'|'M')
So if long_output receives a GET request for the / path, it uses the default number of chunks of the default size. If it receives a GET request for /15k, it uses the default number of chunks, each 15 kilobytes in size. If it receives a GET request for /512/200, it responds with 200 chunks of 512 bytes each.
With the express router, those routes can be defined this way:
router->http_get("/",
[&ctx](auto req, auto) {...});
router->http_get(
R"(/:value(\d+):multiplier([MmKkBb]?))",
[&ctx](auto req, auto params) {...});
router->http_get(
R"(/:value(\d+):multiplier([MmKkBb]?)/:count(\d+))",
[&ctx](auto req, auto params) {...});
With easy_parser_router, things get a bit more interesting…
The first option is to mimic the express router and define three route handlers:
// To make things compact and clear.
using namespace restinio::router::easy_parser_router;
// This producer will be repeated.
auto multiplier_p = produce<std::optional<char>>(
maybe(
alternatives(
caseless_symbol_p('b') >> as_result(),
caseless_symbol_p('k') >> as_result(),
caseless_symbol_p('m') >> as_result()
)
)
);
router->http_get(
path_to_params("/"),
[&ctx](auto req) {...});
router->http_get(
path_to_params(
"/",
non_negative_decimal_number_p<std::size_t>(),
multiplier_p
),
[&ctx](auto req,
std::size_t chunk_size,
std::optional<char> multiplier) {...});
router->http_get(
path_to_params(
"/",
non_negative_decimal_number_p<std::size_t>(),
multiplier_p,
"/",
non_negative_decimal_number_p<std::size_t>()
),
[&ctx](auto req,
std::size_t chunk_size,
std::optional<char> multiplier,
std::size_t chunk_count) {...});
But this approach doesn’t look promising, because it leaves some repeated work to the route handlers. For example, the multiplier has to be handled in two separate handlers, and that handling is the same in both of them. The first and the second handlers also both need to obtain the default chunk count.
We can try to write just one route handler that handles all the cases.
To do that, we have to define a struct like this:
struct distribution_params
{
std::size_t chunk_size_{100u*1024u};
std::size_t count_{10000u};
};
An instance of that struct will be the result of the route parser, so we can write a parser of the form:
router->http_get(
path_to_params(
produce<distribution_params>(
exact("/"),
maybe(
..., // Some code related to "chunk-size [multiplier]"
// That code fills distribution_params::chunk_size_ member.
maybe(
exact("/"),
non_negative_decimal_number_p<std::size_t>()
>> &distribution_params::count_
)
)
)
),
[&ctx](auto req,
distribution_params params) {...});
The main question is what to write in place of the ellipsis.
But before we dive into that problem, a note about the use of exact("/") inside produce. String literals can be used directly only as arguments of the path_to_params or path_to_tuple functions. This is because path_to_params/path_to_tuple are part of the easy_parser_router DSL, and that DSL treats string literals in a special way. But the clauses inside produce belong to easy_parser’s DSL, and that DSL doesn’t understand string literals. So in easy_parser’s DSL, string literals have to be wrapped in exact.
Now let’s go back to the problem of parsing the “chunk-size [multiplier]” part. If there were no multiplier part, we could simply write:
produce<distribution_params>(
exact("/"),
maybe(
non_negative_decimal_number_p<std::size_t>()
>> &distribution_params::chunk_size_,
maybe(
exact("/"),
non_negative_decimal_number_p<std::size_t>()
>> &distribution_params::count_
)
)
)
But we have to handle the multiplier, and chunk_size_ must be set with respect to the multiplier value. So we can use a trick here: extract two values (chunk-size and multiplier) and then transform this pair into a single value of type std::size_t.
A helper struct is necessary here (std::pair could be used instead, but a dedicated chunk_size struct makes things cleaner):
struct chunk_size { std::uint32_t c_{1u}, m_{1u}; };
The code that extracts chunk-size and multiplier into an instance of chunk_size looks like this:
produce<chunk_size>(
non_negative_decimal_number_p<std::uint32_t>()
>> &chunk_size::c_,
maybe(
produce<std::uint32_t>(
alternatives(
caseless_symbol_p('b') >> just_result(1u),
caseless_symbol_p('k') >> just_result(1024u),
caseless_symbol_p('m') >> just_result(1024u * 1024u)
)
) >> &chunk_size::m_
)
)
Now we have an instance of chunk_size, and all we need is to transform that instance into a single std::size_t value. We can do that with easy_parser’s convert transformer:
produce<chunk_size>(
...
) >> convert([](auto cs) { return std::size_t{cs.c_} * cs.m_; })
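The arithmetic inside that convert() lambda is worth a closer look. Because c_ is explicitly widened to std::size_t before the multiplication, the product is computed in std::size_t; multiplying the two std::uint32_t members directly would wrap around modulo 2^32. A small constexpr sketch (to_bytes and to_bytes_32bit are hypothetical names) shows the difference, assuming a 32-bit int and, for the large value, a 64-bit std::size_t:

```cpp
#include <cstddef>
#include <cstdint>

struct chunk_size { std::uint32_t c_{1u}, m_{1u}; };

// The same arithmetic as the convert() lambda: c_ is widened to std::size_t
// first, so the multiplication is performed in std::size_t.
constexpr std::size_t to_bytes(chunk_size cs) {
    return std::size_t{cs.c_} * cs.m_;
}

// For contrast: multiplying the two std::uint32_t members directly is done
// in 32-bit unsigned arithmetic (with a 32-bit int) and wraps modulo 2^32.
constexpr std::uint32_t to_bytes_32bit(chunk_size cs) {
    return cs.c_ * cs.m_;
}
```

On a platform with a 32-bit std::size_t the widening does not help, so very large chunk sizes would still wrap there.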
Finally, we have to store the transformed value into the distribution_params::chunk_size_ member:
produce<chunk_size>(
...
) >> convert(...)
>> &distribution_params::chunk_size_
So the complete code of the long_output router looks like this:
router->http_get(
path_to_params(
produce<distribution_params>(
exact("/"),
maybe(
produce<chunk_size>(
non_negative_decimal_number_p<std::uint32_t>()
>> &chunk_size::c_,
maybe(
produce<std::uint32_t>(
alternatives(
caseless_symbol_p('b') >> just_result(1u),
caseless_symbol_p('k') >> just_result(1024u),
caseless_symbol_p('m') >> just_result(1024u * 1024u)
)
) >> &chunk_size::m_
)
) >> convert([](auto cs) { return std::size_t{cs.c_} * cs.m_; })
>> &distribution_params::chunk_size_,
maybe(
exact("/"),
non_negative_decimal_number_p<std::size_t>()
>> &distribution_params::count_
)
)
)
),
[&ctx](auto req,
distribution_params params) {...});
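To check the intended behavior of this router on the sample paths, here is a hand-written plain C++17 counterpart of the produce<distribution_params> parser. It is a sketch under stated assumptions, not RESTinio code: parse_route and read_decimal are hypothetical, the chunk size is limited to the std::uint32_t range as in chunk_size::c_, and the whole path must be consumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

struct distribution_params {
    std::size_t chunk_size_{100u * 1024u};
    std::size_t count_{10000u};
};

// Reads one or more decimal digits; fails (leaving `pos` unchanged) on overflow.
constexpr std::optional<std::size_t> read_decimal(std::string_view in, std::size_t & pos) {
    const std::size_t start = pos;
    std::size_t v = 0;
    while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
        const std::size_t d = static_cast<std::size_t>(in[pos] - '0');
        if (v > (static_cast<std::size_t>(-1) - d) / 10u) { pos = start; return std::nullopt; }
        v = v * 10u + d;
        ++pos;
    }
    if (pos == start) return std::nullopt;
    return v;
}

// '/' [ chunk-size [multiplier] [ '/' chunk-count ] ]
constexpr std::optional<distribution_params> parse_route(std::string_view path) {
    distribution_params params{};
    if (path.empty() || path[0] != '/') return std::nullopt;
    std::size_t pos = 1;
    if (const auto size = read_decimal(path, pos)) {
        // chunk-size is parsed as std::uint32_t in the original (chunk_size::c_).
        if (*size > 0xFFFFFFFFu) return std::nullopt;
        std::size_t multiplier = 1u;
        if (pos < path.size()) {
            switch (path[pos]) {
                case 'b': case 'B': multiplier = 1u; ++pos; break;
                case 'k': case 'K': multiplier = 1024u; ++pos; break;
                case 'm': case 'M': multiplier = 1024u * 1024u; ++pos; break;
                default: break;  // no multiplier
            }
        }
        params.chunk_size_ = *size * multiplier;
        // [ '/' chunk-count ]: rewind if the group does not match completely.
        const std::size_t saved = pos;
        if (pos < path.size() && path[pos] == '/') {
            ++pos;
            if (const auto count = read_decimal(path, pos)) params.count_ = *count;
            else pos = saved;
        }
    }
    if (pos != path.size()) return std::nullopt;  // no trailing input
    return params;
}
```

With this sketch, “/” yields the defaults, “/15k” yields 15360-byte chunks with the default count, and “/512/200” yields 200 chunks of 512 bytes.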
Parsing of UUIDs in URL
Another interesting example is parsing UUID values specified in route paths. With the express router, a UUID in a route can be handled by a rather simple regular expression:
router->http_get(
"/books/:id([A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12})",
...);
easy_parser_router also allows handling such cases, but it may require a bit more work from the programmer. Here we’ll discuss two ways of extracting UUID values.
Extraction of UUID as a string
Let’s start with the simplest approach: extracting the UUID value as a string. It can look like this:
// To make things compact and clear.
using namespace restinio::router::easy_parser_router;
// The definition of the UUID parser. It will then be used in path_to_params.
auto uuid_p = produce<std::string>(
repeat(8u, 8u, hexdigit_p() >> to_container()),
symbol_p('-') >> to_container(),
repeat(3u, 3u,
repeat(4u, 4u, hexdigit_p() >> to_container()),
symbol_p('-') >> to_container()),
repeat(12u, 12u, hexdigit_p() >> to_container())
);
// The definition of route handler.
router->http_get(
path_to_params("/books/", uuid_p),
[](auto & req, const auto & uuid) {...});
The definition of uuid_p is a bit wordy, but it is straightforward: we expect 8 hexadecimal digits, then a hyphen, then three groups of 4 hexadecimal digits each followed by a hyphen, and then 12 hexadecimal digits. The only thing worth mentioning is that every extracted symbol is stored into the result container. Even hyphens are stored; otherwise we’d get “12345678000011112222123456789abc” instead of “12345678-0000-1111-2222-123456789abc”.
This definition of uuid_p can be slightly improved. In its first form, uuid_p keeps UUID values exactly as written. So if there is a mix of lower and upper case letters (like “abcd1234-bbbb-CCCC-dddd-123456789ABC”), that mix is kept as is. If this is not appropriate, we can tell uuid_p to convert values to lower case automatically:
auto uuid_p = produce<std::string>(
repeat(8u, 8u, hexdigit_p() >> to_container()),
symbol_p('-') >> to_container(),
repeat(3u, 3u,
repeat(4u, 4u, hexdigit_p() >> to_container()),
symbol_p('-') >> to_container()),
repeat(12u, 12u, hexdigit_p() >> to_container())
) >> to_lower(); // Now the extracted value will be converted
// to lower case.
The main drawback of this solution is the use of std::string for holding a small fixed-size UUID value. Its 36 characters exceed the small-string buffer of common std::string implementations (15 characters in libstdc++ and MSVC, 22 in libc++), so the value will usually be stored in dynamically allocated memory. Dynamic allocation just to store 36 bytes is not a good idea. Can we avoid it?
Yes, we can use std::array<char, 36> instead of std::string. Let’s see how the definition of uuid_p changes for std::array:
const auto uuid_p = produce< std::array<char, 36> >(
repeat(8u, 8u, hexdigit_p() >> to_container()),
symbol_p('-') >> to_container(),
repeat(3u, 3u,
repeat(4u, 4u, hexdigit_p() >> to_container()),
symbol_p('-') >> to_container()),
repeat(12u, 12u, hexdigit_p() >> to_container())
) >> to_lower();
The only change required is replacing std::string with std::array.
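For comparison, here is a plain C++17 sketch of what the std::array version computes: exactly 36 characters in the 8-4-4-4-12 layout, hyphens kept, and letters lower-cased as to_lower() is described to do. parse_uuid_text and its helpers are hypothetical; the sketch checks the length up front instead of using repetitions, which gives the same result for this fixed-size format.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

using uuid_text = std::array<char, 36>;

constexpr bool is_hex(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Lower-cases ASCII letters; other characters are returned unchanged.
constexpr char to_lower_ascii(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// 8-4-4-4-12 hex digits with hyphens at offsets 8, 13, 18 and 23.
// Hyphens are stored as well, and letters are lower-cased.
constexpr std::optional<uuid_text> parse_uuid_text(std::string_view in) {
    if (in.size() != 36) return std::nullopt;
    uuid_text out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const bool hyphen_here = (i == 8 || i == 13 || i == 18 || i == 23);
        if (hyphen_here ? in[i] != '-' : !is_hex(in[i])) return std::nullopt;
        out[i] = to_lower_ascii(in[i]);
    }
    return out;
}
```

No heap allocation is involved: the whole value lives inside the std::array.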
Extraction of UUID as a struct with integers inside
If storing a UUID as a string is not appropriate for some reason, we can write a parser that extracts the UUID value as a struct with integers inside. Such a struct can be defined this way:
struct uuid_t
{
std::uint32_t time_low_;
std::uint16_t time_mid_;
std::uint16_t time_hi_and_version_;
std::uint8_t clock_seq_hi_and_res_;
std::uint8_t clock_seq_low_;
std::array<std::uint8_t, 6> node_;
};
And the extraction of a UUID value into such a struct can look like this:
// To make things compact and clear.
using namespace restinio::router::easy_parser_router;
// Helpers to be used in the uuid_p below.
const auto x_uint32_p =
hexadecimal_number_p<std::uint32_t>(expected_digits(8));
const auto x_uint16_p =
hexadecimal_number_p<std::uint16_t>(expected_digits(4));
const auto x_uint8_p =
hexadecimal_number_p<std::uint8_t>(expected_digits(2));
// The parser for UUID.
const auto uuid_p = produce<uuid_t>(
x_uint32_p >> &uuid_t::time_low_,
symbol('-'),
x_uint16_p >> &uuid_t::time_mid_,
symbol('-'),
x_uint16_p >> &uuid_t::time_hi_and_version_,
symbol('-'),
x_uint8_p >> &uuid_t::clock_seq_hi_and_res_,
x_uint8_p >> &uuid_t::clock_seq_low_,
symbol('-'),
produce< std::array<std::uint8_t, 6> >(
repeat( 6, 6, x_uint8_p >> to_container() )
) >> &uuid_t::node_
);
In this version, we store only numeric values and ignore all hyphens.
Strictly speaking, there is no need to define helpers like x_uint32_p and x_uint16_p, but they make the definition of uuid_p much more readable.
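The same extraction can be written by hand to show what expected_digits(N) implies: each field must consist of exactly N hexadecimal digits. This plain C++17 sketch (read_hex, symbol, and parse_uuid are hypothetical names, not easy_parser API) fills the uuid_t struct defined above, repeated here so the block is self-contained, and requires the whole input to be consumed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

struct uuid_t {
    std::uint32_t time_low_;
    std::uint16_t time_mid_;
    std::uint16_t time_hi_and_version_;
    std::uint8_t clock_seq_hi_and_res_;
    std::uint8_t clock_seq_low_;
    std::array<std::uint8_t, 6> node_;
};

// Reads exactly `digits` (at most 8) hex digits, like expected_digits(digits).
// On failure `pos` is left unchanged.
template <typename T>
constexpr std::optional<T> read_hex(std::string_view in, std::size_t & pos, std::size_t digits) {
    if (in.size() - pos < digits) return std::nullopt;
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < digits; ++i) {
        const char c = in[pos + i];
        std::uint32_t d = 0;
        if (c >= '0' && c <= '9') d = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') d = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = static_cast<std::uint32_t>(c - 'A' + 10);
        else return std::nullopt;
        v = (v << 4) | d;
    }
    pos += digits;
    return static_cast<T>(v);
}

constexpr bool symbol(std::string_view in, std::size_t & pos, char c) {
    if (pos < in.size() && in[pos] == c) { ++pos; return true; }
    return false;
}

constexpr std::optional<uuid_t> parse_uuid(std::string_view in) {
    uuid_t u{};
    std::size_t pos = 0;
    const auto tl = read_hex<std::uint32_t>(in, pos, 8);
    if (!tl || !symbol(in, pos, '-')) return std::nullopt;
    const auto tm = read_hex<std::uint16_t>(in, pos, 4);
    if (!tm || !symbol(in, pos, '-')) return std::nullopt;
    const auto th = read_hex<std::uint16_t>(in, pos, 4);
    if (!th || !symbol(in, pos, '-')) return std::nullopt;
    const auto csh = read_hex<std::uint8_t>(in, pos, 2);
    if (!csh) return std::nullopt;
    const auto csl = read_hex<std::uint8_t>(in, pos, 2);
    if (!csl || !symbol(in, pos, '-')) return std::nullopt;
    for (auto & byte : u.node_) {  // six 2-digit bytes, hyphens ignored
        const auto b = read_hex<std::uint8_t>(in, pos, 2);
        if (!b) return std::nullopt;
        byte = *b;
    }
    if (pos != in.size()) return std::nullopt;  // no trailing input
    u.time_low_ = *tl;
    u.time_mid_ = *tm;
    u.time_hi_and_version_ = *th;
    u.clock_seq_hi_and_res_ = *csh;
    u.clock_seq_low_ = *csl;
    return u;
}
```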