OpenSSL client and server from scratch, part 4
This is a continuation of yesterday’s post,
“OpenSSL client and server from scratch, part 3.”
In the previous post, we made a trivial little HTTPS server that we could talk to with curl. Today we’ll write our own HTTPS client as a replacement for curl.
Set up an SSL_CTX for the client
Recall that before we can create an SSL connection, we need to fill out an SSL_CTX. On the server side, the SSL_CTX holds the server’s certificate and private key, so that the server can authenticate itself to clients. On the client side, the SSL_CTX holds a trust store: a set of certificates that our client considers trustworthy.
By the way, it is also possible for the client to present a certificate and for the server to do certificate verification. This is extremely uncommon on the World Wide Web, where servers offer their services mostly to strangers, but you may have encountered client authentication on a private network. For example, your browser might need to present a “client certificate” as part of logging in to your employer’s email service.
We won’t talk about client certificates today.
If you don’t set up a trust store, OpenSSL won’t trust any certificates by default. That’s not what you want! So let’s tell OpenSSL to trust “whatever our computer trusts,” just like curl probably does. So we create our SSL_CTX object and then call SSL_CTX_set_default_verify_paths:
auto ctx = my::UniquePtr<SSL_CTX>(SSL_CTX_new(TLS_client_method()));
SSL_CTX_set_min_proto_version(ctx.get(), TLS1_2_VERSION);
if (SSL_CTX_set_default_verify_paths(ctx.get()) != 1) {
    my::print_errors_and_exit("Error loading trust store");
}
You can locate your default trust store using this OpenSSL function:
puts(X509_get_default_cert_file());
Or by running this Python command:
python -c 'import ssl; print(ssl.get_default_verify_paths().cafile)'
Now we set up our TCP connection to duckduckgo.com, just like we did in part 1. The only difference is that we connect to port 443 (the port conventionally associated with HTTPS services) instead of to port 80.
auto bio = my::UniquePtr<BIO>(BIO_new_connect("duckduckgo.com:443"));
if (BIO_do_connect(bio.get()) <= 0) {
    my::print_errors_and_exit("Error in BIO_do_connect");
}
Once we’ve made the TCP connection, it’s time to start talking TLS.
We slap an SSL filter BIO in front of our socket BIO, using the overloaded operator| that we wrote in part 2.
auto ssl_bio = std::move(bio)
    | my::UniquePtr<BIO>(BIO_new_ssl(ctx.get(), 1));
Notice the integer argument 1 here. Recall from part 3 that 1 means “I’m a client,” and 0 means “I’m a server.” This line of code is just one character different from what we wrote yesterday, but if you cut-and-paste from yesterday’s code and forget to change that one character, you’ll be in a world of cryptic and frustrating errors!
Getting at the actual SSL object
Our SSL filter BIO owns an SSL object, which holds the state of the actual connection (like, what symmetric encryption key it’s negotiated, or what certificate it has received from the server), as distinct from the SSL “context” SSL_CTX that we already set up. (By the way, the SSL object also holds a reference to our SSL_CTX; SSL_CTX objects, like most OpenSSL objects, are refcounted internally. So we could actually relinquish our ownership by setting ctx = nullptr at this point, and it would do no harm. The SSL_CTX object won’t actually disappear from memory until every stakeholder is done with it.)
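To make the refcounting idea concrete, here is a toy model in plain C++. It is only an illustration: ToyContext, ToyConnection, and the toy_* functions are made-up names, not OpenSSL's, and OpenSSL's real bookkeeping is more involved. The sketch shows just one thing: the context survives its creator letting go, as long as a connection still holds a reference.

```cpp
#include <cassert>

// Hypothetical stand-in for a refcounted context object (NOT an OpenSSL type).
struct ToyContext {
    int refcount = 1;      // the creator holds the first reference
    bool *destroyed_flag;  // lets the caller observe when destruction happens
    explicit ToyContext(bool *flag) : destroyed_flag(flag) {}
    ~ToyContext() { *destroyed_flag = true; }
};

// Take one more reference to the context.
void toy_context_up_ref(ToyContext *ctx) { ++ctx->refcount; }

// Drop one reference; the last one out destroys the object.
void toy_context_free(ToyContext *ctx)
{
    if (--ctx->refcount == 0) {
        delete ctx;
    }
}

// Hypothetical stand-in for a per-connection object that keeps its context alive.
struct ToyConnection {
    ToyContext *ctx;
    explicit ToyConnection(ToyContext *c) : ctx(c) { toy_context_up_ref(ctx); }
    ~ToyConnection() { toy_context_free(ctx); }
    ToyConnection(const ToyConnection&) = delete;
    ToyConnection& operator=(const ToyConnection&) = delete;
};

// Returns true if the context outlived its creator's release
// and was destroyed only when the connection went away.
bool context_outlives_owner()
{
    bool destroyed = false;
    ToyContext *ctx = new ToyContext(&destroyed);
    bool alive_after_owner_release = false;
    {
        ToyConnection conn(ctx);
        toy_context_free(ctx);  // the creator relinquishes ownership
        alive_after_owner_release = !destroyed;
    }  // conn is destroyed here and drops the last reference
    return alive_after_owner_release && destroyed;
}
```

Calling context_outlives_owner() returns true: releasing the creator's reference does no harm while the connection is still around.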
We’re about to do some work with the per-connection SSL object, so let’s write a quick helper function to get at it:
SSL *get_ssl(BIO *bio)
{
    SSL *ssl = nullptr;
    BIO_get_ssl(bio, &ssl);
    if (ssl == nullptr) {
        my::print_errors_and_exit("Error in BIO_get_ssl");
    }
    return ssl;
}
Keep in mind that we always read and write data via a BIO. The BIO object is higher-level than the SSL object. OpenSSL does provide lower-level functions like SSL_read and SSL_write that talk directly to the SSL object, but we aren’t going to use them. The only time we’ll touch an SSL object is when we have absolutely no other choice.
The SNI field
You’re reading this blog post on quuxplusone.github.io, which means that the HTTPS server serving you this page must present a certificate for quuxplusone.github.io. But the server serving you this page (let’s say its IP address is 185.199.108.153) probably serves dozens or hundreds of other domains, too. (And not just *.github.io domains, either, so wildcard certificates don’t fully solve the problem, smarty pants.) How does the server know what certificate to present to you? (Remember, it can’t look at the Host: header of your HTTP request, because you haven’t sent your HTTP request yet, because you can’t be sure if you trust this server until it’s presented you with a certificate.)
TLS solves this chicken-and-egg problem by encouraging the client to send their destination domain in plaintext as part of the TLS handshake. It goes in a field called “Server Name Indication” (SNI), as specified in RFC 6066. OpenSSL does not set the SNI field by default, but in real life you should set it, like this:
SSL_set_tlsext_host_name(my::get_ssl(ssl_bio.get()), "duckduckgo.com");
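To see why the server wants that field, here is a toy model of the server's side of the decision, in plain C++ with no OpenSSL involved. Everything in it is hypothetical: the CertTable type, the choose_certificate function, and the certificate file names are invented for illustration, and a real server would make this choice inside its TLS library rather than with a bare map lookup.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical server-side table: hostname -> path of the certificate to present.
using CertTable = std::map<std::string, std::string>;

// Pick the certificate for the name the client sent in SNI.
// If the client sent no name, or a name we do not host, use the fallback.
std::string choose_certificate(const CertTable& table,
                               const std::string& sni_name,
                               const std::string& fallback)
{
    auto it = table.find(sni_name);
    return (it != table.end()) ? it->second : fallback;
}

// Usage example with made-up file names.
void example_sni_lookup()
{
    CertTable table = {
        {"quuxplusone.github.io", "quuxplusone-cert.pem"},
        {"example.org", "example-org-cert.pem"},
    };
    assert(choose_certificate(table, "example.org", "default-cert.pem") == "example-org-cert.pem");
    // A client that omits SNI gets the fallback, which may well be the wrong certificate.
    assert(choose_certificate(table, "", "default-cert.pem") == "default-cert.pem");
}
```

The second assertion is the reason to set the SNI field in real life: without it, the server can only guess which certificate you wanted.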
The TLS handshake and certificate verification
When it comes to Internet security, the server is involved; the client is committed.
Our trivial HTTPS server merely presents its certificate to the client and waits to see if the client accepts it. This is a non-interactive process from the programmer’s point of view. We just set up the server’s SSL_CTX with the appropriate certificate and private key, and then we could immediately call BIO_read on the SSL BIO; the process of presenting our certificate and negotiating encryption parameters all happened “under the hood.” If the client happens to reject our certificate, no problem: BIO_read returns zero bytes, we drop the connection and move on to service the next client.
Our trivial HTTPS client has to do more work. Before we do anything else with its SSL connection,
we need to verify that the server is who they say they are. So before we issue our first BIO_write
on the connection, let’s complete the TLS handshake: let’s receive the server’s certificate, check whether it
has a valid chain of trust back to a root in our trust store, and check whether any of the links in
that chain have expired. Only if the verification succeeds should we go on to send our GET request
over the encrypted connection.
if (BIO_do_handshake(ssl_bio.get()) <= 0) {
    my::print_errors_and_exit("Error in TLS handshake");
}
my::verify_the_certificate(my::get_ssl(ssl_bio.get()), "duckduckgo.com");
my::send_http_request(ssl_bio.get(), "GET / HTTP/1.1", "duckduckgo.com");
std::string response = my::receive_http_message(ssl_bio.get());
printf("%s", response.c_str());
Recall from part 2 that BIO_do_handshake is just a macro that means “do the next step in whatever this BIO’s state machine is,” and that functionally it’s a synonym for BIO_do_connect and BIO_do_accept. We could write
if (BIO_do_connect(ssl_bio.get()) <= 0) {
    my::print_errors_and_exit("Error in TLS handshake");
}
and the compiler would treat it as exactly the same thing… but we shouldn’t! The point of this line of code is to complete a TLS handshake, not to make a TCP connection to a server nor to accept a TCP connection from a client. So we should use the BIO_do_handshake mnemonic to convey our intent to the human reader.
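Once the handshake and verification succeed, the client sends its request. As a small aside, here are the exact bytes that a call like my::send_http_request(ssl_bio.get(), "GET / HTTP/1.1", "duckduckgo.com") hands to BIO_write. The function toy_build_request is a hypothetical name; it copies the string-building part of send_http_request from the complete code at the end of this post and leaves out the BIO calls, so it compiles without OpenSSL.

```cpp
#include <cassert>
#include <string>

// Build the request text: the request line, a Host header, and the blank line
// that ends the header section. Every line ends in CRLF.
std::string toy_build_request(const std::string& line, const std::string& host)
{
    std::string request = line + "\r\n";
    request += "Host: " + host + "\r\n";
    request += "\r\n";
    return request;
}

// Usage example: the request our client sends to DuckDuckGo.
void example_build_request()
{
    std::string request = toy_build_request("GET / HTTP/1.1", "duckduckgo.com");
    assert(request == "GET / HTTP/1.1\r\nHost: duckduckgo.com\r\n\r\n");
}
```

Note that the hostname appears here a second time, in the Host: header, but only after the TLS handshake has already finished.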
Let’s look at the implementation of my::verify_the_certificate.
void verify_the_certificate(SSL *ssl, const std::string& expected_hostname)
{
    int err = SSL_get_verify_result(ssl);
    if (err != X509_V_OK) {
        const char *message = X509_verify_cert_error_string(err);
        fprintf(stderr, "Certificate verification error: %s (%d)\n", message, err);
        exit(1);
    }
    X509 *cert = SSL_get_peer_certificate(ssl);
    if (cert == nullptr) {
        fprintf(stderr, "No certificate was presented by the server\n");
        exit(1);
    }
    if (X509_check_host(cert, expected_hostname.data(), expected_hostname.size(), 0, nullptr) != 1) {
        fprintf(stderr, "Certificate verification error: Hostname mismatch\n");
        exit(1);
    }
    // SSL_get_peer_certificate gave us our own reference to the certificate; release it.
    X509_free(cert);
}
There are three things we need to check in order to verify the server’s certificate.
- First, ask OpenSSL whether there was anything “off” about the certificate presented by the server. Did the server present us with an expired certificate? Was the certificate not signed by anyone we trust? (Remember, the SSL object knows about the SSL_CTX, and we set up the SSL_CTX with a trust store.) Did the server fail to prove that it knows the private key associated with the certificate?
- Second, check that the server actually provided us with a certificate! The TLS protocol doesn’t technically require the server to present a certificate, any more than it requires the client to present one. If no certificate was presented, then OpenSSL will happily report “nothing was wrong with the certificate [because there wasn’t one].” Therefore it’s extremely important that we do this step!
- Finally, remember how SNI was kind of an afterthought? Suppose the server presents us with a valid certificate for google.com, signed by someone we trust, and proves that it knows the private key associated with the certificate. This is absolutely sufficient evidence that the server we’re talking to is really google.com. But that could still be a problem, if we wanted to talk to duckduckgo.com! Therefore our third step is to verify that the name on the certificate matches the name of the server that we asked to talk to.
A single certificate may be valid for many different domain names. One of those names occupies a privileged position known as the “Common Name” (CN); the rest must be relegated to an extension field called “Subject Alternative Names” (SAN). However, because storing names in two different places is a pain in the neck, the industry has essentially deprecated the “Common Name” field in favor of putting all the names in the SAN field.
When we generated our duckduckgo.com certificate in part 3, we used that deprecated CN field:
openssl req -new -x509 -sha256 -key server-private-key.pem -subj "/CN=duckduckgo.com" -out server-certificate.pem
To generate a certificate with SANs like all the cool kids have, consult StackOverflow; it seems to be very difficult pre-1.1.1 and maybe still kind of tricky even in 1.1.1, judging from the various answers.
Anyway, the point is, when we do that final step of checking that the cert presented by the server was issued for the domain that we’re actually trying to reach, we won’t be checking “the name” on the certificate; we’ll be checking the names on the certificate, plural, to see if any of them match the domain we’re trying to reach. We should also consider wildcards; for example, a certificate issued for *.github.io should match quuxplusone.github.io. This logic is extremely tricky, so we will let OpenSSL do it for us, by calling X509_check_host as shown above.
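To give a feel for what “checking the names, plural” involves, here is a deliberately simplified sketch in plain C++. It is not what X509_check_host does internally. The function names are hypothetical, the only wildcard form it understands is a leading “*.” standing for exactly one label, and it ignores the many corner cases (internationalized names, IP addresses, partial-label wildcards, and so on) that make the real logic tricky. In production code, keep calling X509_check_host.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// ASCII-lowercase a copy of the string; hostnames compare case-insensitively.
std::string toy_to_lower(std::string s)
{
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Simplified rule (an assumption of this sketch): a pattern starting with "*."
// matches any host that has exactly one extra non-empty label in place of the "*".
// Any other pattern must equal the host exactly.
bool toy_name_matches(const std::string& pattern_in, const std::string& host_in)
{
    std::string pattern = toy_to_lower(pattern_in);
    std::string host = toy_to_lower(host_in);
    if (pattern.compare(0, 2, "*.") != 0) {
        return pattern == host;
    }
    std::string suffix = pattern.substr(1);  // e.g. ".github.io"
    std::size_t first_dot = host.find('.');
    if (first_dot == std::string::npos || first_dot == 0) {
        return false;
    }
    // Everything from the host's first dot onward must equal the suffix.
    return host.compare(first_dot, std::string::npos, suffix) == 0;
}

// A certificate carries several names; the host is acceptable if any one matches.
bool toy_any_name_matches(const std::vector<std::string>& names, const std::string& host)
{
    for (const std::string& name : names) {
        if (toy_name_matches(name, host)) {
            return true;
        }
    }
    return false;
}
```

With these rules, toy_name_matches("*.github.io", "quuxplusone.github.io") is true, while "github.io" and "a.b.github.io" do not match that pattern.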
OpenSSL 1.1.0 and later can do the X509_check_host step as part of the certificate verification whose result we read with SSL_get_verify_result… if we add a setup step that tells OpenSSL to do the check for us! The setup step is spelled SSL_set1_host, and you can see it in the complete code below.
Prior to OpenSSL 1.0.2, the function X509_check_host didn’t even exist, and everyone had to roll their own implementations. A high-quality, but limited-functionality, example of hostname validation is at iSECPartners/ssl-conservatory on GitHub, along with an excellent ten-page paper “Everything You’ve Always Wanted to Know About Certificate Validation With OpenSSL (but Were Afraid to Ask).” Notably, their example code fails to handle wildcard certs.
Putting it all together
At the bottom of this post you’ll find the complete code for our very simple C++14 HTTPS client. When you compile and run it, it should fetch the home page of duckduckgo.com over an encrypted HTTPS connection.
Suppose you want to try it out with our simple HTTPS server. (Remember, our server pretends to be duckduckgo.com, with a self-signed certificate we issued to ourselves.) We just need to change two lines.
First, we need to make the initial TCP connection to our server, not the real DuckDuckGo:
- auto bio = my::UniquePtr<BIO>(BIO_new_connect("duckduckgo.com:443"));
+ auto bio = my::UniquePtr<BIO>(BIO_new_connect("localhost:8080"));
And second, we need to tell the SSL_CTX that we trust the server’s certificate:
- if (SSL_CTX_set_default_verify_paths(ctx.get()) != 1) {
+ if (SSL_CTX_load_verify_locations(ctx.get(), "server-certificate.pem", nullptr) != 1) {
That’s it! If you make those two changes to the client’s code, and recompile, then you’ll be able to spin up the HTTPS server from part 3 and connect to it with this client, as follows:
$ ./https-server >/dev/null &
$ ./https-client
HTTP/1.1 200 OK
Content-Length: 10
okay cool
$
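In the transcript above, the server announces Content-Length: 10, and that header is how our client knows when it has read the whole response. The sketch below restates that logic from receive_http_message using std::string operations instead of strstr. The names toy_content_length and toy_message_complete are hypothetical, and, like the code in this post, the sketch assumes the header is spelled exactly “Content-Length” (real HTTP header names are case-insensitive) and that the body is not chunked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Scan CRLF-terminated header lines for Content-Length; return 0 if it is absent.
std::size_t toy_content_length(const std::string& headers)
{
    std::size_t pos = 0;
    while (true) {
        std::size_t eol = headers.find("\r\n", pos);
        if (eol == std::string::npos) {
            break;
        }
        std::string line = headers.substr(pos, eol - pos);
        std::size_t colon = line.find(':');
        if (colon != std::string::npos && line.substr(0, colon) == "Content-Length") {
            return std::stoul(line.substr(colon + 1));  // stoul skips the leading space
        }
        pos = eol + 2;
    }
    return 0;
}

// True once we have seen the blank line that ends the headers
// and at least Content-Length bytes of body after it.
bool toy_message_complete(const std::string& received)
{
    std::size_t end_of_headers = received.find("\r\n\r\n");
    if (end_of_headers == std::string::npos) {
        return false;
    }
    std::string headers = received.substr(0, end_of_headers + 2);
    std::size_t body_size = received.size() - (end_of_headers + 4);
    return body_size >= toy_content_length(headers);
}
```

A client loop would keep calling BIO_read and appending to its buffer until a check like toy_message_complete says the message is whole.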
Godbolt Compiler Explorer doesn’t support running programs that do networking, but you can see the code on Godbolt here anyway.
// g++ -std=c++14 https-client.cpp $(pkg-config --cflags --libs openssl) -o https-client
#include <memory>
#include <stdarg.h>
#include <stdexcept>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <vector>
#include <openssl/bio.h>
#include <openssl/err.h>
#include <openssl/ssl.h>
#include <openssl/x509v3.h>
namespace my {
template<class T> struct DeleterOf;
template<> struct DeleterOf<BIO> { void operator()(BIO *p) const { BIO_free_all(p); } };
template<> struct DeleterOf<BIO_METHOD> { void operator()(BIO_METHOD *p) const { BIO_meth_free(p); } };
template<> struct DeleterOf<SSL_CTX> { void operator()(SSL_CTX *p) const { SSL_CTX_free(p); } };
template<class OpenSSLType>
using UniquePtr = std::unique_ptr<OpenSSLType, DeleterOf<OpenSSLType>>;
my::UniquePtr<BIO> operator|(my::UniquePtr<BIO> lower, my::UniquePtr<BIO> upper)
{
    BIO_push(upper.get(), lower.release());
    return upper;
}
class StringBIO {
    std::string str_;
    my::UniquePtr<BIO_METHOD> methods_;
    my::UniquePtr<BIO> bio_;
public:
    StringBIO(StringBIO&&) = delete;
    StringBIO& operator=(StringBIO&&) = delete;
    explicit StringBIO() {
        methods_.reset(BIO_meth_new(BIO_TYPE_SOURCE_SINK, "StringBIO"));
        if (methods_ == nullptr) {
            throw std::runtime_error("StringBIO: error in BIO_meth_new");
        }
        BIO_meth_set_write(methods_.get(), [](BIO *bio, const char *data, int len) -> int {
            std::string *str = reinterpret_cast<std::string*>(BIO_get_data(bio));
            str->append(data, len);
            return len;
        });
        bio_.reset(BIO_new(methods_.get()));
        if (bio_ == nullptr) {
            throw std::runtime_error("StringBIO: error in BIO_new");
        }
        BIO_set_data(bio_.get(), &str_);
        BIO_set_init(bio_.get(), 1);
    }
    BIO *bio() { return bio_.get(); }
    std::string str() && { return std::move(str_); }
};
[[noreturn]] void print_errors_and_exit(const char *message)
{
    fprintf(stderr, "%s\n", message);
    ERR_print_errors_fp(stderr);
    exit(1);
}
[[noreturn]] void print_errors_and_throw(const char *message)
{
    my::StringBIO bio;
    ERR_print_errors(bio.bio());
    throw std::runtime_error(std::string(message) + "\n" + std::move(bio).str());
}
std::string receive_some_data(BIO *bio)
{
    char buffer[1024];
    int len = BIO_read(bio, buffer, sizeof(buffer));
    if (len < 0) {
        my::print_errors_and_throw("error in BIO_read");
    } else if (len > 0) {
        return std::string(buffer, len);
    } else if (BIO_should_retry(bio)) {
        return receive_some_data(bio);
    } else {
        my::print_errors_and_throw("empty BIO_read");
    }
}
std::vector<std::string> split_headers(const std::string& text)
{
    std::vector<std::string> lines;
    const char *start = text.c_str();
    while (const char *end = strstr(start, "\r\n")) {
        lines.push_back(std::string(start, end));
        start = end + 2;
    }
    return lines;
}
std::string receive_http_message(BIO *bio)
{
    std::string headers = my::receive_some_data(bio);
    char *end_of_headers = strstr(&headers[0], "\r\n\r\n");
    while (end_of_headers == nullptr) {
        headers += my::receive_some_data(bio);
        end_of_headers = strstr(&headers[0], "\r\n\r\n");
    }
    std::string body = std::string(end_of_headers+4, &headers[headers.size()]);
    headers.resize(end_of_headers+2 - &headers[0]);
    size_t content_length = 0;
    for (const std::string& line : my::split_headers(headers)) {
        if (const char *colon = strchr(line.c_str(), ':')) {
            auto header_name = std::string(&line[0], colon);
            if (header_name == "Content-Length") {
                content_length = std::stoul(colon+1);
            }
        }
    }
    while (body.size() < content_length) {
        body += my::receive_some_data(bio);
    }
    return headers + "\r\n" + body;
}
void send_http_request(BIO *bio, const std::string& line, const std::string& host)
{
    std::string request = line + "\r\n";
    request += "Host: " + host + "\r\n";
    request += "\r\n";
    BIO_write(bio, request.data(), request.size());
    BIO_flush(bio);
}
SSL *get_ssl(BIO *bio)
{
    SSL *ssl = nullptr;
    BIO_get_ssl(bio, &ssl);
    if (ssl == nullptr) {
        my::print_errors_and_exit("Error in BIO_get_ssl");
    }
    return ssl;
}
void verify_the_certificate(SSL *ssl, const std::string& expected_hostname)
{
    int err = SSL_get_verify_result(ssl);
    if (err != X509_V_OK) {
        const char *message = X509_verify_cert_error_string(err);
        fprintf(stderr, "Certificate verification error: %s (%d)\n", message, err);
        exit(1);
    }
    X509 *cert = SSL_get_peer_certificate(ssl);
    if (cert == nullptr) {
        fprintf(stderr, "No certificate was presented by the server\n");
        exit(1);
    }
#if OPENSSL_VERSION_NUMBER < 0x10100000L
    if (X509_check_host(cert, expected_hostname.data(), expected_hostname.size(), 0, nullptr) != 1) {
        fprintf(stderr, "Certificate verification error: X509_check_host\n");
        exit(1);
    }
#else
    // X509_check_host is called automatically during verification,
    // because we set it up in main().
    (void)expected_hostname;
#endif
    // SSL_get_peer_certificate gave us our own reference to the certificate; release it.
    X509_free(cert);
}
} // namespace my
int main()
{
#if OPENSSL_VERSION_NUMBER < 0x10100000L
    SSL_library_init();
    SSL_load_error_strings();
#endif
    /* Set up the SSL context */
#if OPENSSL_VERSION_NUMBER < 0x10100000L
    auto ctx = my::UniquePtr<SSL_CTX>(SSL_CTX_new(SSLv23_client_method()));
#else
    auto ctx = my::UniquePtr<SSL_CTX>(SSL_CTX_new(TLS_client_method()));
#endif
    if (SSL_CTX_set_default_verify_paths(ctx.get()) != 1) {
        my::print_errors_and_exit("Error setting up trust store");
    }
    auto bio = my::UniquePtr<BIO>(BIO_new_connect("duckduckgo.com:443"));
    if (bio == nullptr) {
        my::print_errors_and_exit("Error in BIO_new_connect");
    }
    if (BIO_do_connect(bio.get()) <= 0) {
        my::print_errors_and_exit("Error in BIO_do_connect");
    }
    auto ssl_bio = std::move(bio)
        | my::UniquePtr<BIO>(BIO_new_ssl(ctx.get(), 1))
        ;
    SSL_set_tlsext_host_name(my::get_ssl(ssl_bio.get()), "duckduckgo.com");
#if OPENSSL_VERSION_NUMBER >= 0x10100000L
    SSL_set1_host(my::get_ssl(ssl_bio.get()), "duckduckgo.com");
#endif
    if (BIO_do_handshake(ssl_bio.get()) <= 0) {
        my::print_errors_and_exit("Error in BIO_do_handshake");
    }
    my::verify_the_certificate(my::get_ssl(ssl_bio.get()), "duckduckgo.com");
    my::send_http_request(ssl_bio.get(), "GET / HTTP/1.1", "duckduckgo.com");
    std::string response = my::receive_http_message(ssl_bio.get());
    printf("%s", response.c_str());
}
This series continues with “OpenSSL client and server from scratch, part 5.”