Nothing new, still broken, insecure by default since then: Python's e-mail libraries and certificate verification

Today, basically every e-mail provider supports TLS for its services, and programmatically accessing e-mail services from Python code using TLS-wrapped clients is common. A standard Python installation ships three libraries for handling e-mail transfer: smtplib, imaplib, and poplib. While Python programming is usually straightforward, using these libraries securely requires passing a magic parameter in the right way. If one has just read the summary on Stack Overflow, read a tutorial that does not mention security settings, or asked ChatGPT without being specific enough, the result is a program that does not defeat active attackers in a machine-in-the-middle (MITM) position. Our journey started when we wrote an e-mail monitoring plugin in Python, and it ended, for the time being, with the notification of various open source projects.

Insecure past

There was a time when fetching e-mail via clear-text channels was the standard and there was no real alternative. SMTP is 40 years old. The encryption program PGP is 31 years old and can be understood as a tool that originated from the desire for reasonably useful end-to-end privacy at a time when the underlying foundation was clear-text. In the mid-nineties, the first crypto-protocols for regular users were developed, such as SSH for shell access and SSL, mainly for web browsing. For e-mail, STARTTLS was an improvement: it upgrades an insecure connection to a crypto-channel that preserves integrity and confidentiality, as long as an active attacker does not prevent STARTTLS. The pre-Snowden era, when websites usually relied on cleartext HTTP, is not that long ago. These old times influenced how things are nowadays. Nevertheless, times are changing, techniques too, and expectations anyway.

The same reasoning applies to libraries with protocol implementations, and Python is no different. So everything here is old hat. It is a "known" problem, and it is more or less documented in the corresponding Python documentation. There is a Python Enhancement Proposal (PEP), which is the Python community's way of coming to decisions: PEP 476 – Enabling certificate verification by default for stdlib HTTP clients, from 2014. It exists because the HTTP libraries had the same issue: they were insecure by default. PEP 476 proposes certificate checking for resources accessed via HTTPS, and it also mentions that other Python libraries have the same issue. These are smtplib, imaplib, and poplib for mail handling, plus libraries for FTP and NNTP. The PEP explains that the mail libraries may still need insecure defaults for backward compatibility, for example because mail servers may have invalid certificates, so certificate verification for e-mail could be a different matter from the PEP's aim of changing it for HTTP. The very same discussion already exists on Python's GitHub tracker.

To make things worse, while imaplib and poplib are mainly used in client-to-server communication (where you might have a chance to ask the user to make a security decision), SMTP is used for server-to-server communication as well. While security features are being introduced to improve mail security (e.g. MTA-STS), the situation is still challenging. That's why we think projects should configure e-mail as securely as possible, wherever possible.

Meanwhile, it is 2023, and insecure defaults are a problem, especially when our infrastructure depends on many small building blocks. An insecurity here, a shared password there, an outdated component over there. Each vulnerability could be the one too many that grants attackers the next level of access. So let us explain the problem.

A vulnerable example

Consider this example of a Python script that is intended to connect to an IMAP server:

#!/usr/bin/env python3

import imaplib

imap_host = 'outlook.office365.com'
imap_user = 'nonexisting@outlook.com'
imap_pass = 'secret'

print("IMAP: Try to connect.")
server = imaplib.IMAP4_SSL(host=imap_host, port=993)

print("IMAP: Try IMAP login, which is expected to fail with the data above.")
server.login(imap_user, imap_pass)

At first glance, the script looks like it should work, and it does. However, it does not verify the peer's certificate. This can be tested by adding an entry to /etc/hosts, for example one that associates an IP address of imap.gmail.com with the hostname outlook.office365.com:

173.194.76.109  outlook.office365.com

If the script now tries to connect to Microsoft's IMAP server, it connects to Google's instead. The certificate does not match the hostname, yet running the script results in an authentication error, not an ssl.SSLCertVerificationError. As there was no certificate verification error, username and password were sent to the wrong server, and the script did not prevent this.

$ ./imaptest.py
IMAP: Try to connect.
IMAP: Try IMAP login, which is expected to fail with the data above.
Traceback (most recent call last):
  File "/tmp/./imaptest.py", line 13, in <module>
    server.login(imap_user, imap_pass)
  File "/usr/lib/python3.10/imaplib.py", line 612, in login
    raise self.error(dat[-1])
imaplib.IMAP4.error: b'[AUTHENTICATIONFAILED] Authentication failed.'

Ouch!

Read the manual first

This was just an example of accessing an IMAP server. Technically, it is the same with Python's smtplib and poplib. This is more or less documented behaviour. The imaplib.IMAP4_SSL constructor has an optional parameter ssl_context, which is described as:

ssl_context is a ssl.SSLContext object which allows bundling SSL configuration options, certificates and private keys into a single (potentially long-lived) structure. Please read Security considerations for best practices.

The ssl_context parameter was added in Python 3.3 released in 2012. Following the link to the security best practices, the Python documentation further explains:

For client use, if you don’t have any special requirements for your security policy, it is highly recommended that you use the create_default_context() function to create your SSL context. It will load the system’s trusted CA certificates, enable certificate validation and hostname checking, and try to choose reasonably secure protocol and cipher settings. [...]

By contrast, if you create the SSL context by calling the SSLContext constructor yourself, it will not have certificate validation nor hostname checking enabled by default. If you do so, please read the paragraphs below to achieve a good security level. [...]

When calling the SSLContext constructor directly, CERT_NONE is the default. Since it does not authenticate the other peer, it can be insecure, especially in client mode where most of time you would like to ensure the authenticity of the server you’re talking to. Therefore, when in client mode, it is highly recommended to use CERT_REQUIRED. However, it is in itself not sufficient; you also have to check that the server certificate, which can be obtained by calling SSLSocket.getpeercert(), matches the desired service. For many protocols and applications, the service can be identified by the hostname; in this case, the match_hostname() function can be used. This common check is automatically performed when SSLContext.check_hostname is enabled.

Would you say that after reading this, everything is completely clear? Creating a default context to load CA certificates sounds good, but is the impact clear when one skips this step and simply does not pass an SSL context when calling the IMAP4_SSL constructor? Does the code somehow call the SSLContext constructor when you completely ignore the IMAP4_SSL constructor's ssl_context parameter? The Python documentation links to the corresponding source code, so one can check what happens, starting from the constructor of imaplib.IMAP4_SSL:

def __init__(self, host='', port=IMAP4_SSL_PORT, keyfile=None,
             certfile=None, ssl_context=None, timeout=None):
[...]

    if ssl_context is None:
        ssl_context = ssl._create_stdlib_context(certfile=certfile,
                                                 keyfile=keyfile)
    self.ssl_context = ssl_context
    IMAP4.__init__(self, host, port, timeout)

There, the ssl_context parameter defaults to None. If no value is passed via ssl_context, _create_stdlib_context is called. The function ssl._create_stdlib_context is defined as:

_create_stdlib_context = _create_unverified_context

Unverified context does not sound good. The corresponding code is shown below. The function _create_unverified_context disables check_hostname and does not load a default trust chain unless instructed to do so:

def _create_unverified_context(protocol=None, *, cert_reqs=CERT_NONE,
                           check_hostname=False, purpose=Purpose.SERVER_AUTH,
                           certfile=None, keyfile=None,
                           cafile=None, capath=None, cadata=None):
[...]
    context = SSLContext(protocol)
    context.check_hostname = check_hostname
    if cert_reqs is not None:
        context.verify_mode = cert_reqs
    if check_hostname:
        context.check_hostname = True
[...]
    if certfile or keyfile:
        context.load_cert_chain(certfile, keyfile)

    # load CA root certs
    if cafile or capath or cadata:
        context.load_verify_locations(cafile, capath, cadata)
    elif context.verify_mode != CERT_NONE:
        # no explicit cafile, capath or cadata but the verify mode is
        # CERT_OPTIONAL or CERT_REQUIRED. Let's try to load default system
        # root CA certificates for the given purpose. This may fail silently.
        context.load_default_certs(purpose)
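The effect of these defaults can be observed without touching the network by comparing the settings of the two kinds of context. The following is a minimal sketch: the helper name describe_context is made up for this illustration, and ssl._create_unverified_context is the private function from the excerpt above, whose behaviour may change between Python versions.

```python
import ssl


def describe_context(context):
    """Return the two settings that decide whether a peer is authenticated."""
    return {
        "verify_mode": context.verify_mode.name,
        "check_hostname": context.check_hostname,
    }


# Private helper from the excerpt above; used here only for inspection.
unverified = describe_context(ssl._create_unverified_context())

# Public, documented way to obtain a verifying client context.
verified = describe_context(ssl.create_default_context())

print("unverified:", unverified)
print("verified:  ", verified)
```

The first context is expected to report CERT_NONE with hostname checking disabled, the second CERT_REQUIRED with hostname checking enabled.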

Not checking certificates explains why the vulnerable example client in the beginning just continued connecting and sent credentials to another mail server or to the entity that pretends to be one.

How could that be avoided? Just create an SSL context and pass it to the constructor as an additional parameter (examples for each library):

imaplib.IMAP4_SSL(hostname, port, ssl_context=ssl.create_default_context())
smtplib.SMTP_SSL(hostname, port, context=ssl.create_default_context())
poplib.POP3_SSL(hostname, port, context=ssl.create_default_context())

By the way, did you notice the different parameter names for the SSL context?
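One way to avoid having to remember which keyword each library expects is to hide the difference behind small helper functions. The following is only a sketch: the function names verified_context, open_imap, open_smtp, and open_pop3 are hypothetical and not part of any library, and the default ports are the conventional ones for implicit TLS.

```python
import imaplib
import poplib
import smtplib
import ssl


def verified_context():
    """Client context that loads system CAs and checks certificate and hostname."""
    return ssl.create_default_context()


def open_imap(host, port=993):
    # imaplib spells the keyword "ssl_context".
    return imaplib.IMAP4_SSL(host, port, ssl_context=verified_context())


def open_smtp(host, port=465):
    # smtplib spells the keyword "context".
    return smtplib.SMTP_SSL(host, port, context=verified_context())


def open_pop3(host, port=995):
    # poplib also spells the keyword "context".
    return poplib.POP3_SSL(host, port, context=verified_context())
```

With the /etc/hosts entry from the vulnerable example in place, a call such as open_imap('outlook.office365.com') is expected to fail with ssl.SSLCertVerificationError during the TLS handshake, before any credentials are sent.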

Implications

The direct consequence is that a vulnerable Python program that uses the libraries in a vulnerable way may leak credentials for an IMAP or SMTP account (or POP3) to an active attacker in a MITM position. A script may only use SMTP, but often SMTP and IMAP credentials are identical and therefore may result in IMAP being accessed, too. Furthermore, if an e-mail is sent via SMTP, mail contents may leak or could be modified during transit. Due to the flexibility of IMAP, not only mail contents may be retrieved, but also modified or deleted before another, legitimate user can access it. Since a library is affected here, the exact impact depends on the use case of the software that uses the library and it will depend on the type of data being processed via e-mail. Another impact is that with access to e-mail infrastructure, the reachable attack surface increases for the attacker. Access to e-mail infrastructure may be abused for phishing attacks and spamming, for example.

If there is the need to create an SSLContext to enable certificate verification, will everyone have read the documentation? Sadly, there are even tutorials not mentioning the SSL context and ChatGPT won’t tell the full story until one asks... erm... demonstrates prompt engineering skills.

We can use code search engines such as grep.app, searchcode, and Debian Code Search to look for constructor calls to imaplib.IMAP4_SSL and smtplib.SMTP_SSL and skim through the source code. Skipping test code as well as inactive and some personal code repositories, it is possible to find quite a few vulnerable uses. A curated list is given at the end of this article.
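A similar search can be approximated locally on one's own code base. The sketch below is a rough heuristic and not the procedure used for this article: it parses Python source with the ast module and reports calls to IMAP4_SSL, SMTP_SSL, or POP3_SSL that pass neither an ssl_context nor a context keyword. The function name find_unverified_calls and the sample snippet are invented for illustration, and calls that pass the context positionally or via **kwargs would be misreported.

```python
import ast

TLS_CLASSES = {"IMAP4_SSL", "SMTP_SSL", "POP3_SSL"}
CONTEXT_KEYWORDS = {"ssl_context", "context"}


def find_unverified_calls(source):
    """Return sorted (line, name) pairs for TLS client calls without a context keyword."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both imaplib.IMAP4_SSL(...) and a bare IMAP4_SSL(...).
        if isinstance(func, ast.Attribute):
            name = func.attr
        elif isinstance(func, ast.Name):
            name = func.id
        else:
            continue
        if name not in TLS_CLASSES:
            continue
        passed = {kw.arg for kw in node.keywords}
        if not passed & CONTEXT_KEYWORDS:
            findings.append((node.lineno, name))
    return sorted(findings)


# Invented sample: the first call lacks a context, the second passes one.
SAMPLE = '''
import imaplib, smtplib, ssl
a = imaplib.IMAP4_SSL(host="mail.example.org", port=993)
b = smtplib.SMTP_SSL("mail.example.org", 465, context=ssl.create_default_context())
'''

for line, name in find_unverified_calls(SAMPLE):
    print(f"line {line}: {name} called without an SSL context")
```

A hit from such a scan is only a starting point; whether a call is actually vulnerable still has to be confirmed by reading the surrounding code.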

We found vulnerabilities in code that is covered by bug bounty programs. For example, we found a vulnerability in an OAuth2 Python script by Google, where a connection to an SMTP/IMAP server is made to test whether authentication tokens are valid. We reported this to the Google and Alphabet Vulnerability Reward Program (VRP). They rewarded us with a small bounty, at least in theory. [1] We also filed vulnerability reports at HackerOne for Spotify and Mozilla, but they [2] don't see a problem with not checking server certificates and closed the tickets. The vulnerability in Spotify's Luigi project had been found before, but was likely not reported back then. We were able to report the vulnerability in Mozilla's Bugbot via another channel, and it was fixed.

We tried to contact most of the projects for which we could find (security) contact data. Some of them were very happy to receive the report and immediately provided a fix.

Necessary conclusion to draw

It is understandable to prefer not to break existing functionality. We all know ecosystems in software engineering where things break every now and then. But insecure defaults are not a value worth preserving. They are a vulnerability. If a security issue is only documented in the fine print, developers will miss it and write vulnerable code. RTFM is not an excuse. You don't build unavoidable trip hazards and then urge caution.

Over a year ago, a ticket was opened in the CPython project to change the default behaviour of these e-mail libraries, in other words to make them secure by default. We are really looking forward to the ticket being implemented and closed.

A few affected tools and frameworks

Infrastructure:

Frameworks:

Security:

Community:

End-user tools:

OAuth2 clients: