Today, we close the connection after 10 errors. While this is fine for
normal use, it is unnecessarily large.
Lowering it to 3 helps with defense-in-depth for cross-protocol attacks
(e.g. https://alpaca-attack.com/), while still being large enough for
useful troubleshooting and normal operation.
As part of this change, we also remove the AUTH-specific failures limit,
because they're covered by the connection limit.
When we receive unknown commands, we use the first 6 bytes for
troubleshooting (e.g. put them in traces and exported metrics).
While this is safe, since the different places know how to quote them
properly, it makes things more difficult to analyse, since it's not
uncommon to see be binary blobs.
This patch makes us use the ascii-quoted version instead, to make things
easier to analyze.
When we fail to check if a user exists, we currently return a permanent
error, which can be misleading and also make things more difficult to
troubleshoot.
This patch makes chasquid return a temporary error in that case.
Thanks to Thor77 (thor77@thor77.org) for suggesting this change.
This patch implements support for incoming connections wrapped in the
HAProxy protocol v1.
This is useful when running chasquid behind a HAProxy server, as it
needs the original source IP to perform SPF checks.
This patch is a reimplementation of one originally provided by Denys
Vitali in pull request #15, except the logic for the protocol handling
is moved to a new package, and the smtpsrv.Conn handling of the source
IP is simplified.
It is marked as experimental for now, since we want to give it a bit
more exposure just in case the option/api needs adjustment.
Thanks a lot to Denys Vitali (@denysvitali in github) for sending the
original patch for this, and helping test it!
This patch renames courier.Procmail to courier.MDA, to make it more
obvious that the functionality is not tied to that particular MDA.
It's just for readability, there are no functional changes.
Some utilities might want to access the EHLO/HELO domain in the
post-data hook (for example, to do additional SPF validations).
This patch implements that support, including sanitizing the EHLO domain
on the environment variable to reduce the risk of problems.
The EHLO parameter is generally referred to as "domain", even though it
can take either a domain or an address.
For clarity, rename the variable and comments to match.
This is stylistic only, there are no functional changes.
This patch makes chasquid's monitoring server expose an OpenMetrics
metrics endpoint.
It adds a new package "expvarom" which implements an HTTP handler that
exports expvar variables in the OpenMetrics text format.
Then, the handler is registered by the monitoring server at /metrics
(where most things expect it to be).
The existing exported variables are also extended with descriptions,
which is optional, but improves the readability of the metrics.
When we can't authenticate due to a transient issue, for example if we
rely on Dovecot and it is not responding, we should use a differentiated
error code to avoid confusing users.
However, today we return the same error code as when the user enters the
wrong password, which could confuse users as their MUA might think their
credentials are no longer valid.
This patch fixes the issue by returning a differentiated error code in
that case, as per RFC 4954.
Thanks to Max Mazurov (fox.cpp@disroot.org) for reporting this problem.
tls.Config.BuildNameToCertificate was deprecated in Go 1.14, and is no
longer necessary.
However, we support down to 1.11, so we will keep it for now.
This patch adds a TODO to remove it in the future once the minimum
supported version is 1.14; and adjust the CI linter accordingly.
When creating a new Queue instance, we os.MkdirAll the queue directory.
Currently we don't check if it fails, which will cause us to find out
about problems when the queue is first used, where it is more annoying
to troubleshoot.
This patch adjusts the code so that we check and propagate the error.
That way, problems with the queue directory will be more evident and
easier to handle.
The linter complains that we're not checking for errors, but on some
cases it's on code paths were it is reasonable to do so (e.g. we're
closing the connection and it's a best-effort write).
This patch adjusts the code to make those cases explicit.
When receiving a message on a TLS socket, we currently don't check the
Handshake result, so connections often fail in a way that is not easy to
troubleshoot.
This patch fixes that by checking the result and emitting a nicer error
message before closing the connection.
The smtpsrv fuzzer doesn't handle DATA commands particularly well:
it will continue to read but will skip lines that have STARTTLS as
content, and only really care for the first line due to a bug.
This patch fixes the handling, and moves the logic to a separate
function for readability.
When the client closes the connection, which is very common, chasquid
logs it as an error ("exiting with error: EOF").
That can be confusing and mislead users, and also makes a lot of
traces be marked as errored, when nothing wrong occurred.
So this patch changes the log to not treat it as an error.
When the DATA input is too large, we should keep on reading through it
until we reach the end marker, otherwise there is a security problem:
the remaining data will be interpreted as SMTP commands, so for example
a forwarded message that is too long might end up executing SMTP
commands under an authenticated user.
This patch implements this behaviour, while being careful not to consume
extra memory to avoid opening up the possibility of a DoS.
Note the equivalent logic for single long lines is already implemented.
Reloading during tests will cause the testing aliases to be removed,
which makes test runs that extend beyond 30s to be flaky.
This patch fixes the bug by disabling reloads during these tests.
The testing couriers are currently only used in the queue tests, but we
also want to use them in smtpsrv tests so we can make them more robusts
by checking the emails got delivered.
This patch moves the testing couriers to testlib, and makes both queue
and smtpsrv use them.
Currently, there is no limit to incoming line length, so an evil client
could cause a memory exhaustion DoS by issuing very long lines.
This patch fixes the bug by limiting the size of the lines.
To do that, we replace the textproto.Conn with a pair of buffered reader
and writer, which simplify the code and allow for better and cleaner
control.
Thanks to Max Mazurov (fox.cpp@disroot.org) for finding and reporting
this issue.
If we fail to put the message in the queue (e.g. because we're out of
storage space, or the aliases-resolve hook errored out), it should be
considered a transient failure.
Currently we return a permanent error, which is misleading, as we want
clients to retry in these situations.
So this patch changes the error returned accordingly.
This patch implements two new hooks: alias-resolve and alias-exists.
They are called during the aliases resolution process, to allow for more
complex integration with other systems, such as storing the aliases in a
database.
See the included documentation for more details.
The spf library has gained support for macros, but to process them
properly, a new function needs to be called with the full sender
address, spf.CheckHostWithSender.
This patch updates chasquid's calls to the new API.
There are a few context.WithDeadline calls that can be simplified by
using context.WithTimeout.
At the time they were added, WithTimeout was too new so we didn't want
to depend on it. But now that the minimum Go version has been raised to
1.9, we can simplify the calls.
This patch does that simplification, which is purely mechanical, and
does not change the logic itself.
When handling a connection, today we only set a deadline after each
command received.
However, this does not cover our initial greeting, or the initial TLS
handshake (if the socket is TLS), so a connection can hang
indefininitely at that stage.
This patch fixes that by setting a deadline earlier, before we send or
receive anythong on the connection.
This patch contains some minor code style improvements, to leave the
linter happier and generally follow best practices in some areas where
things snuck through.
Despite its loose appearance, the "Received" header has a reasonably
standarized format.
We were not following the standard format as closely as we should; this
rarely causes problems in this particular case, but there's no need to
deviate from it.
This patch changes the Received header generation as follows:
- The "from" section now uses the remote address as canonical (for
non-authenticated users) which provides more valuable information
than the user-supplied EHLO address (which is also included).
- The remote authenticated user is now hidden, for additional privacy.
- Use the "with" optional clause.
- Use the standard way of printing TLS cipher suite.
- Use the standard way of printing address literals.
This patch makes chasquid reload domaininfo periodically, so it notices
any external changes made to it.
It is in line with what we do for aliases and authentication already,
and makes it possible for external removals an additions to the
domaininfo database to be picked up without a restart.
This patch adds a missing docstrings for exported identifiers, and
adjust some of the existing ones to match the standard style.
In some cases, the identifiers were un-exported after noticing they had
no external users.
Besides improving documentation, it also reduces the linter noise
significantly.
This patch implements an Authenticator type, which connections use to
do authentication and user existence checks.
It simplifies the abstractions (the server doesn't need to know about
userdb, or keep track of domain-userdb maps), and lays the foundation
for other types of authentication backends which will come in later
patches.
For direct TLS connections, such as submission-over-TLS, we currently
don't get the TLS information so it appears in the headers as "plain
text", which is misleading.
This patch fixes the problem by getting the connection information
early. Note it explicitly triggers the handshake, which would otherwise
happen transparently on the first read/write, so we can use the hostname
(if any) in our hello message.
We have many places in our tests where we create temporary directories,
which we later remove (most of the time). We have at least 3 helpers to
do this, and various places where it's done ad-hoc (and the cleanup is
not always present).
To try to reduce the clutter, and make the tests more uniform and
readable, this patch introduces two helpers in a new "testutil" package:
one for creating and one for removing temporary directories.
These new functions are safer, better tested, and make the tests more
consistent. All the tests are updated to use them.
This patch adds support for TLS-wrapped submission connections.
Instead of clients establishing a connection over plain text and then
using STARTTLS to switch over a TLS connection, this new mode allows the
clients to connect directly over TLS, like it's done in HTTPS.
This is not an official standard yet, but it's reasonably common in
practice, and provides some advantages over the traditional submission
port.
The default port is 465, commonly used for this; chasquid defaults to
systemd file descriptor passing as for the other protocols (for now).
The server is written assuming there's at least one valid SSL/TLS
certificate. For example, it unconditionally advertises STARTTLS, and
only supports AUTH over TLS.
This patch makes the server fail to listen if there are no certificates
configured, so the users don't accidentally run an unsupported
configuration.
When testing, we don't want the server to do SPF lookups, as those cause
real DNS queries which can be problematic and add a dependency on the
environment.
This patch adds an internal boolean to disable the SPF lookups, which is
only set from the tests.
Picking the domain used in the DSN message "From" header is more
complicated than it needs to be, causing confusing code paths and having
different uses for the hostname, which should be purely aesthetic.
This patch makes the queue pick the DSN "From" domain from the message
itself, by looking for a local domain in either the sender or the
original recipients. We should find at least one, otherwise it'd be
relaying.
This allows the code to be simplified, and we can narrow the scope of
the hostname option even further.
The loop test can be quite slow, specially on computers without
cryptography-friendly instructions.
This patch introduces a new flag for testing, so that we can bring the
threshold down to 5. The test is just as useful but now runs in a few
seconds, as opposed to a few minutes.
glog works fine and has great features, but it does not play along well
with systemd or standard log rotators (as it does the rotation itself).
So this patch replaces glog with a new logging module "log", which by
default logs to stderr, in a systemd-friendly manner.
Logging to files or syslog is still supported.
This patch makes the hooks always have a complete set of environment
varuables, set to 0/1 or whatever is appropriate, to make it easier to
write the checks for them.