The RFCs are very clear that in DATA contents:
> CR and LF MUST only occur together as CRLF; they MUST NOT appear
> independently in the body.
https://www.rfc-editor.org/rfc/rfc5322#section-2.3https://www.rfc-editor.org/rfc/rfc5321#section-2.3.8
Allowing "independent" CR and LF can cause a number of problems.
In particular, there is a new "SMTP smuggling attack" published recently
that involves the server incorrectly parsing the end of DATA marker
`\r\n.\r\n`, which an attacker can exploit to impersonate a server when
email is transmitted server-to-server.
https://www.postfix.org/smtp-smuggling.htmlhttps://sec-consult.com/blog/detail/smtp-smuggling-spoofing-e-mails-worldwide/
Currently, chasquid is vulnerable to this attack, because Go's standard
libraries net/textproto and net/mail do not enforce CRLF strictly.
This patch fixes the problem by introducing a new "dot reader" function
that strictly enforces CRLF when reading dot-terminated data, used in
the DATA input processing.
When an invalid newline terminator is found, the connection is aborted
immediately because we cannot safely recover from that state.
We still keep the internal representation as LF-terminated for
convenience and simplicity.
However, the MDA courier is changed to pass CRLF-terminated lines, since
that is an external program which could be strict when receiving email
messages.
See https://github.com/albertito/chasquid/issues/47 for more details and
discussion.
This patch makes mail_diff more strict in its parsing, to ensure we
catch any encoding issues that may otherwise be masked by the default
compatibility policy.
The minor dialogs test covers some very specific SMTP exchanges, and
some of those include delivering email.
Today, we don't verify the final mailbox, we just check the SMTP
exchange. However, it can be very useful for some of the tests to do
end-to-end checking of the final mailbox.
This patch implements that ability in the test itself, and on the
(currently only) email delivering dialog.
Subsequent patches that introduce new tests will make use of this
feature.
Using an empty listening address will result in chasquid listening on a
random port, which is a dangerous misconfiguration.
That is most likely done to prevent it from listening at all.
To prevent this misconfiguration, explicitly reject empty listening
addresses early and with a warning, so there is no ambiguity.
Users can still prevent chasquid from listening by just commenting out
the entry in the config (and not passing any systemd file descriptors).
See https://github.com/albertito/chasquid/issues/45 for more details and
discussion, including alternatives considered.
Thanks to xavierg who reported this via IRC.
Today, when a user sets an alias with drop characters and/or suffixes,
those go unused, since we always "clean" addresses before alias
resolution.
This results in unexpected and surprising behaviour, and it's not
properly documented either.
This patch resolves this unexpected behaviour as follows:
- Drop characters are ignored, both at parsing time and at lookup time.
- Lookups are done including the suffixes first, and if that results in
no matches, they are retried without suffixes.
This results in aliases working more intuitively for the most common use
cases: of users wanting to have different aliases for specific suffixes,
and not having to care for drop characters.
Hooks can be used to get different behaviour if needed, since the first
lookup is done with the address as-is.
Thanks to znerol@ (lo+github@znerol.ch) for reporting this, and the
discussion on how to fix it, in
https://github.com/albertito/chasquid/issues/41.
Today we check the aliases deliver mail to the expected locations, but
we don't fail if there are unexpected deliveries.
Doing so can help catch bugs (including test bugs), so this patch
implements that.
In addition, fix two of the tests that were printing on error, but not
causing the tests to fail (which was the original intention).
The t-20-bad_configs test can sometimes have false positives because
the last line recorded in the log isn't always the one causing the
failure, due to asynchronous logging and in particular in combination
with coverage tests (which alter the os.Exit behaviour subtly).
This patch fixes that by having the tests look at the last 4 lines of
output instead. This is not super pretty, but it should be good enough
to cover for any timing issues that arise in these particular tests.
Currently, if the `certs/` directory has a symlink inside, we skip it.
That is not really intended, it's an unfortunate side-effect of skipping
regular files.
To fix this, this patch adjusts the logic to only ignore regular files
instead. It also adds a message when a directory is skipped, to make it
easier to debug permission issues.
Thanks to @erjoalgo for reporting this in
https://github.com/albertito/chasquid/pull/39, and providing an
alternative patch!
This patch makes chasquid-util's aliases-resolve and domaininfo-remove
commands talk to the chasquid server (via the new localrpc server).
For aliases-resolve, currently has fairly hacky logic which reimplements
a bunch of the servers', and is also incomplete because it does not
support hooks.
In this patch we fix that by having it talk to the server, where we get
authoritative responses and have no issues with aliases hooks. This
resolves https://github.com/albertito/chasquid/issues/18.
For domaininfo-remove, currently its implementation is also very hacky
since it manipulates files behind the servers' back and without even
using the internal library.
In this patch we fix that by doing the operation through the server,
avoiding the need for those hacks, and also remove the need to manually
reload the server afterwards.
This patch updates tests.md to reflect the recent changes around
coverage testing, specifically we no longer have coverage issues with
fatal/non-zero exits, so remove that section from the docs.
Also while at it, add links to the software we reference, for
convenience.
Go 1.20 finally includes proper support for instrumenting binaries for
coverage. This allows us to drop quite a few hacks and workarounds that
we used for it, and we can now also test exiting cases.
The downside is that coverage tests now require Go 1.20, but it is an
acceptable price to pay for the more accurate results.
Normal integration tests are unchanged.
This patch updates the coverage testing infrastructure to make use of
the new Go 1.20 features.
We're running against the usage limits in Gitlab CI (500), and Github
Actions should have more (2000).
So this patch replaces Gitlab CI with Github actions for running
integration tests, and build and push Docker images (to Dockerhub and
Gitlab registry).
We'll see how the usage levels are in a few months.
This patch updates the shell scripts with some of the common best
practices, which should make them more resilient to unusual failures and
unexpected environments (in particular, directories with spaces).
Most of these were identified by shellcheck.
The integration tests spend a lot of time on some ancilliary actions,
which slows them down: generating certificates, adding users, and
waiting for things to happen.
To improve the performance of those actions, this patch:
- Makes (most) tests use plain passwords (-20%)
- Adds a certificate cache to reuse certs (-10%)
- Tightens the sleep loops (-5%)
In aggregate, this patch results in a speedup of the integration tests
of ~30-40%.
Note that some of the tests required adjusting the username, because
`chasquid-util user-add` would convert them to lowercase as per PRECIS
mapping.
This patch changes several internal packages to receive and pass tracing
annotations, making use of the new tracing library, so we can have
better debugging information.
ioutil package was deprecated in Go 1.16, replace all uses with their
respective replacements.
This patch was generated with a combination of `gofmt -r`, `eg`, and
manually (for `ioutil.ReadDir`).
Newer versions of msmtp now set In-Reply-To and References header,
causing t-16-spf test to fail because we expect them to be empty (for no
particular reason).
So this patch changes our expected DSN used for testing to ignore their
values.
By default, `git describe` uses only annotated tags. Since some may not
be annotated (like v1.10) this causes some scripts to pick a confusing
version identifier.
This patch fixes the issue by passing `--tags`, which means all tags
will be considered.
This patch is the result of running Go 1.19's `gofmt` on the codebase,
which automatically updates all Go doc comments to the new format.
https://tip.golang.org/doc/go1.19#go-doc
When running a diff for dkimpy's output, we expect that diff to exit with
non-zero code.
Unfortunately, the way we set that expectation (by prefixing the diff
invocation with `!` is incorrect.
Running `! diff ...` will not cause the hook to fail if diff exits with
0, instead `!` will cause the exit code to be ignored.
This patch fixes the problem by running `diff ... && exit 1` instead.
This was caught by shellcheck, https://www.shellcheck.net/wiki/SC2251.
We've accumulated a few linter issues around comments and a couple of
variable names.
While none of them is major, this patch cleans them up so it's easier to
go through the linter output, and we can start being more strict about
it.
The current generate_cert helper was originally taken from Go's source,
and is more complex than we need it to be.
This patch replaces it with our own version, rewritten from scratch
independently.
This patch moves the test helper binaries to a "one directory per
helper" layout, and also makes them to be ignored in the coverage build
instead of all builds.
With this change, "go build ./..." will build all binaries including the
test helpers, which helps make sure that module manage automation also
considers them. In particular, this makes "go mod tidy" work fine.
Dovecot's `state_dir` usually defaults to be at `/var/lib/dovecot`, or a
similar system-wide path.
Under some conditions, our test Dovecot instance can fail, because it's
wanting to write to state_dir, but it is not writeable by us in the test
environment.
This was reported by foxcpp in
https://github.com/albertito/chasquid/issues/28.
This patch fixes the problem by setting a custom state_dir to be within
our testing directory.
Thanks to foxcpp for reporting this problem and suggesting a fix.
Some dkimpy versions have a bug where it can't parse the keys generated
by its own key generator. That causes the dkimpy test to fail.
See https://bugs.launchpad.net/dkimpy/+bug/1978835 for more details.
This patch adds a workaround which detects the buggy version, and skip
the test if needed.
This patch updates the dependency on blitiri.com.ar/go/spf from v1.2.0
to v1.3.0, which includes a few bug fixes.
There are no code changes needed, just some minor adjustment to the
tests due to error strings changing.
The go.mod "go" keyword is also bumped up to 1.15 since it's the minimum
supported version since commit e444fe1f (2021-10-05).
This patch simplifies the internal alias lookup logic, unifying it
across Resolve and Exists.
As part of this, the `alias-exists` hook is removed. It was redundant to
begin with, although it enabled a potential optimization, it isn't worth
the complexity. The timeout for execution of both was the same.
This change should be backwards-compatible because `alias-resolve` is
still used, and the semantics haven't changed.
The `which` command isn't guaranteed to be available, it is just
extremely common; `command -v` is the standard way to do find an
executable program. See https://lwn.net/Articles/874049/ for more
details.
This patch replaces the uses of `which` with `command -v`, which only
appears in a couple of tests.
When resolving MX records, we need to distinguish between "no such
domain" and other kinds of errors. Before Go 1.13, this was not
possible, so we had a workaround that assumed any permanent error was a
"no such domain", which is not great, but functional.
Now that our minimum supported version is Go 1.15, we can remove the
workaround.
This patch replaces the workaround with proper logic using
DNSError.IsNotFound to identify NXDOMAIN results when resolving MX
records.
This requires to adjust a few tests, that used to work on environments
where resolving unknown domains (used for testing) returned a permanent
error, and now they no longer do so. Instead of relying on this
environmental property, we make the affected tests use our own DNS
server, which should make them more hermetic and reproducible.
In the Dovecot integration test, we can now simplify the configuration
as we assume Dovecot 2.3 is the minimum version supported for testing
(as that's the one from Debian stable at the moment).
This patch adds support in the default hook for using dkimpy for DKIM
signing.
Unfortunately, dkimpy binaries have the same name as driusan/dkim's, so
we need to use --help to disambiguate. It's not pretty but it should
work, and is quite self contained.
Also, for the integration tests, we still need driusan/dkim because
dkimpy lacks the features needed. Specifically, dkimpy's dkimverify
can't be made to use custom DNS, or override the TXT values in any way,
so we can't verify that the generated signature is reasonable.
Thanks to ne9z@github for suggesting this change and providing an
alternative patch in https://github.com/albertito/chasquid/pull/19.
Most integration tests depend on the $HOSTALIASES environment variable
being functional. That variable works on most systems, but not all. In
particular, systems with `systemd-resolved` can cause the variable to be
ignored.
This was reported by Alex Ellwein in
https://github.com/albertito/chasquid/issues/20.
This patch makes the affected tests to be skipped if $HOSTALIASES is not
working properly. It also removes unnecessary hosts files from tests
which don't need it, and documents this behaviour.
Thanks to Alex Ellwein and foxcpp@ for reporting and helping investigate
this issue!
Some deployments already have users that authenticate without a domain.
Today, we refuse to even consider those, and reject them at parsing time.
However, it is a use-case worth supporting, at least with some
restrictions that make the complexity manageable.
This patch changes the auth package to support authenticating users
without an "@domain" part.
Those requests will always be directly passed on to the fallback
authenticator, if available.
The dovecot fallback authenticator can already handle this case just fine.
Today, we close the connection after 10 errors. While this is fine for
normal use, it is unnecessarily large.
Lowering it to 3 helps with defense-in-depth for cross-protocol attacks
(e.g. https://alpaca-attack.com/), while still being large enough for
useful troubleshooting and normal operation.
As part of this change, we also remove the AUTH-specific failures limit,
because they're covered by the connection limit.
In Go 1.16, "go get" on non-module paths now require an explicit version
to point to. Without a specific version, the invocation fails.
See https://golang.org/doc/go1.16#go-command for more details on the
change.
The test Dockerfile uses "go get" to fetch driusan/dkim's binaries, used
for integration testing.
So this patch adjusts the Dockerfile to fetch the latest version.
The docopt-go library is quite convenient, but it has been abandoned for
a while :(
Since we only use it for chasquid-util, this patch removes it and
replaces it with a custom small parser, that is a reasonable fit for the
required use cases.
The patch also adds a couple of tests to increase coverage.
NOTE: docopt-go accepted some undocumented behaviour, in particular the
use of "-a b" instead of "-a=b". The new parser does not, so some
user scripts may require updating.
I think this should be rare enough not to be worth the complexity of
adjusting the parser to allow it.
fexp is a testing utility, including it in the regular Go build confuses
some automation as it can think it's part of chasquid proper.
All other testing utilities are ignored via the "+build ignore"
annotation for this reason, so this patch adds it to fexp to fix this
issue.
The haproxy test config includes an obsolete "debug" entry, and is
missing some timeouts which, while harmless in this context, cause a
warning that can be confusing.
This patch fixes the debug entry by running haproxy -d as recommended,
and adds the essential timeouts to avoid the warning.
This patch implements support for incoming connections wrapped in the
HAProxy protocol v1.
This is useful when running chasquid behind a HAProxy server, as it
needs the original source IP to perform SPF checks.
This patch is a reimplementation of one originally provided by Denys
Vitali in pull request #15, except the logic for the protocol handling
is moved to a new package, and the smtpsrv.Conn handling of the source
IP is simplified.
It is marked as experimental for now, since we want to give it a bit
more exposure just in case the option/api needs adjustment.
Thanks a lot to Denys Vitali (@denysvitali in github) for sending the
original patch for this, and helping test it!
Allows terminating chasquid via the network. Useful to trigger a restart
(if there is an init system to relaunch chasquid) and thus reload certificates.
Amended-by: Alberto Bertogli <albertito@blitiri.com.ar>
Added tests, and adjusted shutdown sequence.
This patch removes the dependency on wget for fetching content over
http, which was used in one of the tests to do some checking on debug
and metric pages, as well as loop detection.
Instead of wget, we now use a small built-in utility called fexp.
Some utilities might want to access the EHLO/HELO domain in the
post-data hook (for example, to do additional SPF validations).
This patch implements that support, including sanitizing the EHLO domain
on the environment variable to reduce the risk of problems.
This patch makes chasquid's monitoring server expose an OpenMetrics
metrics endpoint.
It adds a new package "expvarom" which implements an HTTP handler that
exports expvar variables in the OpenMetrics text format.
Then, the handler is registered by the monitoring server at /metrics
(where most things expect it to be).
The existing exported variables are also extended with descriptions,
which is optional, but improves the readability of the metrics.