Picking the domain used in the DSN message "From" header is more
complicated than it needs to be, causing confusing code paths and having
different uses for the hostname, which should be purely aesthetic.
This patch makes the queue pick the DSN "From" domain from the message
itself, by looking for a local domain in either the sender or the
original recipients. We should find at least one, otherwise it'd be
relaying.
This allows the code to be simplified, and we can narrow the scope of
the hostname option even further.
The queue IDs are internal to chasquid, and we don't need them to be
cryptographically secure, so this patch changes the ID generation to use
the PRNG.
This also helps avoid entropy issues.
glog works fine and has great features, but it does not play along well
with systemd or standard log rotators (as it does the rotation itself).
So this patch replaces glog with a new logging module "log", which by
default logs to stderr, in a systemd-friendly manner.
Logging to files or syslog is still supported.
Calculating the next delay based on the previous delay causes daemon
restarts to start from scratch, as we don't persist it.
This can cause a few server restarts to generate many unnecessary sends.
This patch changes the next delay calculation to use the creation time
instead, and also adds a <=1m random perturbation to avoid all queued
emails to be retried at the exact same time after a restart.
The queue currently only considers failed recipients when deciding
whether to send a DSN or not. This is a bug, as recipients that time out
are not taken into account.
This patch fixes that issue by including both failed and pending
recipients in the DSN.
It also adds more comprehensive tests for this case, both in the queue
and in the dsn generation code.
The default INFO logs are more oriented towards debugging and can be
a bit too verbose when looking for high-level information.
This patch introduces a new "maillog" package, used to log messages of
particular relevance to mail transmission at a higher level.
Today, we pick the domain used to send the DSN from based on what we
presented to the client at EHLO time, which itself may be based on the
TLS negotiation (which is not necessarily trusted).
This is complex, not necessarily correct, and involves passing the
domain around through the queue and persisting it in the items.
So this patch simplifies that handling by always using the main domain
as specified by the configuration.
This patch is the result of running go vet, go fmt -s and the linter,
and fixing some of the things they noted/suggested.
There shouldn't be any significant logic changes, it's mostly
readability improvements.
This patch simplifies the sending loop code:
- Move the recipient sending function from a closure to a method.
- Simplify the status update logic: we now update and write
unconditionally (as we should have been doing).
- Create a function for counting recipients in a given status.
It also adds a test for the removal of completed items from the queue,
which was not covered before and came up during development.
This patch introduces expvar counters to chasquid and the queue
packages.
For now there's only a handful of counters, but they will be expanded in
future patches.
This patch reviews various debug and informational messages, making more
uniform use of tracing, and extends the monitoring http server with
useful information like an index and a queue dump.
If there's an alias to forward email to a non-local domain, using the original
From is problematic, as we may not be an authorized sender for it.
Some MTAs (like Exim) will do it anyway, others (like gmail) will construct a
special address based on the original address.
This patch implements the latter approach, which is safer and allows the
receiver to properly enforce SPF.
We construct a (hopefully) reasonable From based on the local user, and
embedding the original From (but transformed for IDNA, as the receiver may not
support SMTPUTF8).
This patch performs some minor cleanups for things detected by "go vet":
- Remove one line of unreachable code.
- Don't leak contexts until their deadline expires, cancel them.
When we permanently failed to deliver to one or more recipients, send delivery
status notifications back to the sender.
To do this, we need to extend a couple of internal structures, to keep track
of the original destinations (so we can include them in the message, for
reference), and the hostname we're identifying ourselves as (this is arguable
but we're going with it for now, may change later).
This patch makes the queue and couriers distinguish between permanent and
transient errors when delivering mail to individual recipients.
Pipe delivery errors are always permanent.
Procmail delivery errors are almost always permanent, except if the command
exited with code 75, which is an indication of transient.
SMTP delivery errors are almost always transient, except if the DNS resolution
for the domain failed.
This patch tidies up the Procmail courier:
- Move the configuration options to the courier instance, instead of using
global variables.
- Implement more useful string replacement options.
- Use exec.CommandContext for running the command with a timeout.
As a consequence of the first item, the queue now takes the couriers via its
constructor.
With the introduction of aliases, the queue may now be delivering mail to
pipes. This patch implements pipe delivery.
It uses a fixed 30s timeout for now, as these commands should really not take
much time, and we don't want to overly complicate the configuration for now.
This patch makes the queue read and write items to disk.
It uses protobuf for serialization. We serialize to text format to make
manual troubleshooting easier, as the performance difference is not very
relevant for us.
This patch does various minor style and simplification cleanups, fixing things
detected by tools such as go vet, gofmt -s, and golint.
There are no functional changes, this change is purely cosmetic, but will
enable us to run those tools more regularly now that their output is clean.
The routing courier is a nice idea in theory, but at least for now, we want
the queue to be aware of when a destination is local so we can implement
differentiated logic.
This may change in the future, though, but at the moment it's not clear that
the abstractions will be worth it.
So this patch removes it, and makes the queue do the routing. There is no
difference in how the two are handled yet, those will come in subsequent
patches.
This patch introduces incremental delays in the queue, so retries are not all
done with the same delay.
The table is static; we could perturb it but there's not that much benefit
anyway, at least for now.
This patch introduces the couriers, which the queue uses to deliver mail.
We have a local courier (using procmail), a remote courier (uses SMTP), and a
router courier that decides which of the two to use based on a list of local
domains.
There are still a few things pending, but they all have their basic
functionality working and tested.
This patch introduces a basic, in-memory queue that only holds emails for now.
This slows down the benchmarks because we don't yet have a way to wait for
delivery (even if fake), that will come in later patches.