Discussion:
CRLF input through pipe causes mangled headers
Barry Pederson
2003-07-17 18:35:18 UTC
I was trying to use Exim with Cyrus IMAP 2.1.14, and found that messages
generated by Cyrus' sieve filters, for vacations, forwards, and such - were
getting mangled by Exim 4.20. Exim adds an extra blank line after the first
header line - which makes the rest of the headers end up being part of the
message body.

You can reproduce this without having to have Cyrus-IMAP, by making a
simple file, named "test.msg" for example, that has CRLF line endings, such as:

--------
x-foo: bar
Subject: test

This is a test
---------

Send it to yourself using Exim with something like:

exim -i -f "<>" -- ***@address.com <test.msg

and you'll see the extra blank line after "x-foo: bar", and the remaining
headers now treated as body.
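Most editors make CRLF endings awkward to produce on Unix. A short Python script like the following can create the test file shown above instead. The bytes match the example, and "test.msg" is simply the name suggested there.

```python
# Write the example message with CRLF ("\r\n") line endings, like the
# Cyrus sieve-generated messages. Binary mode ensures nothing translates
# the line endings on the way out.
lines = [b"x-foo: bar", b"Subject: test", b"", b"This is a test"]

with open("test.msg", "wb") as f:
    f.write(b"\r\n".join(lines) + b"\r\n")
```

Feeding the resulting test.msg to the exim command above should show the effect described.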

This doesn't happen if the file has plain unix-style LF endings, and the CRLF
line-endings don't seem to bother genuine Sendmail (whatever version FreeBSD
4.8 uses).

I wonder if this has to do with the bit mentioned in the docs for the "-bm"
option where messages may, for compatibility with Sendmail and Smail, have
things like:


From sender Fri Jan 5 12:55 GMT 1997
From sender Fri, 5 Jan 97 12:55:01

as their first line. Could it be processing that first line specially, and
mishandling the "\r\n" at the end?

Barry




--

## List details at http://www.exim.org/mailman/listinfo/exim-users Exim details at http://www.exim.org/ ##
Philip Hazel
2003-07-17 20:03:07 UTC
Post by Barry Pederson
I was trying to use Exim with Cyrus IMAP 2.1.14, and found that messages
generated by Cyrus' sieve filters, for vacations, forwards, and such - were
getting mangled by Exim 4.20. Exim adds an extra blank line after the first
header line - which makes the rest of the headers end up being part of the
message body.
There is a known issue in the way Cyrus and Exim treat line endings.
Post by Barry Pederson
You can reproduce this, without having to have Cyrus-IMAP, by making a
Exim lives in a Unix world. In the Unix world, line ends are represented
by LF characters alone. If you pass a local file to Exim, it expects it
to use the convention of the local operating system, namely, to have LF
line endings. If you create a file with CRLF line endings, you are
breaking the conventions of the world. Exim will treat the CRs as data
characters. (You can use -dropcr to tell it to drop CRs, however.)
Post by Barry Pederson
and you'll see the extra blank line after "x-foo: bar", and the remaining
headers now treated as body.
That should not be the case unless you deliver to Cyrus. Indeed, I have
just tested it using an appendfile delivery, and it is not the case.

Cyrus delivery has to be configured to use CRLF line endings, so what
Cyrus gets sent has lines ending in CRCRLF and it treats this as *two*
line ends. Hence the effect you see.
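The CRCRLF effect can be sketched in a few lines of Python. This is not Exim's or Cyrus's code. `naive_crlf` stands in for a delivery that adds a CR before every LF. `receiver_headers` assumes, as described above, that the receiving side treats a bare CR as a line end, just as it does CRLF and LF.

```python
def naive_crlf(data: bytes) -> bytes:
    """Add a CR before every LF, regardless of what precedes it."""
    return data.replace(b"\n", b"\r\n")


def receiver_headers(wire: bytes) -> list[bytes]:
    """Header lines as seen by a receiver that treats CR, LF and CRLF
    all as line terminators (bytes.splitlines splits on exactly these).
    The header section ends at the first empty line."""
    lines = wire.splitlines()
    return lines[: lines.index(b"")]


# A message whose lines already end in CRLF; Exim stores the CRs as data.
message = b"x-foo: bar\r\nSubject: test\r\n\r\nThis is a test\r\n"
wire = naive_crlf(message)
print(wire)                    # every line now ends in \r\r\n
print(receiver_headers(wire))  # [b'x-foo: bar']: Subject: has fallen into the body
```

The same message with plain LF endings comes out as clean CRLF, so both headers survive.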
Post by Barry Pederson
This doesn't happen if the file has plain unix-style LF endings,
On a Unix system, one might hope that files *would* have Unix-style line
endings... Or am I being hopelessly naive?
Post by Barry Pederson
I wonder if this has to do with the bit mentioned in the docs for the "-bm"
option where messages may, for compatibility with Sendmail and Smail, have
From sender Fri Jan 5 12:55 GMT 1997
From sender Fri, 5 Jan 97 12:55:01
as their first line. Could it be processing that first line specially, and
mishandling the "\r\n" at the end.
No, that's not relevant.

The world is in a mess as regards line endings. It is hard to know what
should be done.


--
Philip Hazel University of Cambridge Computing Service,
***@cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book: http://www.uit.co.uk/exim-book


Barry Pederson
2003-07-17 20:28:47 UTC
Post by Philip Hazel
There is a known issue in the way Cyrus and Exim treat line endings.
...
Post by Philip Hazel
Exim lives in a Unix world. In the Unix world, line ends are represented
by LF characters alone. If you pass a local file to Exim, it expects it
to use the convention of the local operating system, namely, to have LF
line endings. If you create a file with CRLF line endings, you are
breaking the conventions of the world. Exim will treat the CRs as data
characters. (You can use -dropcr to tell it to drop CRs, however.)
...
Post by Philip Hazel
The world is in a mess as regards line endings. It is hard to know what
should be done.
Doh! I should have known Exim would have a way to handle this. I added

drop_cr = true

to the configuration file and all seems fine now. I guess seeing it happen
just for the first line and not the entire message, and seeing Sendmail
handle it ok made me think: bug in Exim header processing.

I wonder if drop_cr should default to true? more compatible with sendmail
that way?

Thanks Philip

Barry


Philip Hazel
2003-07-18 11:49:09 UTC
Post by Barry Pederson
I wonder if drop_cr should default to true? more compatible with sendmail
that way?
What do other people think? Now that drop_cr drops only one CR preceding
a linefeed (the original hack dropped *all* CRs in a message), this
might be more acceptable. But I can't say I like it.
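To make the difference concrete, here is a rough Python rendering of the two behaviours described above. The function names are made up for illustration. This is not Exim's implementation of drop_cr.

```python
def drop_cr_original(data: bytes) -> bytes:
    """The original hack as described: remove every CR in the message."""
    return data.replace(b"\r", b"")


def drop_cr_current(data: bytes) -> bytes:
    """The newer behaviour as described: remove only a single CR that
    immediately precedes an LF, so CRLF becomes LF and other CRs survive."""
    return data.replace(b"\r\n", b"\n")


sample = b"line one\r\nbare\rCR\r\ndouble\r\r\n"
print(drop_cr_original(sample))  # b'line one\nbareCR\ndouble\n'
print(drop_cr_current(sample))   # b'line one\nbare\rCR\ndouble\r\n'
```

Note that a CRCRLF ending loses only one CR under the newer rule.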


--
Philip Hazel University of Cambridge Computing Service,
***@cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book: http://www.uit.co.uk/exim-book


Sheldon Hearn
2003-07-18 12:05:11 UTC
Post by Philip Hazel
Post by Barry Pederson
I wonder if drop_cr should default to true? more compatible with sendmail
that way?
What do other people think? Now that drop_cr drops only one CR preceding
a linefeed (the original hack dropped *all* CRs in a message), this
might be more acceptable. But I can't say I like it.
Having never needed it in 6 years on a variety of installations, I'd
prefer for such a thing to remain disabled by default.

I'm responding because I'm other people, not because my opinion's worth
more than anyone else's. :-)

Ciao,
Sheldon.

Oliver Eikemeier
2003-07-19 18:06:44 UTC
Post by Philip Hazel
Post by Barry Pederson
I wonder if drop_cr should default to true? more compatible with sendmail
that way?
What do other people think? Now that drop_cr drops only one CR preceding
a linefeed (the original hack dropped *all* CRs in a message), this
might be more acceptable. But I can't say I like it.
I admit, I'm a Cyrus-IMAPd user too.

Please, can someone advise me when it is preferable *not* to accept
both LF and CRLF? Or to convert CRLF to CRCRLF on output?

What I would expect is
LF in => CRLF out
CRLF in => CRLF out
CR in => CRLF out
and I don't care about the internal format. To make the purists happy, how
about adding a CR only when it is not already present?
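Oliver's table amounts to treating all three conventions as line ends and emitting CRLF. The following is a minimal Python sketch of that mapping, offered as an illustration rather than an existing Exim feature. Because CRLF is matched before a lone CR or LF, an existing CRLF stays a single line end and does not become CRCRLF.

```python
import re

# CRLF must come first in the alternation so it is consumed as a single
# line end instead of a bare CR followed by a bare LF.
_LINE_END = re.compile(rb"\r\n|\r|\n")


def to_crlf(data: bytes) -> bytes:
    """LF -> CRLF, CRLF -> CRLF, bare CR -> CRLF."""
    return _LINE_END.sub(b"\r\n", data)


print(to_crlf(b"unix\nwindows\r\nmac\r"))  # b'unix\r\nwindows\r\nmac\r\n'
```

Under this mapping, CRCRLF still becomes two line ends, because the first CR is bare.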

<rant>
Coming from Mac OS I could *never* understand why any text editor cared about
line endings. I do not feel that a file from a foreign OS breaks the
conventions of my world.
</rant>

Ok, just my 5 cents - no offense intended
Oliver


Tony Finch
2003-07-18 14:10:13 UTC
Post by Philip Hazel
What do other people think? Now that drop_cr drops only one CR preceding
a linefeed (the original hack dropped *all* CRs in a message), this
might be more acceptable. But I can't say I like it.
I don't see much harm in turning on drop_cr by default. The whole area
is so evil that anyone who causes problems by relying on particular
behaviour deserves the worst possible consequences. Another evil idea
I had as a possible alternative way of dealing with the Cyrus mismatch
would be to convert bare CRs into CRLFSPC, so that the additional line
break doesn't split the header.
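Tony's suggestion relies on header folding: a line break followed by whitespace continues the same logical header. Here is a small Python sketch of the idea. It assumes the input is a single header line without its terminator, and the helper names are illustrative only.

```python
import re


def fold_bare_crs(header_line: bytes) -> bytes:
    """Turn each bare CR in one header line into CRLF + space, so the
    extra line break becomes a fold instead of ending the header."""
    return header_line.replace(b"\r", b"\r\n ")


def unfold(header: bytes) -> bytes:
    """Unfold as a reader would: drop any CRLF that is followed by
    whitespace (a continuation line)."""
    return re.sub(rb"\r\n(?=[ \t])", b"", header)


folded = fold_bare_crs(b"Subject: free\rstuff")
print(folded)          # b'Subject: free\r\n stuff'
print(unfold(folded))  # b'Subject: free stuff'
```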

Tony.
--
f.a.n.finch <***@dotat.at> http://dotat.at/
FORTIES: SOUTHEAST VEERING SOUTHWEST 4 OR 5, OCCASIONALLY 6. RAIN OR SHOWERS.
MODERATE OR GOOD.

Philip Hazel
2003-07-18 15:49:32 UTC
Post by Tony Finch
I don't see much harm in turning on drop_cr by default. The whole area
is so evil that anyone who causes problems by relying on particular
behaviour deserves the worst possible consequences. Another evil idea
I had as a possible alternative way of dealing with the Cyrus mismatch
would be to convert bare CRs into CRLFSPC, so that the additional line
break doesn't split the header.
Note that drop_cr does not only affect headers. It affects bodies as
well. I still don't like the idea of turning it on by default.

--
Philip Hazel University of Cambridge Computing Service,
***@cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book: http://www.uit.co.uk/exim-book


James P. Roberts
2003-07-19 14:19:39 UTC
Post by Philip Hazel
Post by Tony Finch
I don't see much harm in turning on drop_cr by default. The whole area
is so evil that anyone who causes problems by relying on particular
behaviour deserves the worst possible consequences. Another evil idea
I had as a possible alternative way of dealing with the Cyrus mismatch
would be to convert bare CRs into CRLFSPC, so that the additional line
break doesn't split the header.
Note that drop_cr does not only affect headers. It affects bodies as
well. I still don't like the idea of turning it on by default.
Given that it affects bodies, I conclude that it should NOT be on by default,
since an MTA should not mess with bodies. "Other people" (Hi, Sheldon!
*wink*) have expressed that this is not a commonly needed option, anyway...

Summary:
(1) MTAs should not modify message bodies.
(2) This issue apparently only comes up with Cyrus?
(3) Not everyone uses Cyrus...

Conclusions:
(1) The fix should really be done on the Cyrus side.
(2) My opinion is to NOT make it the Exim default.

Just my 2 cents.

Jim "another 'other people'" Roberts
Punster Productions, Inc.


Yves Goergen
2003-07-19 14:41:15 UTC
Post by James P. Roberts
Given that it affects bodies, I conclude that it should NOT be on by default,
since an MTA should not mess with bodies.
(...)
(1) MTA's should not modify message bodies.
I'm not much of a mail & RFC expert and I don't have the problem discussed here, but why shouldn't an MTA 'correct' line endings in headers and/or bodies of e-mail messages? Shouldn't there be a common format for Internet transfers, and isn't all mail processing line-based? (And that on different platforms with different line endings.)

I would think that changing line endings shouldn't be a problem, should it? As long as they're not changed in their meaning, of course. Could anyone please explain this to me?

-yves


b***@medicine.nodak.edu
2003-07-19 15:52:00 UTC
Post by James P. Roberts
(1) MTA's should not modify message bodies.
(2) This issue apparently only comes up with Cyrus?
(3) Not everyone uses Cyrus...
(1) The fix should really be done on the Cyrus side.
(2) My opinion is to NOT make it the Exim default.
I used tcpdump to take a look at the traffic Exim was generating both for SMTP
and LMTP delivery of these CRLF terminated messages, and found that Exim *is*
modifying the message bodies, adding an extra CR, so the lines end up
transmitted over the wire with CRCRLF terminations.

So Exim is already violating point #1, but it has to, because as Philip said -
it's a Unix world, and most files have just LF termination. Exim usually needs
to add CR to make it legal SMTP - it's just that in this case, they're already
there, but unfortunately Exim is just blindly adding some more.

I don't agree with point #2 - Cyrus may be the only thing we know of today that
triggers this problem by submitting messages that are already SMTP-ready
(line-termination-wise), but that doesn't mean other things won't pop up down
the road. Maybe something you'll want to use :)
Post by James P. Roberts
From what I can tell, the write_chunk() function in transport.c is where the
extra CRs come from. If it was made just a little smarter about when it adds
CRs, Exim could handle other kinds of input more gracefully.

Barry

b***@medicine.nodak.edu
2003-07-19 18:42:31 UTC
Post by James P. Roberts
From what I can tell, the write_chunk() function in transport.c is where the
extra CRs come from. If it was made just a little smarter about when it adds
CRs, Exim could handle other kinds of input more gracefully.
I'm attaching a patch that I believe does the trick. Instead of just blindly
adding a '\r' whenever a '\n' is encountered, this will have Exim check that the
previous character it processed wasn't already a '\r'.

It's complicated by how Exim breaks messages down into chunks and has to watch
for things that need escaping, but I think I followed how that all works.
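Barry's patch was posted as a separate file and is not reproduced here. The following Python sketch illustrates only the idea he describes: add a CR before an LF unless the previous byte was already a CR, and remember that last byte across chunk boundaries. It is hypothetical. It ignores the escaping (such as SMTP dot-stuffing) that Exim's write_chunk() also has to handle, and it is not the patch's actual code.

```python
class CrlfAdder:
    """Add CR before LF only when the previous byte was not already CR."""

    def __init__(self) -> None:
        self.prev = None          # last byte written, kept across chunks
        self.out = bytearray()

    def write(self, chunk: bytes) -> None:
        for byte in chunk:        # iterating bytes yields ints
            if byte == 0x0A and self.prev != 0x0D:
                self.out.append(0x0D)
            self.out.append(byte)
            self.prev = byte


w = CrlfAdder()
w.write(b"unix line\nalready crlf\r")  # chunk boundary falls between CR and LF
w.write(b"\nlast\n")
print(bytes(w.out))  # b'unix line\r\nalready crlf\r\nlast\r\n'
```

Keeping `prev` on the object rather than inside the loop is what makes a CR at the end of one chunk count for an LF at the start of the next.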

Barry
--
[ patch-src_transport.c of type application/octet-stream deleted ]
--


b***@medicine.nodak.edu
2003-07-19 18:50:04 UTC
Post by b***@medicine.nodak.edu
I'm attaching a patch that I believe does the trick. Instead of just
blindly adding a '\r' whenever a '\n' is encountered, this will have Exim
check that the previous character it processed wasn't already a '\r'.
It's complicated by how Exim breaks messages down into chunks and has to
watch for things that need escaping, but I think I followed how that all
works.
--
[ patch-src_transport.c of type application/octet-stream deleted ]
Oops, no attachments allowed? Here's a URL:

http://barryp.org/misc/patch-src_transport.c.txt

Barry

Philip Hazel
2003-07-21 11:05:57 UTC
Thanks for all the contributions to this thread. I summarise as follows:

1. My original view was: "This is a Unix box; therefore we expect Unix line
termination rules for non-SMTP incoming messages." That was back in
1995. The world has changed since then. Perhaps this is no longer the
right stance?

2. The drop_cr option affects messages at the input stage. It converts
incoming files from CRLF terminator to the Unix standard.

3. Barry Pederson has suggested another approach, which is to make a
change at the output stage, when CRLF output is specified (either
SMTP, LMTP, or with the use_crlf option set).

4. There is also the point about what to do with "bare" CR characters.

5. MTAs shouldn't mess with message bodies, but I fear that translating
line terminations is a necessary evil that has to be an exception to
this.

Cyrus is becoming popular. It is clear that this issue is not going to
go away. Therefore, it would be best to come up with some solution that
is flexible and is going to last (having already seen two ad hoc
attempts be insufficient). However, I don't want to introduce a whole
lot of complication if it can be avoided.

Barry's idea is neat, but on thinking it over, I think I prefer to do
the fixing at the input stage, so that there is a standard form of
message representation in Exim's spool files. Then Exim either adds CR or
not on output, as the delivery format requires.

The questions then are (1) what facilities should be available? and (2)
what should be the default?

I know there have been programs that looked at the first line of a file,
and used the terminator to set a style that was applied to the rest of
it. I don't think this is a good idea; it can too easily go wrong.

Tony Finch said "The whole area is so evil" and he is quite right.
Teletype code (which is where separate CR, LF, and incidentally HT -
which also causes problems - came from) has a lot to answer for.
Post by Oliver Eikemeier
LF in => CRLF out
CRLF in => CRLF out
CR in => CRLF out
It would be nice to make that the default. But is it right to interpret
bare CR as a line end in the header? There have certainly been spams
where CRs have been present in the Subject: line, just to cause trouble.
Would breaking the line there make things better or worse?

How many OS use bare CR as a line ending?
Post by Oliver Eikemeier
<rant>
Coming from Mac OS I could *never* understand why any text editor cared about
line endings. I do not feel that a file from a foreign OS breaks the
conventions of my world.
</rant>
<nostalgia>
Coming from OS-370, I wondered why ASCII doesn't have a "newline"
character, as EBCDIC does. But then EBCDIC *also* has CR and LF,
and OS-370 had data-independent "records" that were used for lines...
</nostalgia>

OK, after all that, here is a

PROPOSAL:

1. Exim continues to use LF terminators internally. Any translation is
done at the time the message is received, and as now, CRs are added
for SMTP, LMTP, and use_crlf delivery.

2. Both LF and CRLF are accepted as line terminators for all incoming
messages. This can be done by making drop_cr the default.

3. As I understand it (please correct me if I'm wrong) Cyrus treats bare
CRs as line terminators. A message with a bare CR in a header line
probably then causes the header to be prematurely terminated.
However, this may be the least evil, so I propose it as the default.

I note that RFC 2822 forbids bare CR (and bare LF, though of course
it is talking about messages on the wire, not about how they are
stored and processed locally). This is a tightening up of RFC 822.

4. Should there be any options to change these interpretations? If the
answer is "no", the drop_cr and -dropcr options can be made into
no-ops and obsoleted. If the answer is "yes", what is needed?

--
Philip Hazel University of Cambridge Computing Service,
***@cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.



Tony Finch
2003-07-21 15:28:46 UTC
Post by Philip Hazel
How many OS use bare CR as a line ending?
The most common one is Mac OS. I don't know if it's still the case for
Mac OS X, though. Acorn also used bare CRs, but that's not practically
useful information these days.
Post by Philip Hazel
3. As I understand it (please correct me if I'm wrong) Cyrus treats bare
CRs as line terminators. A message with a bare CR in a header line
probably then causes the header to be prematurely terminated.
My suggestion of transforming bare CR to CRLF space (or CRLF tab) was
intended for headers only. I'm not too happy with following the Cyrus
interpretation since it means different people along the relay chain
see different header information on the same message. I'm also not happy
about quietly throwing nasty octets away.
Post by Philip Hazel
4. Should there be any options to change these interpretations? If the
answer is "no", the drop_cr and -dropcr options can be made into
no-ops and obsoleted. If the answer is "yes", what is needed?
I note that RFC 3030 (BINARYMIME) doesn't have anything helpful to say --
it still insists on CRLF newlines :-(

Tony.
--
f.a.n.finch <***@dotat.at> http://dotat.at/
SHANNON: WEST 3 OR 4 BACKING SOUTHWEST 4 OR 5, OCCASIONALLY 6 LATER IN WEST.
RAIN LATER. GOOD.

Philip Hazel
2003-07-22 08:50:06 UTC
Post by Tony Finch
The most common one is Mac OS. I don't know if it's still the case for
Mac OS X, though. Acorn also used bare CRs, but that's not practically
useful information these days.
IIRC, Acorn changed to LF for Risc OS.
Post by Tony Finch
My suggestion of transforming bare CR to CRLF space (or CRLF tab) was
intended for headers only. I'm not too happy with following the Cyrus
interpretation since it means different people along the relay chain
see different header information on the same message. I'm also not happy
about quietly throwing nasty octets away.
I entirely agree, and, having studied the code, I think this won't be
too hard to do. So the proposal now is:

All three of CR, LF, and CRLF are treated as line terminators. However,
if a bare CR occurs in a header line, it is turned into a line
terminator followed by a space - in other words, it does not terminate
the logical header line.
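The revised proposal can be sketched as an input-stage normalisation to Exim's internal LF form. This Python sketch is an illustration under simplifying assumptions: the whole message is in memory, and the header section ends at the first empty line. It is not Exim's code.

```python
import re


def normalize_on_input(raw: bytes) -> bytes:
    """Store lines with LF endings. CRLF and LF end a line; a bare CR in
    the header becomes LF + space (a fold, so the logical header line
    continues); a bare CR in the body simply ends the line."""
    out = []
    in_header = True
    for line in re.split(rb"\r\n|\n", raw):
        if in_header:
            if line == b"":
                in_header = False      # blank line: start of body
            else:
                line = line.replace(b"\r", b"\n ")
        else:
            line = line.replace(b"\r", b"\n")
        out.append(line)
    return b"\n".join(out)


raw = b"Subject: a\rb\r\n\r\nline1\rline2\r\n"
print(normalize_on_input(raw))  # b'Subject: a\n b\n\nline1\nline2\n'
```

On output, adding a single CR before each LF then yields clean CRLF for SMTP, LMTP, or use_crlf delivery.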
Post by Tony Finch
Post by Philip Hazel
4. Should there be any options to change these interpretations? If the
answer is "no", the drop_cr and -dropcr options can be made into
no-ops and obsoleted. If the answer is "yes", what is needed?
I note that RFC 3030 (BINARYMIME) doesn't have anything helpful to say --
it still insists on CRLF newlines :-(
On the wire that is certainly true.

Oh well, I'll take all the options away, and wait to see if anybody
wants anything different.

--
Philip Hazel University of Cambridge Computing Service,
***@cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.


Sheldon Hearn
2003-07-21 11:12:30 UTC
Post by b***@medicine.nodak.edu
I used tcpdump to take a look at the traffic Exim was generating
both for SMTP and LMTP delivery of these CRLF terminated messages,
and found that Exim *is* modifying the message bodies, adding an
extra CR, so the lines end up transmitted over the wire with CRCRLF
terminations.
So Exim is already violating point #1, but it has to, because
as Philip said - it's a Unix world, and most files have just LF
termination. Exim usually needs to add CR to make it legal SMTP -
it's just that in this case, they're already there, but unfortunately
Exim is just blindly adding some more.
Hang on, you're misunderstanding the results you see. You're looking at
the message "in transit", where it's governed by network transport
rules. Those extra CRs you're seeing are removed at the other end of
the transport pipe, as mandated by the standards.

What we're talking about, with respect to changing message bodies, is
that Exim shouldn't make changes that are visible OUTSIDE the scope of
transport. This is also mandated (to varying degrees depending on
context) by the standards.

Ciao,
Sheldon.

James P. Roberts
2003-07-21 13:02:05 UTC
Post by Sheldon Hearn
Post by b***@medicine.nodak.edu
I used tcpdump to take a look at the traffic Exim was generating
both for SMTP and LMTP delivery of these CRLF terminated messages,
and found that Exim *is* modifying the message bodies, adding an
extra CR, so the lines end up transmitted over the wire with CRCRLF
terminations.
So Exim is already violating point #1, but it has to, because
as Philip said - it's a Unix world, and most files have just LF
termination. Exim usually needs to add CR to make it legal SMTP -
it's just that in this case, they're already there, but unfortunately
Exim is just blindly adding some more.
Hang on, you're misunderstanding the results you see. You're looking at
the message "in transit", where it's governed by network transport
rules. Those extra CRs you're seeing are removed at the other end of
the transport pipe, as mandated by the standards.
What we're talking about, with respect to changing message bodies, is
that Exim shouldn't make changes that are visible OUTSIDE the scope of
transport. This is also mandated (to varying degrees depending on
context) by the standards.
Ciao,
Sheldon.
Ah. Thanks for the clarification. I completely agree with that concept.
Just as, for example, using SSL to encrypt a message in transport, which
obviously modifies things in transit, but makes no changes to the contents
coming out the other end of the pipe. Cool. I think I get it.

Jim Roberts
Punster Productions, Inc.


Barry Pederson
2003-07-22 15:31:59 UTC
Post by Sheldon Hearn
Hang on, you're misunderstanding the results you see. You're looking at
the message "in transit", where it's governed by network transport
rules. Those extra CRs you're seeing are removed at the other end of
the transport pipe, as mandated by the standards.
I think this section from RFC 2821 pretty clearly states that Exim's current
behavior of potentially transmitting CRCRLF or just CR is pretty much
(excepting for service extensions) wrong.

----------------
2.3.7 Lines

SMTP commands and, unless altered by a service extension, message
data, are transmitted in "lines". Lines consist of zero or more data
characters terminated by the sequence ASCII character "CR" (hex value
0D) followed immediately by ASCII character "LF" (hex value 0A).
This termination sequence is denoted as <CRLF> in this document.
Conforming implementations MUST NOT recognize or generate any other
character or character sequence as a line terminator. Limits MAY be
imposed on line lengths by servers (see section 4.5.3).

In addition, the appearance of "bare" "CR" or "LF" characters in text
(i.e., either without the other) has a long history of causing
problems in mail implementations and applications that use the mail
system as a tool. SMTP client implementations MUST NOT transmit
these characters except when they are intended as line terminators
and then MUST, as indicated above, transmit them only as a <CRLF>
sequence.
------------------
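To check data against the rule quoted above, look for any CR not followed by LF, or any LF not preceded by CR. Here is a minimal Python sketch (the function name is illustrative):

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_BARE = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def has_bare_cr_or_lf(wire: bytes) -> bool:
    return _BARE.search(wire) is not None


print(has_bare_cr_or_lf(b"This is a line.\r\n"))    # False
print(has_bare_cr_or_lf(b"This is a line.\r\r\n"))  # True: the first CR is bare
```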

Cyrus would then also be wrong in recognizing CRCRLF as two line terminators
- but this isn't a Cyrus mailing list, and I don't think that lets Exim off
the hook for potentially sending something it's not supposed to.
Post by Sheldon Hearn
Those extra CRs you're seeing are removed at the other end of
the transport pipe, as mandated by the standards.
I'm not sure where the standards specify that. Dropping any CRs is probably
just the most straightforward way of changing data from the SMTP-realm to the
Unix-realm, and I'm sure a lot of programs work that way. However, I don't
think it's necessarily something you can count on - and figure that you can
just blindly add as many CRs as you want expecting that someone else will
clean them up. And what if the other end does want to keep CRLF
(non-Unix-box, Cyrus, etc)?
Post by Sheldon Hearn
What we're talking about, with respect to changing message bodies, is
that Exim shouldn't make changes that are visible OUTSIDE the scope of
transport. This is also mandated (to varying degrees depending on
context) by the standards.
Yeah it'd be nice to expect that what went in one end came out exactly
byte-for-byte the same in the other. But can anyone in their right mind
really count on that - given that you may not be sending to another unix box,
or may be passing through who-knows-how-many virus scanners or other gateways?

I think the reality is that line terminators are always going to be changing
to suit the needs of whatever agent is holding the message at the moment. If
you want to transfer any exact sequence of bytes, you'd have to base64 or
uuencode it to have any hope of it passing through unchanged.

What I'm talking about is just making sure Exim follows RFC 2821 and only
transmits CRLF over the wire, and not CRCRLF or just CR or LF or any other
weird combination. If this can be guaranteed, then everything else
should work itself out, or at least we can really say it's not an Exim problem.

Barry


Sheldon Hearn
2003-07-22 15:58:09 UTC
Post by Barry Pederson
I think this section from RFC 2821 pretty clearly states that Exim's current
behavior of potentially transmitting CRCRLF or just CR is pretty much
(excepting for service extensions) wrong.
Exim doesn't transmit "just CR", does it?
Post by Barry Pederson
----------------
2.3.7 Lines
SMTP commands and, unless altered by a service extension, message
data, are transmitted in "lines". Lines consist of zero or more data
characters terminated by the sequence ASCII character "CR" (hex value
0D) followed immediately by ASCII character "LF" (hex value 0A).
This termination sequence is denoted as <CRLF> in this document.
So <CRLF> is the line terminator, and anything preceding it is the line.
That means "This is a line.<CRCRLF>" should be interpreted (in transit)
as a line with value "This is a line.<CR>".

If that's the value of the line in the email message itself, then Exim
is doing the right thing.
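Sheldon's reading treats only CRLF as a terminator, so any other CR stays in the line's value. Here is a small Python sketch, assuming the data ends with CRLF (the helper is illustrative):

```python
def smtp_lines(wire: bytes) -> list[bytes]:
    """Split on CRLF only; a CR that is not part of CRLF remains data."""
    lines = wire.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()                # drop the empty piece after the final CRLF
    return lines


print(smtp_lines(b"This is a line.\r\r\n"))  # [b'This is a line.\r']
```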
Post by Barry Pederson
Conforming implementations MUST NOT recognize or generate any other
character or character sequence as a line terminator. Limits MAY be
imposed on line lengths by servers (see section 4.5.3).
If you were to change Exim to recognize <CR> as a line terminator and
then replace it with the transport terminator <CRLF>, you'd be in
violation of this mandate.
Post by Barry Pederson
In addition, the appearance of "bare" "CR" or "LF" characters in text
(i.e., either without the other) has a long history of causing
problems in mail implementations and applications that use the mail
system as a tool. SMTP client implementations MUST NOT transmit
these characters except when they are intended as line terminators
and then MUST, as indicated above, transmit them only as a <CRLF>
sequence.
This is where the standard gets stupid. For a start, this
recommendation blows up the holy grail of 8-bit transmission. :-)

Ciao,
Sheldon.

Barry Pederson
2003-07-22 17:27:01 UTC
Post by Sheldon Hearn
Post by Barry Pederson
I think this section from RFC 2821 pretty clearly states that Exim's current
behavior of potentially transmitting CRCRLF or just CR is pretty much
(excepting for service extensions) wrong.
Exim doesn't transmit "just CR", does it?
I haven't tried it, but from what I looked at while coming up with my patch,
it looked like it could happen if that's what was in the message body. So
that was a bit hypothetical.
Post by Sheldon Hearn
That means "This is a line.<CRCRLF>" should be interpreted (in transit)
as a line with value "This is a line.<CR>".
If that's the value of the line in the email message itself, then Exim
is doing the right thing.
No no no, in this case with Cyrus, the bare <CR>s (not paired with an LF) are
not in the lines of the email message itself, *EXIM* puts them there, that
can't be the right thing. Exim takes: "This is a line.<CRLF>", and
transmits it as: "This is a line.<CRCRLF>". The standard is very clear here,
Post by Sheldon Hearn
Post by Barry Pederson
In addition, the appearance of "bare" "CR" or "LF" characters in text
(i.e., either without the other) has a long history of causing
problems in mail implementations and applications that use the mail
system as a tool. SMTP client implementations MUST NOT transmit
these characters except when they are intended as line terminators
and then MUST, as indicated above, transmit them only as a <CRLF>
sequence.
This is where the standard gets stupid. For a start, this
recommendation blows up the holy grail of 8-bit transmission. :-)
I thought the general idea of 8-bit transmission had to do with not stripping
off the most significant bit, so that things like accented Latin-1
characters, which are printable text, pass through unchanged. I don't think
it necessarily means exact line-terminator preservation or binary-data
transmission; that's what base64 and so on are for.

I'm not going to argue whether the standard is stupid or not, but it *is* the
standard, and if you start ignoring the parts you don't like, that's where
the weird incompatibilities start popping up.

Barry


Philip Hazel
2003-07-23 08:58:40 UTC
Post by Barry Pederson
No no no, in this case with Cyrus, the bare <CR>s (not paired with an LF) are
not in the lines of the email message itself, *EXIM* puts them there, that
can't be the right thing. Exim takes: "This is a line.<CRLF>", and
transmits it as: "This is a line.<CRCRLF>".
That is only true when the source of the line is a non-SMTP input. Exim
currently thinks such lines should be terminated according to the Unix
conventions, since it is running on a Unix system.
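A minimal Python sketch of the translation being described, assuming (for illustration only) that the outgoing SMTP/LMTP side simply puts a CR in front of every LF it finds in locally submitted input. `naive_smtp_encode` is a hypothetical name, not an Exim function. It shows how a CRLF-terminated line from a local submitter such as Cyrus ends up as CRCRLF on the wire:

```python
def naive_smtp_encode(message: bytes) -> bytes:
    """Hypothetical sketch: treat non-SMTP input as LF-terminated and
    add a CR before every LF when writing the message out over SMTP/LMTP."""
    # Any CR already in the input is treated as ordinary data, so a
    # CRLF ending from the submitter becomes CR CR LF on output.
    return message.replace(b"\n", b"\r\n")


cyrus_style = b"x-foo: bar\r\nSubject: test\r\n\r\nThis is a test\r\n"
wire = naive_smtp_encode(cyrus_style)
print(wire)  # b'x-foo: bar\r\r\nSubject: test\r\r\n\r\r\nThis is a test\r\r\n'
```

Plain LF-terminated input comes out as correct CRLF; only input that already carries CRs gets doubled.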

Note: I have already said that I propose to do something about this (see
below). No need to keep on arguing!
Post by Barry Pederson
Post by Barry Pederson
In addition, the appearance of "bare" "CR" or "LF" characters in text
(i.e., either without the other) has a long history of causing
problems in mail implementations and applications that use the mail
system as a tool. SMTP client implementations MUST NOT transmit
these characters except when they are intended as line terminators
and then MUST, as indicated above, transmit them only as a <CRLF>
sequence.
That's all very well, but this begs the same question: what should Exim
do if a bare CR appears in an incoming message? This is the same
question as what should Exim do if a top-bit-set character appears in an
incoming message? At present (and I stress *at present*) Exim by default
does the same thing in both cases - it just transmits the character.

There was once an MTA that bounced messages that contained top-bit-set
characters. It was a real PITA for messages originating from countries
where ISO-8859-1 was the common character set, because accented
characters often appeared in messages. My take on this is that you are
more likely to achieve what the user wants by just transmitting such
characters.

Note that in some environments it is easy to get a top-bit-set character
into text by mistake - just brush against the wrong sequence of keys.
Post by Barry Pederson
I'm not going to argue whether the standard is stupid or not, but it *is* the
standard, and if you start ignoring the parts you don't like, that's where
the weird incompatibilities start popping up.
I'm afraid people have always ignored parts of RFCs they don't like.
This is the way the Internet works. In effect, the RFCs document
practice that should guarantee interworking. This doesn't always follow,
as I have learned in Exim. A number of restrictions have had to be
relaxed over the years because "other MTAs work that way". This means
there has to be a judgement call on each case.

I won't post anything more on this thread, but here is how I now stand:

ORIGINALLY

When I wrote Exim, I took the position that lines in files and pipes
inside Unix were LF-terminated, and lines on the wire in an SMTP
transaction were CRLF terminated. Translations were done for SMTP
input/output.

That didn't last long. :-(

COMPROMISES

(i) There were MTAs that sent bare LFs over the wire, "and they work
with other MTAs".

(ii) There were programs that injected local messages with CRLF, "and
they work with other MTAs".

(iii) There were cases where people wanted delivery using CRLF to a
file or a pipe.

For (ii) I invented -dropcr and later drop_cr. At first, it dropped
*all* CRs, later just those that precede LF. For (iii) I invented
use_crlf.
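To make the difference between the two drop_cr behaviours concrete, here is a Python sketch written from the description above rather than from the Exim source; both function names are hypothetical:

```python
def drop_cr_original(data: bytes) -> bytes:
    """Early -dropcr behaviour as described: delete every CR in the message."""
    return data.replace(b"\r", b"")


def drop_cr_revised(data: bytes) -> bytes:
    """Later drop_cr behaviour as described: delete only a CR that
    immediately precedes an LF, so CRLF becomes LF and other CRs survive."""
    return data.replace(b"\r\n", b"\n")


sample = b"x-foo: bar\r\nSubject: test\r\n\r\nA\rB\r\n"
print(drop_cr_original(sample))  # b'x-foo: bar\nSubject: test\n\nAB\n'
print(drop_cr_revised(sample))   # b'x-foo: bar\nSubject: test\n\nA\rB\n'
```

With the revised version a CRCRLF sequence becomes CRLF, since only the one CR directly in front of the LF is removed.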

RETHINK

It seems clear that my previous approach to treating bare CRs like any
other "illegal" character (such as top-bit-set characters) is probably
not the best approach. Therefore, I am going to change Exim so that:

(a) The character sequences LF, CRLF, and CR-without-LF are all treated
as line endings when Exim reads a message.

(b) Each line ending will be converted to LF on input, so that
internally, Exim continues to use LF terminators.

(c) However, a bare CR in a message's headers will be treated specially,
because such a character probably does not indicate the end of a
header line. It will be converted to LF followed by one space.
(I might even try to be cleverer and do this only if the following
text doesn't look like the start of the next header line.)

(d) In a body, a bare CR will be converted to LF.
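Rules (a) to (d) can be sketched in Python roughly as follows. This is only an illustration of the proposal as worded here, not the code that went into Exim. The function name is hypothetical, the end of the header section is assumed to be the first empty line, and the "cleverer" look-ahead mentioned in (c) is not attempted:

```python
CR, LF = 0x0D, 0x0A


def normalize_incoming(data: bytes) -> bytes:
    """Hypothetical sketch of rules (a)-(d): LF, CRLF and bare CR all end a
    line; endings become LF, except a bare CR in the headers, which becomes
    LF plus a space so the header continues on the next line."""
    out = bytearray()
    in_headers = True
    at_line_start = True  # True when the current output line is still empty
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == CR and data[i + 1:i + 2] == b"\n":
            bare_cr = False
            i += 2  # CRLF
        elif byte == LF:
            bare_cr = False
            i += 1  # LF
        elif byte == CR:
            bare_cr = True
            i += 1  # CR not followed by LF
        else:
            out.append(byte)
            at_line_start = False
            i += 1
            continue

        if in_headers and bare_cr:
            # Rule (c): turn the bare CR into a header continuation line.
            out += b"\n "
            at_line_start = False
            continue
        if in_headers and at_line_start:
            # An empty line ends the header section.
            in_headers = False
        # Rules (a), (b), (d): every other line ending becomes a single LF.
        out.append(LF)
        at_line_start = True
    return bytes(out)


print(normalize_incoming(b"Subject: a\rb\n\nbody\rmore\n"))
# b'Subject: a\n b\n\nbody\nmore\n'
```

Fed the CRLF test message from the start of this thread, it yields the ordinary LF-terminated message with all headers intact.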

One advantage of doing bare CR handling is that it will stop people
playing silly games such as trying to obscure text and prevent it being
displayed.

I will make a new snapshot when I have done this work.


--
Philip Hazel University of Cambridge Computing Service,
***@cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book: http://www.uit.co.uk/exim-book


Sheldon Hearn
2003-07-23 09:11:40 UTC
Post by Philip Hazel
Note: I have already said that I propose to do something about this (see
below). No need to keep on arguing!
Perfect timing. I've just seen Barry's point and am happy to be quiet.
:-)

Ciao,
Sheldon.

Barry Pederson
2003-07-23 14:52:36 UTC
Post by Sheldon Hearn
Post by Philip Hazel
Note: I have already said that I propose to do something about this (see
below). No need to keep on arguing!
Perfect timing. I've just seen Barry's point and am happy to be quiet.
:-)
Me too. I didn't mean all that so much as an argument, but more as
constructive criticism/debate in the hopes of making Exim a better package.

Thanks Philip for a great program, and thanks Sheldon for maintaining the
FreeBSD Exim ports, which I use all the time :)

Barry


Barry Pederson
2003-07-18 17:10:46 UTC
Post by Tony Finch
Post by Philip Hazel
What do other people think? Now that drop_cr drops only one CR preceding
a linefeed (the original hack dropped *all* CRs in a message), this
might be more acceptable. But I can't say I like it.
I don't see much harm in turning on drop_cr by default. The whole area
is so evil that anyone who causes problems by relying on particular
behaviour deserves the worst possible consequences. Another evil idea
I had as a possible alternative way of dealing with the Cyrus mismatch
would be to convert bare CRs into CRLFSPC, so that the additional line
break doesn't split the header.
Let me throw another thing in here...

Philip mentioned testing without problem with an appendfile delivery, but in
my setup I was delivering to Cyrus using the "smtp" driver with the "lmtp"
option. I'd imagine Exim must be adding CRs to messages to make it legal
SMTP/LMTP? Maybe Exim's doing that a bit too eagerly?
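One way to avoid the doubling, sketched in Python purely as an assumption about what "less eager" could mean (this is neither Exim's LMTP code nor Barry's patch): treat an existing CRLF as a single line ending, and only add a CR in front of an LF that lacks one. Bare CRs are passed through unchanged here, which is exactly the case the RFC text quoted in this thread says an SMTP client must not send, so a real fix would still need a policy for them.

```python
import re


def careful_smtp_encode(message: bytes) -> bytes:
    """Hypothetical encoder: every LF or CRLF line ending becomes exactly
    one CRLF; a CR already in front of an LF is not doubled."""
    # The optional \r is consumed together with the \n, so CRLF -> CRLF
    # and a lone LF -> CRLF. Bare CRs elsewhere are left untouched.
    return re.sub(rb"\r?\n", b"\r\n", message)


print(careful_smtp_encode(b"This is a line.\r\n"))  # b'This is a line.\r\n'
print(careful_smtp_encode(b"unix line\n"))          # b'unix line\r\n'
```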

Barry

