Recently I was contacted about a problem where sssd was "randomly" failing on a rebuilt server. Unreliably reproducible bugs are of course the most annoying to troubleshoot, but fortunately this one was not actually random. A quick peek at the debug and audit logs showed sssd getting hung up trying to work with a file named ${hostname}-[0-9]+_0 in /var/tmp which was labeled "sshd_tmp_t". Ah, this old problem. It seems to come up pretty frequently when using a particular kerberos server.

The machine in question was set up to resolve users and groups via sssd, which was configured to use kerberos for authentication ( auth_provider = krb5 ) and ldap for identification/authorization ( id_provider = ldap ). The latter was binding to its server via GSSAPI ( ldap_sasl_mech = GSSAPI ). Like any good kerberos client, sssd keeps a kerberos replay cache to protect against certain types of attacks, or at least it does when the non-default option of "krb5_validate" is set to "true" (you are setting it to "true", right?). As one would expect, sshd maintains a replay cache too.

When creating the kerberos replay cache, unless overridden, the kerberos libraries decide on the default file name by the identifier of the first viable service key in the selected keytab. That means, if your hostname is "erwin.example.com" and your system keytab happens to look like:

host/erwin.example.com@EXAMPLE.COM
erwin$@EXAMPLE.COM

then your default replay cache file will be:

/var/tmp/host_0

However, if your keytab happens to look like:

erwin$@EXAMPLE.COM
host/erwin.example.com@EXAMPLE.COM

then your default replay cache file will be something like:

/var/tmp/erwin-044_0
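
The difference between the two names can be reproduced with a small shell sketch. It assumes, based on the two file names shown above, that the library takes the first component of the principal, replaces each non-alphanumeric character with a dash followed by its three-digit octal code ("$" is octal 044), and appends an underscore and the uid (0 for root). The function name rcache_name is hypothetical, and the authoritative logic lives inside the kerberos libraries.

```shell
# Sketch only: approximates how the default replay cache name appears to be
# derived from a keytab principal. Not the library's actual code.
rcache_name() {
    piece=${1%%@*}        # drop the @REALM part
    piece=${piece%%/*}    # keep only the first component (e.g. "host")
    uid=${2:-0}           # uid suffix; 0 (root) as in this post
    out=
    while [ -n "$piece" ]; do
        rest=${piece#?}           # everything after the first character
        ch=${piece%"$rest"}       # the first character itself
        piece=$rest
        case $ch in
            -) out="$out--" ;;                              # assumed: a literal dash is doubled
            [A-Za-z0-9]) out="$out$ch" ;;                   # kept as-is
            *) out="$out-$(printf '%03o' "'$ch")" ;;        # escaped as -ooo (octal)
        esac
    done
    printf '%s_%s\n' "$out" "$uid"
}
```

Calling rcache_name 'host/erwin.example.com@EXAMPLE.COM' and rcache_name 'erwin$@EXAMPLE.COM' yields the two file names from the examples above.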

At this point you might reasonably ask, who cares what the replay cache is named? That is a very reasonable question. In this particular case, SELinux cares, a lot. One of SELinux's primary enforcement models is centered around the label on a file, and one of the methods for ensuring a file is correctly labeled upon creation is its full path. Absent path based recommendations (subject to policy approval), newly created files are labeled based on the creating process's label and the label of the directory where the file is getting created. (That is simplified a little bit, but it is sufficient for this explanation.)

If there were no path based rules in the standard RHEL/Fedora SELinux policy, when sssd wrote the replay cache into /var/tmp, the file would be labeled user_tmp_t. When sshd wrote the replay cache into /var/tmp, the file would be labeled sshd_tmp_t. As one would expect, sssd is not permitted to work with files labeled sshd_tmp_t, those are exclusive to sshd. So, if the file sssd thought was supposed to be the host's kerberos replay cache happened to be labeled sshd_tmp_t, sssd would be unable to manipulate the replay cache and would consequently fail secure (aka, stop functioning). The "random" failure experienced was sssd stopping after any user had authenticated to sshd (thereby causing sshd to write out a replay cache with the "wrong" label), effectively denying sssd access to the replay cache.

To address this, a path based rule is included in SELinux that says

/var/tmp/host_0 --  system_u:object_r:krb5_host_rcache_t:s0

By default, /var/tmp is a world writable space; a file by just about any name, created by any user, can exist there. That means any SELinux path based rules for /var/tmp need to be quite specific; one cannot simply say that any file in that directory should be labeled krb5_host_rcache_t. Since the normal keytab layout results in a file name of "host_0", this is nearly always sufficient. However, as we have seen, the solution completely relies on the order inside the keytab being such that the file generated by the kerberos libraries is named "host_0".
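
To see which replay cache files ended up with the wrong label, the output of "ls -Z" can be filtered for anything that is not krb5_host_rcache_t. This is a rough sketch: it assumes "ls -Z" prints the SELinux context as a colon-separated field with the file name as the last field, and the helper name flag_mislabeled is hypothetical.

```shell
# Sketch only: reads "ls -Z" output on stdin and prints "file: type" for
# every entry whose SELinux type differs from the expected one.
flag_mislabeled() {
    awk -v want="${1:-krb5_host_rcache_t}" '{
        for (i = 1; i <= NF; i++) {
            # The context is the first field that looks like user:role:type[:level]
            if (split($i, ctx, ":") >= 3) {
                if (ctx[3] != want)
                    print $NF ": " ctx[3]
                next
            }
        }
    }'
}
```

A typical invocation would be "ls -Z /var/tmp/*_0 | flag_mislabeled". Running "matchpathcon /var/tmp/host_0 /var/tmp/erwin-044_0" alongside it shows which label the path based rules would assign to each name.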

That covers why the problem was occurring. On to how to fix it. There are a number of possible resolutions; here are two of them:

  1. Reorder the keytab so the "host/" entry is first
  2. Configure sssd to use a different rcache directory

Reorder the keytab so the "host/" entry is first

This is relatively simple for a single host. ktutil from the krb5-workstation package has the ability to manipulate keytabs. Read the original keytab in twice, delete the extraneous entries to reverse the order, and write the corrected version out to a new file.

$ ktutil
ktutil:  rkt /tmp/bad.keytab
ktutil:  list
slot KVNO Principal
---- ---- ----------------------------------
   1    2                 erwin$@EXAMPLE.COM
   2    2 host/erwin.example.com@EXAMPLE.COM

ktutil:  rkt /tmp/bad.keytab
ktutil:  list
slot KVNO Principal
---- ---- ----------------------------------
   1    2                 erwin$@EXAMPLE.COM
   2    2 host/erwin.example.com@EXAMPLE.COM
   3    2                 erwin$@EXAMPLE.COM
   4    2 host/erwin.example.com@EXAMPLE.COM

ktutil:  delent 1
ktutil:  list
slot KVNO Principal
---- ---- ----------------------------------
   1    2 host/erwin.example.com@EXAMPLE.COM
   2    2                 erwin$@EXAMPLE.COM
   3    2 host/erwin.example.com@EXAMPLE.COM

ktutil:  delent 3
ktutil:  list
slot KVNO Principal
---- ---- ----------------------------------
   1    2 host/erwin.example.com@EXAMPLE.COM
   2    2                 erwin$@EXAMPLE.COM

ktutil:  wkt /tmp/fixed.keytab

Replace the host keytab with the re-ordered host keytab, restart the associated daemons, and all is well. The kerberos libraries will now generate a replay cache name of "host_0" and the default SELinux path rules will cover everything.
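
Before swapping the keytab in, it is worth confirming that the order actually changed. A minimal sketch follows, assuming the MIT "klist -k" layout of three header lines (keytab name, column titles, dashes) followed by "KVNO Principal" rows. Both function names are hypothetical.

```shell
# Sketch only: reads "klist -k" output on stdin.
# Prints the principal of the first keytab entry.
first_keytab_principal() {
    awk 'NR > 3 && NF >= 2 { print $2; exit }'
}

# Succeeds (exit 0) when the first entry is a host/ principal.
host_entry_first() {
    case $(first_keytab_principal) in
        host/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: klist -k /tmp/fixed.keytab | host_entry_first && echo "host/ entry is first"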

Configure sssd to use a different rcache directory

While rearranging a keytab is the traditional solution, depending on your environment, it may not scale well. This is especially true if your kerberos realm happens to be backed by Active Directory. (Note to self: get around to writing a post on decent ways to join RHEL into an AD domain without samba. It comes up frequently enough.)

Fortunately, as of RHEL6.2 (scroll to BZ#732974) sssd provides a workaround. The bug is unfortunately private, but it points to a patch in sssd >= 1.7 which was also backported to sssd 1.5 and sssd 1.6.

The patch adds an option, "krb5_rcache_dir", which can be used to specify the directory for storage of replay caches and then sets the new default to be "%{_localstatedir}/cache/krb5rcache" (aka "/var/cache/krb5rcache/").

Additionally, as of selinux-policy-3.7.19-105.el6, a supplemental path labeling rule was added:

/var/cache/krb5rcache(/.*)? system_u:object_r:krb5_host_rcache_t:s0

That means, if you are using sssd >= 1.7 or the patched 1.5/1.6 along with selinux-policy >= 3.7.19-105, this problem is already solved for you. The updated sssd writes out its replay cache into "/var/cache/krb5rcache/", and any file in that folder is labeled "krb5_host_rcache_t". It no longer matters what the kerberos libraries happen to name the file.
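
If you want to state the directory explicitly rather than rely on the default, the option sits next to the other krb5 settings in the domain section of sssd.conf. The sketch below only prints an example fragment for review. The domain name "example.com" and the function name rcache_fragment are assumptions for illustration, and the remaining options are the ones described earlier in this post.

```shell
# Sketch only: prints a hypothetical sssd.conf domain fragment to stdout.
# Review it and merge it by hand; do not overwrite /etc/sssd/sssd.conf with it.
rcache_fragment() {
    cat <<'EOF'
[domain/example.com]
id_provider = ldap
auth_provider = krb5
ldap_sasl_mech = GSSAPI
krb5_validate = true
krb5_rcache_dir = /var/cache/krb5rcache
EOF
}
```

Running "rpm -q sssd selinux-policy" shows whether the installed versions meet the thresholds mentioned above.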