# cat /dev/two

Tag - bugfix


SSSD and Kerberos Replay Cache

Recently I was contacted about a problem where sssd was "randomly" failing to continue functioning on a rebuilt server. Unreliably reproducible bugs are of course the most annoying to troubleshoot; fortunately, this one was not actually random. A quick peek at the debug and audit logs showed sssd getting hung up trying to work with a file named ${hostname}-[0-9]+_0 in /var/tmp which was labeled "sshd_tmp_t". Ah, this old problem. It seems to come up pretty frequently when using a particular kerberos server.

The machine in question was set up to resolve users and groups via sssd, which was configured to use kerberos for authentication ( auth_provider = krb5 ) and ldap for identification/authorization ( id_provider = ldap ). The latter was binding to its server via GSSAPI ( ldap_sasl_mech = GSSAPI ). Like any good kerberos client, sssd keeps a kerberos replay cache to protect against certain types of attacks, or at least it does when the non-default option of "krb5_validate" is set to "true" (you are setting it to "true", right?). As one would expect, sshd maintains a replay cache too.
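
For orientation, a setup like the one described would correspond roughly to the sssd.conf fragment below. This is only a sketch: the domain name "example", the realm, the ldap URI, and the search base are placeholders chosen for illustration and are not taken from the affected host.

```ini
[sssd]
config_file_version = 2
services = nss, pam
# "example" is a placeholder domain name
domains = example

[domain/example]
# identification/authorization via ldap, bound with GSSAPI
id_provider = ldap
ldap_uri = ldap://ldap.example.com
ldap_search_base = dc=example,dc=com
ldap_sasl_mech = GSSAPI

# authentication via kerberos, with ticket validation turned on
auth_provider = krb5
krb5_realm = EXAMPLE.COM
krb5_validate = true
```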

When creating the kerberos replay cache, unless overridden, the kerberos libraries decide on the default file name by the identifier of the first viable service key in the selected keytab. That means, if your hostname is "erwin.example.com" and your system keytab happens to look like:


then your default replay cache file will be:


However, if your keytab happens to look like:


then your default replay cache file will be something like:


At this point you might reasonably ask, who cares what the replay cache is named? That is a very reasonable question. In this particular case, SELinux cares, a lot. One of SELinux's primary enforcement models is centered around the label on a file, and one of the methods for ensuring a file is correctly labeled upon creation is its full path. Absent path based recommendations (subject to policy approval), newly created files are labeled based on the creating process's label and the label of the directory where the file is getting created. (That is simplified a little bit, but it is sufficient for this explanation.)

If there were no path based rules in the standard RHEL/Fedora SELinux policy, when sssd wrote the replay cache into /var/tmp, the file would be labeled user_tmp_t. When sshd wrote the replay cache into /var/tmp, the file would be labeled sshd_tmp_t. As one would expect, sssd is not permitted to work with files labeled sshd_tmp_t, those are exclusive to sshd. So, if the file sssd thought was supposed to be the host's kerberos replay cache happened to be labeled sshd_tmp_t, sssd would be unable to manipulate the replay cache and would consequently fail secure (aka, stop functioning). The "random" failure experienced was sssd stopping after any user had authenticated to sshd (thereby causing sshd to write out a replay cache with the "wrong" label), effectively denying sssd access to the replay cache.

To address this, a path based rule is included in SELinux that says

/var/tmp/host_0 --  system_u:object_r:krb5_host_rcache_t:s0

By default, /var/tmp is a world writable space; any file by just about any name and created by any user can exist there. That means any SELinux path based rules for /var/tmp need to be quite specific; one cannot simply say that any file in that directory should be labeled krb5_host_rcache_t. Since the normal keytab layout results in a file name of "host_0", this is nearly always sufficient. However, as we have seen, the solution completely relies on the order inside the keytab being such that the file generated by the kerberos libraries is named "host_0".

That covers why the problem was occurring; on to how to fix it. There are a number of possible resolutions. Here are two of them:

  1. Reorder the keytab so the "host/" entry is first
  2. Configure sssd to use a different rcache directory

Reorder the keytab so the "host/" entry is first

This is relatively simple for a single host. ktutil from the krb5-workstation package has the ability to manipulate keytabs. Read the original keytab in twice, delete the extraneous entries to reverse the order, and write the corrected version out to a new file.

$ ktutil
ktutil:  rkt /tmp/bad.keytab
ktutil:  list
slot KVNO Principal
---- ---- ----------------------------------
   1    2                 erwin$@EXAMPLE.COM
   2    2 host/erwin.example.com@EXAMPLE.COM

ktutil:  rkt /tmp/bad.keytab
ktutil:  list
slot KVNO Principal
---- ---- ----------------------------------
   1    2                 erwin$@EXAMPLE.COM
   2    2 host/erwin.example.com@EXAMPLE.COM
   3    2                 erwin$@EXAMPLE.COM
   4    2 host/erwin.example.com@EXAMPLE.COM

ktutil:  delent 1
ktutil:  list
slot KVNO Principal
---- ---- ----------------------------------
   1    2 host/erwin.example.com@EXAMPLE.COM
   2    2                 erwin$@EXAMPLE.COM
   3    2 host/erwin.example.com@EXAMPLE.COM

ktutil:  delent 3
ktutil:  list
slot KVNO Principal
---- ---- ----------------------------------
   1    2 host/erwin.example.com@EXAMPLE.COM
   2    2                 erwin$@EXAMPLE.COM

ktutil:  wkt /tmp/fixed.keytab

Replace the host keytab with the re-ordered host keytab, restart the associated daemons, and all is well. The kerberos libraries will now generate a replay cache name of "host_0" and the default SELinux path rules will cover everything.

Configure sssd to use a different rcache directory

While rearranging a keytab is the traditional solution, depending on your environment, it may not scale well. This is especially true if your kerberos realm happens to be backed by Active Directory. (Note to self, get around to writing a post on decent ways to join RHEL into an AD domain without samba, it comes up frequently enough)

Fortunately, as of RHEL6.2 (scroll to BZ#732974) sssd provides a workaround. The bug is unfortunately private, but it points to a patch in sssd >= 1.7 which was also backported to sssd 1.5 and sssd 1.6.

The patch adds an option, "krb5_rcache_dir", which can be used to specify the directory for storage of replay caches and then sets the new default to be "%{_localstatedir}/cache/krb5rcache" (aka "/var/cache/krb5rcache/").

Additionally, as of selinux-policy-3.7.19-105.el6, a supplemental path labeling rule was added:

/var/cache/krb5rcache(/.*)? system_u:object_r:krb5_host_rcache_t:s0

That means, if you are using sssd >= 1.7 or the patched 1.5/1.6 along with selinux-policy >= 3.7.19-105, this problem is already solved for you. The updated sssd writes out its replay cache into "/var/cache/krb5rcache/", and any file in that folder is labeled "krb5_host_rcache_t". It no longer matters what the kerberos libraries happen to name the file.
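
If you would rather not rely on the built-in default, the option can also be set explicitly. The fragment below is a sketch that assumes a domain section named "example" (a placeholder); the option name and directory are the ones described above.

```ini
[domain/example]
# Keep sssd's replay caches in the directory covered by the
# /var/cache/krb5rcache(/.*)? labeling rule instead of /var/tmp
krb5_rcache_dir = /var/cache/krb5rcache
```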


Bugfixing selinux login maps (p 8 of 8)

This is part 8 of an 8 part post covering the process used to trace down and correct a problem with semanage login record group matching. If you have not already read the previous parts, you may want to start at the beginning.

Deploying the fix

Due to the combination of the sensitive nature of this library, my rusty C, and the fact that I had never worked inside libselinux before, I was not in a particular rush to override the Red Hat provided (and supported) package without at least one other person reviewing it, preferably someone who had written C in the past decade.

After considering the options, I pushed the locally patched libselinux package to the two hosts that Alice had to use as soon as possible, and updated the issue with Red Hat. Fortunately, a few hours after RHBZ#748471 was filed, I spotted Dan Walsh's commit to upstream with the exact patch I had proposed. That eased concerns significantly.

About a month later Red Hat released the fix in RHBA-2011-1559 which is part of RHEL6.2.

I did not notice it at the time, but Dan Walsh also blogged about the issue.

It was too late to respond on Dan's blog (comments are now stuck in /dev/null'ed moderation), but in the unlikely event "abbra" is reading this: I really did not care what was in the errata, I just needed a fix ASAP and was trying to help along the process, providing all of the relevant information I had located. Had Dan, or anyone else, had a better (or even just different) fix, that would have been perfectly fine.


Bugfixing selinux login maps (p 7 of 8)

This is part 7 of an 8 part post covering the process used to trace down and correct a problem with semanage login record group matching. If you have not already read the previous parts, you may want to start at the beginning.

Correcting the problem

Not wanting to duplicate work, I pulled down the selinux upstream trunk code from their git repo just to make sure the bug had not already been addressed. The entire file was identical to the shipped RHEL version. I could not pull a solution in, but at least the same patch would work for both.

Since I already had the trunk downloaded, I worked in there so git could handle the patch generation for me. As this needed to get accepted upstream, I tried to change as little as possible and match the coding style, complete with use of 'goto'.

The loop is particularly dirty, but if you read the bottom half of the getgrnam_r man page, it basically has to be that ugly. Errno is not reliably set, and the function uses an unusual mix of return value and buffer squashing to indicate failures.

A 'do -> while' with a condition on retval and grent might be a little cleaner, but it would potentially deviate from the coding style a bit too much and therefore possibly slow adoption upstream. (There was a seriously powerful ticking clock in play here.)

diff --git a/libselinux/src/seusers.c b/libselinux/src/seusers.c
index fc75cb6..b653cad 100644
--- libselinux/src/seusers.c
+++ libselinux/src/seusers.c
@@ -5,6 +5,7 @@
 #include <stdio.h>
 #include <stdio_ext.h>
 #include <ctype.h>
+#include <errno.h>
 #include <selinux/selinux.h>
 #include <selinux/context.h>
 #include "selinux_internal.h"
@@ -118,13 +119,26 @@ static int check_group(const char *group, const char *name, const gid_t gid) {
    long rbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
    if (rbuflen <= 0)
        return 0;
-   char *rbuf = malloc(rbuflen);
-   if (rbuf == NULL)
-       return 0;
-   if (getgrnam_r(group, &gbuf, rbuf, rbuflen, 
-              &grent) != 0)
-       goto done;
+   char *rbuf;
+   while(1) {
+       rbuf = malloc(rbuflen);
+       if (rbuf == NULL)
+           return 0;
+       int retval = getgrnam_r(group, &gbuf, rbuf, 
+               rbuflen, &grent);
+       if ( retval == ERANGE )
+       {
+           free(rbuf);
+           rbuflen = rbuflen * 2;
+       } else if ( retval != 0 || grent == NULL )
+       {
+           goto done;
+       } else
+       {
+           break;
+       }
+   }
    if (getgrouplist(name, gid, NULL, &ng) < 0) {
        groups = (gid_t *) malloc(sizeof (gid_t) * ng);
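
For comparison with the loop in the patch above, the 'do -> while' shape mentioned earlier could look roughly like the sketch below. It is not the code that was submitted upstream: lookup_group_gid is a hypothetical standalone helper invented for this illustration, the 1024 byte fallback when sysconf gives no size hint is an assumption (check_group simply returns in that case), and the buffer is grown with realloc instead of the free-then-malloc used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libselinux: resolve a group name to
 * its gid, doubling the scratch buffer for as long as getgrnam_r()
 * reports ERANGE. Returns 0 on success, -1 if the group does not exist
 * or the lookup fails for any other reason.
 */
int lookup_group_gid(const char *group, gid_t *gid_out)
{
    struct group gbuf;
    struct group *grent = NULL;
    char *rbuf = NULL;
    int retval;

    assert(group != NULL && gid_out != NULL);

    long rbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
    if (rbuflen <= 0)
        rbuflen = 1024; /* assumption: start small when there is no hint */

    do {
        /* realloc(NULL, n) behaves like malloc(n) on the first pass */
        char *bigger = realloc(rbuf, (size_t) rbuflen);
        if (bigger == NULL) {
            free(rbuf);
            return -1;
        }
        rbuf = bigger;

        retval = getgrnam_r(group, &gbuf, rbuf, (size_t) rbuflen, &grent);
        if (retval == ERANGE)
            rbuflen *= 2; /* buffer too small, retry with twice the space */
    } while (retval == ERANGE);

    /* A return of 0 with grent == NULL means "no such group". */
    if (retval != 0 || grent == NULL) {
        free(rbuf);
        return -1;
    }

    *gid_out = grent->gr_gid;
    free(rbuf);
    return 0;
}
```

Called as lookup_group_gid("somegroup", &gid), it returns 0 and fills in gid when the group exists, and -1 otherwise. The retry decision lives in one place (the while condition), which is the main readability gain over the 'while(1)' form.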

With the patch seemingly working against the little snippet of code I had pulled out, I went on to integrate it into a libselinux rpm for further testing. After building libselinux-2.0.94-5.el6$WORK.1.x86_64, I put it on my isolated system, verified it worked, and then tested it on the network host which had first exhibited the problem. It seemed to work there as well.

Next up, part 8: Deploying the fix.
