This is part 7 of an 8-part post covering the process used to track down and correct a problem with semanage login record group matching. If you have not already read the previous parts, you may want to start at the beginning.

Correcting the problem

Not wanting to duplicate work, I pulled down the SELinux upstream trunk from its git repo just to make sure the bug had not already been addressed. The affected file, libselinux/src/seusers.c, was identical to the shipped RHEL version. That meant there was no existing fix to pull in, but at least the same patch would work for both.

Since I already had the trunk downloaded, I worked in there so git could handle the patch generation for me. As this needed to get accepted upstream, I tried to change as little as possible and match the coding style, complete with use of 'goto'.

The loop is particularly dirty, but if you read the bottom half of the getgrnam_r man page, it basically has to be that ugly. errno is not reliably set, and the function uses an unusual mix of its return value and a result pointer that gets set to NULL to indicate failures.
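
To make that mix of signals concrete, here is a minimal sketch of how a single getgrnam_r call can be sorted into outcomes. The enum and the classify_lookup helper are hypothetical names of my own for illustration; they are not part of libselinux or the patch.

```c
#include <errno.h>
#include <grp.h>
#include <stddef.h>

/* Hypothetical outcome labels, for illustration only. */
enum lookup_result {
    GROUP_FOUND,
    GROUP_NOT_FOUND,
    BUFFER_TOO_SMALL,
    LOOKUP_ERROR
};

/* Classify one getgrnam_r call made with a caller-supplied buffer. */
static enum lookup_result classify_lookup(const char *group,
                                          char *buf, size_t buflen)
{
    struct group gbuf;
    struct group *grent = NULL;

    /* The status comes back as the return value, not through errno. */
    int retval = getgrnam_r(group, &gbuf, buf, buflen, &grent);

    if (retval == ERANGE)
        return BUFFER_TOO_SMALL;   /* retry with a larger buffer */
    if (retval != 0)
        return LOOKUP_ERROR;       /* some systems report "not found" here too */
    if (grent == NULL)
        return GROUP_NOT_FOUND;    /* success return, but no matching entry */
    return GROUP_FOUND;
}
```

The point is that a zero return value alone does not mean the group exists; the result pointer has to be checked as well.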

A 'do ... while' with a condition on retval and grent might be a little cleaner, but it would potentially deviate from the coding style a bit too much and therefore possibly slow adoption upstream. (There was a seriously powerful ticking clock in play here.)
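
For comparison, a rough sketch of what the 'do ... while' form could look like as a standalone helper. This is not the code that was submitted upstream: the lookup_gid name, the caller-supplied starting buffer size, and the 1 MiB cap are all assumptions made for the example.

```c
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Assumed upper bound so the buffer cannot grow without limit. */
#define MAX_BUFLEN ((size_t)1 << 20)

/*
 * Hypothetical helper: look up a group by name, doubling the buffer
 * whenever getgrnam_r reports ERANGE. Returns 1 and stores the gid on
 * success, 0 if the group is missing or the lookup fails.
 */
static int lookup_gid(const char *group, size_t buflen, gid_t *gid_out)
{
    struct group gbuf;
    struct group *grent = NULL;
    char *rbuf = NULL;
    int retval;
    int found = 0;

    if (buflen == 0)
        buflen = 1;

    do {
        free(rbuf);                 /* free(NULL) is a no-op on the first pass */
        rbuf = malloc(buflen);
        if (rbuf == NULL)
            return 0;
        retval = getgrnam_r(group, &gbuf, rbuf, buflen, &grent);
        if (retval == ERANGE)
            buflen *= 2;            /* buffer was too small, try a bigger one */
    } while (retval == ERANGE && buflen <= MAX_BUFLEN);

    /* Both signals must agree: zero return value and a non-NULL result. */
    if (retval == 0 && grent != NULL) {
        *gid_out = grent->gr_gid;
        found = 1;
    }

    free(rbuf);
    return found;
}
```

A caller would normally pass the value of sysconf(_SC_GETGR_R_SIZE_MAX) as the starting size, falling back to some fixed guess when that returns a non-positive value.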

diff --git a/libselinux/src/seusers.c b/libselinux/src/seusers.c
index fc75cb6..b653cad 100644
--- libselinux/src/seusers.c
+++ libselinux/src/seusers.c
@@ -5,6 +5,7 @@
 #include <stdio.h>
 #include <stdio_ext.h>
 #include <ctype.h>
+#include <errno.h>
 #include <selinux/selinux.h>
 #include <selinux/context.h>
 #include "selinux_internal.h"
@@ -118,13 +119,26 @@ static int check_group(const char *group, const char *name, const gid_t gid) {
    long rbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
    if (rbuflen <= 0)
        return 0;
-   char *rbuf = malloc(rbuflen);
-   if (rbuf == NULL)
-       return 0;
-   if (getgrnam_r(group, &gbuf, rbuf, rbuflen, 
-              &grent) != 0)
-       goto done;
+   char *rbuf;
+   while(1) {
+       rbuf = malloc(rbuflen);
+       if (rbuf == NULL)
+           return 0;
+       int retval = getgrnam_r(group, &gbuf, rbuf, 
+               rbuflen, &grent);
+       if ( retval == ERANGE )
+       {
+           free(rbuf);
+           rbuflen = rbuflen * 2;
+       } else if ( retval != 0 || grent == NULL )
+       {
+           goto done;
+       } else
+       {
+           break;
+       }
+   }
    if (getgrouplist(name, gid, NULL, &ng) < 0) {
        groups = (gid_t *) malloc(sizeof (gid_t) * ng);

With the patch seemingly working against the little snippet of code I had pulled out, I went on to integrate it into a libselinux rpm for further testing. After building libselinux-2.0.94-5.el6$WORK.1.x86_64, I put it on my isolated system, verified it worked, and then tested it on the network host which had first exhibited the problem. It seemed to work there as well.

Next up, part 8: Deploying the fix.