This is part 6 of an 8-part post covering the process used to track down and correct a problem with semanage login record group matching. If you have not already read the previous parts, you may want to start at the beginning.

Searching for the hypothesized buffer which had been outgrown

The group mapping was being evaluated and applied by the pam_selinux.so module in the PAM stack. I turned on debugging for the module, but it produced no useful information. PAM is notoriously annoying to trace because there is no simple way to call it directly (short of writing your own interface). As a quick and dirty workaround, I attached strace in follow-fork mode (strace -f) to mgetty running on a tty and then logged in. Other than identifying the libraries being loaded, not much useful information came from this:

open("/lib64/libselinux.so.1", O_RDONLY) = 3
open("/lib64/security/pam_selinux.so", O_RDONLY) = 4

I needed to know which function the pam_selinux module called, but I did not want to go through the pain of running a gdb trace of a PAM stack or tracing library execution, so I simply downloaded the PAM source and read the code for pam_selinux.

The non-standard SELinux headers it included provided good clues about where I was headed after PAM:

#include <selinux/selinux.h>
#include <selinux/get_context_list.h>
#include <selinux/flask.h>
#include <selinux/av_permissions.h>
#include <selinux/selinux.h>
#include <selinux/context.h>
#include <selinux/get_default_type.h>

However, I still needed the actual function calls to start tracing. Fortunately, the PAM source is pretty vanilla, and it was relatively simple to read from the entry call through to the return. The bit I cared about was right where you would expect it:

#ifdef HAVE_GETSEUSER
  if (pam_get_item(pamh, PAM_SERVICE, (void *) &void_service) != PAM_SUCCESS ||
                   void_service == NULL) {
    return PAM_SESSION_ERR;
  }
  service = void_service;

  if (getseuser(username, service, &seuser, &level) == 0) {
#else
  if (getseuserbyname(username, &seuser, &level) == 0) {
#endif

Side note: I only ran across it a few days after this transpired, but the PAM tracing could have been sidestepped entirely by using the 'selinuxdefcon' utility to look up the default context of a user:

$ selinuxdefcon user1 system_u:system_r:sshd_t:s0
user_u:user_r:user_t:s0-s0:c0

Since the libselinux package owns the library identified earlier (via rpm -qf) as well as the header files that pam_selinux used, it was off to the libselinux source. After grepping for the function pulled out of the PAM source (getseuser), I landed in seusers.c.

Fortunately, seusers.c had only 5 functions:

process_seusers(const char *buffer, char **luserp, char **seuserp, char **levelp, int mls_enabled)
static gid_t get_default_gid(const char *name)
static int check_group(const char *group, const char *name, const gid_t gid)
int getseuserbyname(const char *name, char **r_seuser, char **r_level)
int getseuser(const char *username, const char *service, char **r_seuser, char **r_level)

I knew PAM was calling 'getseuser', but jumping all the way down here still skipped past a lot of things that could have been the root of the problem. Rather than risk wasting time chasing a dead end, I wrote a quick test to double-check I was on the right track:

#include <stdio.h>
#include <stdlib.h>

#include <selinux/selinux.h>

int main(void) {
  const char *username = "user1";
  const char *service = "sshd";
  char *seuser = NULL;
  char *level = NULL;

  /* The same call pam_selinux makes when built with HAVE_GETSEUSER */
  if (getseuser(username, service, &seuser, &level) != 0) {
          fprintf(stderr, "getseuser failed for %s\n", username);
          return 1;
  }

  printf("Username= %s SELinux User = %s Level= %s\n",
         username, seuser, level);
  free(seuser);
  free(level);
  return 0;
}

After compiling and running it, this test confirmed that the problem was somewhere inside the 'getseuser' call and not buried in PAM, NSS, or any of their friends. That left a blissfully small search space.

$ gcc test-small.c -lselinux

When 10 users (including user1) are in 'largegroup':

$ ./a.out
Username= user1 SELinux User = user_u Level= s0-s0:c1.c2

and when 70 users (including user1) are in 'largegroup':

$ ./a.out
Username= user1 SELinux User = user_u Level= s0-s0:c0

The entry point I was looking for was 'getseuser'. It basically did some prep work and then called 'getseuserbyname', which was responsible for the bulk of the work, including calling 'check_group':

if (username[0] == '%' &&
    !groupseuser &&
    check_group(&username[1], name, gid)) {
        groupseuser = seuser;
        grouplevel = level;
} else {

Since 'getseuserbyname' was not really doing group processing, but rather letting 'check_group' handle just about everything, I focused on the latter.

I did not want to worry about compiling the entire libselinux stack for testing, so I grabbed the 'check_group' function and modified it just enough to get it to compile and run standalone:

#include <sys/types.h>
#include <grp.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <stdio_ext.h>
#include <ctype.h>
#include <errno.h>

// Originally: int check_group(const char *group, const char *name, const gid_t gid) {
// The arguments are hard-coded below so the function runs as a program.
int main (void) {
    int match = 0;
    int i, ng = 0;
    gid_t *groups = NULL;
    struct group gbuf, *grent = NULL;
    const char *group = "largegroup";
    const char *name = "user1";
    const gid_t gid = 502;

    long rbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
    if (rbuflen <= 0)
            return 0;
    char *rbuf = malloc(rbuflen);
    if (rbuf == NULL)
            return 0;

    /* Look up the group named in the seusers entry */
    if (getgrnam_r(group, &gbuf, rbuf, rbuflen,
                   &grent) != 0)
            goto done;
    if (grent == NULL)      /* no such group; avoid dereferencing NULL below */
            goto done;

    /* The first call only reports how many groups the user is in (ng);
     * the second fills the newly allocated array. */
    if (getgrouplist(name, gid, NULL, &ng) < 0) {
            groups = (gid_t *) malloc(sizeof (gid_t) * ng);
            if (!groups) goto done;
            if (getgrouplist(name, gid, groups, &ng) < 0) goto done;
    }

    /* Match if any of the user's groups is the requested group */
    for (i = 0; i < ng; i++) {
            if (grent->gr_gid == groups[i]) {
                    match = 1;
                    goto done;
            }
    }

 done:
    free(groups);
    free(rbuf);
    printf("Returning %d\n", match);
    return match;
}

With this single function running standalone, I could still reliably reproduce the large-group membership issue, so I continued on.

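Reading through the code, this pair of lines stood out:

long rbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
if (getgrnam_r(group, &gbuf, rbuf, rbuflen, &grent) != 0)
        goto done;

How could a system constant possibly know the maximum size of a group entry, when that depends on how many members the group has? POSIX only describes this value as a suggested initial size for the buffer.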

The man page for getgrnam_r lists ERANGE ("Insufficient buffer space supplied.") as a possible error. That effectively confirmed I was looking at the bug, since the existing code never checked for that case and simply treated any non-zero return, ERANGE included, as "no match".

I tested the return value (and errno) using the stub code from before, and confirmed that getgrnam_r was in fact returning ERANGE for the large group, which is what produced the failed match.

Next up, part 7: Correcting the problem.