# cat /dev/two

Tag - bugfix

2012-09-08

Bugfixing selinux login maps (p 6 of 8)

This is part 6 of an 8 part post covering the process used to trace down and correct a problem with semanage login record group matching. If you have not already read the previous parts, you may want to start at the beginning.

Searching for the hypothesized buffer which had been outgrown

The group mapping was getting evaluated and applied by the pam_selinux.so library in the pam stack. I turned on debugging for the module, but it produced no useful information. PAM is notoriously annoying to trace because there is no simple way to call it directly (without writing your own interface). As a quick and dirty solution, I attached strace in follow mode to mgetty running on a tty and then proceeded to log in. Other than identifying the libraries being loaded, not much useful information came from this.

open("/lib64/libselinux.so.1", O_RDONLY) = 3
open("/lib64/security/pam_selinux.so", O_RDONLY) = 4

Needing to know the function called by the pam_selinux module, but not wanting to go through the pain of trying to run a gdb trace of a pam stack or try and trace library execution, I just downloaded the pam source and looked at the code for pam_selinux.

The non-standard selinux headers provided good clues as to where I was headed after pam:

#include <selinux/selinux.h>
#include <selinux/get_context_list.h>
#include <selinux/flask.h>
#include <selinux/av_permissions.h>
#include <selinux/selinux.h>
#include <selinux/context.h>
#include <selinux/get_default_type.h>

However, I still needed the function calls to start tracing. Fortunately the pam source is pretty vanilla and it was relatively simple to read from the entry call to return. The bit I cared about was just where you expected it to be:

#ifdef HAVE_GETSEUSER
  if (pam_get_item(pamh, PAM_SERVICE, (void *) &void_service) != PAM_SUCCESS ||
                   void_service == NULL) {
    return PAM_SESSION_ERR;
  }
  service = void_service;

  if (getseuser(username, service, &seuser, &level) == 0) {
#else
  if (getseuserbyname(username, &seuser, &level) == 0) {
#endif

Side Note: I only ran across it a few days after this transpired, but the PAM tracing could have been sidestepped by using the utility 'selinuxdefcon' to look up the default context of a user.

$ selinuxdefcon user1 system_u:system_r:sshd_t:s0
user_u:user_r:user_t:s0-s0:c0

Since the libselinux package owns the library identified earlier (rpm -qf) as well as the header files that pam_selinux used, the libselinux source was the next stop. After grepping for the function (getseuser) pulled out of the pam source, I landed in seusers.c.

Fortunately seusers.c had only 5 functions:

process_seusers(const char *buffer, char **luserp, char **seuserp, char **levelp, int mls_enabled)
static gid_t get_default_gid(const char *name)
static int check_group(const char *group, const char *name, const gid_t gid)
int getseuserbyname(const char *name, char **r_seuser, char **r_level)
int getseuser(const char *username, const char *service, char **r_seuser, char **r_level)

I knew pam was calling 'getseuser', but getting all the way down here was still a pretty big leap in the number of things that could have been the root of the problem. Rather than potentially waste time chasing a dead end, I wrote up a quick test to double check I was on the right track:

#include <stdio.h>
#include <stdlib.h>

#include <selinux/selinux.h>

int main(void) {
  const char *username = "user1";
  char *seuser=NULL;
  char *level=NULL;
  const char *service = "sshd";

  if (getseuser(username, service, &seuser, &level) == 0) {
          printf("Username= %s SELinux User = %s Level= %s\n",
                             username, seuser, level);
          free(seuser);
          free(level);
  }

}

After compiling and executing, this test was able to confirm that the problem was somewhere inside the 'getseuser' call and not buried in pam, nss, or any of its friends. That left a blissfully small search space.

$ gcc test-small.c -lselinux

When 10 users (including user1) are in 'largegroup':

$ ./a.out
Username= user1 SELinux User = user_u Level= s0-s0:c1.c2

When 70 users (including user1) are in 'largegroup':

$ ./a.out
Username= user1 SELinux User = user_u Level= s0-s0:c0

The entry I was looking for was 'getseuser'. It basically did some prep and then called 'getseuserbyname' which was responsible for the bulk of the work, including calling 'check_group':

if (username[0] == '%' &&
    !groupseuser &&
    check_group(&username[1], name, gid)) {
        groupseuser = seuser;
        grouplevel = level;
} else {

Since 'getseuserbyname' was not really doing group processing, but rather letting 'check_group' handle just about everything, I focused on the latter.
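The way a failed group check turns into the wrong level can be sketched in isolation. This is a minimal sketch, not the libselinux code: it assumes the precedence is an exact user record first, then the first matching '%group' record, then '__default__' (only the group branch appears in the excerpt above). The names `login_map`, `pick_login_map`, `member_ok`, `member_broken` and `demo_maps` are made up for illustration; the table values are the ones from the semanage commands used in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One login-map record, e.g. { "%largegroup", "user_u", "s0-s0:c1.c2" }. */
struct login_map {
    const char *name;    /* user name, "%group", or "__default__" */
    const char *seuser;
    const char *level;
};

/* Hypothetical membership test standing in for check_group(). */
typedef int (*in_group_fn)(const char *group, const char *user);

/* Pick the record for `user`: exact name first, then the first matching
 * %group, then __default__ (assumed precedence). NULL if nothing applies. */
const struct login_map *pick_login_map(const struct login_map *maps, size_t n,
                                       const char *user, in_group_fn in_group)
{
    const struct login_map *group_hit = NULL, *fallback = NULL;

    assert(maps != NULL && user != NULL && in_group != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *name = maps[i].name;

        if (strcmp(name, user) == 0)
            return &maps[i];              /* exact user record wins */
        if (name[0] == '%') {
            if (!group_hit && in_group(name + 1, user))
                group_hit = &maps[i];     /* remember first group match */
        } else if (strcmp(name, "__default__") == 0 && !fallback) {
            fallback = &maps[i];
        }
    }
    return group_hit ? group_hit : fallback;
}

/* Stand-in for a membership lookup that works. */
int member_ok(const char *group, const char *user)
{
    return strcmp(group, "largegroup") == 0 && strcmp(user, "user1") == 0;
}

/* Stand-in for a lookup that silently fails and reports "no match". */
int member_broken(const char *group, const char *user)
{
    (void)group;
    (void)user;
    return 0;
}

/* Records mirroring the semanage login setup from the test system. */
const struct login_map demo_maps[] = {
    { "adminuser",   "staff_u", "s0-s0:c0.c1023" },
    { "__default__", "user_u",  "s0-s0:c0" },
    { "%largegroup", "user_u",  "s0-s0:c1.c2" },
};
```

Calling `pick_login_map(demo_maps, 3, "user1", member_ok)` selects the '%largegroup' record, while the same call with `member_broken` falls through to '__default__'. That is the same s0-s0:c1.c2 versus s0-s0:c0 split seen in the test output, which is why a silent failure inside 'check_group' was the natural suspect.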

I did not want to worry about compiling the entire libselinux stack for testing, so I grabbed the 'check_group' function and modified it just enough to get it to compile and run standalone:

#include <sys/types.h>
#include <grp.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <stdio_ext.h>
#include <ctype.h>
#include <errno.h>

// int check_group(const char *group, const char *name, const gid_t gid) {
int main (void) {
    int match = 0;
    int i, ng = 0;
    gid_t *groups = NULL;
    struct group gbuf, *grent = NULL;
    const char *group = "largegroup";
    const char *name = "user1";
    const gid_t gid = 502;

    long rbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
    if (rbuflen <= 0)
            return 0;
    char *rbuf = malloc(rbuflen);
    if (rbuf == NULL)
            return 0;
    
    if (getgrnam_r(group, &gbuf, rbuf, rbuflen,
                   &grent) != 0)
            goto done;

    if (getgrouplist(name, gid, NULL, &ng) < 0) {
            groups = (gid_t *) malloc(sizeof (gid_t) * ng);
            if (!groups) goto done;
            if (getgrouplist(name, gid, groups, &ng) < 0) goto done;
    }

    for (i = 0; i < ng; i++) {
            if (grent->gr_gid == groups[i]) {
                    match = 1;
                    goto done;
            }
    }

 done:
    free(groups);
    free(rbuf);
    printf("Returning %d\n", match);
    return match;
}

With this single function standalone I was still able to reliably reproduce the max members in group issue, so I continued on.

Reading through the code, this pair of statements stood out:

long rbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
if (getgrnam_r(group, &gbuf, rbuf, rbuflen, &grent) != 0)
        goto done;

How could a system constant possibly know the maximum size of a group entry, when that size depends on how many members the group has?

The man page for getgrnam_r lists ERANGE ("Insufficient buffer space supplied") as a possible error, which effectively confirmed I was looking at the bug, since the existing code neither checked for nor handled that situation.

I tested the errno and return values using the stub code from before, and confirmed that getgrnam_r was in fact returning ERANGE on large groups and that this was what caused the group match to fail.
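A check of that kind can be sketched as a small helper that hands back the raw getgrnam_r return value for a caller-chosen buffer size. This is a minimal sketch rather than the stub actually used; the name `group_lookup_rc` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* make sure getgrnam_r() is declared */

#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Look up `group` using a scratch buffer of `buflen` bytes and return what
 * getgrnam_r() returned: 0 on success (also when the group simply does not
 * exist), otherwise an error number such as ERANGE when the buffer is too
 * small to hold the entry. */
int group_lookup_rc(const char *group, size_t buflen)
{
    struct group gbuf, *grent = NULL;
    char *buf;
    int rc;

    assert(group != NULL && buflen > 0);
    buf = malloc(buflen);
    if (buf == NULL)
        return ENOMEM;

    rc = getgrnam_r(group, &gbuf, buf, buflen, &grent);
    free(buf);   /* gbuf points into buf, so it is not used past this point */
    return rc;
}
```

Comparing the result against ERANGE for the size reported by sysconf(_SC_GETGR_R_SIZE_MAX) and again for a much larger size shows whether the buffer is the limiting factor. A deliberately tiny buffer, such as `group_lookup_rc("root", 1)`, is a quick way to see the ERANGE path on a system without a large group.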

Next up, part 7: Correcting the problem.

2012-09-05

Bugfixing selinux login maps (p 5 of 8)

This is part 5 of an 8 part post covering the process used to trace down and correct a problem with semanage login record group matching. If you have not already read the previous parts, you may want to start at the beginning.

Isolating the cause

Having successfully determined that the problem lay somewhere in the number of users in the group, I started considering where a bug of that nature might have been introduced.

The 67/68 boundary did not fall on any standard C mistake areas (multiples of 32, unsigned int overflows, ...), so I was a little suspicious of the hard boundary, thinking it was more along the lines of buffer space. Anyway, I wanted to replicate the problem in a more isolated environment to eliminate variables (ldap, sssd, ...) and provide a safe place for more invasive testing. On a clean standalone system I did:

# create 70 users to work with
$ for i in $(seq 01 70) ; do adduser user$i ;done

# and group to put them in
$ groupadd largegroup

# add all of the users to the new group
$ for i in $(seq 01 70) ; do usermod -G largegroup user$i ;done

# set a password for one of the users (so we can test with login)
$ passwd user1

# count the number of users in the group (add 1, just quickly counting commas)
$ getent group largegroup | grep --only-matching ,  | wc -l
69

# setup basic login policy
$ semanage login -a -s staff_u -r s0-s0:c0.c1023 'adminuser'
$ semanage login -m -s user_u -r s0-s0:c0 __default__
$ semanage login -a -s user_u -r s0-s0:c1.c2 '%largegroup'
$ service sshd start
Starting sshd:                                             [  OK  ]

# connect in as 'user1' who is a member of 'largegroup' and should be s0-s0:c1.c2
$ ssh -q -x user1@localhost 'id -a'
user1@localhost's password: 
uid=501(user1) gid=502(user1) groups=502(user1),572(largegroup) context=user_u:user_r:user_t:s0-s0:c0

# did not work..., remove all users from 'largegroup'
$ for i in $(seq 01 70) ; do usermod -G user$i user$i ;done
$ getent group largegroup | grep --only-matching ,  | wc -l
0

# add only 10 users back in
$ for i in $(seq 01 10) ; do usermod -G largegroup user$i ;done
$ getent group largegroup | grep --only-matching ,  | wc -l
9

# try again, now with only 10 members in the group, works correctly
$ ssh -q -x user1@localhost 'id -a'
user1@localhost's password: 
uid=501(user1) gid=502(user1) groups=502(user1),572(largegroup) context=user_u:user_r:user_t:s0-s0:c1,c2

# put 65 members in 'largegroup'
$ for i in $(seq 01 65) ; do usermod -G largegroup user$i ;done
$ getent group largegroup | grep --only-matching ,  | wc -l
64

# still working
$ ssh -q -x user1@localhost 'id -a'
user1@localhost's password: 
uid=501(user1) gid=502(user1) groups=502(user1),572(largegroup) context=user_u:user_r:user_t:s0-s0:c1,c2

# add a 66th member
$ usermod -G largegroup user66
$ getent group largegroup | grep --only-matching ,  | wc -l
65

# still working
$ ssh -q -x user1@localhost 'id -a'
user1@localhost's password: 
uid=501(user1) gid=502(user1) groups=502(user1),572(largegroup) context=user_u:user_r:user_t:s0-s0:c1,c2

# add a 67th member
$ usermod -G largegroup user67
$ getent group largegroup | grep --only-matching ,  | wc -l
66

# breaks, despite being a member of 'largegroup', user1 is no longer coming in as s0-s0:c1,c2
$ ssh -q -x user1@localhost 'id -a'
user1@localhost's password: 
uid=501(user1) gid=502(user1) groups=502(user1),572(largegroup) context=user_u:user_r:user_t:s0-s0:c0

# remove a user from 'largegroup', get it down to 66 members
$ usermod -G user67 user67
$ getent group largegroup | grep --only-matching ,  | wc -l
65

# working again....
$ ssh -q -x user1@localhost 'id -a'
user1@localhost's password: 
uid=501(user1) gid=502(user1) groups=502(user1),572(largegroup) context=user_u:user_r:user_t:s0-s0:c1,c2

As suspected, this broke at a different number of users (66/67 here, versus 67/68 at $WORK), meaning it was not a set limit, but probably a buffer size somewhere. To confirm this, I ran through the steps two more times, once with really long usernames and once with really short ones. The long usernames broke at 34/35 and the short at 83/84. Clearly it was dependent on the length of the usernames in a group, meaning it was almost certainly a buffer space issue.
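Why the breaking point moves with username length can be shown with a rough size model. This is a sketch under a stated assumption, not a description of any particular libc: it assumes a group entry must fit into one fixed buffer that holds the "name:x:gid:" text, every member name plus a one-byte separator, and one pointer per member plus a terminating NULL. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Rough lower bound on the bytes one group entry needs in a single buffer.
 * Assumed cost model: header text + each member name with a one-byte
 * separator + a pointer per member and a NULL terminator. Real layouts
 * add alignment and bookkeeping on top of this. */
size_t estimate_group_bytes(size_t header_len, size_t member_len,
                            size_t members)
{
    size_t text = header_len + members * (member_len + 1);
    size_t ptrs = (members + 1) * sizeof(char *);
    return text + ptrs;
}

/* Largest member count whose estimate still fits in `buflen` bytes. */
size_t max_members_that_fit(size_t buflen, size_t header_len,
                            size_t member_len)
{
    size_t n = 0;

    assert(buflen > 0);
    while (estimate_group_bytes(header_len, member_len, n + 1) <= buflen)
        n++;
    return n;
}
```

With an illustrative 1024-byte buffer and a 17-byte header (the length of "largegroup:x:572:"), `max_members_that_fit(1024, 17, 3)` comes out larger than `max_members_that_fit(1024, 17, 20)`. Under this model short names leave room for more members and long names for fewer, which is the pattern the three test runs showed.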

Next up, part 6: Searching for the hypothesized buffer which had been outgrown.

2012-09-03

Bugfixing selinux login maps (p 4 of 8)

This is part 4 of an 8 part post covering the process used to trace down and correct a problem with semanage login record group matching. If you have not already read the previous parts, you may want to start at the beginning.

Narrowing the scope

Armed with a test user experiencing the problem, I started trying to identify why this one group was not working.

I should mention at this point that the real group at $WORK (the one this post is aliasing as 'ft-financial-accounting') was the first group used for mapping to have a name longer than 10 characters or to contain multiple hyphens.

So, I started looking at the differences in group name length and characters, thinking there may be a parsing error somewhere in the stack. In my first attempt at isolating this as the issue, I created 'fin' (without dashes) and just put my test user in that. Since that worked, I then tried 'ftfinancialaccounting' (which worked) and 'ft-financial-accounting-0' (which also worked). This effectively eliminated a group name parsing issue. The only appreciable difference remaining was something based on the uids or number of uids inside the ft-financial-accounting group.

Since all of the uids were standard $WORK accounts already seemingly functioning everywhere, the number of uids inside a group seemed a more likely culprit. I stripped 'ft-financial-accounting' down to just my test user and everything worked. Adding all of the members back in returned it to broken, but when I cut the number of uids in half, it returned to working.

Still, either it was an issue with the number of uids, or I happened to get lucky and remove the half of the uids which contained a problem entry. A quick swap of the active half user set quickly indicated it was a size issue and not just a bad uid. There still could have been a uid interaction problem with some uid from the first set and some from the second set, but that seemed far less likely.

I began a quick binary search trying to isolate the hypothesized breaking point for the number of members. Upon finding out that when 'ft-financial-accounting' had 67 members it worked, and when it had 68 members the mapping failed, I confirmed it was not a uid conflict issue by exchanging some "working" and "broken" uids. The source of the problem had been successfully narrowed to the number of members in the group.
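The search itself is an ordinary bisection over the member count. A minimal sketch, assuming there is a single boundary between "works" and "broken"; `mapping_works` is a hypothetical stand-in for the manual step of resizing the group, logging in and checking the context, hard-coded here to the 67/68 boundary found above.

```c
#include <assert.h>

/* Hypothetical stand-in for "set the group to n members, log in, check the
 * context". It simply pretends the mapping breaks above 67 members. */
int mapping_works(int members)
{
    return members <= 67;
}

/* Return the largest member count in [lo, hi] for which works() is true.
 * Assumes works(lo) is true, works(hi) is false, and that there is exactly
 * one boundary in between. */
int last_working_count(int lo, int hi, int (*works)(int))
{
    assert(lo < hi && works != 0);
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;

        if (works(mid))
            lo = mid;   /* boundary is above mid */
        else
            hi = mid;   /* boundary is at or below mid */
    }
    return lo;
}
```

Starting from a known-good count of 1 and a known-bad count of 200, `last_working_count(1, 200, mapping_works)` needs eight probes to settle on 67, which is why bisecting by hand was practical even with a login test at every step.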

Next up, part 5: Isolating the cause.
