The tale of RHBZ#748471: how it jumped from invisible, to painful production problem, to patched in the course of about 90 minutes.

RHBZ#748471 is private, so here is a brief summary:

When a semanage login record was set up using a group name and the number of elements in the group was too large, login programs failed to log in the user with the correct context.

What follows is the process used to trace down and correct the problem.

Due to length, this will be split into 8 parts.

  1. Background
  2. Problem Discovery
  3. Finding a reproducer
  4. Narrowing the scope
  5. Isolating the cause
  6. Searching for the hypothesized buffer which had been outgrown
  7. Correcting the problem
  8. Deploying the fix

Background

$WORK uses SELinux's group mapping functionality to tie system users to SELinux users and MCS categories. In other words, upon logging into a system, users are funnelled into SELinux users (e.g. 'staff_u' or 'user_u') and categories (e.g. 'ft-financial-accounting' or 'mind_control_research') based on the Unix groups of which they are members (e.g. 'sysadm' or 'ft-financial-accounting').

More information on confining users using this method is available in Chapter 6 of the Red Hat Security-Enhanced Linux User Guide.
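
The group mapping described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the code libselinux actually runs: it assumes login records in the seusers style of name:seuser:range, where a leading '%' marks a group record and '__default__' is the fallback. The function names, the sample records, and the category numbers are all hypothetical.

```python
def parse_records(text):
    """Parse seusers-style lines of the form name:seuser:range."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first two colons only; the range itself may contain ':'.
        name, seuser, mls_range = line.split(":", 2)
        records.append((name, seuser, mls_range))
    return records


def resolve_login(login, groups, records):
    """Return (seuser, range) for a login name and its set of Unix groups.

    Assumed precedence: exact user record, then the first matching
    group record, then the __default__ record.
    """
    group_match = None
    default = None
    for name, seuser, mls_range in records:
        if name == login:
            return seuser, mls_range
        if name.startswith("%"):
            if group_match is None and name[1:] in groups:
                group_match = (seuser, mls_range)
        elif name == "__default__":
            default = (seuser, mls_range)
    return group_match or default


# Hypothetical records; the category numbers are made up for illustration.
SAMPLE = """
__default__:user_u:s0
%sysadm:staff_u:s0-s0:c0.c1023
%ft-financial-accounting:user_u:s0-s0:c5
"""

records = parse_records(SAMPLE)
admin = resolve_login("carol", {"sysadm"}, records)
accountant = resolve_login("alice", {"ft-financial-accounting"}, records)
nobody = resolve_login("bob", set(), records)
```

Here a member of 'sysadm' lands in 'staff_u', a member of 'ft-financial-accounting' lands in 'user_u' with a single category, and a user in neither group falls through to the default record.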

These SELinux users and categories are used to protect systems, processes, data, and users from each other. Among many other configurations and restrictions, some of $WORK's servers use autofs to mount NFS directories with SELinux contexts.

To ensure a consistent and uniform set of groups with a single authoritative source, user and group information is stored in a cluster of LDAP servers and brought into the system by SSSD, which was still relatively young at the time and had recently had its share of painful, show-stopping bugs.

Next up, part 2: Problem Discovery.