capabilities: extract the max cap from the runtime system

The cap_valid() macro checks against CAP_LAST_CAP, a max value hardcoded
at build time from the kernel headers.  The runtime kernel might have a
different max, in which case we either try to drop caps the kernel does
not know about or leave caps it does know about in place.

For example, if you build against linux-3.8 headers but boot with a 3.4
kernel, the kernel headers know about caps up through 36 while the
runtime kernel only knows about caps up through 35.  When this minijail
code tries to drop cap 36 from the bounding set, it dies because the
kernel returns EINVAL.

Conversely, if you were to build against linux-3.4 headers but boot a 3.8
kernel, minijail would know to drop caps up through 35, but cap 36 would
remain in place.

Typically these scenarios don't happen, but as people develop/test things,
it's not unreasonable to try these out (think testing newer kernel headers
or booting kernel next).  As such, read the max value at runtime from
/proc/sys/kernel/cap_last_cap and use that instead.
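
As a rough sketch of the runtime lookup (a standalone illustration, not
the code in this change; the read_last_cap() and cap_in_range() names are
hypothetical, and the path is passed in only so the helper can be
exercised without /proc):

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of minijail: parse the highest capability
 * number from a file formatted like /proc/sys/kernel/cap_last_cap.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_last_cap(const char *path, unsigned int *last_cap)
{
	FILE *fp = fopen(path, "r");
	int ok;

	if (!fp)
		return -1;
	ok = fscanf(fp, "%u", last_cap) == 1;
	fclose(fp);
	return ok ? 0 : -1;
}

/* A cap number is usable only if it does not exceed the runtime max. */
static int cap_in_range(unsigned int cap, unsigned int last_cap)
{
	return cap <= last_cap;
}
```

A caller would pass "/proc/sys/kernel/cap_last_cap" as the path and stop
iterating over cap numbers once cap_in_range() returns 0.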

BUG=None
TEST=built against linux-3.8 headers and booted a linux-3.4 kernel;
	minijail no longer aborts (networking works), and some logging added
	to the kernel shows it running PR_CAPBSET_DROP for [0, 35] since the
	runtime kernel max is 35 (even though the compiled headers say 36).

Change-Id: Ie9aec101263402a3e147e85caf1e8bda78008aa3
Reviewed-on: https://gerrit.chromium.org/gerrit/50702
Reviewed-by: Kees Cook <keescook@chromium.org>
Commit-Queue: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
diff --git a/libminijail.c b/libminijail.c
index 103aa5d..8961025 100644
--- a/libminijail.c
+++ b/libminijail.c
@@ -597,6 +597,28 @@
 		pdie("setresuid");
 }
 
+/*
+ * We specifically do not use cap_valid() as that only tells us the last
+ * valid cap we were *compiled* against (i.e. what the version of kernel
+ * headers says).  If we run on a different kernel version, then it's not
+ * uncommon for that to be less (if an older kernel) or more (if a newer
+ * kernel).  So suck up the answer via /proc.
+ */
+static int run_cap_valid(unsigned int cap)
+{
+	static unsigned int last_cap;
+
+	if (!last_cap) {
+		const char cap_file[] = "/proc/sys/kernel/cap_last_cap";
+		FILE *fp = fopen(cap_file, "re");
+		if (!fp || fscanf(fp, "%u", &last_cap) != 1)
+			pdie("fopen/fscanf(%s)", cap_file);
+		fclose(fp);
+	}
+
+	return cap <= last_cap;
+}
+
 void drop_caps(const struct minijail *j)
 {
 	cap_t caps = cap_get_proc();
@@ -611,7 +633,7 @@
 		die("can't clear effective caps");
 	if (cap_clear_flag(caps, CAP_PERMITTED))
 		die("can't clear permitted caps");
-	for (i = 0; i < sizeof(j->caps) * 8 && cap_valid((int)i); ++i) {
+	for (i = 0; i < sizeof(j->caps) * 8 && run_cap_valid(i); ++i) {
 		/* Keep CAP_SETPCAP for dropping bounding set bits. */
 		if (i != CAP_SETPCAP && !(j->caps & (one << i)))
 			continue;
@@ -632,7 +654,7 @@
 	 * have been used above to raise a capability that wasn't already
 	 * present. This requires CAP_SETPCAP, so we raised/kept it above.
 	 */
-	for (i = 0; i < sizeof(j->caps) * 8 && cap_valid((int)i); ++i) {
+	for (i = 0; i < sizeof(j->caps) * 8 && run_cap_valid(i); ++i) {
 		if (j->caps & (one << i))
 			continue;
 		if (prctl(PR_CAPBSET_DROP, i))