A mount namespace is a Linux kernel feature that provides isolation of the file system mount points seen by a group of processes. Each mount namespace has its own set of file system mount points, and changes to the mount points in one namespace do not affect other namespaces. This means that processes running in different mount namespaces can have different views of the file system hierarchy.
Mount namespaces are particularly useful in containerization, where each container should have its own file system and configuration, isolated from other containers and the host system.
How it works:
When a new mount namespace is created, it is initialized with a copy of the mount points from its parent namespace. This means that, at creation, the new namespace shares the same view of the file system as its parent. However, any subsequent changes to the mount points within the namespace will not affect the parent or other namespaces.
When a process modifies a mount point within its namespace, such as mounting or unmounting a file system, the change is local to that namespace and does not affect other namespaces. This allows each namespace to have its own independent file system hierarchy.
Processes can move between namespaces using the setns() system call, or create new namespaces using the unshare() or clone() system calls with the CLONE_NEWNS flag. When a process moves to a new namespace or creates one, it will start using the mount points associated with that namespace.
File descriptors and inodes are shared across namespaces, meaning that if a process in one namespace has an open file descriptor pointing to a file, it can pass that file descriptor to a process in another namespace, and both processes will access the same file. However, the file's path may not be the same in both namespaces due to differences in mount points.
Lab:
Create different Namespaces
CLI
sudounshare-m [--mount-proc] /bin/bash
By mounting a new instance of the /proc filesystem if you use the param --mount-proc, you ensure that the new mount namespace has an accurate and isolated view of the process information specific to that namespace.
Error: bash: fork: Cannot allocate memory
When unshare is executed without the -f option, an error is encountered due to the way Linux handles new PID (Process ID) namespaces. The key details and the solution are outlined below:
Problem Explanation:
The Linux kernel allows a process to create new namespaces using the unshare system call. However, the process that initiates the creation of a new PID namespace (referred to as the "unshare" process) does not enter the new namespace; only its child processes do.
Running %unshare -p /bin/bash% starts /bin/bash in the same process as unshare. Consequently, /bin/bash and its child processes are in the original PID namespace.
The first child process of /bin/bash in the new namespace becomes PID 1. When this process exits, it triggers the cleanup of the namespace if there are no other processes, as PID 1 has the special role of adopting orphan processes. The Linux kernel will then disable PID allocation in that namespace.
Consequence:
The exit of PID 1 in a new namespace leads to the cleaning of the PIDNS_HASH_ADDING flag. This results in the alloc_pid function failing to allocate a new PID when creating a new process, producing the "Cannot allocate memory" error.
Solution:
The issue can be resolved by using the -f option with unshare. This option makes unshare fork a new process after creating the new PID namespace.
Executing %unshare -fp /bin/bash% ensures that the unshare command itself becomes PID 1 in the new namespace. /bin/bash and its child processes are then safely contained within this new namespace, preventing the premature exit of PID 1 and allowing normal PID allocation.
By ensuring that unshare runs with the -f flag, the new PID namespace is correctly maintained, allowing /bin/bash and its sub-processes to operate without encountering the memory allocation error.
sudofind/proc-maxdepth3-typel-namemnt-execreadlink{} \; 2>/dev/null|sort-u# Find the processes with an specific namespacesudofind/proc-maxdepth3-typel-namemnt-execls-l{} \; 2>/dev/null|grep<ns-number>
findmnt
Enter inside a Mount namespace
nsenter-mTARGET_PID--pid/bin/bash
Also, you can only enter in another process namespace if you are root. And you cannotenter in other namespace without a descriptor pointing to it (like /proc/self/ns/mnt).
Because new mounts are only accessible within the namespace it's possible that a namespace contains sensitive information that can only be accessible from it.
Mount something
# Generate new mount nsunshare-m/bin/bashmkdir/tmp/mount_ns_examplemount-ttmpfstmpfs/tmp/mount_ns_examplemount|greptmpfs# "tmpfs on /tmp/mount_ns_example"echotest>/tmp/mount_ns_example/testls/tmp/mount_ns_example/test# Exists# From the hostmount|greptmpfs# Cannot see "tmpfs on /tmp/mount_ns_example"ls/tmp/mount_ns_example/test# Doesn't exist
# findmnt # List existing mounts
TARGET SOURCE FSTYPE OPTIONS
/ /dev/mapper/web05--vg-root
# unshare --mount # run a shell in a new mount namespace
# mount --bind /usr/bin/ /mnt/
# ls /mnt/cp
/mnt/cp
# exit # exit the shell, and hence the mount namespace
# ls /mnt/cp
ls: cannot access '/mnt/cp': No such file or directory
## Notice there's different files in /tmp
# ls /tmp
revshell.elf
# ls /mnt/tmp
krb5cc_75401103_X5yEyy
systemd-private-3d87c249e8a84451994ad692609cd4b6-apache2.service-77w9dT
systemd-private-3d87c249e8a84451994ad692609cd4b6-systemd-resolved.service-RnMUhT
systemd-private-3d87c249e8a84451994ad692609cd4b6-systemd-timesyncd.service-FAnDql
vmware-root_662-2689143848