Introduction to IPC on Linux
Inter-Process-Communication (or IPC for short) are mechanisms provided by the kernel to allow processes to communicate with each other. On modern systems, IPCs form the web that bind together each process within a large scale software architecture.
The Linux kernel provides the following IPC mechanisms:
- Anonymous Pipes
- Named Pipes or FIFOs
- SysV Message Queues
- POSIX Message Queues
- SysV Shared memory
- POSIX Shared memory
- SysV semaphores
- POSIX semaphores
- FUTEX locks
- File-backed and anonymous shared memory using mmap
- UNIX Domain Sockets
- Netlink Sockets
- Network Sockets
- Inotify mechanisms
- FUSE subsystem
- D-Bus subsystem
While the above list seems quite a lot, each IPC mechanism from the list describe above , is tailored to work better for a particular use-case scenario.
Signals are the simplest forms of IPC provided by Linux. Their primary use is to notify processes of change in states or events that occur within the kernel or other processes. We use signals in real world to convey messages with least overhead - think of hand and body gestures. For example, in a crowded gathering, we raise a hand to gain attention, wave hand at a friend to greet and so on.
On Linux, the kernel notifies a process when an event or state change occurs by interrupting the process's normal flow of execution and invoking one of the signal handler functinos registered by the process or by the invoking one of the default signal dispositions supplied by the kernel, for the said event.
Anonymous pipes (or simply pipes, for short) provide a mechanism for one process to stream data to another. A pipe has two ends associated with a pair of file descriptors - making it a one-to-one messaging or communication mechanism. One end of the pipe is the read-end which is associated with a file-descriptor that can only be read, and the other end is the write-end which is associated with a file descriptor that can only be written. This design means that pipes are essentially half-duplex.
Anonymous pipes can be setup and used only between processes that share parent-child relationship. Generally the parent process creates a pipe and then forks child processes. Each child process gets access to the pipe created by the parent process via the file descriptors that get duplicated into their address space. This allows the parent to communicate with its children, or the children to communicate with each other using the shared pipe.
Pipes are generally used to implement Producer-Consumer design amongst processes - where one or more processes would produce data and stream them on one end of the pipe, while other processes would consume the data stream from the other end of the pipe.
Named pipes or FIFO
Named pipes (or FIFO) are variants of pipe that allow communication between processes that are not related to each other. The processes communicate using named pipes by opening a special file known as a FIFO file. One process opens the FIFO file from writing while the other process opens the same file for reading. Thus any data written by the former process gets streamed through a pipe to the latter process. The FIFO file on disk acts as the contract between the two processes that wish to communicate.
Message Queues are synonymous to mailboxes. One process writes a message packet on the message queue and exits. Another process can access the message packet from the same message queue at a latter point in time. The advantage of message queues over pipes/FIFOs are that the sender (or writer) processes do not have to wait for the receiver (or reader) processes to connect. Think of communication using pipes as similar to two people communicating over phone, while message queues are similar to two people communicating using mail or other messaging services.
There are two standard specifications for message queues.
- SysV message queues.
The AT&T SysV message queues support message channeling. Each message packet sent by senders carry a message number. The receivers can either choose to receive message that match a particular message number, or receive all other messages excluding a particular message number or all messages.
- POSIX message queues.
The POSIX message queues support message priorities. Each message packet sent by the senders carry a priority number along with the message payload. The messages get ordered based on the priority number in the message queue. When the receiver tries to read a message at a later point in time, the messages with higher priority numbers get delivered first. POSIX message queues also support asynchronous message delivery using threads or signal based notification.
Linux support both of the above standards for message queues.
As the name implies, this IPC mechanism allows one process to share a region of memory in its address space with another. This allows two or more processes to communicate data more efficiently amongst themselves with minimal kernel intervention.
There are two standard specifications for Shared memory.
- SysV Shared memory. Many applications even today use this mechanism for historical reasons. It follows some of the artifacts of SysV IPC semantics.
- POSIX Shared memory. The POSIX specifications provide a more elegant approach towards implementing shared memory interface. On Linux, POSIX Shared memory is actually implemented by using files backed by RAM-based filesystem. I recommend using this mechanism over the SysV semantics due to a more elegant file based semantics.
Semaphores are locking and synchronization mechanism used most widely when processes share resources. Linux supports both SysV semaphores and POSIX semaphores. POSIX semaphores provide a more simpler and elegant implementation and thus is most widely used when compared to SysV semaphores on Linux.
Futexes are high-performance low-overhead locking mechanisms provided by the kernel. Direct use of futexes is highly discouraged in system programs. Futexes are used internally by POSIX threading API for condition variables and its mutex implementations.
UNIX Domain Sockets
UNIX Domain Sockets provide a mechanism for implementing applications that communicate using the Client-Server architecture. They support both stream and datagram oriented communication, are full-duplex and support a variety of options. They are very widely used for developing many large-scale frameworks.
Netlink sockets are similar to UNIX Domain Sockets in its API semantics - but used mainly for two purposes:
- For communication between a process in user-space to a thread in kernel-space
- For communication amongst processes in user-space using broadcast mode.
Based on the same API semantics like UNIX Domain Sockets, Network Sockets API provide mechanisms for communication between processes that run on different hosts on a network. Linux has rich support for features and various protocol stacks for using network sockets API. For all kinds of network programming and distributed programming - network socket APIs form the core interface.
The Inotify API on Linux provides a method for processes to know of any changes on a monitored file or a directory asynchronously. By adding a file to inotify watch-list, a process will be notified by the kernel on any changes to the file like open, read, write, changes to file stat, deleting a file and so on.
FUSE provides a method to implement a fully functional filesystem in user-space. Various operations on the mounted FUSE filesystem would trigger functions registered by the user-space filesystem handler process. This technique can also be used as an IPC mechanism to implement Client-Server architecture without using socket API semantics.
D-Bus is a high-level IPC mechanism built generally on top of socket API that provides a mechanism for multiple processes to communicate with each other using various messaging patterns. D-Bus is a standards specification for processes communicating with each other and very widely used today by GUI implementations on Linux following Freedesktop.org specifications.
Posted in Linux System Programming on Jun 30, 2020