Any modern operating system must provide some way to arbitrate access to physical system resources such as memory or I/O peripherals. In the Mach microkernel, the basic unit of resource allocation is called a Task. As in every modern OS, each task is isolated from the other tasks in the system: tasks have their own virtual address space, and they can't scribble into each other's memory. When a task needs to access an external resource (to communicate with another task or with some I/O peripheral), it must use a special mechanism known as a system call, so that the access happens under the control of the kernel.
Because tasks are the basic unit of resource allocation, the kernel must maintain a lot of state information about them, which makes them expensive to create and destroy. For this reason, the designers of the Mach microkernel decided to introduce a finer-grained, kernel-supported concept: the Thread. The thread is the basic unit of CPU utilization in the Mach microkernel. The kernel has to maintain very little state information about a thread: mainly, it has to save the CPU context when the thread is interrupted. Because this state is so small, threads can be created and destroyed very efficiently. Because every thread needs some system resources such as memory, threads always run within a task. Multiple threads can execute inside the same task, even simultaneously if the system has several processors.
To summarize, the task is the basic unit of resource allocation: it provides the memory and system context. Threads execute inside this context: they are the basic unit of processor utilization and carry very little system overhead.
As we've seen, Mach tasks are isolated so that they can't be harmed by each other's mistakes or use each other's allocated system resources. Still, they couldn't do anything useful if they had no way to communicate with the outside world. The traditional UNIX solution is for the kernel to provide a set of system calls that implement the most important UNIX functions and semantics. Other functions are provided by the C library, but they ultimately rely on a kernel system call. For example, most UNIX kernels have a system call that implements the execve function, and the other exec* functions are then implemented in the C library on top of it.
This scheme gave early UNIX systems a simple and efficient interface, but the complexity of these systems has greatly increased since then, and one now wants to move as much functionality as possible out of the kernel. The traditional UNIX way of implementing system calls then becomes quite backwards: we don't want the microkernel to care about the functionality that was moved out, so it would be ugly if it had to handle all these system calls and forward them to external services. As an example, the Linux kernel version 2.0.33 exports 166 system calls, which is clearly too many for a microkernel approach. Instead, we want the kernel to provide a simple and extensible communication mechanism.
In the Mach system, a service is represented by a Port. When you want to use this service, you send your request in a Message to the corresponding port. Once your request has been honoured, the server sends you back an acknowledgement message.
Mach ports provide an easy way to handle access rights: you
can't read or write arbitrary data to arbitrary ports. You need to
have send or receive rights on a port to access it. Any thread can
create a port (for example by using a system call named
mach_reply_port). The Mach microkernel will then create
this port in the kernel state structures, store a piece of information
known as a Port right in the task's state structures to record that
this task has the right to read messages from this port, and
return a value of the mach_port_t type to the calling
thread. This value is known as the name of the port right, and
it is used each time we need to refer to this port in the task
that owns the calling thread. Ports, port rights and port names mustn't
be confused: ports are protected entities that can only be addressed
by the Mach microkernel, port rights are attached to a given task and
describe the operations that this task can perform on a port, and port
names are the identifiers that tasks must use to request operations on
these ports. This is comparable to the difference between files, file
access rights and file descriptors in a traditional UNIX system.
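The distinction between ports, port rights and port names can be sketched as a small per-task table. This is only an illustration: every `demo_` identifier below is hypothetical, and the real kernel data structures are certainly more involved. The point is that the port itself lives on the kernel side, each task's table records which rights it holds, and the name is nothing more than an index that only makes sense inside that task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real Mach declarations. */
enum { DEMO_RIGHT_SEND = 1, DEMO_RIGHT_RECEIVE = 2 };
#define DEMO_MAX_NAMES 8

struct demo_port { int id; };               /* the kernel-only object */

struct demo_right {                         /* one slot in a task's table */
    struct demo_port *port;                 /* NULL means the name is unused */
    unsigned rights;                        /* bitmask of DEMO_RIGHT_* */
};

struct demo_task {
    struct demo_right names[DEMO_MAX_NAMES];    /* index = port name */
};

/* Record a right in the task and return the name (a small integer) that
   the task will use from now on, or -1 if the table is full. */
int demo_insert_right(struct demo_task *t, struct demo_port *p, unsigned rights)
{
    assert(t != NULL && p != NULL);
    for (int name = 0; name < DEMO_MAX_NAMES; name++) {
        if (t->names[name].port == NULL) {
            t->names[name].port = p;
            t->names[name].rights = rights;
            return name;
        }
    }
    return -1;
}

/* Resolve a name to its port, but only if the task holds the wanted right. */
struct demo_port *demo_lookup(const struct demo_task *t, int name, unsigned wanted)
{
    if (name < 0 || name >= DEMO_MAX_NAMES)
        return NULL;
    const struct demo_right *r = &t->names[name];
    if (r->port == NULL || (r->rights & wanted) != wanted)
        return NULL;
    return r->port;
}
```

In this sketch two tasks can refer to the same port under different names, just as two UNIX processes can have the same file open under different file descriptor numbers.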
Once a task has obtained a send right to a given port, and noted its
name in an internal variable, each thread in this task can send some
messages to the port using the mach_msg_trap system call.
Messages are not just raw binary data: they have a fixed format,
and they can contain typed data. First there is a header of type
mach_msg_header_t. This header contains the total size of
the message, the name of the port where we want to send the message,
the name of a reply port if we expect a reply, and a message ID field
that is used by the receiver. There is also a sequence number field
in the header, which is filled in by the kernel before it delivers
the message. After this header comes any number of typed data
elements, up to the size of the message as specified in the header.
We say that the data elements are typed because each group of them
must be preceded by a header of type mach_msg_type_t or
mach_msg_type_long_t. This header specifies the type of
the data elements, the size of each data element in bits, the total
number of data elements, and the way these data elements are
transmitted. In short, a message consists of a mach_msg_header_t
header followed by any number of mach_msg_type_t and data
pairs.
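The layout just described (one header, then descriptor and data pairs) can be sketched in a few lines of C. The structures below are simplified stand-ins invented for this example: their names, field widths and ordering are assumptions, and they do not reproduce the real mach_msg_header_t or mach_msg_type_t definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; field names and widths are assumptions. */
struct demo_msg_header {
    uint32_t size;        /* total message size in bytes */
    uint32_t dest_port;   /* name of the destination port */
    uint32_t reply_port;  /* name of the reply port, 0 if none */
    uint32_t seqno;       /* filled in by the kernel on delivery */
    uint32_t id;          /* meaning chosen by the receiver */
};

struct demo_msg_type {
    uint32_t type_name;   /* what kind of data follows */
    uint32_t bits;        /* size of one element in bits */
    uint32_t count;       /* number of elements */
    uint32_t is_inline;   /* 1: data follows, 0: a pointer follows */
};

enum { DEMO_TYPE_INT32 = 1 };

/* Start an empty message in buf. Returns 0, or -1 if buf is too small. */
int demo_msg_init(uint8_t *buf, size_t cap,
                  uint32_t dest, uint32_t reply, uint32_t id)
{
    struct demo_msg_header h = { (uint32_t)sizeof h, dest, reply, 0, id };
    assert(buf != NULL);
    if (cap < sizeof h)
        return -1;
    memcpy(buf, &h, sizeof h);
    return 0;
}

/* Append one descriptor followed by n inline 32-bit integers, and update
   the size stored in the header. Returns 0, or -1 if it does not fit. */
int demo_msg_append_int32(uint8_t *buf, size_t cap,
                          const int32_t *v, uint32_t n)
{
    struct demo_msg_header h;
    struct demo_msg_type t = { DEMO_TYPE_INT32, 32, n, 1 };
    size_t payload = (size_t)n * sizeof *v;

    memcpy(&h, buf, sizeof h);
    if (h.size + sizeof t + payload > cap)
        return -1;
    memcpy(buf + h.size, &t, sizeof t);
    memcpy(buf + h.size + sizeof t, v, payload);
    h.size += (uint32_t)(sizeof t + payload);
    memcpy(buf, &h, sizeof h);
    return 0;
}
```

Calling `demo_msg_append_int32` several times produces exactly the "header followed by any number of type and data pairs" shape, with the header's size field always covering the whole message.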
Each data chunk following a mach_msg_type_t header can
be transmitted either inline or out-of-line. Inline
means that the data is transmitted within the message: it immediately
follows the mach_msg_type_t header. Out-of-line means
that the message only contains a pointer to the actual data in the
sending task's address space. The data is still carried along with the
message: when some task receives the message, memory is allocated in
the receiver's address space, the data is copied into this memory, and
the pointer is modified so that it points to the data in the receiver's
address space. Optionally, the sender can also ask that the memory be
deallocated from its own address space once the message is sent.
Out-of-line transmission is usually used for big data chunks:
it is implemented in the Mach microkernel using a memory-management
trick known as copy-on-write, which makes the messaging operation
very efficient.
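The visible effect of an out-of-line transfer can be modelled in user space as follows. This is a deliberately naive sketch with hypothetical names: it copies the data eagerly with `memcpy`, whereas the text above says the kernel uses copy-on-write, and it treats `malloc` memory as a stand-in for pages of an address space.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical out-of-line item: a pointer plus a length. */
struct demo_ool_item {
    void  *addr;   /* where the data lives in the current address space */
    size_t len;    /* number of bytes */
};

/* Model of delivery: the receiver ends up with its own copy at a new
   address, and the pointer in the message is rewritten to match.
   If dealloc_sender is nonzero, the sender's copy is released.
   Returns 0 on success, -1 if no memory is available. */
int demo_deliver_ool(struct demo_ool_item *item, int dealloc_sender)
{
    assert(item != NULL && item->addr != NULL && item->len > 0);
    void *copy = malloc(item->len);
    if (copy == NULL)
        return -1;
    memcpy(copy, item->addr, item->len);
    if (dealloc_sender)
        free(item->addr);
    item->addr = copy;   /* now valid on the receiving side */
    return 0;
}
```

After delivery, later writes by the sender to its own buffer do not affect what the receiver sees, which is the guarantee that copy-on-write preserves without paying for the copy up front.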
Among the data types that can be transmitted in a message, the ones that carry port rights deserve special attention. With these types, the transmitted data must be a valid port name in the sending task. The Mach kernel processes the message and creates the specified port right in the receiving task. Using this mechanism, a task can give any of its port rights to the receiver of a message and lose that right itself. It can also copy a send right, or create a send right from a receive right. Note that a receive right can never be duplicated, which ensures that only one task in the system is able to read from a given port.
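The transfer rules in the paragraph above can be summarized in a small model. The mode names and the `demo_entry` structure are invented for this sketch and are not the kernel's own identifiers; the only thing the code tries to capture is which right the sender needs, whether it keeps that right, and what the receiver gains.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_SEND = 1, DEMO_RECEIVE = 2 };

/* One task's rights on one port (hypothetical representation). */
struct demo_entry { int port_id; unsigned rights; };

enum demo_mode {
    DEMO_MOVE_RECEIVE,   /* give away the receive right */
    DEMO_MOVE_SEND,      /* give away a send right */
    DEMO_COPY_SEND,      /* keep the send right and hand out a copy */
    DEMO_MAKE_SEND       /* create a send right from a receive right */
};

/* Apply one transfer. Returns 0 on success, or -1 if the sender does not
   hold the right that the chosen mode requires. */
int demo_transfer(struct demo_entry *from, struct demo_entry *to,
                  enum demo_mode mode)
{
    unsigned needed = (mode == DEMO_MOVE_RECEIVE || mode == DEMO_MAKE_SEND)
                          ? DEMO_RECEIVE : DEMO_SEND;
    unsigned given = (mode == DEMO_MOVE_RECEIVE) ? DEMO_RECEIVE : DEMO_SEND;

    assert(from != NULL && to != NULL);
    if ((from->rights & needed) == 0)
        return -1;
    if (mode == DEMO_MOVE_RECEIVE || mode == DEMO_MOVE_SEND)
        from->rights &= ~needed;    /* a moved right leaves the sender */
    to->port_id = from->port_id;
    to->rights |= given;
    return 0;
}
```

There is intentionally no mode that copies a receive right: the only way for another task to obtain it is `DEMO_MOVE_RECEIVE`, which removes it from the sender, so at most one holder exists at any time.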
Note that each port includes a small FIFO queue, so that the execution of the sending and receiving tasks on a given port can be somewhat decoupled. This is very important for efficiency reasons and for multiprocessor support, for example.
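A bounded FIFO of this kind can be sketched as a ring buffer. The capacity, the names and the use of plain integers as "messages" are all assumptions made for the example. In particular, the sketch simply reports an error when the queue is full or empty, which is a simplification chosen here and not a description of what the kernel does in those cases.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_CAP 4   /* arbitrary small capacity for the sketch */

struct demo_queue {
    int    msgs[DEMO_QUEUE_CAP];
    size_t head;    /* index of the oldest queued message */
    size_t count;   /* number of queued messages */
};

/* Sender side: returns 0 on success, -1 if the queue is full. */
int demo_enqueue(struct demo_queue *q, int msg)
{
    assert(q != NULL);
    if (q->count == DEMO_QUEUE_CAP)
        return -1;
    q->msgs[(q->head + q->count) % DEMO_QUEUE_CAP] = msg;
    q->count++;
    return 0;
}

/* Receiver side: returns 0 and stores the oldest message in *out,
   or returns -1 if the queue is empty. */
int demo_dequeue(struct demo_queue *q, int *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return -1;
    *out = q->msgs[q->head];
    q->head = (q->head + 1) % DEMO_QUEUE_CAP;
    q->count--;
    return 0;
}
```

The sender can run ahead of the receiver by up to `DEMO_QUEUE_CAP` messages, and messages always come out in the order in which they were sent.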
Now this system gives us a way to transmit information between several Mach tasks, but how exactly will we use it? It wouldn't be very helpful if every programmer had to invent their own way to encode messages in each application, and then write the code to put all the data elements into the message data structures.
So instead, the designers of the Mach system decided to define a common, flexible IPC mechanism and to implement it on top of the messaging interface. The model they chose is that of an object-oriented interface: in an object-oriented program, every object has a known set of methods. You can make a request to an object by asking it to execute a given method with a given set of arguments. Basically, the methods of an object define the set of allowable requests on this object, or in other words, its interface with the outside world.
The Mach system defines a standard way to encode this object-oriented interface using the messaging interface. Each port in the system can be considered as the interface of an object, and this object is embodied by the (unique) task that has the receive right on this port. Each message sent to a port represents a method call. The receiver of the message can quickly determine which method is wanted: each method of a given object-oriented interface is assigned a number, and this number is stored in the ID field of the message header. Then all of the arguments of the method are stored, one after the other, in the typed message data fields.
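On the receiving side, this convention boils down to a table lookup on the message ID. The sketch below assumes a hypothetical interface with two methods numbered from 1000; the request structure, the method bodies and the numbering base are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded request: a method number plus integer arguments. */
struct demo_request { int id; int args[2]; };

typedef int (*demo_method)(const int *args);

static int demo_double(const int *a) { return 2 * a[0]; }
static int demo_add(const int *a)    { return a[0] + a[1]; }

/* Methods are numbered consecutively, starting at an arbitrary base. */
#define DEMO_ID_BASE 1000
static const demo_method demo_methods[] = { demo_double, demo_add };

/* Pick the method from the ID and run it on the arguments.
   Returns 0 and stores the result, or -1 if the ID is unknown. */
int demo_dispatch(const struct demo_request *req, int *result)
{
    size_t n = sizeof demo_methods / sizeof demo_methods[0];

    assert(req != NULL && result != NULL);
    if (req->id < DEMO_ID_BASE || (size_t)(req->id - DEMO_ID_BASE) >= n)
        return -1;
    *result = demo_methods[req->id - DEMO_ID_BASE](req->args);
    return 0;
}
```

Adding a method to the interface then means appending one entry to the table, which is the kind of mechanical work that lends itself to code generation.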
This clearly defined message-passing convention is a nice step, but one still has to write the code that implements it: each time you want to implement a new object-oriented interface, you have to write a set of functions to pack and unpack the messages that represent the methods of this interface. Because this is a boring and error-prone activity, a program has been developed to automate it. This program is known as the Mach Interface Generator, or just MiG. With MiG, all one has to write is a set of interface declarations: basically, these are just the declarations of the methods of the desired interface, along with some definitions of the data types used by these methods. MiG then reads these method declarations and generates a set of C functions that implement the packing and unpacking of the messages that represent these methods.
For example, if one writes a MiG definitions file specifying that
a given object-oriented interface is made of two methods named
method1 and method2 that take respectively
one and two integer arguments, MiG would generate from this
interface C functions named
method1 and method2. These functions
take a port parameter, and respectively one and two integer
values. The port parameter represents the object that must
execute the method. The other parameters are the arguments to
the method. The generated functions pack the method and its
arguments into a message data structure and send it using the
Mach messaging functions.
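To give an idea of what such generated functions do, here is a hand-written sketch of the two stubs. It is not actual MiG output: the message structure, the ID values and `demo_send` (which merely records the message instead of handing it to the kernel) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int demo_port_name;

/* Hypothetical message: destination, method ID, then the arguments. */
struct demo_msg {
    demo_port_name dest;
    int id;
    int argc;
    int argv[2];
};

/* Stand-in for the kernel send primitive: it only records the message. */
static struct demo_msg demo_last_sent;

static int demo_send(const struct demo_msg *m)
{
    assert(m != NULL);
    demo_last_sent = *m;
    return 0;
}

enum { DEMO_ID_METHOD1 = 1000, DEMO_ID_METHOD2 = 1001 };

/* Stub for method1: pack the ID and one argument, then send. */
int method1(demo_port_name port, int a)
{
    struct demo_msg m = { port, DEMO_ID_METHOD1, 1, { a, 0 } };
    return demo_send(&m);
}

/* Stub for method2: the same with two arguments. */
int method2(demo_port_name port, int a, int b)
{
    struct demo_msg m = { port, DEMO_ID_METHOD2, 2, { a, b } };
    return demo_send(&m);
}
```

From the caller's point of view, `method2(port, 4, 5)` looks like an ordinary C function call; the fact that a message is built and sent is hidden inside the stub.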
The MiG-generated interface described above is called an Asynchronous interface. This means that the task that uses this interface to request a service carries on with its own work, independently of the task that implements the service. The requesting task has no way to learn the results of the method call other than waiting for a reply message.
Asynchronous interfaces can be useful in some cases, usually when this parallelism of execution is desired, but in most cases the requesting task would rather just get the result of the method back in an argument passed by pointer. This is also possible using MiG, and it is called a Synchronous interface. The return values are in fact passed through a reply message, but this is transparent to the user of the MiG-generated interface: the caller just sees that the pointed-to argument has been modified when the call returns.
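A synchronous stub can be sketched along the same lines. Everything here is hypothetical: `demo_square` is an invented method, and `demo_rpc` stands in for "send the request, then block until the reply message arrives" by calling a fake server directly in the same process. The fake server also rejects port name 0, purely so that the error path can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request and reply messages. */
struct demo_msg   { int id; int arg; };
struct demo_reply { int retcode; int value; };

enum { DEMO_ID_SQUARE = 2000 };

/* Stand-in for the send-then-wait round trip. */
static struct demo_reply demo_rpc(int port, const struct demo_msg *req)
{
    struct demo_reply r = { -1, 0 };
    if (port != 0 && req->id == DEMO_ID_SQUARE) {
        r.retcode = 0;
        r.value = req->arg * req->arg;
    }
    return r;
}

/* Synchronous stub: the result comes back through a pointer argument,
   and the reply message never shows up in the caller's code. */
int demo_square(int port, int x, int *result)
{
    struct demo_msg req = { DEMO_ID_SQUARE, x };
    struct demo_reply rep;

    assert(result != NULL);
    rep = demo_rpc(port, &req);
    if (rep.retcode == 0)
        *result = rep.value;   /* unpack the reply into the out argument */
    return rep.retcode;
}
```

A caller writes `int r; demo_square(port, 7, &r);` and finds the answer in `r` when the call returns, exactly as with a local function that takes an output pointer.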
There is also a convention for documenting a Mach interface: a given
method of a given object is always written as
port->method.
As we've seen, using MiG is a very convenient way to implement an object-oriented interface between several tasks. If you use the synchronous interfaces, this is even equivalent to a general RPC system: the user calls a MiG-generated function and automagically receives back the results of a method that was executed in another task. This is a very powerful mechanism.
Now that we have defined a powerful object-oriented IPC interface, we still need to standardize the way we request kernel services and the way the kernel notifies us of certain conditions. For example, we need some way to ask the kernel to create a new task, and the kernel must have a way to notify us that it detected a division by zero in our program. The designers of the Mach system made a very smart choice here: the messaging interface is multi-purpose, and it is used for communication with the kernel just as it is used for communication with another task. Each major object in the kernel is represented by a port, and a thread can request services from it if its task has a send right to the object's port.
One of the advantages of this system is its very flexible handling of access rights. With traditional UNIX systems, it's up to the designer of the operating system to decide whether some operation should be privileged or not. Privileged operations are usually reserved to root, but some newer designs try to allocate these rights with a finer granularity: every distinct privilege must then be handled by a special flag, and all of these modifications must be made in the kernel. With the Mach microkernel, by contrast, the handling of privileges is much more of a user-land problem: any task can request a given kernel service, provided that it has a send right to the corresponding port. The kernel can have a very restrictive policy for distributing the initial send rights, and afterwards the send rights can be redistributed from user-land using the messaging interface if we want a less restrictive privilege-handling policy.
Another advantage is that, because of the generalized use of the messaging interface, a wide class of system-level tricks becomes possible on the Mach system. Distributed computing is one example: because the sender of a message does not need to know how this message will be handled (be it by the kernel, by another task, or whatever) and because all messages are typed, it becomes possible to implement a proxy server that relays messages over a network. Using this proxy system, a task can think it is sending a message to a given port while the target port is in fact on another machine. The target machine might even use a different byte order: this is all transparent, because all Mach messages are typed.
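The byte-order remark can be made concrete with a short sketch. It shows the step a relaying proxy could apply to each typed data group when the two machines disagree on byte order. The type tags and function names are invented for the example, and only two kinds of data are modelled: raw bytes and 32-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags carried by each data descriptor. */
enum { DEMO_TYPE_BYTES = 0, DEMO_TYPE_INT32 = 1 };

/* Reverse the four bytes of one 32-bit value. */
static uint32_t demo_swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Convert one typed group in place for a peer with the opposite byte
   order. Knowing the type is what makes this possible: integers are
   swapped element by element, while raw bytes are left untouched. */
void demo_convert(int type, void *data, size_t count)
{
    assert(data != NULL || count == 0);
    if (type != DEMO_TYPE_INT32)
        return;
    uint32_t *words = data;
    for (size_t i = 0; i < count; i++)
        words[i] = demo_swap32(words[i]);
}
```

With untyped messages the proxy could not tell a 4-byte string from a 32-bit integer, so it would have no safe way to perform this conversion on behalf of the communicating tasks.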
One last example of the flexibility of the Mach system interfaces is the use of debuggers. While traditional UNIX systems require the debugged program to be a child of the debugger, for protection reasons, this limitation can be avoided on a Mach system: it is conceptually possible to attach a debugger to a running program, as long as we have a user-land way to provide this debugger with the necessary port rights.
Some of the major objects in the mach microkernel are :
Last modification : January 11th, 1998