Chapter 5

Design and Implementation

5.1 General Description

The real-time system proposed here was developed for the Microsoft Windows NT platform using the Win32 Platform Software Development Kit. The design of this system is made up of two parts: the server on one end, and the client on the other. Each part can be broken up into three layers: the MPEG Encoding/Decoding Layer, the Streaming System Layer and the Network Interface Layer. Some of these layers will, as we shall see later in the chapter, contain a number of components that interact among themselves.

Figure 5.1 illustrates the basic layout of the system. The input to the server is a stream of video, typically generated from an analog device like a VCR or a TV camera. As soon as the MPEG encoder receives these data, it begins to digitize the stream. The stream is then compressed into an MPEG-formatted stream as quickly as it can manage in real-time. The continuous output from the encoder is passed into the system-streaming layer. In this layer, the stream is fragmented into packets. The resulting stream of packets is sent to the client over a network via the network interface layer. In between producing and sending out the packets, the server's system-streaming layer's components will temporarily store these incoming packets in an area allocated within the memory of the computer system, known as a buffer pool. Each buffer is used to cache a packet. This pool of buffers acts as a place to store packets in case retransmission is requested. Since the buffers are limited in number, packets that have remained the longest in the buffer will expire and will be replaced by newly produced ones.

On the client end, packets that are received by the network interface layer are transferred up into the system-streaming layer. This middle layer is responsible for caching the incoming packets into a buffer pool. These packets will be reassembled into a stream of MPEG video and passed to the decoding layer. The decoding layer will convert of the MPEG-encoded input into digital video for display. Should a packet be detained in the network or lose its way getting to the client, the system-streaming layer will detect its absence. In this event, the layer will assess whether or not there is sufficient time left to retrieve a copy of the packet from the server before the packet is needed by the decoding layer. To determine this, the client must take into account the propagation time for the retransmission request, possibly the response time of the server (upon receipt of the request), along with the propagation of the retransmitted copy of the packet.


5.2 Server Components

The server classifies its components into three functional classes: MPEG encoding layer, streaming system layer, and network interfacing layer. In the MPEG encoding layer (see Figure 5.2a), the MPEGator®â (see Appendix A) encoder hardware outputs the MPEG video stream in real-time and through interaction with a Component Object Model (COM) module via an internal channel, this real-time MPEG video is then transferred out to the streaming system layer below. The COM module is also responsible for giving the streaming system layer access to the video stream. The code for manipulating the COM module is a "library" package of pre-written functions (technically termed as a Software Development Toolkit or SDK) from MPEGator's manufacturer. It is the server program's responsibility to make use of this library package to produce the desired effect. Within the context of the server's code, it should place the MPEG video stream in an area accessible to the streaming layer below.



The streaming system layer that receives the MPEG video from the Encoding Layer has three principle components. Shown in Figure 5.2b, the packetizer component is responsible for obtaining MPEG from the layer above. This input is afterward broken up accordingly into small chunks called packets. This fragmentation process will be ongoing as long as there is data to dismantle. In this process, the packetizer will produce video packets that will not only contain the original MPEG data but also contain other control information that will be of great use to the client. For example, the packetizer takes advantage of the fact that MPEG contains three different frame types to allow identification of each packet produced. To uniquely identify a packet, it furthermore ensures that a sequence number can be tagged on to each packet. Juxtaposed packets will carry unique numbers in sequential order. This sequence number allows the client to detect a missing packet, should the client choose to make such observation. Sequence numbers

will also come in handy when a client requests that the server retransmit a particular packet. Since the packetizer is capable of identifying the MPEG frames, it can ascertain the number of packets required by each frame. In addition, there is a field within the packet that also indicates its order within the group of packets that make up a frame. The purpose of this is to let the client know beforehand how many more packets it should expect to receive for a particular frame. A packet's size is always fixed. Even so, the packetizer is not obligated to fill up a packet with data to its limit. Each packet is designed to carry a payload that is either an MPEG frame or non-frame information. In order to let a client know how many bytes a packet's payload contains, a payload-size field is added. A client however, may decide not to make use of every field within a packet. Nevertheless, the packet has been formatted as described here to simplify future implementations of the client.

The stream of packets generated by the packetizer is sent to the packetsender component (see Figure 5.2b). The packets are consumed by this component at a constant rate, as is the transmission of these packets to the client via the network interface layer. The packetsender is responsible for controlling the output rate of the packets to the network. Explicitly stated, this component must send the packets into the network at an appropriate rate that will ensure on-time delivery while smoothing the uneven bursts of packet outputs caused by the fact that MPEG frames are different in size. This component is furthermore responsible for sending control information to the client when transmission is about to cease, indicating that the server has come to a terminating point and will no longer be sending any more data.

The packetsender is also responsible for depositing a copy of each packet into the retransmission buffer pool before transmitting it (see Figure 5.2b). The buffers are limited in number and each packet is assigned to a specific buffer slot based on its sequence number. Eventually, the buffers are filled up. At this point, the oldest packet in the buffer pool will be overwritten by a newer one.

To help visualize this, let us suppose that the buffers were numbered from 1 to n. A packet with a sequence number m (where m is a positive integer) will be assigned the (m modulo n)-th buffer. Hence, every n packet produced is placed into the buffer pool and will ultimately be assigned to the same buffer slot. When a packet has been overwritten by a new one, this older packet can no longer be requested by the client because it has "expired". If the buffer pool was optimally sized, this will not be a problem. This optimal size is one that has enough buffers to last a round-trip propagation delay. The reason for this is that, while the client is in the process of retrieving a missing packet, other packets will continue to flow into the buffer pool. It takes at least one propagation-delay time for a client's retransmission request to be conveyed to the server, and another propagation delay for the retransmitted packet to reach the client. While the request is travelling across the network, the buffer pool will continue to be filled up with newly arrived packets. To avoid a premature overwrite of buffers, the number of buffers available must accommodate the number of incoming packets while waiting for a retransmission request to complete. However, the disadvantage of using this limited size technique is that a server can accommodate a retransmission request only once.

The retriever component of the streaming system layer is responsible for retrieving a packet copy from the buffer pool each time it receives a request from a client. The retrieval process is efficiently executed by doing a modulus operation, based on the sequence number specified by a client, to search for the packet copy. The modulus operation will bring the retriever directly to the buffer slot that keeps the packet with the specified sequence number. Should the packet not be there, the retriever would take no further action. Otherwise, it would duplicate the packet and send it into the network.


All packets that are sent from the streaming-system layer pass through the network interface layer (see Figure 5.2d). Within this bottom layer, there are three "gateways" (called ports) that lead to the network. Each of them, as we shall see later in the next section, is paired up with another port in the client's corresponding network interface layer. As already mentioned in Chapter 2, each port has a unique name or address bound to it and can therefore, be referred to by its remote counterpart. Indeed, the purpose of having these ports is to let both client and server mutually exchange data by direct reference. Figure 5.2d illustrates three classes of data flow, each using a separate port to direct information to (and from) the client.

The first class of data flow in the control information class. Recall from Chapters 2 and 4 that the Transmission Control Protocol (TCP) provides guaranteed services to the TCP/IP Reference Model while the User Datagram Protocol (UDP) provides "best-effort" services. The design of both server and client acknowledge this fact and are willing to give up speed for reliability by using TCP to transmit vital control information such as transmission start-up and termination. Such information is exchanged using TCP because it is important that it reaches the client. Furthermore, this information is not urgent and can afford to be transported over a less speedy link like TCP, using the "TCP control" port.

The second class of data flow carries MPEG video and retransmitted packets. This data is transmitted using the UDP service as its network transport and the "UDP real-time" port shown in Figure 5.2d. The third class of data flow is used by the ping server component. This component's task is to assist the client in estimating the round-trip propagation time between the client and the server. Each time the ping server receives a "ping" packet from a client, it will immediately bounce back this packet to the client without modifying anything. The "ping" packet contains clock information from the client, so once the client receives the returned packet, it can use the information to estimate the time taken for the packet to travel to the server and back.

5.3 Client Components

In this section, I will discuss the client components bottom-up, beginning with the Network Interface Layer and ending with the MPEG Decoding Layer. The network interface layer, symmetrical to the server's corresponding layer, has three "attached" ports (see Figure 5.3a). In general, data that enter this layer from the underlying network are transferred up to the layer above. Similarly, data that enter the layer from the layer above are transferred down into the underlying network.

Incoming video packets, including the retransmitted ones, that are channeled into the "UDP real-time" port are re-channeled up to the streaming layer. Outgoing requests for retransmission, coming from the streaming layer above, are channeled out into the network through the same real-time port.

Incoming and outgoing control information, such as transmission startup and termination, travels over a separate channel. Using the "TCP control" port, these data are exchanged reliably with the server. For example, a client wishing to receive a real-time video display will initiate the process by sending the server a "startup" packet. This packet is relayed to the server via the reliable TCP channel. Once received, the server begins to transmit a stream of video packets to the client. Should either the client decide to quit this transmission, it posts a "termination" packet to the server over this reliable channel. Likewise, a server that wishes to end its transmission should send the client a terminator. Once received, this termination packet is channeled up into the streaming layer for the

appropriate action.

The UDP "ping" port that is used by the ping manager component is responsible for the back-and-forth exchange of "ping" packets between the client and server. Already mentioned in the previous section, the ping service has been provided to estimate the round-trip propagation time between the server and the client. As we shall see later on, the round-trip delay time is essential for computing the practicality of a retransmission, given a deadline. The ping manager, like all other managers in the client, is a thread. At regular intervals, it forms a "ping" packet containing a time stamp. As soon as the packet is created, it is sent out to the server. The server, upon receiving this packet, bounces it backs to the client. The time that this packet reaches the client will be noted. Based on the time stamp information on the packet, which is the time the packet was sent originally, and the arrival time just noted, the ping manager now can compute the round-trip propagation delay. Since this value oscillates erratically over time depending on the network traffic, the ping manager has to exchange this "ping" information consistently and will thus, do so at a regular interval.

One layer above, the streaming system layer (see Figure 5.3b) is probably the most complicated of all the components discussed so far. Here, interactions between components are most pervasive and have to be carefully maneuvered.

To begin with, incoming video packets received by the buffer manager are cached into the buffer pool. The buffer pool is an array of buffers or a holding area that is vital to the performance of the application (see Section 4.5 of Chapter 4 for related discussion). As packets are buffered, the buffer manager will also coordinate with the retransmission manager a retransmission accounting list, a list that records the expected retransmission time of the next incoming packets. The buffer manager has to estimate the arrival time of the next packet. This is done using the transmission rate already known to it, through the control information stream from the network interface layer.

It is the retransmission manager's job to "blow the whistle" if a particular packet is not present in the buffer pool when the packet's time to request a retransmission arrives. The retransmission list is ordered by earliest time first. So, the accounting information for the packet with the earliest retransmission time will be at the top of the list. The retransmission manager will always look at the top of the list to check the retransmission time. It then sets a timer to count down the time left before a retransmission takes place. Once a timeout occurs, the retransmission manager checks the buffer pool to see if the expected packet was present. Of course, a packet that was delayed or lost will not be on the buffer pool. The absence of that packet will cause the retransmission manager to forward to the server a request for a retransmitted copy. Once a request is sent, that packet's accounting information will be updated to reflect the time to retransmit again if the packet had not yet been received. This accounting information is then placed at the bottom of the accounting list. For a packet that is already present in the buffer pool when the timer goes off, its accounting information will be deleted from the retransmission accounting list.

The remaining component of the streaming layer is the video stream manager. Its responsibility is to convert the packets on the buffer pool into an MPEG video stream. It has to tread carefully while reassembling the packets into MPEG stream. When it selects a packet from the buffer pool it has to ensure that the packet data it reads is valid and in the correct order. An implicit way to ensure this is to always read off the top of the buffer pool (assuming that the packets are ordered first-in-first-out among the buffers) and once read, have the top item removed from the pool. The next item of the pool will then become the top item. This process continues while there are more to read from the pool.

One thing perhaps not immediately obvious to the video stream manager is the fact that the number of buffers in the pool is limited. It is also unaware that the buffer pool is, in fact, a static array of buffers. This implies that while it is reading from the top item of the buffer pool, the "top of the pool" may not physically be at the top of the buffer pool. Figure 5.3c illustrates of this. It also illustrates a lost-packet scenario. As has been said, the limit in pool size makes it necessary that the buffers be reused. Just like the retransmission buffer on the server side, each packet is assigned to a specific slot in the buffer pool. Unlike the retransmission buffer however, an incoming packet should not overwrite another one already present. Overwrites must not happen; otherwise a portion of the final MPEG stream produced from these packets will be corrupted and rendered useless to the MPEG decoder. It is the task of the buffer manager to ensure that overwrites will not happen. The buffer manager must block the writing of a packet if it detects that the pre-assigned buffer slot is not available at that moment. We shall discuss in the next section, how semaphores are used to prevent overwriting.


Before reading any data from the pool, the video stream manager (Figure 5.3b) must wait for the buffer pool to fill. This is necessary so a smoothing algorithm can have sufficient data to reduce jitter in the resulting display. Incidentally, the time it takes to fill the buffers is also the minimal wait time required by the user before the video playback can begin. Once read from the buffer pool, the video data contained within every packet are extracted and patched into the MPEG stream. The result is pipelined upstream. But before the pipelining process can begin, the video stream manager has to first signal to the layer above to get ready to consume the data stream.

The MPEG decoding layer (Figure 5.3d), upon receiving the notification signal, will act accordingly to ensure that the MPEG video stream will be consumed from the pipeline. The signal causes the decoding component to begin. Similar to the COM module discussed in the MPEG Encoding Layer in the previous section, this component - Microsoft®â DirectX®â Media - can be manipulated by an SDK. In this case, the DirectX Media component will render an MPEG stream into digital color display in real-time.


The illustration of the decoding layer in Figure 5.3d may seem simple and straightforward although in actual fact, the DirectX Media component itself is a unique architecture. Within the DirectX Media component, one sub-component is most vital to the real-time display. This is the source filter (see Figure 5.3e).


The basic building block of the DirectX Media component is the filter graph of the Microsoft®â ActiveMovie®â controls. (A more detailed discussion about the Microsoft DirectX and ActiveMovie discussion can be found in Appendix B.) The DirectX Media component is capable of rendering media of forms other than MPEG but within the context of this thesis, let us limit the discussion only to MPEG decoding. A filter graph is composed of a series of filters. A filter, in the simplest words, is a COM object that performs a specific task. The filters in the filter graph will work together, each passing the data downstream to the next filter. Typically, there are three category types of filters: a source filter, a transform filter and a renderer filter. The source filter's task is to take data from a source and introduce it to the next filter in the filter graph. In our case, the source is the pipeline of MPEG stream. The downstream filter is the transform filter. Within the context of this discussion, it is an MPEG decoder that will take the stream of data produced by the source filter and perform decompression techniques over it. Downstream from the transform filter is the renderer filter. Its job is to render the resulting data from the transform filter to a hardware device (e.g., monitor screen).

The source filter is the most vital of all the three filters mentioned since it is the entry point for the MPEG video stream. The data read from the pipeline will be written to yet another pipeline that is accessible to the downstream transform filter. The way the filter graph works, the transform filter will never know about the pipeline at the entry point but it will be able to communicate with the source filter and still access the MPEG stream using a special pipeline designated only to these two filters.

The execution of both the transform and renderer filters is invisible to the client program. The DirectX Media component ensures that this is so and provides functionality to efficiently control data flow from the source to the final video display. In particular, it knows how to automatically stop the display when the last of the MPEG stream has been reached.

5.4 Programming Concerns

The descriptions of the system's implementations so far have been deliberately simplified in the hope that readers will get the gist without having to fully understand computing concepts. However, some programming issues must be addressed here. This section will explore these issues.

Firstly, both the server and client are multi-threaded. Threads are important to an application process that wants to run several tasks concurrently. In abstract terms, a process is an executing program; a running application in Windows NT may consist of one or more processes. The server is a process. And so is the client. A process may utilize one or more threads simultaneously to execute portions of the code within it. A thread is the basic unit of computation that can be allocated to a processor by the operating system.

The primary threads in the server are the ping server, packetizer, packetsender, and the retriever. In the client, they are buffer manager, retransmission manager, video stream manager, ping manager, dialog manager and display manager.

When multiple threads are used in an application, the synchronization of system resources among these threads becomes an issue. A careless implementation would bring about concurrency-control problems. The main reason is that resources such as the buffers are shared among these threads. We see this in the interaction of the buffer manager, retransmission manager and video stream manager with the buffer pool. These three threads share read/write access to the buffer pool. Since the buffer pool is modeled around a bounded-buffer scheme (explained in the previous section), the matter becomes complicated when the buffers are reused. The rule of the thumb is that a buffer that is already occupied must not be overwritten by something else. But one thread may not necessarily know what the other threads are doing at any point in time. Specifically, it does not know when exactly the others will access the buffers. One thread could easily write to a buffer slot, thinking that the slot contains expired data. Should another read from that same slot, the inconsistency of data it encounters may lead to harmful consequences.

One way to ensure that such access violation will not happen is to use semaphores. A semaphore is a counter - an integer between 0 and n, where n is a positive integer - whose value can be modified only by two atomic operations: wait and signal. Modifications to the semaphore in the wait and signal operations must be done indivisibly. In other words, while a thread is modifying a semaphore using wait or signal, no other threads in the system should be. The semaphore is decremented by 1 in the wait operation, unless its value was 0 or negative. In the signal operation, its value is incremented by 1.

Semaphores can be used to prevent overflows and underflows in the buffer pool. They can also be used to provide mutual exclusion for accesses to the buffer. A mutual exclusion access is desirable to prevent simultaneous read and write. That is, while a thread is reading from the buffer pool, some other thread should not be given access to read from or write to the buffers.

Let us assume that the pool has n buffers. Let us also provide two semaphores to the model: N and M. The N and M semaphores shall count the number of empty and full buffers respectively. Semaphore N shall be initialized to n, to indicate that there are n empty buffers in the beginning. Semaphore M shall be initialized to 0, indicating that no buffers can yet be consumed because none contain any data. The wait and signal operations can be manipulated to synchronize any two threads that are reading and writing to the buffer pool.

For example, when the buffer manager is writing to the buffer pool, the following code will be executed:

wait(N);

add new packet to a buffer;

signal(M);

The wait operation above will decrement N by 1. The signal operation on the other hand, will increment M by 1. In the wait operation, semaphore N will be decremented as long as its value is non-zero. If N reached zero, the wait operation will continue on waiting until the N value is incremented by another thread that reads from the buffer.

When the video stream manager is reading from the top of the buffer pool, a portion of its executing code may look like this:

wait(M);

read and remove the top buffer item;

signal(N);

Note the symmetry between the threads. If the buffer pool was empty (M = 0), the video stream manager will hang at the wait operation until at least a packet has been written to a buffer. Similarly if the buffer pool was full, the buffer manager hangs until at least a buffer has been "freed" before it can proceed to write.

To complicate the algorithm proposed above, let us now add packet delay and loss into the scenario. If there were no loss or delay, the packets would be flowing into the buffers in a sequential order. If we took packet loss and delay into account however, a slight modification will be required of the model proposed above since out-of-order delivery now becomes very possible.

Recall from earlier discussions that arriving packets are placed into specified buffers, based on their sequence numbers. The buffers are numbered from 1 through n so a packet having a sequence number m will be placed into the buffer with the number equivalent to m modulus n. The retransmission manager shall be responsible for determining whether or not any packet has been delayed and whether it has sufficient time left to recover the packet. Should this thread determine a loss or delay, it should call the wait and signal operations as if it were writing to a buffer. The N and M semaphores need to be decremented and incremented respectively to accommodate the fact that a buffer should have been used. Even though a packet is not present, it should nevertheless be included in the semaphore count. Otherwise, the buffer manager might think that there is one more buffer available. But suppose that all the buffers were taken (except for the one missing a packet), an accidental overwrite may be possible (see Figure 5.4). Now, when a retransmitted packet arrives to substitute the lost or delayed packet, the semaphores need not be modified again. This preventive measure has to be meted out to ensure accuracy in the semaphore counts.

Yet another concern associated with buffer management is determining whether a particular packet is present in a buffer or not. All the threads that access the buffers are also concerned about the difference between an empty buffer and a reserved buffer. A reserved buffer is a buffer that has been saved for a delayed packet while an empty buffer is a buffer that is available for writing. In the video stream manager for example, the consumption of a packet from the buffer would fail if the packet were not available. That is, the packet was delayed or lost and could not be recovered on schedule.

To differentiate among present, absent and delayed, each of the buffers should now also carry an arrival flag to indicate one of the three states. The three states are assigned integer values: 1 to represent a presence, 0 to represent an absence and -1 to represent a delay.

5.5 Transmission Rate

The rate of the server's packet transmission is a useful measure. As we shall see in the next section, the client uses it to schedule its retransmission policy.

As mentioned in Chapter 4, some researchers found it advantageous to adjust the transmission rate according to the network traffic. Others proposed to adjust the rate periodically to reflect the amount of data the server sends to its client.

While it is possible to constantly adjust the transmission rate over time without immense computation, it turns out that a constant rate that was pre-determined before transmission time would work just as well (Sen et al). In either case, the server should alert the client of the current transmission rate using the existing TCP-based communication line. This system uses the latter approach and takes advantage of buffering capabilities on both system ends.

5.6 Timer-based Retransmission Policy

Recall from the discussion in Section 4.2 of Chapter 4 that there are two ways of detecting whether a packet scheduled to arrive has already been received. One method is gap-based while the other is timer-based. The gap-based detection policy will identify a gap between consecutive packets received based on sequence numbers if one or more in-between packets were not present. This method of detection however is not applicable to this system's implementation unless some modifications are made. The reason is that the underlying UDP transport service does not maintain order of delivery. Hence, I choose to implement a timer-based detection schedule on the retransmission manager.

The arrival of a packet follows a loose schedule. Each packet that is scheduled to arrive is associated with a deadline. While this timeout value may be difficult to gauge due to the network's erratic propagation time, it can still be pre-determined, though less accurately, based on the time the first packet arrived and the server's packet transmission rate. More precisely, the time that the retransmission manager should expect a packet to be present in the buffer pool is:

1st packet's arrival time + time elapsed since 1st packet's arrival

where the time elapsed can be measured by:

units of (1/ packet_transmission_rate) time

The estimation of this deadline for each packet is systematically measured, one by one in sequence number order. The following table lists the schedule of deadlines associated with packet #2 through packet #4.
Packet sequence #Deadline: the expected retransmission time if the packet is not present in the buffer pool (in t time units)*
11st packet arrival time **
21st packet arrival time + t
31st packet arrival time + 2t
41st packet arrival time + 3t

* t is the unit of time elapsed expressed as 1/(packet transmission rate)

** 1st packet is the anchor for the upcoming packets

When a deadline is reached and the associated packet is not present in the buffer pool, the retransmission manager's job is to estimate whether there is ample time to request that the server retransmit a copy. So apart from a deadline, a "display" time is also associated with a packet. The display time approximates when the video stream manager will consume the packet. In the case where the retransmission manager has to determine the viability of a retransmission, the display time is measured against the deadline plus the round trip propagation time it takes to recover a retransmitted packet. If the expected arrival time of the retransmitted packet is later than the display time, a retransmission is no longer viable.

An indirect implication of this timer-based approach is that if a packet were delayed and the retransmission manager initiated a retransmission prematurely, the delayed packet would be ignored if its arrival were several milliseconds later than scheduled. A direct result from this is the redundancy of packets in the network.

Occasionally, the retransmission manager's timer may time out before the server has the chance to transmit the packet. If this happens, it is very likely that when the server receives the retransmission request, the server's retriever will not find the requested packet in the retransmission buffers. The server does not retransmit a packet that is not present in its buffers.

5.7 Implications of Packet Loss

It is important here to note that packet loss will lead to lower quality of video display. This is especially true when lost packets contain the I frame within a GOP. Since the I frame is the anchor within a GOP, the rest of the frames in the GOP depend heavily on it. Losing it will imply that the GOP cannot be rendered. The quality of display during an unrecoverable packet loss period will depend on the number of frames within a GOP. Indeed, the quality of display becomes worse when the GOP has more frames within it.

Theoretically speaking, losing non-I frames whether in whole or in parts, would also lower the quality of the movie display although at a lesser extent. However, the actual display will depend solely on how the MPEG decoder has been implemented. The client program does not implement the decoding process but instead, relies on the Microsoft DirectX Media's decoder to perform the MPEG decompression. A decoder may or may not drop a GOP if it finds a non-I frame to be missing or incomplete. If it drops a GOP, then clearly, the display will be less spectacular. The difference between dropping a GOP and dropping a non-I frame can be significant: losing a GOP would mean wasting a series of frames that would have otherwise been usable. Losing some frames within a GOP is comparably less severe because the other frames in the GOP can still be rendered.