The work discussed in this thesis has spanned a period of two
years. In this concluding chapter, I shall summarize the key
results accomplished and discuss some of the important aspects
of this research. Future research ideas will be discussed in
the last section.
Recall that the principal goal of this thesis was to provide
real-time Internet delivery of high-bandwidth video from a server
to a client. To reach that goal, this thesis first had to concern
itself with the underlying network's limitations. There are three
main obstacles preventing real-time video delivery: unreliable
delivery, unpredictable delay and limited bandwidth capacity.
I attempted to overcome these limitations mainly through three
approaches: client-side buffering, retransmission-based error
recovery and MPEG compression. In the next three sub-sections,
I discuss the effectiveness of these key approaches and their
implications.
There are two advantages to having buffers on the client's end. First, arriving packets fill a pool of buffers before the video display begins and continue to refill it as the video plays, so the client can usually find the packet it wants already in the buffers. This eliminates the time it would otherwise spend waiting for a requested packet to arrive. Sometimes, however, the client does not find the requested packet in the buffers. Even so, buffering on the client side is better than none at all, because without buffering the client would have to wait more often. Furthermore, packet loss would be more frequent whenever the server sends at a rate higher than the rate at which the client consumes packets: the client cannot cope with the flood of packets and misses some of those that arrive.
The second advantage concerns the performance of packet retransmissions (the next sub-section discusses the effects of retransmission policies in more detail). Because the client waits for the buffers to fill completely before it begins consuming them for video playback, the time it waits is the difference between the times at which the first and last buffers are occupied. This wait is advantageous for recovering any lost or delayed packets that are detected: each expected packet has that much time in which to be recovered, should it be delayed or lost. Since recovering a packet by retransmission takes approximately one round-trip propagation time, it is to the client's advantage to set this wait duration to the round-trip time. In theory, this means the number of buffers can be reduced to the minimum that accommodates a wait duration equal to the round-trip propagation time.
The buffering capabilities discussed above apply only to the client.
The server also owns a set of buffers. However, their purpose
is different from that of the client's buffers. They are used solely
to store packets to satisfy retransmission requests received by
the server.
The effectiveness of retransmission-based recovery depends on three factors. First, there has to be adequate buffering on the client side. Second, the client has to detect losses adequately. The detection policy I used in this thesis was timer-based. Third, and related to loss detection, each packet received by the client has to carry some form of identification. For this purpose, the server I wrote attaches a sequence number to each packet that it sends. The sequence numbers begin at 0 and increase monotonically.
As far as the adequacy of buffering is concerned, there is one thing the client ought to ensure. As long as the number of buffers on the client is greater than the number of packets that can arrive over a period equal to the round-trip propagation time, retransmission remains feasible and the retransmission policies will work effectively. Specifically, since the buffers are queued for consumption by the video display, each buffered packet ideally waits for a duration equal to the time it takes to fill the buffers. With sufficient buffers, the client has ample time to recover a lost or delayed packet, because when the timer goes off to signal a loss or delay, the time remaining before the packet is required for consumption is equal to or greater than the round-trip propagation time. If the number of buffers were less than the number of packets expected to arrive during one round trip, retransmission of a packet would no longer be possible because the time left before that packet is consumed would be insufficient.
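The buffering condition above can be written as a small calculation. The following is only a sketch under stated assumptions: the function name and its parameters (a constant packet rate in packets per second and a round-trip time in seconds) are hypothetical and do not come from the client described in this thesis.

```python
import math

def min_client_buffers(packet_rate_pps: float, round_trip_s: float) -> int:
    """Smallest buffer count strictly greater than the number of packets
    that can arrive during one round-trip propagation time.

    Both parameters are illustrative assumptions: a constant sending rate
    and a known, fixed round-trip time.
    """
    if packet_rate_pps <= 0 or round_trip_s < 0:
        raise ValueError("rate must be positive and round-trip time non-negative")
    packets_per_round_trip = packet_rate_pps * round_trip_s
    # floor(x) + 1 is the smallest integer strictly greater than x
    return math.floor(packets_per_round_trip) + 1
```

For instance, at 100 packets per second and a 0.25 second round trip, 25 packets can arrive during one round trip, so this sketch asks for 26 buffers.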
Every real-time video packet has a limited life span. Thus, a timer can be used to help the client detect lost or delayed video packets. The client I wrote does exactly that by scheduling the expected arrival time of each packet. A timer-based detection policy such as the one implemented on the client is advantageous because it never fails to detect a missing packet. Its disadvantage, however, is that premature timeouts can occur often, causing unnecessary retransmissions.
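A timer-based detection policy of this kind might look roughly as follows. This is a minimal sketch, not the client's actual code: it assumes packets are sent at a fixed interval, and the names `expected_arrival_ms`, `overdue_packets` and the `grace_ms` allowance are hypothetical.

```python
def expected_arrival_ms(seq, first_arrival_ms, interval_ms):
    """Scheduled arrival time of packet `seq`, assuming a fixed packet interval."""
    return first_arrival_ms + seq * interval_ms

def overdue_packets(received, highest_seq, now_ms, first_arrival_ms,
                    interval_ms, grace_ms=0):
    """Sequence numbers up to `highest_seq` that have not been received and
    whose scheduled arrival time (plus an optional grace period) has passed."""
    return [
        seq
        for seq in range(highest_seq + 1)
        if seq not in received
        and now_ms > expected_arrival_ms(seq, first_arrival_ms, interval_ms) + grace_ms
    ]
```

With `grace_ms=0` every late packet is flagged as soon as its scheduled time passes, which is where premature timeouts come from. A larger grace period reduces them, but it also leaves less time for recovery.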
To facilitate an efficient retransmission, the unique identification of video packets using sequence numbers is important. Without sequence numbers, it will take much more computation to identify and retrieve missing packets from the server's buffers.
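To illustrate why sequence numbers make retrieval cheap, the sketch below keeps the most recently sent packets in a table keyed by sequence number. The organization of the server's buffers is not specified here, so the class name, the fixed capacity and the oldest-first eviction rule are all assumptions made for illustration.

```python
from collections import OrderedDict

class RetransmissionBuffer:
    """Holds the most recently sent packets, keyed by sequence number."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._packets = OrderedDict()  # seq -> payload, in sending order

    def store(self, seq, payload):
        """Remember a sent packet, evicting the oldest one when full."""
        self._packets[seq] = payload
        if len(self._packets) > self._capacity:
            self._packets.popitem(last=False)  # drop the oldest entry

    def lookup(self, seq):
        """Return the payload for `seq`, or None if it is no longer held."""
        return self._packets.get(seq)
```

A retransmission request then costs a single keyed lookup. Without sequence numbers the server would have to search the buffered packets by content or position.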
Finally, it is important to note that retransmission is conditional
on whether the client thinks there is sufficient time to recover
a missing packet before its life expires. Remember that what distinguishes
a real-time data stream from a non-real-time one is that any
data that arrives too late is deemed useless and has to
be discarded.
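The condition on retransmission can be expressed as a single comparison. This is a sketch under the assumption that the client knows each packet's consumption deadline and has an estimate of the round-trip time; the function and parameter names are hypothetical.

```python
def should_request_retransmission(now_ms, playout_deadline_ms, round_trip_ms):
    """True if a retransmitted packet could still arrive before it is consumed.

    Recovery takes roughly one round trip, so a request is only worthwhile
    when at least that much time remains before the packet's deadline.
    """
    return playout_deadline_ms - now_ms >= round_trip_ms
```

For example, with 80 ms left before consumption and a 100 ms round trip, the request is skipped and the packet is treated as lost.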
There are many advantages to using MPEG-1 video in this thesis. First and foremost, MPEG-1 compression can reduce the video bandwidth requirement by more than an order of magnitude. For example, uncompressed video may require a network bandwidth of between 40 and 70 Mbps, whereas MPEG-1 compressed video requires only 1 Mbps or less. It is interesting to note that, although the bandwidth requirement is significantly reduced, MPEG-compressed video is still categorized as "high-bandwidth" video; other existing compression techniques can produce "low-bandwidth" video.
Another advantage of MPEG is that its compression does not sacrifice display quality, so the video is presented at considerably high quality. By contrast, "low-bandwidth" compression techniques reduce the bandwidth requirement further than MPEG does, at the cost of a grainy-looking display.
As an industry standard, MPEG's purpose was to allow numerous
industries to have a consistent digital video technology standard.
A direct result of this initiative is the wide availability of
MPEG encoders and decoders. Having an industry standard like
MPEG means that one can have the option to either develop one's
own MPEG encoding/decoding system or rely on others to develop
the encoder and decoder. Within the context of this thesis, I
have chosen the latter.
It is important to differentiate between the client-server interaction that I developed and a reliable data transfer that uses protocol services such as TCP. The real-time protocol I developed differs from a reliable protocol in several respects. Let us first examine an example of a reliable protocol that resembles TCP in some ways, and then compare it with the real-time protocol developed in this thesis.
A reliable protocol guarantees the delivery of all packets sent
from sender to receiver. Here, a timer is associated with every
packet sent by a sender. That is, it is the server that handles
the timeout mechanism, not the client. Furthermore, the receiver
is required to acknowledge every packet it receives by sending
a positive acknowledgement packet, known hereafter as the ACK
packet. For every existing packet, there is an ACK packet. If
the packets were identified with sequence numbers, then so would
the ACK packets. On the server side, a timer will go off after
a certain time (approximately the length of a round-trip propagation
time) if the server has not received an ACK from the receiver by then.
The timer is, of course, cancelled once the ACK associated
with a particular packet is received. Figure 6.1 illustrates
the operation of the protocol, taking lost packets into account.
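The sender-side bookkeeping of such a reliable protocol can be sketched as follows. This is a simplified illustration of the example protocol only, not of TCP itself, and the class and method names are hypothetical.

```python
class ReliableSender:
    """Tracks one retransmission timer per unacknowledged packet."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms  # roughly one round-trip time
        self._expiry = {}             # seq -> time at which its timer goes off

    def send(self, seq, now_ms):
        """Record that `seq` was (re)sent and start its timer."""
        self._expiry[seq] = now_ms + self.timeout_ms

    def on_ack(self, seq):
        """Cancel the timer for an acknowledged packet."""
        self._expiry.pop(seq, None)

    def expired(self, now_ms):
        """Sequence numbers whose timers have gone off and need resending."""
        return sorted(s for s, t in self._expiry.items() if now_ms >= t)
```

Every packet costs the sender a timer and costs the receiver an ACK, whether or not anything was lost.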
In the example above, we observed a "handshake" between the sender and receiver each time a packet is transferred. In the real-time protocol, such handshaking is unnecessary and, in fact, time-consuming. Thus, ACK packets are not used. Instead, the receiver sends a negative acknowledgement (NACK) to the sender whenever it finds a packet to be missing. The NACK packets work together with timers on the receiver side. One assumption that has to be made about this protocol is that the sender sends packets at a fixed rate and is interrupted only by the NACK packets that it receives. The reliable protocol example makes no such assumption because timeliness is not a concern there, as it is here. From the time the receiver begins consuming the first packet from the sender, each packet that follows is associated with a timer. A timeout occurs if the expected packet does not arrive in time. The receiver may then choose to send a NACK, or it may choose not to if it finds that there is insufficient time for a successful recovery. (Remember that a real-time packet is deemed useless if it arrives later than the time at which it was to be consumed.)
Figure 6.2 illustrates the operations of this protocol, including
the use of a NACK to retrieve a lost packet.
While much work and effort have been poured into this thesis,
there is always room to improve on the performance of this real-time
system. First and foremost, modifications can be made to dynamically
readjust the packet transmission rate. In the current implementation,
the rate is constant throughout a transmission session. Although
this approach does not negatively affect the system performance,
a dynamic rate adjustment could let the transmission rate follow
the bitrate of the video itself.
One proposal is to adjust the rate according to the Groups of
Pictures (GOPs) within the MPEG video as the video is generated.
In a movie clip where there is a transition from a slow moving
scene to one that is fast paced, the actual bitrate of transmission
of the latter scene should be higher. An average transmission
rate estimated for each GOP would be a good measure of these scene
changes. Figure 6.3 is a graph that illustrates the rate changes
according to the GOP bitrate as the MPEG video is produced.
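An average rate per GOP could be estimated along the following lines. This is only a sketch: it assumes the encoded size of each frame is known, that every GOP has the same number of frames, and that the frame rate is constant. The function and parameter names are hypothetical.

```python
def gop_bitrates(frame_sizes_bits, gop_length, frame_rate_fps):
    """Average bitrate (bits per second) of each GOP in a frame sequence.

    frame_sizes_bits: encoded size of each frame, in display order
    gop_length:       frames per GOP (assumed constant for this sketch)
    frame_rate_fps:   frames displayed per second
    """
    if gop_length < 1 or frame_rate_fps <= 0:
        raise ValueError("gop_length and frame_rate_fps must be positive")
    rates = []
    for start in range(0, len(frame_sizes_bits), gop_length):
        gop = frame_sizes_bits[start:start + gop_length]
        duration_s = len(gop) / frame_rate_fps  # playing time of this GOP
        rates.append(sum(gop) / duration_s)
    return rates
```

A fast-paced scene produces larger frames and therefore a higher per-GOP rate, which the server could use as its sending rate for that GOP.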
Modifying the transmission rate scheme would require changes to some of the component designs as well. In particular, the client must be ready to adapt to rate changes. To make this possible, the server should always notify the client of a change in the transmission rate so that the client can update its packet consumption rate.
Yet another interesting idea worth pursuing is the dynamic readjustment of the buffer pool size. As the network traffic fluctuates, the propagation time changes as well. This means that the client would have to ensure that its buffers remain sufficient as conditions change over time. In particular, if the time to propagate a packet decreases now, the client will experience an increase in the number of incoming packets between now and the next transition in the propagation time. The client must be able to cope with this sudden surge of packets. The ability to forecast an increase in the arrival rate caused by a change in the propagation delay will help the client determine whether or not to increase the buffer pool size. The client has to ensure that packets are not dropped simply because there were not enough buffers to contain them.
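One possible resizing rule is sketched below. It is an assumption-laden illustration, not a tested design: the arrival rate is estimated from a count of packets seen in a recent window, the pool only ever grows, and the names `resized_pool` and `headroom` are hypothetical.

```python
def resized_pool(current_size, arrivals_in_window, window_ms,
                 round_trip_ms, headroom=1):
    """Buffer pool size needed to absorb one round trip of arrivals at the
    recently observed arrival rate; never smaller than the current size."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    # Packets expected during one round trip, rounded up (integer ceiling).
    expected = (arrivals_in_window * round_trip_ms + window_ms - 1) // window_ms
    return max(current_size, expected + headroom)
```

If 50 packets arrived in the last second and the round trip is 500 ms, a pool of 16 buffers would be grown to 26. Shrinking the pool when traffic subsides is deliberately left out of this sketch.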
Newer packet recovery schemes that save on bandwidth requirements can be added to the client. When the network is congested, asking the server to retransmit missing packets will only exacerbate the congestion. Therefore, the existing retransmission policy can be revised to allow retransmission only of missing packets that contain I- and P-frames within the MPEG frame sequence. The B-frames can afford to be dropped completely from an MPEG stream without jeopardizing the video content. This new policy will work because no other frames depend on B-frames for decoding. I wrote an experimental program (see Appendix G for the code listing) to test this speculation. Taking a recorded movie clip as input, the program strips away all the B-frames contained within it and outputs a "shortened" version of the clip. Indeed, an MPEG stream can be edited to omit the B-frames, and a person viewing the display may never perceive that B-frames were dropped unless he or she had already seen what the original stream looked like. The modified stream is played out at a frame rate similar to that of the original stream, so, because it contains fewer frames, it may seem to display at a higher rate than the original. (Imagine a fast-forwarded video playback.) To compensate for the dropped B-frames, the client could insert "dummy" frames that it generates itself.
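The revised policy and the effect of stripping B-frames can be illustrated on a sequence of frame-type labels. This sketch is not the Appendix G program and does not parse an MPEG bitstream; it assumes the frame type of each frame is already known, and the function names are hypothetical.

```python
def worth_retransmitting(frame_type):
    """Revised policy: only request packets that carry I- or P-frames."""
    return frame_type in ("I", "P")

def strip_b_frames(frame_types):
    """Return the frame sequence with every B-frame removed."""
    return [f for f in frame_types if f != "B"]

def playing_time_s(frame_types, frame_rate_fps):
    """Time needed to display the sequence at a fixed frame rate."""
    return len(frame_types) / frame_rate_fps
```

For a GOP pattern such as IBBPBBPBB, stripping the B-frames leaves three of the nine frames. Played at the same frame rate, the shortened sequence finishes in one third of the time, which is the fast-forward effect described above.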
Many useful applications, such as video conferencing, can benefit from a real-time video system. However, the synchronization of audio with video images is a concern that should be dealt with. Although it is beyond the scope of this thesis, the issues that arise from synchronizing audio and video are important to a successful real-time multimedia application. Fortunately, MPEG encoders are capable of encoding audio so that it is synchronized with the video stream. However, the packet recovery schemes already suggested (and implemented) would have to be readjusted to take the audio stream into account. Ideally, audio should have a higher priority than video, because humans are quicker to perceive degradation in audio than in video. That is, even if the quality of audio degrades only slightly, humans are bound to notice the change. With video, however, humans are not quick enough to perceive the loss of one or two frames in a sequence.