The Internet, fueled by the explosive growth of the World Wide Web in the early 1990s, has become the focus of multimedia enthusiasts pushing for greater development in multimedia networks. As the Internet evolves into the standard medium of communication in modern times, millions of people scramble to get "online" on an average day. However, as Internet traffic increases, network performance will begin to deteriorate and slowdowns will occur. This is a concern for those who want to use the network for time-critical applications such as real-time video communications.
The main focus of this thesis is real-time Internet video transmission.
In a computational sense, real-time is defined to be the computation
or processing done in the present to control physical events occurring
in the present (Telect). However, no end-to-end transmission
over the Internet can occur exactly in real-time, given the current
state of routing technology employed by the Internet. This is
because some network delay is inherent. In this thesis, we define
real-time as an attribute of stream-based transmission in which the
receiving hosts (computers) can reproduce the original framing and timing
of the data (video) chunks almost exactly after transmission over the network.
There should be no distinction between a stream which has its
source on the local machine and the same stream coming over a
network connection (Klein, 1996). For example, a half-hour video
clip transmitted over a network would appear on its destination
host (located elsewhere on the network) a few seconds later, and
the content of the video clip would look exactly like the one
on the source host.
Designing real-time media systems is a challenge because the existing Internet architecture has its limitations. Furthermore, the services provided by the architecture are not sufficient to support real-time systems, mainly because of their lack of concern for time constraints. A real-time system, on the other hand, is constrained by deadlines. (We shall explore the characteristics of real-time video in the next section.)
There are primarily three limitations that hinder the performance of real-time video applications. The first is the unpredictability of network latency. Latency is the amount of delay a data stream experiences when it is transmitted from its source to its destination. Any delay due to the erratic latency of the network may cause a severe malfunction in a real-time multimedia application.
The second limitation is bandwidth capacity. Bandwidth is a measure, in bits per second, of the capacity of the physical link that connects machines to a network; that is, the maximum amount of data that can be transmitted over the network in a specific amount of time. Network bandwidth is typically the primary factor that determines the performance of a multimedia application. For example, a one-minute uncompressed, 640-pixels-by-480-lines (full screen) video clip without audio typically requires about 300MB (megabytes) of storage space. This implies that a bandwidth capacity of approximately 41.9Mbps (million bits per second) is required to transmit the video clip in real-time. However, given the current state of network technology, an end user with a common 28.8Kbps (kilobits per second) data modem can receive only 28,800 bits of data per second and will not be able to view this video clip in real-time. Of course, there are existing networks that run at higher bandwidths. The Ethernet network, commonly used by an organization as its internal network, runs at 10Mbps, while the high-speed Asynchronous Transfer Mode (ATM) network can run at up to 155Mbps (megabits per second). Even with such high-speed networks, transmitting video at 41.9Mbps is impractical because there may be existing high-priority applications that must use the network. It is safe to say that sending uncompressed video over a limited-bandwidth network is infeasible. This is where video compression can help. There are methods of video compression that can reduce bandwidth requirements by a significant margin. However, compression usually leads to a data stream whose bit rate is inconsistent over time. This will be elaborated further in the next section.
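The arithmetic behind these figures can be checked with a short sketch. It takes the 300MB-per-minute storage figure quoted above as given, and it assumes that one MB is 2^20 bytes and that Mbps means 10^6 bits per second. The function names are illustrative only.

```python
BITS_PER_BYTE = 8
BYTES_PER_MB = 2 ** 20  # assumption: 1 MB = 1,048,576 bytes


def required_bitrate_bps(size_mb, duration_s):
    """Bit rate needed to deliver size_mb megabytes within duration_s seconds."""
    return size_mb * BYTES_PER_MB * BITS_PER_BYTE / duration_s


def transfer_time_s(size_mb, link_bps):
    """Time needed to push size_mb megabytes through a link of link_bps bits/s."""
    return size_mb * BYTES_PER_MB * BITS_PER_BYTE / link_bps


if __name__ == "__main__":
    rate = required_bitrate_bps(300, 60)
    print(f"required rate: {rate / 1e6:.1f} Mbps")
    hours = transfer_time_s(300, 28_800) / 3600
    print(f"time over a 28.8Kbps modem: {hours:.1f} hours")
```

Under these assumptions the one-minute clip needs about 41.9Mbps, and downloading the same clip over a 28.8Kbps modem would take roughly a day.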
The third factor that influences the performance of a real-time
multimedia network application is the occasional loss of
data on the network. Typically, lost data is recovered
through retransmission from its source. Given the network's
latency, however, the multiple retransmissions needed to guarantee the
arrival of all data may cause significant delay and defeat the
purpose of a real-time transmission.
Despite flaws in the network architecture, real-time video transmission has already become a reality. Newer video conferencing tools are continuously being developed and improved, and solutions are being produced to provide better quality of service for real-time video transmission. Commercial organizations such as Real Networks (formerly known as Progressive Networks) have made great strides in developing transmission methods for low-bandwidth video that require 36,000bps or less, even after taking the latency issue into account. Transmission of compressed, high-bandwidth (1Mbps) video such as MPEG, on the other hand, still faces implementation problems, even though MPEG video would ideally provide a better quality of display.
A successful implementation of real-time high-bandwidth video transmission can have great payoffs. A proposed application for high-bandwidth video transmission is Video-on-Demand (VoD). VoD is envisioned as an electronic video rental store. A user would be free to choose from a large collection of videos at home using the television set's remote control, and the video would start playing almost immediately. VCR capabilities such as play, fast forward, rewind, stop, and pause are also features envisioned for VoD. Needless to say, this would eliminate the trip to the video rental store.
Video and audio streams are continuous media. They are time-dependent. Ideally, a real-time network video system should be capable of transmitting a continuous flow of live video data over the network in the form of packets at a constant and periodic rate. Packets are small, fixed-size chunks into which the original data is fragmented and from which it can be reassembled at the destination. Because of the time dependencies inherent in the video stream, each video packet is associated with a deadline (calculated after accounting for network latency, the time it takes to travel from source to destination). When a packet does not reach its destination by its deadline, it is considered lost. This real-time system must, therefore, know how to propagate the video packets from source to destination in a timely manner. Once the network's unpredictable latency and lossiness are taken into consideration, the task of developing a video system becomes interesting: not only do the packets have to arrive on schedule, but a solution must also be found for the problems that arise when packets are delayed or lost.
A video stream is a sequence of pictures intended to be displayed in order at a constant rate. It requires extremely high bandwidth if transmitted in its raw form. To save on bandwidth, compression methods can be used. The method of compression used in this thesis is called MPEG. A much more detailed discussion of MPEG can be found in Chapter 3.
The MPEG video stream is characterized by a variable encoding
bitrate (VBR). That is, some pictures in the sequence are more
compressed than the rest because MPEG uses three types of encoding
schemes. This leads to a highly bursty data stream, where the
number of bits per second in the MPEG stream varies over time.
If a real-time system uses MPEG video as its medium of communication,
the VBR characteristic of MPEG complicates the scheduling of packets.
Because each picture in the stream is compressed with one of the three
different encoding schemes, the pictures will not have an equal
number of bits. Recall from
above that the video stream will be transported in the form of
packets. Because of the different encodings, some pictures will
be too large to fit into one packet and will be fragmented accordingly.
Transmitting these packets at a constant rate would imply
that pictures arrive at irregular intervals, because a large
picture takes longer to deliver than a small one. However,
each picture has to be decoded and displayed at a constant rate
as well. Hence, the synchronization of packet transmission and
video display should be given top priority when developing this
real-time system.
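The timing mismatch described above can be illustrated with a small sketch. The picture sizes, the 1024-byte packet payload, the 10 ms packet interval, and the function names are hypothetical values chosen for illustration; they are not measurements from an actual MPEG stream.

```python
import math

PACKET_PAYLOAD = 1024  # hypothetical fixed payload size in bytes


def packets_per_picture(picture_sizes):
    """Number of fixed-size packets needed to carry each picture."""
    return [math.ceil(size / PACKET_PAYLOAD) for size in picture_sizes]


def completion_times_ms(picture_sizes, packet_interval_ms):
    """Time at which the last packet of each picture has been sent,
    when packets leave the sender at a constant rate."""
    times, sent = [], 0
    for count in packets_per_picture(picture_sizes):
        sent += count
        times.append(sent * packet_interval_ms)
    return times


if __name__ == "__main__":
    # Hypothetical sizes in bytes: one large picture followed by smaller ones.
    sizes = [9000, 3000, 1000, 1000, 3000, 1000]
    print(packets_per_picture(sizes))      # packets needed per picture
    print(completion_times_ms(sizes, 10))  # completion time of each picture
```

With these values the first picture is complete after 90 ms, while the later pictures complete only 10 to 30 ms apart, even though the display consumes one picture per fixed frame period.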
This thesis has two goals. The first is to describe in more detail the three principal issues mentioned in Section 1.2: unpredictable network latency, data loss and limited bandwidth capacity. The second is to offer solutions for each of them. These solutions have given rise to a system that schedules delivery of video packets according to specified deadlines while simultaneously guaranteeing a smooth real-time video display. It is always important to maintain this "quality of service" (QoS) for the end user.
The implementation of my proposed solutions is based on the Internet architecture. The Internet is organized as a series of layers or levels, each built upon the one below it (a more detailed description is given in Section 2.3 of Chapter 2). Figure 1.1 shows a simplified diagram of the layered architecture. The purpose of each layer is to offer certain services to the layer above it. The real-time video system application resides on the highest level. The layer directly below it is the transport layer, which provides two classes of service that an application can choose from: Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).
TCP is a reliable protocol that allows a series of bytes (data) originating on one machine to be delivered without error to any other machine on the Internet. It fragments the incoming data into a sequence of packets and passes them to the layer below for delivery. At the destination, the receiving entity reassembles the packets into the original data. TCP also handles flow control to prevent a fast sender from choking a slow receiver with more packets than it can handle. All in all, these mechanisms are costly for time-critical applications.
UDP, on the other hand, is an unreliable protocol that is
more suitable for applications that do not want TCP's packet sequencing
or flow control, and for which prompt, one-shot delivery matters more
than accurate delivery. UDP has been statistically shown to be the
faster of the two available transport services (refer to Section
2.5 for the statistical report). This speed advantage
makes UDP more suitable for the implementation of the
real-time video system, because UDP does not incur the overhead
that TCP's reliability mechanisms impose. We therefore expect that UDP
will not impede the performance of the real-time system.
The application implementation is based upon a client-server model (see illustration in Figure 1.2). In this model, one or more machines acting as clients can send a request to a server machine for some work to be done. The server then does the work and sends back the reply. In the case of the real-time video system, the server's reply to a client is a stream of video packets. The server is the source of the real-time video. The client receives the video packet stream, processes it and displays it on the computer screen.
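The request-and-reply exchange can be sketched over UDP as follows. This is a minimal illustration on the local loopback interface, not the implementation described in Chapter 5. The "PLAY" request message, the four-byte sequence-number header, and all function names are assumptions introduced here.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # hypothetical header: 32-bit sequence number


def serve_once(sock, chunks):
    """Wait for one request, then reply with a stream of numbered packets."""
    try:
        _request, client_addr = sock.recvfrom(1024)
    except socket.timeout:
        return  # no request arrived in time
    for seq, chunk in enumerate(chunks):
        sock.sendto(HEADER.pack(seq) + chunk, client_addr)


def request_stream(server_addr, expected, timeout_s=1.0):
    """Send a request and collect up to `expected` packets by sequence number."""
    received = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.sendto(b"PLAY", server_addr)
        try:
            while len(received) < expected:
                data, _ = sock.recvfrom(2048)
                (seq,) = HEADER.unpack_from(data)
                received[seq] = data[HEADER.size:]
        except socket.timeout:
            pass  # UDP gives no delivery guarantee; keep what arrived
    return received


def demo():
    """Run server and client in one process on the loopback interface."""
    chunks = [b"picture-0", b"picture-1", b"picture-2"]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server:
        server.settimeout(2.0)
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        worker = threading.Thread(target=serve_once, args=(server, chunks))
        worker.start()
        packets = request_stream(server.getsockname(), len(chunks))
        worker.join()
    return packets


if __name__ == "__main__":
    print(demo())
```

Because UDP itself neither orders packets nor reports losses, the sketch carries a sequence number in each packet so that the client can tell which packets arrived.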
A real-time video system must observe two deadlines on the receiving end (client). The first is the arrival deadline. The second is the playout deadline. As already implied, the time constraint is the primary concern of any real-time system, and this system is no different. A packet is considered delayed if it does not arrive by its arrival deadline. In addition, the system has to tolerate or overcome the network's potential for data loss or delay while adhering to the imposed time constraint. Packet loss can, however, be reduced using certain recovery mechanisms. This thesis looks specifically at the retransmission of lost packets as its method of recovery. If an expected packet does not arrive by its arrival deadline, the receiver can request an exact copy of that packet from the sender, provided there is still sufficient time left before the playout deadline. The time gap between the two deadlines plays an important role in the retransmission policy: the bigger the gap, the better the chances of recovery. This implies that a deliberate startup delay of the video playout is needed to create the time gap. Video packets therefore need to be stored, or buffered, on the receiving end before they are processed for display. How large the buffer can be depends on the time gap (or playout delay), which in turn is limited by the perceptual tolerance of the person waiting to see the video; people will generally not tolerate long waiting periods. In interactive applications, this limit is about 200ms (Brady).
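The role of the two deadlines can be sketched as a simple decision rule. The sketch assumes that "sufficient time" means an estimated round-trip time still fits before the playout deadline; that criterion, the function names, and the numeric values in the example are illustrative assumptions rather than the policy developed later in this thesis.

```python
def playout_deadline_ms(frame_index, frame_period_ms, startup_delay_ms):
    """Playout deadline of a frame, measured from the start of reception."""
    return startup_delay_ms + frame_index * frame_period_ms


def should_request_retransmission(now_ms, arrival_deadline_ms,
                                  playout_deadline, round_trip_ms):
    """Decide whether asking the sender for a missing packet is worthwhile.

    A request is made only if the arrival deadline has passed and a resent
    copy could still arrive before the playout deadline (assumed criterion).
    """
    if now_ms < arrival_deadline_ms:
        return False  # the packet is not late yet
    return now_ms + round_trip_ms <= playout_deadline


if __name__ == "__main__":
    # Hypothetical values: 40 ms frame period, 200 ms startup delay,
    # arrival deadline 100 ms before playout.
    playout = playout_deadline_ms(0, 40, 200)
    print(should_request_retransmission(100, 100, playout, 80))   # fits
    print(should_request_retransmission(100, 100, playout, 120))  # too late
```

A larger startup delay pushes every playout deadline later, which widens the gap and lets the rule accept retransmissions over slower paths.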
Another advantage of this client-side buffer is that it can be
exploited to smooth out the video playout. Buffering allows the
system to accumulate video packets before they are processed into
a series of images that are then displayed at a constant rate.
The next two chapters will introduce several important concepts related to computer networks and MPEG compression in further detail. Chapter 2 describes the architecture of the existing Internet model and relates the fundamentals of computer networks to this thesis work. Chapter 3 introduces the reader to MPEG compression and explains how the compression algorithms work.
Chapter 4 discusses relevant research done by others on real-time video transmission. Several of the works discussed are important, as they form the foundation of my work. Chapter 5 contains the design and implementation of the real-time video streaming application. Each component of the complete system is discussed in detail, as are the programming techniques used.
Chapter 6 summarizes the key results of my study and highlights some important aspects of the research. Future research ideas are also discussed in this chapter.