Research


The Multimedia Communications Laboratory conducts research at the intersection of Content Processing, Computing and Communications. In addressing this rich opportunity, we emphasize academic rigor, industry guidance and a pragmatic view toward creating impact.


Content Processing

"Mixed Initiative Multimedia for Mobile Devices: Design of a Semantically Relevant & Low Latency System for News Video Recommendations" Abstract by Jeannie Lee

Mobile devices have inherent resource constraints such as limited network bandwidth and small screen size. To facilitate access to news video on mobile devices, a coordinated design approach is taken, considering various system perspectives. The goal is to provide a cognitively palatable stream of videos and a seamless and low latency user experience through the use of an adaptive mixed-initiative interface to solicit user relevance feedback, and content retrieval integrated with client-side video buffering and pre-fetching. These various components are otherwise usually considered independently in the system design. The experiments suggest that this approach is helpful for recommending news video content on a mobile device, and areas for future investigation are outlined.

 
"Image Compression to Enhance Clinical Diagnosis and Workflow in Telepathology" Abstract by Saunya Williams

Telepathology consists of a digital environment used for managing, interpreting, sharing, and transmitting pathological information to a remote site via a telecommunications link. While the technology for telepathology has been available for several years, it has yet to be adopted as commonplace among pathologists. Several challenges hinder the widespread application of telepathology, such as diagnostic accuracy, patient data security, medical liability, and image quality. The size of a digitized glass slide specimen is quite significant (up to 25 megapixels), which makes image compression important. Typically, JPEG compresses at ratios of 30:1 to 50:1, with defects varying from small to moderate. The peak signal-to-noise ratio (PSNR) is commonly used in image processing to measure the quality of a compressed image relative to its original. Without a clear standard for digital imaging within telepathology, image quality lacks a metric in terms of diagnostic losslessness and remains vulnerable to subjectivity.

This project includes a variety of digital slides provided by Emory University that are compressed using JPEG. The research on image compression will facilitate the development of criteria to ensure diagnostic accuracy. These findings will be invaluable in helping to increase the application of telepathology and the establishment of a telepathology network within the Emory healthcare system.

 
"Objective Measurement of Transcoded Video Quality in Mobile Applications" Abstract by Ramanathan Palaniappan

As wireless standards evolve to create a globally applicable third generation (3G) mobile phone system specification, there is significant interest in the user-perceived quality of the multimedia services supported in mobile applications. Hence, the long-standing practice of assessing video quality by conventional reference-based objective measurements needs to be replaced by more widely applicable zero-reference measurements. Such metrics remove the need to access the original reference video to assess quality, which reflects the most practical scenarios in the end-to-end video distribution chain. In addition, these zero-reference metrics need to have a high degree of correlation with subjective measurements. One example of such a technique is the AVQ meter, developed at Georgia Tech and VQLink, which has been shown to provide accurate estimates of subjectively evaluated Mean Time Between Failures (visible artifacts).

The 3GPP standard defines the multimedia format suitable for 3G mobile phones. This container format, called 3GP, is a simplified version of MPEG-4 Part 14 (MP4) designed to decrease storage and bandwidth requirements in order to accommodate mobile phones. It supports the most recent advanced video codecs, namely MPEG-4 Part 2 Visual and MPEG-4 Part 10, more commonly referred to as H.264. Since most of the video content currently used in applications such as digital TV and DVD is available as MPEG-2 streams, there is a broad need for transcoding to translate MPEG-2 streams into lower bit rate MPEG content. The subjective quality of the transcoded video is an important dimension of transcoder performance, and our work addresses this quantitatively.

In this work, we evaluate the performance of a zero-reference metric used for objective measurement of the quality of video that has been transcoded at various bit rates to suit the needs of the 3G mobile environment. This zero-reference metric (AVQ) has been shown to exhibit a high correlation with Mean Time Between Failures (MTBF), a functional measure of video quality representing how often a typical viewer observes a noticeable visual error. This work reports on the performance of transcoders with MPEG-2 video at 512 kbps as input and 192, 256 and 384 kbps versions of MPEG-4 Part 2 Visual and MPEG-4 Part 10 (H.264) as outputs. We use AVQ-estimated MTBF and PSNR as quality metrics. Our results show that MTBF better reflects subtleties such as lowered visual quality during significant motion. It also provides a more descriptive spread of quality in the 192-384 kbps range of transcoder outputs.

 
"Telepathology Research - Subjective Evaluation and Comparison of Compressed Image Quality" Abstract by Sourabh Khire
JPEG 2000 and JPEG are ISO/ITU-T standards for still image coding. Both standards support lossy compression, i.e. they accept some loss of information in order to achieve higher compression. The effect of such lossy compression on the quality of the original image can be quantified using simple measures such as Mean Square Error (MSE) or Peak Signal-to-Noise Ratio (PSNR). However, PSNR and MSE do not always correlate well with the visual quality of the compressed images. It is therefore of great value to gather information about the fidelity of compressed images by conducting experiments for subjective evaluation of image quality. One such subjective test was conducted at MMC to evaluate and compare images compressed using the JPEG and JPEG 2000 algorithms. The image database for the test consisted of medical and non-medical images compressed using JPEG and JPEG 2000 at bitrates between 0.2 bits per pixel (bpp) and 1 bpp. The results indicate that, visually, JPEG 2000 holds some advantage over JPEG at lower bitrates (less than 0.6 bpp), but this perceptual difference tends to vanish at higher bitrates.
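
The objective measures named in this abstract follow directly from their standard definitions. The snippet below is a minimal sketch, assuming 8-bit images flattened into lists of pixel values; the function names are hypothetical and the code is not taken from the MMC study.

```python
import math


def mse(original, compressed):
    """Mean square error between two equally sized pixel sequences."""
    if len(original) != len(compressed) or not original:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)


def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, compressed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


def bits_per_pixel(compressed_bytes, width, height):
    """Bitrate of a compressed image expressed in bits per pixel (bpp)."""
    return 8 * compressed_bytes / (width * height)
```

For example, a 512 x 512 image stored in 32768 bytes has a bitrate of 1 bpp, the upper end of the range used in the test. A high PSNR alone does not guarantee good perceived quality, which is the motivation for the subjective test.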
 
"Delay Bound Rich Image Delivery over WLANs" Abstract by Shira Krishnan
Today's globally distributed teams that seek the convenience of a wireless workspace have created an increased need for collaboration through image transmission. These scenarios require both acceptable image quality and acceptable transmission delay, while using networks that also carry other services such as voice, data and video. This is especially challenging when High-Definition images requiring lossless transmission are encountered. In this work, an attempt is made to identify the wireless network system that is most suited for rich media delivery within set interactive timeframes, using a network designed to handle other forms of traffic as well. The traffic of interest is High-Density Images that have a delay bound acceptable to the users. Three different sizes of images with three degrees of compression (raw format, mathematically lossless compressed form and diagnostically lossless compressed form) are studied while making recommendations.
 
"Mean Time Between Visible Artifacts in Visual Communications" Abstract by Nitin Suresh
As digital communication of television content becomes more pervasive, and as networks supporting such communication become increasingly diverse, the long-standing problem of assessing video quality by objective measurements becomes particularly important. Content owners as well as content distributors stand to benefit from rapid objective measurements that correlate well with subjective assessments and, further, do not depend on the availability of the original reference video. This thesis investigates different techniques of subjective and objective video evaluation. Our research recommends a functional quality metric called Mean Time Between Failures (MTBF) [1], where failure refers to video artifacts deemed to be perceptually noticeable, and investigates objective measurements that correlate well with subjective evaluations of MTBF. In this work, the subjective tests for evaluating MTBF involve different video clips from the Video Quality Experts Group (VQEG [2]) encoded in MPEG-2 format at bit rates in the range of 1.5 to 5 Mbps, and subject to packet losses in the range of 0.1 to 2.0%. Each of the test clips is 140 seconds in length, and a diverse viewer pool of 30 subjects was used. The usefulness of some existing objective metrics is determined by noting their correlation with MTBF. The metrics studied include full-reference, reduced-reference and no-reference objective metrics: PSNR, Just Noticeable Difference metric (JND) [3], Spatial Temporal Join Metric (STJM) [4] and Blockiness metric (BLK) [5]. The research also includes experimentation with network-induced artifacts, and a study of statistical methods for correlating candidate objective measurements with the subjective metric [6]. The statistical significance and spread properties of the correlations are studied, and a comparison of subjective MTBF with the existing subjective measure of MOS is performed.
These results suggest that MTBF has a direct and predictable relationship with MOS, and that the two have similar variations across different viewers when computed over any clip. The research is particularly concerned with the development of new no-reference objective metrics that are easy to compute in real time and that correlate better than current metrics with the intuitively appealing MTBF measure. The approach to obtaining greater subjective relevance has included the study of better spatial-temporal models for noise-masking and test data pooling in video perception.

A new objective metric, the 'Automatic Video Quality' metric (AVQ) [6], is described and shown to be implementable in real time with a high degree of correlation with actual subjective MTBF scores, with the correlation values approaching those of metrics that use full or partial reference. This metric does not need any reference to the original video and, when used to display MPEG-2 streams, calculates and indicates the video quality in terms of MTBF. Certain diagnostics, such as the amount of compression and network artifacts, are also shown.
 
 

"Elastic Algorithms for Region of Interest Video Compression, with Applications to Mobile Telehealth" Abstract by Sira Rao

Video is the most demanding modality from the viewpoints of bandwidth, computational complexity, and resolution. Thus, there has been limited progress in the field of mobile video technology. In the research, the focus is on elastic wireless video technology, and its adaptation to diagnostic application requirements in real-time clinical assessment. It is important and timely to apply wireless video technology to real-time remote diagnosis of emergent medical events. This premise comes from initial successes in telehealth based on wired networks. The enablement of mobility (for the physician and/or the patient) by wireless communication will be a next major step, but this advance will depend on definitive and compelling demonstrations of reliability. Thus, an important goal of the research is to develop a complete methodology that will be embraced by physicians. Acute pediatric asthma has been identified as a domain where this new capability will be highly welcome.

The research uses flexible and interactive algorithms for Region-of-Interest (ROI) processing. ROI processing is a useful approach to achieving the optimal balance in the quality-bandwidth tradeoff characteristic of visual communication services. The notion of ROI has traditionally been used mostly for foreground-background separation in scene rendering and manipulation, and only more recently for variable-quality compression. Even when the latter goal is considered, quality criteria have been ad hoc and at best useful for video conferencing, whereas the medical domain has its own fidelity criteria. The research thus focuses on the design of an elastic ROI-based compression paradigm with medical diagnosis as a central criterion.

The research describes the methodology to achieve elasticity through rate control algorithms at the encoder. An elastic approach is proposed that uses a priori user-specified video quality information, quantifies this information, and incorporates it into the encoder in the form of region-quality mappings. This method is compared to a parametric bit allocation approach that is based on region features and a set of tuning weights. A number of videos of actual patients were filmed and used as the video database for the developed algorithms. In testing the elastic and parametric algorithms, both objective measures, in the form of Peak Signal-to-Noise Ratio (PSNR), and subjective evaluations were used.

 
"Techniques for Robust Communications of One-Way Video" Abstract by Seong Hwan Jang
The primary objective of the proposed research is to develop video coding systems for one-way transmission over error-prone networks, such as Video on Demand (VOD), digital TV broadcasting and video messaging. In this proposal, we investigate efficient video coding and transmission algorithms that exploit the characteristics of one-way video transmission systems.

Recently, there has been a great demand for high quality visual services. In particular, one-way digital video services using pre-encoded video bit streams are becoming widely available via the Internet or wireless channels. However, due to bandwidth constraints and transmission errors, the transmitted and decoded video quality is still inadequate. The delay constraints of interactive real-time video applications such as video conferencing make it even more difficult to effectively encode and transmit the video signal. On the other hand, one-way video transmission systems offer some unique conditions for both source coding and transmission. In one-way communication, the encoder is allowed much more coding delay, and can take advantage of this for effective coding and transmission.

Video encoders have to operate within fixed bandwidth limitations unless the network provides variable bit rate (VBR) transmission. The output bit rate of the encoder increases when the video sequence contains a large amount of motion or texture, and decreases when the sequence contains stationary scenes. An appropriate rate control algorithm must trim the excess output bit rate to meet the bandwidth constraints. Higher compression ratios are possible at the cost of an imperfect representation of the video source at the decoder. Therefore, at the same bit rate, the video quality can fluctuate from one sequence to another. In one-way video applications, the encoder can access a limited number of future frames as well as the current frame, for efficient temporal bit allocation and constant quality. The optimal bit allocation problem can be solved with a Lagrangian multiplier-based operational rate-distortion (R-D) framework, at the cost of some coding delay [4-8]. However, many popular video coding schemes involve dependent coding units, as in motion compensation. Therefore, the set of available R-D operating points of a predicted frame depends on the R-D points of its reference frame. The complexity of solving the optimal bit allocation problem using a dependent R-D tree increases exponentially with the depth of the dependency tree. To ease this computational complexity, a path-pruning or dynamic programming algorithm can be used to avoid the need to grow all the R-D data, while retaining optimality.
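
To make the Lagrangian idea concrete, the sketch below handles only the simpler independent case, in which each frame has its own fixed list of (rate, distortion) operating points. It is an illustration under that assumption, not the proposal's algorithm: the dependent case described above needs the tree search with pruning or dynamic programming. The function name, the bisection bounds and the sample numbers are all hypothetical.

```python
def allocate_bits(frames, rate_budget, iterations=50, max_lambda=1e6):
    """Pick one (rate, distortion) point per frame within a total rate budget.

    For a fixed multiplier lam, each frame independently minimizes
    distortion + lam * rate. Bisection then looks for the smallest lam
    whose total rate still fits the budget.
    """
    def pick(lam):
        return [min(points, key=lambda p: p[1] + lam * p[0]) for points in frames]

    def total_rate(choice):
        return sum(rate for rate, _ in choice)

    best = pick(max_lambda)  # a huge lam selects the cheapest points
    if total_rate(best) > rate_budget:
        raise ValueError("budget is below the minimum achievable rate")

    low, high = 0.0, max_lambda
    for _ in range(iterations):
        mid = (low + high) / 2
        choice = pick(mid)
        if total_rate(choice) <= rate_budget:
            best, high = choice, mid  # feasible: try a smaller penalty on rate
        else:
            low = mid                 # infeasible: penalize rate more
    return best


# Illustrative numbers only: frame 0 is "active", frame 1 is nearly static.
frames = [
    [(100, 50.0), (200, 20.0), (300, 10.0)],
    [(100, 30.0), (200, 25.0), (300, 24.0)],
]
allocation = allocate_bits(frames, rate_budget=400)
```

With these made-up points the search gives the active frame 300 units of rate and the static frame 100, which yields lower total distortion than splitting the budget evenly.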

The output of the encoder is connected to a buffer whose purpose is to even out the fluctuations of the variable rate and to transmit the output bit stream at a constant bit rate (CBR). The range of bit allocation is further constrained to prevent the buffer from overflowing or underflowing. The constraints can be relaxed by increasing the buffer size. However, as the encoder enlarges the buffer for more flexible bit allocation, the initial delay in the decoder buffer must increase, too. If we use a pseudo-VBR scheme and increase the size of the decoder buffer, we gain more flexibility in buffer constraints and bit allocation without increasing the initial decoder buffer delay. The basic idea of pseudo-VBR is that the encoder can save bits in stationary sequences in order to spend them in active sequences. This scheme is possible in one-way video coding because the encoder can access and observe a limited number of the sequences still to be coded.
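
The buffer constraint can be illustrated with a simple simulation of an encoder buffer drained at a constant rate. This is a sketch under simplifying assumptions (one drain step per frame, fullness clamped at the limits, an initially empty buffer); the function name and the numbers are hypothetical and do not come from the proposal.

```python
def simulate_encoder_buffer(frame_bits, channel_bits_per_frame, buffer_size):
    """Track encoder buffer fullness for a given per-frame bit allocation.

    Returns the fullness after each frame together with the indices of
    frames that caused overflow (too many bits) or underflow (the CBR
    channel had nothing left to send).
    """
    fullness = 0
    trace, overflows, underflows = [], [], []
    for index, bits in enumerate(frame_bits):
        fullness += bits                    # encoder writes the coded frame
        if fullness > buffer_size:
            overflows.append(index)
            fullness = buffer_size          # excess bits would be lost
        fullness -= channel_bits_per_frame  # channel drains at constant rate
        if fullness < 0:
            underflows.append(index)
            fullness = 0                    # channel idles (stuffing bits)
        trace.append(fullness)
    return trace, overflows, underflows


allocation = [300, 100, 100, 500]  # illustrative bits per frame
small = simulate_encoder_buffer(allocation, 200, buffer_size=400)
large = simulate_encoder_buffer(allocation, 200, buffer_size=600)
```

In this toy run the burst of 500 bits overflows the 400-bit buffer but fits in the 600-bit one, which is the sense in which a larger buffer relaxes the bit allocation constraints.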

Even when video quality is optimized under the given bit rate constraints, the decoded quality can be severely damaged by transmission errors. An error concealment scheme can be used to visually hide the damaged area, but it is not effective at packet error rates of more than 3%, since the error region propagates both spatially and temporally [15–20]. Error propagation can be minimized by an error-resilient coding scheme. The basic idea of error-resilient coding is to prevent spatial error propagation by inserting marker bits and temporal error propagation by INTRA mode updates. Error-resilient coding is not optimal in terms of coding efficiency in error-free environments. The error-resilient coding scheme can also differ depending on whether network or channel feedback information is available. Therefore, we need to distinguish video coding schemes for error-free, error-prone with feedback, and error-prone without feedback environments. To dynamically change the coding mode of a pre-encoded bit stream in response to feedback, a transcoding scheme can be used. The main issue in transcoding is to minimize the computational complexity while minimizing quality degradation.

Multiple Description Coding (MDC) is a coding technique that generates multiple independently coded bit streams to be transmitted over separate paths. Even if only one or a few of the description bit streams are successfully received, the decoder can still reconstruct the video at a lower but acceptable quality. Object-oriented source-channel coding is another approach to improving visual quality and error resilience. The main idea of the object-oriented approach is to allocate resources differently to objects and non-objects, since human attention is usually on one dominant object. Object-oriented coding approaches are characterized by computationally intensive algorithms for segmenting objects, which is acceptable in one-way video. The approach can also provide adaptivity to the semantic content of the video.
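
One simple way to picture multiple description coding is temporal splitting: even-numbered frames form one description and odd-numbered frames the other. The sketch below assumes this splitting scheme and a naive frame-repeat concealment; it is only one possible MDC construction, chosen for brevity, and is not the scheme studied in the proposal. All names are hypothetical.

```python
def split_descriptions(frames):
    """Split a frame sequence into two descriptions (even and odd frames)."""
    return frames[0::2], frames[1::2]


def reconstruct(even, odd, total):
    """Rebuild a sequence of `total` frames from whichever descriptions arrived.

    Pass None for a lost description. Missing frames are concealed by
    repeating the nearest earlier received frame (or the first received
    frame when the gap is at the very start).
    """
    if even is None and odd is None:
        raise ValueError("at least one description is required")
    out = [None] * total
    for i, frame in enumerate(even or []):
        out[2 * i] = frame
    for i, frame in enumerate(odd or []):
        out[2 * i + 1] = frame
    last = None
    for i in range(total):
        if out[i] is None:
            out[i] = last  # conceal with the previous frame
        else:
            last = out[i]
    first = next(frame for frame in out if frame is not None)
    return [first if frame is None else frame for frame in out]


frames = list("ABCDE")
even, odd = split_descriptions(frames)
```

Receiving both descriptions restores the full sequence, while receiving only one gives a half frame-rate version with repeated frames: lower quality, but still viewable.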

Therefore, we can improve the visual quality and error resilience of one-way video coding systems through efficient coding schemes that exploit encoding delay. The error resilience scheme in one-way video should differ according to whether feedback information from the channel or network is available.
 

"Distributed Speech Recognition for Mobile Devices " Abstract by Brian Delaney
Mobile wireless devices are a driving force in the computer and communications industry. The demand for tetherless access to data will drive the industry toward smaller but more capable devices. This has already begun with the widespread use of Personal Digital Assistants (PDAs) and cellular telephones. The trend continues toward smaller devices that offer high quality wireless web browsing, multimedia e-mail and messaging services, as well as personal data management (scheduling, contacts, etc.). These pocket-sized devices have small screens and little to no keyboard input, so appropriate use of speech technology can allow users to interact with the system in a natural manner. However, the problems of speech recognition, speech synthesis, and wireless connectivity are far from solved, and the currently fielded solutions have many deficiencies. Integrating each of the technologies into a robust wireless voice user interface is a difficult task given the problems associated with each of the enabling technologies. The high computational demands of multimedia processing applications on digital hardware further complicate the problem.

Given that portable wireless devices are limited in computation, memory size, wireless bandwidth, and battery energy, distributing the speech recognition task across the network is an attractive alternative. Speech recognition can be a computationally demanding application that can easily use all available resources. An in-depth understanding of these issues in the context of a distributed speech recognition system will enable designers of future systems to build more efficient devices and algorithms. In particular, we will study the effects of wireless networking and fading channel characteristics on distributed speech recognition. We will investigate quality of service and energy trade-offs in this context.

 

"Minimum Distortion Data Hiding For Compressed Images" Abstract by Cagatay Candan
In this thesis, we present a new data hiding method for compressed images. The method embeds a given set of digital data into JPEG compressed images at a minimal loss of image quality. The image compression rate and the information embedding rate of the method are variables determined by the user. The method is designed by minimizing the distortion under objective distortion metrics such as the expected mean square error, and its performance is then improved by using subjective metrics based on the human visual system, such as the Just-Noticeable Distortion metric. The distortion performance of the method has been evaluated with subjective tests, and other critical factors, such as the effect of image size and the change in file size after embedding, are examined. We also describe an application of the designed method. The application upgrades the security level of JPEG image transmission without any modification to the existing standard or infrastructure. With this upgrade, it is possible to verify the identity of the sender and detect the tampering locations in the received image.

In this chapter, we present a description of data hiding, introduce the problem examined, and discuss the motivation for this line of research. The motivation is described with a complicated video application to illustrate all of the requirements for similar applications. We have chosen to discuss this example explicitly, since most of the data hiding literature is not designed for this kind of application; instead, it is targeted towards copyright protection applications. The copyright application aims to embed imperceptible data into a multimedia signal in such a way that the embedded data is guaranteed to survive deliberate attacks by hackers. For this application field, the level of distortion of the multimedia signal is a secondary factor in comparison with robustness. The application area of our minimally distortive method is communications. The previously mentioned standard-compatible upgrade of the security of the JPEG compression system is a good example of such applications. To further elaborate the application range, we discuss a complicated video data broadcasting system. This discussion should pinpoint the necessary requirements or expectations of a data hiding system and illustrate the applicability of this relatively new line of research. In this thesis, we focus on still JPEG images, which are the major sub-component of MPEG-based video systems.

Communications & Networking

Wireless Channel Modeling: Ray Tracing for Propagation Modeling: Abstracts by Junghyuck Jo
Any type of cellular or personal communication system requires careful planning and prediction of signal coverage and interference levels. Unfortunately, this type of in-depth site planning requires tremendous amounts of measured data and trial-and-error testing that can often be prohibitively expensive. Therefore, a huge demand already exists in the wireless industry for the development of accurate propagation prediction techniques, such as site-specific prediction of channel information using ray tracing. The prediction of large-scale path loss has represented the dominant application of site-specific techniques. However, as computerized site information becomes available and as future wireless systems operate with higher bandwidths, the application of deterministic prediction techniques becomes very attractive. Deterministic methods could provide a wireless engineer with any number of channel parameters, including angle-of-arrival, delay spread, fading characteristics, and the complete channel impulse response. Site-specific techniques promise to facilitate the design of wireless modems by replacing test and measurement with the convenience of computer simulations. In fact, the ability of a ray tracing algorithm to estimate the actual wideband channel impulse response with angle-of-arrival data may enable significant advances in many other areas of wireless research that depend on spatial-temporal channel characteristics. These areas include position location, adaptive arrays, smart antennas, diversity, and equalization.
   
Interference Modeling between IEEE 802.11 and Bluetooth Wireless Systems
IEEE 802.11 and Bluetooth radios share common spectrum in the 2.45 GHz ISM band. The issue of coexistence between IEEE 802.11 Direct Sequence Spread Spectrum (DSSS) and Bluetooth radios, with both radio types located within a mixed environment, is studied. This study focuses exclusively on the reliability of the IEEE 802.11 wireless network in the presence of interference from Bluetooth radios. The reverse situation, modeling interference from IEEE 802.11 to Bluetooth, is not analyzed.
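
As a rough illustration of why the two systems interfere, the sketch below estimates how often a frequency-hopping Bluetooth transmission lands inside the passband of one 802.11 DSSS channel. The figures used (79 Bluetooth hop channels of 1 MHz each, a 22 MHz wide DSSS channel, uniformly random and independent hops) are common textbook assumptions and are not taken from the study; the function names are hypothetical, and the real interference model also depends on power levels, timing and traffic load.

```python
def hop_overlap_probability(bt_channels=79, wlan_width_mhz=22, bt_width_mhz=1):
    """Chance that one uniformly random Bluetooth hop falls in the WLAN passband."""
    overlapping_channels = wlan_width_mhz // bt_width_mhz
    return min(1.0, overlapping_channels / bt_channels)


def collision_probability(hops_during_packet, p_overlap=None):
    """Chance that at least one of several independent hops hits the WLAN channel."""
    if p_overlap is None:
        p_overlap = hop_overlap_probability()
    return 1.0 - (1.0 - p_overlap) ** hops_during_packet
```

Under these assumptions a single hop overlaps the 802.11 channel with probability 22/79, roughly 28%, and the chance of at least one overlap grows with the number of hops that occur while an 802.11 packet is on the air.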
 
Crosstalk Cancellation in DSL Systems: Abstract by Roberto A. Uzcategui
No one would contend that the twisted pair access network does not represent one of the biggest investments made by the telephone companies. The importance of that asset explains why attempts to provide new services have focused on making the most out of the existing infrastructure. Digital subscriber line (DSL) systems transport information at broadband speeds over telephone cables, thus giving telephone companies the opportunity to compete with cable, wireless and satellite service providers for a share of the emerging broadband market without needing to overhaul their outside plant.

Unfortunately, because it was designed for voice-grade communications, the physical makeup of telephone cables gives rise to a series of impairments at broadband speeds. One of them, whose impact is minor at voice-band frequencies but potentially crippling at wider bandwidths, is the capacitive and inductive coupling between the signals transported in different subscriber loops known as crosstalk. Crosstalk may be produced by a transmitter located at either the same (NEXT) or the opposite end (FEXT) of the cable relative to the position of the disturbed receiver.

For research purposes, crosstalk is considered an additive phenomenon in which each individual interfering signal is produced by passing the information-bearing signals of the other users through the appropriate coupling transfer functions. Since the resulting model is very similar to the multi-access channel model employed in wireless multiuser communications, it is intuitively appealing to try to mitigate crosstalk with interference cancellation techniques similar to the ones used in wireless communications. Multiuser detection is one such technique.
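
The additive model described above can be written compactly in the frequency domain. The notation below is an assumption introduced for illustration and does not appear in the original abstract: for the receiver on loop k, X_j(f) is the signal transmitted by user j, H_k(f) is the direct transfer function of loop k, G_{kj}(f) is the coupling transfer function from loop j into loop k, and N_k(f) is background noise.

```latex
Y_k(f) = H_k(f)\,X_k(f) + \sum_{j \neq k} G_{kj}(f)\,X_j(f) + N_k(f)
```

Stacking the outputs of all loops gives a matrix channel of the same form as the multi-access model used in wireless multiuser communications, which is why multiuser detection is a natural candidate.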
 

Multimedia Transport

"Multimedia Communications Over Wireless Home Networks" Abstract by Babak Firoozbakhsh
In recent years, there has been a lot of interest in wireless residential networks for connecting the many devices inside the house. Ideally, the devices can join the network in a plug-and-play fashion and can easily move within the house. In addition to sharing the properties of indoor wireless LANs, wireless home networks pose many unique challenges. They have to support a large number of applications with different characteristics and different needs. The QoS requirements range from low delay, high bandwidth to high delay, low bandwidth communication. In addition, wireless home networks must be scalable, flexible, safe, secure, easy to use, inexpensive, power efficient, and tolerant of other (wireless) technologies within the home.

In our research, we have focused on a wide range of issues spanning wireless home networks. We started with a novel demonstration of the communication of vital signs from the body over WaveLAN wireless networks. We have also studied the feasibility of UWB in an indoor wireless area subjected to interference from IEEE 802.11a. We are currently focusing on conducting measurements in the Georgia Tech Residential Laboratory (Aware Home), as well as developing an optimized UWB medium access control (MAC) protocol specifically designed for high data rate wireless home networks.

Computing
 
"A Network-Aware Semantics-Sensitive Image Retrieval System" Abstract by Janghyun Yoon
While significant progress has been made in content-based image retrieval (CBIR) over the last several years, less work has addressed issues related to overall system design from a networked system viewpoint. Since most image retrieval services are requested by remote users, possibly mobile users with limited device resources, the network environments of CBIR systems will affect the overall performance of the image retrieval process. Currently, this process is optimized and well-tuned on stand-alone workstations with traditional performance metrics such as recall and precision. These metrics do not guarantee a satisfactory user experience in a general network scenario.

In this dissertation, we investigate how to enhance semantic relevancy in the retrieval process by semantic feature extraction and relevance feedback. In addition, we propose prefetching and scalable image delivery (progressive and region-of-interest based delivery of images) to reduce network latency in the retrieval process and to adapt the retrieval process to the network bandwidth and the capabilities of the user device (especially the size of the display screen). However, these two goals, the maximization of semantic relevancy and the minimization of retrieval latency, sometimes conflict with each other. If the user's resources are limited, a slight sacrifice of semantic relevancy can improve the overall performance of image retrieval. As such, we investigate the joint optimization of these two goals as a specific goal of this dissertation.
 

The MMC program is part of the School of ECE at Georgia Tech, with cross-disciplinary collaborations, particularly with the College of Computing. Industry connections are in the context of advanced telecommunications initiatives such as the Georgia Tech Broadband Institute and the Yamacraw and GCATT programs of the Georgia Research Alliance. We are also involved in specific student internships with industry partners such as Alcatel-Lucent, AT&T, Cox Communications, Cisco, HP Labs, NCR, Echostar, EG Technology, Tellabs and VQLink.



More information or problems with this page? Please contact Gordon L. Stuber.

This page was last modified January 2009.
Copyright (C) 2009 MMC, Georgia Tech. All Rights Reserved.
