A DASH-Based Adaptive Multiple Sensorial Content Delivery Solution for Improved User Quality of Experience

An increasing number of researchers are focusing on emerging communication technologies that enrich user-perceived quality of experience by involving the visual, auditory, tactile, olfactory, gustatory, and other senses. However, there are multiple challenges related to using multiple sensorial media (i.e., mulsemedia), including synchronization with the traditional multimedia content and delivery over diverse network environments. This paper proposes MulseDASH, a novel multiple sensorial media content delivery solution based on the Dynamic Adaptive Streaming over HTTP (DASH) standard. MulseDASH is described and evaluated in a real test-bed in terms of the effectiveness of its adaptive streaming and synchronization mechanisms. Extensive testing involving both network emulation and subjective assessment experiments shows that MulseDASH adjusts streaming in real time to match network conditions and improves user quality of experience.


I. INTRODUCTION
Current interactive rich media technologies (e.g., online multimedia streaming, social media tools, Virtual Reality (VR) and Augmented Reality (AR) applications) have significantly narrowed the distances between people, reformed the way people communicate, and provided a more immersive environment for people to experience. According to the most recent Cisco Internet traffic statistics report [1], global Internet video traffic, which accounted for 75% of all IP data traffic in 2017, is expected to reach 82% in 2022. Moreover, the potential value of VR/AR technology is no longer a secret, since numerous industry analyst reports have forecast enormous growth for the rich media content exchange market, as these technologies fundamentally enhance the way humans interact with the digital and physical worlds [2]. The Cisco data analytics report also indicates that VR/AR traffic will increase more than 18-fold between 2017 and 2022 [1], and that the number of VR/AR headset devices will grow from around 20 million in 2017 to nearly 100 million by 2022 [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Wenchi Cheng.
Increasingly, rich media interaction involves more than just audio and video content. The multimedia specialist and inventor Morton L. Heilig produced the Sensorama Simulator, the first VR/AR machine, which offered users a multiple sensorial experience back in 1961 [4]. Several decades later, 26 multimedia scientists gathered at the ACM SIGMM Conference in 2005, publicly discussed future directions in multimedia research, and made highly challenging proposals. They focused on making ''interactions with remote people and environments nearly the same as interactions with local people and environments'' and on exploring other media content types alongside audio and video [5]. Figure 1 illustrates haptic, olfaction (smell), airflow (wind) and other potential sensorial inputs which could be used by researchers to enhance the way people interact remotely with equipment, machines, computers and other humans [5].
Since the term mulsemedia, derived from multiple sensorial media, was first introduced in 2010 [6], a large number of pioneering research activities have involved mulsemedia and user interaction with multi-sensorial content. For instance, research works have focused on user experience optimization by employing olfaction (i.e., odor, smell) [7]-[9], airflow (i.e., wind effect) [10], [11], tactile interaction (i.e., kinesthetic, haptic, vibration, etc.) [12]-[14] and even gustatory stimuli (i.e., targeting taste) [15], [16]. However, most of the research mentioned above is off-line and has involved people interacting with local applications only. Additionally, most of these works address a single novel sensorial stimulus in addition to audio and video, instead of designing solutions for interaction with rich media content involving multiple stimuli. Finally, even fewer researchers have focused on mulsemedia content delivery-related aspects [17].
At the same time, diverse studies indicate a clear trend of user preference and industry push towards rich media content (e.g., ultra-high definition video, VR/AR, omni-directional video, etc.), including delivery solutions [1]. Such content has high resolution, which positively impacts user experience levels, but it also has high bitrate and low latency delivery requirements [18]. Unfortunately, despite the efforts put into network advancements, including Fifth Generation (5G) technologies in general and 5G Tactile Internet research and development in particular [19], there is still a need for innovative delivery solutions. Such solutions should support rich media exchange over the existing networks in order to support high user quality of experience (QoE). In relation to mulsemedia delivery, the challenge is to balance the need for inclusion of multi-sensorial media components, and the consequent higher bitrate, with latency and network bandwidth delivery-related requirements [17]. Enabling synchronization between diverse mulsemedia components and video during their delivery is also challenging [20], [21]. Adaptive solutions which adjust content delivery characteristics, and ultimately the transmitted bitrate, to match network delivery conditions or device properties have had highly positive results, especially in terms of increasing user QoE [22], [23]. Among these solutions, those based on the latest standard which supports multimedia delivery adaptation, Dynamic Adaptive Streaming over HTTP (DASH) [24], have been very successful.
This paper introduces MulseDASH, a novel DASH-based adaptive delivery solution for mulsemedia content which increases user QoE. In its dynamic adjustment of multiple sensorial content characteristics, MulseDASH performs an innovative trade-off between video quality and the presence of diverse sensorial components. This trade-off relies on the fact that the presence of other sensorial components has a masking effect on potential video quality variations, as noted for audio [25]. The paper describes the adaptive mulsemedia delivery architecture, presents the principles of MulseDASH and introduces its design. MulseDASH was evaluated using a real-life implementation and a real test-bed. Testing results show how user-perceived QoE increases when using MulseDASH in comparison with classic approaches.
This paper is organized as follows. Section II discusses studies related to multiple sensorial media and rich media content delivery. Section III introduces the MulseDASH principle and framework, whereas Section IV presents the implementation of MulseDASH and its deployment on a real test-bed. The MulseDASH real-life evaluation of user experience is also described. Section V analyzes the results and highlights the outcome of the MulseDASH evaluation. Section VI draws conclusions and presents future work avenues.

II. RELATED WORKS
A. MULSEMEDIA: STATE OF THE ART
There have been many recent advances related to digital content beyond the classic multimedia format, including the introduction of high and ultra-high resolution video, omnidirectional video content and interactive multimedia. Among these highly diverse rich media content types, multiple sensorial media (mulsemedia) has inspired academic and industrial researchers and developers, especially in relation to its potential to increase perceived QoE levels by improving the user's sense of reality. Researchers and designers have mostly focused on finding ways to overcome the many existing challenges in acquiring, storing, displaying and exchanging mulsemedia content, and on proposing solutions to address them.
Although the number of recent proposals involving multiple sensorial technologies is much lower than the number targeting the visual and auditory human senses only, many mulsemedia solutions have been proposed and some have even been deployed, reaching the wider public. For instance, haptic wearable devices were introduced and are used for health recovery,1 haptic gloves are part of VR gaming kits,2 haptic control is employed in robotics,3 gas sensors are deployed for environment monitoring,4 smell displays are used for art design5 and aroma diffusers are employed for interior decoration.6 Next, the most important research outputs related to the human senses7 other than sight and hearing are discussed.

1) TASTE
Currently, the research related to the gustatory sense is lagging behind, as the issues related to taste, including its perception, interpretation, description and replication, are more complex to overcome. However, a super-family of G-protein-coupled receptors was recently found to be responsible for most human tastes (e.g., sweet, sour, salty, bitter, and savory) [26]. Additionally, an artificial lipid membrane-based taste sensor (i.e., electronic tongue) was developed to detect the same tastes similarly to the human tongue.8 Unlike the costly taste sensors, some cheaper interactive taste actuating devices, based on changes in vibration, electric current and temperature, were implemented to stimulate the human tongue and provide a specific taste experience [16], [27].

2) TOUCH
Haptic technologies have been well studied, especially in the context of interaction between humans and machines, and between real and virtual worlds. Haptic solutions support user immersion in VR both as input and feedback, to and from the virtual environment. Employing them could benefit various applications in diverse deployment areas such as medicine, entertainment, education, industry, arts and so on. In general, the overall machine haptic sensory-motor loop contains three major components: sensors, controller and actuators. The controller deploys the strategies or algorithms designed to process the sensing information collected by sensors and makes actuators perform actions, i.e., provide users with a response or feedback [28]. Haptic sensors can be divided into two primary types: tactile sensors for cutaneous perceptual measurement and kinesthetic sensors for modeling force and position. Differing in terms of their piezoresistivity, capacitance, piezoelectricity, temperature or humidity transduction, diverse tactile sensors are currently deployed on wearable human-machine interfaces (e.g., haptic gloves), skin prostheses, strain sensors, blood flow monitors and so on [29]. Regarding kinesthetic sensors, magneto-resistive angle, optical (e.g., Microsoft Kinect9), acoustic and inertial (e.g., gyroscopes) sensors are employed to measure force, velocity and relative placement information [30]. Compared to haptic sensors, haptic actuators are more complex and ingenious, providing a tactile-interactive interface between real and virtual worlds, and between machines and human users. Currently, the haptic actuators working with haptic sensors are based on mechanical structures that offer force or vibration feedback. For example, the iPhone 7 Taptic Engine simulates ''3D'' tactile haptic feedback when users touch the screen. The engine uses a Linear Resonant Actuator (LRA) to generate vibrations [31]. In general, the mechanical structure of force actuators is composed of a power source, a motor and a component for force transmission based on gears, pulleys/belts, oil/air pressure or capstans. Actuator examples include the multiple degree of freedom manipulator Phantom Omni, which is based on a capstan drive [32], and the haptic glove Dexmo, which employs belt and pulley force transfer [33].
6 Muji Aroma Diffuser: http://muji.us/store/ultrasonic-aromadiffuser.html
7 The five human senses are sight (visual), hearing (auditive), smell (olfaction), taste (gustatory) and touch.
8 Insent Taste Sensors: http://www.insent.co.jp/en/products/taste_sensor_index.html
9 Kinect: https://www.xbox.com/en-US/xbox-one/accessories/kinect

3) OLFACTION
Olfaction plays an important role in human daily life, and involves stimulation by odors. Various odors influence human affective states and moods, and facilitate memory retrieval, i.e., recollection of events associated with particular odors [34]. Odor detection is one of the main applications of machine olfaction technologies, which are based on classic chemical gas sensors, optical sensors, chromatography and other spectrometers (e.g., ion, infrared, mass, etc.) [35].
Recently, a novel Artificial Neural Network (ANN)-based pattern recognition system was developed based on optical sensing results, which chemically learns about any changes in the surrounding environment [36]. Most machine olfaction sensing technologies (e.g., electronic noses) are employed in environment monitoring, industrial manufacturing, disease diagnosis and so on. Conversely, olfaction actuators working with VR applications have been widely employed in the fields of entertainment (i.e., gaming), environment decoration and education. For instance, SBi4 v2, an olfaction diffuser produced by Exhalia,10 has been used to assist in relevant olfaction experiments in academic education. However, it is not easy to control the diffusion direction and intensity. A bespoke 3D-printed part was produced to adapt the direction of SBi4 diffusion in [9], and a Surface Acoustic Wave (SAW) device was designed to control the diffusion intensity [37].

4) AIR FLOW
In terms of sensorial input media, air flow refers to a stream of air perceived by humans. However, it has also become an important actuator, as air in motion evokes particular sensations when it touches human skin. Air flow is also often associated with other sensorial inputs, as for instance it carries and diffuses odors. Stand-alone or in conjunction with other sensorial media, air flow enhances the immersive experience of users. Matsukura et al. [10] developed a two-dimensional multisensorial field display device which controls the spatial airflow velocity and odor concentration, and synchronizes the output with the images on the computer screen to improve user experience. An important challenge noted in relation to air flow is controlling the wind display direction. Kulkarni et al. [38] and Nakano et al. [11] have developed a professional large wind display device called the Treadport Active Wind Tunnel (TPAWT) and a portable wind display device based on a matrix of CPU fans, respectively. The former can provide users with any air flow distribution with an infinite number of degrees of freedom (DoF) in a large room, and the latter can create a controllable air flow experience at a short distance. However, most related works have used air flow in conjunction with other media types and have focused mainly on its delivery to users.
[...] [39]. This is due to the growing number of devices and users and their increasing demand for mobile rich media networked applications. The adaptation solutions have also extended their focus, concentrating not only on network delivery, but also on other aspects such as those related to energy efficiency [40], wireless access networks [41], device screen resolutions [42], overall user QoE [43] and service cost [44].
The principles behind diverse research-proposed adaptive delivery methods have been deployed as part of mainstream practical protocols and solutions, including Microsoft Smooth Streaming (MSS),11 Apple HTTP Live Streaming (HLS)12 and Adobe Real-Time Messaging Protocol (RTMP).13 More recently, MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH)14 was standardized and has become the most popular technology for low-cost on-demand and live adaptive video streaming over the current network infrastructure. MPEG-DASH, which is also compatible with MSS and HLS, enables the client to select and request video segments with different quality levels from the server. The result is a smooth delivery of a video which adapts its bitrate according to dynamically changing network bandwidth.
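The client-side selection described above can be illustrated with a minimal throughput-based sketch. It is not taken from the MPEG-DASH standard or from MulseDASH: the function name `select_bitrate`, the bitrate ladder and the `safety` margin are all assumptions made for illustration.

```python
from typing import Sequence


def select_bitrate(available_kbps: Sequence[int],
                   throughput_kbps: float,
                   safety: float = 0.8) -> int:
    """Pick the highest advertised bitrate that fits the measured throughput.

    `safety` (hypothetical) leaves headroom so that short throughput dips
    do not immediately drain the playback buffer.
    """
    budget = throughput_kbps * safety
    fitting = [b for b in sorted(available_kbps) if b <= budget]
    # Fall back to the lowest quality level when nothing fits the budget.
    return fitting[-1] if fitting else min(available_kbps)


# Hypothetical bitrate ladder advertised in an MPD (kbit/s).
ladder = [500, 1000, 2500, 5000]
choice_good_network = select_bitrate(ladder, throughput_kbps=3500)
choice_poor_network = select_bitrate(ladder, throughput_kbps=100)
```

A client would repeat this choice before each segment request, which is what makes the delivered bitrate follow the changing bandwidth.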
Nowadays, multiple sensorial content is being used to complement classic audio-visual material. This is fueled by the existing rich support of various display devices, user interest in exploring novel technical infotainment avenues and studies which have shown that user QoE increases when multiple sensorial content is employed [17]. In terms of haptics, an important research avenue focuses primarily on human-machine teleoperation technologies. One of the proposed solutions transmits haptic information (i.e., force and vibration) over IP networks at a distance. Experiments studying haptic remote collaboration over a network connection between the USA and the UK [45] have shown that latency leads to instability of the remote haptic interaction in a Shared Virtual Environment (SVE) [46]. Other works [47]-[49] studying SVE collaboration found that haptic communication tolerates latency under 60 ms and jitter under 10 ms.
Olfaction is one of the most popular sensorial media components. A new study in neuroscience found that a VR system which incorporated smell could influence human navigation behavior [50]. This finding is similar to those of several other olfaction media works which focus on the user experience of immersive applications (e.g., VR and AR). Richard et al. [51] and Zou et al. [52] have employed odor diffusion to enhance user QoE during immersive learning. 'Smelling Screen', an olfaction display machine, was developed and used by Matsukura et al. to present the corresponding odor distribution while a user is watching an image, sitting in front of a monitor at a distance of 0.5 m [8]. The difficulty in terms of interactive olfaction-enhanced media is to keep any potential difference between the display times of olfaction and visual content low [20]. A series of studies by Murray et al. found that user QoE is impacted by several factors, including skew between video and olfaction media, delivery jitter, number of odors, and user profile. The authors recommend that the potential skew between the olfaction and video media components should remain within −5 s to +10 s [21].
Impressive research and standardization effort has been invested in enabling good synchronization between different multi-sensorial media components and in bridging the gap between virtual and real worlds, including when there is networked interaction with and between remote users. The Virtual Reality Markup Language (VRML), based on the Extensible Markup Language (XML) and also used as part of the MPEG-4 Binary Format for Scenes (BIFS), was designed to describe some haptic content (e.g., depth, stiffness, friction or any texture of a scene/object) associated with 3D or 2D objects within video content. VRML, popular for the development of Web-based 3D or 2D multimedia content, was superseded by X3D, developed by Web3D.15 However, recent 3D and VR content development has increasingly relied on Unity3D,16 Unreal17 or other commercial programming platforms, so VRML and X3D are less used in the market [53]. The ISO/IEC 23005 MPEG-V standard18 formulates, describes and organizes sensorial effects in multimedia content based on the XML format [54]. In particular, MPEG-V standardizes a unified format for interaction information between real and virtual worlds, including haptic messages, vibration patterns, thermal effects and so on. In general, the MPEG-V file describing the sensorial content and the corresponding audio-visual content are multiplexed into an MPEG-2 TS container, and then transmitted to users [55]. However, MPEG-V integrated with MPEG-2 TS does not support adaptive streaming over networks whose conditions change dynamically in real time, or for varied user profiles and device characteristics, which affects user QoE during multi-sensorial effect rendering.
The ADAptive MulSemedia delivery solution (ADAMS), proposed by Yuan et al. [17], is a method for performing adaptive multi-sensorial media delivery. In ADAMS, the metadata set annotating the different sensorial content associated with the video is organized and described using XML based on the MPEG-7 standard.19 Unlike MPEG-V, ADAMS adapts the specific sensorial media segments combined with the video packets according to the predicted network bandwidth variation and user profiles.
In this context, the existing mulsemedia description and delivery solutions organize the sensorial media-related information into an extended XML file associated with the audio-visual content. Additionally, some of them convey the multimedia and mulsemedia content together to the user side immediately. They may or may not adapt the different levels of the media segments to specific user operational conditions (e.g., network bandwidth, user profile, device, etc.). Unfortunately, it is not enough for adaptive mulsemedia delivery solutions to consider user profile-related feedback only, because the existing sensors and actuators are highly complex and heterogeneous. Therefore, network conditions (e.g., buffer-based network information measurement) and sensory-oriented characteristics (e.g., type/model, predefined priority, effect synchronization delay, etc.) should be taken into account together in order to best enhance user-perceived QoE levels. Additionally, the extremely low latency requirements of mulsemedia interactive services are a great challenge for the current network architecture. Moreover, the current XML-based mulsemedia description standards (e.g., MPEG-V, MPEG-7, MPEG-4 BIFS,20 etc.) are unable to address the requirements of mulsemedia delivery in the context of the heterogeneous 5G network architecture and services.
19 MPEG-7: https://mpeg.chiariglione.org/standards/mpeg-7
Therefore, this paper introduces MulseDASH, a novel adaptive mulsemedia streaming solution over MPEG-DASH which supports the following advanced features beyond those offered by existing related solutions:
• Hierarchical MulseMedia Presentation Description
• Multi-sensorial Content Encapsulation using JSON21
• Receiver Buffer-based and Multi-sensory-oriented Adaptive Mulsemedia Streaming Scheme
Additionally, alongside the MulseDASH design, this paper presents its deployment and testing in a real experimental test-bed. This demonstrates the benefit of using an adaptive mulsemedia streaming scheme with existing sensors, actuators and equipment.

III. MULSEDASH FRAMEWORK DESIGN
Inspired by the adaptive multimedia streaming standard DASH, the proposed mulsemedia streaming framework MulseDASH inherits its advantages in terms of media information organization and extends it by integrating new features.These features include support for multiple sensorial media components and integration of an adaptive delivery scheme that maintains smooth mulsemedia streaming and sensory synchronization.
The MulseDASH framework design is illustrated in Figure 2 and involves two main components which inter-communicate via the Internet: MulseDASH Server and MulseDASH Client, which are introduced next.

1) HTTP SERVER AND DATA STORAGE
As shown in Figure 2, the MulseDASH Server handles the HTTP responses to the multimedia/mulsemedia segment requests received from the client side, the integration of mulsemedia and multimedia, the handling of specific feedback from clients (e.g., network information, device characteristics, user experience feedback) and the distribution management of multimedia/mulsemedia data. The MulseDASH Server provides the interface between the mulsemedia/multimedia content and the MulseDASH clients over HTTP-based transmission, which carries both MPEG-DASH metadata in Media Presentation Description (MPD) format and mulsemedia metadata, labeled Mulse-MPD, which will be described separately. After MulseDASH clients retrieve the MPEG-DASH MPD and Mulse-MPD from the server side, based on their content and following the proposed MulseDASH algorithm, the clients access the appropriate media segments from their distributed storage locations.

2) HIERARCHICAL MULSE-MPD AND MULSEMEDIA SEGMENTS
MulseDASH employs the standard MPEG-DASH MPD structure for its audio-visual components. However, in order to accommodate its multi-sensorial media components, Mulse-MPD was introduced. Mulse-MPD extends MPEG-DASH and inherits the XML-based hierarchical architecture, providing a flexible and reliable content organization for different sensory effects, quality/intensity levels, adaptation sets and play periods. However, unlike the classic XML-based description of sensorial media segment information in MPEG-V and ADAMS/MPEG-7, Mulse-MPD employs a JSON-based encapsulation for mulsemedia segments. This choice is motivated by comparative performance studies of JSON and classic XML [56], which favor JSON when a large number of encoded objects are transmitted. The hierarchical architecture of Mulse-MPD consists of Periods, Mulse-Adaptation Sets, Representations and Mulsemedia Segments, as illustrated in Figure 3. The Mulse-MPD structure includes a sequence of Periods, where a Period contains the top-level description of a sensory element, including start time (i.e., PST) and duration (i.e., DUR). As the current multi-sensorial devices follow diverse production standards, sensory media codecs and intercommunication protocols, Mulse-MPD also accommodates this rich variability. Therefore, each Period contains several Mulse-Adaptation Sets that are associated with different adaptation groups defined by users or systems depending on the mulsemedia effect types, device characteristics, user preferences, scenarios and so on.
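The four-level hierarchy described above, together with the JSON encapsulation of a single mulsemedia segment, can be sketched as follows. This is only an illustration under stated assumptions: the class and field names (other than the PST, DUR, Type and StartOffset concepts named in the text), the device label and the `payload` contents are hypothetical and do not reproduce the actual Mulse-MPD schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MulseSegment:
    index: int
    seg_type: str          # "full" or "empty" (the Type field)
    start_offset_ms: int   # StartOffset inside the segment
    payload: dict = field(default_factory=dict)  # hypothetical effect data


@dataclass
class Representation:
    level: int             # quality/intensity level
    segments: List[MulseSegment] = field(default_factory=list)


@dataclass
class MulseAdaptationSet:
    effect: str            # e.g. "haptic", "olfaction", "airflow"
    device: str            # hypothetical device label
    representations: List[Representation] = field(default_factory=list)


@dataclass
class Period:
    pst_ms: int            # period start time (PST)
    dur_ms: int            # period duration (DUR)
    adaptation_sets: List[MulseAdaptationSet] = field(default_factory=list)


def encapsulate(segment: MulseSegment) -> str:
    """Serialize one mulsemedia segment as a JSON string."""
    return json.dumps(asdict(segment), sort_keys=True)


period = Period(pst_ms=0, dur_ms=4000, adaptation_sets=[
    MulseAdaptationSet("haptic", "vibrating-mouse", [
        Representation(level=1, segments=[
            MulseSegment(1, "full", 500, {"intensity": 0.6}),
            MulseSegment(2, "empty", 0),
        ]),
    ]),
])
first = period.adaptation_sets[0].representations[0].segments[0]
wire = encapsulate(first)
```

Keeping the description hierarchy separate from the per-segment JSON payload mirrors the split in the text between the inherited MPD structure and the JSON-based segment encapsulation.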
For example, suppose two different types of haptic effects are to be rendered via two different haptic devices. One haptic effect is played on a vibrating mouse, whereas the other is rendered via a haptic gaming vest produced by a different company, so two Mulse-Adaptation Sets are needed in Mulse-MPD. Another example is shown in Figure 3.
The MulseDASH client addresses two major concerns: adaptive mulsemedia streaming and synchronization between multimedia and mulsemedia content. They are introduced next.

1) ADAPTIVE MULSEMEDIA STREAMING ALGORITHM
Unlike conventional multimedia streaming, mulsemedia streaming is not always continuous. Depending on the audio-visual scene design, at any moment in time a multi-sensorial effect playout may or may not be required. Therefore these effects are distributed discretely along the continuous audio-visual content timeline. According to the Mulse-MPD structure, the discretely distributed mulsemedia content is divided into segments, only some of which involve mulsemedia content. This is indicated by their Type, set to ''full'' or ''empty'' (the latter indicating that there is no mulsemedia effect during this time slot and that zero padding is included; see Figure 3). By design, and as shown in Figure 4, the temporal length of mulsemedia segments $\tau$, expressed in milliseconds, is the same as that of audio-visual segments. Additionally, the Mulse-MPD Segment information contains the start time offset of mulsemedia playout in each segment, namely StartOffset or $\tau_{start}$. Mulsemedia streaming also considers the delay tolerances of different mulsemedia effects. For instance, a user may consider the haptic effect more important than other effects and would benefit from a higher haptic intensity, or may want to make sure the odor diffuses earlier and thus increases the level of scene immersiveness.
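As a rough illustration of how discretely distributed effects map onto fixed-length segments, the sketch below marks each segment as ''full'' or ''empty'' and derives a StartOffset. The function `build_segments`, its dictionary keys and the overlap rule (a segment is ''full'' if any effect overlaps it) are assumptions made for this example, not the authors' segmentation procedure.

```python
from typing import Dict, List, Tuple


def build_segments(effects: List[Tuple[int, int]],
                   tau_ms: int,
                   n_segments: int) -> List[Dict[str, object]]:
    """Split a timeline into n_segments slots of tau_ms milliseconds.

    `effects` holds (start_ms, duration_ms) pairs for one effect type.
    """
    segments = []
    for n in range(n_segments):
        seg_start = n * tau_ms
        seg_end = seg_start + tau_ms
        # Effects that overlap the half-open interval [seg_start, seg_end).
        overlapping = [s for s, d in effects
                       if s < seg_end and s + d > seg_start]
        if overlapping:
            # Offset is zero when the effect began in an earlier segment.
            offset = max(0, min(overlapping) - seg_start)
            segments.append({"n": n + 1, "Type": "full",
                             "StartOffset": offset})
        else:
            segments.append({"n": n + 1, "Type": "empty",
                             "StartOffset": 0})
    return segments


# Two hypothetical effects on a 10 s timeline cut into 2 s segments.
timeline = build_segments([(2500, 1000), (7000, 2000)],
                          tau_ms=2000, n_segments=5)
types = [seg["Type"] for seg in timeline]
```

Here the second effect spans two segments, so the last segment is ''full'' with a StartOffset of zero.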

a: BUFFERED-BASED QUANTIZED RATE ADAPTATION SCHEME (BQRAS)
Consider that the types of mulsemedia content rendered on different mulsemedia devices are denoted by $\mathcal{I} := \{1, 2, 3, \ldots, i, \ldots, I\}$, with $|\mathcal{I}| = I$. Each mulsemedia stream is composed of segments, $\mathcal{N} := \{1, 2, 3, \ldots, n, \ldots, N\}$, with $|\mathcal{N}| = N$, as shown in Figure 4. The downloading bitrate of the $n$-th segment of the $i$-th type of mulsemedia effect is represented as $r_i(n) \in R_i$, and the size in bytes and the length in milliseconds of the segment are [...] and $\tau_i(n)$, respectively, where $\tilde{\tau}_i(n) \le \tau_i(n)$ is the actual length of the mulsemedia effect playout. In general, the MulseDASH client initiates an HTTP-based request to the MulseDASH server for the $n$-th segment of the $i$-th type of mulsemedia effect with the bitrate $r_i(n)$, and then the downloading starts immediately. Let $T^d_i(n)$ be the download duration. Then the next segment of the $i$-th type of mulsemedia effect will start to be downloaded after time $T_i(n)$, given by (1), where $T^p_i(n)$ is the target duration to play back a segment, the segment request indicator $a_i(n)$ is 1 if the download request of the $n$-th segment is made and 0 otherwise, and $b_i(n)$ is 1 if the next requested segment is ''full'' and 0 otherwise. After its arrival, the $n$-th segment is stored in the client-side playback buffer, from where it is consumed by the mulsemedia player. The consumption rate is the same as that of one mulsemedia or audio-visual segment, hence the instantaneous buffer level, which covers all the stored mulsemedia segments and is measured in video time (milliseconds), can be expressed as in (2). The playback buffer level is strongly affected by the network conditions. The buffer level reading is then noise-filtered to yield the smoothed buffer level $\bar{B}_i(n)$, based on the moving average of the historical buffer levels, with the aim of removing abnormal volatility in the transmission delay measurement. The smoothed buffer level is calculated as in (3), where the smoothed buffer status update coefficient $\beta_n$ is given by (4).
The coefficient $\beta_n \in [0, 1]$ is exponentially calculated based on the difference between the past buffer level moving averages, sampled at every $k$ segments. When the smoothed [...] (i.e., $\beta_n = 0.5$ in (4)), the variation of the smoothed buffer level and the network condition during the past $k$ segments is small, and the older smoothed buffer status record should account for more in the overall equation. On the contrary, when [...] 693, the variation of the buffer level and the network condition is significant, and therefore the updated smoothed buffer level is much more related to the current buffer reading than to the old records, so the latter should account for less. In general, the smoothed buffer status update coefficient $\beta_n$ balances the current and older buffer readings and smooths the historical buffer records.
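The role of the update coefficient can be illustrated with a generic exponential smoothing step. This is a sketch under a loudly stated assumption: it uses a plain convex combination in which `beta` weights the current reading, which matches the qualitative behavior described above but is not claimed to be the exact form of (3) or (4). The function name and the millisecond values are hypothetical.

```python
def smooth_buffer(prev_smoothed_ms: float,
                  current_ms: float,
                  beta: float) -> float:
    """One smoothing step for the playback buffer level.

    Assumption (not from the source equations): `beta` is the weight of
    the current reading, so a larger beta tracks the latest buffer level
    more closely and a smaller beta favors the historical record.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (1.0 - beta) * prev_smoothed_ms + beta * current_ms


# Stable network: history and current reading share the weight.
stable = smooth_buffer(4000.0, 2000.0, beta=0.5)
# Strong variation: the current reading dominates.
volatile = smooth_buffer(4000.0, 2000.0, beta=1.0)
```

In an adaptive client this step would run once per downloaded segment, with the coefficient itself recomputed from recent buffer variation.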
In this paper, a quantized method is proposed to control mulsemedia adaptation. The number $q$ of mulsemedia effects delivered is adjusted based on the quantized buffer level $\bar{B}_i(n)$, according to the expression in (5), where $z = \bar{B}_i(n) - \frac{1}{2} B_{max}$, and $B_{max}$ is the maximum buffer level customized by the mulsemedia player.
The quantization method $Q_t(\cdot)$ uses the logistic function $Q_t(x) = \frac{1}{1 + \exp(-\gamma x)}$, $\gamma > 0$, to activate the shifting up or down of the number $q$ of adapted mulsemedia effects, as in (6): [...] the number of mulsemedia effects will increase, and $q$ will decrease when the buffer level is lower than the minimum $B_{min}$. Otherwise $q$ will remain the same.
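A small sketch can make the logistic activation concrete. The logistic function and the centering $z = \bar{B}_i(n) - \frac{1}{2} B_{max}$ follow the text; the decision thresholds `upper` and `lower`, the cap `q_max`, the use of seconds as the unit and the function names are hypothetical, because the exact switching rule of (6) is not reproduced here.

```python
import math


def logistic(x: float, gamma: float = 1.0) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-gamma * x))."""
    t = gamma * x
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def next_effect_count(q: int, smoothed_buffer_s: float, b_max_s: float,
                      q_max: int, gamma: float = 1.0,
                      upper: float = 0.75, lower: float = 0.25) -> int:
    """Shift the number of delivered effects up, down, or keep it.

    `upper` and `lower` are illustrative activation thresholds only.
    """
    z = smoothed_buffer_s - 0.5 * b_max_s  # centre on half the max buffer
    activation = logistic(z, gamma)
    if activation > upper:
        return min(q + 1, q_max)
    if activation < lower:
        return max(q - 1, 0)
    return q


# Full buffer, empty buffer and half-full buffer with B_max = 10 s.
q_up = next_effect_count(2, 10.0, 10.0, q_max=4)
q_down = next_effect_count(2, 0.0, 10.0, q_max=4)
q_same = next_effect_count(2, 5.0, 10.0, q_max=4)
```

The parameter `gamma` controls how sharply the activation reacts to buffer changes around the half-full point.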

b: PRIORITY-AWARE REQUEST SCHEDULING SCHEME (PRSS)
For a given number of mulsemedia effects resulting from BQRAS adaptation, there are multiple possible combinations of mulsemedia effects to select from. As shown in Figure 2, the mulsemedia player also cooperates with multiple mulsemedia devices which have different levels of tolerance to network conditions. In order to solve both the effect selection and network tolerance issues, a priority-aware request scheduling scheme (PRSS) is used to determine the best adapted combination of mulsemedia effects based on network performance, pre-defined priorities, segment types and mulsemedia playout StartOffset times.
In this context, PRSS selects the adapted combination subset $I^*_{sub}$ of mulsemedia effects with the highest priority (i.e., resulting from the summation over all the next requested mulsemedia segments belonging to the set) as in (7), where $[\mathcal{I}]^{q_{n+1}}$ denotes the collection of all subsets of the mulsemedia effect type set $\mathcal{I}$ of size $q_{n+1}$, and $I_{sub}$ is a subset in the collection. $p_i \in [0, 1]$ is the pre-defined priority for the users. The normalized network throughput is [...] and the nor[...]
Considering the two stages, BQRAS and PRSS, the proposed MulseDASH adaptation algorithm is also composed of two parts, which are described in Algorithm 1, with the complexity denoted by $O(n \cdot n)$. However, as the number of effect types is in general small, the computation resource requirement is also low.
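The subset selection step can be sketched by enumerating all subsets of size $q$ and keeping the one with the largest summed priority. This is a simplified illustration: the per-effect score in (7) also depends on terms such as the normalized throughput, segment type and StartOffset, which are not reproduced here, so the sketch scores each effect by its pre-defined priority $p_i$ alone. The function name and the priority values are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def select_effects(priorities: Dict[str, float], q: int) -> Tuple[str, ...]:
    """Return the size-q subset of effect types with the highest total score.

    Assumption: the score of an effect is just its priority p_i in [0, 1].
    """
    if q <= 0 or not priorities:
        return ()
    q = min(q, len(priorities))
    # Enumerate every subset of size q, as in the collection of subsets
    # of the effect type set, and keep the best-scoring one.
    return max(combinations(sorted(priorities), q),
               key=lambda subset: sum(priorities[e] for e in subset))


# Hypothetical user-defined priorities for three effect types.
prefs = {"haptic": 0.9, "airflow": 0.5, "olfaction": 0.7}
best_pair = select_effects(prefs, q=2)
```

Exhaustive enumeration is affordable here because the number of effect types is small, which is the same observation the text makes about the low computation requirement.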

2) SYNCHRONIZATION BETWEEN MULTIMEDIA AND MULSEMEDIA CONTENT
There is evidence [57] that the response time of individuals to tactile stimuli was 28% and 34% shorter than those to auditory and visual stimuli, respectively. Moreover, our previous research on mulsemedia synchronization [20], [58] has shown that the acceptable skews for tactile/haptic effects (i.e. [0, 1] seconds) were much narrower than those for airflow (i.e. [−5, 3] seconds) and olfaction (i.e. [−7.5, 10] seconds). This suggests that ''inter-stream'' synchronization involving the haptic/tactile effect is much more sensitive than that of other effects in terms of user QoE. In practice, real mulsemedia and multimedia devices are affected by many encoding, decoding and playback issues which may cause varying ''inter-stream'' playback delay and lower viewer QoE. This paper introduces a novel multimedia-mulsemedia content synchronization algorithm to offer a ''smooth'' user experience when remotely accessing MulseDASH content. The algorithm for mulsemedia synchronization is presented in Algorithm 2. It has a computation complexity of O(n).

Algorithm 1
... (3) and (4);
Calculate $Q_t(z)$ and select the next adapted quantized number q of mulsemedia effects by using (5) and (6);
// Part II: Priority-aware Request Scheduling Scheme
The collection of all subsets of the mulsemedia effect [I] sub is generated;
... sub do
        Calculate the priority of each mulsemedia effect for next requested segment based on (7);
    end
end
The next requested mulsemedia segments within the combination subset $I^*_{sub}$ with the highest priority are scheduled to be downloaded based on (7);
Assume that the i-th effect is selected as the base clock and another effect j ∈ I\i is about to be synchronized. Due to its higher sensitivity, the haptic effect playing time is selected as the reference time clock (i.e. base clock). The time difference $T^{c\_diff}_j(n)$ between the base clock and the other effect's playback clock is used to adjust the new play timestamp $T^{c\_start}_j(n+1)$ of that effect. When $T^{c\_diff}_j(n)$ is smaller than the length of a segment, there are two cases. If $T^{c\_diff}_j(n)$ is lower than the threshold $d_{thres}$ (δ is the target threshold value defined by the system), the playout rate of the effect is slower than that of the haptic effect, and the next segment is played immediately. Otherwise, the playout rate is faster than that of the haptic effect, and the next segment is played after a sleep interval $\lambda \cdot d_j(n)$ (λ is a synchronization factor which can be tuned). Following this solution, after some necessary adjustments, the mulsemedia effect playout times will be synchronized.
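The per-effect decision described above can be sketched as a small function. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, all times are taken to be in seconds, and the case where the clock difference reaches a full segment length is not covered by the description, so the sketch returns `None` for it.

```python
from typing import Optional


def next_start_delay(t_diff: float, segment_len: float, d_thres: float,
                     lam: float, d_j: float) -> Optional[float]:
    """Delay before the next segment of effect j starts playing.

    t_diff      : clock difference between the base (haptic) clock and effect j
    segment_len : length of one segment
    d_thres     : threshold below which the effect is treated as lagging
    lam         : tunable synchronization factor (lambda in the text)
    d_j         : current delay term d_j(n) of effect j
    """
    if t_diff >= segment_len:
        # Not described in the text; left undefined in this sketch.
        return None
    if t_diff < d_thres:
        # Effect is slower than the haptic effect: play immediately.
        return 0.0
    # Effect is faster than the haptic effect: wait lam * d_j(n).
    return lam * d_j


# Example with hypothetical values: 2 s segments, 50 ms threshold.
delay = next_start_delay(t_diff=0.3, segment_len=2.0, d_thres=0.05,
                         lam=0.5, d_j=0.2)
```

Each effect other than the base one is evaluated once per segment, which matches the linear complexity stated for Algorithm 2.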
In general, the video and audio components are already synchronized. If the mulsemedia content is to be synchronized with the multimedia content, the next mulsemedia and multimedia tracks can be synchronized following (8), where $d_{mulse}$ is the final synchronized delay of the mulsemedia tracks calculated based on Algorithm 2, and $d_{av}$ is calculated based on the audiovisual media synchronization mechanism defined by a general audiovisual player (i.e. it is outside the scope of this paper). The next played segments of mulsemedia and multimedia will start after the same delay of $\max\{d_{mulse}, d_{av}\}$.
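The common start delay stated in the last sentence can be written out as a worked example. The symbol $d_{start}$ and the numeric values are assumptions used only for illustration.

```latex
d_{start} = \max\{ d_{mulse},\, d_{av} \}
% Example with hypothetical values:
% d_{mulse} = 15\,\text{ms},\; d_{av} = 40\,\text{ms}
% \Rightarrow d_{start} = \max\{15, 40\}\,\text{ms} = 40\,\text{ms}
```

In this example the mulsemedia tracks wait an extra 25 ms so that both streams start their next segments together.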

IV. MULSEDASH DEPLOYMENT AND EVALUATION
MulseDASH was deployed in a real-life system and was employed for delivery of multi-sensorial media including airflow, haptic and olfactive stimuli alongside the audiovisual components.This section describes the performance evaluation and subjective testing setup in a real network environment.

A. HARDWARE SETUP
Due to the lack of mature mulsemedia hardware support, the devices used for MulseDASH testing have been modified to fit the purpose. The following devices were modified and used, as illustrated in Figure 5:
• Airflow Generator consists of a Pulse-width Modulation fan and Arduino board-based circuitry designed to enable control of the fan. Dedicated Arduino program code was written, compiled and run on the Arduino board to switch the fan ON and OFF and to control its speed.
• Haptic Mouse is a SteelSeries Rival 700 professional gaming mouse which provides vibration effects during user interaction. The Rival 700 mouse was modified to generate tactile/vibration stimuli of different intensity, duration and frequency for users during the multimedia content playback.
• Olfaction Diffuser Sbi4 was produced by Exhalia. The diffuser has 4 fans and, when equipped with 4 aromatic cubes, can distribute 4 different scents at different times. The supporting SDK toolkit helps to control the diffusion in terms of scent type, delay, strength and density.

B. SOFTWARE SETUP
The MulseDASH player was developed as a Web-based mulsemedia player application using JavaScript to enable deployment of MulseDASH and to work with the modified devices. The player follows the MulseDASH architecture illustrated in Figure 2, extends the MPEG-DASH player developed as part of the dash.js project and integrates a dashboard-based management module. The mulsemedia player and its dashboard-based management module support Mulse-MPD request and retrieval, network connection and receiver buffer information management, multimedia display, mulsemedia device connectivity, multi-sensorial effect rendering and control, and mulsemedia and multimedia synchronization. A screenshot is shown in Figure 5.

C. MULSEDASH PERFORMANCE EVALUATION SETUP
In order to evaluate the performance of MulseDASH, and especially of the mulsemedia and multimedia synchronization algorithm proposed in this paper, a real network environment was set up and mulsemedia and multimedia delivery experiments were run. Similar to the network architecture illustrated in Figure 2, the network evaluation framework consists of a Network Emulator, a Linux-based HTTP server and an HTTP client. MPEG-DASH and MulseDASH content was stored at the HTTP server, and Node.js was deployed. The proposed MulseDASH player was installed at the client and worked with the mulsemedia devices. The Linux Traffic Control (TC) utility, supported by the Network Emulator (NetEm), was deployed between server and client to emulate desired changes of real network conditions. TC invokes the Linux kernel packet scheduler to control packet delay and loss, and limits the outbound bottleneck at the server to simulate network load. Poisson and Exponential distributions (i.e. $X \sim Poisson(\mu)$ and $X \sim Exp(\lambda^{-1})$, where µ and $\lambda^{-1}$ are the means of the distributions, respectively) are used to model the outbound bandwidth limitation and the inter-arrival time of the concurrent events in our network emulation, respectively. The configuration of the network emulation is given in Table 1.
In the network emulation experiments, the server outbound total bandwidth is limited to 10 Mbps and three different scenarios are designed and run using Python scripts:
• Scenario 1: A Poisson distribution with µ = 7 was used to model a high number of users who access the streaming service at the same time, and an Exponential distribution with $\lambda^{-1} = 2$ was employed to simulate the high inter-arrival frequency of the concurrent events;
• Scenario 2: A Poisson distribution with µ = 5 and an Exponential distribution with $\lambda^{-1} = 5$ were used to configure a medium level of concurrent user numbers and inter-arrival frequency, respectively;
• Scenario 3: The means of the distributions, µ = 3 and $\lambda^{-1} = 8$, are used to model a low concurrent user number and a low event inter-arrival frequency, respectively.
A high-quality video clip (i.e. 1080p, 3840 kbps, 30 fps) was cropped from the Big Buck Bunny animation movie (i.e. from 2:10 to 7:30) and was encoded with the 3 types of mulsemedia effects in order to test the synchronization mechanism. The indexes i of the effects are as follows: haptic i = 1, olfaction i = 2 and airflow i = 3, and the priorities $p_i$ of the effects in the adaptation mechanism are all set to 1. The 320-second long video clip contains content with high temporal and spatial encoding complexity [42], which causes high data rate variance of the adaptive multimedia streaming segments (see Figure 6c).
Linux NetEm: https://wiki.linuxfoundation.org/networking/netem
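The three load scenarios can be sketched with a short event generator that draws exponential inter-arrival gaps and a Poisson-distributed number of concurrent users per event. This is an illustrative sketch, not the authors' scripts: the function names, the use of seconds for the inter-arrival mean, and the omission of the step that turns each event into a Linux TC rate limit are all assumptions.

```python
import math
import random

# (mu, mean inter-arrival time) per scenario, as listed in the text.
SCENARIOS = {1: (7, 2.0), 2: (5, 5.0), 3: (3, 8.0)}


def poisson_sample(rng: random.Random, mu: float) -> int:
    """Draw from Poisson(mu) with Knuth's multiplication method.

    Adequate for the small means used here (3 to 7).
    """
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def generate_events(rng: random.Random, mu: float, mean_gap: float,
                    horizon: float) -> list[tuple[float, int]]:
    """Return (time, concurrent_users) events with time < horizon."""
    events = []
    # random.expovariate takes the rate, i.e. 1 / mean.
    t = rng.expovariate(1.0 / mean_gap)
    while t < horizon:
        events.append((t, poisson_sample(rng, mu)))
        t += rng.expovariate(1.0 / mean_gap)
    return events


# Example: Scenario 1 over the 320-second test clip.
mu_1, gap_1 = SCENARIOS[1]
scenario_1_events = generate_events(random.Random(42), mu_1, gap_1, 320.0)
```

Seeding the generator makes an emulation run repeatable, which helps when the same load pattern has to be replayed for different numbers of mulsemedia effects.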

V. MULSEDASH TESTING - RESULT ANALYSIS

A. MULSEDASH SYNCHRONIZATION PERFORMANCE ANALYSIS

1) IMPACT OF ADAPTIVE STREAMING ON THE SYNCHRONIZATION INTER-MEDIA DELAY
Figure 6a and Figure 6c record the ''up-and-down'' variations of both the inter-media delays and the adaptive segment bitrate levels. Notably, each change of the inter-media delay in the current time slot is triggered by a change of the adaptive bitrate level in the previous slot, and a higher bitrate level causes a longer download time. At the beginning of the video clip (i.e. from 0s to 80s), the adaptive bitrate levels go from low to high, and the inter-media delay then varies from a very high level to a low level based on the mechanism proposed in (8). This mechanism helps reduce the inter-media delay gap between the mulsemedia and multimedia download times. Another case of the impact of adaptive multimedia bitrate levels on the inter-media delay is shown from 150s to 200s. In this period, the inter-media delay moves from a ''flat'' level (i.e. from 150s to 180s) to a ''steep'' variation (i.e. from 180s to 200s) due to the sudden change of the adaptive segment bitrate. Moreover, the proposed synchronization mechanism helps to smooth the changes of inter-media delay within 2 or 3 segments after 200s. Similar cases also happen from 240s to 320s, as shown in Figure 6a and Figure 6c.

2) IMPACT OF MULSEMEDIA EFFECTS ON THE SYNCHRONIZATION INTER-MEDIA DELAY
MulseDASH is tested with different numbers of mulsemedia effects in the three network emulation scenarios with different loads. The time-varying inter-media delays in each scenario, shown in Figure 6a, indicate that the 3 mulsemedia effects in the heaviest traffic environment cause high inter-media delay variation in comparison with those in the other tested scenarios. The average inter-media delay results suggest that adaptive streaming with the highest number of mulsemedia effects causes higher delays than streaming with a lower number of mulsemedia effects. For example, in Figure 6b, the average inter-media delay of the 3 mulsemedia effects in Scenario 1 is 15.7% higher than that for 1 effect in Scenario 1. The same comparison reveals delay increases of 13.3% and 55.8% in Scenario 2 and Scenario 3, respectively. Additionally, the inter-media delay for the same number of mulsemedia effects in the lowest load scenario (i.e. Scenario 3) is on average 84.3% lower than that in the highest load scenario (i.e. Scenario 1). Moreover, Figure 6d, which presents the average jitter in the different scenarios, shows that the proposed synchronization mechanism reduces the jitter faster when the number of mulsemedia effects decreases in the high traffic environments (e.g. Scenarios 1 and 2). In general, the results of the network and synchronization experiments in Figure 6 demonstrate good performance in terms of inter-media delay and jitter (i.e. lower than 20ms) due to the proposed MulseDASH synchronization mechanism.

B. MULSEDASH: USER PERCEIVED QUALITY ANALYSIS
The ITU-T P.913 recommendations [59] for subjective assessment of audiovisual media were followed. 24 participants were invited to experience MulseDASH adaptive streaming during the subjective evaluation of MulseDASH. Each testing session involved 64 combinations of random quality video clips with random numbers of mulsemedia effects, selected from the 192 mulsemedia and multimedia combination samples defined in Section IV. Each testing session involves 8 questions (i.e. related to video and mulsemedia enjoyment) for each video clip. The subjective test results, expressed in terms of Mean Opinion Score (MOS, from 1 to 5) and predefined user enjoyment level (i.e. from 1 to 10), are presented in Figure 7, Figure 8 and Figure 9. In this paper, the MOS is used to measure the user perceived QoE of the video in the presence of mulsemedia effects, whereas the predefined user enjoyment level grades the overall user experience when subject to multimedia and mulsemedia effects. In the experiments without mulsemedia effects, the enjoyment level grades increased gradually with the increase in multimedia quality (i.e. from low to high). However, MOS did not follow a similar pattern. For example, the MOS for high quality videos is lower than that for medium quality clips. A possible reason is that MOS is averaged over different video clips with content of different temporal and spatial complexity, which probably affects the subjective grading. A second reason may be the limited granularity of the MOS scale.

b: HAPTIC EFFECT
From Figure 7, it can be noted that the MOS results increase with the video quality level, and that the lowest video quality is graded higher with the addition of the haptic effect, which enhances the enjoyment level for the whole user experience. Yet, the enhancement is limited for the medium and high quality level videos, which is probably caused by the simple haptic effects (e.g. vibrations) generated by the mouse.

c: OLFACTION EFFECT
The MOS grades show that the olfaction effect improves the user perceived quality level of multimedia. In terms of enjoyment level, the grades vary, most likely because some users are not happy with olfaction effects during video playback. However, the average scores are 2% higher than in the tests without effects.

d: AIRFLOW EFFECT
The airflow provides the best experience quality for the participants in the tests. Compared to the haptic effect, the MOS and enjoyment level of airflow are increased by 12.3% and 4.7%, respectively. Similarly, compared to the olfaction effect, the airflow effect increases the subjective video quality by 2.9% and provides a 4.2% higher user experience.

2) IMPACT OF DIFFERENT NUMBER OF MULSEMEDIA EFFECTS ON USER EXPERIENCE
Based on the analysis of the impact of different types of mulsemedia effects, it has been confirmed that the mulsemedia effects influence the user experience during video clip playout, and even increase the perceived video quality level when watching a lower quality video clip. Moreover, the results of the experiments with multiple effect combinations, shown in Figure 7, provide further evidence that the number of effects also influences users' grading. For example, the combination of one haptic and one airflow effect increases the user enjoyment level by 7.8% compared to the case where only one haptic effect is employed. The combination of haptic and olfaction effects also enhances the user perceived quality (i.e. MOS) and the overall enjoyment level by 9.6% and 1%, respectively, compared to those of either effect alone. However, the combination of olfaction and airflow does not bring any improvement in terms of user experience. A potential reason is that the airflow boosts the diffusion and concentration of the odours, which might irritate some of the participants who do not like the smell. As random types and numbers of mulsemedia effects would be generated in the real world, the average results of MOS and enjoyment level for the different numbers of effects are presented in Figure 8. Both user perceived quality and enjoyment level increase gradually as the number of effects grows from 0 to 3. For instance, the average rate of increase of MOS with the number of effects is 2.26%, and the average rate of increase of the enjoyment level score is 6.80%. Overall, the results shown in Figure 7 and Figure 8 indicate that the participants enjoyed the experience most when watching a video clip with 3 types of mulsemedia effects, for which both MOS and enjoyment level score reached the highest level.

3) STUDY OF TRADE-OFF BETWEEN MULSEMEDIA EFFECTS AND VIDEO QUALITY
There is an interesting dilemma: what do the participants actually prefer, more mulsemedia effects or a higher video quality level? The trade-offs between employing mulsemedia effects and improving video quality are displayed in Figure 9. Between 60% and 80% of the participants who are watching high quality and low quality videos prefer adding more mulsemedia effects. More mulsemedia effects help those participants to improve their enjoyment level. The results show that the participants who are watching medium quality video clips have a good balance between the number of mulsemedia effects and video quality. Among them, the preference for increasing the video quality level grows gradually as the number of mulsemedia effects increases.

VI. CONCLUSION AND FUTURE WORKS
This paper has proposed MulseDASH, an innovative adaptive mulsemedia streaming solution which is designed to improve user QoE levels. MulseDASH was tested, and the network performance evaluation results show inter-media delays (i.e., average value < 18ms) much lower than the skew requirements between different media provided in [21], [47]-[49]. The subjective tests for the different combinations of mulsemedia effects show that MulseDASH improves the user enjoyment level and perceived quality throughout video playback. Moreover, more users prefer adding more effects while watching the video at a lower quality level. Future work includes large-scale subjective assessments, improvement of the adaptation, scheduling and synchronization algorithms, and evaluation in wireless network environments with more complex heterogeneous conditions and stricter latency requirements, as found in future Internet of Things or 5G scenarios.
presenting how different types of odors (i.e. chocolate and diesel) are decoded by different Mulse-Adaptation Sets, respectively. Additionally, different numbers of Mulse-Adaptation Sets can be grouped and delivered based on different user preferences and scenarios designed in the audiovisual content. For instance, three different Mulse-Adaptation Sets with three different types of odors are configured in the Mulse-MPD for a movie scene designed with three types of odors. Additionally, a haptic Mulse-Adaptation Set is configured in the Mulse-MPD if the user prefers the haptic effect or if the haptic device is available (i.e. some devices may be ''offline'' or have low battery levels). In each Mulse-Adaptation Set, the same sensory content can be encoded and shown in different Mulse-MPD Representations. Different from the classic MPD Representation, which depends on bitrates or other video quality features, the Mulse-MPD Representation considers specific features of the sensory content. For instance, the haptic effect can have different Mulse-MPD Representations which differ in terms of their intensity levels. Similarly, the olfaction effect can have different intensity levels, as shown in Figure 3.
Additionally, the different Mulse-MPD Representations can have different start offsets and play durations depending on user preferences. The Representation can enable adaptation of different sensory effects based on the network conditions, device characteristics, pricing strategies and so on. Considering a delay-sensitive case as an example, a higher level Mulse-MPD Representation with high intensity vibration effects can be downgraded to a lower Mulse-MPD Representation with lower intensity when the network delay becomes longer. Notably, the sensory content organization and configuration in the Mulse-MPD Representation is much more flexible than in designs based on MPEG-V and MPEG-7. Mulsemedia Segments contain the actual sensory media information and describe the type of sensory effect, start time, duration, effect intensity and other content-related data, stored in JSON format.
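To make the segment description more concrete, the sketch below builds one segment as a JSON document carrying the four items named above (effect type, start time, duration and intensity). The field names and values are hypothetical; the actual MulseDASH segment schema is not given in this section.

```python
import json

# Hypothetical field names and values, chosen only for illustration.
segment = {
    "effectType": "haptic",   # type of sensory effect
    "startOffset": 12.0,      # start time within the presentation, in seconds
    "duration": 2.0,          # effect duration, in seconds
    "intensity": 0.8,         # normalized effect intensity
}

encoded = json.dumps(segment)   # what a server could store and deliver
decoded = json.loads(encoded)   # what a client would parse before rendering
```

Because the payload is plain JSON, a Web-based player can parse it with its built-in JSON support and pass the fields straight to the device control code.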
Figure 3 also includes WURL, HURL and OURL, which are used to indicate the URL address of Wind effect, Haptic effect and Olfaction effect segments, respectively. Note that other sensorial effects can also be considered.

B. MULSEDASH CLIENT
Extending the classic MPEG-DASH client, the MulseDASH client retrieves both MPD and Mulse-MPD files from the MulseDASH server. An innovative MulseDASH Adaptive Streaming Algorithm is introduced which governs the manner in which audio-visual and multi-sensorial segments are requested and presented to the viewers in order to achieve increased user QoE levels.

FIGURE 4. Timeline of Diverse Multimedia and Mulsemedia Segments.

Algorithm 2 Mulsemedia Synchronization Algorithm
initialization: $T^{c\_base}$, $T^{c\_diff}_i$, $T^{c\_start}_i$, $d_i$;
Get the base clock time: $T^{c\_base}(n) = T^{c}_i(n)$ // the current playback timestamp of the n-th segment of the i-th type of mulsemedia effect;
foreach j ∈ I\i do
    $T^{c\_diff}_j$

D. MULSEDASH SUBJECTIVE TESTING SETUP
Subjective testing experiments are performed to assess the user perceived quality when employing MulseDASH-based adaptation. The video clips used in the subjective tests are pre-encoded from the animation movie Big Buck Bunny using different quality levels from high to low, as described in Table 2. Eight 30-second video clips with a wide range of temporal and spatial complexity are selected from the 596-second long movie. The eight video clips encoded at 3 different quality levels (i.e. generating 24 video clips) are compiled by MP4Box into MPEG-DASH-formatted MPD files with 2-second long video segments.
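The size of the sample pool used in the subjective tests follows from a simple enumeration: 24 encoded clips, each paired with the no-effect case or one of the seven non-empty combinations of the three effect types, give 192 samples. The sketch below reproduces that count and draws one session of 64 samples; the labels and the uniform random draw are assumptions for illustration.

```python
import random
from itertools import combinations, product

EFFECTS = ("haptic", "olfaction", "airflow")
QUALITIES = ("low", "medium", "high")
CLIPS = range(1, 9)  # eight 30-second clips

# The empty tuple is the no-effect case; the rest are the 7 effect combinations.
effect_cases = [case
                for size in range(len(EFFECTS) + 1)
                for case in combinations(EFFECTS, size)]

# Every (clip, quality level, effect case) triple is one test sample.
samples = list(product(CLIPS, QUALITIES, effect_cases))

# One test session: 64 distinct samples drawn at random (uniform draw assumed).
session = random.Random(7).sample(samples, 64)

print(len(effect_cases), len(samples), len(session))  # prints: 8 192 64
```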

FIGURE 6. Results of Network Emulation for the Synchronization Mechanism.

FIGURE 7. User Perceived Experience for Different Types of Mulsemedia Effects.

1 )
Figure 7 presents the user experience quality grading for one case with no mulsemedia effects and 7 cases of different mulsemedia effect combinations.

FIGURE 8. User Perceived Experience for Different Numbers of Mulsemedia Effects.

FIGURE 9. More Mulsemedia Effects VS. Higher Video Quality Level.