Citation Information : International Journal of Orientation & Mobility. Volume 2, Issue 1, Pages 8-26, DOI: https://doi.org/10.21307/ijom-2009-002
License : (CC-BY-NC-ND-4.0)
Published Online: 16-April-2018
The Americans with Disabilities Act has brought about many changes that have improved the accessibility of public spaces for people with disabilities. However, accommodations for people who are blind or vision impaired have lagged behind because of the technical difficulties of adapting visually oriented environments for those who cannot distinguish visual cues (Landau, Wiener, Naghshineh, & Giusti, 2005). Part of this problem is that, of the three types of movement information used by humans in navigation (position, velocity, and acceleration), only acceleration-based information does not require any external visual or audio cues (Loomis, Klatzky, Golledge, Cicinelli, Pellegrino, & Fry, 1993). As a result, people with vision impairment are dependent on prior knowledge of the environment (from previous experience, visual descriptions, raised-line maps, GPS information systems) or audio and other sensory cues for navigation.
The complex process of human hearing is only beginning to be understood, although much has been published on the topic. Most of the experimental focus has been on sound localisation with less emphasis on the ways sound cues are used for indoor navigation. The research has been ‘laboratory’ oriented in the sense that the test environments do not replicate most ‘real world’ situations, a fact recognised by Middlebrooks and Green (1991). While useful information is gained from these types of experiments as far as the mechanism of human hearing, it is difficult to see how individuals will perform in ‘real world’ situations.
In an effort to increase the accessibility of public spaces for people who are vision impaired, an interactive audio-based navigation system, known as PING, was developed by Touch Graphics, Inc. of New York. The system is interfaced with the user via a portable telephone that is used to select a destination from a menu and trigger a pathway of audio beacons that play, in sequence, an attractor sound starting from the user’s location and ending at the desired destination. An attractor sound is a distinctive sound that can be identified by users of the PING system among the background noise of a public space, but is not disruptive to other visitors within the space. System users must select an attractor sound, then listen for that sound played through a beacon, and then travel towards the originating beacon. The sound can be triggered by the user as often as is necessary. When the beacon is reached, the user activates the next beacon in the path and continues, repeating the process until he or she reaches the final destination. For a more thorough description of the system, see Landau, Wiener, Naghshineh, and Giusti (2005).
Since the system supports multiple users who are navigating a space at the same time, there must be multiple attractor sounds available for selection. The sounds are stored in an auditory library, where once a sound is selected by one user, or ‘checked out’, it is unavailable to other users until it is ‘checked in’ at the end of the visit. One question that arose during system development was: what are the characteristics of ‘good’ sounds and ‘bad’ sounds? A ‘good’ sound was defined as one that was easily recognisable to users, well-liked by users, provided enhanced localisation, and was non-obtrusive to other people in the environment. The research reported here was conducted with two goals in mind. The first was to develop a series of experiments that can be used to identify ‘good’ sounds for inclusion in the library. The second was to identify any characteristics common to the ‘good’ sounds that could aid in selecting future sounds for the library.
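The check-out and check-in behaviour of the auditory library described above can be sketched in a few lines. This is a minimal illustration only, not the PING implementation: the class name `SoundLibrary`, its methods, and the sound and user names are all hypothetical.

```python
class SoundLibrary:
    """Hypothetical sketch: tracks which attractor sounds are free or in use."""

    def __init__(self, sound_names):
        self._available = set(sound_names)
        self._checked_out = {}  # sound name -> user identifier

    def available(self):
        """Return the sounds that can still be selected, in sorted order."""
        return sorted(self._available)

    def check_out(self, sound, user):
        """Reserve a sound for one user; it becomes unavailable to others."""
        if sound not in self._available:
            raise ValueError(f"{sound!r} is not available")
        self._available.remove(sound)
        self._checked_out[sound] = user

    def check_in(self, sound):
        """Return a sound to the library at the end of a visit."""
        if sound not in self._checked_out:
            raise ValueError(f"{sound!r} is not checked out")
        del self._checked_out[sound]
        self._available.add(sound)


# Illustrative names only; they are not the sounds tested in this study.
library = SoundLibrary(["bell", "birdcall", "whistle"])
library.check_out("bell", user="visitor-1")
```

After the call above, `library.available()` no longer lists the checked-out sound until `library.check_in("bell")` is called.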
Participants included five sighted persons and five individuals who were blind or severely vision impaired. They ranged in age from their twenties to their fifties. The participants with vision impairment consisted of both congenitally blind and adventitiously blind persons and were experienced cane travellers. All of the participants with vision impairments possessed normal hearing, with pure tone hearing thresholds between 5 decibels and 25 decibels (dB) as tested with a Maico model MA 41 audiometer. All of the sighted participants reported no known hearing problems. Five of the participants with impairments took part in Experiment 1 (to be described) and four of those took part in Experiments 2 through 4, the fifth participant having become unavailable.
This research used both quantitative and qualitative research designs. The search for sounds focused on royalty-free files available on the internet. A total of 26 sounds were chosen. The sounds under consideration were similar to those used by Guettler, Bolia, and Nelson (2000, and unpublished draft), for example, bells, birdcalls, whistles, and claps. All sounds were normalised such that their time-domain peak levels were identical.
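Peak normalisation of this kind can be illustrated with a short sketch. The software actually used is not named in the text, so the function below is only an assumed, generic version: it scales each sound so that its largest absolute sample value equals a common target. The function name, the target value, and the sample lists are hypothetical.

```python
def peak_normalise(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty input: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


# Two made-up waveforms with different peak levels.
quiet = [0.0, 0.1, -0.25, 0.2]
loud = [0.0, 0.4, -0.8, 0.6]
normalised = [peak_normalise(wave) for wave in (quiet, loud)]
```

After normalisation both waveforms share the same time-domain peak, although their perceived loudness may still differ.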
The normalised sounds were then modified in the following manners using computer software: pitch and tempo were doubled and halved, amplitude was doubled, and the sound was repeated twice in a row. The result was 156 total sounds in the Waveform audio format (*.wav) to be examined. A *.wav is a common format used to encode audio for PC-based devices. These changes were made to examine the effect they might have on sound localisation and likeability.
A screening experiment, using sighted participants with occluded vision, was used to reduce the 156 sounds to a more manageable number. The experimental setup was similar to that used by Giguere and Abel (1993), in that it was conducted in a reverberation chamber with an array of loudspeakers around the listener who made a forced-response location selection, a schematic of which is presented in Figure 1. Based on accuracy of localisation and comments from participants, 12 sounds were chosen, to be used from this point forward for testing. Each sound is described in ‘real world’ terms by assigned number in Table 1.
This research was conducted in two university locations: 1) in a laboratory reverberation chamber and 2) along a path through a large university hallway and in a museum-like exhibit area.
Three distinct experiments were developed and conducted to establish a library of sounds for the navigation system. Each experiment was designed to explore a specific aspect of the sounds in relation to their use as a navigational aid. The experiments were (a) indoor navigability and subjective participant response, (b) indoor localisation and sound level, and (c) sound identification and recognition. The indoor navigability experiment involved five participants. The remaining experiments involved four of those five participants. The fifth participant was unavailable during the later testing periods.
In the navigation tests, markings, such as a circle, triangle, or ‘X’, were made and examined to determine efficiency of subjects’ travel routes to the beacons. The other tests combined automated recording of data lists by computer programs, and hand recorded data by the researchers. Data were collected using a coding system that did not directly identify the subjects. All data are reported in aggregate form.
One of the most important aspects of the sounds used for the auditory catalog is a person’s ability to navigate an intended course while localising a chosen sound. Another important aspect of the system is that users choose their own sound (it is not assigned) out of an array of available sounds in the catalog. Therefore, all of the sounds in the catalog must lend themselves to successful navigation for all users, not just one specific user. The indoor navigability experiment was designed to test each of the 12 candidate sounds and reveal which are better suited for navigation in an indoor environment.
The experimental area was within a building comprised of two intersecting hallways, a corridor, and two large empty rooms (see shaded area of Figure 2). The majority of the testing took place in room ‘A’, the museum-like exhibit area. The dimensions of the exhibit area are 8.84 metres (29 feet) by 18.29 metres (60 feet). In their similar experiments on navigation using sound, Loomis, Hebert, and Cicinelli (1990) used a large gymnasium as the test location.
Room ‘A’ had a similar construction to a typical elementary school gymnasium, consisting of cinderblock walls, a linoleum tiled floor, and a sound-reflective ceiling.
Twelve different pathways, one for each sound, were created within the experimental area. All paths started at the intersection of the two hallways, indicated by the circle-with-cross in Figure 2. From the starting point, six of the paths proceeded down the hallway (horizontally in Figure 2), through the corridor, briefly into room ‘B’, and then into room ‘A’. The other six paths led down the hallway (vertically in Figure 2) and directly into room ‘A’. Each path consisted of five straight legs of varying length, of which at least three were located within the ‘exhibit’ area. A sound beacon was placed at the end of each leg, allowing the user to travel from one beacon to the next. The audio beacon enclosures ranged in height from 0.74 metre (2.4 feet) to 1.98 metres (6.5 feet). Drawings of each path were generated using laser surveying and AutoCAD software. Figure 3 indicates the beacon locations, labelled as PING #, and a list of the 12 paths defined by beacon number. The PING # labelled in the figure corresponds to the serial number of the unit. Some units were damaged during shipping and could not be used, so not all numbers between 1 and 13 appear in the figure.
The beacons were controlled by the PING computer system, provided by Touch Graphics, and were activated by a cordless telephone. After activating the system, the participants heard a recorded message providing instructions on how to use the phone to control the system. For the purpose of the experiment, it was only necessary for participants to press 1 to ‘ping’ the beacon (play sound) and press 2 when the destination had been reached to activate the next beacon along the path. In addition to the user-activated sounds, a separate laptop computer was used together with a stereo amplifier and two 2-way bookshelf loudspeakers to play a background soundtrack that would better replicate the ambient sound of conversations typical of a museum experience. The soundtrack consisted of two recordings, of about two minutes each, played in sequence and looped continuously. The background sound pressure level was set to 59 dB(A) (decibels, A-weighted), which is consistent with the sound level measured at a local art museum. The PING sounds were set to a sound pressure level of approximately 60 dB(A). This combination of levels provides a realistic representation of what can be expected in a museum.
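The two-key interaction used in the experiment (1 to ‘ping’, 2 to advance) can be pictured as a small state machine. The sketch below is an assumption about the control flow only and does not reproduce the PING software; the class `BeaconPath`, its methods, and the beacon numbers are hypothetical.

```python
class BeaconPath:
    """Hypothetical sketch of stepping through a sequence of beacons."""

    def __init__(self, beacon_ids):
        if not beacon_ids:
            raise ValueError("a path needs at least one beacon")
        self._beacons = list(beacon_ids)
        self._index = 0
        self.log = []  # (event, beacon id) pairs, in the order they occurred

    @property
    def finished(self):
        return self._index >= len(self._beacons)

    def press(self, key):
        """Handle a key press; return the beacon that is now active, if any."""
        if self.finished:
            return None
        current = self._beacons[self._index]
        if key == "1":
            # Play the attractor sound on the current beacon.
            self.log.append(("ping", current))
            return current
        if key == "2":
            # Destination reached: activate the next beacon along the path.
            self.log.append(("reached", current))
            self._index += 1
            return None if self.finished else self._beacons[self._index]
        raise ValueError(f"unsupported key {key!r}")


# Made-up beacon numbers for a three-leg path.
path = BeaconPath([3, 7, 9])
path.press("1")
path.press("1")
path.press("2")
```

The log also yields the number of ‘pings’ per leg, which is the kind of count that a later experiment records.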
To account for differences in participant ability and to provide multiple data sets, three different participants travelled each path. The paths each participant travelled were randomly selected. At the start of the experiment, a description of the test was given to each user and each was given a chance to become familiar with the PING system. The participant then activated the system, listened to the recorded instructions and proceeded to travel from one beacon to the next. As the participant walked the route, a researcher followed behind and marked the path travelled, on the floor, using a water-based marker.
After all five legs of the path had been completed each participant was assisted in returning to the starting point. This process was then repeated for all sounds assigned to the participant and for all participants. At the end of each path travelled, participants were asked about their subjective responses to the sounds they had just heard. The participants were asked to rank subjectively the pleasantness of the sounds on a scale of 1 – 3, where 1 was unpleasant or unable to locate, 2 was pleasant or somewhat easy to locate, and 3 was very pleasant or very easy to locate. They were also asked to comment on their ability to identify the location of the sound source. Any other comments by the participant regarding the sound were also noted.
When all of the participants had completed the experiment, the research team followed the ideal paths, which had been marked on the floor prior to testing, and identified the extreme points of travel deviation along the right and left-hand sides of the path. These extreme points marked an ‘envelope of travel’ for each path leg. A professional land surveyor was then brought in to map the rooms, beacon locations, ideal paths, and path envelopes using a laser-surveying device. Using the data gathered, the surveyor was able to create an AutoCAD® drawing for each path. An example is presented in Figure 4, where the beacons are identified by PING #, the ideal path is identified by three parallel lines between beacons, and the travel envelope is identified by the shaded region.
To develop a ranking for navigability, a comparison of travelled areas was used. Analysis of the paths travelled by the participants was possible by using the CAD drawings developed during the testing phase. First, the ‘ideal area’ of an ‘ideal path’ was measured as being 0.30 metre (one foot) on either side of a straight line between beacons for the length of the path leg (a leg being the segment from one beacon to the next). Second, the area outside of the ‘ideal path’ was calculated for the left and right sides. Third, a composite area was calculated as the sum of the left and right areas. Only the last three legs of each path located in room ‘A’ for each sound were considered. A rank, from one (high) through 12 (low), was assigned to each path based on the composite area, with a small area ranking higher than a large area.
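The ranking step can be sketched as follows, assuming the left and right areas outside the ideal path have already been measured from the drawings. The function name and the area values are hypothetical and are not data from this study; ties are not treated specially in this sketch.

```python
def rank_by_composite_area(areas):
    """Rank sounds by composite (left + right) area outside the ideal path.

    areas maps a sound identifier to a (left_area, right_area) pair.
    Returns a mapping of sound identifier to rank, where 1 is the
    smallest composite area (best) and larger areas rank lower.
    """
    composite = {sound: left + right for sound, (left, right) in areas.items()}
    ordered = sorted(composite, key=composite.get)
    return {sound: rank for rank, sound in enumerate(ordered, start=1)}


# Made-up areas (arbitrary square units) for three sounds.
example_areas = {1: (2.0, 1.5), 2: (0.5, 0.75), 3: (1.0, 1.0)}
area_ranks = rank_by_composite_area(example_areas)
```

With these made-up values, sound 2 has the smallest composite area and therefore receives rank 1.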
Since the paths were not of equal lengths, it was thought that using the composite area to rank the sounds would cause longer paths to rank lower than shorter paths. Therefore, the standard deviation of the area, of the last three legs, was calculated for the left and right sides.
A new composite rank was then calculated. When the composite standard deviation of area rank was compared with the composite area rank, the results for the top 12 sounds closely agreed. The close agreement between the ranking of the sounds based on composite area and composite standard deviation of the areas argues against any concern that path length affected performance. A simple visual examination of the plotted path areas showed that short path legs did not necessarily produce less deviation from the ‘ideal path’ than long path legs, or vice versa.
It is interesting to note that, in the majority of cases, users showed a distinct pattern to their movement: departure, correction, and arrival. Starting with their departure from a given location, the users would ‘ping’, or activate the system, and then begin travelling towards where they believed the next beacon was located, possibly ‘pinging’ several more times while walking. At a point between beacons (often one-half to two-thirds the distance) users would stop walking, ‘ping’ again, correct their direction of travel, and continue towards the beacon, possibly ‘pinging’ several more times along the way. A similar pattern was demonstrated by participants, one of whom was blind, in the study conducted by Loomis, Hebert, and Cicinelli (1990). It appears that this pattern of movement is a common and sensible approach to navigation using sound by people with or without vision impairments. When the users arrived at where they believed the beacon to be located, in almost every case, they would ‘ping’ one final time and then feel for the beacon to assure themselves that they were in the correct location. While this pattern was not used to rank the sounds, it does show a common approach between participants in using the system. Furthermore, an examination of the users’ paths shows that, although very long distances between beacons frequently required several corrections, all users successfully reached their destinations.
From Experiment 1, the 12 tested sounds were ranked based on the composite scores for the travelled area and the standard deviation of the travelled area as determined during data analysis. The resulting ranking is presented in Table 2. The subjective scores given by the participants were averaged and constituted a ranking for the sound. The subjective responses are important because they relate to how comfortable the participants were in their ability to navigate using the different sounds. For instance, one particular sound might have resulted in good navigation for a particular participant, but the participant might have felt unsure that he was on the right path and was thus uncomfortable in using the system. The resulting subjective ranking is presented in Table 3.
This experiment was to determine which sounds were most localisable and at what volumes (sound pressure levels). To accomplish this task, room ‘A’ shown in Figure 2 was used. Software was used to create three new sets of *.wav files representing a high volume (peak level = 65 dB), a medium volume (peak level = 55 dB), and low volume (peak level = 45 dB) for each of the 12 sounds. These sound levels and the setup are again similar to experiments by Rakerd and Hartmann (1986) in which two sound pressure levels were compared, 65 dB(A) and 40 dB(A). For each participant a unique playlist of sounds was created that randomised the sound heard, the volume, and the beacon location. The list assured that multiple participants would hear each sound to provide a better sample set of data for analysis.
A layout similar to that used and researched by Hartmann (1983) was used. A listening position was identified at one end of the room and five beacons were placed in semi-circular fashion 9.14 metres (30 feet) away, with 1.22-metre (4-foot) spacing between beacons. Subjects were asked to use a laser pointer to identify the location of the sound beacons as the sounds were played. Chalkboards behind the beacons were marked with vertical lines in 0.30-metre (one-foot) increments so that an approximate horizontal pointing error of the laser could be measured. If the user identified the location within 0.30 metre (one foot) left or right of the beacon, the response was considered accurate with no error. This range of accuracy was decided upon after noting the difficulty of maintaining the laser dot at an exact location due to slight hand motions.
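To give a sense of scale, the layout dimensions can be converted into approximate angles at the listening position, and the tolerance rule can be written as a small function. This is a rough sketch under stated assumptions: the spacing is treated as a straight-line width viewed from 9.14 metres, errors beyond the tolerance are assumed to be scored at their measured value, and all names are hypothetical.

```python
import math

LISTENING_DISTANCE_M = 9.14   # listener to beacons (30 feet)
BEACON_SPACING_M = 1.22       # between adjacent beacons (4 feet)
TOLERANCE_M = 0.30            # accepted offset either side (one foot)


def angle_subtended_deg(width_m, distance_m):
    """Angle, in degrees, subtended by a width centred on the line of sight."""
    return math.degrees(2 * math.atan((width_m / 2) / distance_m))


def scored_error_ft(offset_ft, tolerance_ft=1.0):
    """Return 0 inside the tolerance; otherwise the absolute offset (assumed)."""
    offset = abs(offset_ft)
    return 0.0 if offset <= tolerance_ft else offset


beacon_separation_deg = angle_subtended_deg(BEACON_SPACING_M, LISTENING_DISTANCE_M)
tolerance_deg = math.degrees(math.atan(TOLERANCE_M / LISTENING_DISTANCE_M))
```

Under these assumptions adjacent beacons are separated by roughly 7.6 degrees as seen by the listener, and the one-foot tolerance corresponds to just under 2 degrees on either side of a beacon.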
At the time of testing, the participant stood at the listening position, used the cordless phone to ‘ping’ the beacon, and with a laser pointer pointed in the direction from which he believed the sound to originate. Participants were given an opportunity to practice using a sound/volume combination not included on their playlist. They were allowed to ‘ping’ as many times as necessary until confident of the location. A record of the number of ‘pings’ was made. For all participants the laser pointer was taped to their index finger to improve accuracy. The same background recordings of ambient sounds used during the navigability experiment were also used for this experiment at a sound pressure level of 60.8 dB(A). Figure 5 presents the experimental setup.
The scores for this experiment were based upon the number of ‘pings’ the participant required to be confident of the sound source location, the accuracy of the participant in localising the sound source and the sound level. The first steps were to calculate the average number of ‘pings’ required by the participants for the sounds they heard and their average localisation accuracy (in feet). Then a raw score for each sound at each sound level was calculated as the sum of the two averages. Using the raw score the sounds were ranked based on sound pressure level (volume) in the categories of high volume, medium volume, and low volume (where a low score was better than a high score) and overall with respect to all sounds and levels.
In the museum environment, the PING system must not be disturbing to other patrons; hence, the lower the PING volume the better. In order to account for the differences in sound pressure levels between the sounds, the A-weighted equivalent sound pressure level (Leq) was measured for each sound over 20 seconds. As a means of penalising louder sounds, the raw score was multiplied by the Leq value to create a new weighted score and rank.
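The scoring arithmetic described above (average ‘pings’ plus average localisation error in feet, then multiplication by Leq) can be sketched as follows. The function names, sound labels, and all numbers are hypothetical and are not results from this study.

```python
from statistics import mean


def raw_score(ping_counts, errors_ft):
    """Sum of the average number of pings and the average error in feet."""
    return mean(ping_counts) + mean(errors_ft)


def weighted_score(raw, leq_dba):
    """Penalise louder sounds by multiplying the raw score by Leq in dB(A)."""
    return raw * leq_dba


def rank_low_is_best(scores):
    """Map each name to its rank, where the lowest score receives rank 1."""
    ordered = sorted(scores, key=scores.get)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Made-up trials: (ping counts, pointing errors in feet, measured Leq).
trials = {
    "sound A": ([2, 3, 1], [0.0, 1.0, 2.0], 60.0),
    "sound B": ([1, 1, 2], [0.0, 0.0, 3.0], 50.0),
}
weighted = {
    name: weighted_score(raw_score(pings, errors), leq)
    for name, (pings, errors, leq) in trials.items()
}
weighted_ranks = rank_low_is_best(weighted)
```

In this made-up case the quieter sound that also needed fewer ‘pings’ ends up with the lower, and therefore better, weighted score.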
From Experiment 2, the 12 tested sounds were ranked based on the weighted scores for each sound as determined during the data analysis. The resulting ranking is presented in Table 4.
The final experiment was designed to see how well the participants could recognise an assigned ‘ping’ against a background of other ‘ping’ sounds. In a museum there might be multiple users activating the PING system at the same time; therefore, users must be able to pick out their own ‘ping’ from among the other users’ ‘pings’.
The reverberation chamber and loudspeaker array (shown previously in Figure 1) were used again for this experiment. However, the setup changed from that used in the screening experiment.
A computer program was developed in LabView for automatically generating a random list of sounds at a rapid pace. The program was designed to play the 12 sounds in a random order, record the order in which they were played, and indicate whether the participant responded to each sound. In addition, a background speaker played another set of random PING sounds so that the sounds overlapped.
The participant sat in the reverberation chamber and held a computer mouse. The computer program would randomly select one of the 12 sounds to play. The computer soundcard output was connected to five stereo amplifiers. Each amplifier then split the signal two more times (left and right audio channels). A total of nine of the array loudspeakers were used. All nine loudspeakers played at the same time creating a non-directional sound field relative to the participant seating location. Located at the rear of the chamber, directly in front of and facing away from the participant, was a single three-way loudspeaker. The three-way loudspeaker was connected via another receiver to a different PC. The second computer was used to play a different set of sounds in random order, minus the participant’s assigned ‘ping’. Since the second playlist did not contain the sound that the participant was listening for, it was assured that the only time the assigned sound was heard was if it had been played from the LabView program.
Both computers began playing their respective playlists at the same time. When the participant heard his ‘ping’ played, he clicked the mouse, which recorded a response in the LabView program. Each participant listened for each of the sounds at a single volume level in separate trials. The LabView program generated text files for each participant and each sound that indicated when each sound played and when the participant responded. These text files were then converted into spreadsheet files for analysis.
Again, the sounds were scored. In this case, the total number of times the sound was played for all participants was determined from the LabView output files generated during the experiment, along with the total number of correct, missed, and incorrect responses. A correct response was one in which the participant responded when the desired sound was played, a missed response was one in which the participant did not respond when the desired sound was played, and an incorrect response was one in which the participant responded to a different sound. The percentages of missed and incorrect responses were subtracted from the percentage of correct responses for each sound, and the result was the sound’s score.
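The score calculation can be written out as a short sketch. The text does not state the denominator used for each percentage, so this version assumes that all three are expressed relative to the number of times the assigned sound was played; the function name and the counts are hypothetical.

```python
def recognition_score(times_played, correct, missed, incorrect):
    """Percent correct minus percent missed minus percent incorrect.

    Assumption: every percentage uses times_played as its denominator.
    """
    if times_played <= 0:
        raise ValueError("times_played must be positive")

    def percent(count):
        return 100.0 * count / times_played

    return percent(correct) - percent(missed) - percent(incorrect)


# Made-up counts: played 20 times, 18 correct, 2 missed, 1 incorrect.
example_score = recognition_score(20, correct=18, missed=2, incorrect=1)
```

With these made-up counts the score is 90 minus 10 minus 5, that is, 75.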
From Experiment 3, the 12 tested sounds were ranked based on their percentage score as determined during the data analysis. The resulting ranking is presented in Table 5.
The ultimate goal of this research was twofold: first, to develop a series of experiments to be used in identifying ‘good’ attractor sounds based on navigability, localisation, recognition, and likeability, and to use these experiments to recommend a collection of sounds for the PING library; second, to identify any common characteristics that could aid in selecting future sounds for the library. Data collection during physical testing and its subsequent analysis resulted in a ranking system that identifies ‘good’ sounds from within a larger group.
However, the small number of participants involved in the study substantially limits the generalisability of the research. That being said, the experiments were approached with this in mind and provisions were made to reduce the potential for the performance of a participant to disproportionately influence the outcome. Such provisions included having multiple participants evaluate each sound, calculating standard deviation, and adapting techniques similar to those used by other researchers. The limitation on drawing broader generalisations does make it more difficult to identify trends within the data that may identify characteristics of ‘good’ versus ‘bad’ sounds as applied to the PING system.
In an attempt to determine whether or not there were common characteristics associated with the best attractor sounds, the sounds were examined and compared within their time- and frequency-domains. One trend that stands out from the majority of sounds (‘good’ and ‘bad’) is a concentration of intensity within the 500 Hz to 2000 Hz octave bands. Experiments by Blauert (1969/70) showed the importance of lower-frequency content for vertical localisation. Blauert’s (1969/70) test revealed that content in the third-octave bands between 125 Hz and 500 Hz is important for correctly locating sound sources in front of the listener, and content from 630 Hz to 2000 Hz for sound sources behind the listener. The importance of frequency content between 1 kHz and 3 kHz for recognition within the median sagittal plane was reported by Rakerd, Hartmann, and McCaskey (1999). Since few of the tested sounds exhibited frequency content above 10 kHz, it is believed (based on the data available) that lower frequencies are also essential to localisation, recognition, and navigation in the horizontal plane for reverberant environments when head movements are allowed.
Head movements (dynamic cues) are thought to be important in resolving the front/back confusions that can arise because of the influence of these frequencies in the head-related transfer function. Middlebrooks and Green (1991, p. 19) concluded that head movements were “probably not a critical part of the localisation process, except in cases where time permits a very detailed assessment of location…”. However, the sounds tested here were much longer than the one-second duration used in their study and therefore do permit “a very detailed assessment of location” (Middlebrooks & Green, 1991, p. 19), a requirement for dynamic cues to be effective.
Another trend that emerged from the analysis is that many of the top ranked sounds had quick onsets and quick decays. The benefit of a rapid onset in a reverberant environment was shown by Rakerd and Hartmann (1986) and participants reported during the current research that sounds with a shorter decay tended to cause fewer reflections from surrounding surfaces and, thus, were easier to localise. The navigation experiments conducted by Loomis, Hebert, and Cicinelli (1990) also used a sound with a rapid onset and decay (a square wave) with positive results.
The current experiments also revealed a close agreement between sounds that were ranked highly in localisation and recognition. Previous work found that correct localisation does not necessarily result in correct recognition in the vertical plane (Rakerd, Hartmann, & McCaskey, 1999). The current work raises the question of whether or not correct localisation results in correct recognition (or vice versa) in the horizontal plane. However, this question was not addressed in the design of the experiments presented here.
After having completed all experiments and analysis, and using the information in Tables 2 through 5, the top six ranking sounds in each test were compared across the three tests as shown in Table 6. From this table and those shown previously, seven sounds were identified as suitable for addition to the PING catalog. Although all 12 sounds resulted in adequate performance by the participants, the seven sounds selected ranked higher in the testing than the other five sounds and should provide a better experience to users of the PING system. The selected sounds can be organised into three groups based on their rate of recurrence of high ranks.
Group A includes sounds that ranked within the top six in all three objective experiments (sounds 5, 6, and 10). Group B includes sounds that ranked within the top six in the subjective responses and multiple objective experiments (sound 3). Sounds that ranked within the top six in any two experiments are included in Group C (sounds 1, 9, and 11). Sound 7 was not included in the final selection, despite its high ranking in some experiments, because all researchers involved and most participants found it to be exceedingly unpleasant. It was therefore recommended that the sounds identified in Table 7 be added to the PING catalog at the volume levels indicated. Of the recommended sounds, three are birdcalls, one is a familiar tune from television (the Jetsons’ doorbell), and three are some other type of bell.
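The grouping rule can be expressed as a short sketch. Several details are assumptions here: ‘multiple’ is read as at least two objective experiments, ‘any two experiments’ is read as including the subjective ranking, and the groups are tested in the order A, B, C. The function name and the top-six sets are hypothetical and do not reproduce Table 6.

```python
def classify(sound, objective_top6, subjective_top6):
    """Assign a sound to group 'A', 'B', or 'C', or None if it fits none."""
    objective_hits = sum(sound in top for top in objective_top6)
    in_subjective = sound in subjective_top6
    if objective_hits == len(objective_top6):
        return "A"  # top six in every objective experiment
    if in_subjective and objective_hits >= 2:
        return "B"  # subjective top six plus multiple objective experiments
    if objective_hits + int(in_subjective) >= 2:
        return "C"  # top six in any two experiments
    return None


# Made-up top-six sets for three objective experiments and one subjective ranking.
objective_top6 = [
    {"a", "b", "c", "d", "e", "f"},
    {"a", "b", "c", "g", "h", "i"},
    {"a", "b", "d", "g", "j", "k"},
]
subjective_top6 = {"c", "g", "l", "m", "n", "o"}
groups = {s: classify(s, objective_top6, subjective_top6) for s in "acdl"}
```

A rule like this does not capture the exclusion of a sound on grounds of unpleasantness, which in the study was a separate judgement by the researchers and participants.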
The research discussed here has been applied by Touch Graphics to the PING audio navigation system. Trials of the PING system that incorporate the recommended sounds were conducted in the fall of 2006 at the New York Hall of Science, a hands-on science museum in New York City. The trial was a rousing success, with several users easily navigating the museum simultaneously.
Two of the major limitations of this research involved the amount of time needed to conduct the physical experimentation and the limited number of participants in the experiments. It was hoped that analysis of the time- and frequency-domain data would lead to clear trends that would distinguish the ‘good’ from the ‘bad’ sounds and that the information could be used in the future to better select sounds at the outset, thus reducing the amount of physical participation required. That was not the case, although some trends within the data were observed.
Future research should consider two areas to address this issue: investigation into frequency characteristics within the 500 – 2000 Hz octave bands and the relationship between localisation and recognition. An examination focusing on the frequency structure within the aforementioned bands, using a larger number of sounds and a larger number of participants, could yield highly useful information for the selection of attractor sounds in the future.
In the current study, a strong, positive relationship between localisation and recognition in the horizontal plane was observed. Five of the top six ranked sounds were the same for both localisation and recognition. No previous report of such a relationship was found in the literature. This relationship between localisation and recognition may render one experiment or the other (localisation or recognition) unnecessary, but further research is needed with a larger sample set before that decision can be made.
Finally, it would be desirable to develop a model similar to that discussed by Gilkey and Anderson (1997). Such a model could be used to develop a computer program that accurately predicts the performance of a potential attractor sound without the need for human participants. Such a program would save significant time compared to physical testing. Creating it will require a better understanding of the processes of localisation, recognition, and navigability of ‘real-world’ sounds that this work has begun to explore.
The authors would like to thank the following: Steve Landau of Touch Graphics, New York, for providing the funding and the PING! system used for this research and for his assistance in setting up and operating the system; John Stahl, graduate student at Western Michigan University (WMU), for developing the LabView program used in Experiment 3; Jay Pliskow, assistant for the Noise and Vibration Lab at WMU, for helping compile the data; the Department of Blindness and Low Vision Studies at WMU for identifying students with vision impairments who were interested in participating in this research; and finally, all of the participants who gave their time to assist in this study.