IEEE Transactions on Audio and Electroacoustics

Issue 3 • June 1973

Displaying Results 1 - 25 of 33
  • [Front cover and table of contents]

    Publication Year: 1973 , Page(s): 0
  • Guest editorial

    Publication Year: 1973 , Page(s): 133
  • Introduction at award lunch

    Publication Year: 1973 , Page(s): 134
  • [Back cover]

    Publication Year: 1973 , Page(s): c4
  • A vocal data management system

    Publication Year: 1973 , Page(s): 185 - 188
    Cited by:  Papers (4)

    This paper describes an implementation strategy for a vocal data management system (VDMS) being developed by the voice input/output project at the System Development Corporation. VDMS will accept connected speech of a language describable by 25-50 phrase equations and having a vocabulary of approximately 1000 words formed from about 100 data records. The strategy is based on the concept of predictive linguistic constraints (PLC). The present concepts of fixed directionality in parsing are replaced by a more generalized approach. To facilitate this flexibility, the system comprises a set of near-independent coroutines that are interconnected by a software busing structure. The VDMS acoustic processors verify the predictions. Very loose matching criteria are used for locating the predicted words. Special attention is given to word segments that are experimentally determined to be most invariant.

  • A model and a system for machine recognition of speech

    Publication Year: 1973 , Page(s): 229 - 238
    Cited by:  Papers (17)  |  Patents (1)

    This paper presents a model for machine recognition of connected speech and the details of a specific implementation of the model, the HEARSAY system. The model consists of a small set of cooperating independent parallel processes that are capable of helping in the decoding of a spoken utterance either individually or collectively. The processes use the "hypothesize-and-test" paradigm. The structure of HEARSAY is illustrated by considering its operation in a particular task situation: voice-chess. The task is to recognize a spoken move in a given board position. Procedures for determination of parameters, segmentation, and phonetic descriptions are outlined. The use of semantic, syntactic, lexical, and phonological sources of knowledge in the generation and verification of hypotheses is described. Preliminary results of recognition of some utterances are given.

  • Listener performance in speaker verification tasks

    Publication Year: 1973 , Page(s): 221 - 225
    Cited by:  Papers (7)

    The ability of listeners to perform some speaker verification tasks has been measured experimentally and compared with the performance of an automatic system for speaker verification. A test presentation in the subjective experiments consists of a pair of utterances. One of these is drawn from the recordings of a group of speakers designated customers while the second utterance is either a distinct recording from the same customer or the recording of an impostor. Listeners must respond whether the utterances are from the same or different speakers. The impostor classes that have been considered are casual impostors making no attempt to mimic customers, trained professional mimics, and an identical twin of a customer. Listener performance is specified by the two types of error that can be committed.

  • An approach to syntactic recognition without phonemics

    Publication Year: 1973 , Page(s): 249 - 258
    Cited by:  Papers (11)  |  Patents (2)

    Linguistic and perceptual arguments suggest that, in speech recognition systems, syntactic hypotheses should be formed before phonemic segments are identified. Prosodic features can provide some cues to constituent structure. In a variety of texts and excerpts from conversations, spoken by several talkers, a decrease in voice fundamental frequency (F0) usually occurred at the end of each major syntactic constituent, and an increase in F0 occurred near the beginning of the following constituent. A computer program based on this regularity correctly detected over 80 percent of all syntactically predicted boundaries. Some boundaries between minor constituents were also detected by the fall-rise patterns in F0. False boundary detections resulted from F0 variations at boundaries between vowels and consonants, but most such false alarms could be eliminated by setting a minimum percent variation in F0 for a boundary detection. Sentence boundaries were accompanied by large F0 increases and substantial pauses. The categories of constituents affect boundary detection results, with noun phrase-verbal sequences showing particularly infrequent detection. Prosodic cues to stress patterns and stress-to-syntax rules may be used to detect other aspects of syntactic structure. Syntactic structure hypotheses might then be used to guide phonetic recognition procedures within constituents.

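
The fall-rise regularity described in the abstract above lends itself to a short sketch. The following is our illustration, not the authors' program: it assumes the F0 contour is given as one value per frame (zero marking unvoiced frames) and that `min_pct` stands in for the minimum percent variation mentioned in the abstract.

```python
# Sketch of a fall-rise F0 boundary detector (illustrative only;
# frame-based F0 input and min_pct threshold are our assumptions).

def detect_boundaries(f0, min_pct=10.0):
    """Return frame indices where F0 falls then rises by at least min_pct."""
    boundaries = []
    for i in range(1, len(f0) - 1):
        if f0[i - 1] <= 0 or f0[i] <= 0:      # skip unvoiced frames
            continue
        fall = 100.0 * (f0[i - 1] - f0[i]) / f0[i - 1]
        rise = 100.0 * (f0[i + 1] - f0[i]) / f0[i]
        if fall >= min_pct and rise >= min_pct:
            boundaries.append(i)
    return boundaries
```

Raising `min_pct` suppresses the vowel-consonant false alarms the abstract mentions, at the cost of missing weaker minor-constituent boundaries.
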
  • Reading machines for the blind: The technical problems and the methods adopted for their solution

    Publication Year: 1973 , Page(s): 259 - 264
    Cited by:  Papers (6)  |  Patents (1)

    In order to assess current efforts devoted to reading machine design, it is first necessary to develop a set of requirements for an ideal device. Direct translation aids are then seen to lack several of these desirable features, and more general, linguistically based techniques are then examined. Structural properties of English are found to be obtainable from the orthographic representation, and these abstract relations can then be used to infer structural correlates in the output speech waveform. Current knowledge is centered largely at the word level, but several correlates of higher order units have been studied and rules for their behavior have been implemented in working systems. Finally, direct assessment of speech synthesized by rule has shown that even currently available techniques can yield speech acceptable to blind users.

  • Automatic synthesis from ordinary English text

    Publication Year: 1973 , Page(s): 293 - 298
    Cited by:  Papers (16)

    We summarize work between 1969 and 1972 in a continuing project with two objectives: to produce acceptable synthetic speech directly from English text; and to demonstrate with speech synthesis a detailed model of human articulatory movements. Work in the four-year period has yielded moderately accurate rules for predicting the occurrence of pauses and lesser breaks in the sentence; rules for vowel duration in many conditions, not just primary stressed syllables immediately before a pause; rules for contextual variations of consonants; and rules for durational and other allophonic variations on consonants at word boundaries. Presently we are studying natural speech to quantify and add detail to these rules, and we are working to extend the vocal tract model to closer agreement with human articulation and vocal cord control.

  • Delta modulation of pitch, formant, and amplitude signals for the synthesis of voiced speech

    Publication Year: 1973 , Page(s): 135 - 140
    Cited by:  Papers (2)

    A computer simulation was performed to demonstrate the feasibility of delta modulation (DM) as a simple alternative to pulse-code modulation (PCM) for encoding the control signals of a voiced-speech synthesizer. Quantized signals representing the time variations of pitch period, amplitude, and the first three formant frequencies, all band limited to 16 Hz, were available in a 1500-b/s PCM format. Each of the five signals was oversampled at 100 Hz for delta encoding, resulting in a representation at 500 b/s (an information rate utilized recently for an adequate PCM representation of the control signals). Low-pass filtered versions of the DM signals were used to synthesize the original all-voiced utterance with a quality very close to that obtained in the original 1500-b/s system. Both "linear" and "adaptive" delta modulators were considered; in the latter case, the step size is adapted continuously to the changing slope statistics of an input signal, and this provides more efficient encoding. When additional band limiting was applied to the original control signals, resulting in a 700-b/s representation, adaptive DM at 250 b/s was sufficient to encode the information without further degradation of the synthetic speech.

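
The "adaptive" delta modulator mentioned above can be sketched as a one-bit encoder whose step size grows on runs of identical bits and shrinks on alternations. The adaptation constant `k` and the initial step are our illustrative choices, not parameters from the paper.

```python
# Minimal adaptive delta modulation sketch (step rule and constants
# are illustrative assumptions, not the paper's design).

def adm_encode(x, step0=1.0, k=1.5):
    """Encode samples as +/-1 bits; step adapts to the slope statistics."""
    bits, est, step, prev = [], 0.0, step0, 0
    for s in x:
        b = 1 if s >= est else -1
        step = step * k if b == prev else step / k
        est += b * step
        bits.append(b)
        prev = b
    return bits

def adm_decode(bits, step0=1.0, k=1.5):
    """Mirror the encoder's state transitions to rebuild the signal."""
    est, step, prev, out = 0.0, step0, 0, []
    for b in bits:
        step = step * k if b == prev else step / k
        est += b * step
        out.append(est)
        prev = b
    return out
```

Because the decoder replays exactly the encoder's adaptation, the two stay in lock step from the bit stream alone, which is what keeps the rate at one bit per (over)sample.
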
  • Subjective evaluation of differential pulse-code modulation using the speech "Goodness" rating scale

    Publication Year: 1973 , Page(s): 179 - 184
    Cited by:  Papers (1)

    The objectives of this investigation were twofold: 1) to demonstrate the utility of the nine-point speech "goodness" rating scale as a method for scaling user opinion of speech quality, and 2) to use this method to determine optimum parameters for differential pulse-code modulation (DPCM) systems with bit rates from 25.6 to 51.2 kb/s. Fifteen DPCM and pulse-code modulation (PCM) systems were simulated on a digital computer. The parameters investigated included the tradeoff between bandwidth and number of quantization levels, and the number of taps in the DPCM predictor network. A total of 248 ratings were obtained from 31 trained listeners for each of the systems under consideration. Both the intra- and interrater reliability of these data, as obtained from the speech "goodness" rating scale, were found to be greater than 0.95. Results indicated that: 1) at any bit rate, DPCM is significantly better than PCM; 2) DPCM with a three-tap predictor is not significantly better than DPCM with a one-tap predictor; 3) between 2.4 and 4.3 kHz, changes in bandwidth are inconsequential in terms of user opinion; and 4) the number of quantization bits appears to be the primary determinant of speech quality judgment.

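
A one-tap DPCM loop of the kind compared above can be sketched in a few lines; the predictor coefficient `a` and quantizer step `q` are illustrative assumptions, not the simulated systems' settings.

```python
# One-tap DPCM sketch: quantize the prediction error against a
# decoder-tracked reconstruction (a and q are illustrative values).

def dpcm(x, a=0.9, q=0.1):
    """Return the reconstructed signal from one-tap DPCM of x."""
    xhat, out = 0.0, []
    for s in x:
        e = s - a * xhat            # prediction error
        eq = q * round(e / q)       # uniform quantizer
        xhat = a * xhat + eq        # reconstruction the decoder also sees
        out.append(xhat)
    return out
```

The coding gain over PCM comes from `e` having a much smaller variance than `x` for correlated signals such as speech, so the same step size covers it with fewer levels.
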
  • Design and simulation of a speech analysis-synthesis system based on short-time Fourier analysis

    Publication Year: 1973 , Page(s): 165 - 174
    Cited by:  Papers (43)  |  Patents (3)

    This paper discusses the theoretical basis for representation of a speech signal by its short-time Fourier transform. The results of the theoretical studies were used to design a speech analysis-synthesis system which was simulated on a general-purpose laboratory digital computer system. The simulation uses the fast Fourier transform in the analysis stage and specially designed finite duration impulse response filters in the synthesis stage. The results of both the theoretical and computational studies lead to an understanding of the effect of several design parameters and elucidate the design tradeoffs necessary to achieve moderate information rate reductions.

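
The basic analysis-synthesis chain, FFT analysis of windowed frames followed by windowed overlap-add synthesis, can be sketched as below. The Hann window, frame length, and 4:1 overlap are our assumptions, not the design parameters studied in the paper.

```python
# Short-time Fourier analysis-synthesis sketch: FFT per frame, then
# overlap-add with a normalizing window sum (parameters are ours).
import numpy as np

def stft_resynthesize(x, n=256, hop=64):
    """Analyze x frame-by-frame with rfft and resynthesize by overlap-add."""
    w = np.hanning(n)
    y = np.zeros(len(x) + n)
    norm = np.zeros(len(x) + n)
    for start in range(0, len(x) - n + 1, hop):
        frame = x[start:start + n] * w
        spec = np.fft.rfft(frame)                # analysis stage (FFT)
        y[start:start + n] += np.fft.irfft(spec, n) * w   # synthesis stage
        norm[start:start + n] += w * w           # window-overlap normalizer
    nz = norm > 1e-8
    y[nz] /= norm[nz]
    return y[:len(x)]
```

With no spectral modification this loop reconstructs the input wherever frames overlap; rate reduction in a real system would come from coarsely quantizing or decimating `spec` between the two stages.
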
  • An electrotactile sound detector for the deaf

    Publication Year: 1973 , Page(s): 285 - 287
    Cited by:  Papers (6)

    The electrotactile sound detector described here is designed to enable deaf persons to detect and localize sounds. Two microphones are worn bilaterally on the head, the sounds received are converted to electrical pulses, and the pulses are fed to two electrodes applied to the forehead. Differences in intensity of the pulses permit the wearer to localize the source of a sound. Additional information is furnished about the rhythmic patterning of sounds.

  • Discrete-word recognition utilizing a word dictionary and phonological rules

    Publication Year: 1973 , Page(s): 239 - 249
    Cited by:  Papers (4)  |  Patents (10)

    A discrete-word recognition system utilizing a word dictionary and phonological rules is described. In this system, nine distinctive features are extracted from a discrete-word input. Segmentation is performed using these features. Segmentation errors are corrected by applying a phoneme connecting rule. The input word is transformed into an input feature matrix. The comparison of this matrix with the standard derived from the dictionary is performed in the feature (matrix) space. Another method of segmentation is also described in which segmentation is performed using a duration dictionary. The effectiveness of utilizing a word dictionary and phonological rules in automatic discrete-word recognition is discussed.

  • Some experiments on the control of voice in the profoundly deaf using a pitch extractor and storage oscilloscope display

    Publication Year: 1973 , Page(s): 274 - 278
    Cited by:  Papers (1)

    A visual pitch display is described. This extracts fundamental frequency by low-pass filtering and displays frequency as a function of time on a storage oscilloscope. Three studies with deaf children are described. In the first it is found that the subjects have poor voluntary pitch control, despite generally good oral skills. In the second it is shown that simple pitch control can be learned quickly with the use of the visual display. However, traditional noninstrumental techniques are found to be almost as effective. In the third study, a profoundly deaf girl learns to control voice register and to generate acceptable intonation patterns within words and sentences. Unfortunately, this has no immediate effect on her everyday communicative speech. It is suggested that would-be designers of speech processing aids for the deaf should take note of two of the implications of these findings: 1) instrumental techniques may offer little or no advantage over properly applied traditional techniques for the teaching of certain skills; and 2) the process of modifying a child's communicative behavior is a complex one in which the technical aid has only a limited role to play. Unless use of the speech processing aid is incorporated in a meaningful way into an effective total program, its benefits are likely to be minimal.

  • A plan for the field evaluation of an automated reading system for the blind

    Publication Year: 1973 , Page(s): 265 - 268
    Cited by:  Papers (2)

    After more than two decades of research it is now possible to construct a high-performance reading system for the blind that will produce synthetic speech from printed text. The entire process can be carried out automatically by computer and associated special-purpose devices. As a first step toward the eventual deployment of a reading system, we have begun an evaluation study in collaboration with faculty and students at the University of Connecticut and with trainees at the Veterans Administration Eastern Blindness Rehabilitation Center. Questions to be answered concern the comprehensibility and educational uses of the output and the technical and economic resources required to make automated reading services accessible to progressively larger groups of blind people.

  • Real-time pitch extraction by adaptive prediction of the speech waveform

    Publication Year: 1973 , Page(s): 149 - 154
    Cited by:  Papers (18)

    With the exception of relatively sophisticated methods such as cepstrum analysis, the problem of reliable pitch-period extraction has remained largely unsolved. This paper examines the feasibility of pitch-period extraction by means of the nonstationary error process resulting from adaptive-predictive quantization of speech. A real-time hardware system that may be realized at low cost is described.

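
As a toy illustration of reading pitch periods off a prediction-error signal (not the paper's hardware algorithm), one can pick the large residual peaks, which cluster at glottal excitation instants, subject to a minimum spacing:

```python
# Toy pitch-mark picker for a prediction-error (residual) signal.
# The threshold and minimum-lag rule are our assumptions.

def pitch_periods(residual, fs, f0_max=400):
    """Return sample distances between large residual peaks."""
    min_lag = int(fs / f0_max)                     # shortest allowed period
    thresh = 0.5 * max(abs(v) for v in residual)   # relative peak threshold
    peaks, last = [], -min_lag
    for i, v in enumerate(residual):
        if abs(v) >= thresh and i - last >= min_lag:
            peaks.append(i)
            last = i
    return [b - a for a, b in zip(peaks, peaks[1:])]
```
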
  • An audio response unit for telephone needs

    Publication Year: 1973 , Page(s): 291 - 292
    Cited by:  Papers (1)  |  Patents (1)

    An audio response unit has been built that synthesizes messages composed of a fixed sentence and any number from 0 to 999 999. The method used is synthesis by concatenation of words, and automatic corrections on pitch and rhythm are used to improve naturalness and intelligibility. The synthesizer is a part of a channel vocoder. This audio response unit is to be used in a telephone exchange to answer cost inquiries from the subscribers. Larger applications using the touch-tone telephone as a data input device are under consideration.

  • Dichotic signs of the recognition of speech elements in normals, temporal lobectomees, and hemispherectomees

    Publication Year: 1973 , Page(s): 189 - 195

    When patients with hemispherectomies or temporal lobectomies listen to dichotic pairs of equal-intensity C-V syllables, they do poorly identifying the stimuli presented to the ear contralateral to the lesion. This effect is similar to that seen for normals, who, in the same circumstances, perform poorly on the left-ear stimulus. (The ear contralateral to a lesion for patients and the left ear for normals will be designated the "weak ear"; the ear ipsilateral to a patient's lesion and the right ear for normals will be called the "strong ear".) To further explore these phenomena we investigated the ability of stimuli other than C-V's in the strong ear to suppress the perception of C-V's in the weak ear. Suppression was found when the strong-ear stimulus was a vowel. Somewhat more suppression was seen when the strong-ear stimuli were computer-generated signals with acoustic features similar to C-V's ("bleats"). Suppression was seen even if the strong-ear vowels and bleats were 20-40 dB less intense than the syllables in the weak ear. A model was developed that interprets weaker suppression to be, in part, a consequence of the interaction of the auditory features of the dichotic signals prior to phonetic processing.

  • On the automatic recognition of continuous speech: Implications from a spectrogram-reading experiment

    Publication Year: 1973 , Page(s): 210 - 217
    Cited by:  Papers (10)

    An experiment was performed in which the authors attempted to recognize a set of unknown sentences by visual examination of spectrograms and machine-aided lexical searching. Nineteen sentences representing data from five talkers were analyzed. An initial partial transcription in terms of phonetic features was performed. The transcription contained many errors and omissions: 10 percent of the segments were omitted, 17 percent were incorrectly transcribed, and an additional 40 percent were transcribed only partially in terms of phonetic features. The transcription was used by the experimenters to initiate computerized scans of a 200-word lexicon. A majority of the search responses did not contain the correct word. However, following extended interactions with the computer, a word-recognition rate of 96 percent was achieved by each investigator for the sentence material. Implications for automatic speech recognition are discussed. In particular, the differences between the phonetic characteristics of isolated words and of the same words when they appear in sentences are emphasized.

  • On transient distortion in hearing aids with volume compression

    Publication Year: 1973 , Page(s): 279 - 285
    Cited by:  Papers (1)

    The influence of harmonic distortion on the performance of ordinary hearing aids is discussed, as well as several kinds of possible distortions found in the transient state when compression is introduced. Transient response was determined for several hearing aids with compression that are available on the market. It is shown that each transient response is composed of two parts: the first part is determined by the frequency response of the whole transmission channel; the second one by the transient response of compression. The latter may cause overshoots, during which large distortions are found in some hearing aids. In others, a superposition of, or a modulation by, the damped low-frequency oscillation was observed at the output. Examples of such distortion are presented.

  • Speech processing with Walsh-Hadamard transforms

    Publication Year: 1973 , Page(s): 174 - 179
    Cited by:  Papers (13)  |  Patents (1)

    High-speed algorithms to compute the discrete Hadamard and Walsh transforms of speech waveforms have been developed. Intelligible speech has been reconstructed from dominant Hadamard or Walsh coefficients on a medium-sized computer in a non-real-time mode. Degradation of some phonemes was noted at low bit rates of reconstruction, but the reconstruction could be improved by varying the position of the sampling window. A digital processor, which allows real-time analysis of speech to be conducted on the system, is described.

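
A sketch of the fast Walsh-Hadamard transform and of reconstruction from dominant coefficients, as the abstract above describes; the function names and "keep the k largest" selection rule are ours.

```python
# Fast Walsh-Hadamard transform (butterfly form) and reconstruction
# from the largest-magnitude coefficients. Illustrative sketch only.

def fwht(a):
    """Fast Walsh-Hadamard transform; len(a) must be a power of 2."""
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def reconstruct(x, keep):
    """Zero all but the `keep` dominant coefficients, then invert.

    The WHT is its own inverse up to a 1/N scale factor.
    """
    c = fwht(x)
    order = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)
    top = set(order[:keep])
    kept = [c[i] if i in top else 0.0 for i in range(len(c))]
    return [v / len(x) for v in fwht(kept)]
```

Because the butterflies use only additions and subtractions, the transform was attractive for the modest hardware of the period; keeping fewer coefficients trades intelligibility for bit rate, as the abstract notes.
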
  • Speech processing aids for the deaf: An overview

    Publication Year: 1973 , Page(s): 269 - 273
    Cited by:  Papers (9)

    Two major obstacles have hindered progress in the development of speech processing aids for the deaf. The first is a lack of basic knowledge as to how speech is acquired, produced, and perceived. The second is a paucity of objective, evaluative data on potentially useful aids. This paper reviews progress in the development of speech processing aids, both for speech perception and for speech training. Progress with training aids is quite promising and reasonably positive evaluative data are currently being obtained. The use of synthetic speech as a research tool in simulating speech problems is discussed and preliminary simulation data are presented.

  • Spectral analysis of speech by linear prediction

    Publication Year: 1973 , Page(s): 140 - 148
    Cited by:  Papers (29)  |  Patents (7)

    The autocorrelation method of linear prediction is formulated in the time, autocorrelation, and spectral domains. The analysis is shown to be that of approximating the short-time signal power spectrum by an all-pole spectrum. The method is compared with other methods of spectral analysis such as analysis-by-synthesis and cepstral smoothing. It is shown that this method can be regarded as another method of analysis-by-synthesis where a number of poles is specified, with the advantages of noniterative computation and an error measure which leads to a better spectral envelope fit for an all-pole spectrum. Compared to spectral analysis by cepstral smoothing in conjunction with the chirp z transform (CZT), this method is expected to give a better spectral envelope fit (for an all-pole spectrum) and to be less sensitive to the effects of high pitch on the spectrum. The normalized minimum error is defined and its possible usefulness as a voicing detector is discussed.

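
The autocorrelation method's normal equations are Toeplitz, so the noniterative computation the abstract mentions is typically the Levinson-Durbin recursion. The sketch below is a generic textbook version, not the paper's own formulation; it returns the predictor coefficients together with the minimum error whose normalized form the abstract proposes as a voicing cue.

```python
# Autocorrelation method of linear prediction via Levinson-Durbin.
# Generic textbook sketch; xhat[n] = sum_j a[j] * x[n-j].

def lpc_autocorrelation(x, p):
    """Return (a, e): order-p predictor coefficients and minimum error."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(p + 1)]
    a = [0.0] * (p + 1)
    e = r[0]
    for m in range(1, p + 1):
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / e                     # reflection coefficient
        new = a[:]
        new[m] = k
        for j in range(1, m):
            new[j] = a[j] - k * a[m - j]
        a = new
        e *= (1 - k * k)                # error shrinks with each order
    return a[1:], e
```

Dividing `e` by `r[0]` gives the normalized minimum error: it stays near 1 for noise-like (unvoiced) frames and drops sharply for strongly predictable voiced frames.
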

Aims & Scope

This Transactions ceased production in 1973. The current retitled publication is IEEE Transactions on Signal Processing.
