
IEEE Transactions on Audio and Electroacoustics

Issue 3 • June 1973

Displaying Results 1 - 25 of 33
  • [Front cover and table of contents]

    Publication Year: 1973 , Page(s): 0
    PDF (173 KB) | Freely Available from IEEE
  • Guest editorial

    Publication Year: 1973 , Page(s): 133
    PDF (131 KB) | Freely Available from IEEE
  • Introduction at award lunch

    Publication Year: 1973 , Page(s): 134
    PDF (78 KB) | Freely Available from IEEE
  • [Back cover]

    Publication Year: 1973 , Page(s): c4
    PDF (2159 KB) | Freely Available from IEEE
  • An electrotactile sound detector for the deaf

    Publication Year: 1973 , Page(s): 285 - 287
    Cited by:  Papers (6)
    PDF (576 KB)

    The electrotactile sound detector described here is designed to enable deaf persons to detect and localize sounds. Two microphones are worn bilaterally on the head, the sounds received are converted to electrical pulses, and the pulses are fed to two electrodes applied to the forehead. Differences in intensity of the pulses permit the wearer to localize the source of a sound. Additional information is furnished about the rhythmic patterning of sounds.
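The localization principle the abstract describes, comparing the intensities picked up by two bilaterally worn microphones, can be illustrated with a small sketch. This is not the authors' circuit; the frame-energy comparison, the 3-dB decision threshold, and all names are illustrative assumptions.

```python
import math

def rms(frame):
    """Root-mean-square level of one microphone frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def localize(left_frame, right_frame, threshold_db=3.0):
    """Compare bilateral microphone levels and report the louder side.

    Returns 'left', 'right', or 'center' when the interaural level
    difference falls below `threshold_db` (a hypothetical tuning value).
    """
    l, r = rms(left_frame), rms(right_frame)
    ild_db = 20.0 * math.log10((l + 1e-12) / (r + 1e-12))
    if ild_db > threshold_db:
        return "left"
    if ild_db < -threshold_db:
        return "right"
    return "center"

# A signal that is louder at the left microphone localizes to the left.
left = [0.8 * math.sin(0.1 * n) for n in range(256)]
right = [0.2 * math.sin(0.1 * n) for n in range(256)]
side = localize(left, right)
```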
  • A system for converting English text into speech

    Publication Year: 1973 , Page(s): 288 - 290
    Cited by:  Papers (11)
    PDF (384 KB)

    The feasibility of converting English text into speech using an inexpensive computer and a small amount of stored data has been investigated. The text is segmented into breath groups, the orthography is converted into a phonemic representation, lexical stress is assigned to appropriate syllables, and the resulting string of symbols is then converted by synthesis-by-rule into the parameter values for controlling an analogue speech synthesizer. The algorithms for performing these conversions are described in detail and evaluated independently, and the intelligibility of the resulting synthetic speech is assessed by listening tests.
  • Evaluation of various parameter sets in spoken digits recognition

    Publication Year: 1973 , Page(s): 202 - 209
    Cited by:  Papers (5)
    PDF (648 KB)

    Various parameter sets, including a spectrum envelope, cepstrum, autocorrelation function, linear predictive coefficients, and partial autocorrelation coefficients (PAC's), are evaluated experimentally to determine which constitutes the best parameter in spoken digit recognition. The principle of recognition is simple pattern matching in the parameter space with nonlinear adjustment of the time axis. The spectrum envelope and cepstrum attain the best recognition score of 100 percent for ten spoken digits of a single male speaker. PAC's seem preferable because of their ease of extraction and theoretical orthogonality; however, they tend to suffer from computation errors when computed by fixed-point arithmetic with a short accumulator length. We find two effective means of reducing these errors: one is variable use of the PAC dimensions controlled by computation accuracy, and the other is smoothing along the time axis. With these improvements the PAC's offer almost 100 percent recognition.
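The "simple pattern matching in the parameter space with nonlinear adjustment of the time axis" that the abstract describes is what is now called dynamic time warping. A minimal sketch, not the authors' implementation; the template sequences and digit names are made up for illustration:

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic-programming match of two sequences under a warped time axis."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Each cell extends the cheapest of the three allowed warping moves.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(utterance, templates):
    """Pick the stored template with the smallest warped distance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))

# Toy 1-D "parameter tracks"; real systems match frame-by-frame feature vectors.
templates = {"one": [1, 3, 5, 3, 1], "two": [5, 5, 1, 1, 5]}
best = recognize([1, 3, 3, 5, 3, 1], templates)  # a time-stretched "one"
```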
  • Advantages of experienced listeners in intelligibility testing

    Publication Year: 1973 , Page(s): 161 - 165
    Cited by:  Papers (3)
    PDF (704 KB)

    The use of a highly experienced, permanent panel of listeners has eliminated some of the variables of intelligibility testing and minimized many of the other objections to such testing. Variables such as training time, learning, and personnel changes are no longer significant. In addition, the problems of individual listener variation and less tangible questions of motivation and fatigue have been minimized. Consequently, measurements with low statistical variation and high repeatability have resulted; typically, less than 2 percent difference in intelligibility score has resulted from identical conditions measured nine months apart. This high reliability has also been noted in other psychoacoustic experiments in which the listeners have participated.
  • Application of sequential decoding for converting phonetic to graphic representation in automatic recognition of continuous speech (ARCS)

    Publication Year: 1973 , Page(s): 225 - 228
    Cited by:  Papers (5)  |  Patents (1)
    PDF (504 KB)

    Following segmentation and phonetic classification in automatic recognition of continuous speech (ARCS), it is necessary to provide methods for linguistic decoding. In this work a graph search procedure, based on the Fano algorithm, is used to convert machine-contaminated phonetic descriptions of speaker performance into standard orthography. The information utilized by the decoder consists of a syntax, a lexicon containing transcription variations for each word, and performance-based statistics from acoustic analysis. The latter contain information related to automatic segmentation and classification accuracy and certainty (anchor-point) data. A distinction is made between speaker- and machine-dependent corruption of phonetic input strings. Preliminary results are presented and discussed, together with some considerations for evaluation.
  • Discrete-word recognition utilizing a word dictionary and phonological rules

    Publication Year: 1973 , Page(s): 239 - 249
    Cited by:  Papers (4)  |  Patents (10)
    PDF (1192 KB)

    A discrete-word recognition system utilizing a word dictionary and phonological rules is described. In this system, nine distinctive features are extracted from a discrete-word input. Segmentation is performed using these features. Segmentation errors are corrected by applying a phoneme connecting rule. The input word is transformed into an input feature matrix. The comparison of this matrix with the standard derived from the dictionary is performed in the feature (matrix) space. Another method of segmentation is also described in which segmentation is performed using a duration dictionary. The effectiveness of utilizing a word dictionary and phonological rules in automatic discrete-word recognition is discussed.
  • The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer

    Publication Year: 1973 , Page(s): 298 - 305
    Cited by:  Papers (32)
    PDF (952 KB)

    A computer-simulated parallel formant synthesizer has been used to copy short samples of human speech. It is possible to make the synthetic speech almost indistinguishable from the natural in spectrum and waveform, and by earphone listening, provided that the synthetic glottal pulse is derived by inverse filtering a typical natural vowel from the same talker. Various other pulse shapes have been tried, such as the combination of cosine segments suggested by various workers as a close approximation to human glottal pulses. For producing speech acceptable as natural, none of these idealized pulse shapes has been as successful as those derived by inverse filtering. However, the subjective differences are small compared with the differences that would be caused by reverberation when listening to a loudspeaker in an ordinary room with good acoustics; it has been demonstrated that under such listening conditions, the phase structure of glottal pulses is of no importance.
  • Spectral analysis of speech by linear prediction

    Publication Year: 1973 , Page(s): 140 - 148
    Cited by:  Papers (29)  |  Patents (7)
    PDF (928 KB)

    The autocorrelation method of linear prediction is formulated in the time, autocorrelation, and spectral domains. The analysis is shown to be that of approximating the short-time signal power spectrum by an all-pole spectrum. The method is compared with other methods of spectral analysis such as analysis-by-synthesis and cepstral smoothing. It is shown that this method can be regarded as another method of analysis-by-synthesis where a number of poles is specified, with the advantages of noniterative computation and an error measure which leads to a better spectral envelope fit for an all-pole spectrum. Compared to spectral analysis by cepstral smoothing in conjunction with the chirp z transform (CZT), this method is expected to give a better spectral envelope fit (for an all-pole spectrum) and to be less sensitive to the effects of high pitch on the spectrum. The normalized minimum error is defined and its possible usefulness as a voicing detector is discussed.
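The autocorrelation method of linear prediction formulated here is conventionally solved with the Levinson-Durbin recursion, which also yields the partial-correlation (reflection) coefficients mentioned elsewhere in this issue. A minimal sketch of that standard solution, not the paper's own implementation; the test signal is an assumed all-pole impulse response:

```python
def autocorr(x, max_lag):
    """Short-time autocorrelation R[0..max_lag] of a windowed signal."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(R, order):
    """Solve the all-pole normal equations by the Levinson-Durbin recursion.

    Convention: A(z) = 1 + sum_k a[k] z^-k.  Returns (a, E): predictor
    coefficients a[1..order] and the final prediction-error energy E.
    """
    a = [0.0] * (order + 1)
    E = R[0]
    for i in range(1, order + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / E                      # reflection (partial correlation) coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        E *= (1.0 - k * k)                # error energy shrinks at each order
    return a[1:], E

x = [0.9 ** n for n in range(200)]        # impulse response of 1 / (1 - 0.9 z^-1)
R = autocorr(x, 1)
a, E = levinson_durbin(R, 1)              # recovers a[0] close to -0.9
```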
  • Delta modulation of pitch, formant, and amplitude signals for the synthesis of voiced speech

    Publication Year: 1973 , Page(s): 135 - 140
    Cited by:  Papers (2)
    PDF (784 KB)

    A computer simulation was performed to demonstrate the feasibility of delta modulation (DM) as a simple alternative to pulse-code modulation (PCM) for encoding the control signals of a voiced-speech synthesizer. Quantized signals representing the time variations of pitch period, amplitude, and the first three formant frequencies, all band limited to 16 Hz, were available in a 1500-b/s PCM format. Each of the five signals was oversampled at 100 Hz for delta encoding, resulting in a representation at 500 b/s (an information rate utilized recently for an adequate PCM representation of the control signals). Low-pass filtered versions of the DM signals were used to synthesize the original all-voiced utterance with a quality very close to that obtained in the original 1500-b/s system. Both "linear" and "adaptive" delta modulators were considered; in the latter case, the step size is adapted continuously to the changing slope statistics of an input signal, and this provides more efficient encoding. When additional band limiting was applied to the original control signals, resulting in a 700-b/s representation, adaptive DM at 250 b/s was sufficient to encode the information without further degradation of the synthetic speech.
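Linear and adaptive delta modulation as contrasted in the abstract can be sketched as follows. The one-bit tracking loop is the standard textbook form; the step-adaptation rule and its factor of 1.5 are illustrative assumptions, not the paper's encoder:

```python
def delta_modulate(signal, step):
    """Linear DM: one bit per sample, tracking the input by a fixed step."""
    bits, estimate, track = [], 0.0, []
    for x in signal:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
        track.append(estimate)
    return bits, track

def adaptive_delta_modulate(signal, step, factor=1.5):
    """Adaptive DM: the step grows on repeated bits (steep slopes) and
    shrinks on alternating bits (flat regions); `factor` is a hypothetical choice."""
    bits, estimate, track = [], 0.0, []
    prev = None
    for x in signal:
        bit = 1 if x >= estimate else 0
        if prev is not None:
            step = step * factor if bit == prev else step / factor
        estimate += step if bit else -step
        bits.append(bit)
        track.append(estimate)
        prev = bit
    return bits, track

ramp = [0.1 * n for n in range(50)]           # a slope that overloads the fixed step
bits_lin, est_lin = delta_modulate(ramp, 0.05)
bits_ada, est_ada = adaptive_delta_modulate(ramp, 0.05)
```

On the ramp the fixed step suffers classic slope overload, while the adaptive step catches up; comparing the two tracking errors shows the efficiency gain the abstract reports.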
  • Automatic synthesis from ordinary English text

    Publication Year: 1973 , Page(s): 293 - 298
    Cited by:  Papers (16)
    PDF (664 KB)

    We summarize work between 1969 and 1972 in a continuing project with two objectives: to produce acceptable synthetic speech directly from English text, and to demonstrate with speech synthesis a detailed model of human articulatory movements. Work in the four-year period has yielded moderately accurate rules for predicting the occurrence of pauses and lesser breaks in the sentence; rules for vowel duration in many conditions, not just primary stressed syllables immediately before a pause; rules for contextual variations of consonants; and rules for durational and other allophonic variations of consonants at word boundaries. Presently we are studying natural speech to quantify and add detail to these rules, and we are working to extend the vocal tract model to closer agreement with human articulation and vocal cord control.
  • Dichotic signs of the recognition of speech elements in normals, temporal lobectomees, and hemispherectomees

    Publication Year: 1973 , Page(s): 189 - 195
    PDF (824 KB)

    When patients with hemispherectomies or temporal lobectomies listen to dichotic pairs of equal-intensity C-V syllables, they do poorly at identifying the stimuli presented to the ear contralateral to the lesion. This effect is similar to that seen for normals, who, in the same circumstances, perform poorly on the left-ear stimulus. (The ear contralateral to a lesion for patients and the left ear for normals will be designated the "weak ear"; the ear ipsilateral to a patient's lesion and the right ear for normals will be called the "strong ear".) To further explore these phenomena, we investigated the ability of stimuli other than C-V's in the strong ear to suppress the perception of C-V's in the weak ear. Suppression was found when the strong-ear stimulus was a vowel. Somewhat more suppression was seen when the strong-ear stimuli were computer-generated signals with acoustic features similar to C-V's ("bleats"). Suppression was seen even if the strong-ear vowels and bleats were 20-40 dB less intense than the syllables in the weak ear. A model was developed that interprets weaker suppression to be, in part, a consequence of the interaction of the auditory features of the dichotic signals prior to phonetic processing.
  • Real-time pitch extraction by adaptive prediction of the speech waveform

    Publication Year: 1973 , Page(s): 149 - 154
    Cited by:  Papers (18)
    PDF (632 KB)

    With the exception of relatively sophisticated methods such as cepstrum analysis, the problem of reliable pitch-period extraction has remained largely unsolved. This paper examines the feasibility of pitch-period extraction by means of the nonstationary error process resulting from adaptive-predictive quantization of speech. A real-time hardware system that may be realized at low cost is described.
  • Speech processing with Walsh-Hadamard transforms

    Publication Year: 1973 , Page(s): 174 - 179
    Cited by:  Papers (13)  |  Patents (1)
    PDF (616 KB)

    High-speed algorithms to compute the discrete Hadamard and Walsh transforms of speech waveforms have been developed. Intelligible speech has been reconstructed from dominant Hadamard or Walsh coefficients on a medium-sized computer in a non-real-time mode. Degradation of some phonemes was noted at low bit rates of reconstruction, but the reconstruction could be improved by varying the position of the sampling window. A digital processor, which allows real-time analysis of speech to be conducted on the system, is described.
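A standard fast Walsh-Hadamard butterfly of the kind the abstract refers to, plus reconstruction from dominant coefficients, can be sketched as below. This is the textbook radix-2 transform under the usual power-of-two length assumption, not the authors' processor:

```python
def fwht(x):
    """Fast Walsh-Hadamard transform; len(x) must be a power of two.

    The transform is its own inverse up to a factor of 1/N.
    """
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b   # butterfly: sum and difference
        h *= 2
    return x

def compress(x, keep):
    """Keep only the `keep` largest-magnitude coefficients, zero the rest,
    then invert - a crude model of 'dominant coefficient' reconstruction."""
    N = len(x)
    X = fwht(x)
    ranked = sorted(range(N), key=lambda i: abs(X[i]), reverse=True)
    for i in ranked[keep:]:
        X[i] = 0.0
    return [v / N for v in fwht(X)]

x = [1.0, 2.0, 3.0, 4.0]
roundtrip = compress(x, 4)   # keeping every coefficient reproduces the input
```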
  • A model and a system for machine recognition of speech

    Publication Year: 1973 , Page(s): 229 - 238
    Cited by:  Papers (17)  |  Patents (1)
    PDF (1208 KB)

    This paper presents a model for machine recognition of connected speech and the details of a specific implementation of the model, the HEARSAY system. The model consists of a small set of cooperating independent parallel processes that are capable of helping in the decoding of a spoken utterance either individually or collectively. The processes use the "hypothesize-and-test" paradigm. The structure of HEARSAY is illustrated by considering its operation in a particular task situation: voice-chess. The task is to recognize a spoken move in a given board position. Procedures for determination of parameters, segmentation, and phonetic descriptions are outlined. The use of semantic, syntactic, lexical, and phonological sources of knowledge in the generation and verification of hypotheses is described. Preliminary results of recognition of some utterances are given.
  • On the automatic recognition of continuous speech: Implications from a spectrogram-reading experiment

    Publication Year: 1973 , Page(s): 210 - 217
    Cited by:  Papers (10)
    PDF (1040 KB)

    An experiment was performed in which the authors attempted to recognize a set of unknown sentences by visual examination of spectrograms and machine-aided lexical searching. Nineteen sentences representing data from five talkers were analyzed. An initial partial transcription in terms of phonetic features was performed. The transcription contained many errors and omissions: 10 percent of the segments were omitted, 17 percent were incorrectly transcribed, and an additional 40 percent were transcribed only partially in terms of phonetic features. The transcription was used by the experimenters to initiate computerized scans of a 200-word lexicon. A majority of the search responses did not contain the correct word. However, following extended interactions with the computer, a word-recognition rate of 96 percent was achieved by each investigator for the sentence material. Implications for automatic speech recognition are discussed. In particular, the differences between the phonetic characteristics of isolated words and of the same words when they appear in sentences are emphasized.
  • Design and simulation of a speech analysis-synthesis system based on short-time Fourier analysis

    Publication Year: 1973 , Page(s): 165 - 174
    Cited by:  Papers (43)  |  Patents (3)
    PDF (1016 KB)

    This paper discusses the theoretical basis for representation of a speech signal by its short-time Fourier transform. The results of the theoretical studies were used to design a speech analysis-synthesis system which was simulated on a general-purpose laboratory digital computer system. The simulation uses the fast Fourier transform in the analysis stage and specially designed finite duration impulse response filters in the synthesis stage. The results of both the theoretical and computational studies lead to an understanding of the effect of several design parameters and elucidate the design tradeoffs necessary to achieve moderate information rate reductions.
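The windowing and overlap-add skeleton underlying any such short-time analysis-synthesis system can be sketched as below. The frequency-domain stage (the FFT analysis and the specially designed synthesis filters) is omitted, and the Hann window with half-overlap is an illustrative choice; with that choice the overlapped windows sum to a constant, so a signal passes through unchanged in the interior:

```python
import math

def hann(N):
    """Periodic Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def analyze(x, N, hop):
    """Short-time analysis: windowed frames taken every `hop` samples."""
    w = hann(N)
    return [[x[start + n] * w[n] for n in range(N)]
            for start in range(0, len(x) - N + 1, hop)]

def synthesize(frames, N, hop, length):
    """Overlap-add resynthesis; with hop = N // 2 and a periodic Hann
    window the overlapping windows sum to 1, reconstructing the interior."""
    y = [0.0] * length
    for f, frame in enumerate(frames):
        start = f * hop
        for n in range(N):
            y[start + n] += frame[n]
    return y

x = [1.0] * 64
N, hop = 8, 4
frames = analyze(x, N, hop)
y = synthesize(frames, N, hop, len(x))   # interior samples come back as 1.0
```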
  • On transient distortion in hearing aids with volume compression

    Publication Year: 1973 , Page(s): 279 - 285
    Cited by:  Papers (1)
    PDF (1008 KB)

    The influence of harmonic distortion on the performance of ordinary hearing aids is discussed, as well as several kinds of possible distortions found in the transient state when compression is introduced. Transient response was determined for several hearing aids with compression that are available on the market. It is shown that each transient response is composed of two parts: the first part is determined by the frequency response of the whole transmission channel; the second one by the transient response of compression. The latter may cause overshoots, during which large distortions are found in some hearing aids. In others, a superposition of, or a modulation by, the damped low-frequency oscillation was observed at the output. Examples of such distortion are presented. View full abstract»
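The transient overshoot described, a gain that has not yet settled when a loud sound begins, can be illustrated with a toy feed-forward compressor. The envelope follower and every constant here are assumptions for illustration, not a model of any hearing aid measured in the paper:

```python
def compress_env(x, threshold, ratio, attack, release):
    """Feed-forward compressor sketch: a one-pole envelope follower drives
    the gain.  During the attack time the gain has not yet fallen, so a
    sudden loud input briefly passes at nearly full level (overshoot)."""
    env, out = 0.0, []
    for s in x:
        level = abs(s)
        coeff = attack if level > env else release   # faster rise than fall
        env += coeff * (level - env)
        if env > threshold:
            gain = (threshold + (env - threshold) / ratio) / env
        else:
            gain = 1.0
        out.append(s * gain)
    return out

signal = [0.1] * 50 + [1.0] * 200        # a sudden loud onset
y = compress_env(signal, threshold=0.2, ratio=4.0, attack=0.2, release=0.01)
```

The first loud sample emerges well above the settled compressed level, which is exactly the overshoot the paper measures.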
  • Speech processing aids for the deaf: An overview

    Publication Year: 1973 , Page(s): 269 - 273
    Cited by:  Papers (9)
    PDF (688 KB)

    Two major obstacles have hindered progress in the development of speech processing aids for the deaf. The first is a lack of basic knowledge as to how speech is acquired, produced, and perceived. The second is a paucity of objective, evaluative data on potentially useful aids. This paper reviews progress in the development of speech processing aids, both for speech perception and for speech training. Progress with training aids is quite promising and reasonably positive evaluative data are currently being obtained. The use of synthetic speech as a research tool in simulating speech problems is discussed and preliminary simulation data are presented.
  • Measurement of articulation functions using adaptive test procedures

    Publication Year: 1973 , Page(s): 196 - 201
    PDF (776 KB)

    The conventional methods for measuring speech intelligibility/discrimination present entire lists of words at constant levels, whereas an adaptive procedure shifts levels within a single list according to a preselected strategy. The results reported in this paper indicate that adaptive testing of monosyllabic speech communication: 1) provides reasonably stable and accurate results with a CNC (words with a consonant-vowel-consonant structure) test vocabulary of 50 words; 2) permits an efficient description of selected points on the rising portion of an articulation function; and 3) gives the tester a number of flexible testing options, such as choice of strategies and preselection of target scores. In addition, the potential exists for estimation of measurement errors both within and between test sessions, with these estimates based on either group or individual test responses.
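A minimal up-down strategy of the general family the abstract discusses (the paper's actual strategies may differ) can be sketched as follows; the simulated listener and all level values are hypothetical:

```python
def staircase(respond, start_db, step_db, trials):
    """Simple up-down adaptive procedure: drop the presentation level after
    a correct response, raise it after an error.  The track converges toward
    the 50%-correct point of the listener's articulation function."""
    level, track = start_db, []
    for _ in range(trials):
        correct = respond(level)
        track.append(level)
        level += -step_db if correct else step_db
    return track

# A hypothetical deterministic listener, correct whenever the level is >= 30 dB.
track = staircase(lambda level: level >= 30, start_db=50, step_db=2, trials=40)
```

The track descends from the starting level and then oscillates around the listener's threshold, which is how selected points on the articulation function are located efficiently.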
  • Application of a digital inverse filter for automatic formant and F0 analysis

    Publication Year: 1973 , Page(s): 154 - 160
    Cited by:  Papers (5)  |  Patents (3)
    PDF (864 KB)

    In this paper, a new algorithm based upon a digital inverse filter formulation is presented for automatically determining VU, a voiced-unvoiced decision (VU = 0 during unvoiced speech and VU = 1 during voiced speech), F0, the fundamental frequency, and Fi, i = 1, 2, 3, the first three formant frequencies, as a function of time. Formant trajectory estimates are obtained for all speech sounds that satisfy VU = 1.
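The paper derives formants from a digital inverse filter; one simple stand-in for that idea is to evaluate the all-pole spectrum implied by the inverse-filter coefficients and pick its peaks. A sketch with a hypothetical single resonance, not the paper's algorithm:

```python
import cmath, math

def lpc_spectrum_peaks(a, fs, npoints=512):
    """Evaluate the all-pole spectrum 1/|A(e^jw)| from inverse-filter
    coefficients a[1..p], where A(z) = 1 + sum_k a[k] z^-k, and return
    the peak frequencies in Hz - a simple stand-in for formant estimation."""
    mags = []
    for i in range(npoints):
        w = math.pi * i / npoints
        A = 1.0 + sum(ak * cmath.exp(-1j * w * (k + 1))
                      for k, ak in enumerate(a))
        mags.append(1.0 / abs(A))
    peaks = [i for i in range(1, npoints - 1)
             if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]]
    return [fs * i / (2 * npoints) for i in peaks]

# Hypothetical single resonance at 500 Hz, pole radius 0.95, fs = 8000 Hz:
# A(z) = 1 - 2 r cos(theta) z^-1 + r^2 z^-2.
r, theta = 0.95, 2 * math.pi * 500 / 8000
a = [-2 * r * math.cos(theta), r * r]
formants = lpc_spectrum_peaks(a, 8000)
```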
  • A plan for the field evaluation of an automated reading system for the blind

    Publication Year: 1973 , Page(s): 265 - 268
    Cited by:  Papers (2)
    PDF (528 KB)

    After more than two decades of research it is now possible to construct a high-performance reading system for the blind that will produce synthetic speech from printed text. The entire process can be carried out automatically by computer and associated special-purpose devices. As a first step toward the eventual deployment of a reading system, we have begun an evaluation study in collaboration with faculty and students at the University of Connecticut and with trainees at the Veterans Administration Eastern Blindness Rehabilitation Center. Questions to be answered concern the comprehensibility and educational uses of the output and the technical and economic resources required to make automated reading services accessible to progressively larger groups of blind people.

Aims & Scope

This Transactions ceased production in 1973. The current retitled publication is IEEE Transactions on Signal Processing.
