Chinese Physics Letters, 2019, Vol. 36, No. 2, Article code 024302
Pitch Shift in Exsomatized Cochlea Observed by Laser Interferometry *
Zhang-Cai Long (龙长才)1, Tao Shen (沈涛)1, Yan-Ping Zhang (张艳平)2, Lin Luo (骆琳)3**
Affiliations
1School of Physics, Huazhong University of Science and Technology, Wuhan 430074
2Affiliated Hospital, Huazhong University of Science and Technology, Wuhan 430074
3Department of Chinese Language and Literature, Huazhong University of Science and Technology, Wuhan 430074
Received 23 October 2018, online 22 January 2019
*Supported by the National Natural Science Foundation of China under Grant Nos 11374118 and 90820001.
**Corresponding author. Email: luolin@hust.edu.cn
Citation Text: Long Z C, Chen T, Zhang Y P and Luo L 2019 Chin. Phys. Lett. 36 024302
Abstract
Pitch is one of the most important perceptual attributes of sound; however, the mechanism underlying pitch perception remains unclear. Although theoretical studies have suggested that the perception of virtual pitch is connected with the physics of the cochlea in the inner ear, there has been no direct experimental observation of virtual pitch processing in the cochlea. Using laser interferometry, we observe a shift of virtual pitch in the basilar membrane vibration of exsomatized cochleae, which is consistent with the perceptual pitch shift observed in psychoacoustic experiments. This means that the complex mechanical vibration of the basilar membrane in the cochlea plays an important role in pitch information processing during hearing.
DOI:10.1088/0256-307X/36/2/024302 PACS:43.66.Hg, 43.64.Kc, 87.85.fk © 2019 Chinese Physics Society
Article Text
Pitch is one of the most important perceptual attributes of sound. The pitch of a sound is its perceived position on the musical scale and is described by the frequency of the pure tone that has the same perceived pitch. In music, a sequence of pitches composes a melody, and simultaneous combinations of pitches compose harmony. In speech, the rise and fall of the pitch contour compose prosody, which plays an important role in expressing the meaning of words in tonal languages such as Chinese, and which improves speech intelligibility, especially in the presence of competing sounds. For a long time, the mechanism of pitch perception has been a focus of research. 
However, how the hearing system processes and extracts pitch information is still a mystery.[1] Because pitch perception is generally impaired in users of cochlear implants and hearing aids, understanding the mechanism of pitch perception, especially pitch processing in the cochlea, is of considerable importance.[1]
The scientific investigation of the mechanism of pitch perception dates back to Pythagoras. In modern times, Helmholtz connected pitch perception with the physics of the cochlea, attributing the different perceived pitches of pure tones to sound sensors in the hearing system with different mechanical resonant frequencies.[2] Békésy observed that sounds of different frequencies evoke mechanical vibrations at different positions on the basilar membrane of the cochlea, and that these positions are connected through different neural channels to the higher auditory system. This frequency-position topology in the cochlea can explain why sounds of different frequencies have different pitches corresponding to their frequencies. It can also explain why a complex sound, composed of a fundamental component with frequency $f_{0}$ and higher harmonic components with frequencies $2f_{0}$, $3f_{0}$, $\ldots$, $nf_{0}$, has a pitch corresponding to its fundamental frequency $f_{0}$. However, when the fundamental component is removed, a complex sound consisting only of higher harmonic components still has a perceived pitch, and that pitch is the pitch of the removed fundamental component. This pitch without a corresponding physical frequency component is known as virtual pitch.[3]
The virtual pitch phenomenon was once explained by the combination-tone theory proposed by Helmholtz.[2] According to this theory, two tones with frequencies $f_{1}$ and $f_{2}$ produce, in a nonlinear auditory system, components with combination frequencies $f_{\rm C}=nf_{1}+mf_{2}$ ($n$ and $m$ are integers), which are called combination tones. 
Among these combination tones, the component with frequency $f_{2}-f_{1}$ (or $f_{1} -f_{2}$) is called the difference-frequency component. When a harmonic complex sound without its fundamental component is applied, consecutive harmonic components produce, in a nonlinear auditory system, difference-frequency components at the frequency of the missing fundamental, which would give rise to the pitch of the missing fundamental. Combination tones in cochlear membrane vibration were finally observed experimentally in 1991.[4]
However, the combination-tone explanation of virtual pitch is challenged by the pitch shift observed by Boer and Schouten et al. in psychoacoustic experiments.[5,6] Boer[5] and Schouten et al.[6] used amplitude-modulated signals of the form $s(t)=\frac{A}{2}({1+\cos 2\pi f_{0} t})\cos 2\pi ({kf_{0} +\delta f})t$ ($k$ is an integer), which contain only three components, with frequencies $\{({k-1})f_{0} +\delta f$, $kf_{0} +\delta f$, $({k+1})f_{0} +\delta f\}$. These components, centered at the carrier frequency $kf_{0} +\delta f$, are separated by the modulating frequency $f_{0}$, so the frequency difference between consecutive components is always $f_{0}$. When $\delta f=0$, the components are higher harmonics of a missing fundamental with frequency $f_{0}$, and the signal has the perceived pitch of the missing fundamental, which equals the difference frequency of consecutive components. When $\delta f\ne 0$, the components, which are all shifted in frequency by the common value $\delta f$ while their difference frequency remains unchanged, are generally not harmonic. Nevertheless, the signal still has a perceived pitch, and this pitch is not the difference frequency $f_{0}$; instead, it is shifted from $f_{0}$ by an amount that depends on $k$ and $\delta f$. 
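The three component frequencies quoted for the amplitude-modulated signal follow from the product-to-sum identity. The expansion below is a standard trigonometric step, not one taken from the cited experiments, and $f_{\rm c}=kf_{0}+\delta f$ is shorthand introduced here for the carrier frequency:

$$
s(t)=\frac{A}{2}\cos 2\pi f_{\rm c}t+\frac{A}{4}\cos 2\pi ({f_{\rm c}-f_{0}})t+\frac{A}{4}\cos 2\pi ({f_{\rm c}+f_{0}})t .
$$

In this ideal expansion each sideband has half the amplitude of the carrier. As a numerical illustration, with $f_{0}=200$ Hz, $k=4$ and $\delta f=-50$ Hz the components lie at 550 Hz, 750 Hz and 950 Hz: they are still spaced by 200 Hz, but none of them is an integer multiple of 200 Hz.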
Boer described the perceived pitch of this signal by the formula $f_{\rm pp}=f_{0} +\frac{\delta f}{k}$.[5]
To explain the pitch shift phenomenon, two mechanisms have been proposed: neuronal stochastic resonance[7,8] and three-frequency resonance driven by quasi-periodic signals in a nonlinear vibration system.[9,10] The former demonstrated theoretically that noise in neurons can produce a spike train with a frequency corresponding to the shifted virtual pitch through stochastic resonance, a phenomenon in which an optimal level of noise enhances signal transmission.[11,12] The latter demonstrated theoretically that the amplitude-modulated signal adopted by Boer and Schouten et al. can produce so-called three-frequency nonlinear resonance with a resonance frequency corresponding to the shifted virtual pitch. Recently, Martignoli and Stoop demonstrated, in an electronic model of the cochlea consisting of a cascade of Hopf vibrators,[13,14] that the pitch shift phenomenon can originate from the temporal structure of nonlinear vibration in the auditory periphery.[15] However, there is no direct physiological experimental evidence for any of these theories.
In this work, to determine whether the pitch shift phenomenon originates in the cochlea and how the shifted virtual pitch is expressed there, we measure and analyze basilar membrane vibration in the cochlea with a laser interferometer. For the first time, we observe the pitch shift phenomenon in the basilar membrane vibration of exsomatized cochleae, consistent with the perceived pitch observed in psychoacoustic experiments. The pitch that we observe in the cochlea, which has the shift characteristics of auditory perception, comes from the temporal structure of the nonlinear basilar membrane vibration rather than from nonlinear three-frequency resonance.
We used the cochlea of the guinea pig, which is similar to the human cochlea, to conduct this research. The guinea pigs weighed about 400–500 g and were provided by the Experimental Animal Center of the Hubei Provincial Center for Disease Control and Prevention. 
All Hubei provincial and Chinese national guidelines for the care and use of animals were followed. The animals were chosen without restriction on sex or colour. Every guinea pig was examined under a microscope to make sure that its ear canal and eardrum were free of abnormalities. After the animal was anesthetized by abdominal injection of pentobarbital sodium (concentration 3%, dose 30 mg/kg), the cochlea, with the auricle intact, was separated from the guinea pig. Under a microscope, the bulla was opened, and a hole of about 200 µm was drilled at the top first turn of the cochlea. The prepared cochlea was then placed on the measuring platform of the interferometer. Our previous research has shown that, after being separated from the body, an exsomatized cochlea remains active for a short period of time.[16] To ensure that the cochlea was still active while it was measured, the measurement time was limited to less than 30 min. In addition, the activity of each measured cochlea was checked by whether its basilar membrane vibration contained combination-tone components.[16]
The vibrations of the basilar membrane were measured with our self-developed laser interferometric measurement system.[17] This system adopts a new technique to achieve sub-nanometer resolution without nonlinear distortion.[17] In addition, the system was designed to measure the vibration of objects with weak reflection, so that the vibration of the basilar membrane could be measured without placing reflective microbeads in the cochlea. The system records displacement waveforms of the basilar membrane vibration, rather than vibration velocity, in real time on a computer. Details of this system are available in our published articles[16,17] and dissertation.[18]
Fig. 1. Stimulus signal: (a) stimulus wave form in the form of $s(t)=\frac{A}{2}({1+\cos 2\pi f_{0} t})\cos 2\pi ({kf_{0} +\delta f})t$, $f_{0}=200$ Hz, $k=4$, $\delta f=-50$ Hz. (b) Spectrum of stimulus with component amplitude ratio of 0.45:1:0.45, $f_{1}=550$ Hz, $f_{2}=750$ Hz and $f_{3}=950$ Hz.
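The stimulus of Fig. 1 can be reproduced numerically as a quick sanity check. The following is a minimal sketch, assuming the three components are summed directly with the 0.45:1:0.45 amplitude ratio given in the caption; the function names, the 8 kHz sampling rate and the 0.2 s duration are illustrative choices and are not taken from the experiment.

```python
import math

def stimulus_components(f0, k, delta_f, side_ratio=0.45):
    """Return [(frequency_hz, relative_amplitude), ...] for the three tones."""
    f_car = k * f0 + delta_f  # carrier (centre) frequency
    return [(f_car - f0, side_ratio), (f_car, 1.0), (f_car + f0, side_ratio)]

def synthesize(components, fs, duration):
    """Sum of cosines sampled at fs (Hz) for duration (s)."""
    n = int(round(fs * duration))
    return [sum(a * math.cos(2.0 * math.pi * f * i / fs) for f, a in components)
            for i in range(n)]

def amplitude_at(signal, fs, freq):
    """Amplitude of the component at freq via a single-frequency DFT projection."""
    n = len(signal)
    re = sum(x * math.cos(2.0 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

# Parameters of Fig. 1; fs and duration are illustrative, not from the paper.
FS = 8000.0
components = stimulus_components(f0=200.0, k=4, delta_f=-50.0)
signal = synthesize(components, fs=FS, duration=0.2)
for freq, _ in components:
    print(freq, round(amplitude_at(signal, FS, freq), 3))
```

The projection recovers the three component amplitudes at 550 Hz, 750 Hz and 950 Hz, and it finds no energy at 200 Hz: the stimulus itself contains no component at the modulating frequency.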
Signals of the same form as in the pitch shift psychoacoustic experiments of Boer and Schouten et al.[5,6] were used: $s(t)=\frac{A} {2} ({1+\cos 2\pi f_{0} t})\cos 2\pi ({kf_{0} +\delta f})t$, composed of pure-tone components with frequencies $\{({k-1})f_{0} +\delta f$, $kf_{0} +\delta f$, $( {k+1})f_{0} +\delta f\}$. The amplitude ratio of the three components was 0.45:1:0.45. Figure 1 shows an example of the input signal waveform (Fig. 1(a)) and its frequency spectrum (Fig. 1(b)). The signals were synthesized digitally with 0.01 µs time resolution by a RIGOL DG4062 digital signal generator. They were fed to the CD/tape input of an AC33 audiometer (Denmark), amplified, and then used to drive an insert earphone (EARTONE-3A) placed in the guinea pig's ear canal to produce the sound stimulus. The sound level was 80 dB.
We first checked whether three-frequency resonance occurs in cochlear membrane vibration by analyzing the frequency components of the basilar membrane vibration. According to theory, for stimuli with frequencies $f_{1}$ and $f_{2}$, nonlinear three-frequency resonances would appear at frequencies $f_{\rm R}$ determined by both $f_{1}$ and $f_{2}$ through the relation $Rf_{\rm R}=pf_{1} +qf_{2}$ ($R$, $p$ and $q$ are integers). These resonances would appear as peaks in the frequency spectrum. Compared with combination tones, which are only the subset of three-frequency resonances with $R=1$, the frequency components of three-frequency resonance are more numerous, and one of them would have a frequency identical to the perceived pitch. Our analysis showed that, apart from the peaks of the stimuli and the combination tones, there is no other peak in the frequency spectrum of the basilar membrane vibration. Figure 2 shows an example of the observed basilar membrane vibration and its frequency spectrum. This means that the three-frequency resonance expected by theory did not occur in the cochlea, and that the virtual pitch, which is shifted from the difference frequency, cannot be attributed to three-frequency resonance of the basilar membrane vibration.
Fig. 2. Vibration response of basilar membrane. (a) Vibration wave form of basilar membrane. (b) Spectrum of basilar membrane vibration responding to signal stimulus of Fig. 1, which includes combination tones (200 Hz, 400 Hz, 600 Hz, etc.) in addition to stimulus frequency components ($f_{1}=550$ Hz, $f_{2}=750$ Hz, $f_{3}=950$ Hz). The characteristic frequency of the measuring position is 450 Hz.
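The combination tones listed in the caption of Fig. 2 can be checked by simple arithmetic. The sketch below enumerates integer combinations of the three stimulus frequencies and compares them with the pitch that the Boer formula gives for the Fig. 1 stimulus ($f_{0}=200$ Hz, $k=4$, $\delta f=-50$ Hz). The function name and the coefficient bound `max_coeff` are assumptions made for illustration; the code lists candidate frequencies only and says nothing about which of them are strong enough to be observed.

```python
from itertools import product

def combination_tones(freqs, max_coeff=3):
    """Positive frequencies n1*f1 + n2*f2 + ... with integer |ni| <= max_coeff."""
    tones = set()
    for coeffs in product(range(-max_coeff, max_coeff + 1), repeat=len(freqs)):
        f = sum(c * fr for c, fr in zip(coeffs, freqs))
        if f > 0:
            tones.add(f)
    return tones

stimulus = [550, 750, 950]            # Hz, stimulus components of Fig. 1
tones = combination_tones(stimulus)   # candidate combination tones (R = 1)

f0, k, delta_f = 200, 4, -50
shifted_pitch = f0 + delta_f / k      # Boer formula for this stimulus

print(sorted(t for t in tones if t <= 600))
print(shifted_pitch, shifted_pitch in tones)
```

Because 550, 750 and 950 are all multiples of 50, every integer combination of them is a multiple of 50 Hz. The Boer pitch for this stimulus, 187.5 Hz, therefore cannot be a combination tone of any order, and a spectral peak at that frequency would require $R>1$ in $Rf_{\rm R}=pf_{1}+qf_{2}$.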
Fig. 3. Scheme for calculating the temporal structure expressed pitch: (a) basilar membrane vibration waveform and peak-interval time, (b) distribution of peak-interval time versus the inverse of peak-interval time. The most frequent peak-interval time is $T_{\rm P}$, and its inverse is the temporal structure expressed pitch $f_{\rm P}$.
Our next step was to explore whether the temporal structure of the basilar membrane vibration expresses pitch information. We counted the occurrences of each time interval $T$ between peaks (Fig. 3(a)) in the basilar membrane vibration waveform to obtain the distribution of inter-peak time intervals. We then obtained the most frequent inter-peak time interval $T_{\rm P}$, at which the time-interval distribution has its maximum (Fig. 3(b)). We use $1/{T_{\rm P}}$ as the pitch $f_{\rm P}$ expressed by the temporal structure of the basilar membrane vibration. This calculation of the pitch $f_{\rm P}$ has been adopted in previous theoretical studies,[15] and some researchers have suggested that this timing information can be preserved in subsequent neural transmission and extracted at higher levels of the nervous system.[15]
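The peak-interval procedure can be sketched in a few lines. The text does not specify how peaks are selected, so the code below makes an explicit assumption: only "dominant" peaks are kept, meaning local maxima that are higher than both neighbouring local maxima, which gives roughly one peak per modulation cycle. The waveform is a synthetic amplitude-modulated signal standing in for a measured basilar membrane response, and all names, the sampling rate and the histogram bin width are illustrative.

```python
import math
from collections import Counter

def local_maxima(x):
    """Indices of samples that are local maxima of the sequence x."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]

def dominant_peaks(x):
    """Assumed peak picking: keep maxima higher than both neighbouring maxima."""
    idx = local_maxima(x)
    return [idx[j] for j in range(1, len(idx) - 1)
            if x[idx[j - 1]] < x[idx[j]] >= x[idx[j + 1]]]

def temporal_pitch(x, fs, bin_samples=10):
    """Return 1/T_P, with T_P the mean interval in the most populated bin."""
    peaks = dominant_peaks(x)
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]   # in samples
    histogram = Counter(d // bin_samples for d in intervals)
    mode_bin, _ = histogram.most_common(1)[0]
    in_mode = [d for d in intervals if d // bin_samples == mode_bin]
    t_p = sum(in_mode) / len(in_mode) / fs                  # seconds
    return 1.0 / t_p

# Synthetic stand-in for a measured waveform (not experimental data).
fs = 100_000.0
f0, k, delta_f = 200.0, 7, 50.0
f_car = k * f0 + delta_f
wave = [(1.0 + 0.9 * math.cos(2.0 * math.pi * f0 * i / fs))
        * math.cos(2.0 * math.pi * f_car * i / fs) for i in range(20_000)]

f_p = temporal_pitch(wave, fs)
print(round(f_p, 1), round(f0 + delta_f / k, 1))
```

For this synthetic signal the most frequent interval between dominant peaks is seven carrier periods, so the estimate lands close to $f_{0}+\delta f/k$ rather than at $f_{0}$. Real recordings contain noise and nonlinear distortion, so a peak-selection threshold and bin width would have to be tuned to the data.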
Fig. 4. Pitch shift of basilar membrane vibration. In this figure $f_{\rm car}$ is the carrier frequency, that is, the central frequency of the stimulus signal. Stars are the temporal structure expressed pitch measured in this research. The sloped dashed line is the pitch described by the Boer formula $f_{\rm pp}=f_{0} +{\delta f}/k$ with $k=7$ and $f_{0}=200$ Hz. The frequencies of the stimulus are $\{{1200+\delta f,1400+\delta f,1600+\delta f}\}$. When $\delta f\ne 0$, they shift away from the harmonic frequencies of 200 Hz (1200 Hz, 1400 Hz, 1600 Hz), but the difference frequency of neighboring components remains 200 Hz. The characteristic frequency of the measuring position is 500 Hz.
Keeping $k$ and $f_{0}$ constant and changing $\delta f$, we obtained stimuli whose three components vary around harmonic frequencies. For example, when $k=7$ and $f_{0}=200$ Hz, the frequencies of the three components, (1200 Hz$+\delta f$, 1400 Hz$+\delta f$, 1600 Hz$+\delta f$), vary around (1200 Hz, 1400 Hz, 1600 Hz), which are harmonic frequencies of 200 Hz. We measured the temporal structure expressed pitch of these stimuli. Figure 4 shows an example of this measurement. It can be seen that when $\delta f=0$, the temporal structure expressed pitch is 200 Hz, which is the missing fundamental frequency of the stimulus frequencies (1200 Hz, 1400 Hz, 1600 Hz) and also the difference frequency of consecutive stimulus components. As $\delta f$ increases (decreases), the temporal structure expressed pitch increases (decreases) and shifts away from the original missing fundamental frequency and difference frequency (200 Hz). The observed dependence of the pitch shift on $\delta f$ (stars in Fig. 4) is consistent with the Boer formula $f_{\rm pp}=f_{0} +\frac{\delta f}{k}$ (sloped dashed line in Fig. 4), except for a slight difference in slope. This subtle difference in slope was also present in the psychoacoustic experimental results of Boer and Schouten, where it is called the second pitch shift.
Because of the limited time during which an exsomatized cochlea remains active for data collection, we could not obtain data for all values of $k$ from one active cochlea. To check the general characteristics of the temporal structure expressed pitch of basilar membrane vibration and to compare it with psychoacoustic experimental results, we collected data from different cochleae. Although the data from every cochlea were measured at the top first turn, the measuring positions in different cochleae did not have the same characteristic frequency. We pool these data and show them in Fig. 5 (solid dots). Among the data in Fig. 5, except those for $k=2$, 11 and 12, the data for any given integer $k$ from 3 to 10 were obtained not from a single cochlea but from different cochleae.
Fig. 5. Temporal structure expressed pitch observed in cochlear basilar membrane vibration. The ordinate ($f_{\rm p}$) is pitch, and the abscissa ($f_{\rm car}$) is the carrier frequency of the stimulus signal. The stimulus signal has the form $s(t)=\frac{A}{2}({1+\cos 2\pi f_{0} t})\cos 2\pi ({kf_{0} +\delta f})t$, with $f_{\rm car}=kf_{0} +\delta f$ and $f_{0}=200$ Hz. Dots are the data measured in this research. Sloped dashed lines are the pitch described by the Boer formula.
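The dashed lines of Fig. 5 follow directly from the Boer formula once it is written in terms of the carrier frequency: since $f_{\rm car}=kf_{0}+\delta f$, the formula $f_{0}+\delta f/k$ is the same as $f_{\rm car}/k$. A minimal sketch is given below. The rule used to pick $k$ for a given carrier (the harmonic number nearest to $f_{\rm car}/f_{0}$) is an assumption made here for illustration and is not stated in the text, and both function names are hypothetical.

```python
def de_boer_pitch(f_car, f0, k):
    """Boer formula f0 + delta_f / k, with delta_f = f_car - k * f0."""
    delta_f = f_car - k * f0
    return f0 + delta_f / k

def nearest_harmonic_number(f_car, f0):
    """Assumed rule for choosing k: harmonic number closest to f_car / f0."""
    return int(f_car / f0 + 0.5)

f0 = 200.0
for f_car in (1350.0, 1400.0, 1450.0):
    k = nearest_harmonic_number(f_car, f0)
    print(f_car, k, round(de_boer_pitch(f_car, f0, k), 2))
```

For a fixed $k$ the predicted pitch is a straight line in $f_{\rm car}$ with slope $1/k$, which is why the lines for larger $k$ in Fig. 5 are flatter.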
It can be seen that, although these data were collected from different cochleae and the measuring positions did not have the same characteristic frequency, the data (solid dots in Fig. 5) still show ensemble characteristics. The data cluster according to $k$. For a given $k$, they exhibit a pitch shift that depends on $\delta f$ (or, equivalently, on the carrier frequency $f_{\rm car}=kf_{0} +\delta f$), consistent with the pitch shift formula derived from psychoacoustic experiments (sloped dashed lines in Fig. 5). For different $k$, the slope of the pitch shift changes with $k$, which is also consistent with that formula (sloped dashed lines for different $k$). The only difference from the psychoacoustic results is that, for $k$ from 3 to 10, the slope of the pitch shift does not show the slight systematic deviation from the Boer formula $f_{\rm pp}=f_{0} +\frac{\delta f}{k}$ known as the second pitch shift. The fact that the pooled data from similar positions in different cochleae exhibit ensemble pitch shift characteristics consistent with those of perceived pitch indicates that the temporally expressed pitch of basilar membrane vibration is general, consistent and robust, and is not influenced by individual differences between specimens. The absence of the second pitch shift in the pooled data suggests that the pitch shift is position-dependent: if the pitch shift depends slightly on position in the cochlea, the subtle second pitch shift can be blurred by this position dependence in the pooled data.
In conclusion, this research has revealed that the basilar membrane vibration of an active cochlea can express, in its temporal structure, a pitch consistent with the perceptual virtual pitch observed in psychoacoustic experiments. We have not observed three-frequency resonance in cochlear basilar membrane vibration. This suggests that, in the cochlea, basilar membrane vibration expresses the virtual pitch of sound in its temporal structure rather than through nonlinear three-frequency resonance. From this research, we can expect that, in addition to the frequency-channel topology mechanism, there is a temporal mechanism in the nervous system for extracting pitch information. Consequently, a hearing aid or cochlear implant endowed with the nonlinearity of the cochlea, so as to transfer pitch information in temporal structure, should yield better speech intelligibility and music appreciation than one based on linear sound signal processing.
References
[1] Oxenham A J 2013 Acoust. Sci. & Tech. 34 388
[2] von Helmholtz H 1863 Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik (Braunschweig)
[3] Boer E D 1976 Handbook of Sensory Physiology, Auditory System (New York: Springer)
[4] Robles L, Ruggero M A and Rich N C 1991 Nature 349 413
[5] Boer E D 1956 Nature 178 535
[6] Schouten J F, Ritsma R J and Cardozo B L 1962 J. Acoust. Soc. Am. 34 1418
[7] Chialvo D R, Calvo O, Gonzalez D L et al 2002 Phys. Rev. E 65 050902
[8] Philip B 2003 Nature 425 914
[9] Cartwright J H E, González D L and Piro O 1999 Phys. Rev. Lett. 82 5389
[10] Cartwright J H E, González D L and Piro O 2001 Proc. Natl. Acad. Sci. USA 98 4855
[11] Long Z C, Shao F, Zhang Y P and Qin Y G 2004 Chin. Phys. Lett. 21 757
[12] Liu F and Wang W 2001 Chin. Phys. Lett. 18 292
[13] Eguíluz V M, Ospeck M, Choe Y et al 2000 Phys. Rev. Lett. 84 5232
[14] Tian L, Zhang Y P and Long Z C 2016 Chin. Phys. Lett. 33 128701
[15] Martignoli S and Stoop R 2010 Phys. Rev. Lett. 105 048101
[16] Long X M, Zhang Y P, Lu J and Long Z C 2015 Journal of Clinical Otorhinolaryngology Head and Neck Surgery 29 1644 (in Chinese)
[17] Wang F, Long Z C, Zhang B et al 2012 Rev. Sci. Instrum. 83 045112
[18] Wang F 2014 PhD Dissertation (Wuhan: Huazhong University of Science and Technology) (in Chinese)