Module 2 - Spectrogram

Speech

김아다만티움 2023. 8. 16. 21:45

https://speech.zone/courses/speech-processing/module-2-acoustics-of-consonants-and-vowels/videos-2/spectrogram/

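In the previous session, we saw that sounds come in various forms: simple, complex, periodic, and aperiodic. We also looked at ways to represent these sounds. The waveform displays changes in amplitude over time, and from it we can calculate the fundamental frequency of periodic sounds. The spectrum displays the amplitudes of the component wave frequencies in a complex wave, and it reveals properties of both the source, known as harmonics, and the filter, known as formants.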
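To make the step from waveform to fundamental frequency concrete, here is a minimal sketch. It assumes autocorrelation as the method, which is one common approach and is not named in the video. The function name `estimate_f0`, the search range, and the synthetic test wave are all illustrative choices.

```python
import math

def estimate_f0(samples, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of a periodic waveform.

    Finds the lag (in samples) at which the signal best matches a shifted
    copy of itself, then converts that period into a frequency.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        # Mean product of the signal with itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic complex periodic wave: 100 Hz fundamental plus two harmonics.
SAMPLE_RATE = 8000
wave = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
    for t in range(1600)
]
f0 = estimate_f0(wave, SAMPLE_RATE)
print(round(f0, 1))  # prints 100.0
```

The waveform repeats every 80 samples at this sample rate, so the best-matching lag is 80 and the estimate is 8000 / 80 = 100 Hz. Real speech is only approximately periodic, so a practical pitch tracker needs more care than this.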

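In this video, we look at a third representation of sound, the spectrogram. A spectrogram shows amplitude, frequency, and time all in one figure. We can think of it as a series of spectra lined up at sequential points in time, as in the figure shown here. The figure has a spectrum visible at the front, with amplitude on the y-axis and frequency on the x-axis. Time is on the z-axis, starting from zero at the front and advancing with each subsequent layer.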
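The idea of "a series of spectra lined up at sequential points in time" can be sketched in a few lines. This is an illustrative example and not the method used to produce the figures in the video: the frame length, hop size, Hann window, and the two-tone test signal are all assumptions chosen to keep the example small.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 of one frame."""
    n = len(frame)
    # A Hann window tapers the frame edges to reduce spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, frame_len=64, hop=32):
    """Slice the signal into overlapping frames and stack their spectra.

    The result is indexed as spec[time_frame][frequency_bin] = amplitude.
    """
    return [
        magnitude_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]

SAMPLE_RATE = 8000
# Test signal: a 500 Hz tone for the first 512 samples, then a 1500 Hz tone.
signal = [
    math.sin(2 * math.pi * (500 if t < 512 else 1500) * t / SAMPLE_RATE)
    for t in range(1024)
]
spec = spectrogram(signal)
bin_hz = SAMPLE_RATE / 64  # each frequency bin is 125 Hz wide
# Frequency of the strongest bin in each time frame.
peaks_hz = [max(range(len(s)), key=s.__getitem__) * bin_hz for s in spec]
print(peaks_hz[0], peaks_hz[-1])  # prints 500.0 1500.0
```

Each entry of `spec` is one spectrum, and the list order is the time axis. The peak frequency moving from 500 Hz to 1500 Hz across the frames is exactly the kind of change over time that a single spectrum cannot show.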

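The representation shown here tells us a bit more about how the spectrum changes over time as the speech sounds vary. But because we are viewing it on a two-dimensional screen, we still cannot see all the information we might need. If we reorient the figure so that time is on the x-axis and frequency is on the y-axis, we can represent amplitude with grayscale shading, much like a topographic map.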
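Here are two example spectrograms that show this three-dimensional nature. A spectrogram displays all three dimensions present in the waveform and the spectrum: the x-axis represents time, the y-axis represents frequency, and the shading represents amplitude. In the spectrogram on the left, the frequency range from about 500 to 1500 Hz has high amplitude, shown by darker shading.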

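On the right, the same range has lower amplitude, shown by lighter shading and even some areas of white, or zero amplitude. In linguistic investigations of speech, the waveform and spectrogram are often presented together with time-aligned phone-level and word-level transcriptions. As we will see, the spectrogram lets us identify specific phones more precisely based on their acoustic characteristics.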
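The shading convention and the left/right comparison described above can be sketched as follows. The two small "spectrograms" are made-up numbers for illustration and are not measurements from the figures in the video. The helper names `band_mean` and `to_gray` are hypothetical. The linear amplitude-to-gray mapping is a simplification, since spectrogram displays typically shade by log amplitude (dB).

```python
def band_mean(spec, bin_hz, lo_hz, hi_hz):
    """Mean amplitude over all frames for bins centred within [lo_hz, hi_hz]."""
    values = [
        amp
        for frame in spec
        for k, amp in enumerate(frame)
        if lo_hz <= k * bin_hz <= hi_hz
    ]
    return sum(values) / len(values)

def to_gray(amp, max_amp):
    """Map amplitude to an 8-bit gray level.

    0 is black (highest amplitude) and 255 is white (zero amplitude),
    matching the darker-is-louder convention described in the text.
    """
    if max_amp <= 0:
        return 255
    return round(255 * (1 - min(amp, max_amp) / max_amp))

# Toy spectrograms indexed [time_frame][frequency_bin], 500 Hz per bin,
# so the bins sit at 0, 500, 1000, 1500, and 2000 Hz.
BIN_HZ = 500
left = [[0.1, 0.9, 1.0, 0.8, 0.1], [0.1, 1.0, 0.9, 0.9, 0.2]]
right = [[0.1, 0.2, 0.0, 0.1, 0.6], [0.2, 0.1, 0.0, 0.2, 0.7]]

left_band = band_mean(left, BIN_HZ, 500, 1500)
right_band = band_mean(right, BIN_HZ, 500, 1500)
left_gray = to_gray(left_band, 1.0)
right_gray = to_gray(right_band, 1.0)
print(left_gray, right_gray)
```

In this toy data the 500 to 1500 Hz band has higher amplitude on the left, so `left_gray` comes out smaller (darker) than `right_gray`, and the zero-amplitude cells on the right map to pure white.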

In previous videos, we've seen that sounds come in various forms: simple, complex, periodic, aperiodic, and so on. And we have looked at the ways we can represent these sounds. The waveform displays changes in amplitude over time, and from this we can calculate the fundamental frequency of periodic sounds. The spectrum displays amplitudes of component wave frequencies in a complex wave, and reveals properties of both the source, known as harmonics, and the filter, known as formants.
In this video, we will look at a third representation of sound, known as the spectrogram. The spectrogram is a representation that shows amplitude, frequency, and time all in one figure. We can think of this as a series of spectra lined up at sequential points in time, somewhat like the figure shown here. This figure has a spectrum visible at the front, with amplitude on the y-axis and frequency on the x-axis. Time is shown here on the z-axis, starting from zero at the front, and moving forward in time in subsequent layers.
The sort of representation shown here gives us a bit more information about how the spectrum changes over time with varying speech sounds. But because we are looking at it on a two-dimensional screen, we still cannot see all of the information that we might need. If we orient this figure so that time is on the x-axis, and frequency on the y-axis, we can then represent amplitude with grayscale shading, similar to a topographic map.
Here we have examples of two spectrograms demonstrating this three-dimensional nature. The spectrogram displays all three dimensions that are present in the waveform and the spectrum. The x-axis represents time. The y-axis represents frequency, and the shading represents amplitude. In these two spectrograms, we can see that on the left, the frequency range from about 500 to 1500 hertz has high amplitude, represented by darker shading.
On the right, the same range has lower amplitude, represented by lighter shading and even some areas of white, or zero amplitude. Often in linguistic investigations of speech, the waveform and spectrogram will be presented together with time-aligned phone-level and word-level transcriptions. As we will see, the spectrogram will allow us to identify specific phones with more precision, based on their acoustic characteristics.