pvanal

pvanal — Converts a soundfile into a series of short-time Fourier transform frames.

Description

Fourier analysis for the Csound pvoc generator.

Syntax

csound -U pvanal [flags] infilename outfilename
pvanal [flags] infilename outfilename

Pvanal extension to create a PVOC-EX file.

The standard Csound utility program pvanal has been extended to enable a PVOC-EX format file to be created, using the existing interface. To create a PVOC-EX file, the file name must be given the required extension, .pvx, e.g. test.pvx. The requirement for the FFT size to be a power of two is relaxed here, and any positive value is accepted; odd numbers are rounded up internally. However, power-of-two sizes are still to be preferred for all normal applications.

The channel select flags are ignored, and all source channels will be analysed and written to the output file, up to a compiler-set limit of eight channels. The analysis window size (iwinsize) is set internally to double the FFT size.
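The size rules for PVOC-EX output can be sketched in a few lines of Python. This is only an illustration of the text above, not pvanal's actual code: the function name pvx_sizes is hypothetical, and it assumes that "rounded up" means an odd size becomes the next even number.

```python
def pvx_sizes(requested_fft_size):
    """Return (fft_size, window_size) under the assumptions stated above."""
    if requested_fft_size <= 0:
        raise ValueError("FFT size must be positive")
    # Assumption: an odd size is rounded up to the next even number.
    fft_size = requested_fft_size + (requested_fft_size % 2)
    # From the text: the analysis window is double the FFT size.
    window_size = 2 * fft_size
    return fft_size, window_size


print(pvx_sizes(1024))  # power-of-two size, left unchanged
print(pvx_sizes(1001))  # odd size, rounded up before doubling
```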

Initialization

pvanal converts a soundfile into a series of short-time Fourier transform (STFT) frames at regular timepoints (a frequency-domain representation). The output file can be used by pvoc to generate audio fragments based on the original sample, with timescales and pitches arbitrarily and dynamically modified. Analysis is conditioned by the flags below. A space is optional between the flag and its argument.

-s srate -- sampling rate of the audio input file. This will override the srate of the soundfile header, which otherwise applies. If neither is present, the default is 10000.

-c channel -- channel number sought. The default is 1.

-b begin -- beginning time (in seconds) of the audio segment to be analyzed. The default is 0.0.

-d duration -- duration (in seconds) of the audio segment to be analyzed. The default of 0.0 means to the end of the file.

-n frmsiz -- STFT frame size, the number of samples in each Fourier analysis frame. Must be a power of two, in the range 16 to 16384. For clean results, a frame must be larger than the longest pitch period of the sample. However, very long frames result in temporal "smearing" or reverberation. The bandwidth of each STFT bin is determined by sampling rate / frame size. The default framesize is the smallest power of two that corresponds to more than 20 milliseconds of the source (e.g. 256 points at 10 kHz sampling, giving a 25.6 ms frame).
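The arithmetic behind the frame size can be checked with a short Python sketch. It follows the rule as described in the text (smallest power of two longer than 20 ms, within 16 to 16384) and is not taken from the pvanal source; the function names are hypothetical.

```python
def default_frame_size(srate, min_seconds=0.020):
    """Smallest power of two (16..16384) spanning more than min_seconds."""
    size = 16
    while size < 16384 and size / srate <= min_seconds:
        size *= 2
    return size


def bin_bandwidth(srate, frame_size):
    """Bandwidth of each STFT bin in Hz: sampling rate / frame size."""
    return srate / frame_size


for srate in (10000, 44100):
    n = default_frame_size(srate)
    print(srate, n, 1000.0 * n / srate, bin_bandwidth(srate, n))
```

At a 10 kHz sampling rate this gives the 256-point, 25.6 ms frame quoted above; at 44.1 kHz the same rule gives 1024 points, about 23.2 ms, with bins roughly 43 Hz wide.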

-w windfact -- Window overlap factor. This controls the number of Fourier transform frames per second. Csound's pvoc will interpolate between frames, but too few frames will generate audible distortion; too many frames will result in a huge analysis file. A good compromise for windfact is 4, meaning that each input point occurs in 4 output windows, or conversely that the offset between successive STFT frames is framesize/4. The default value is 4. Do not use this flag with -h.

-h hopsize -- STFT frame offset. Converse of above, specifying the increment in samples between successive frames of analysis (see also lpanal). Do not use with -w.
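The relation between -w and -h can be made concrete with a small Python sketch. It assumes integer division for the hop and uses hypothetical function names; it illustrates the description above rather than pvanal's internals.

```python
def hop_from_overlap(frame_size, windfact):
    """Offset in samples between successive frames: framesize / windfact."""
    return frame_size // windfact


def frames_per_second(srate, hop):
    """Number of analysis frames produced per second of input."""
    return srate / hop


hop = hop_from_overlap(1024, 4)          # equivalent to passing -h 256
print(hop, frames_per_second(44100, hop))

# With an overlap factor of 1 the hop equals the frame size,
# so far fewer frames are produced per second.
print(frames_per_second(44100, hop_from_overlap(1024, 1)))
```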

-H -- Use a Hamming window instead of the default von Hann window.

-K -- Use a Kaiser window instead of the default von Hann window. The Kaiser parameter default is 6.8, but can be set with the -B option.

-B beta -- Set the beta parameter for any Kaiser window used to the floating point value beta.
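For orientation, the three window shapes can be sketched in Python using their common textbook (symmetric) definitions. The source does not state the exact formulas or scaling pvanal uses, so treat these as assumptions; only the default beta of 6.8 comes from the text.

```python
import math


def hann(n_points):
    """Textbook von Hann window: 0.5 - 0.5*cos(2*pi*i/(N-1))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n_points - 1))
            for i in range(n_points)]


def hamming(n_points):
    """Textbook Hamming window: 0.54 - 0.46*cos(2*pi*i/(N-1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n_points - 1))
            for i in range(n_points)]


def bessel_i0(x, terms=40):
    """Modified Bessel function I0 via its power series."""
    half = x / 2.0
    t = 1.0
    total = 1.0
    for k in range(1, terms):
        t *= half / k          # t = (x/2)^k / k!
        total += t * t
    return total


def kaiser(n_points, beta=6.8):
    """Textbook Kaiser window; beta=6.8 is the default named in the text."""
    denom = bessel_i0(beta)
    out = []
    for i in range(n_points):
        r = 2.0 * i / (n_points - 1) - 1.0   # position in -1..1
        out.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denom)
    return out


for name, w in (("hann", hann(9)), ("hamming", hamming(9)), ("kaiser", kaiser(9))):
    print(name, [round(v, 3) for v in w])
```

A larger beta narrows the Kaiser window and lowers its end values, trading frequency resolution for reduced leakage between bins.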

Files

The output file has a special pvoc header containing details of the source audio file, the analysis frame rate and overlap. Frames of analysis data are stored as float, with the magnitude and frequency (in Hz) for the first N/2 + 1 Fourier bins of each frame in turn. Frequency encodes the phase increment in such a way that for strong harmonics it gives a good indication of the true frequency. For low amplitude or rapidly moving harmonics it is less meaningful.
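The frame layout gives a rough way to estimate the size of the analysis data. The Python sketch below assumes 4-byte floats and a simple frame count of (samples / hop), and it ignores the header and any edge handling, none of which are specified above; the function names are hypothetical.

```python
def floats_per_frame(frame_size):
    """Magnitude and frequency for each of the first N/2 + 1 bins."""
    return 2 * (frame_size // 2 + 1)


def analysis_data_bytes(duration_s, srate, frame_size, hop, bytes_per_float=4):
    """Rough size of the frame data only (header excluded)."""
    # Assumption: about one frame per hop across the analyzed segment.
    n_frames = round(duration_s * srate) // hop
    return n_frames * floats_per_frame(frame_size) * bytes_per_float


print(floats_per_frame(1024))
print(analysis_data_bytes(1.0, 44100, 1024, 256))
```

For a 1024-point frame this is 1026 floats per frame, so one second of 44.1 kHz audio at a hop of 256 samples comes to roughly 700 kB of frame data under these assumptions.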

Diagnostics

Prints the total number of frames, and reports the number of frames completed at every 20th frame.

Examples

pvanal asound pvfile

will analyze the soundfile "asound" using the default frmsiz and windfact to produce the file "pvfile" suitable for use with pvoc.
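When pvanal is driven from a script, the command line can be assembled programmatically. The Python sketch below is a hypothetical helper (pvanal_command is not part of Csound); it uses only flags documented on this page and enforces the rule that -w and -h must not be combined.

```python
def pvanal_command(infile, outfile, frame_size=None, overlap=None, hop=None,
                   window=None, beta=None, via_csound=False):
    """Build an argument list for pvanal from the flags described above."""
    if overlap is not None and hop is not None:
        raise ValueError("-w and -h must not be used together")
    cmd = ["csound", "-U", "pvanal"] if via_csound else ["pvanal"]
    if frame_size is not None:
        cmd += ["-n", str(frame_size)]
    if overlap is not None:
        cmd += ["-w", str(overlap)]
    if hop is not None:
        cmd += ["-h", str(hop)]
    if window == "hamming":
        cmd.append("-H")
    elif window == "kaiser":
        cmd.append("-K")
        if beta is not None:
            cmd += ["-B", str(beta)]
    cmd += [infile, outfile]
    return cmd


print(pvanal_command("asound", "pvfile"))
print(pvanal_command("fox.wav", "fox3.pvx", frame_size=256))
```

The returned list can be passed to subprocess.run on a system where pvanal is installed and on the search path.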

Here is an example of the pvanal utility. It uses the file pvanal.csd.

Example 1372. Example of the pvanal utility.

See the sections Real-time Audio and Command Line Flags for more information on using command line flags.

<CsoundSynthesizer>
<CsOptions>
; Select audio/midi flags here according to platform
-odac   -m0  --limiter=.95 ;;;realtime audio out, with limiter protection
; For Non-realtime output leave only the line below:
; -o pvanal.wav -W ;;; for file output any platform
</CsOptions>
<CsInstruments>

sr = 44100
ksmps = 32
nchnls = 2
0dbfs  = 1

; by Menno Knevel 2021

gilen  filelen "fox.wav"	    ; get length of the source soundfile

; analyze sound file and output result to 3 pvoc-ex files
ires1 system_i 1,{{ pvanal fox.wav fox1.pvx }}          ; default settings
ires2 system_i 1,{{ pvanal -K -w1 fox.wav fox2.pvx }}   ; very low window setting
ires3 system_i 1,{{ pvanal -n256 fox.wav fox3.pvx }}    ; different frame size

instr 1 ; untreated signal
asig    diskin2   "fox.wav", 1
prints  "\n---***YOU NOW HEAR THE UNTREATED SOUND SAMPLE***---\n"
outs    asig*.8, asig*.8
endin

instr 2

prints  "\n---***YOU NOW HEAR THE RESULT OF THIS ANALYZED FILE:***---\n"
ktime line 0, p3, gilen/2.4     ; slow down to have a good listen at what happens
asig  pvoc ktime, 1, p4, 1 
prints  "(playback is slowed down & limited to 'the quick brown fox')\n"
      outs asig*.8, asig*.8
endin

</CsInstruments>
<CsScore>

i1 0 2.76               ; original sample

i2  5 10  "fox1.pvx"    ; default but slowed down
i2 16 10  "fox2.pvx"    ; low window setting
i2 27 10  "fox3.pvx"    ; smearing
e
</CsScore>
</CsoundSynthesizer>


Credits

Author: Dan Ellis

MIT Media Lab

Cambridge, Massachusetts

1990