pvsanal — Generate an fsig from a mono audio source ain, using phase vocoder overlap-add analysis.
ifftsize -- The FFT size in samples. Need not be a power of two (though these are especially efficient), but must be even. Odd numbers are rounded up internally. ifftsize determines the number of analysis bins in fsig, as ifftsize/2 + 1. For example, where ifftsize = 1024, fsig will contain 513 analysis bins, ordered linearly from the fundamental to Nyquist. The fundamental of analysis (which in principle gives the lowest resolvable frequency) is determined as sr/ifftsize. Thus, for the example just given and assuming sr = 44100, the fundamental of analysis is 43.07Hz. In practice, due to the phase-preserving nature of the phase vocoder, the frequency of any bin can deviate bilaterally, so that DC components are recorded. Given a strongly pitched signal, frequencies in adjacent bins can bunch very closely together, around partials in the source, and the lowest bins may even have negative frequencies.
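The arithmetic above can be checked with a short Python sketch. The function name `analysis_layout` is purely illustrative and is not part of Csound; it only restates the two relations given in the text (bin count as ifftsize/2 + 1, fundamental of analysis as sr/ifftsize), along with the rounding of odd sizes.

```python
def analysis_layout(ifftsize, sr):
    """Return (number of analysis bins, fundamental of analysis in Hz).

    Hypothetical helper: restates the formulas from the text only.
    """
    if ifftsize % 2:           # odd sizes are rounded up to the next even value
        ifftsize += 1
    nbins = ifftsize // 2 + 1  # ifftsize/2 + 1 bins
    fundamental = sr / ifftsize
    return nbins, fundamental


nbins, fundamental = analysis_layout(1024, 44100)
print(nbins, round(fundamental, 2))  # 513 43.07
```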
As a rule, the only reason to use a non power-of-two value for ifftsize would be to match the known fundamental frequency of a strongly pitched source. Values with many small factors can be almost as efficient as power-of-two sizes; for example: 384, for a source pitched at around low A=110Hz.
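As a rough illustration of what "many small factors" means, the following Python sketch factorises a candidate ifftsize and reports the resulting fundamental of analysis. The helper `prime_factors` and the choice of sr = 44100 are assumptions made for the example; nothing here reflects how Csound chooses its FFT routine.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order (trial division)."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


sr = 44100  # assumed sample rate
for size in (384, 512):
    print(size, prime_factors(size), round(sr / size, 2))
# 384 [2, 2, 2, 2, 2, 2, 2, 3] 114.84
# 512 [2, 2, 2, 2, 2, 2, 2, 2, 2] 86.13
```

At this sample rate, 384 factors into twos and a single three, and places the fundamental of analysis near 115Hz, in the neighbourhood of the low A mentioned above.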
ioverlap -- The distance in samples (“hop size”) between overlapping analysis frames. As a rule, this should be no larger than ifftsize/4, e.g. 256 for the example above, so that at least four frames overlap at any point. ioverlap determines the underlying analysis rate, as sr/ioverlap. ioverlap need not be a simple factor of ifftsize; for example a value of 160 would be legal. The choice of ioverlap may be dictated by the degree of pitch modification applied to the fsig, if any. As a rule of thumb, the more extreme the pitch shift, the higher the analysis rate needs to be, and hence the smaller the value for ioverlap. A higher analysis rate can also be advantageous with broadband transient sounds, such as drums (where a small analysis window gives less smearing, but more frequency-related errors).
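The relation between hop size and analysis rate can be sketched in Python as follows. The function `hop_properties` is a hypothetical helper, and the "overlap factor" (ifftsize/ioverlap, the number of frames covering any given sample) is a derived quantity introduced here for illustration; only sr/ioverlap comes directly from the text.

```python
def hop_properties(ifftsize, ioverlap, sr):
    """Return (analysis rate in frames per second, overlap factor)."""
    analysis_rate = sr / ioverlap         # frames produced per second
    overlap_factor = ifftsize / ioverlap  # frames overlapping at any sample
    return analysis_rate, overlap_factor


# Hop sizes for ifftsize = 1024 at sr = 44100, including the non-factor 160.
for hop in (256, 160, 128):
    rate, factor = hop_properties(1024, hop, 44100)
    print(f"ioverlap={hop}: {rate:.3f} frames/s, overlap factor {factor:g}")
```

With ioverlap = 256 the analysis runs at roughly 172 frames per second; halving the hop to 128 doubles that rate.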
Note that it is possible, and reasonable, to have distinct fsigs in an orchestra (even in the same instrument), running at different analysis rates. Interactions between such fsigs are currently unsupported, and the fsig assignment opcode does not allow copying between fsigs with different properties, even if the only difference is in ioverlap. However, this is not a closed issue, as it is possible in theory to achieve crude rate conversion (especially with regard to in-memory analysis files) in ways analogous to time-domain techniques.
iwinsize -- The size in samples of the analysis window filter (as set by iwintype). This must be at least ifftsize, and can usefully be larger. Though other proportions are permitted, it is recommended that iwinsize always be an integral multiple of ifftsize, e.g. 2048 for the example above. Internally, the analysis window (Hamming, von Hann) is multiplied by a sinc function, so that amplitudes are zero at the boundaries between frames. The larger analysis window size has been found to be especially important for oscillator bank resynthesis (e.g. using pvsadsyn), as it has the effect of increasing the frequency resolution of the analysis, and hence the accuracy of the resynthesis. As noted above, iwinsize determines the overall latency of the analysis/resynthesis system. In many cases, and especially in the absence of pitch modifications, it will be found that setting iwinsize=ifftsize works very well, and offers the lowest latency.
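The constraints on iwinsize can be summarised in a small Python sketch. `check_iwinsize` is a hypothetical helper written for this illustration; it encodes only the rules stated in the text (at least ifftsize, preferably an integral multiple of it) and does not reproduce any validation Csound itself performs.

```python
def check_iwinsize(ifftsize, iwinsize):
    """Classify an iwinsize value against the rules given in the text."""
    if iwinsize < ifftsize:
        raise ValueError("iwinsize must be at least ifftsize")
    if iwinsize % ifftsize == 0:
        return "recommended"  # integral multiple of ifftsize
    return "permitted"        # allowed, but not the recommended proportion


print(check_iwinsize(1024, 1024))  # recommended (lowest latency)
print(check_iwinsize(1024, 2048))  # recommended
print(check_iwinsize(1024, 1500))  # permitted
```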
iwintype -- The shape of the analysis window. Currently three computed window types are implemented:
0 = Hamming window
1 = von Hann window
2 = Kaiser window (not in sliding form)
These are also supported by the PVOC-EX file format. The window type is stored as an internal attribute of the fsig, together with the other parameters (see pvsinfo). Other types may be implemented later on. If the value of iwintype is strictly negative, its absolute value is used as the number of an f-table, which must already exist. A significant issue here is the common constraint of f-tables to power-of-two sizes, so this method does not offer a complete solution. Most users will find the Hamming window meets all normal needs, and can be regarded as the default choice.
iformat -- (optional) The analysis format. Currently only one format is implemented by this opcode:
0 = amplitude + frequency
This is the classic phase vocoder format; easy to process, and a natural format for oscillator-bank resynthesis. It would be very easy (tempting, one might say) to treat an fsig frame not purely as a phase vocoder frame but as a generic additive synthesis frame. It is indeed possible to use an fsig this way, but it is important to bear in mind that the two are not, strictly speaking, directly equivalent.
Other important formats (supported by PVOC-EX) are:
1 = amplitude + phase
2 = complex (real + imaginary)
iformat is provided in case it proves useful later to add support for these other formats. Formats 0 and 1 are very closely related (as the phase is “wrapped” in both cases - it is a trivial matter to convert from one to the other), but the complex format might warrant a second explicit signal type (a “csig”) specifically for convolution-based processes, and other processes where the full complement of arithmetic operators may be useful.
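To make the link between the amplitude + phase and amplitude + frequency representations concrete, the following Python sketch applies the textbook phase-vocoder relation: the frequency of bin k is its centre frequency plus a deviation derived from the wrapped phase difference between two successive frames. This is a generic illustration under stated assumptions, not Csound's actual implementation, and the function names are hypothetical.

```python
import math


def wrap_phase(phi):
    """Wrap a phase value in radians into the range [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def bin_frequency(k, phase_prev, phase_now, ifftsize, ioverlap, sr):
    """Estimate the frequency (Hz) in bin k from two successive phases."""
    centre = k * sr / ifftsize                           # bin centre frequency
    expected = 2.0 * math.pi * k * ioverlap / ifftsize   # phase advance at centre
    deviation = wrap_phase(phase_now - phase_prev - expected)
    return centre + deviation * sr / (2.0 * math.pi * ioverlap)


# A 450 Hz partial observed in bin 10 (centre about 430.66 Hz).
sr, ifftsize, ioverlap = 44100, 1024, 256
phase_prev = 0.0
phase_now = wrap_phase(2.0 * math.pi * 450.0 * ioverlap / sr)
print(bin_frequency(10, phase_prev, phase_now, ifftsize, ioverlap, sr))
```

The estimate recovers a value very close to 450Hz even though the bin centre lies about 19Hz lower, which is the sense in which bin frequencies can deviate from their nominal positions.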
iinit -- (optional) Skip reinitialization. This is not currently implemented for any of these opcodes, and it remains to be seen if it is even practical.
Warning: It is unsafe to use the same f-variable for both input and output of pvs opcodes. Using the same one might lead to undefined behavior on some opcodes. Use a different one on the left and right sides of the opcode.
Here is an example of the pvsanal opcode. It uses the file pvsanal.csd.
Example 825. Example of the pvsanal opcode.
See the sections Real-time Audio and Command Line Flags for more information on using command line flags.
<CsoundSynthesizer>
<CsOptions>
; Select audio/midi flags here according to platform
-odac     ;;;realtime audio out
;-iadc    ;;;uncomment -iadc if realtime audio input is needed too
; For Non-realtime output leave only the line below:
; -o pvsanal.wav -W ;;; for file output any platform
</CsOptions>
<CsInstruments>

sr = 44100
ksmps = 32
nchnls = 2
0dbfs  = 1

instr 1
;pvsanal has no influence when there is no transformation of original sound
ifftsize  = p4
ioverlap  = ifftsize / 4
iwinsize  = ifftsize
iwinshape = 1                                   ;von-Hann window
Sfile     = "fox.wav"
ain       soundin Sfile
fftin     pvsanal ain, ifftsize, ioverlap, iwinsize, iwinshape ;fft-analysis of the audio-signal
fftblur   pvscale fftin, p5                     ;scale
aout      pvsynth fftblur                       ;resynthesis
          outs    aout, aout
endin

</CsInstruments>
<CsScore>
s
i 1 0 3 512  1     ;original sound - ifftsize of pvsanal does not have any influence
i 1 3 3 1024 1     ;even with different
i 1 6 3 2048 1     ;settings

s
i 1 0 3 512  1.5   ;but transformation - here a fifth higher
i 1 3 3 1024 1.5   ;but with different settings
i 1 6 3 2048 1.5   ;for ifftsize of pvsanal

e
</CsScore>
</CsoundSynthesizer>