DRC: Digital Room Correction
Denis Sbragion
2008-06-16
Copyright © 2002-2008 Denis Sbragion
Version 3.0.0
|
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as published
by the Free Software Foundation; either version 2 of the License, or (at
your option) any later version.
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
You can contact the author on Internet at the following
address:
d.sbragion@infotecna.it
This program uses the parsecfg library from Yuuki NINOMIYA.
Details on this library can be found in the parsecfg.c and parsecfg.h
files. Many thanks to Yuuki NINOMIYA for this useful library.
This program uses also the FFT routines from Takuya Ooura and
the GNU Scientific Library (GSL) FFT routines. Many thanks to Takuya
Ooura and the GSL developers for these efficient routines.
Contents
1 Introduction
DRC uses a lot of signal, linear system and DSP theory to achieve its
results. In the following explanations some knowledge about those
arguments is assumed. I'm planning to write a more extensive manual with
the basics needed to understand what DRC does, or at least is trying to
do, but unfortunately I have just little spare time to dedicate to this
project, so I will concentrate just on improving the program
performances. Of course volunteers, suggestion, patches, better
documentation, pointers and references on the subject are all
appreciated.
For a basic introductory guide to DSP theory and practice you might look
at:
http://www.dspguide.com/
For a basic introduction to DSP applied to audio you might read the good
book available at:
To better understand what DRC is trying to do you might look at:
http://en.wikipedia.org/wiki/Digital_Room_Correction
This clear and concise Wikipedia article contains all the basics needed
to understand digital room correction in general. Another interesting
page is available at:
http://www.ludd.luth.se/~torger/filter.html
On this web page you'll find some good explanations about the Nwfiir
Audio Tools suite, which was a project now discontinued and similar to
DRC but implemented using warped FIR filters instead of the usual linear
FIR filters.
Compared with the Nwfiir Audio Tools suite DRC does only the job carried
out by the wfird program, generating just the linear FIR filters for
digital room correction. In order to measure the room impulse response
and to perform real time or offline convolution (i.e. the correction) of
the digital signal, you have to use some external programs, like, for
example, BruteFIR (see section 2).
A good DRC step by step guide has been written by “Jones Rush”, and it
is available at the following URL:
http://www.duffroomcorrection.com/images/d/de/DRC_Guide_v1.0.pdf
“Jones Rush” spent quite a lot of time learning the complete procedure
needed to set up a full digital room correction system and also spent a
lot of time writing the guide from the beginner's point of view, so this
is a really good starting point for everyone who has never played before
with this sort of things. The guide is now a bit outdated, but despite
this it is still a valid reference for the whole procedure of creating
DRC filters. The main difference is the name of the output file
generated by the latest sample configuration files, which now is
“rps.pcm” instead of “dxf.pcm”.
Ed Wildgoose is trying to create a collaborative documentation effort at:
http://www.duffroomcorrection.com/
Please, take the time to improve the available documentation and to
share your experience participating to those nice Wiki pages. It could
be really useful for other DRC users.
2 Getting the latest version
The official DRC web site is available at the following address:
http://drc-fir.sourceforge.net/
On the web site you will always find news and up to date informations,
the full documentation for the latest version, informations on where to
download it and many other DRC related informations. New DRC releases
are announced using the Freshmeat announcement and tracking service. The
Freshmeat DRC page is available at the following address:
http://freshmeat.net/projects/drc/
3 What's new in version 3.0.0
A new method for the computation of an optimized psychoacoustic target
response, based on the spectral envelope of the corrected impulse
response, has been introduced.
3.1 Compatibility with previous versions
Configuration files for version 3.0.0 are not backward compatible with
previous versions. Configuration files from previous versions could be
easily adapted to the new version by copying the dip limiting parameters
added to the base configuration section (BCDLType, BCDLMinGain,
BCDLStartFreq, BCDLEndFreq, BCDLStart, BCDLMultExponent) and the
psychoacoustic target parameters (see section
6.8) from one of the sample configuration files
available for version 3.0.0.
3.2 History: what was new in previous versions
A new method for the computation of the excess phase component inverse,
based on a simple time reversal, has been introduced. The sample
configuration files have been rewritten to take advantage of the new
inversion procedure. Sample configuration files for 48 KHz, 88.2 KHz, 96
KHz sample rates have been added. The homomorphic deconvolution
procedure has been improved to avoid any numerical instability. A new
Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation
method, providing monotonic behaviour, has been introduced in the target
response computation. All the interpolation and approximation procedures
have been rewritten from scratch to provide better performances and
accuracy.
A new command line parameters replacement functionality has been
introduced. The dip and peak limiting procedures have been improved in
order to avoid numerical instabilities. A new wavelet based analysis
graph has been added to the sample results. Many performance
improvements have been introduced. A new optional parameter used to
define the base directory for all files has been added.
3.2.3 Version 2.6.1
Minor corrections and improvements have been applied to the
documentation and to the pre-echo truncation inversion procedure. A new
target transfer function definition procedure based on Uniform B Splines
has been introduced. The development environment has been moved to
Code::Blocks and GCC/MinGW.
3.2.4 Version 2.6.0
A new prefiltering curve based on the bilinear transformation has been
introduced. An improved windowing of the minimum phase filters used to
apply the target frequency response and the microphone compensation has
been implemented. A missing normalization of the minimum phase
correction filter has been added. A new logarithmic interpolation has
been added to the target transfer function computation. The new
interpolation method simplifies the definition of the target transfer
functions. Small improvements to the documentation and to the Octave
scripts used to generate the graphs have been applied. A new improved
version of the measurejack script has been included in the package. Some
new sample configuration files, including one approximating the ERB
psychoacoustic scale, have been added.
3.2.5 Version 2.5.1
Small improvements to the documentation and to the Octave scripts used
to generate the graphs. The sliding lowpass prefiltering procedure has
been rewritten to make it a bit more accurate and to make the code more
readable. Few other minor bugs have been fixed.
3.2.6 Version 2.5.0
With version 2.5.0 a general overhauling of the filter generation
procedure has been performed. Some steps (peak limiting for example)
have been moved to a different stage of the procedure, and new stages
have been added.
A new ringing truncation stage has been added to remove excessive
ringing caused sometimes by the pre-echo truncation procedure. Now the
filter impulse response is enclosed in a sort of psychoacoustic jail
that prevent, or at least reduces a lot, any artifact that could arise
as a side effect of the filter generation procedure. With this changes
DRC becomes somewhat “self tuning” and now it is able to adapt itself
to the input impulse response, at least to some extent, providing as
much correction as possible without generating excessive artifacts.
The postfiltering stage, where the target transfer function is defined,
has been split to provide a separate stage for microphone compensation.
This allows for a greater flexibility defining both the target transfer
function and the microphone compensation, and provides as a side effect
correct test convolutions even when microphone compensation is in place.
With the previous versions the test convolution was improperly altered
by the microphone compensation, because both the target transfer
function and the microphone compensation were generated and applied
using the same filter.
Many other procedures have been refined. For example the peak and dip
limiting procedures now ensure continuity up to the first derivative of
the magnitude response on the points where the magnitude limiting starts
its effect. This further reduces the ringing caused by abrupt changes in
the magnitude response.
Finally many other minor bugs have been corrected and the documentation
has been improved, switching to LATEX for document generation and
formatting.
3.2.7 Version 2.4.2
Version 2.4.2 added a better handling of underflow problems during
homomorphic deconvolution. Some little speed improvements have been
also achieved. Added search and output of peak value and peak position
into lsconv.
3.2.8 Version 2.4.1
Version 2.4.1 added some tools for accurate time aligned impulse
response measurements. This make it possible to compensate for
interchannel misalignments, at least up to a limited extent. Some minor
bugs have been also corrected.
3.2.9 Version 2.4.0
In version 2.4.0 the Takuya Ooura and GNU Scientific Library FFT
routines have been included in the program. These routines are about 10
times faster than the previous routines, providing about the same
accuracy. Furthermore some checks have been added to the sharpness
parameters to avoid program crashes when these parameters are missing.
The FFT routines described above are available at:
http://www.gnu.org/software/gsl/
http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html
3.2.10 Version 2.3.2
In version 2.3.2 a new sharpness factor parameter has been added to the
sliding low pass prefiltering procedure. This parameter provides a
control between filtering sharpness and spectral spreading in the filter
transition region. A new option to read and write double precision
floating points files has been added. Some checks to warn when the input
signal is too short to provide accurate results has been added.
3.2.11 Version 2.3.1
In version 2.3.1 some minor corrections to the program have been
performed and the documentation has been restructured. A new option to
automatically count the number of lines in the target function and
microphone compensation files has been added. A new optimized sample
configuration file has been added.
3.2.12 Version 2.3.0
Version 2.3.0 adds two parameters to control the gain limiting
procedures. These parameters control a sort of “soft clipping” of the
frequency response, avoiding ringing on abrupt truncations of the
frequency response. A new parameter to select the magnitude type, either
linear or expressed in dB, of the target frequency response has been
added. The optional capability to perform microphone compensation has
also been added. The license has been switched to the GNU GPL.
3.2.13 Version 2.2.0
Version 2.2.0 added a sliding low pass procedure to the pre-echo
truncation inversion procedure. This pre-echo truncation procedure is
much more similar to the pre-echo sensitivity of our hear and so
slightly better results are achieved. Furthermore the sliding low pass
prefiltering procedure has been completely rewritten to provide better
accuracy, especially with the short window lengths needed for pre-echo
truncation.
3.2.14 Version 2.1.0
Version 2.1.0 added two new parameters that allow for the windowing of
everything coming more than few samples before the impulse center.
Usually before the main spike there's only noise and spuriae. I have
found that in certain situations this small noise may lead to audible
errors in the correction, so windowing it out in order to clean the
impulse response is a good practice.
3.2.15 Version 2.0.0
Version 2.0.0 added many new features that provides much better control
on pre-echo artifacts problems. The most important change is the new
pre-echo truncation inversion procedure. Loosely derived from Kirkeby
fast deconvolution this new procedure truncates any pre-echo on the
excess phase part inversion. This leads to something like minimum phase
inversion on frequency ranges where a complete inversion would lead to
pre-echo artifacts. This critical frequency ranges are usually no more
than 5 or 6 and no wider than about 1/12 of octave for a typical room
impulse response. Reducing the correction to minimum phase on so narrow
bands has little or no subjective effect on the correction quality and
allows for the correction of much longer windows, with much better
overall results.
Avoiding pre-echo artifacts also provides the ability to create low
input-output delay filters. The resulting delay is often low enough (few
ms) to allow the use of these filters in home theater applications. For
situations where even few ms aren't adequate there's now also an option
to generate zero delay minimum phase filters. Minimum phase filters
provides correction of the amplitude response and just the minimum phase
part of the phase response.
In order to avoid pre-echo artifacts there are also many other aspects
that should be taken into account. For a better explanation of the whole
procedure and the selection method for the DRC parameters needed to
achieve this result look at the section 4.5.1.
Version 2.0.0 adds also many other improvements, including the single
side version of the prefiltering procedures and fixing for many minor
bugs that where still laying around.
A test convolution stage is now also available. This convolves the input
impulse response with the generated filter to get the impulse response
after correction. The impulse response obtained by this method is
usually really reliable. As long as the measurement microphone isn't
moved I have been able to verify the computed impulse response with less
than 0.5 dB errors, which is impressive considering the cheap
measurement set I use. In my situation may be that the computed
corrected impulse response is even more accurate than the measured one,
because of noise problems being doubled by my cheap measurement set in
the second measure.
3.2.16 Version 1.3.0
Version 1.3.0 provided some new features with respect to version
1.2.1:
- More flexible prefiltering curve parameters
- New time varying sliding lowpass prefiltering stage
- New minimum phase or homomorphic renormalization of the
prefiltered excess phase component
- Homomorphic deconvolution based on the Hilbert transform instead
of the cepstrum method
- Slightly improved documentation
- Many minor bugs fixed
4 Program description and operation
DRC is a program used to generate correction filters for acoustic
compensation of HiFi and audio systems in general, including listening
room compensation. DRC generates just the FIR correction filters, which
must be used with a real time or offline convolver to provide real time
or offline correction. DRC doesn't provide convolution features, and
provides only some simplified, although really accurate, measuring
tools. So in order to use DRC you need:
- At least 1 second of the impulse response of your room and audio
system at the listening position, separated for each channel, which means
usually just left and right for a basic HiFi system. The impulse
response should be provided in raw format (flat file with just samples,
no headers or additional information whatsoever), either in signed 16
bit format or using 32/64 bit IEEE floating point samples. From version
2.4.1 DRC includes some command line tools to do accurate time aligned
impulse response measurement, see section
4.3 for further details. Many other
systems, either commercial or free, are available on the Internet to
carry out this task. Take a look at:
Many information and free programs useful for measuring and handling
impulse responses are available at the NoiseVault web site:
http://www.noisevault.com/
Of course a good instrumentation microphone and preamplifier are needed
to get accurate measurements of your listening room response. The ETF
web sites has a link to a cheap but still quite good instrumentation
microphone, which comes with an individual calibration file:
http://www.ibf-akustik.de/
It is built around the Panasonic electrect capsules (WM-60A and WM-61A)
which can be used also to build a DIY microphone. Of course you won't
get the same quality of a professional instrumentation microphone, but
it is enough to get good results. Another good and inexpensive solution
is the Behringer ECM8000 measurement microphone (see
http://www.behringer.com for details).
- A real time convolver able to deal with FIR filters with at least
4000 and up to more than 32000 taps, like BruteFIR, Foobar2000, the
ActiveX Convolver Plugin and others. References for these programs can
be found at the following links:
Furthermore a ready to use Linux distribution suited for audio
application is available at Planet CCRMA:
http://ccrma-www.stanford.edu/planetccrma/software/
This distribution already contains most of what is needed to create a
real time convolution engine suited for digital room correction.
On the DRC Wiki pages created by Ed Wildgoose there's a document,
created by Uli Brueggemann, on how to create a Linux mini distribution
suited to run BruteFIR out of a USB memory stick. Take a look at:
http://www.duffroomcorrection.com/
and search for “BruteFIR on a USB memory stick”. Further informations about
this option are available on the Acourate web site:
http://www.acourate.com/
- Hardware needed to run all the programs. I'm actually using a
“Shoe Box” PC manufactured by ITOX (http://www.itox.com) along
with a TerraTec EWX 24/96 sound card. This “Shoe Box” PC is running
Linux (RedHat 7.3), ALSA (see http://www.alsa-project.org) and
BruteFIR to provide real time correction from the optical S/PDIF output
of a consumer CD player. The PC configuration is really simple (Intel
Celeron CPU running at 800 MHz, 64 Mb of RAM, old 1.6 Gb Hard Disk) but
it is more than adequate for real time correction of two channels running at
44.1 KHz. With this configuration BruteFIR uses just about 15% of the
CPU power.
Of course to test the program you can also apply the correction off-line
on audio files ripped from ordinary audio CDs, burning the corrected
files on some fresh CDR and listening to them using a standard CD
player. This avoids the need of any dedicated hardware and lets you test
DRC on your favourite CD player.
Since few years there are many good, silent and compact PCs designed for
multimedia usage which could be used to build a complete noiseless real time
convolver with little effort. For some examples take a look at:
http://www.cappuccinopc.com/
http://www.stealthcomputer.com/
http://www.tranquilpc.co.uk/
http://www.mini-itx.com/
http://www.hushtechnologies.com/
4.1 Filter generation procedure
The creation of a correction filter for room acoustic compensation is
quite a challenging task. A typical acoustic environment is a non
minimum phase system, so in theory it cannot be inverted to get perfect
compensation. Furthermore a typical HiFi system in a typical listening
room isn't either a single linear system, but it is instead a different
linear system for every different listening position available.
Trying to get an almost perfect compensation for a given position
usually leads to unacceptable results for positions which are even few
millimeters apart from the corrected position. The generation of a
filter that provides good compensation of magnitude and phase of the
frequency response of the direct sound, good control of the magnitude of
the frequency response of the stationary field and an acceptable
sensitivity on the listening position, requires many steps. Here it is a
brief summary of what DRC does:
- Initial windowing and normalization of the input impulse response.
- Decomposition into minimum phase and excess phase components
using homomorphic deconvolution.
- Prefiltering of the minimum phase component with frequency
dependent windowing.
- Frequency response dip limiting of the minimum phase component to
prevent numerical instabilities during the inversion step.
- Prefiltering of the excess phase component with frequency
dependent windowing.
- Normalization and convolution of the preprocessed minimum phase
and excess phase components (optional starting from version 2.0.0).
- Impulse response inversion through least square techniques or fast
deconvolution.
- Optional computation of a psychoacoustic target response based on the
spectral envelope of the corrected impulse response.
- Frequency response peak limiting to prevent speaker and
amplification overload.
- Ringing truncation with frequency dependent windowing to remove
any unwanted excessive ringing caused by the inversion stage and the
peak limiting stage.
- Postfiltering to remove uncorrectable (subsonic and ultrasonic)
bands and to provide the final target frequency response.
- Optional microphone compensation.
- Optional generation of a minimum phase version of the correction
filter.
- Final optional test convolution of the correction filter with the
input impulse response.
Almost each of these steps have configurable parameters and the optional
capability to output intermediate results.
Of course I'm not sure at all that this is the best procedure to get
optimal correction filters. There is a lot of psychoacoustic involved in
the generation of room acoustic correction filters, so probably the use
of a more psychoacoustic oriented procedure would give even better
results. Any suggestion with respect to this is appreciated.
Within my HiFi system the global frequency response with the correction
settings supplied by the optimized sample configuration file goes from
about ± 5 dB in the 20 Hz - 20 KHz range to about ± 1.0 dB
in the same range, of course with respect to the configured target
frequency response. Furthermore the main spike of the impulse response
becomes much more clean for about 1.5 ms with an almost linear phase at
least for the direct sound. There are also big improvements in waterfall
plots, going from something that is everything but similar to the
waterfall plots of a Dirac pulse to something pretty good for about 1
ms. For some example of the results achieved see appendix
A.
4.2 Frequency dependent windowing
The frequency dependent windowing is one of the most common operations
within DRC. This type of windowing follow up directly from the fact that
within a room the sensitivity of the room transfer function to the
listening position is roughly dependent on the wavelength involved. This
of course implies that the listening position sensitivity increase quite
quickly with frequency.
This dependence has the side effect that the room correction need to be
reduced as the frequency increase, or, seen from the other side, as the
wavelength decrease. For this reason DRC tries to apply a correction
that is roughly proportional to the wavelength involved. This approach
has also some psychoacoustic implications, because our auditory system
is conceived to take into account the same exact room behaviour, and so
its own behaviour follow somewhat similar rules.
Within DRC the frequency dependent windowing is implemented with two
different kind of procedures: band windowing and sliding lowpass linear
time variant filtering. The first procedure simply filters the input
signals into logarithmically spaced adjacent bands and applies different
windows to them, then summing the resulting signals together to get the
output windowed impulse response. The second procedure uses a time
varying lowpass filter, with a cut-off frequency that decreases with the
window length. The results are pretty similar, but usually the sliding
lowpass procedure is preferred because it is less prone to numerical
errors and allows for a bit more of flexibility.
Both procedures follow the same basic rules to define the type of
windowing that gets applied to the input signal. The basic parameters
are the lower window, i.e. the window applied at the lower end of the
frequency range involved, the upper window, i.e. the window applied at
the upper end of the frequency range, and the window exponent, i.e.
the exponent used to connect the lower window to the upper window with a
parametric function that goes about as the inverse of the frequency. For
a description of the parametric function used see section
6.3.8.
Figure 1:
Frequency dependent windowing for the
normal.drc sample settings on the time-frequency plane. Linear scale.
The X axis is time in milliseconds, the Y axis is frequency in Hz. The
part that gets corrected is the one below the windowing curves.
For example figure 1 show the typical set of
prefiltering curves applied to the input impulse response by the
normal.drc sample settings (see section 5.2).
For this sample settings file the default settings for the window
exponent, which is the WE parameter in the figure, is 1.0, which
corresponds to the cyan line. The part of the input impulse response
that is preserved, and so also corrected, is the one below the curves.
The remaining part of the time-frequency plane is simply windowed out.
Looking at this figure it becomes pretty clear that only a tiny fraction
of the time-frequency plane gets corrected by DRC. This tiny fraction
pretty much defines the physical limits where digital room correction is
applicable. Above this limit the listening position sensitivity usually
becomes so high that even a small displacement of the head from the
optimal listening position causes unacceptable results with the
appearance of strong audible artifacts.
Figure 2:
Frequency dependent windowing for the
normal.drc sample settings on the time-frequency plane. Logarithmic
frequency scale. The X axis is time in milliseconds, the Y axis is
frequency in Hz.
By the way it should be also taken into account that our ear perceives
this time-frequency plane on a logarithmic frequency scale. Looking at
the same graph on a logarithmic frequency scale, as in figure
2, it becomes clear that from our auditory system
point of view a much bigger fraction of the time-frequency plane gets
corrected. It becomes also clear that above 1-2 KHz only the direct
sound gets corrected and that above that range room correction actually
reduces to just minimalistic speaker correction.
Figure 3:
Frequency dependent windowing for the
normal.drc sample settings on the time-frequency plane. Logarithmic time
and linear frequency scales. The X axis is time in milliseconds, the Y
axis is frequency in Hz.
Figure 4:
Frequency dependent windowing for the
normal.drc sample settings on the time-frequency plane. Logarithmic time
and frequency scales. The X axis is time in milliseconds, the Y axis is
frequency in Hz.
Figure 5:
Frequency dependent windowing for the normal.drc
sample settings on the time-frequency plane. Logarithmic time and
frequency scale with Gabor limit superimposed. The X axis is time in
milliseconds, the Y axis is frequency in Hz.
While developing DRC I've read some informal notes on the Internet
stating that on short time windows our perception of time should be
considered on a logarithmic scale too. I'm not quite convinced that this
assumption is actually true, but if such an assumption is correct our
perception of the room correction would be as in figures
3 and 4. Even if this
assumption is not true these graphs are pretty useful to make clearly
visible the part of the time-frequency plane that gets corrected by DRC
and becomes even more useful if also the Gabor limit is placed in the
graph as in figure 5.
The Gabor limit is defined by the following simple inequality:
where f is frequency and t is time, and defines the limit of
uncertainity in the time-frequency plane. This means for example that,
looking at picture 5, when the window exponent goes below
about 0.5 the frequency dependent windowing starts violating the Gabor
inequality at least in some small frequency range. Within that range the
room transfer function estimation performed by DRC becomes inaccurate
and the room correction might be affected by appreciable errors in the
evaluation of the room transfer function.
Figure 6:
Comparison between the standard
proportional windowing curve and the new one based on the bilinear
transformation. Logarithmic time and frequency scale. The X axis is time
in milliseconds, the Y axis is frequency in Hz.
Starting with version 2.6.0 a new prefiltering curve based on the
bilinear transform has been introduced. This new curve provides a better
match with the typical resolution of the ear and also with the typical
behaviour of the listening room. The new windowing curve provides the
same exact results as the previous windowing curve when the window
exponent is set to 1.0, but provides a different behaviour when the
window exponent is changed, as showed in figure 6. The
closer approximation of the ear behaviour is clearly visible in figure
9, where it is shown that using an appropriate
configuration of the windowing parameters it is possible to get a close
to perfect match with the ERB psychoacoustic scale (see curves labeled
ERB and erb.drc, which overlap almost perfectly).
4.2.1 Pre-echo truncation
Starting from version 2.7.0 this step is implicitely disabled.
Considering that excess phase inversion is performed simply by time
reversal of the excess phase component, pre-echo is implicitely limited
by the frequency dependent windowing performed on the excess phase. The
description of this step and the code performing it has been retained
because it could be used for experimental reasons, especially
considering that the reequalization to flat performed in this step is
time reversed with respect to that performed in the excess phase
pre-processing. This meas that a minimum phase reequalization performed
in the pre-echo truncation is equivalent to a maximum phase
reequalization in the excess phase pre-processing, and the other way
around.
If enabled frequency dependent windowing is used also to truncate the
pre-echo caused by the inversion of the excess phase part of the impulse
response. The truncation procedure is always the same but it is
implicitely applied on the left side of the time-frequency plane instead
of the right side, because inversion of the excess phase corresponds to
a time reversal. A much shorter windowing is used because our ear is
quite sensitive to pre-echo. Windowing out part of the impulse response
of the excess phase component of the correction filter of course makes
it no longer an all-pass filter, i.e. the excess phase part no longer
has a flat magnitude response.
To compensate for this problem the excess phase component magnitude
response is equalized back to flat using a minimum phase filter before
inversion. This of course causes some post-ringing, which when inverted
might become pre-echo again. Even though this usually happens only on
few narrow bands, it might be quite audible. For this reason the excess
phase component correction is always limited to a tiny fraction of what
gets corrected for the minimum phase correction.
4.2.2 Psychoacoustic target computation
Starting from version 3.0.0 an optional stage, used to compute a target
frequency response based on the psychoacoustic perception of the
corrected frequency response, has been introduced. The target response
is based on the computation of the spectral envelope of the corrected
impulse response. This is performed before the application of the usual
target response, so that the standard target response is not compensated
back by this stage.
The spectral envelope is a concept which has been introduced in the
field of speech synthesis and analysis and is defined simply as a smooth
curve connecting or somewhat following the peaks of the magnitude
response of the signal spectrum. There are strong arguments and
experimental evidence supporting this approach and the idea that our ear
uses the spectral envelope for the recognition of sounds. The spectral
envelope, for example, allow our ear to understand speech under many
different conditions, whether it is voiced, whispered or generated by
other means. These different conditions generate completely different
spectrums but usually pretty similar spectral envelopes. The spectral
envelope also easily explains why our ear is more sensitive to peak in
the magnitude response and less sensitive to dips. A curve based on the
peaks of the magnitude response is by definition little or not affected
at all by dips in the frequency response.
In the speech recognition field many procedures have been developed to
compute the spectral envelope. Some of them are Linear Predictive Coding
(LPC), the Discrete Cepstrum, the so called “True Envelope” and
finally the Minimum Variance Distortionless Response (MVDR). Most of
these methods are optimized for speed, noise resilience and to provide
good results in the voice spectrum range sampled at low sample rates, so
they are not really suited for HiFi usage.
Within DRC a different procedure has been developed. This is a variation
of the usual fractional octave smoothing procedure, using the parametric
Hölder mean instead of the usual simple averaging. Furthermore the
smoothing has been extended to provide the Bark and ERB scales
resolution when applicable.
Figure 7:
Example of a spectral envelope. The
unsmoothed magnitude response of a typical room, corrected with a flat
target, is plotted. Superimposed there is the usual smoothing, computed
on the ERB scale, showing an essentially flat magnitude response, as
expected. The spectral envelope, computed on the ERB scale too using
standard parameters, show a rising slope which is in good agreement with
the inverse of one of the target responses suggested in the literature.
For the tuning of the parameters used in the spectral envelope
computation some typical real world room magnitude responses have been
taken. The computation parameters have been set so that the resulting
spectral envelope provides a target response as close as possible to the
inverse of the usual target responses suggested in the literature. An
example could be seen in picture 7. The same
parameters have been tested also in some not so common room to check
that they were still providing the expected results. Of course,
considering that now the basic target response is provided by the
inverse of the spectral envelope, the usual target responses are no
longer needed, apart from subsonic or ultrasonic filtering, and should
be set to flat. For this reason the flat response is now the default
target for all standard configuration files. The standard target
response stage should be used only to adjust the response to taste, but,
unlike previous versions, for a neutral reproduction a flat response
should be used.
From the subjective point of view a system equalized to the inverse of
the spectral envelope usually sound really neutral. Even though most of
the times the spectral envelope response is different among the various
channels, resulting in an obvious channel misalignement if evaluated
with the standard smoothing procedures, the imaging usually improves,
becoming more stable and focused. This appear to confirm that the
estimation performed by the spectral envelope has to be really close to
the subjective perception of the magnitude response.
4.2.3 Ringing truncation stage
Since version 2.5.0 a further frequency dependent windowing is applied
directly to the filter after impulse response inversion and peak
limiting. This is performed to remove any residual ringing caused by the
previous steps, especially the dip and peak limiting steps, even though
this implies some tradeoff on filter accuracy.
Figure 8:
Frequency dependent windowing jail for the
normal.drc sample settings on the time-frequency plane. Linear time
scale and logarithmic frequency scale. The X axis is time in
milliseconds, the Y axis is frequency in Hz.
With this further step the filter impulse response gets enclosed in a
sort of time-frequency jail defined by the pre-echo truncation settings
or the excess phase windowing settings on the left side of the
time-frequency plane and by the ringing truncation settings on the right
side (see figure 8). Considering that this time-frequency
bounds have also some psychoacoustic implications, with this
time-frequency enclosure DRC should be able to truncate automatically
any part of the correction that is probably going to cause audible
artifacts. Following this lines DRC gains at least a bit of
psychoacoustically based “self tuning” and should become more robust
and less prone to artifacts.
Figure 9:
Resolution bandwidth, as a function
of frequency, for the frequency dependent windowing and various standard
smoothing procedures, including the Bark and ERB psychoacoustic scales.
The X axis is frequency in KHz, the Y axis is frequency in Hz, both
plotted on a logarithmic scale. The windowing parameters of the
normal.drc and erb.drc sample settings files have been used to plot the
DRC resolution curves.
Applying the Gabor inequality to the window length between the two
curves of pre-echo and ringing truncation it is pretty easy to get an
equivalent frequency resolution, as a function of center frequency, of
the frequency dependent windowing procedure. This resolution could be
compared, as in figure 9, to some standard
smoothing procedures widely used within many audio applications, like
fractional octave smoothing and the classical Bark and ERB
psychoacoustic scales.
From figure 9 it is pretty clear that the
correction resolution used by DRC is well above that of any of the
standard smoothing procedures, at least with the normal.drc sample
settings file (see section 5.2). This means
that the correction should provide a perceived frequency response that
is really close to the configured target frequency response.
The “erb.drc” resolution plot show the approximation of the ERB scale
provided by the “erb.drc” sample settings file (see section
5.2. The approximation has been created
assuming:
Δ f Δ t = 2
instead of the usual Gabor inequality (see figure 5), i.e.
assuming that the frequency dependent windowing with the settings used
has a resolution that is about four times above the Gabor limit. This is
a rough estimate of the true resolution achieved by the DRC procedure in
this situation. This estimation has been derived considering the
compound effect of the various overlapped windows applied at various
stages of the filter generation procedure.
4.3 Impulse response measurement
Starting from version 2.4.1 two simple command lines tools, glsweep
(Generate Log Sweep) and lsconv (Log Sweep Convolution), are available
to perform accurate time aligned impulse response measurements. This
tools are based on the log sweep method for impulse response
measurement, which is one of the most accurate, especially for acoustic
measurements. This method is based on a special signal, which is a
logarithmic sinusoidal sweep, that need to be reproduced through the
system under test, and an inverse filter, which, when convolved with the
measured log sweep, gives back the impulse response of the system.
The steps needed to get the impulse response are the following:
- Generate the log sweep and inverse filter using glsweep,
optionally converting the sweep to a suitable format.
- Play the log sweep through your system using a soundcard while
recording the speaker output using a (hopefully good) microphone and any
recording program.
- Convolve the recorded log sweep with the inverse filter using
lsconv to get the final impulse response.
If full duplex is supported by the soundcard recording can be performed
using the same soundcard used for playing. Using two different
soundcards, or a CD Player, for reproduction and a soundcard for
recording, usually provides worse results unless they are accurately
time synchronized.
The lsconv tool allows also for the use of a secondary reference channel
to correct for the soundcard frequency response and for any time
misalignment caused by the soundcard itself, the soundcard drivers or
the play and recording programs. This soundcard compensation of course
works if the reference channel has the same behaviour of the measuring
channel. The typical use of this feature would be the use of one channel
of a stereo or multichannel soundcard to measure the system and another
one used in a loopback configuration to get the reference channel needed
to correct the soundcard itself.
With this configuration even with cheap soundcards it is pretty easy to
get a ± 0.1 dB frequency response over the audio frequency range
with near to perfect time alignment and phase response. Considering that
the log sweep method ensure by itself a strong noise rejection (90 dB of
S/N ratio is easily achievable even in not so quiet environments) and a
strong rejection to artifacts caused by the system non linear
distortions, with this method the final measurements usually have true
state of the art accuracy.
Finally an important warning: playing the log sweep signal at an
excessive level can easily damage your speakers, especially the
tweeters. So be really careful when playing such a signal through your
equipment. The “Jones Rush” guide (see section 1)
provides some useful hints to help you in the use of this kind of tools
without the risk of damaging your equipment. No responsibility is taken
for any damage to your equipment, everything is at your own risk.
4.3.1 The glsweep program
When executed without parameters the glsweep program gives the following
output:
GLSweep 1.0.2: log sweep and inverse filter generation.
Copyright (C) 2002-2005 Denis Sbragion
Compiled with single precision arithmetic.
This program may be freely redistributed under the terms of
the GNU GPL and is provided to you as is, without any warranty
of any kind. Please read the file "COPYING" for details.
Usage: glsweep rate amplitude hzstart hzend duration silence
leadin leadout sweepfile inversefile
Parameters:
rate: reference sample rate
amplitude: sweep amplitude
hzstart: sweep start frequency
hzend: sweep end frequency
duration: sweep duration in seconds
silence: leading and trailing silence duration in seconds
leadin: leading window length as a fraction of duration
leadout: trailing window length as a fraction of duration
sweepfile: sweep file name
inversefile: inverse sweep file name
Example: glsweep 44100 0.5 10 21000 45 2 0.05 0.005 sweep.pcm inverse.pcm
with some brief explanation of the generation parameters and some sample
options.
The longer the log sweep used the stronger the noise rejection of the
measure. A 45 seconds log sweep usually gives more than 90 dB of signal
to noise ratio in the final impulse response even when used in somewhat
noisy environments, for example one where the computer used to do the
measure is in the same room of the measured system, producing all of its
fan noise. The output format is the usual raw file with 32 bit IEEE
floating point samples. If you need to convert the sweep generated using
the example above to a 16 bit mono WAV file you can use SoX with a
command line like this:
sox -t raw -r 44100 -c 1 -fl sweep.pcm -t wav -c 1 -sw sweep.wav
SoX, can be downloaded at:
http://sox.sourceforge.net/
If you want to create a stereo WAV file to get also the reference
channel you can use something like:
sox -t raw -r 44100 -c 1 -fl sweep.pcm -t wav -c 2 -sw sweep.wav
The inverse filter doesn't need to be converted to a WAV file because
lsconv is already able to read it as is. If you need to convert a
recorded sweep stored in a wav file back to a raw 32 bit floating point
format use this:
sox recorded.wav -t raw -c 1 -fl recorded.pcm
If you have a stereo WAV file with both the measurement channel and the
reference channel you can extract them into two files using:
sox recorded.wav -t raw -c 1 -fl recorded.pcm avg -l
sox recorded.wav -t raw -c 1 -fl reference.pcm avg -r
provided that the recorded channel is the left one (“avg -l”
parameter) and the reference channel is the right one (“avg -r”
parameter). You need at least SoX 12.17.7 for this to work as expected,
previous versions had some bugs with this feature.
4.3.2 The lsconv program
When executed without parameters the lsconv program gives the following
output:
LSConv 1.0.3: log sweep and inverse filter convolution.
Copyright (C) 2002-2005 Denis Sbragion
Compiled with single precision arithmetic.
This program may be freely redistributed under the terms of
the GNU GPL and is provided to you as is, without any warranty
of any kind. Please read the file "COPYING" for details.
Usage: LSConv sweepfile inversefile outfile [refsweep mingain [dlstart]]
Parameters:
sweepfile: sweep file name
inversefile: inverse sweep file name
outfile: output impulse response file
refsweep: reference channel sweep file name
mingain: min gain for reference channel inversion
dlstart: dip limiting start for reference channel inversion
Example: lsconv sweep.pcm inverse.pcm impulse.pcm refchannel.pcm 0.1 0.8
All files must be in the usual raw 32 bit floating point format. To get
the impulse response without the use of a reference channel just use
something like:
lsconv recorded.pcm inverse.pcm impulse.pcm
Where “recorded.pcm” is the recorded sweep, “inverse.pcm” is the
inverse filter generated by glsweep and “impulse.pcm” is the output
impulse response ready to be fed to DRC.
If you also want to use the reference channel use something like:
lsconv recorded.pcm inverse.pcm impulse.pcm reference.pcm 0.1
The “0.1” value is the minimum allowed gain for the reference channel
inversion. 0.1 is the same as -20 dB, i.e. no more then 20 dB of the
reference channel frequency response will be corrected. This is needed
also to prevent numerical instabilities caused by the strong cut off
provided by the soundcard DAC and ADC brick wall filters.
When used with the reference channel the main spike of the impulse
response is always at exactly the same length of the log sweep used,
provided that the two soundcard channels are perfectly synchronized. Of
course this is usually true for all soundcards.
For example if a 10 seconds sweep is used the main spike will be exactly
at 10 seconds from the beginning of the output impulse response, i.e. at
sample 441000 if a 44.1 KHz sample rate is used.
If the main spike is at a different position it means that there's some
delay in the measurement channel, usually caused by the time the sound
takes to travel from the speaker to the microphone. If this delay is
different for different channels it means that there's a time
misalignment between channels that needs to be corrected. Up to a
limited amount, and using some small tricks, DRC is already able to
compensate for interchannel delays (see section
4.5.5). Some future DRC release will include
better support for interchannel time alignment.
4.3.3 Sample automated script file
Under the “source/contrib/Measure” directory of DRC there's a sample
Linux shell script, called “measure”, that uses glsweep, lsconv, SoX
and standard ALSA play and recording tools to automate the time aligned
measurement procedure using a reference channel. This sample script can
be used only under Linux and it is just a quick hack to allow expert
users to automate the whole procedure. Use it at your own risk.
Furthermore this script need a recent version of SoX (12.17.7 or newer)
and all related tools directly executable from the work directory, else
it doesn't work. When executed without parameters the script gives the
following output:
Automatic measuring script.
Copyright (C) 2002-2005 Denis Sbragion
This program may be freely redistributed under the terms of
the GNU GPL and is provided to you as is, without any warranty
of any kind. Please read the file COPYING for details.
Usage:
measure bits rate startf endf lslen lssil indev outdev impfile [sweepfile]
bits: measuring bits (16 or 24)
rate: sample rate
startf: sweep start frequency in Hz
endf: sweep end frequency in Hz
lslen: log sweep length in seconds
lssil: log sweep silence length in seconds
indev: ALSA input device
outdev: ALSA output device
impfile: impulse response output file
sweepfile: optional wav file name for the recorded sweep
example: measure 16 44100 5 21000 45 2 plughw plughw impulse.pcm
This script assumes that the measuring channel is on the left channel
and that the reference channel is the right one. To use it just take a
look at the sample command line provided above. You have to provide
proper ALSA input and output devices, but “plughw” usually works with
most soundcards.
Using 24 bits of resolution to measure an impulse response is usually
just a waste of resources. In most rooms getting a recorded sweep with
more than 60 dB of S/N ratio is close to impossible, so 16 bits of
resolution are already plain overkill. On the other hand, thanks to the
strong noise rejection provided by the log sweep method, a sweep S/N
ratio of 60 dB is already high enough to get more than 90 dB of S/N
ratio in the recovered impulse response, at least with a 45 s sweep
running at a 44.1 KHz sample rate.
The impulse response is what DRC works on, so it is the impulse response
that needs an high S/N ratio, not the sweep. If you really want a better
impulse response S/N ratio, or if you measure in a noisy environment,
increase the sweep length instead of using 24 bits of resolution. A
longer sweep will improve the S/N ratio of the impulse response,
increasing the resolution instead will provide no benefit at all.
Chris Birkinshaw created a modified version of the measure script which
adds Jack support. The script is named “measurejack” and you can find
it under the “source/contrib/MeasureJack” directory of the standard
distribution. For informations about Jack take a look at:
http://jackit.sourceforge.net/
Finally Ed Wildgoose created a simple program with about the same
functionality of the measure script. It works also under Windows and
being written in C instead of being a simple shell script it is less
dependent on other tools and usually provides a more reliable
functionality. You can download it from:
4.3.4 Beware cheap, resampling, soundcards
Most cheap game oriented soundcards often include a sample rate
converter in their design, so that input streams running at different
sample rates can be played together by resampling them at the maximum
sample rate supported by the soundcard DAC. Usually this is 48 KHz as
defined by the AC97 standard. These sample rate converters often are of
abysmal quality, causing all sort of aliasing artifacts.
Most deconvolution based impulse response measurement methods, including
the log sweep method, are quite robust and noise insensitive, but cause
all sorts of artifacts when non harmonic but still signal related
distortion is introduced, even at quite low levels. The aliasing
artifacts introduced by low quality sample rate converters are exactly
of this kind and are one of the most common cause of poor quality
impulse response measurements and consequently of correction artifacts.
4.3.5 How to work around your cheap, resampling, soundcard
Despite this, most of the times good measurements are possible even out
of cheap soundcards if the maximum sample rate supported by the DAC is
used, usually 48 KHz, so that the soundcard internal sample rate
converter isn't used at all. You can change the impulse response sample
rate after the measurement using high quality software sample rate
conversion algorithms (see section 4.4), so
preserving the impulse response quality.
To check the quality of the impulse response measurement perform a
loopback measurement without using a reference channel, else any
measurement problem will be washed out by the reference channel
compensation. The impulse response you get must be a single clean spike
much similar to that of a CD Player (see for example the upper graph of
picture 96, labeled “Dirac
delta”). A bit of ringing before and/or after the main spike is normal,
but anything else is just an artifact. Only after you are sure that the
measurement chain is working as expected, open the loopback and do the
real measurement, eventually adding also a reference channel to
compensate for any remaining soundcard anomaly.
4.4 Sample rate conversion
If you have the impulse response sampled at a different rate than the
one needed for the final filter, you need to convert the sample rate
before creating or applying the filters. For example you might have a 48
KHz impulse response but you may need to filter standard CD output at
44.1 KHz. In this situation you can either convert the impulse response
to 44.1 KHz before feeding it to DRC or you can convert the resulting
filters to 44.1 KHz after DRC has created them. I generally prefer the
first procedure, which leads to exact filter lengths in the DRC final
windowing stage, but in both cases you need a good quality sample rate
converter, which uses, for example, band limited interpolation. A reasonable
choice, free both under Linux and Win32, is SoX, which may be downloaded
at:
http://sox.sourceforge.net/
SoX also provides a lot of other features for sound files manipulation.
For a reference on band limited interpolation take a look at:
http://ccrma-www.stanford.edu/~jos/resample/
Another free good sample rate converter comes from the shibatch audio
tools suite. This sample rate converter provides a quality which is
adequate for the task of converting the impulse response file before
feeding it to DRC. You can find the shibatch audio tools at it at:
http://shibatch.sourceforge.net/
4.5 Correction tuning
Proper tuning of the correction filter generation procedure easily
provides a substantial improvement over the standard sample
configuration files provided along with DRC (see section
5.2). To properly tune the filters to closely
match your room behaviour there are many different issues that should be
taken into account.
4.5.1 Preventing pre-echo artifacts
One of the main problems in digital room correction are pre-echo
artifacts that arise when compensation accuracy is pushed above a
certain threshold. This pre-echo artifacts usually occur on narrow bands
and are easily audible as a sort of ringing or garble before transients
or sharp attacks. In order to avoid them there are basicly two options:
- Reduce the correction on critical frequency regions where pre-echo
artifacts may arise.
- Use a minimum phase approach to avoid pre-echoes. This way you get
increased ringing after the main spike instead of pre-echo, but this is
usually masked both by our ear temporal masking and by the reverberant
nature of common listening rooms, so it is much less audible, if audible
at all.
DRC uses both options in different steps of the correction procedure. So
in order to avoid pre-echo artifacts you basically have to:
-
Reduce the amount of correction applied to the excess phase
component by reducing the size of the frequency dependent window
applied. With the standard configuration files this is usually
everything that need to be done, because all the other procedures are
already configured to avoid pre-echo problems.
- Use a long enough FFT where circular convolution is involved
(basicly homomorphic deconvolution and pre-echo truncation inversion),
because circular artifacts may easily become pre-echo.
- Use the single side sliding lowpass prefiltering procedure; this
is just because of small numerical errors in band windowing that causes
small amounts of pre-echo on band edges.
- Use the minimum phase versions for some of the accompanying
procedures (peak and dip limiting for example)
- Use the pre-echo truncation fast deconvolution for the inversion
procedure, with appropriate pre-echo truncation parameters. By the way,
if the excess phase component windowing parameters are already set
appropriately, this should not be needed.
The sample configuration files supplied are a good example of all these
options combined together. In normal situations you can use them as they
are changing only EPLowerWindow, EPWindowExponent, MPLowerWindow and
MPWindowExponent to fit your needs.
4.5.2 Preventing clipping
One of the problems of real time correction is the prevention of DAC
clipping caused by the filter intrinsic amplification. First of all the
normalization factor to be used (see sections 6.11.9 and
6.12.9) depends on the convolver used. Some convolvers want
the filter normalized to 16 bit range, i.e. 32768, others want a
standard normalization, i.e. normalization to ± 1.0. For example
BruteFIR needs a filter normalized to 1.0 to get 0 dB amplification
between input and output.
All the normalization steps used within DRC, included those needed to
output the final filter (see sections 6.11.10 and
6.12.10), accept three types of normalization:
-
S, i.e. sum normalization, also called L1 norm
- E, i.e. euclidean normalization, also called L2 norm
- M, i.e. Max normalization, also called L∞ norm
For a detailed description of the three types of normalization see
section 6.1.11.
The S normalization guarantees against overflows in the output stream,
i.e. it guarantees that if any input sample is never greater than X than
any output sample is never greater than X multiplied by the
normalization factor. This means also that if the normalization factor
is 1 and the input sample is never greater than 32767 (i.e. the input is
a 16 bit stream) the output is never greater than 32767, i.e. a 16 bit
DAC on output will never clip or overflow.
Anyway, using common musical signals, and depending on the filter
frequency response, the use of the S normalization might lead to filters
with a global gain substantially lower than 1 (0 dB), i.e. filters with
a typical output level which is lower, sometimes much lower, than the
input level. With such low levels part of the resolution of the used DAC
gets lost. With normal musical signals it is usually safe to use a
filter with an S normalization factor greater than 1, because,
considering the typical frequency response of a room, and the
corresponding reversed frequency response of the filter, overflows would
occur in frequency ranges where typically there is not enough musical
signal to cause it.
If you use BruteFIR it is advisable to use 1 for the PLNormFactor and S
for PLNormType and then use the rescaling and monitoring features of
BruteFIR to boost the gain up to few dB below overflow with typical
musical signal. Try using a 0 dB white noise source as a sort of worst
case situation. Furthermore, be careful setting the basic filter gain: I
found that many recent musical recordings, especially compressed and
rescaled pop music productions, cause output levels that are just 1 or 2
dB below the white noise worst case scenario. The degradation in the
sound quality caused by DAC clipping is typically much more audible than
the degradation you get loosing a single bit or less of your DAC
resolution, especially if you use 16 bit DACs with dithering or 24 bits
DACs.
If you're unable to perform tests using 0 dB white noise a simple rule
of thumb is to use the E normalization with a normalization factor which
is a couple of dB lower than the maximum gain allowed during peak
limiting. With the standard configuration files, where the maximum
allowed gain is never greater than about 6 dB, this means using a
normalization factor around 0.3 − 0.4 with convolvers which use 1.0 as
the 0 dB reference level like BruteFIR, or using something like 10000 -
13000 with convolvers which use 32768 as the 0 dB reference level.
4.5.3 Some notes about loudspeaker placement
As most audiophiles already know, in a basic stereo loudspeaker
configuration it is important that the distance between the loudspeakers
and the listening position is exactly the same for both loudspeakers,
and also not too much different from the distance between the two
loudspeakers themselves (the classical equilateral triangle placement).
If this rule isn't satisfied usually the stereo image become distorted
and confused. With digital room correction enabled this rule becomes of
paramount importance.
DRC doesn't automatically compensate for delays caused by loudspeaker
misplacement and having the two channel with near to perfect direct
sound, both in phase and magnitude, makes any difference in the arrival
time immediately and clearly audible, with a nasty phasey sound and a
blurred stereo image. Less than 10 cm are enough to cause clearly
audible problems, so take your time to measure the distance from both
loudspeakers and the listening position before doing any measure, and
also do your measures by placing the microphone exactly at the listening
position.
Furthermore, with digital room correction it is worth to experiment with
unusual speaker placements. Reflections from nearby walls are more
difficult to correct when they are away from the main spike, so placing
the speakers near to the walls, or may be even in the corners, might
sometime give better results with DRC, provided that you place some
absorbing material near the speakers to remove early reflections in the
high frequency range, where DRC is able to correct only the direct
sound.
This type of placement is exactly the opposite of what is usually done
if you don't use digital room correction, where it is usually better to
try to put loudspeakers away from the walls to avoid early reflections,
that cause major problems to the sound reproduction and almost always
boomy bass. Anyway, remember that there is no ready to use recipe to
find the best speaker placement, even with DRC in use, so a bit of
experimenting is always needed.
4.5.4 Some notes about channel balance
DRC doesn't compensate for channel level imbalance, so this should be
done manually after correction changing a little the filters level until
a perfect balance is achieved. This is of course better achieved using
an SPL meter with pink noise and proper weighting. Anyway after
correction the two channels start having a frequency response that is
pretty much the same, so achieving perfect balance becomes pretty easy
even by ear. Just use a mono male, or, better, female voice, and adjust
the filters level until the voice comes exactly from the center of both
loudspeakers.
To achieve a perfect balance you can also use the level hints provided
by DRC at the beginning and at the end of the correction procedure,
provided that the measured impulse responses have levels that are
directly related to the original levels of the channels, i.e. these
levels haven't been changed by the measuring procedure itself.
4.5.5 Interchannel time alignment
First of all the current DRC release is able to compensate for
interchannel misalignment of only few samples, no more than ± 8
with the default configuration files. Furthermore accurate time aligned
measurements must be supplied, using either the glsweep and lsconv tools
with a reference channel or some other tool providing the same degree of
accuracy.
To get this limited time alignment you have to execute the following
steps:
- Execute DRC on one channel as usual. At the beginning of the DRC
output on screen you will see a line like this one:
Impulse center found at sample 1367280.
Take note of the impulse center value.
- After DRC has finished prepare the configuration files for the
other channels as usual but change the BCImpulseCenterMode parameter to
M and the BCImpulseCenter parameter from 0 to the value of the impulse
center noted before for the first channel. This way DRC will use the
value of the impulse center of the first channel as a reference for the
other channels and will compensate for any misalignment with respect to
the first channel. If channels are misaligned more than few samples this
will cause errors in the correction filters, usually causing a rising
frequency response, and so a bright sound.
4.5.6 How to tune the filters for your audio system
A proper tuning of the filters for your audio system and your listening
room easily provides a substantial improvement over the standard
configuration files. The best way to do this is to use the correction
simulation provided by DRC and to check the results using the Octave
scripts supplied with the documentation (see section
A), but if you have little experience with
measurements interpretation you can also try to tune the correction by
simple listening to the results, even though it isn't an easy task.
One of the most common mistakes performed in the tuning procedure is the
use of an excessive correction, which initially gives the impression of
a good result, but cause also the appearance of subtle correction
artifacts that becomes audible only with some specific musical tracks.
These artifacts often have a peculiar resonant behaviour so they become
audible only when they get excited by specific signals. To learn how to
recognize them try using the ”insane.drc” sample configuration file,
which applies an overly excessive amount of correction, causing clearly
audible artifacts on all but the most damped rooms.
The best procedure to use is to start from the minimal amount of
correction, like that provided by the minimal.drc or erb.drc correction
settings. If your impulse response measurements are of good quality
these minimalistic correction settings should already provide a
substantial improvement over the uncorrected system, without any
perceivable artifact. If this doesn't happen it's better to first double
check the measurements performed before fiddling with the correction
parameters. Remember that measurements problems are the most common
cause of unsatisfactory correction results.
After this first test you can slowly switch to stronger correction
settings using the soft.drc, normal.drc, strong.drc and extreme.drc
settings, always listening to the results after each step, if possible
using quick switching between the filters. When correction artifacts
start to arise, which usually happens between the normal.drc settings
and the extreme.drc settings, it's time to stop and to start playing
with some specific correction parameters.
The first parameters to modify are those that define the windowing
correction curve applied to the signal, i.e MPWindowExponent,
EPWindowExponent, ISPEWindowExponent and RTWindowExponent, slowly
reducing them to 0.95, 0.9, 0.85 and so on, down to about 0.7, thus
reducing the correction in the critical mid and mid-bass range. These
are really sensitive parameters, so changing them by as little as 0.01
easily cause an audible difference, especially when you are close to the
boundary where correction artifacts start to appear. When the artifacts
disappear you can start increasing the windows applied to the bass
range, slowly increasing, by about a 5% at a time, the MPLowerWindow,
EPLowerWindow, ISPELowerWindow and RTLowerWindow parameters, until
artifacts start to appear again. After that you can decrease again the
window exponent parameters until artifacts disappear again, and so on.
This procedure may be repeated until there's no further improvement or
the parameters reach an excessive value, i.e below about 0.6 for the
window exponents, above 1 second for the minimum phase and ringing
truncation windowing parameters (MPLowerWindow, RTLowerWindow) and above
100 ms for the excess phase windowing parameter (EPLowerWindow).
Remember also to set the pre-echo truncation parameter (ISPELowerWindow,
ISPEUpperWindow) according to the excess phase windowing parameters (see
sections 6.7.4 and 6.7.5).
Of course the tuning procedure has to be carefully adapted to your
specific room, so, after a good tuning has been reached following the
basic procedure, you can further try playing a little with the available
parameters, applying even different values to each of them, proceeding
one at a time to avoid confusion. By the way, be careful, because after
the initial tuning the differences between the filters will start to be
quite subtle, most of the times will be barely audible, and quick
switching between the filters, possibly even under blind conditions,
will become almost mandatory to really understand what's happening and
which filter is better or at least audibly different.
5 Program compilation and execution
DRC can be compiled either under Win32 or Linux, but because of its
simplicity it will probably work under most operating system with a
decent C++ compiler with support for the standard template library
(STL). The Win32 executable (drc.exe) is provided precompiled with the
standard DRC distribution under the sample directory, where there are
also the executables for the impulse response measuring tools
(glsweep.exe and lsconv.exe).
A Makefile is provided for Linux and other Unixes, but it has been
tested only under Fedora Core 7. To build the program under Linux
usually what you have to do is just type “make” in the source
directory of DRC, where the makefile resides. A Code::Blocks workspace
is also available for use both under Win32 and Linux. Code::Blocks can
be downloaded at:
http://www.codeblocks.org/
The file drc.h contains a configurable define (UseDouble) which can be
set to use double or float as the data type used for all internal
computations. Despite some microscopic differences in the final output,
I have never found any real advantage using doubles as the internal
basic type.
During the testing for the 2.5.0 release I have performed some tests to
check the signal to noise ratio of the output filters. Despite the
amount of processing performed and the fact that little effort has been
placed into keeping the maximum accuracy throughout the processing, even
using single precision arithmetic the signal to noise ratio of the final
filter resulted to be greater than 145 dB in the worst case. This is
more than 20 dB better than the signal to noise ratio provided by the
best DACs available in the world.
Considering this results the supplied Win32 executable is compiled for
single precision. If you want to switch to the double precision you have
to recompile it yourself. Of course using the double data type makes DRC
a bit slower and, most important, much more memory intensive.
Starting from version 2.4.0 DRC is able to use the Takuya Ooura and GNU
Scientific Library FFT routines, which are included in the distributed
package. The inclusion of these routines is controlled by the UseGSLFft
and UseOouraFft defines in drc.h. These routines are about 10 times
faster than the standard routines used by DRC, but to use them with the
STL complex data type a clumsy hack has been used, and it is not
guaranteed that this hack will work with all the STL implementations
available. If it causes any problem simply comment out the UseGSLFft and
UseOouraFft defines in drc.h and recompile the program. This will force
DRC to use the older, slower but STL compliant, FFT routines.
The accuracy of the different FFT routines is pretty much the same. The
Ooura routines work only on powers of two lengths, so they are used only
on power of two lengths computations. Ooura FFT routines are somewhat
faster than the GSL routines but are also a little bit less accurate.
The default configuration uses only the GSL FFT routines, providing the
best compromise between speed and accuracy. The Ooura FFT routines
become useful when DRC is compiled for double precision arithmetic.
Most text files supplied with the standard distribution use Unix line
termination (LF instead of CR/LF). Be aware of this when opening files
under Win32 systems. WordPad is able to open LF terminated text files,
NotePad isn't.
Finally an important note, especially for Win32 users. DRC is a console
program, it has no graphical interface. All program execution parameters
must reside in a plain ASCII text file which is supplied as an argument
on the program command line. In order to execute the program you have to
open a command prompt (or DOS Prompt or whatever is named a console in
your version of Windows) and type something like:
drc test.drc
followed by a carriage return (enter or return key). Test.drc should be
the text file already prepared with all the parameters needed to run
DRC.
Under Linux of course you have to use a console program (the Linux
console, a terminal emulator like XTerm or something like this if you are
using XWindows). The DRC executable must be in the system path or in the
directory where you execute.
5.1 Command line parameters replacing
Starting from version 2.6.2 all the parameters available in the
configuration file may be replaced by an equivalent parameter on the
command line. For example if you want just to change the input and
output filter files of the normal.drc sample configuration file you
may use a command like this:
drc --BCInFile=myfile.pcm --PSOutFile=myfilter.pcm normal.drc
The parameter parsing procedure support also quoting of filenames with
spaces and setting of strings to empty values, which is the same as
commenting a parameter in the configuration file. For example to use
some filename with spaces, to disable the output of the test convolution
file, to change the maximum allowed gain, all in a custom configuration
file with spaces in its name, you could use a command line like this:
drc --BCInFile="my file.pcm" --PSOutFile="my filter.pcm"
--TCOutFile="" --PLMaxGain=3.5 "my custom config.drc"
Along with all the default configuration parameters there is also a
special “–help” parameter that show the full list of all the
available parameters with the associated parameter type. The list of the
parameters is really long, so some sort of pager is needed to see it all.
5.2 Sample configuration files
DRC has started as an experimental program and because of this it has a
lot of tunable parameters, actually more than 150. Only few of them are
really important for the final filter correction quality. Most of them
are used to take a look at intermediate results and check that
everything is working as expected. DRC flexibility might of course be
used also to deal with complex or unusual situations or to experiment
with weird configurations.
Along with the DRC distribution six main sample configuration files are
provided: minimal-XX.X.drc, soft-XX.X.drc, normal-XX.X.drc,
strong-XX.X.drc, extreme-XX.X.drc, insane-XX.X.drc. The ”XX.X' in the
file name stands for the sample rate in KHz which the file is configured
for. For example ”normal-44.1.drc” is the normal configuration file
for the 44.1 KHz sample rate. Considering that the only difference
between the files is the base sample rate they are configured for, all files
are named omitting the sample rate part in the rest of this document.
The sample configuration files provide most parameters set to reasonable
defaults, with stronger correction, but also worse listening position
sensitivity, going from the minimal.drc settings to the extreme.drc
settings. In the same directory of the configuration file there is also
a sample impulse response (rs.pcm, this is the impulse response of the
right channel of my cheap HiFi system, in 32 bit IEEE raw format) usable
with the sample configuration files to see just what happens when DRC is
run.
The insane correction settings are not meant for normal use but are used
just to provide an example of excessive correction that is going for
sure to cause audible correction artifacts. Using these settings file you
can easily check how correction artifacts actually sound like, thus
learning to identify them within normal filters while you are tuning
them for your audio system (see section 4.5.6).
Starting form version 2.7.0 all sample configuration files are available
for 44.1 KHz, 48 KHz, 88.2 KHz, 96 KHz sample rates. By the way you
should be careful with sample rates above 44.1 KHz because most of this
sample files have been derived from the 44.1 KHz version without testing
them.
The sample configuration files for the higher sample rates aren't in the
sample directory. To avoid placing a lot of similar files in the same
directory the files for the higher sample rates are placed in the
”src/config” directory.
Remember also that all the sample correction files output the correction
filter (rps.pcm) in 32 bit floating point format normalized to 1.0,
which is the format suited for use with BruteFIR. Most sound editors
expect 16 bits integer files normalized to 32768, so the file above
might look either empty or completely clipped when opened with a sound
editor without using the appropriate options.
5.2.1 Target magnitude response
Starting from version 3.0.0 the basic target magnitude response is
automatically generated by DRC using a specific procedure based on some
documented psychoacoustic assumptions. See section
4.2.2 for the details. Because of this
there should be no need to define a specific target curve and a flat
target, with just some limiting for subsonic and ultrasonic frequencies,
should be used instead. This is accomplished by using the pa-XX.X.txt
target, which is now the default for all standard configuration files
and is just a variation of the previous flat target adjusted to better
work with the “B Spline” target transfer function interpolation
procedure. The old target responses are retained for those situations
where the old approach may be preferable. Of course the postfiltering
stage might be used for adjusting the magnitude response to taste, but
for a neutral reproduction the target magnitude response should be left
to the flat one.
Figure 10:
Comparison of the main target
functions provided along with DRC.
The most important postfiltering target magnitude response files
supplied in the standard distribution (see section 6.11.7
“PSPointsFile” for details) are subultra-XX.X.txt, bk-XX.X.txt,
bk-2-XX.X.txt, bk-3-XX.X.txt (see figure 10).
Here again ”XX.X” stands for the sample rate used and is omitted in
the rest of this document. The target response files for the higher
sample rates are available in the ”src/target” directory.
The first target response file provides just simple removal of
overcompensation on the extremes of the frequency range and has a linear
target frequency response, so it hasn't been plotted in figure
10. The bk.txt file follows the Bruel & Kjaer
(i.e. Moeller) recommendations for listening room frequency
response, i.e. linear from 20 Hz to 400 Hz, followed by slow decrease
by 1 dB per octave up to 20 KHz. The bk-2.txt file is similar to bk.txt
but it is linear up to 200 Hz and then provides a slow tilt of 0.5 dB
per octave up to 20 KHz. The bk-3.txt file is somewhat between bk-2.txt
and bk.txt, with a 0.5 dB per octave tilt above 100 Hz. The versions
with the “sub” suffix are the same target functions with the addition
of a steep subsonic filter. The versions with the “spline” suffix are
again the same target transfer functions but with a set of control
points suitable for the “B Spline” target transfer function
interpolation. Figure 10 also show an example
of PCHIP interpolation of the "bk-3-sub" target function. In the sample
directory there are also some other simple postfiltering files.
Like the sample configuration files starting from version 2.7.0 the
sample target responses are available for 44.1 KHz, 48 KHz, 88.2 KHz and
96 KHz sample rates and they must be used with the corresponding set of
configuration files. By the way the target response files for higher
sample rates are simply extended versions of the 44.1 KHz target
responses created by simply moving the last frequency point up to the
Nyquist frequency. This means that for most configuration files there is
either a gentle roll-off at higher frequency or a supersonic brickwall
filter applied above 20 KHz. If you want to properly correct content
above 20 KHz, provided that you have a microphone capable of recording
ultrasonic frequencies, you have to adapt the supplied files to your
needs.
In the same sample directory you can find another DRC sample settings
file called optimized.drc. This is the configuration file I'm actually
using with my own HiFi system. The optimized.drc configuration file is
somewhat a balanced mix between the strong and extreme settings files,
with even stronger correction in the bass range below 200 Hz, and has
been strongly optimized for my HiFi system, so it is of much less
general use than the other configuration files above. Anyway, it may be
worth a try.
The first important difference when compared to the other sample
configuration files is the use of a modified bk-3.txt target frequency
response called bk-3-subultra-spline.txt. This is the same as the
bk-3.txt target frequency response but with the addition of a steep
subsonic filter below 18 Hz, with a filtering slope around 200 dB/Oct.
This avoids unwanted non linear distortion on subsonic signals, caused
by overexcursion of my vented subwoofer, which is unable to reproduce
such subsonic signals.
This target response includes also a strong brickwall filter above 20
KHz to remove any unwanted aliasing artifact caused by the now common
use of oversampling DACs with brickwall reconstruction filters with a
stopband starting above 0.5 FS. To work as expected this kind of filters
rely on the presence of a perfect 20 KHz brickwall filter on the ADC
used for the recording. This is often no longer true with the diffusion
of higher sampling rate recording techniques, typically running at 96
KHz, which are often downsampled to 44.1 KHz using software sample rate
converters having often a cutoff frequency really close to 0.5 FS, i.e.
really close to 22.05 KHz instead of 20 KHz. Considering the high
frequencies involved I doubt this is going to cause any audible
difference even in the worst case scenarios, but considering that the
added brickwall filtering comes for free with the DRC filters I prefer
to include it anyway.
To avoid unwanted phase and group delay distortion around the subsonic
filter cut-off frequency, which was causing a perceived “dry” bass
effect, the target frequency response computation has been switched to
linear phase (see section 6.11.1 “PSFilterType” for
details). This avoids any phase distortion but has the unwanted side
effect of generating an output filter with an intrinsic delay of about
half the filter length, i.e. around 740 ms with the configuration used.
This makes this configuration completely useless for applications where
low latency is important, like home theater applications.
Furthermore the excess phase windowing parameters (see sections
6.5.4 “EPLowerWindow” and 6.5.8
“EPWindowExponent”) have been changed to allow for a bit more of
pre-echo in the bass range. This gives a stronger phase correction up to
the upper bass, further improving the bass and mid bass. This, of
course, also yields a worse listening position sensitivity, but being
limited to frequencies below the lower midrange doesn't seem to cause
any unwanted side effect, at least in my room.
All these changes together provided in my system a substantial
improvement with respect to the standard configuration files, with a
more controlled bass and a bit more transparent midrange. The midrange
improvement is probably more a consequence of the reduction of the non
linear distortion in the bass and midbass caused by the subsonic
filtering than the effect of the crossover group delay compensation, but
unfortunately I don't own the measurement instrumentation needed to
verify this hypothesis.
Finally another interesting sample configuration is the one provided by
the “erb.drc” file. This file provides an accurate approximation of
the ERB psychoacoustic scale (see figure 9). It
is important to notice that basicly the correction isn't much stronger
than the “minimal.drc” sample configuration, but being approximately
tuned to our ear psychoacoustic resolution it is probably going to
provide a good perceived correction accuracy with minimal listening
position sensitivity, and so it is well suited for multiple users
utilization, like home theater applications.
6 DRC Configuration file reference
The DRC configuration file is a simple ASCII file with parameters in the
form:
ParamName = value
Everything after a '#' and blank lines are considered comments and
are ignored. Each parameter has a two character prefix which defines the
step the parameter refers to. These prefixes are:
-
BC = Base Configuration
- HD = Homomorphic Deconvolution
- MP = Minimum phase Prefiltering stage
- DL = Dip Limiting stage
- EP = Excess phase Prefiltering stage
- PC = Prefiltering Completion stage
- IS = Inversion Stage
- PT = Psychoacoustic Target
- PL = Peak Limiting
- RT = Ringing Truncation stage
- PS = Postfiltering Stage
- MC = Microphone Compensation stage
- MS = Minimum phase filter generation Stage
- TC = Test Convolution stage
DRC does some checks to ensure that each parameter provided has a value
that makes sense, but it isn't bulletproof at all with respect to this.
Providing invalid or incorrect parameters may cause it to fail, or even
to crash.
Parameters which are important for the quality of the generated filters
are marked with (*). When it makes sense a reasonable value or range of
values is also provided.
Many parameters have often a value which is a power of two. This is
mainly for performance reasons. Many steps require one or more FFT
computations, which are usually much faster with arrays whose length is
a power of two. The default values supplied are defined for a 44.1 KHz
sample rate. If a different sample rate is used the supplied values
should be scaled accordingly.
Now let's take a look to each parameter.
6.1 BC - Base Configuration
This parameter define the base directory that is prepended to all file
parameters, like for example BCInFile, HDMPOutFile or PSPointsFile. This
parameter allow the implicit definition of a library directory where all
DRC support file might be placed.
File parameters supplied on the command line are not affected by this
parameter unless the BCBaseDir parameter is also supplied on the command
line. File parameters supplied in the configuration file are instead
always affected by the BCBaseDir parameter, no matter if it has been
supplied in the configuration file or on the command line.
Just the name of the input file with the input room impulse response.
6.1.3 BCInFileType
The type of the input file. D = Double, F = Float, I = Integer.
6.1.4 BCSampleRate
The sample rate of the input file. Usually 44100 or 48000.