
Listener Response – On MP3s

February 9th, 2010

I received this response from one of our listeners to our most recent podcast on MP3s. Thank you, Harold from Austria, for your insightful email.

As a loyal listener as well as a professional audio engineer, I feel the need to respond to your podcast about MP3 in general and the idea of “hearing the difference” between uncompressed and MP3 audio.

Short story: Although I have no doubt that the fellow studio guy did a great job of producing these files, it proves NOTHING, and your conclusion that 'this is what MP3 is missing' is highly misleading!

Slightly longer story:
In all lossy compression algorithms the question arises: what is left out? It is NOT frequency range, and it is almost never dynamic range either, once you go beyond 128 kbps with MP3, for example. So what is left out? In essence: what you cannot hear can be left out! That is what gets you compression ratios around 7:1 instead of the roughly 2:1 of purely mathematical (i.e. lossless) compression.
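
If you want to see where those ratios come from, here is a quick back-of-the-envelope sketch (plain Python arithmetic, just for illustration; real files also carry headers and metadata, so treat the numbers as rough):

    # Uncompressed CD audio: 44.1 kHz sample rate, 16-bit words, 2 channels.
    cd_bps = 44_100 * 16 * 2          # = 1,411,200 bits per second

    for mp3_kbps in (128, 192, 256, 320):
        ratio = cd_bps / (mp3_kbps * 1000)
        print(f"{mp3_kbps} kbps MP3 -> about {ratio:.1f}:1 vs. uncompressed CD")

    # Lossless codecs (FLAC and friends) typically land somewhere around
    # 1.5:1 to 2:1, depending on the material -- the ~2:1 figure above.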

Much as you don’t hear the TV news when your kids run around the house screaming, louder parts of the music and percussive parts (drums, cymbals) do MASK other parts of the signal. As soon as the kids run into the other room, the TV host can be heard and understood again.

So to stay with the above example: the news anchor is still in the recorded signal, but because of the kids screaming you cannot hear him. MP3 detects such masked signals and more or less omits them! Believe it or not, during a drum solo you are almost deaf: your hearing is VERY limited for about 20 ms after every cymbal crash.

http://en.wikipedia.org/wiki/Temporal_masking

What sounds very dramatic in theory is in fact pretty unnoticeable.
If you are really curious: http://en.wikipedia.org/wiki/Auditory_masking and http://en.wikipedia.org/wiki/Cocktail_party_effect

Of course I simplified the above: the lower the bitrate gets, the more the dynamic range as well as the frequency range IS compromised, and more and more artifacts come in. These are signals that were not in the original but were introduced by the process itself. Go ask your mobile phone for a demo!

Since MP3 is tuned to the hearing of the average human, some people detect the compression better than others.

The sad thing is that youngsters nowadays tend to prefer MP3s over real music. The assumption is that they are “easier” to listen to and that our hearing gets “lazy” because we are only presented with the most needed parts of the music.

Conclusion: The provided “differential” audio file indeed lets you listen to the difference between MP3 and uncompressed, BUT it does NOT show you what you “missed” hearing. You would not have heard it anyhow!!

Just think about the CSI guys recording in your house and afterwards trying to extract the voice of the TV host masked by all those screaming kids!

BTW: This is the very reason why all those intelligence guys NEVER record in MP3 format, no matter how high the bitrate is. The recorder would simply throw away everything you cannot hear right now. That means the masked voice would not be on the recording at all, and therefore NOBODY could filter it out later, no matter how cool the gear is.

So, MP3 can only be the END product, never a production format! The encoder needs to know the whole picture before it knows what it can omit because of masking, and as long as the mixing and mastering are not yet finished, there is no “big picture” yet.

Hope this is of interest to you and your listeners.

Again, thank you Harold. You make some very valid points.


  1. jnmfox
    February 9th, 2010 at 19:02 | #1

    Interesting. I found the podcast and this subsequent article very informative. MP3 compression is something I’ve always wondered about but never took the time to look into. Thanks to all!

  2. Downtowner
    February 9th, 2010 at 20:25 | #2

    I’m not fully buying the “you couldn’t hear it anyway” argument because there are certainly very audible differences between the Control .wav and the various .mp3s on the sample tracks provided (I burned them to CD as suggested and listened on a full stereo). I would be interested to know how the lower-level audio can be detected, separated, and then discarded from the full audio signal by the codec. In the analog domain, an audio signal is nothing more than an alternating voltage of random frequency. At any given instant in time, it is the sum of all recorded signals occurring in that instant. If a full analog signal at some given instant has the value 10, how can the codec infer from that information that it actually contains two signals of value 2 and 8, and that the signal at level 2 can be thrown away? Even if we are starting with a digital .wav file, the only information it contains is a digital stream of data representing the analog signal level at a series of instants across time. Does the mp3 codec operate on a moving slice of signal, such that it can make mathematical predictions and inferences about the individual components that ultimately make up the whole signal? For example, if a flute is playing quietly and then a cymbal crashes loudly, does the software then say “delete flute signal from slice of audio signal containing loud cymbal”? Even if yes, I’m not sure what that gets you in terms of compression, because now you still have to store a value of, say 8 (cymbal only) instead of 10 (cymbal 8 plus flute 2).

  3. Downtowner
    February 9th, 2010 at 20:46 | #3

    Check out the last animation regarding the “beat” wave and ask yourself how you could remove one of the two original signals and still have the same summation signal. http://paws.kettering.edu/~drussell/Demos/superposition/superposition.html

  4. Ted
    February 10th, 2010 at 11:29 | #4

    @Harold:

    You’re absolutely right about the significance of the masking effect, but I don’t think it makes Austin’s analysis highly misleading.

    The MP3 algorithm is meant to remove both frequencies that you probably can't experience (e.g., content below 10 Hz, and masked frequencies) and frequency/dynamic resolution that you probably won't "miss very much." In other words, as you compress audio further and further, the algorithm makes choices about what gets removed, and each time it removes the next "least needed."

    This effect is evident as you start decreasing bitrate.

    Confusing audio that is "least needed" with "can't be heard anyway" can be very misleading as well!

    @Downtowner:

    My understanding is that the way compression removes lower-level frequencies is the same as the way my DAW equalizer (or any other equalizer) removes particular frequency bands: it works on the signal over time. Over time, a signal, digital or analog, gives you enough information to determine what frequencies are represented in it. Obviously, a single point cannot represent a frequency, but two or more points can (this is governed by the sampling frequency, which is typically 44.1 kHz so that, per the Nyquist–Shannon sampling theorem, signals reaching up to 22 kHz, roughly the limit of human hearing, can be represented: http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem).
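
    To make that concrete, here is a toy sketch of Downtowner's flute-plus-cymbal question (NumPy, purely for illustration; MP3 itself uses a polyphase filterbank plus an MDCT rather than a plain FFT, but the principle is the same). Sample by sample the mixed signal is just one number per instant, yet a transform over a window of samples pulls the two components apart, which is what lets a codec decide to spend fewer bits on the quiet one:

        import numpy as np

        fs = 44_100                                  # sample rate in Hz
        t = np.arange(0, 0.1, 1 / fs)                # a 100 ms analysis window

        loud = 8.0 * np.sin(2 * np.pi * 1000 * t)    # "cymbal": loud 1 kHz tone
        quiet = 2.0 * np.sin(2 * np.pi * 1100 * t)   # "flute": quiet 1.1 kHz tone
        mixed = loud + quiet                         # what the .wav actually stores

        # Instant by instant, 'mixed' is just a single summed value...
        print(mixed[:3])

        # ...but over the whole window the spectrum reveals both components.
        spectrum = np.abs(np.fft.rfft(mixed)) / (len(mixed) / 2)
        freqs = np.fft.rfftfreq(len(mixed), 1 / fs)
        for f, a in zip(freqs, spectrum):
            if a > 0.5:                              # print only the significant peaks
                print(f"{f:7.1f} Hz  amplitude ~ {a:.1f}")   # 1000 Hz @ 8, 1100 Hz @ 2

    An encoder's psychoacoustic model works on exactly this kind of per-window frequency breakdown when it decides which parts of the signal are masked.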

    That is why there is so much discussion, at least in the musical world, about which EQs are best; they all make decisions about how to deal with time-domain problems, like frequency boost/cut, in slightly different ways, which can produce different results (e.g., lots of phasing, so-called musical phasing, linear phasing, odd/even order harmonics, etc).

  5. February 10th, 2010 at 17:02 | #5

    While the tech details of how digital audio data is thrown away may be interesting, it boils down to emasculating the music just for convenience. If you actually SIT and really LISTEN to the music, even the 44.1 kHz/16-bit CD that is your supposed standard is totally compromised! The only possibilities are hi-res audio: SACD, 96 kHz/24-bit PCM, DTS-HD Master Audio, or audiophile-level vinyl. (I’m sane about this: I think 192 kHz and 176.4 kHz PCM are overkill and take way too much storage.)

  6. kompressorlogik
    February 12th, 2010 at 12:02 | #6

    Here’s my official response to Harold’s email:

    “Short story: Although I have no doubt that the fellow studio guy did a great job of producing these files, it proves NOTHING, and your conclusion that 'this is what MP3 is missing' is highly misleading!”

    I did not intend to be misleading, and I fear I may have left out some details regarding the original purpose of this exercise. Prompted by people on a local music forum claiming that mp3 is qualitatively “just as good” as CD, and by bands bringing mp3s to a mastering engineer because they don’t understand the difference, we assembled the original CD (the one I sent to Tom some weeks ago) to show objectively what the actual difference was. It was not primarily intended to show the difference in perception, but the actual difference in the data.

    I do not dispute that the mp3 encoding process operates by giving lower priority to masked content (encoding it with fewer bits, i.e. a 4- or 8-bit word instead of a 16-bit word). Arguably, high-bitrate mp3s like 320 kbps really do quite well at only removing audio that is masked by other portions of the signal. However, as you reduce the bitrate there are fewer and fewer bits of data to be used, and as such the encoder must start encroaching on the less masked, more “hearable” sounds. As Ted says, each time you step down “it removes the next ‘least needed.'” To claim that a 160 kbps mp3 has only removed un-hearable, masked sounds seems like folly to me.
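
    As a toy illustration of what shrinking the word length does, here is a crude sketch in NumPy. A real encoder quantizes per frequency band with scale factors steered by its psychoacoustic model, so this plain uniform re-quantization is only meant to show how the error floor rises as bits are taken away:

        import numpy as np

        fs = 44_100
        t = np.arange(0, 1.0, 1 / fs)
        signal = np.sin(2 * np.pi * 440 * t)      # a full-scale 440 Hz test tone

        def requantize(x, bits):
            """Uniformly quantize x (range -1..1) to the given word length."""
            levels = 2 ** (bits - 1)
            return np.round(x * levels) / levels

        for bits in (16, 8, 4):
            err = signal - requantize(signal, bits)
            snr_db = 10 * np.log10(np.mean(signal**2) / np.mean(err**2))
            print(f"{bits:2d}-bit words -> quantization SNR around {snr_db:5.1f} dB")

    Roughly 6 dB of signal-to-noise ratio disappears with every bit you give up; the perceptual-coding trick is to hide that added noise where masking makes it inaudible.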

    Additionally, I just recently was made aware that mp3 encoding does in fact throw away some information wholesale. Spectral analysis I did of the 128kbps mp3 (converted track, not difference) from our original demo disc showed that the process unceremoniously brick-walled the signal at 16kHz, as did the mp2 encode, neither bothering to devote a single bit to anything from 16kHz upwards. Comparatively, the 320kbps mp3 lopped off everything above 18kHz, and the AAC at 256kbps appeared to have a sort of “gate,” allowing material above 16kHz to be encoded when it wasn’t sufficiently masked (read: perceptible) and dousing it entirely when it was masked. I can certainly see the reason for this, as the range from 16kHz to 20kHz is less than half an octave and from 18kHz to 20kHz is only a note or two, so you’re really only losing a minimal amount of overtone information in exchange for a quite large number of individual frequencies. Also, the average human of middle age can’t effectively hear all the way up to 20kHz anyway, so the effect of the loss is further minimized. However, this observation clearly refutes any claim that the mp3 encoding process only removes masked information.
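
    If anyone wants to check this on their own files, something along these lines should do it. This assumes you’ve first decoded the mp3 back to a WAV (e.g. with ffmpeg or lame --decode); the filename is just a placeholder, and for a long track you’d probably look at a short excerpt rather than the whole thing:

        import numpy as np
        from scipy.io import wavfile

        fs, data = wavfile.read("decoded_track.wav")   # placeholder filename
        if data.ndim > 1:                              # quick mono mixdown for analysis
            data = data.mean(axis=1)
        data = data.astype(np.float64)

        power = np.abs(np.fft.rfft(data)) ** 2
        freqs = np.fft.rfftfreq(len(data), 1 / fs)

        above_16k = power[freqs >= 16_000].sum()
        print(f"Energy above 16 kHz: {100 * above_16k / power.sum():.4f}% of total")
        # A hard brick-wall at 16 kHz shows up as essentially 0% here.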

    I agree about the introduction of artifacts as bitrates are lowered. In fact, there’s an interesting phenomenon called “pre-echo” that can occur when an encoding scheme with large time-analysis windows is used. Essentially, if a transient occurs midway through a given time window, the encoder may smear part of that transient to the beginning of the window. This gives you an echo of the sound before it actually occurs! Another, more common artifact is what is known as “birdies” or “space monkeys.” This is the high-frequency flutter you hear in highly compressed files. It can sound somewhat like birds chirping, underwater bubbling, or apparently even monkeys from space screaming about how much they’d like to eat you. This is a result of the change in bit-word length I mentioned earlier: as the word length of high-frequency information ping-pongs around, this flutter sound is produced.

    I also disagree a bit with your statement that mp3s are “easier” to listen to. I’m unsure what exactly you mean by “easy”; if you indeed mean easier on the ears, then I most certainly disagree. However, if you mean “convenient,” then I believe we are on the same page. The fact that mp3s are so convenient to download and carry in large numbers has made them ubiquitous in today’s music culture, and I believe that to be the main problem; because of mp3s’ near-exclusive use among large portions of youths, many have become accustomed to only hearing highly compressed music. It’s not that they necessarily prefer it, because that would require a direct comparison. Their brains are merely accustomed to hearing compressed music, so that has become what they expect to hear and, if we take the human population as an example, any deviation from expectations is generally regarded negatively, at least at first. The same thing happens when auditioning a new pair of speakers: your initial reaction to them is rooted firmly in your familiarity with whatever speakers you had previously. It takes time to become accustomed to a new type of sound.

    Lastly, to briefly respond to your conclusion and to recap my initial statement: I never intended this exercise to be an illustration of what you’re missing hearing, only of what the actual encoded file is missing. If I was misleading, then you all have my apologies, and I hope this sets things straight.

    Thanks for reading, everyone who made it through that!

    -Austin

Comments are closed.