
What Is an IAB Download?

NOTE: Before you read this post, it’s very important that you understand the four building blocks that comprise an IAB download. These have been written about in detail in the following articles:

  1. What Is a Download?
  2. What Is a Progressive Download?
  3. What Is a Server Log?
  4. What Is a Request Header?

# Definition

An IAB Download is a metric based on measurement guidelines created by the IAB Tech Lab. While the IAB uses the word “download”, this oversimplifies what the metric truly means. An IAB Download represents a deduplicated and filtered count of downloads and progressive downloads, with server logs and request headers being integral to deciding how to filter and deduplicate the data.

To put this another way, the IAB Download metric is meant to represent unique human downloaders of a particular podcast episode by filtering out any data that doesn’t indicate a user actually intended to listen to a particular piece of audio. Let’s dive into what exactly is getting filtered.

# Filtering, Filtering and More Filtering

Think of the process as one big funnel: tons of data comes into the top, but only data that passes the filters in the IAB Podcast Measurement Technical Guidelines comes out as an IAB Download. The result is always a reduction of the overall available data. As mentioned before, all filtering is meant to weed out any data that does not reasonably indicate a user actually intended to listen to a particular piece of audio.

Let’s get nerdy 🤓

# Eliminate Pre-Load Requests

A pre-load request is when an audio player asks to download an audio file before the user has interacted with the audio player to do so. Because the purpose of the IAB Download metric is to reflect the user’s intention to listen, pre-load requests should be filtered out. They do not reflect whether a user actually wanted to listen or not.

Unfortunately, pre-loading is controlled by the audio player. Audio players are not always maintained or managed by the companies delivering podcast metrics, so third-party pre-load filtering is difficult at best and sometimes impossible.

The IAB recommends two methods for eliminating pre-loads:

  1. If you have control over the audio player, don’t preload
  2. Use the file threshold guidelines below

# Eliminate Potential Bots and Bogus Requests

The next step is to filter out non-human traffic sources. To identify non-human traffic sources, it’s important to look at the server logs and try to determine which requests are legitimate and which are not. There are six pieces of information that are present in the server logs to make these types of decisions:

  1. IP Address
  2. User Agent
  3. Request Headers
  4. Response Headers
  5. Referer (not always present)
  6. Byte Range Start/End (not always present)

Here is a truncated version of a CloudFront log with the pieces of information used to determine non-human sources:

| Property | Value |
| --- | --- |
| c-ip | 192.0.2.100 |
| cs(Referer) | - |
| cs(User-Agent) | Podcasts/4023.540.3 CFNetwork/1494.0.7 Darwin/23.4.0 |
| sc-range-end | - |
| sc-range-start | - |

# IP Address

An IP address identifies each computer that is connected to a computer network. Using the IP address alone, other pieces of data can be looked up to help identify the requesting computer and network. Various companies retain databases that allow anyone to look up additional information tied to an IP address. MaxMind is one example of a company that provides this type of information. There are also websites that will do this like What Is My IP Address. Here’s the data gleaned from my IP address:

| Property | Value |
| --- | --- |
| Hostname | 172.59.xxx.xxx |
| ASN | 21928 |
| ISP | T-Mobile USA Inc. |
| Services | None detected |
| Assignment | Likely Static IP |
| Country | United States |
| State/Region | New York |
| City | Syracuse |
| Latitude | 43.0481 (43° 2′ 53.23″ N) |
| Longitude | -76.1474 (76° 8′ 50.72″ W) |

The ASN (Autonomous System Number) is also a valuable piece of information returned from an IP address lookup. The ASN is an identifying number for a network of IP addresses. You’ll notice in the table above that the network was identified as T-Mobile. If I look up the IP address for a server I run on DigitalOcean, the information changes. While my server has a unique IP address, it is part of the overall DigitalOcean network with many IP addresses:

| Property | Value |
| --- | --- |
| Hostname | 198.211.xxx.xxx |
| ASN | 14061 |
| ISP | DigitalOcean LLC |
| Services | Datacenter |
| Assignment | Likely Static IP |
| Country | United States |
| State/Region | New Jersey |
| City | North Bergen |
| Latitude | 40.8041 (40° 48′ 14.68″ N) |
| Longitude | -74.0124 (74° 0′ 44.52″ W) |

The IAB Podcast Measurement Technical Guidelines state that counted requests cannot originate from known servers (AWS, DigitalOcean, etc.) or bots (Google, Bing, etc.). This can easily be determined from the IP address and ASN. Traffic can be filtered out at the ASN level, which is good for removing big blocks of IP addresses, or on a per-IP basis.
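Here is a minimal Python sketch of that two-level filter. It assumes the ASN has already been looked up elsewhere (for example from a MaxMind-style database). The blocklists are placeholders: 14061 is the DigitalOcean ASN from the table above, the blocked network is a documentation range, and the function name is hypothetical.

```python
import ipaddress

# Placeholder blocklists for illustration only. A real system would load
# these from a maintained data source.
BLOCKED_ASNS = {14061}  # DigitalOcean, from the lookup above
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range


def is_blocked_source(ip: str, asn: int) -> bool:
    """Return True when a request should be dropped based on where it came from."""
    # ASN-level filter: removes a whole network in one check.
    if asn in BLOCKED_ASNS:
        return True
    # Per-IP filter: catches individual addresses or smaller ranges.
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

The ASN check runs first because it is the cheaper test and covers the largest blocks of addresses.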

Because it’s not always apparent if traffic coming from an IP address is human or not, the IAB also recommends that a system be put in place to recognize when an unrealistic number of downloads are coming from a single IP address. There are exceptions to this rule, such as when the IP address represents a dorm, corporation or other similar entity that has many users behind a single identifying IP address.
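One way to sketch this in Python is a simple per-IP counter. The cutoff of 100 and the allowlist entry are assumptions for illustration; this post does not cite a specific number, and the function name is hypothetical.

```python
from collections import Counter

# Assumed cutoff; pick a value that fits your own traffic.
MAX_DOWNLOADS_PER_IP = 100

# Assumed allowlist of shared addresses (dorms, offices) found by manual review.
SHARED_IP_ALLOWLIST = {"192.0.2.50"}


def suspicious_ips(request_ips: list[str]) -> set[str]:
    """Return IPs whose request count exceeds the cutoff, minus known shared IPs."""
    counts = Counter(request_ips)
    return {
        ip
        for ip, count in counts.items()
        if count > MAX_DOWNLOADS_PER_IP and ip not in SHARED_IP_ALLOWLIST
    }
```

In practice the count would be taken over some time window, which this sketch leaves out.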

# User Agent

A User Agent is a piece of information sent by the requesting device in the request headers that is meant to identify the device asking to download a particular file. The User Agent request header is tracked in the server logs and should be used in the filtering process.

User Agents that don’t correspond to real software or that identify themselves as a bot should be filtered out. User Agents can also help identify traffic that could be coming from servers that are not already blocked.

✅ Examples of good User Agents:

  • Podcasts/4023.540.3 CFNetwork/1494.0.7 Darwin/23.4.0
  • Spotify/123400783 OSX/0 (MacBookPro15,2)
  • Mozilla/5.0 (iPad; CPU OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 OPT/3.2.13

❌ Examples of User Agents to filter:

  • Firefox3.06 (not real)
  • Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot) (bot)
  • Uptime-Kuma/1.23.11 (monitoring software)

For more examples, the Open Podcast Analytics Working Group (OPAWG) maintains a comprehensive open-source collection of broadly-compatible regular expression patterns for identifying and analyzing podcast player user agents.
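As a rough Python sketch, the examples above can be sorted with two regular expressions. These patterns are my own crude illustrations, not the OPAWG list, and treating a missing User Agent as filterable is an assumption. A bare `bot` substring match will also produce false positives on some legitimate names.

```python
import re
from typing import Optional

# Illustrative patterns only; a real system would use a maintained list.
BOT_PATTERN = re.compile(r"bot|crawler|spider|uptime-kuma", re.IGNORECASE)
# Rough shape check: a product name, then "/", then a version.
PRODUCT_TOKEN = re.compile(r"^[A-Za-z][\w.-]*/\S+")


def should_filter_user_agent(user_agent: Optional[str]) -> bool:
    """Return True when the User Agent looks missing, bot-like or malformed."""
    if not user_agent or user_agent.strip() in {"", "-"}:
        return True  # assumption: a missing User Agent is filtered
    if BOT_PATTERN.search(user_agent):
        return True  # self-identified bot or monitoring software
    # Anything without a leading product/version token, e.g. "Firefox3.06".
    return PRODUCT_TOKEN.match(user_agent) is None
```

The function name and return convention are hypothetical; the point is only that bot checks and shape checks are separate passes.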

# Referer

The Referer, a misspelling of referrer, is another request header typically sent when a request is made from web-based applications. Depending on how a web-app requests a file, the referrer can be changed. When the referrer is correct, it can be helpful in identifying further details about a web-app.

Sometimes web-based audio players pre-load audio. If the referrer is present, it can help decide whether these requests should be filtered out as bogus.

The OPAWG maintains an extensive list of example web-app referrers.

# Handling HTTP Requests and Responses

A request for a file always specifies a “method” (e.g. GET, HEAD, POST, PUT). Each request receives a response, which carries a status code from the server describing the outcome. In CloudFront server logs, these two properties look something like this:

| Property | Value |
| --- | --- |
| cs-method | GET |
| sc-status | 200 |

Only GET requests with a status code of 200 (OK) or 206 (Partial Content) should be counted.
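In Python, that rule is a small check. This sketch assumes each log line has already been parsed into a dict keyed by the CloudFront field names shown above, with values kept as strings; the function name is hypothetical.

```python
COUNTABLE_STATUSES = {"200", "206"}


def is_countable(record: dict) -> bool:
    """Return True for GET requests answered with 200 or 206."""
    method = record.get("cs-method", "")
    status = record.get("sc-status", "")
    return method == "GET" and status in COUNTABLE_STATUSES
```

Anything else, including HEAD requests and redirects, falls out of the funnel at this step.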

NOTE: The IAB measurement guidelines state there could be quirks to watch for. Akamai specifically uses an HTTP code of 000 for a prematurely ended request. The IAB says these can be counted if they pass the file threshold levels below. My personal opinion is that this is an incorrect treatment of this type of response. Generally speaking, a request that is ended abruptly for any reason should have a status code in the 400s or 500s. I would argue that a service trying to download an audio file that receives a non-200/300 response isn’t going to try and play the file. Happy to argue about this further with anyone that wants to 😉

# Apply File Threshold Levels

If a file request has made it this far in the filtering pipeline it’s time to make sure that at least 1 minute of audio has been downloaded. The good news is that if a status code of 200 was sent, this indicates that the whole file was returned and therefore meets the minimum file threshold. If a 206 was returned, then it is required that the system ensures that at least 1 minute was returned. To do this we have to look at the byte-range in the request.

# Byte Range

As mentioned in What Is a Progressive Download?, byte-range information indicates how much of the audio file was requested in bytes. This information can be used to calculate the approximate amount of audio time downloaded, as long as you have other pieces of data. To do this it’s important to understand the correlation between byte-range, file size, encoding quality and audio time.

Since the IAB requires at least 1 minute of downloaded audio and the byte-range tells us how many bytes were downloaded, we’ll need an equation to figure out whether the total bytes downloaded add up to at least that 1 minute of audio. The equation looks like this:

(time in seconds * audio bits per second) / 8 = size in bytes

Since we’re solving for bytes, we need numbers for time in seconds and bits per second. Time in seconds is 60 since we want to know the size of 1 minute of audio. Bits per second is what the audio file was encoded at for the .mp3 being delivered. While most podcast hosting companies default to 128kbps (128000 bps) this is not always the case. Regardless, you will need to know the bit rate of the audio file. If we assume that the audio file was encoded at 128kbps, the equation will look like this:

(60 * 128000) / 8 = 960000

For a 128kbps .mp3 file, a response therefore needs to contain at least 960000 bytes to be valid.
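The same equation as a small Python helper. It assumes a constant bit rate, and the function name is hypothetical.

```python
def min_bytes_for_seconds(bitrate_bps: int, seconds: int = 60) -> int:
    """Bytes needed to hold `seconds` of audio encoded at `bitrate_bps`."""
    # (time in seconds * audio bits per second) / 8 = size in bytes
    return (seconds * bitrate_bps) // 8


print(min_bytes_for_seconds(128_000))  # 960000
print(min_bytes_for_seconds(64_000))   # 480000
```

A file encoded at a lower bit rate needs fewer bytes to reach the 1 minute mark, which is why the bit rate has to be known per file.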

To connect all the dots, a byte-range request will have a start and end number in bytes. It might look something like this: bytes=15000000-30000000. We can tell how many bytes were requested by subtracting the start from the end: 30000000 - 15000000 = 15000000 bytes (strictly 15000001, since both ends of a byte range are inclusive). If this audio file was encoded at 128kbps, we know that 1 minute of audio (960000 bytes) is well under the amount that was requested.
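A hedged Python sketch of that calculation. HTTP byte ranges include both the start and end byte, so the exact count is one more than the plain subtraction. This sketch only handles the simple `bytes=start-end` form and gives up on open-ended or multi-part ranges; the function name is hypothetical.

```python
import re
from typing import Optional

RANGE_PATTERN = re.compile(r"^bytes=(\d+)-(\d+)$")


def requested_bytes(range_header: str) -> Optional[int]:
    """Return the number of bytes a simple Range header asks for, or None."""
    match = RANGE_PATTERN.match(range_header.strip())
    if match is None:
        return None  # open-ended ("bytes=100-") or multi-range requests
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        return None  # invalid range
    return end - start + 1  # both ends are inclusive


print(requested_bytes("bytes=15000000-30000000"))  # 15000001
```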

Obviously, a byte-range request might not reflect the total size of audio returned by the server. It’s important to use the server logs as the source of how much data was returned to the end-user. In a CloudFront log you would have these values to compare against and ensure that the returned bytes met the minimum threshold:

| Property | Value |
| --- | --- |
| sc-bytes | 15000000 |
| sc-content-len | 15000000 |
| sc-range-end | 30000000 |
| sc-range-start | 15000000 |
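A minimal Python sketch of that comparison, assuming each log line is already parsed into a dict of strings keyed by CloudFront field name. The 128kbps default is an assumption; use the real bit rate of the file. As far as I know, CloudFront's `sc-bytes` also counts response headers, so it slightly overstates the audio bytes. The function name is hypothetical.

```python
def meets_threshold(record: dict, bitrate_bps: int = 128_000, seconds: int = 60) -> bool:
    """Return True when the bytes actually sent cover at least `seconds` of audio."""
    required = (seconds * bitrate_bps) // 8
    try:
        sent = int(record.get("sc-bytes", "-"))
    except (TypeError, ValueError):
        return False  # missing or "-" value: cannot prove the threshold was met
    return sent >= required
```

Using the bytes sent rather than the bytes requested means a connection that dropped early does not get counted just because it asked for a large range.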

# Identify and Aggregate Uniques

At this stage all valid requests have been identified and duplicate requests need to be aggregated. The most common way of identifying a unique request is to combine the IP address and User Agent, paired with the episode requested. For example, if I were listening to an episode on Apple Podcasts, my unique request would be a combination of 192.0.2.100 + Podcasts/4023.540.3 CFNetwork/1494.0.7 Darwin/23.4.0 for that specific episode.

It’s also important to understand that deduplication is done within a 24-hour window. As a listener, I will only count as one IAB download per episode per 24-hour period.
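Here is one way to sketch the deduplication in Python. It assumes a fixed window based on the UTC calendar day, which is only one possible reading of "24 hours", and it expects timezone-aware timestamps. The tuple layout and function name are hypothetical.

```python
from datetime import datetime, timezone


def count_unique_downloads(requests) -> int:
    """Count unique (IP, User Agent, episode) combinations per UTC day.

    `requests` is an iterable of (ip, user_agent, episode_id, timestamp)
    tuples that already passed every earlier filter.
    """
    seen = set()
    for ip, user_agent, episode_id, timestamp in requests:
        # Assumption: fixed window keyed on the UTC date, not a rolling window.
        day = timestamp.astimezone(timezone.utc).date()
        seen.add((ip, user_agent, episode_id, day))
    return len(seen)


ua = "Podcasts/4023.540.3 CFNetwork/1494.0.7 Darwin/23.4.0"
sample = [
    ("192.0.2.100", ua, "ep1", datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)),
    ("192.0.2.100", ua, "ep1", datetime(2024, 5, 1, 20, 0, tzinfo=timezone.utc)),
    ("192.0.2.100", ua, "ep1", datetime(2024, 5, 2, 8, 0, tzinfo=timezone.utc)),
    ("192.0.2.100", ua, "ep2", datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)),
]
print(count_unique_downloads(sample))  # 3
```

The two requests for ep1 on May 1 collapse into one download, while the next-day request and the ep2 request each count separately. A rolling window would group the same requests differently.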

# Notes

If you’ve made it all the way to this section I congratulate you 🙌 🥳

As an astute reader you probably noticed that these guidelines are very squishy. Many of these filters are next to impossible to apply with 100% accuracy. Pre-loads, bots, deduplication, etc. are all fraught with problems. The guidelines don’t even mandate whether the 24-hour measurement window is rolling or fixed. Literally every IAB Certified podcast hosting company will have differing results. Although numbers should track similarly, they are never 1-to-1 when compared across certified companies.

Pre-loads are easily gamed. Referrers can be changed, either for the good or for the bad.

Because IAB Downloads are based on guidelines, these guidelines will continue to be iterated on and will therefore continue to change what an IAB Download truly means. Revisions to the guidelines almost always create a reduction of total downloads.

Since the IAB is concerned with advertising and the IAB Podcast Measurement Technical Guidelines are meant to help advertisers get relevant and accurate podcast analytics, it’s rare that subsequent enhancements to the measurement guidelines will be in favor of the podcaster. Think of an IAB Download as a depreciating “asset” of your podcast. It’s dangerous to assume that an IAB Download will be worth the same or more tomorrow than it is today. This is because filtering of the overall data becomes better, but also stricter, over time.
