Shots Fired: When a Picture is Not Worth a Thousand Words
Written by Sgt. Christopher Andreacola   

WE HAVE ALL HEARD THAT SAYING: “A picture is worth a thousand words.” In the field of forensic video and image analysis, that adage is certainly applicable. Increased storage space, faster processors, and improved technology have raised the quality of video in many of our law enforcement investigations, including 4K-resolution footage. Images produced from this evidence can often solve cases in which the video would have been useless in the past. If a picture is worth a thousand words, then a video must be worth tens of thousands… unless it is the video of a shooting.

This article appeared in the March-April 2021 issue of Evidence Technology Magazine.

As a sergeant heading the video analysis unit with the Tucson Police Department, I respond to all our officer-involved shootings (OIS). In addition, we are often asked to analyze surveillance and cell phone video of homicide investigations involving firearms. Over the past eight years, I have come to appreciate the audio track as evidence just as much as, if not more than, the images. Let me start by looking at the basics of conducting a forensic analysis of a shooting captured on video.

After completing the standard verification and authentication review of the evidence, I start by watching the video. I try to watch it with as few assumptions as possible and make some notes about my basic observations as a viewer. I look and listen for any indications of firearm use. If I am lucky, I will have at least two major parts of the video file: the video track and the audio track.

The analysis must start somewhere. Whether you start with the video track or the audio track, there is no “right” place to begin. I guess you could say I am biased, as I started my career as a video analyst, but I recommend starting with the video. To avoid any other bias caused by the sounds on the audio track, I start with my first file and complete a full analysis of the video track alone. I make notes on my observations to include frame numbers, timing, and the visual evidence that I see. I continue and perform this analysis for any of my other video tracks. After I have reviewed all the video tracks independently, I compare the results in any files that were recorded at the same moment in time to look for images that capture the event from different cameras.

Next, I conduct a full analysis of the audio track (or tracks), making notes regarding timing, amplitude (loudness), and any other audible evidence I hear, independently. Then I compare any other audio tracks that were recorded at the same moment in time.

My last step involves comparing my notes of the video and audio tracks to see where they match. This process is performed individually on each file, and then collectively on any files that were recorded at the same moment in time.

When it comes to video evidence of a shooting, there are several things to look for: weapon mechanics, recoil, muzzle flash, projectiles, casings, impacts and/or ricochets. Many of these observations can be seen in the video below.

Weapon mechanics include things like the pulling of a trigger, the rotation of a cylinder, or movement of a slide on a semi-automatic. Depending on the action observed, they can indicate if the weapon is about to fire (by the movement of the trigger toward the rear of the weapon and the rotation of the cylinder); or if the weapon is in the process of firing or has already fired (by the backwards movement of the slide).

Recoil is the result of physics. For every action, there is an equal and opposite reaction. Keep in mind that recoil is not only about the weapon barrel. You may see recoil in the hand, arm, shoulder, or even the whole body of the shooter. Because recoil is a reaction, it always occurs after the shot is fired, and because it lasts longer than the weapon mechanics, there is a greater chance it will be observed.

At its most basic, the act of shooting a firearm is just a controlled explosion. As a result, we can expect to see fire and smoke. For firearms, we refer to that as muzzle flash—demonstrated in the video below.

Most of the research indicates muzzle flash lasts 1 to 3 milliseconds, usually closer to 1 millisecond. Sometimes the evidence can help prove that all by itself without any of the research.

Suspect firing a weapon at officers captured on two body-worn cameras at the same moment. Image: Author / Tucson Police Department

In the image on the left, a subject wearing a white shirt, red jacket, and blue pants is standing still and firing a weapon at the approaching officer. You can see the muzzle flash at about chest height. The image to the right was captured at the same moment in time by another officer’s body worn camera (BWC). This second officer was running toward the subject and, as a result, there is significant motion blur.

Motion blur occurs when the camera is trying to take a picture while it is moving. It is more often seen during low-light conditions. The time the camera needs to capture a picture—the sampling period—is increased due to the lack of available light. If there is movement of the lens during that sampling period, the camera tries to record all that movement on the one image, and you end up with motion blur.

However, notice there is no motion blur of the muzzle flash in either image. The flash occurred so fast (1 millisecond) during the extended sampling period that it remains as a small circle.

The last things I look for are projectiles or casings leaving the weapon and impacts or ricochets caused by the projectile. The following image shows these observations in two consecutive frames of video.

Samples of visual indicators of weapons fire captured on body-worn cameras. Upper left: projectile. Upper right: casing. Lower images: impact. Image: Author / Tucson Police Department

There is no doubt these images provide valuable evidence when found. But that is just the problem: they must be found. As it turns out, these observations are rarely captured on video. Most of today’s advancements in technology are geared toward creating higher-resolution images with more pixels. More pixels are not going to solve the main problem here. This problem stems primarily from the frame (or sample) rate and sampling period.

Modern video in the U.S. is recorded at 30 frames per second. In essence, 30 samples of short periods of time are captured every second. Generally, the sampling period is about half the frame interval. This means that, with a frame rate of 30 samples taken every second (1 frame every 1/30th of a second, or about 33 milliseconds), the period the camera is recording that individual frame is about 1/60th of a second, or about 17 milliseconds.
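The arithmetic behind those figures can be sketched in a few lines. This is only an illustration, assuming a 30 fps recording whose sensor is exposed for about half of each frame interval; real cameras vary their exposure with the available light, and the function name and the `exposure_fraction` parameter are hypothetical.

```python
def frame_timing(fps, exposure_fraction=0.5):
    """Return (frame interval, sampling period) in milliseconds.

    exposure_fraction is an assumed value: the share of each frame
    interval during which the sensor is actually collecting light.
    """
    frame_interval_ms = 1000.0 / fps
    sampling_period_ms = frame_interval_ms * exposure_fraction
    return frame_interval_ms, sampling_period_ms


interval_ms, exposure_ms = frame_timing(30)
print(f"Frame interval:  {interval_ms:.1f} ms")   # 33.3 ms
print(f"Sampling period: {exposure_ms:.1f} ms")   # 16.7 ms
```

Under these assumptions, roughly half of every second goes unrecorded by the video track.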

Illustration of sample rate compared to sample period. Illustration: Author

Although this is often referred to as the shutter speed, in modern digital video recorders the shutter has been replaced with an image sensor; however, the principle is the same. Considering the speed at which a weapon can be fired, even under ideal conditions with a standard frame rate, at most we would only observe half of the visual indicators of a shooting.

Most of the visual indicators I have been discussing occur in approximately 1-3 milliseconds. As a result, generally they will only be observed in one single frame of video, if at all. During a study of ten OIS cases in 2018, I found visual indicators of a shot fired only 25% of the time. When you consider things like surveillance cameras recording at much lower frame rates, low-resolution images, poor lighting, movements, and camera positions, it is common to not find any visual indications of the discharge of a firearm anywhere on the video track.
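One rough way to see why brief indicators are so easy to miss is to estimate the chance that a short event overlaps the sampling period at all. The sketch below is a simplified model, not a measurement: it assumes the event begins at a random moment within the frame interval, that any overlap with the sampling period would be visible, and that the exposure is half the frame interval. The function name and parameters are hypothetical.

```python
def capture_chance(event_ms, fps=30.0, exposure_fraction=0.5):
    """Upper-bound chance that an event of length event_ms overlaps
    the sampling period of some frame (simplified model)."""
    frame_ms = 1000.0 / fps
    exposure_ms = frame_ms * exposure_fraction
    # The event overlaps the exposure window if it starts anywhere from
    # event_ms before the window opens until the window closes.
    return min(1.0, (exposure_ms + event_ms) / frame_ms)


for duration_ms in (1.0, 3.0):
    chance = capture_chance(duration_ms)
    print(f"{duration_ms:.0f} ms event at 30 fps: {chance:.0%} at best")
```

Even this best case hovers only a little above one half, and it ignores resolution, lighting, motion, and camera position, all of which reduce the odds further.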

As it turns out, in a shooting incident a picture is not always worth a thousand words. But that audio track can be priceless.

Just like the video, there are several things I listen for when evaluating the audio tracks. The first three—shell casings hitting the ground, the impact or ricochet of the projectile, and the mechanical actions of the weapon (slide cycling in a semi-automatic)—are self-explanatory. Unfortunately, because these sounds are often so faint and are usually occurring during the act of firing the weapon, they are often covered up by the two more pronounced audible indicators of a gunshot: muzzle blast and the shock wave.

Muzzle blast is the audible equivalent of the muzzle flash. It is the result of that controlled explosion and is the most obvious indicator of gunshots on an audio track, especially if the gunfire occurs close to the microphone. However, there are two important issues that I have learned to take into consideration.

The first is the inability to identify gunshots over other loud bangs when captured on an audio device. Now, to be clear, I am not referring to scientific experiments with multiple extreme frequency-range microphones strategically placed to determine shots fired from a rifle versus a handgun. I am also not referring to research comparing the calibers of weapons fired in a controlled environment. I am referring to surveillance footage captured with one outdoor microphone on the other side of the building, or cell phone video captured a block away, or a BWC video right in the middle of the action.

Muzzle blast is considered a percussion event. Why? Because the science at this point does not provide the data or tools to differentiate between a car exhaust, a boxer hitting a bag, bubble wrap popping, the beating of a drum set, or a gunshot. When evaluating sound, the two basic tools we use to visually represent the sound are a sound wave, a measurement of the amplitude or loudness of the sound, and a frequency spectrum—that is, the total frequencies (pitch) that make up the sound.

Five samples of sound wave and frequency spectrum of percussion events, including one of gunfire. Illustration: Author

All five of these sounds have the same basic features. They are sudden, loud, and fill the frequency spectrum. Without listening to the sound in context with the video images, I would be very careful about identifying a gunshot versus bubble wrap popping until I have considered all the other evidence I am evaluating. Which letter do you think represents the gunshot? (See the end of this article for the answer.)
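To illustrate what it means for a sudden sound to "fill the frequency spectrum," the sketch below compares a synthetic click with a steady tone. Everything in it is an assumption made for illustration: the signals are artificial rather than real recordings, the small discrete Fourier transform is written out by hand, and the helper names are hypothetical. It is not the tool used to produce the illustration above.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitude of each frequency bin from 0 up to half the sample rate."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total))
    return spectrum


def occupied_bins(spectrum, threshold=0.5):
    """Count bins holding at least `threshold` of the strongest bin."""
    peak = max(spectrum)
    return sum(1 for m in spectrum if m > threshold * peak)


N = 64
click = [0.0] * N
click[10] = 1.0  # one sudden spike, standing in for a percussion event
tone = [math.sin(2 * math.pi * 8 * t / N) for t in range(N)]  # steady tone

click_bins = occupied_bins(magnitude_spectrum(click))
tone_bins = occupied_bins(magnitude_spectrum(tone))
print(f"Click occupies {click_bins} of {N // 2 + 1} frequency bins")
print(f"Tone occupies {tone_bins} of {N // 2 + 1} frequency bins")
```

The click spreads its energy across every bin while the tone sits in one, which is why any sudden bang looks much like another on a spectrum display.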

Now, the loudness or amplitude of the muzzle blast, when considered in context with the visual images, can provide evidence of the distance to the microphone, difference in caliber of multiple weapons fired, and even the movements of the shooter toward or away from the recording.

The other pronounced audible indicator when dealing with supersonic rounds is the shock wave, sometimes referred to as the crack. There is enough data about shock waves and their use in determining distance and direction of weapons fire to write another article. If you want more information on this topic, I suggest starting with the video below.

The second important issue I take into consideration is the speed of sound. This differs dramatically from the speed of light. If a single event is captured visually on two or more cameras, regardless of the distance of the event from the cameras, you can confidently state it is happening at about the same moment in time, allowing you to sync the video images. This is due to the speed of light being constant at approximately 186,000 miles per second. If the single event was captured on two cameras half a mile apart, the difference in time it takes the image (light) to reach and be “seen” and recorded by the two different cameras is relatively insignificant.

In this same scenario, the speed of sound is very significant and, as a result, what you hear from two separate recordings of the same incident will often sound significantly different. The speed of sound is variable, but most agree it is approximately 1,125 feet per second. This fact, when taken into consideration, can provide significant additional evidence, including things like the timing of the shots fired, the position of the weapons in relation to the recordings, and possibly movements of the shooters. However, if the speed of sound is ignored, in many cases improper assumptions will be reached if the sound heard is assumed to be occurring at the same exact moment in time as the frame of video it is synced with.
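The difference is easy to put in numbers. The sketch below uses the approximate speeds quoted in this article and a half-mile distance; the constant and function names are hypothetical, and the speed of sound in particular varies with conditions, so the results are estimates only.

```python
SPEED_OF_SOUND_FT_S = 1125.0     # approximate; varies with conditions
SPEED_OF_LIGHT_MI_S = 186_000.0  # approximate
FEET_PER_MILE = 5280.0


def sound_delay_s(distance_ft):
    """Seconds for sound to travel distance_ft."""
    return distance_ft / SPEED_OF_SOUND_FT_S


def light_delay_s(distance_ft):
    """Seconds for light to travel distance_ft."""
    return distance_ft / (SPEED_OF_LIGHT_MI_S * FEET_PER_MILE)


def delay_in_frames(delay_s, fps=30.0):
    """Express a delay as a number of video frames at the given rate."""
    return delay_s * fps


half_mile_ft = 0.5 * FEET_PER_MILE
sound_s = sound_delay_s(half_mile_ft)
light_s = light_delay_s(half_mile_ft)
print(f"Sound: {sound_s:.2f} s, about {delay_in_frames(sound_s):.0f} frames at 30 fps")
print(f"Light: {light_s * 1e6:.1f} microseconds, far less than one frame")
```

At half a mile the sound arrives more than two seconds after the image, so a gunshot heard on a distant recording can trail the matching frame by dozens of frames.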

I was asked to assist in an OIS where an assumption was made based on the single BWC of the involved officer. The officer stated the suspect fired three shots before the officer returned fire. However, the audio on his BWC recorded two distant shots followed by what was clearly the officer’s shot. I located additional footage from officers in the surrounding area and, after syncing the recordings, discovered three distinct shots followed by a fourth, as described by the involved officer.

Sound waves of two body-worn camera audio tracks of the same moment in time recorded approximately 400 feet apart. Image: Author / Tucson Police Department

In addition, the amplitude of the sound was also consistent with the positions of all involved. The involved officer’s shot is obviously loudest on his microphone, whereas his shot is quieter on the surrounding officers’ microphones because they were closer to the suspect.

Map demonstrating the relationship between the positions of two recordings and how, over time, the sound waves of two separate shots move toward the two microphones. Illustration: Author

If you are like me, you are probably asking yourself, Who cares if the officer fired after the second round was fired by the suspect versus the third round? It was still justified. I agree, but this is about the collection of evidence and what an audio track can provide to enhance your analysis. What if the involved officer fired after only the first shot by the suspect? What if (as in this incident) the involved officer’s audio covered up the suspect’s shot? What if the investigators could not find the projectile fired by the suspect? What if the suspect denied ever firing a shot? When we perform these investigations the same way every time and consider all the available evidence, even in the cases where it “probably doesn’t matter,” we are preparing ourselves for the time it does.

So why are we getting so much evidence out of an audio track? It is all about data over time. It is not uncommon for a single frame of video recorded in high definition to contain more data than the entire audio track in a short video clip. But, over time, the audio provides thousands more samples of what is occurring.

As I discussed earlier, a video recorded at 30 frames per second is only providing 30 samples of moments in time for that single second. If any part of your event—or, more specifically, the evidence you are looking for—occurs when the camera is not sampling, you miss it. The average audio track records at almost 1,500 times that rate or 44,100 samples per second. As a result, in the average shooting which takes place in literal seconds, the audio is truly worth tens of thousands of words.
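The comparison of sampling rates can be checked with a short calculation. This sketch assumes the 30 frames per second and 44,100 samples per second quoted above and a 1-millisecond event such as a muzzle flash; the names are hypothetical.

```python
AUDIO_RATE_HZ = 44_100  # audio samples per second
VIDEO_FPS = 30          # video frames per second


def samples_during(duration_ms, rate_hz):
    """Average number of samples taken during duration_ms at rate_hz."""
    return duration_ms / 1000.0 * rate_hz


ratio = AUDIO_RATE_HZ / VIDEO_FPS
audio_samples = samples_during(1.0, AUDIO_RATE_HZ)
video_samples = samples_during(1.0, VIDEO_FPS)
print(f"Audio samples per video frame: {ratio:.0f}")
print(f"Audio samples in a 1 ms event: {audio_samples:.1f}")
print(f"Video frames in a 1 ms event:  {video_samples:.2f}")
```

A 1-millisecond event is covered by dozens of audio samples but, on average, by only a small fraction of a single video frame.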

If you were curious, here's the answer to the quiz:
A = bubble wrap
B = car exhaust
C = punching bag
D = drums
E = gunshots

About the Author
Sgt. Christopher Andreacola is a 34-year veteran of the Tucson Police Department and currently heads their Video Analysis and Management Unit. He is a Certified Forensic Video Technician with LEVA and teaches nationally on the use of in-car and body worn cameras and analyzing video from officer involved shootings.
