Machine Learning and Artificial Intelligence Increase Efficacy of Criminal Investigations
Written by KRISTI MAYO   

TRADITIONALLY, DETECTIVES SEEKING TO SOLVE CRIMES gather physical evidence such as fingerprints, drugs, and firearms. Cases can take weeks, months, even years to solve—or go unsolved altogether—not because detectives lack the skills, but simply because they lack the resources to gather the necessary information to solve crimes.

New developments in technology have delivered a radical upgrade for criminal investigation, giving investigators tools to rapidly search online for information needed for an investigation—information that would be time-consuming or even impossible to find by traditional methods.

At the same time, advances in technology have opened up new territory for the criminal element to exploit, and ways to hide illegal activity out of sight of the law. More than ever before, it’s a cat-and-mouse game.

Police investigating criminal activity online face myriad challenges. When criminals commit offenses, communicate with each other, or move information or potential evidence online, they do not operate in a user-friendly place like a website with a username and password. Rather, they move deep into another world—a world that is accessible to investigators with the right tools, but which proves a nightmare for the uninitiated trying to navigate and locate the evidence needed to solve crimes.

[Figure: A criminal network analysis within the Cobwebs Technologies platform]

Challenges in the online world
Three sectors contribute to the complexity of criminal investigations online:

The surface web is the part of the internet most of us visit every day. We use a search engine such as Google or Bing to find data, read news and mail, shop, and research. Information on the surface web is indexed by search engines and openly accessible.

The deep web sits beneath the surface web. Information here is not indexed and cannot be found by standard search engines. Most of it is perfectly above board, although it can contain illegal content. Deep web pages generally require usernames and passwords for access. Typical examples include subscription information, bank accounts, corporate intranet content, and personal medical records.

Note that this data is not completely inaccessible: a search engine such as Google can sometimes surface a headline or short snippet from a deep web page in its results, even though the full content stays behind a login. Notably, much of the data on the deep web is of great interest to criminals.

Much of the material on the dark web is intentionally hidden and potentially illegal. However, not all of it is nefarious: many individuals and organizations use the dark web for beneficial purposes, such as journalists soliciting information from confidential sources, or communication in politically oppressed regions. Stored data is encrypted, and most dark web pages are hosted anonymously.

The distinguishing feature of the dark web is that it cannot be accessed with the same browsers used for the surface or deep web: it requires the Tor Browser or similar anonymizing software, which is freely available. Data posted here is not indexed, so searching is complicated and challenging.

[Figure: Dark web threat actor analysis]

Criminals frequent all parts of the cyber world. Some use social media accounts (which are deep web, because they are password protected) to communicate with each other, issue threats, distribute propaganda to followers, or research potential threat actors. Some move to the dark web to congregate, communicate, and collaborate anonymously and to trade and share stolen assets and information. When an investigation begins online, it invariably leads to the dark and deep webs.

The surface web as we know it is commonly estimated to hold only around 4% of all online information. By that estimate, the other 96% is found on the deep and dark webs, and this is where investigators conduct the bulk of their detective work. To grasp the astronomical amount of data investigators contend with, consider that the World Economic Forum estimates that by 2025, 463 exabytes of data will be created each day globally, which it equates to 212,765,957 DVDs per day.
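To put the daily volume quoted above in perspective, the conversion can be worked through directly. The sketch below is illustrative arithmetic only. It assumes decimal units (1 exabyte = 10^18 bytes) and a single-layer DVD capacity of 4.7 GB; neither assumption comes from the article. Under those assumptions, 463 exabytes comes to roughly 98.5 billion discs, and the quoted figure of 212,765,957 DVDs would imply a capacity of a little over 2 terabytes per disc.

```python
# Illustrative arithmetic only. The unit sizes and the DVD capacity are
# assumptions (decimal units, single-layer 4.7 GB disc), not article data.
EXABYTE = 10**18
GIGABYTE = 10**9

DAILY_BYTES = 463 * EXABYTE        # daily volume quoted in the article
DVD_BYTES = 47 * GIGABYTE // 10    # assumed 4.7 GB single-layer DVD
QUOTED_DVDS = 212_765_957          # DVD count quoted in the article


def discs_needed(total_bytes, disc_bytes):
    """Number of discs needed to hold total_bytes, rounded up."""
    return -(-total_bytes // disc_bytes)


def implied_disc_bytes(total_bytes, disc_count):
    """Per-disc capacity implied by a given total and disc count."""
    return total_bytes / disc_count


if __name__ == "__main__":
    print(f"Discs at 4.7 GB each: {discs_needed(DAILY_BYTES, DVD_BYTES):,}")
    per_disc = implied_disc_bytes(DAILY_BYTES, QUOTED_DVDS)
    print(f"Capacity implied by the quoted count: {per_disc / 10**12:.2f} TB")
```

Whichever disc size is assumed, the point stands: the volume is far beyond anything a team could review by hand.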

Law enforcement challenges
Investigating a case requires evidence in many forms, including digital formats. Given the sheer volume of unstructured online data, investigators have their work cut out for them.

Law enforcement agencies face a multitude of challenges online. They have to figure out how to find evidence on the deep, dark, and surface webs, and how to navigate between structured and unstructured data. They need access to the correct browsers, plus they must understand how the deep and dark webs function. Moreover, they need to know which search terms to utilize: names, IP addresses, keywords, hashtags, locations, websites, social media platforms, batches of numbers including telephone numbers, and cryptocurrency wallets, among others.
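As a rough illustration of what pulling search terms out of unstructured text can look like, the sketch below uses simple regular expressions to pick out a few of the selector types listed above. Everything in it is hypothetical: the pattern set, the `extract_selectors` helper, and the sample post were written for this example and do not describe any particular product. The patterns are deliberately simplified. For instance, they do not check that an IP address is in range, and they do not verify the checksum of a wallet address.

```python
import re

# Hypothetical, simplified patterns for illustration only. Real selector
# extraction needs stricter validation than these expressions provide.
PATTERNS = {
    # Four dot-separated groups of 1-3 digits (octet range is not checked).
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # A '#' that does not follow a word character, then word characters.
    "hashtag": re.compile(r"(?<!\w)#\w+"),
    # One narrow phone layout: optional '+', country code, 3-3-4 digits.
    "phone": re.compile(r"(?<!\w)\+?\d{1,3}[ -]\d{3}[ -]\d{3}[ -]\d{4}\b"),
    # Strings shaped like legacy Bitcoin addresses (no checksum check).
    "btc_wallet": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
}


def extract_selectors(text):
    """Return the unique matches for each selector type, sorted."""
    return {
        name: sorted(set(pattern.findall(text)))
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    # Invented sample post; the address and numbers are not real.
    post = (
        "Meet at #dockside tonight, call +1 555-010-9999 or ping "
        "192.0.2.44. Pay to 1FakeWa11etAddressForDemoXyz123456"
    )
    for kind, values in extract_selectors(post).items():
        print(kind, values)
```

A production tool would go further, for example by normalizing phone formats and validating addresses, but the principle is the same: turn free text into discrete terms that can be searched and cross-referenced.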

Information found online should not be confused with data that could serve as evidence or even be considered admissible as evidence. Role players must be identified together with their associates (Social Network Analysis), their connections verified, and their part in the crime firmly established. Threats and risks also need to be worked out. The investigators must know how to take the information found and use it to support a subpoena, if relevant to the case.
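The social network analysis step mentioned above can be pictured as a graph problem: accounts are nodes, observed links between them are edges, and shared neighbors point to common associates. The following is a minimal sketch under that assumption, not a description of how any specific platform works. The account names, the edge list, and the helper functions are all invented for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def common_connections(graph, persons):
    """Return nodes linked to every person in `persons`, excluding them."""
    persons = list(persons)
    if not persons:
        return set()
    shared = set.intersection(*(set(graph.get(p, set())) for p in persons))
    return shared - set(persons)


def degree_ranking(graph):
    """Rank nodes by number of connections (ties broken alphabetically)."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))


if __name__ == "__main__":
    # Hypothetical links between invented accounts.
    edges = [
        ("suspect_a", "account_x"),
        ("suspect_b", "account_x"),
        ("suspect_a", "account_y"),
        ("suspect_b", "account_z"),
        ("account_x", "account_z"),
    ]
    graph = build_graph(edges)
    print("Shared contacts:", common_connections(graph, ["suspect_a", "suspect_b"]))
    print("Most connected first:", degree_ranking(graph))
```

A shared contact or a highly connected node is only a lead. As the text notes, each connection still has to be verified before it can support a case.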

Once information is found, investigators then have to move data through the intelligence cycle and on to the due diligence and evidence vetting processes. Evidence needs to be confirmed and validated, and irrelevant information discarded. With online evidence in hand, detectives then start to build a case for prosecution.

Clearly, with so much complexity, so much data, and so much darkness on the web, there is no room for error. Missing a detail could mean a case goes unsolved, and a felon walks free.

So, what is the solution?

The value of machine learning and artificial intelligence
The leading solutions for law enforcement investigators working online utilize machine learning (ML) and artificial intelligence (AI). ML and AI working in tandem provide law enforcement with an automated methodology that can search the surface, deep, and dark webs to pinpoint illegal activities and help bring malicious actors to justice. Together, ML and AI rapidly make sense of the vast amounts of unstructured and uncategorized data on the web. ML, a branch of AI, gives software the ability to learn patterns from data rather than rely only on hand-written rules. The broader AI layer uses what has been learned to help the technology decide what is relevant.

It is essential for law enforcement purposes that ML algorithms be fine-tuned to analyze, label, and sort Big Data. The AI component must be able to identify and extricate the relevant intelligence for investigators using a number of different procedures and capabilities.

Without these capabilities, investigators working on a crime committed from the deep or dark web—or those working on a real-world crime with components of the crime posted online—would have to manually gather data from individual online sources, analyze it, and then put all the puzzle pieces together to see the crime in its entirety. The risk with this time-consuming and cumbersome method is that key parts of the crime may be missed or overlooked, which could negatively impact the overall investigation.

Open-source intelligence
With an open-source intelligence solution (one that draws on data from publicly available sources) built on ML and AI capabilities, the investigative team has the ability to look for any and all evidence they may need in the online world, starting on the surface web, drilling down to the deep web, and going all the way into the dark web, often looking for, and finding, a needle in a haystack.

An open-source intelligence solution delivers value to police and law enforcement investigators by mitigating their investigative challenges and expanding their scope. It can search for a wide array of terms from publicly available information, and it can initiate a deep search through a variety of open-source information on all layers of the web and across all social media platforms and blogs.

When the solution finds the data it is searching for, it collects, analyzes, and presents it to investigators in an easy-to-read, easy-to-interpret format. Investigators can then analyze the data presented and identify potential threat actors or new threats. They can prevent harmful incidents from occurring in the first place, support the evidence confirmation and validation procedure, and rule out irrelevant material.

Open-source intelligence solutions are not complicated to use; they require only minimal training. They increase the investigative team’s speed, accuracy, and capabilities, and reduce the actual cost of an investigation.

Used effectively, an open-source intelligence solution powered by ML and AI can both predict a potential crime and direct investigators to the person or persons involved. The enormous value delivered by the ML and AI components can help overcome human limitations of research and analysis and connect small pieces of data that may seem irrelevant but are in fact pertinent to the case.

Examples from the field
Cobwebs Technologies provides an ML and AI web intelligence solution, trains investigators to use it, and provides after-sales support to all clients. In addition, under certain circumstances, Cobwebs provides expert analyst support to help investigators find the relevant data for scrutiny before it can be considered as evidence. The following two cases detail actual events. Identifying details have been omitted so as not to compromise the investigations.

Active shooter threats via social media
A state law enforcement agency asked Cobwebs to help with a case involving an individual who used an open social media platform to issue a threat about a mass shooting at a particular event. The online threat was accompanied by a picture of the person in makeup holding what appeared to be a real firearm. This individual went on to make several additional threatening comments in other social posts.

This particular case was extremely urgent because the law enforcement agency discovered the tip on the very same day that the event under threat would be held. The law enforcement agency could not determine any additional information from the postings, so they approached a third party for help. When that party could offer no additional insights, the law enforcement agency was referred to Cobwebs.

Using its open-source intelligence capabilities, Cobwebs identified the threat actor’s social network and found multiple accounts for the same individual. They also uncovered additional photos the threat actor posted of firearms, including one in which the serial number was identifiable. This information was collected and turned over to the investigating agency for evidence determination.

In addition, Cobwebs discovered a post where the threat actor made a public comment that he was registering his firearm. This information was immediately traceable. Following a detailed cross-analysis of his many social media accounts, one particular post led to a photo which clearly showed his face, as well as some flyers for discussions he hosted listing his full name.

Law enforcement identified the threat actor and made contact. The outcome of this case has not yet been disclosed.

Human trafficking
A local law enforcement agency, as part of a larger task force, was investigating a human trafficking case. The investigating agency had already identified one potential victim as well as two alleged traffickers. Searching for more leads, the investigative agency contacted a specialized non-profit organization that referred the local law enforcement agency to Cobwebs.

Cobwebs assisted in identifying the human trafficking network, all connections between the role players, and other possible threat actors and victims. With the known profile information provided, Cobwebs used ML and AI to initiate a full open-source intelligence sweep of the entire online environment using specific search terms. In a very short time, a much larger human trafficking network was discovered and identified. Using this network as a starting point, Cobwebs found a number of common connections and then identified an established core network among all parties involved.

During this investigative process, Cobwebs identified five additional online profiles belonging to the original victim. This in turn led to the identification of several other potential victims, as well as another likely threat actor. The information was then passed on to the investigating agency for evidence determination.

With the information provided, the law enforcement agency concerned was able to expand its investigation well beyond the originally identified suspects. The outcome of this case has not yet been disclosed.

ML and AI are core components that assist investigators in finding evidence in a criminal case and helping investigative teams achieve successful arrests and prosecutions. Without access to ML and AI within a larger web intelligence solution, criminal investigators would find it nearly impossible to fully investigate a case involving online aspects, case-resolution rates would decrease, public confidence in law enforcement would suffer, and crime rates could increase. ML and AI significantly advance the identification, confirmation, and utilization of online evidence and the prosecution of threat actors for law enforcement agencies.

This article appeared in the March-April 2020 issue of Evidence Technology Magazine.

About the Author
Johnmichael O’Hare is the business development and sales director at Cobwebs Technologies. He is the former Commander of the Vice, Intelligence, and Narcotics Division for the Hartford (Connecticut) Police Department. Prior to that he was the Project Developer for the City of Hartford's Capital City Command Center (C4), a Real Time Crime Center (RTCC) that reaches throughout Hartford County and beyond. C4 provided real-time and investigative back support for local, state, and federal law enforcement partners utilizing multiple layers of forensic tools, coupled with data resources, and real-time intelligence.

