What's in an App?
Written by Christa Miller, Ronen Engler, Gilad Sahar   

AS OF 2012, Nielsen reported that the average smartphone user had approximately 41 apps installed on a single device. But perform a Google search for “how many apps are installed on average” and you will find users who say they keep anywhere from 50 to more than 100 apps on their iPhones and Androids.

What apps are people installing, and what are they using them for? Many use apps to find movie times and tickets, cheap gas, the latest stock quotes, the day’s weather forecast, and other day-to-day information. Others use them for mobile messaging, personal navigation, social media, and managing personal productivity and finances. Consider:

  • Mobile messaging apps have become so popular that they are actually undercutting wireless carriers’ paid text-messaging services. In April, in fact, CNET reported that chat apps had surpassed carrier SMS services in popularity.
  • Turn-by-turn navigation apps reduce the need for users to purchase additional GPS devices.
  • Social media apps keep users connected to a wide network of friends, relatives, locations, and activities.
  • Personal finance apps can include banking, mobile payments, and day trading.

In other words, mobile app data can reflect nearly every aspect of a user’s life. The data stored within mobile apps can provide needed evidence, and context for that evidence, in both criminal and internal investigations.

Extracting mobile app data

In general, there are three methods for extracting data from mobile devices.

Logical extractions are the most basic type of extraction. Here, the mobile forensic tool uses different communication protocols, such as AT/AT+ and/or OBEX commands, to connect to the phone’s software via the device manufacturer’s application programming interface (API). Normally, these commands are used to synchronize the device’s contents. With a forensic logical extraction, however, the forensic tool uses the protocols (typically provided by the manufacturer) to request content from the device.

What content is returned depends on which protocols the phone responds to, and is therefore limited to what the device manufacturer supports. For example, a logical extraction may return each SMS or image, parsed and decoded, one by one. However, it may not return all of the SMS or pictures. If the user was using a third-party text-messaging or camera app, the messages and images may be stored in that app’s own folder rather than the phone’s default location, outside the reach of the manufacturer’s protocols.

The situation is similar for locations. With a logical extraction, locations are derived from the metadata within extracted media files (images and videos). However, because additional locations may be found within GPS or social networking apps, locations obtained from a logical extraction may present only a limited view of the device user’s activities.

A good analogy for a logical extraction is a fast-food restaurant. You pull up to the drive-through window and you have a menu of what you can select. If you try to order something that is not on the menu, the person taking the order will not understand, and you will leave hungry.

Even so, the advantage to a logical extraction is speed. Even from a smartphone, depending on the device’s make and model and the amount of data it stores, a logical extraction usually takes less than 20 minutes.

Device manufacturers do not provide communications protocols to retrieve data from third-party apps, that is, apps created independently of the manufacturer. However, iOS and Android apps commonly store their data in SQLite databases within the device’s file system. Access to these databases can recover both existing and deleted data, including deleted databases and deleted entries within a database.

This means that there is significant evidentiary value in being able to view tables and content, and search the data. Here, a type of smartphone logical extraction known as file system extraction can be useful. Rather than targeting specific files, the file system extraction copies entire file directories—folders and files—including app data.
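The following is a minimal Python sketch, using only the standard `sqlite3` module, of why deleted entries can survive inside an app database. It creates a small hypothetical `messages` table, deletes one row, and then searches the raw database file for the deleted text. It assumes SQLite’s `secure_delete` setting is off (the sketch sets it explicitly); with that setting on, SQLite overwrites freed record space, and real apps and platforms may configure it either way.

```python
import os
import sqlite3
import tempfile


def deleted_text_survives(message: str) -> tuple[bool, bool]:
    """Insert and then delete a row; report (visible_via_sql, present_in_raw_file)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "chat.db")  # hypothetical app database
        conn = sqlite3.connect(path)
        # Assumption: freed record space is NOT zeroed on delete.
        conn.execute("PRAGMA secure_delete = OFF")
        conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO messages (body) VALUES (?)", ("hello",))
        conn.execute("INSERT INTO messages (body) VALUES (?)", (message,))
        conn.commit()
        conn.execute("DELETE FROM messages WHERE body = ?", (message,))
        conn.commit()
        rows = conn.execute("SELECT body FROM messages").fetchall()
        conn.close()

        # Read the database file as raw bytes, as an examiner's copy would be.
        with open(path, "rb") as f:
            raw = f.read()

    visible = any(body == message for (body,) in rows)
    return visible, message.encode("utf-8") in raw


if __name__ == "__main__":
    print(deleted_text_survives("meet at the dock at 9pm"))
```

The row no longer appears in query results, but its bytes are still in the file. Real recovery tools parse SQLite free space, freelist pages, and journal or WAL files rather than searching for a string already known; the search here only demonstrates that the data persists.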

File system extractions can provide a wealth of deleted data that is often enough to build a case. However, they do not access unallocated space. To get data in that space, a bit-for-bit copy—physical extraction—is necessary. Physical extraction accesses all data on the device, regardless of where it is or was stored.

How is this possible? Just as a computer does, a mobile device saves files by physically writing to the memory chip. A file table records where each file is actually located. Whenever you open a file, the device consults the file table to find where to read the data.

As on a computer, deleting a file doesn’t really delete the file’s data, only the record of where the file is located. Once this record is deleted, the device marks that location as free (unallocated), even though the data remains. Unless the data has been overwritten by a new file, reconstructing it should be possible. Even if it has been partly overwritten, the surviving fragments can often be recovered by carving.
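Carving can be sketched as a signature search over raw data: scan for a known file header, then for the matching footer, and cut out everything in between. The minimal Python sketch below does this for JPEG images using the standard start-of-image (`FF D8`, followed by the `FF` of the next marker) and end-of-image (`FF D9`) markers. It assumes the image is stored contiguously; the 10 MB size cap is an arbitrary assumption.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus first byte of the next marker
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(raw: bytes, max_size: int = 10 * 1024 * 1024) -> list[tuple[int, bytes]]:
    """Return (offset, data) for each SOI ... EOI run found in raw bytes."""
    results = []
    start = raw.find(JPEG_SOI)
    while start != -1:
        end = raw.find(JPEG_EOI, start + len(JPEG_SOI))
        if end == -1:
            break  # a header with no footer after it; nothing more to carve
        end += len(JPEG_EOI)
        if end - start <= max_size:
            results.append((start, raw[start:end]))
            start = raw.find(JPEG_SOI, end)        # continue after this candidate
        else:
            start = raw.find(JPEG_SOI, start + 1)  # too large: try the next header
    return results


if __name__ == "__main__":
    unallocated = b"\x00" * 10 + b"\xff\xd8\xff\xe0JFIFDATA\xff\xd9" + b"\x00" * 5
    for offset, data in carve_jpegs(unallocated):
        print(offset, len(data))
```

Because the search stops at the first end marker, a JPEG containing an embedded thumbnail is cut short, and fragmented files are missed entirely; production carvers validate the recovered structure and handle fragmentation.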

To make the bit-for-bit copy, the mobile forensics tool cannot simply request data from the phone, as it would with logical or file system extractions. With a physical extraction, the mobile forensics process accesses the binary data on the device chipset by acting as its operating system. In other words, rather than allow the device to boot normally, the mobile forensic tool loads the same type of boot loader that the device’s own operating system uses to start.

A forensic boot loader, basically a “micro operating system,” accomplishes several things. First, it protects the evidence: because the phone is not booting normally, it is not accessing its network, so no one can wipe it remotely. Second, the boot loader does not write to the user partition, which protects that data from being inadvertently overwritten or otherwise modified by the examiner. Third, the boot loader enables the forensic examiner to bypass any user locks that are in place.

This, however, presents a new challenge: decoding. Flash file systems are designed to minimize erase cycles, so deleted information often remains in the device’s memory rather than being erased. Accessing this information in its binary form means it must be decoded, which in turn means reconstructing the file system and interpreting it properly in order to parse the information.

Decoding support is often just as important as extraction support in a mobile forensics tool. Automated decoding saves forensic examiners considerable time, yet it is one of the most challenging levels of support a mobile forensic vendor can offer. That is because, unlike the world of PCs, where only a handful of file systems dominate, more than 140 file systems currently exist for mobile devices, and each may be used across a variety of operating systems and OS versions. Because the first step in physical decoding is file system reconstruction, taking the binary and reconstructing the files is a huge undertaking.

Once the flash file system has been reconstructed, locations, Bluetooth devices, device information, cookies, installed apps, web history, and other content can be decoded and parsed out of the device’s databases. The mobile forensics tool should then make the raw data available in both binary and hexadecimal format, highlighting the various pieces of user content and metadata so you can see exactly where they were decoded from, including the database in which they are located.

The reason is that app data contains not only information that the user intentionally placed. It often also contains “metadata” about that information. A chat or instant-message session between two or more people has important date and time stamp data attached to it, as do images or video taken from within an app.
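To make this concrete, here is a minimal Python sketch of two pieces of that process: a hex view that marks the bytes a decoded field came from, and conversions for two timestamp formats commonly found in app data. The record layout (a 4-byte label followed by an 8-byte big-endian timestamp) is hypothetical, and which epoch a given app uses must be confirmed by testing: many Android apps store milliseconds since the Unix epoch (1970-01-01 UTC), while Apple’s Cocoa frameworks count seconds from 2001-01-01 UTC.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)  # Cocoa reference date


def from_unix_ms(ms: int) -> datetime:
    """Milliseconds since 1970-01-01 UTC, common in Android app databases."""
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def from_apple_seconds(seconds: float) -> datetime:
    """Seconds since 2001-01-01 UTC, used by Apple's Cocoa/Core Data dates."""
    return APPLE_EPOCH + timedelta(seconds=seconds)


def hex_view(data: bytes, highlight: range = range(0), width: int = 16) -> str:
    """Offset, hex bytes, and printable ASCII; bytes in `highlight` are bracketed."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        cells = []
        for i, byte in enumerate(chunk):
            cell = f"{byte:02x}"
            if offset + i in highlight:
                cell = f"[{cell}]"  # mark bytes belonging to the decoded field
            cells.append(cell)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {' '.join(cells)}  {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record: 4-byte label, then a big-endian 64-bit Unix-ms timestamp.
    record = b"msg:" + struct.pack(">q", 1356998400000)
    (ms,) = struct.unpack_from(">q", record, 4)
    print(hex_view(record, highlight=range(4, 12)))
    print("decoded:", from_unix_ms(ms).isoformat())
```

Keeping the decoded value and its source bytes side by side is what lets an examiner show, and later testify to, exactly where a timestamp came from.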

Although modern mobile forensic tools do a good job of supporting the majority of mobile devices on the market, forensic examiners still encounter devices that no automated tool can support, or that are only partially supported (that is, only some data is retrievable using the usual extraction methods).

An unlocked device that contains plenty of existing evidence can simply be photographed, screen by screen. However, devices that contain crucial hidden or deleted data require deeper forms of physical extraction known as JTAG and chip-off methods. These methods take advantage of a process called “wear leveling,” an automatic internal mechanism in which the memory chip’s controller spreads data across different areas of the chip rather than rewriting the same area repeatedly.

This prevents any one area from wearing out, and thus makes the memory last longer. As a side effect, older copies of data can remain in areas the device is no longer using. Because a chip-off extraction reads data directly off the memory chip, it enables the examiner to find more of these data fragments to reconstruct. Those fragments can then be brought into certain advanced mobile forensics tools for decoding and parsing.
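The effect of wear leveling on evidence can be illustrated with a deliberately simplified Python model of a flash translation layer. It is a toy under stated assumptions, not how any particular controller works: every write goes to a fresh physical page, and there is no garbage collection, whereas real controllers eventually erase stale blocks. What it shows is that the operating system sees only the current version of a block, while a read of the raw chip can still find older versions.

```python
class ToyFlash:
    """Toy flash translation layer: each write lands on a fresh physical page."""

    def __init__(self, num_pages: int):
        self.physical = [None] * num_pages  # raw pages, as a chip-off read would see them
        self.mapping = {}                   # logical block -> current physical page
        self.next_free = 0

    def write(self, logical: int, data: bytes) -> None:
        if self.next_free >= len(self.physical):
            raise RuntimeError("toy model has no garbage collection; out of pages")
        # Wear leveling: use the next unused page instead of rewriting in place.
        self.physical[self.next_free] = (logical, data)
        # The previous page for this block becomes stale but is not erased.
        self.mapping[logical] = self.next_free
        self.next_free += 1

    def read(self, logical: int) -> bytes:
        """What the operating system sees: only the current version."""
        return self.physical[self.mapping[logical]][1]

    def chip_off_dump(self, logical: int) -> list[bytes]:
        """Every version of a logical block still present on the raw chip."""
        return [entry[1] for entry in self.physical
                if entry is not None and entry[0] == logical]


if __name__ == "__main__":
    flash = ToyFlash(8)
    for version in (b"draft 1", b"draft 2", b"final"):
        flash.write(0, version)
    print("OS view:", flash.read(0))
    print("chip-off view:", flash.chip_off_dump(0))
```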

Decoding third-party apps

Hundreds of thousands of apps exist, thanks to the easy availability of APIs and Android’s open-source architecture. Therefore, no mobile forensic tool can extract and decode all apps. New apps, and updated versions of existing apps, are added to the respective marketplaces too frequently for complete forensic tool support to be a realistic expectation.

Vendors already support many dozens of apps. By tracking apps’ popularity in Apple’s App Store and Google Play, vendors can focus on the apps examiners most need supported. Customer feedback also plays into app support development.

Even after a vendor supports an app, however, several obstacles exist. App support usually depends on the app’s version and the operating system on which it runs. Thus, when the app or the OS is updated, the vendor must revise its decoding methods to ensure that the data is being parsed properly.

Additionally, decoding can be limited by how an application is designed to store its data on the phone. For instance, the TigerText for iPhone and WhatsApp for BlackBerry apps store their contact information in an encrypted SQLite database on the phone. Other decoding challenges include nonstandard data compression methods and the use of proprietary file formats, rather than SQLite databases, to store data.

That is why it is important to open an extracted database within the mobile forensics tool that was programmed to decrypt or decode its contents. Opening the database outside of the right tool may result in gibberish. However, opening the database within the right tool presents you with the decoded content.

(Even so, this is not always the case. Sometimes you will be able to extract the data but not decrypt it. Other times, as with the iPhone 4S and 5 models and the iPad 3 at the time of writing, not even physical extraction is possible.)
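Before opening an extracted file in any tool, a quick check of what kind of file it is can save time. The minimal Python sketch below tests for the 16-byte header that an unencrypted SQLite 3 database begins with (`SQLite format 3` followed by a null byte) and otherwise measures Shannon entropy. The 7.5 bits-per-byte threshold is an assumption, and high entropy alone cannot distinguish encryption from compression; it only flags files that a generic viewer will likely show as gibberish.

```python
import math
from collections import Counter

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of a standard SQLite 3 file


def shannon_entropy(data: bytes) -> float:
    """Bits per byte: near 0 for repetitive data, near 8 for random-looking data."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def classify(data: bytes, threshold: float = 7.5) -> str:
    """Rough triage of an extracted file's contents (threshold is an assumption)."""
    if data.startswith(SQLITE_MAGIC):
        return "SQLite header present"
    if shannon_entropy(data[:65536]) >= threshold:
        return "high entropy: possibly encrypted or compressed"
    return "unrecognized format"


if __name__ == "__main__":
    for sample in (SQLITE_MAGIC + bytes(100), bytes(range(256)) * 16, b"hello " * 50):
        print(classify(sample))
```

A file that fails both checks may still be a proprietary format worth examining by hand; a file with a SQLite header may still hold encrypted fields inside its tables.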

iOS apps are a good example. Passwords used within apps are stored in the encrypted and protected keychain, a vault that holds passwords for a variety of services: social media accounts, WiFi connections, and many others. To obtain user data and content from these services, it is necessary to decrypt the keychain.

In other cases, iOS apps use the encryption mechanism the device provides, with keys from the keychain, to protect their own password storage databases.

Some apps may be built to be anti-forensic. For instance, Snapchat was developed to let users view an image for just a few seconds before it is discarded. On many devices, the app keeps the image only in RAM, not in the cache or temp folder, so there is no data to recover.

However, this is not an ironclad rule. Some versions of Snapchat, installed on some devices, do store images in the cache or on the SD card, either of which can be extracted. And even this can change with newer app versions.

What do you do if your suspect or victim is using an app or app version with limited or no support? The first step is to ask your vendors of choice if they are planning to support the app in the near future.

“Do-it-yourself” app support is also possible through manual hex searching or Python scripting. With Python, you, or an examiner you know to be skilled in it, can develop plugins to support the apps you encounter. Depending on the script’s complexity and the coder’s experience, this process could take minutes, hours, days, or weeks. Some scripts work on low-level bytes; others work on data that has already been parsed or decoded.
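A minimal sketch of such a plugin in Python, using the standard `sqlite3` module, might look like the following. The app, its `messages` table, and the column names `sender`, `body`, and `sent_at` (assumed to hold milliseconds since the Unix epoch) are all hypothetical; a real plugin has to be written against the schema actually observed in the extracted database and confirmed by testing on a known device.

```python
import sqlite3
from datetime import datetime, timezone


def parse_chat_db(conn: sqlite3.Connection) -> list[dict]:
    """Hypothetical plugin: read messages in time order, with UTC timestamps."""
    rows = conn.execute(
        "SELECT sender, body, sent_at FROM messages ORDER BY sent_at"
    ).fetchall()
    return [
        {
            "sender": sender,
            "body": body,
            # Assumed format: milliseconds since 1970-01-01 UTC.
            "sent_utc": datetime.fromtimestamp(sent_at / 1000, tz=timezone.utc).isoformat(),
        }
        for sender, body, sent_at in rows
    ]


if __name__ == "__main__":
    # Stand-in for an extracted database. In practice, work on a copy and open it
    # read-only, e.g. sqlite3.connect("file:copy.db?mode=ro", uri=True).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (sender TEXT, body TEXT, sent_at INTEGER)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [("bob", "second", 1356998460000), ("alice", "first", 1356998400000)],
    )
    for record in parse_chat_db(conn):
        print(record)
```

Keeping the plugin to a single query plus a conversion step makes it easy to revise when an app update changes the schema.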

Smartphones, and the apps they almost always harbor, need not be difficult to examine or understand. Training can help develop a basic understanding of how apps store data. From there, it is incumbent on the examiner to build a deeper understanding of apps. You may choose to test mobile forensics tools on the same version of an app, installed on the same make and model as your evidence device. Or you might choose to research apps and smartphones apart from the cases you are working.

Either way, understanding the basics of what you can and cannot expect to retrieve will help build your expertise—and your credibility in court should you ever need to testify.

Sidebar:
Extraction Speeds

It is important to remember that mobile devices were not designed with forensic extractions in mind. In general, logical extraction takes the shortest amount of time (but gets the least amount of data) and physical extraction, the longest.

Logical and file system extraction speeds depend on how much user data is stored on the device, while physical extraction speed depends on the capacity of the memory chip itself. All three also depend on the phone and the speed at which it can communicate with the mobile forensic tool. The time frame thus varies from device to device.

For example, consider a 16 GB iPhone 4 with only 2 GB of user data on it. A logical extraction retrieves about 1 GB of data within a few minutes. A file system extraction retrieves all 2 GB, still within a few minutes. A physical extraction produces a 16 GB image, even though there is only 2 GB of user data, because it is a bit-for-bit copy; this process can take 20 to 30 minutes.
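The sidebar’s example also allows a rough back-of-the-envelope calculation. If a 16 GB physical image takes 20 to 30 minutes, the implied transfer rate is roughly 9 to 13 MB/s (using decimal units, 1 GB = 1,000 MB). The short Python sketch below computes that rate and, assuming the same rate holds for a larger chip on the same device model (an extrapolation, not a measurement), estimates how long a physical extraction of a 64 GB chip might take.

```python
def implied_rate_mb_s(size_gb: float, minutes: float) -> float:
    """Average transfer rate implied by an observed extraction time (decimal units)."""
    return size_gb * 1000 / (minutes * 60)


def estimate_minutes(size_gb: float, rate_mb_s: float) -> float:
    """Time to copy size_gb at a constant rate (decimal units)."""
    return size_gb * 1000 / rate_mb_s / 60


if __name__ == "__main__":
    fast = implied_rate_mb_s(16, 20)  # about 13.3 MB/s
    slow = implied_rate_mb_s(16, 30)  # about 8.9 MB/s
    # Physical extraction time scales with chip capacity, not with user data.
    low, high = estimate_minutes(64, fast), estimate_minutes(64, slow)
    print(f"64 GB chip: roughly {low:.0f} to {high:.0f} minutes")
```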

Sidebar:
Decoding the malicious app

Mobile malware has been making headlines over the past year or so, with Android apps heading the list because of Google Play’s low level of regulation compared to Apple’s App Store. Currently, only a few mobile forensic tools support malware detection.

However, mobile malware mitigation still requires some manual processes. When a physical or file system extraction contains a malicious app, the tool decodes it just as it would any other data. Identifying the malware, though, is just the first step. The onus of figuring out the app’s payload, and where it is “phoning home,” rests with the forensic examiner, who must use reverse engineering along with resources like Anubis, which “sandboxes” APK files for safe exploration.

This kind of expertise can be important in determining whether the malware is really doing what it is suspected of doing (such as downloading child pornography or exfiltrating private data), what it is doing apart from what it is suspected of doing, and even whether it might be a false positive: an app misidentified as malicious.
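As a minimal illustration of the identification step only, the Python sketch below flags APK files whose SHA-256 hash matches a set of known-malicious hashes. Hash matching is named here as one simple stand-in technique, not as how any particular forensic tool works; the known-bad set is a placeholder that would come from a threat-intelligence source in practice, and any repackaged or modified app produces a different hash and slips past it.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large APKs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_known_apks(root: Path, known_bad: set[str]) -> list[Path]:
    """Return every *.apk under root whose SHA-256 is in known_bad."""
    return sorted(p for p in root.rglob("*.apk") if sha256_file(p) in known_bad)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "flashlight.apk").write_bytes(b"pretend malicious payload")
        (root / "notes.apk").write_bytes(b"pretend benign app")
        # Placeholder "known-bad" set built from the sample itself.
        known_bad = {hashlib.sha256(b"pretend malicious payload").hexdigest()}
        print([p.name for p in flag_known_apks(root, known_bad)])
```

A match only tells the examiner where to start; working out what the app actually does still requires the reverse engineering described above.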

About the Authors

One author is Director of Mobile Forensics Marketing at Cellebrite USA Inc.

Another is the Engineering Product Manager for Cellebrite USA.

The third is Decoding Research Team Leader at Cellebrite.

 