When your boat fills with water, you need a bucket to bail it out.

But sometimes there’s just too much water.

For digital forensics experts working in an IoT world, drowning in data is the new norm. It can seem like the old tools aren’t adequate.

The old methods of water removal still work, of course, just as the traditional methods of forensics still work, but they may simply not work fast enough to make a difference in some contexts.

A recent paper, titled Cybercrimes Investigation and Intrusion Detection in Internet of Things Based on Data Science Methods, dives into how data science methods can be co-opted to help forensic investigators deal with large amounts of data:

The massive amount of data needs new fast and efficient processing tools and techniques for data extracting and analyzing in less period of time. Data science methods can be used for this purpose to investigate and detect a different type of severe attacks and intrusions. This chapter introduce principles of Digital Forensics, Intrusion Detection and Internet of Things as well as exploring data science concepts and methods that can help the digital investigators and security professionals to develop and propose new data science techniques and methods that can be adapted to the unique context of Internet of Things environment for performing intrusion detection and digital investigation process in forensically sound and timely fashion manner.

It is a matter of scale. These days, we produce about 2.5 exabytes (that’s 2.5 billion gigabytes) of data every day. The number is only going up.

Our refrigerators, dishwashers, coffeemakers, light switches, door locks, thermostats, and lights are now hooked up to the internet. So are our cars, traffic lights, and transportation systems.

It is not just the amount of data produced but the rate at which it is continually produced that makes it resistant to the traditional forensic tasks of collecting, preserving, analyzing, visualizing, and presenting data.

Newer and much faster processing tools are needed, and this is where digital forensics can get a leg up from the field of data science. Data science methods can quickly extract previously hidden and unknown information from a mass of unstructured data. Machine learning and automatic extraction methods can search and organize large amounts of data in a jiffy.

Semantic Indexing

Among the tools that can be borrowed from data science is semantic indexing. Instead of just recording keywords, it examines the entire database as a whole to see which other documents contain those same words. It then uses a mathematical technique called singular value decomposition (SVD) to highlight patterns in the unstructured data. This makes the indexing much more efficient and allows the recognition of useful knowledge.
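To make the SVD step a little more concrete, here is a minimal sketch in Python. It assumes NumPy is available, and the four "documents", the choice of k = 2 latent topics, and the helper names embed_query and rank are all invented for illustration. It is not the method from the paper, just one common way of doing this kind of indexing.

```python
import numpy as np

# Toy corpus, invented for illustration only.
docs = [
    "router firmware update failed",
    "firmware patch for router released",
    "camera sensor noise pattern",
    "sensor noise in camera images",
]

# Build the vocabulary and a term-document count matrix (terms x documents).
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1

# Truncated SVD: keep only the k strongest latent "topics" (k is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document

def embed_query(text):
    """Project a bag-of-words query into the same latent space as the documents."""
    q = np.zeros(len(vocab))
    for word in text.split():
        if word in index:
            q[index[word]] += 1
    return q @ U[:, :k]

def rank(text):
    """Return document indices, most similar first, by cosine similarity."""
    q = embed_query(text)
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-12
    sims = doc_vecs @ q / norms
    return [int(i) for i in np.argsort(-sims)]

print(rank("patch"))
```

The point of the exercise: the word "patch" appears only in the second document, yet the first document also lands at the top of the ranking because it shares the "router firmware" topic. A plain keyword index would have missed it.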

Pattern Recognition

Pattern recognition systems are also used in data science. The AI in such software can be trained on samples of an author’s previous work to recognize that author’s patterns, even when they are spread across a huge volume of data. This could help forensic investigators identify the authors of fake emails or malicious code.
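As a rough illustration of the idea, the sketch below compares an unknown message against known writing samples using character trigram counts and cosine similarity. This is a deliberately simple stand-in for a trained model; the author labels, the sample texts, and the function names are all made up for the example.

```python
from collections import Counter
from math import sqrt

def profile(text, n=3):
    """Count overlapping character n-grams, a crude stylistic fingerprint."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute(unknown, known_samples):
    """Return the label of the known sample whose profile is closest to `unknown`."""
    target = profile(unknown)
    return max(known_samples, key=lambda who: cosine(target, profile(known_samples[who])))

# Hypothetical writing samples, invented for illustration.
known = {
    "suspect_a": "kindly verify your account details immediately, dear customer",
    "suspect_b": "yo check this link out lol its wild",
}
message = "dear customer, kindly verify your password immediately"

print(attribute(message, known))
```

With real evidence, the samples would be far longer and the features far richer, but the workflow is the same: learn a profile from known work, then score the unknown material against it.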

Image Analytics

Analytical methods derived from data science can also discover if images presented in evidence are forged or tampered with. A camera’s image sensor usually produces a predictable pattern of noise. If the level or quality of noise changes from area to area in a photo, it may have been tampered with. Such differences in sensor noise can only be discovered by processing large amounts of data.
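Here is a toy version of that noise check, using only the Python standard library. It assumes a grayscale image stored as a list of rows, estimates noise in each 8x8 block from neighboring-pixel differences, and flags blocks that differ from the typical level by more than a chosen ratio. The block size, the ratio of 3.0, and the synthetic "photo" are assumptions made for the example; real tools model sensor noise far more carefully.

```python
import random
from statistics import median, pstdev

def block_noise(image, block=8):
    """Estimate noise per block as the std. dev. of horizontal pixel differences."""
    height, width = len(image), len(image[0])
    levels = {}
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            diffs = [
                image[y][x + 1] - image[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block - 1)
            ]
            levels[(top, left)] = pstdev(diffs)
    return levels

def suspicious_blocks(image, block=8, ratio=3.0):
    """Flag blocks whose noise level is `ratio` times above or below the median."""
    levels = block_noise(image, block)
    typical = median(levels.values())
    return sorted(
        pos for pos, level in levels.items()
        if level > typical * ratio or level * ratio < typical
    )

# Synthetic 32x32 "photo": flat gray plus Gaussian sensor-like noise.
rng = random.Random(0)
image = [[128 + rng.gauss(0, 5) for _ in range(32)] for _ in range(32)]

# Simulate tampering: paste a perfectly smooth patch over one 8x8 block.
for y in range(8, 16):
    for x in range(16, 24):
        image[y][x] = 128.0

print(suspicious_blocks(image))  # (top, left) corners of flagged blocks
```

The pasted patch has no noise at all, so it stands out against the rest of the image. A real photo has texture and edges that also affect pixel differences, which is why this simple check would need a proper denoising step before it could be trusted.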

Supposedly deleted files on a hard drive are not actually deleted. Their contents remain on the drive, often fragmented all over the place, until new data overwrites them. Data science techniques can recognize these file fragments and stitch them together again, enabling forensic investigators to reclaim the data.
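The simplest form of this recovery is signature-based carving: scan the raw bytes for markers that announce the start and end of a known file type. The sketch below does that for JPEG, whose files begin with the bytes FF D8 and end with FF D9. It assumes each file is stored in one contiguous piece, which sidesteps the hard part (reassembling scattered fragments), and the "disk" contents are fabricated for the demo.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus first byte of the next marker
JPEG_END = b"\xff\xd9"        # end-of-image marker

def carve_jpegs(raw):
    """Return byte strings in `raw` that look like complete, contiguous JPEG files."""
    found = []
    pos = 0
    while True:
        start = raw.find(JPEG_START, pos)
        if start == -1:
            break
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # a start marker with no end: truncated or fragmented file
        found.append(raw[start:end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found

# Fabricated "unallocated space": zero padding, two fake JPEGs, and junk between them.
disk = (
    b"\x00" * 16
    + b"\xff\xd8\xff\xe0" + b"fake image data" + b"\xff\xd9"
    + b"junk"
    + b"\xff\xd8\xff\xe1" + b"more" + b"\xff\xd9"
    + b"\x00" * 8
)

for blob in carve_jpegs(disk):
    print(len(blob), blob[:4].hex())
```

Anything carved this way still has to be validated, since an end marker can appear inside the image data itself, and truly fragmented files need smarter matching of which pieces belong together.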

These examples do not exhaust the possibilities for cross-fertilization. In the future, more data science techniques can be adapted to the needs of digital forensics. With the explosion of data still ongoing, we are sure that they will be needed in the very near future.