Digital forensics is most often a reactive science. This isn’t Minority Report; a crime happens and a forensics team collects, preserves, analyzes, and presents the scientific version of the facts before a court of law.

But as the world around us grows more connected and huge amounts of real-time data whirl about at velocities hard to imagine, people are exploring how to combine data science and forensic techniques to detect patterns of emergent crime and stop those crimes before they happen.

When we said this wasn’t Minority Report, maybe we weren’t quite right. Some aspects of pre-crime methodology are already in place and have been used by police forces in Los Angeles and Chicago since 2008.

Called predictive policing, the practice has been adopted in at least four other US states and three other countries, among them China and the United Kingdom. Using algorithms and other analytical techniques, police can now project which crimes are likely to be committed in a given geographic area, who is likely to commit them, and who their likely victims are.

A ComputerWorld article quotes Paul Kenyon, COO of security software company Avecto:

Proactive forensics is the practice of looking for something in advance based on high level futuristic rules. Rather than responding to a situation, proactive forensics can be used as an early warning system by using key characteristics to identify certain behavioural changes in applications, detect anomalies in network traffic or unexpected alterations to system configurations. It requires a very high level view of everything that’s going on across the entire network. However, to be truly effective it must also be capable of issuing timely alerts when something erroneous occurs.
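Kenyon's description boils down to continuous anomaly detection with timely alerting. As a minimal sketch, assuming network activity has already been summarized as per-minute byte counts (a hypothetical input the article does not specify), a rolling z-score can flag a sudden burst against the recent baseline. The function name `detect_anomalies`, the 10-sample window and the 3-standard-deviation threshold are illustrative choices, not taken from Avecto or any other product.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(samples, window=10, threshold=3.0):
    """Yield (index, value, z) for samples that deviate sharply from the recent baseline.

    `window` and `threshold` are illustrative defaults, not tuned values.
    """
    history = deque(maxlen=window)  # sliding baseline of the most recent samples
    for i, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip a perfectly flat baseline, where the z-score is undefined.
            if sigma > 0:
                z = (value - mu) / sigma
                if abs(z) >= threshold:
                    yield i, value, z
        history.append(value)


if __name__ == "__main__":
    # Hypothetical per-minute outbound byte counts with one burst at minute 12.
    traffic = [1000, 1020, 980, 1010, 995, 1005, 990, 1015, 1000, 985,
               1002, 998, 9000, 1001, 999]
    for idx, val, z in detect_anomalies(traffic):
        print(f"ALERT minute={idx} bytes={val} z={z:.1f}")
```

With this made-up data, only minute 12 raises an alert. A rolling window keeps the baseline current as normal traffic drifts, at the cost of briefly absorbing an anomaly into the baseline once it enters the window.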

A Framework: PROFORMA

A research team including Amarnath Gupta of the San Diego Supercomputer Center at UC San Diego has developed PROFORMA, a prototype application that uses automated techniques to help prevent internet fraud.

Using message histories, Facebook posts, mobile phone call logs and other data sources, Gupta and his team believe their program could warn a potential victim before an attack happens. For now, Gupta’s team limits PROFORMA to frauds that “capitalize on prolonged communication between a victim and an adversary – interactions that may spread over multiple channels (Internet, mobile phones, emails) and may be publicly visible or private between the parties.”

The team’s research paper says that PROFORMA was made possible by three developments: the availability of large amounts of data from social media companies, data science techniques that can sift through masses of unstructured data, and fuzzy logic techniques that can assign trust “weights” to messages between a criminal and his target. Combined, these allow the system to compute an “anomaly score” that can trigger a warning message to a potential victim.
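The paper names fuzzy trust weights and an anomaly score, but this summary does not give the rules behind them. The sketch below is a hypothetical illustration of how such a score could be computed with zero-order Sugeno-style fuzzy inference: each rule fires with a strength (min for AND, 1 - x for NOT), and a message's trust weight is the strength-weighted average of the rules' constant outputs. The message features (`contact_age_days`, `mutual_contacts`, `sensitivity`), the membership breakpoints, the rules and the 0.6 warning threshold are all assumptions, not PROFORMA's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical features; the paper's actual features are not described here.
    contact_age_days: float  # how long the sender has been in the subject's network
    mutual_contacts: int     # shared connections across channels
    sensitivity: float       # 0..1, e.g. from a classifier flagging requests for money or secrets


def ramp_up(x, lo, hi):
    """Fuzzy membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def trust_weight(msg):
    """Fire each rule, then return the strength-weighted average of rule outputs (0..1)."""
    known_long = ramp_up(msg.contact_age_days, 30, 365)
    well_connected = ramp_up(msg.mutual_contacts, 1, 10)
    sensitive = msg.sensitivity

    # (rule strength, constant trust output); min = AND, 1 - x = NOT.
    rules = [
        (min(known_long, well_connected), 1.0),      # long-standing, well-connected: trusted
        (min(1 - known_long, sensitive), 0.0),       # new contact, sensitive request: untrusted
        (min(1 - well_connected, sensitive), 0.1),   # few mutual contacts, sensitive request: low trust
        (min(1 - known_long, 1 - sensitive), 0.5),   # new contact, harmless message: neutral
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0:
        return 0.5  # no rule fired: fall back to neutral trust
    return sum(strength * out for strength, out in rules) / total


def anomaly_score(messages):
    """Mean distrust over a conversation: 0 = fully trusted, 1 = fully anomalous."""
    if not messages:
        return 0.0
    return sum(1 - trust_weight(m) for m in messages) / len(messages)


WARN_THRESHOLD = 0.6  # illustrative cut-off for warning the potential victim


if __name__ == "__main__":
    old_friend = [Message(800, 25, 0.1), Message(800, 25, 0.2)]
    new_contact = [Message(3, 0, 0.2), Message(5, 0, 0.9)]
    for name, convo in [("old_friend", old_friend), ("new_contact", new_contact)]:
        score = anomaly_score(convo)
        flag = "WARN" if score > WARN_THRESHOLD else "ok"
        print(f"{name}: anomaly={score:.2f} {flag}")
```

With these made-up numbers, the long-standing contact scores 0.0, while the new contact whose requests turn sensitive scores well above the threshold, so only the second conversation would trigger a warning.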

The authors explain how it works:

Once commissioned, the system uses the API of these sources to construct a combined profile of the subject by drawing his/her individual profile information from all connected sources. The Social Context Builder attempts to reconstruct the visible (based on the subject’s permissions) part of her social network over all media channels, and store this information in a personal knowledge base.

The Message Aggregator scans messages from different sources, places them in a Heterogeneous Data Store (e.g., the AWESOME polystore [1]) for analytical operations occurring downstream. The Trust and Risk Analysis Module is the heart of the prevention mechanism.

The risk analysis involves computing a trust score for each message (or a set of messages depending on the configuration). More importantly, it monitors the responses written by the user and assesses the risk associated with the user’s message based on the content of the message, the trust of the receiver and the history of trust and risk computed over the lifetime of exchanges between them over all message channels. For example, divulging the security code of a locked gate to a suddenly-turned-romantic friend of a friend may be very risky.

If the message is evaluated to have high-risk, a Protection Directive – a statement that says which part of the subject’s response is high-risk, along with a link that explains why the system made the assessment – is immediately created for the user.
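The quoted pipeline (a Message Aggregator, a trust history spanning all channels, a risk check on the subject's own reply, and a Protection Directive) can be sketched in Python. Everything here is a hypothetical illustration rather than PROFORMA's design: the `Exchange` record, the regex-based sensitivity detection, the risk formula sensitivity × (1 - average trust), the 0.5 threshold and the example.org explanation link are assumptions. Per-message trust is assumed to be computed upstream, for instance by fuzzy trust weights of the kind the paper mentions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Exchange:
    """One message between the subject and a contact on any channel (hypothetical schema)."""
    timestamp: int  # e.g. Unix seconds
    channel: str    # "email", "sms", "facebook", ...
    contact: str
    trust: float    # per-message trust (0..1), assumed to be computed upstream


@dataclass
class ProtectionDirective:
    flagged_text: str     # the part of the subject's reply judged high-risk
    risk: float           # 0..1
    explanation_url: str  # link explaining why the assessment was made


# Hypothetical (pattern, sensitivity) pairs; a real system would use richer content analysis.
SENSITIVE_PATTERNS = [
    (re.compile(r"\b(?:gate|door|alarm|security)\s+code\b[^.]*?\d{3,}", re.I), 1.0),
    (re.compile(r"\bpassword\s*(?:is|:)\s*\S+", re.I), 1.0),
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), 0.9),  # card-number-like
]

EXPLANATION_BASE = "https://example.org/why/"  # placeholder URL


def aggregate(*channels: List[Exchange]) -> Dict[str, List[Exchange]]:
    """Merge per-channel message lists into one time-ordered history per contact."""
    history: Dict[str, List[Exchange]] = defaultdict(list)
    for exchanges in channels:
        for ex in exchanges:
            history[ex.contact].append(ex)
    for exs in history.values():
        exs.sort(key=lambda e: e.timestamp)
    return history


def contact_trust(exchanges: List[Exchange]) -> float:
    """Average trust over all exchanges with a contact; neutral 0.5 with no history."""
    if not exchanges:
        return 0.5
    return sum(ex.trust for ex in exchanges) / len(exchanges)


def assess_reply(reply: str, contact: str, history: Dict[str, List[Exchange]],
                 threshold: float = 0.5) -> Optional[ProtectionDirective]:
    """Return a ProtectionDirective if the reply is high-risk for this receiver, else None."""
    trust = contact_trust(history.get(contact, []))
    for pattern, sensitivity in SENSITIVE_PATTERNS:
        match = pattern.search(reply)
        if match:
            # Risk grows with how sensitive the content is and how little the receiver is trusted.
            risk = sensitivity * (1.0 - trust)
            if risk >= threshold:
                return ProtectionDirective(match.group(0), risk, EXPLANATION_BASE + contact)
    return None


if __name__ == "__main__":
    email = [Exchange(1, "email", "alex", 0.9), Exchange(5, "email", "sam", 0.4)]
    social = [Exchange(3, "facebook", "sam", 0.1), Exchange(2, "facebook", "alex", 0.95)]
    history = aggregate(email, social)
    print(assess_reply("Sure, the gate code is 4821", "sam", history))
    print(assess_reply("Sure, the gate code is 4821", "alex", history))
```

Here `sam`, whose average trust across email and Facebook is 0.25, gets a directive flagging "gate code is 4821" with risk 0.75, while the same reply to the long-trusted `alex` returns None. Scoring the reply against the whole cross-channel history, rather than only the latest message, mirrors the paper's point that risk depends on the lifetime of exchanges over all message channels.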

Read the full paper for more details.