Using Big Data to Fight Credit Card Fraud

According to research firm Javelin, 15.4 million consumers experienced identity theft or fraud in 2016, with losses totaling more than $16 billion.[1] As fraudulent credit card purchases grow, the onus is on credit card and payments companies to take action: under the Fair Credit Billing Act, cardholders are liable for a maximum of $50 for unauthorized transactions, which leaves the companies to absorb the rest.[2] In an effort to minimize losses, credit card companies are increasingly using big data and machine learning techniques to fight credit card fraud.

In the past, credit card companies detected fraud by flagging suspicious transactions and calling in human investigators to review them closely. This process may even have included phone calls to consumers, asking for verification.[3] However, with the number of credit card transactions growing at an annual rate of 8.0 percent by volume between 2012 and 2015,[4] this manual process was not sustainable. Better technology was needed, which is where big data and machine learning come in.

Credit card companies are beginning to employ big data and machine learning to identify fraudulent transactions as they happen, rather than after verifying with the consumer. Companies first train their machine learning algorithms on normal consumer transactions. Once an algorithm has established the consumer’s ‘normal’ transaction behavior, it can predict the probability that a given transaction is fraudulent. Companies can set a specific threshold for this probability, and if a transaction’s predicted probability exceeds that threshold, the transaction is rejected.[5]
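
The train-then-threshold idea can be illustrated with a minimal Python sketch. It is not any company's actual system: it assumes a single feature (transaction amount), models a customer's normal spending as roughly Gaussian, and uses a hypothetical 0.99 cutoff. The function names fit_profile, fraud_score, and decide are made up for illustration, and the score is an anomaly score between 0 and 1 rather than a calibrated fraud probability.

```python
import math
from statistics import mean, stdev


def fit_profile(amounts):
    """Learn a customer's 'normal' behavior from past legitimate amounts."""
    return mean(amounts), stdev(amounts)


def fraud_score(amount, profile):
    """Return a score in [0, 1]; higher means further from normal spending.

    Assumption: amounts are roughly Gaussian, so the score is the share of
    normal transactions expected to lie closer to the mean than this one.
    """
    mu, sigma = profile
    z = abs(amount - mu) / sigma
    return math.erf(z / math.sqrt(2))


def decide(amount, profile, threshold=0.99):
    """Reject the transaction if its score exceeds the chosen threshold."""
    return "reject" if fraud_score(amount, profile) > threshold else "approve"


# Hypothetical purchase history for one customer, in dollars.
history = [20, 25, 22, 30, 18, 27, 24]
profile = fit_profile(history)
print(decide(26, profile))   # close to the usual range
print(decide(900, profile))  # far outside the usual range
```

Raising the threshold blocks fewer legitimate purchases but lets more fraud through, so the cutoff is a business decision as much as a technical one.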

Many factors go into these algorithms, including consumer shopping behavior, device used, transaction amount, time, location, and vendor reputation.[6] Another factor is IP address location: if the same customer account shows multiple IP addresses from all over the world, the account has likely been compromised.[7] In general, the more factors and data the algorithm has to work with, the more accurately it can perform.
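
The IP address signal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each login event has already been mapped from an IP address to a country code by some geolocation lookup (not shown), and the function name and the max_countries limit of 2 are hypothetical.

```python
from collections import defaultdict


def accounts_with_scattered_logins(events, max_countries=2):
    """Return account IDs seen from more than max_countries countries.

    events is an iterable of (account_id, country_code) pairs, where the
    country code is assumed to come from an IP geolocation lookup.
    """
    countries_by_account = defaultdict(set)
    for account_id, country in events:
        countries_by_account[account_id].add(country)
    return {
        account_id
        for account_id, countries in countries_by_account.items()
        if len(countries) > max_countries
    }


# Hypothetical login events for two accounts.
events = [
    ("alice", "US"), ("alice", "US"), ("alice", "CA"),
    ("bob", "US"), ("bob", "RU"), ("bob", "BR"), ("bob", "CN"),
]
print(accounts_with_scattered_logins(events))
```

A real system would likely also weigh the time between logins, since a traveler can legitimately appear in several countries over a few weeks.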

These new algorithms require a lot of data and, oftentimes, new data infrastructure. Payments giant PayPal monitors more than 169 million active customer accounts and processes more than 1.1 petabytes of data, with many subsets analyzed in real time. The company has turned to new open-source technologies like Hadoop and Kraken, which run on grid and cloud computing infrastructures, to store and analyze all this data.[8] Other companies are collecting data and employing cutting-edge fraud detection techniques, such as artificial neural networks, artificial immune system-based models, support vector machines, and hidden Markov models.[9]

While there are plenty of tools and models available for credit card companies to use, a few issues remain in the field of credit card fraud detection:[10]

  1. Credit card transactions are inherently private, and the lack of a standard dataset makes it difficult to compare different techniques and methods. Consequently, no single algorithm or technique has been shown to outperform all others.
  2. Few metrics exist for evaluating a fraud detection system’s accuracy and efficiency.
  3. Few credit card fraud detection systems are adaptive, meaning they can learn as transactions stream in. Instead, most systems must be trained offline and cannot immediately incorporate new fraud patterns or new normal behavior.
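
To make the third issue concrete, here is a minimal Python sketch of what learning "as transactions stream in" could look like for a single feature. It keeps a running mean and variance of a customer's transaction amounts using Welford's online algorithm, so the profile updates after every accepted transaction without retraining offline. The class name RunningProfile, the 3-standard-deviation cutoff, and the 5-transaction warm-up are assumptions for illustration, not part of any cited system.

```python
import math


class RunningProfile:
    """Running mean and variance of amounts (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, amount):
        """Fold one new transaction amount into the profile."""
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)

    def stdev(self):
        """Sample standard deviation, or 0.0 with fewer than two amounts."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_unusual(self, amount, max_z=3.0, min_history=5):
        """Flag amounts more than max_z standard deviations from the mean."""
        sigma = self.stdev()
        if self.n < min_history or sigma == 0.0:
            return False  # too little history to judge
        return abs(amount - self.mean) / sigma > max_z


profile = RunningProfile()
for amount in [20, 25, 22, 30, 18, 27, 24, 900, 26]:
    if profile.is_unusual(amount):
        print("flagged:", amount)
    else:
        profile.update(amount)  # only unflagged amounts update 'normal'
```

Updating only on unflagged transactions keeps a burst of fraud from redefining normal behavior, at the cost of adapting more slowly when a customer's genuine spending habits change.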

As cyberattacks and data breaches continue to advance and grow, hopefully payment and credit card companies’ use of open-source technology, big data, and machine learning can help them stay ahead of the game.

[1] https://www.cnbc.com/2017/02/01/consumers-lost-more-than-16b-to-fraud-and-identity-theft-last-year.html

[2] https://www.ftc.gov/sites/default/files/fcb.pdf

[3] http://www.govtech.com/fs/Machine-Learning-And-Big-Data-Know-It-Wasnt-You-Who-Just-Swiped-Your-Credit-Card.html

[4] https://www.federalreserve.gov/newsevents/press/other/2016-payments-study-20161222.pdf

[5] http://www.govtech.com/fs/Machine-Learning-And-Big-Data-Know-It-Wasnt-You-Who-Just-Swiped-Your-Credit-Card.html

[6] http://bigdata-madesimple.com/how-to-use-big-data-to-successfully-fight-credit-card-fraud/

[7] https://blogs.wsj.com/cio/2015/08/25/paypal-fights-fraud-with-machine-learning-and-human-detectives/

[8] https://blogs.wsj.com/cio/2015/08/25/paypal-fights-fraud-with-machine-learning-and-human-detectives/

[9] https://arxiv.org/ftp/arxiv/papers/1611/1611.06439.pdf

[10] https://arxiv.org/ftp/arxiv/papers/1611/1611.06439.pdf

8 comments on “Using Big Data to Fight Credit Card Fraud”

  1. Credit card usage is prevalent across the world. As an international student from China, I was shocked the first time I used a credit card here in the United States, because no passcode is needed! That definitely helps with convenience, but it leaves a fraud risk. Utilizing big data and related technologies to find flaws in the existing network could definitely be a great starting point for ensuring a secure environment for credit card usage.

    1. Hi Wenlaih, I completely agree! Even when the U.S. adopted “chip-and-pin” EMV cards, no one was required to enter a PIN number. And fewer and fewer merchants check identification when you use credit cards.

  2. It is interesting how credit card fraud detection happens using simple patterns from the user’s spending history like geography, time of shopping, typical stores they buy from, typical spend spread across different categories like restaurants or travel, etc. Big data has definitely enhanced the security associated with credit card usage but there is a lot more that could be done to go more granular like the typical items the user buys, how frequently etc.

  3. Thanks for sharing, Claire.

    Adding a few more critical open issues in the field of credit card fraud detection:

    * Sampling bias in training data. A great hurdle in particular if supervised learning techniques are used and where business rules are implemented to screen out potential fraudulent transactions. (The issue, on a high level, is that one would have to infer the fraud rate in those that were rejects; and the ML model has not seen labeled data from the reject pool…)

    * Data quality and availability. The fraud detection problem is typically highly imbalanced (a very small % of transactions are in the positive class), and thus requires a large amount of high-quality data for training.

    * Limitation of typical anomaly detection type of algorithms.

    I think it would be interesting to see what creative approach practitioners would come up with, including the use of alternative data, more thinking along behavioral science, etc.

  4. Very interesting blog, Claire! Credit card fraud is a big issue and I have been thinking about these problems as well. This tremendous rise in the usage of credit cards is causing some fear and insecurity about whether the companies will keep up with the progress of the cyber attackers. So far, it seems like they do, but if at some point they do not, the issues that arise will be huge and maybe irreversible. To my mind, the moment the companies lack the tools to cope with the challenges they meet, they will probably collapse. However, it makes me feel a little bit safer that companies take the potential and actual threats so seriously, and that so many different disciplines and variables come into play to secure electronic transactions and to correctly verify the consumption patterns of their clients.

  5. Nowadays we mainly use IP addresses to determine whether transactions are regular payments or fraudulent. With more factors like shopping behavior and device used taken into consideration, credit card fraud will become more discernible. However, as the true positive rate goes up, the false positive rate also becomes larger, which may cause a lot of trouble for credit card users. Therefore, it’s very important to choose an appropriate algorithm as the classifier.

    1. Hi Shijia, I agree that the false positive rate goes up which may annoy users. It’s a delicate balance between annoying users and falsely blocking their transactions vs. catching fraud immediately.

  6. It is interesting to note that payment processors and credit card companies are using big data to prevent fraudulent charges. When I worked to implement a fraud matrix at a payment gateway company, the process was still at a stage in which humans did most of the work, manually flagging fraudulent charges. There was, however, some automation based on those human inputs. I hope, in the near future, such preventions can be fully automated and available for all merchants.

