Bilingualism in the Great White North

Canada has two official languages, English and French. The 2011 Canadian census gathered data on “Languages spoken most often at home.” The government reported that 83.5% of the Canadian population outside Quebec spoke English most often at home. Those who spoke primarily French accounted for just 2.5% of that population, with other languages making up the rest.

So why did nearly 35% of the merchant’s fraud-coded chargebacks with billing addresses outside Quebec have a browser language of FR-CA or FR-FR? The answer: despite using VPNs, proxies, and dummy email addresses, even good fraudsters don’t always hide everything about themselves.
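To put the two figures side by side, here is a rough back-of-the-envelope calculation in Python. It treats the census home-language share as a stand-in for the expected share of French browser languages outside Quebec, which is an assumption (home language and browser language are not the same thing). The function name `lift` is ours, not something from the merchant's tooling.

```python
def lift(share_in_fraud_pct: float, share_in_baseline_pct: float) -> float:
    """How many times more common a trait is among fraud than in the baseline."""
    if share_in_baseline_pct <= 0:
        raise ValueError("baseline share must be positive")
    return share_in_fraud_pct / share_in_baseline_pct


# Figures quoted in the text, in percent.
french_share_of_fraud = 35.0     # "nearly 35%" of fraud-coded chargebacks
french_share_of_baseline = 2.5   # census home-language share, used as a proxy

french_lift = lift(french_share_of_fraud, french_share_of_baseline)
print(f"French is roughly {french_lift:.0f}x over-represented in the fraud")
```

Under that assumption, French shows up in the fraud about 14 times more often than the population figures would suggest.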

Correlating language risk

The next step to counter this fraud vector was clear. A supervised machine learning model went into place that focused on browser language, looking for French paired with billing from provinces where a French browser language is a statistical anomaly. The model was able to detect and reject more fraud with billing originating from British Columbia, Alberta, Saskatchewan, Manitoba, and Nova Scotia. More granular rules were needed for Ontario, specifically to train the model against false positives from residents in and around the Ottawa area, which has a much higher concentration of French speakers than the rest of Ontario. These were patterns that even well-trained analysts had not been able to diagnose.
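The merchant's actual model is not described in detail, so the following is only a minimal sketch of the kind of input feature such a model could use, written as a plain rule in Python. The `Order` fields, the 0/1/2 scoring, and the use of postal-code prefixes `K1` and `K2` to approximate the Ottawa area are all illustrative assumptions.

```python
from dataclasses import dataclass

# Provinces named in the text where a French browser language was anomalous.
ANOMALOUS_PROVINCES = {"BC", "AB", "SK", "MB", "NS"}

# Assumption: postal-code prefixes used here to approximate the Ottawa area.
OTTAWA_AREA_PREFIXES = ("K1", "K2")


@dataclass
class Order:
    browser_language: str     # e.g. "fr-CA"
    billing_province: str     # two-letter code, e.g. "AB"
    billing_postal_code: str  # e.g. "T2P 1J9"


def is_french(browser_language: str) -> bool:
    """True when the primary language subtag is French, whatever the region."""
    primary = browser_language.strip().replace("_", "-").split("-")[0].lower()
    return primary == "fr"


def french_anomaly_feature(order: Order) -> int:
    """Return 2 (strong signal), 1 (weak signal) or 0 (no signal)."""
    if not is_french(order.browser_language):
        return 0
    province = order.billing_province.strip().upper()
    if province in ANOMALOUS_PROVINCES:
        return 2
    if province == "ON":
        postal = order.billing_postal_code.strip().upper()
        # French is common around Ottawa, so do not penalize those orders.
        return 0 if postal.startswith(OTTAWA_AREA_PREFIXES) else 1
    return 0  # Quebec and provinces not covered by this sketch
```

In a supervised setup, a value like this would be one column among many that the model weighs against labeled chargebacks, rather than a reject rule on its own.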

The browser language pattern ended up being one of the five biggest threats identified and addressed via the model.

Making small gains from unexpected languages

Not every merchant’s use case will benefit from targeting specific languages the same way. For some merchants, it can be more beneficial to focus on what the browser language isn’t, rather than what it is. Some US-based merchants that do not accept orders placed outside the US have found some benefit in reviewing transactions where the browser language is anything but EN-US. This has caught internationally based fraudsters who use US-based re-shippers but leave their own native language in the browser, or who make subtler mistakes such as using EN-GB, which would be very uncommon for most American customers.
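A check of this "anything but" kind can be sketched in a few lines of Python. The function names and the default expected set are assumptions for illustration. The normalization step matters because language tags can arrive with different casing or with an underscore separator.

```python
def normalize_language_tag(raw: str) -> str:
    """Normalize to lower-case language and upper-case region, e.g. "EN_us" -> "en-US".

    Only the first two subtags are kept, which is enough for this comparison.
    """
    parts = raw.strip().replace("_", "-").split("-")
    language = parts[0].lower()
    if len(parts) > 1 and parts[1]:
        return f"{language}-{parts[1].upper()}"
    return language


def needs_review(browser_language: str,
                 expected: frozenset = frozenset({"en-US"})) -> bool:
    """Flag any order whose browser language is outside the expected set."""
    return normalize_language_tag(browser_language) not in expected


for tag in ("en-US", "EN_us", "en-GB", "ru-RU"):
    print(tag, needs_review(tag))
```

A merchant with a large Spanish-speaking US customer base would likely pass a wider `expected` set rather than use the default.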

Similarly, a Mexico-based airline had certain threat levels correlated to Spanish browser language codes that were not ES-MX or ES-ES. This enabled them, for example, to flag travel bookings with Mexico billing where the language ES-PA suggested the fraudster was actually from Panama.
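As a sketch of how that mismatch could be surfaced, the Python function below returns the region subtag of a Spanish browser language when the billing country is Mexico and the region is neither MX nor ES. The function name and return convention are hypothetical, and how the airline mapped such results to threat levels is not covered by the source.

```python
from typing import Optional

# Spanish regions treated as expected for Mexico billing, per the text.
EXPECTED_SPANISH_REGIONS = {"MX", "ES"}


def unexpected_spanish_region(browser_language: str,
                              billing_country: str) -> Optional[str]:
    """Return the region subtag (e.g. "PA") when it looks out of place, else None."""
    if billing_country.strip().upper() != "MX":
        return None
    parts = browser_language.strip().replace("_", "-").split("-")
    if parts[0].lower() != "es" or len(parts) < 2 or not parts[1]:
        return None  # not Spanish, or no region subtag to go on
    region = parts[1].upper()
    return None if region in EXPECTED_SPANISH_REGIONS else region


print(unexpected_spanish_region("es-PA", "MX"))  # PA
print(unexpected_spanish_region("es-MX", "MX"))  # None
```

The returned region is only a hint about where the browser was configured, which is why it works better as a risk signal than as a hard decline.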

Travel utilizing low risk browser languages

If you’re an international company that sees many languages in bookings, it can be valuable to identify languages strongly correlated with good bookings to help reduce your false positives. A large online travel agency with a high review rate found some advantages in looking at browser languages with a strong history of low chargeback rates. They found that some languages specific to Western European countries and some Southeast Asian countries carried a much lower risk of fraud. Over time, the machine learning model picked up on these characteristics, which could reduce the need for these lower-risk bookings to be held, even in cases of same-day departures or hotel check-ins. The positive downstream impacts could be felt in the call volume of customer service centers: there were fewer delays in verifying customers, and fewer cases where fulfillment agents had to manually correct a ticket price because an airline’s fare had fluctuated since the booking was made.
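One simple way to start looking for such languages, before any model is involved, is to compute a historical chargeback rate per browser language and keep only languages with enough volume to trust the number. The Python sketch below does that. The thresholds, the function name, and the placeholder language tags in the sample data are all made up for illustration and do not reflect the travel agency's figures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def low_risk_languages(bookings: Iterable[Tuple[str, bool]],
                       max_chargeback_rate: float = 0.002,
                       min_bookings: int = 1000) -> Dict[str, float]:
    """Map each qualifying browser language to its historical chargeback rate.

    `bookings` yields (browser_language, was_chargeback) pairs.
    Both thresholds are hypothetical defaults, not recommendations.
    """
    totals = defaultdict(int)
    chargebacks = defaultdict(int)
    for language, was_chargeback in bookings:
        totals[language] += 1
        if was_chargeback:
            chargebacks[language] += 1
    return {
        language: chargebacks[language] / total
        for language, total in totals.items()
        if total >= min_bookings
        and chargebacks[language] / total <= max_chargeback_rate
    }


# Made-up history using placeholder tags.
history = (
    [("aa-AA", False)] * 1999 + [("aa-AA", True)]        # high volume, low rate
    + [("bb-BB", False)] * 950 + [("bb-BB", True)] * 50  # high volume, high rate
    + [("cc-CC", False)] * 10                            # too little volume
)
print(low_risk_languages(history))
```

The minimum-volume filter keeps a language with a handful of clean bookings from being treated as safe on thin evidence.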

Are you utilizing a proper device intelligence provider? Are you analyzing browser language to detect threats? Do the signals in your machine learning models properly account for language idiosyncrasies where your customers are?