Advancing Premise’s Fraud Checks for Improved Data Quality

May 16, 2023

Premise’s customers rely on accurate and reliable data to inform their strategies and make important business decisions. Our contributors collect an ever-increasing amount of data every day, so ensuring its quality while preventing fraudulent activity is a critical mission for our teams. To stay ahead in this area, Premise invests heavily in automated quality and fraud checks powered by machine learning. Using advanced algorithms and data validation techniques, we detect data inaccuracies, maintain a high level of data quality, and minimize exposure to fraudulent and erroneous data. We have made significant improvements to our quality and fraud checks over the past few months and are excited to share some highlights here.

Machine Learning-Driven Bot-Detection Model

Bots are becoming increasingly sophisticated and are often used to perpetrate fraud and inflate app activity, skewing and corrupting the data that is collected. To combat this, we launched an ML-driven bot-detection model early this year. It learns from patterns in recently suspended users and, each day, automatically suspends users who show bot-like behavior. It is also automatically retrained and redeployed every week to keep up with new patterns among suspended users. New predictions are generated daily for each user based on their recent activity, so issues are identified and corrected quickly rather than long after the fact. The model is built on a complex and comprehensive set of features, which helps ensure that the data we keep is accurate and reflects genuine user behavior.
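To make the daily-scoring and weekly-retraining rhythm concrete, here is a minimal sketch in Python. It is not our production model: the nearest-centroid classifier is a deliberately simple stand-in, and every name in it (`CentroidModel`, `train`, `needs_retrain`, `daily_run`, the feature vectors) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import fmean

RETRAIN_INTERVAL = timedelta(days=7)  # assumed weekly retraining cadence


def _dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


@dataclass
class CentroidModel:
    """Toy stand-in for a real classifier (hypothetical, illustration only)."""
    bot_centroid: list
    human_centroid: list
    trained_on: date

    def is_bot_like(self, features):
        # A user is bot-like if closer to the suspended-user centroid.
        return _dist(features, self.bot_centroid) < _dist(features, self.human_centroid)


def train(suspended, active, today):
    """Fit on features of recently suspended users vs. users in good standing."""
    return CentroidModel(_centroid(suspended), _centroid(active), today)


def needs_retrain(model, today):
    """True once the model is at least one retraining interval old."""
    return today - model.trained_on >= RETRAIN_INTERVAL


def daily_run(model, recent_activity, today, suspended, active):
    """Retrain if the model is stale, then score every user's recent activity."""
    if needs_retrain(model, today):
        model = train(suspended, active, today)
    flagged = sorted(uid for uid, feats in recent_activity.items()
                     if model.is_bot_like(feats))
    return model, flagged
```

In this sketch, `daily_run` would be called once per day with fresh per-user features, and the returned list of user IDs would feed whatever suspension step follows. A real system would use a far richer feature set and a proper classifier.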

Ensuring Survey Response Quality

Survey data quality dips when bad actors try to power through as many surveys as they can in the shortest amount of time possible to rack up their earnings. Two of the most common problems we see:

  1. People speeding through forms, clicking a random response without reading the question and responses thoroughly
  2. “Button smashing”, where Contributors pick the response in the same position too many times in a survey
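As a rough illustration of the second problem, button smashing can be measured as the share of answers that land on the same response position. The sketch below is a simplified assumption, not our actual check; the function names and the `max_share` and `min_questions` thresholds are hypothetical.

```python
from collections import Counter


def same_position_share(positions):
    """Fraction of answers that fall on the single most-chosen position."""
    if not positions:
        return 0.0
    _, count = Counter(positions).most_common(1)[0]
    return count / len(positions)


def is_button_smashing(positions, max_share=0.8, min_questions=5):
    """Flag a survey when one position dominates (thresholds are assumed).

    Very short surveys are skipped, since a few identical picks
    are not strong evidence of anything.
    """
    if len(positions) < min_questions:
        return False
    return same_position_share(positions) >= max_share
```

Here `positions` is the list of zero-based option indexes a Contributor selected, one per question. For example, `is_button_smashing([0, 0, 0, 0, 0, 1])` flags the submission, while a varied pattern such as `[0, 1, 2, 0, 3, 1]` does not.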

Our recent improvements in this area, detailed below, not only boost data quality but also ensure that those who take the time to provide thoughtful and thorough responses have a better experience.

We rebuilt our survey speed-check capability from the ground up, making it much smarter than the previous iteration. Using our historical data, we built a more accurate model for detecting speeding that predicts how long it usually takes to answer a given question. Early tests show that this model is more nuanced and much better at detecting bad behavior without being overly aggressive, which means that more Contributors get rewarded for their effort while our customers get high-quality data faster. We also refined our button-smashing checks, which now better accommodate edge cases, including forms where repeatedly choosing the same response position is common.
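The idea behind a history-based speed check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the typical time per question is approximated with a median of past answer durations rather than a trained model, and the names and thresholds (`expected_seconds`, `min_fraction`, `max_rushed_share`) are hypothetical.

```python
from statistics import median


def expected_seconds(history):
    """Typical answer time per question, from past durations.

    `history` maps a question ID to a list of past durations in seconds.
    A median stands in here for a real predictive model.
    """
    return {q: median(times) for q, times in history.items() if times}


def rushed_questions(response, expected, min_fraction=0.3):
    """Questions answered in under `min_fraction` of the typical time."""
    return [q for q, secs in response.items()
            if q in expected and secs < min_fraction * expected[q]]


def is_speeding(response, expected, min_fraction=0.3, max_rushed_share=0.5):
    """Flag a submission when most of its questions were rushed.

    Looking at the share of rushed questions, rather than a single fast
    answer, keeps the check from being overly aggressive.
    """
    scored = [q for q in response if q in expected]
    if not scored:
        return False
    rushed = rushed_questions(response, expected, min_fraction)
    return len(rushed) / len(scored) > max_rushed_share
```

A per-question baseline means a quick yes/no question and a long free-text question are each judged against their own typical time, so careful Contributors are not penalized for answering easy questions quickly.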

Fighting the multitude of fraudsters and bots across the world is an uphill battle. The tools we’ve built are a game-changer for our ability to maintain data integrity, reduce the risk of errors and inconsistencies, and deliver accurate and reliable data. Our teams will continue to invest in and improve these capabilities to provide you with the data you need to make your decisions.