According to security researchers, tracking, marketing and analytics companies exfiltrated user email addresses from web forms before they were submitted and without the user’s consent.
Some of these companies are also said to have inadvertently captured passwords from these forms.
In a research paper scheduled to appear at the Usenix Security ’22 conference later this year, authors Asuman Senol (imec-COSIC, KU Leuven), Gunes Acar (Radboud University), Mathias Humbert (University of Lausanne) and Frederik Zuiderveen Borgesius (Radboud University) describe how they measured data handling in web forms on the top 100,000 websites, as ranked by Tranco.
The boffins created their own software to measure the collection of email and password data from web forms – structured web input areas through which site visitors can enter data and submit it to a local or remote application.
And many companies involved in data collection and advertising seem to believe they have the right to use scripts to capture information that website visitors enter into forms before the submit button has been pressed.
“Our analyses show that user email addresses are exfiltrated to tracking, marketing and analytics domains prior to form submission and without consent on 1,844 websites in the EU crawl and 2,950 websites in the US crawl,” the researchers say in their paper, noting that addresses can be sent unencoded, encoded, compressed, or hashed depending on the provider involved.
Most of the email addresses entered were sent to known tracking domains, although the boffins say they identified 41 tracking domains that are not on any of the popular blocklists.
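Because an address may leave the page in several representations, spotting it in captured traffic means searching for each one. The following is a minimal sketch of that idea in Python, not the researchers' actual tool: the function names `candidate_encodings` and `find_leaks` are hypothetical, the list of encodings is an illustrative assumption, and compressed payloads are left out for brevity.

```python
import base64
import hashlib
import urllib.parse


def candidate_encodings(email: str) -> dict[str, str]:
    """Return a few representations a script might send for `email`.

    The set of encodings is an assumption chosen for illustration.
    """
    raw = email.encode("utf-8")
    return {
        "plain": email,
        "url-encoded": urllib.parse.quote(email, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(email: str, payload: str) -> list[str]:
    """Name every candidate representation of `email` found in `payload`."""
    return [
        name
        for name, value in candidate_encodings(email).items()
        if value in payload
    ]
```

Calling `find_leaks` with the address typed into a form and the URL or body of an outgoing request returns the names of any representations it finds, or an empty list if none match. A real measurement would also need to handle compression and chained encodings, which this sketch does not attempt.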
“Additionally, we find accidental collection of passwords from 52 websites by third-party session replay scripts,” the researchers explain.
Session replay scripts are designed to record keystrokes, mouse movements, scrolling behavior, other forms of interaction, and web page content in order to send this data to marketing companies for analysis. In an adversarial context, they would be called keyloggers or malware; but in the context of advertising, they’re somehow just session replay scripts.
Gunes Acar, one of the co-authors of the report, was also the co-author of a similar research project in 2017 which examined data collection by session replay companies Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale and SessionCam.
Evidently, not much has changed since then, except perhaps that email addresses have become more desirable as unique identifiers now that privacy-focused browsers like Brave, Firefox, and Safari are taking more steps to block cookies and tracking scripts.
Email addresses, the researchers observe, represent a replacement for cookies because they are unique, persistent, and can be used to track people across apps, platforms, and even offline interactions that can be linked to an email address, such as loyalty card transactions.
The categories of websites with the most leaks are: Fashion/Beauty (11.1% EU; 19% US); Online Shopping (9.4% EU; 15.1% US); and General News (6.6% EU; 10.2% US).
Websites categorized as pornography had the best privacy when it came to surreptitious collection of form data.
“A somewhat surprising result was the following: despite filling in email fields on hundreds of websites categorized as pornography, we did not have a single email leak,” the researchers say, noting that previous studies have found adult-oriented websites to have relatively fewer third-party trackers than equally popular general interest websites.
Those pesky regulations
The authors of the report say EU websites engaging in email exfiltration may violate at least three GDPR requirements: transparency, purpose limitation and prior consent. Companies that violate these rules can be fined up to €20 million or 4% of annual turnover, whichever is higher, per Article 83(5).
The United States does not have a federal data privacy law, although it is conceivable that one of the handful of US states with enforceable privacy rules may take action against the pre-submission collection of form data. But given the ineffectiveness of US privacy regulations over the past decade, don’t expect much.
The authors say they attempted to contact 58 first parties and 28 third parties with GDPR requests. They report receiving 30 responses from first parties, which ranged from surprise and remediation to justifications of one type or another.
“fivethirtyeight.com (via Walt Disney’s DPO), trello.com (Atlassian), lever.co, branch.io and decision.com were among the websites that said they were unaware of collecting emails prior to form submission on their websites and suppressed the behavior,” the report states.
Marriott, meanwhile, said the information collected by digital analytics firm Glassbox helps with customer service, technical support and fraud prevention.
Third parties Taboola, Zoominfo and ActiveProspect defended their data collection practices.
Facebook, aka Meta, is one of the third parties involved in this case. Researchers say email addresses or their hashes were spotted being sent to facebook.com from 21 different websites in the EU.
“On 17 of them, Facebook Pixel’s Automatic Advanced Matching feature was responsible for sending the SHA-256 of the email address in a SubscribedButtonClick event, despite not clicking any submit button,” the report said.
Automatic Advanced Matching – recently called out for collecting student loan data – is designed to collect hashed customer data, such as email addresses, phone numbers and names, from payment, login and registration forms. The researchers speculate that on these sites, Facebook’s script treats clicks on non-submit buttons as submit button click events.
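Hashing an email address does not make it anonymous: the same input always produces the same digest, so the hash works as a stable identifier wherever that address is entered. The Python sketch below illustrates the point. It is not Facebook's code; the helper name `hash_email` is hypothetical, and the trim-and-lowercase normalization step is an assumption about how such a script would typically prepare the value before hashing.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized email address.

    Normalization (strip whitespace, lowercase) is an assumption made
    for this illustration.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same address typed on two different sites yields one identifier.
site_a = hash_email("Alice@Example.com ")
site_b = hash_email("alice@example.com")
same_person = site_a == site_b
```

Here `same_person` is true, which is why a hashed address can still link a visitor's activity across sites.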
Facebook did not respond to a request for comment.
The report concludes that browser vendors, regulators and privacy tool makers need to address this issue because it is not going away. “Based on our findings, users should assume that personal information they enter into web forms may be collected by trackers – even if the form is never submitted,” the authors write. ®