Privacy and Ethical Web Analytics
Web analytics is often based on invasively collecting and aggregating user data. But web analytics doesn’t have to be an invasion of privacy. A growing movement of businesses, including performance monitoring services like Request Metrics, is working to create sustainable web analytics tools: tools that give web developers the metrics they need to improve their websites without compromising the privacy of their users.
Google is often the worst offender against user privacy. Through its useful, free analytics service, Google Analytics (GA), it gathers private user data from millions of websites. With all this data, it can predict your age, gender, education, wealth, and political affiliations.
If one would give me [six websites visited] by the most honest [person], I would find something in them to have [them] hanged.
Cardinal Richelieu, updated for the information era
Users expect and deserve some sense of privacy online. Unfortunately, the end-user tools to protect your privacy online are over-zealous, complicated, or poorly adopted. As web developers, it’s our ethical responsibility to fulfill our users’ expectations of privacy, and we are accountable when it’s lost.
Privacy-Aware Web Analytics
Web analytics tools, including web performance and error monitoring services, are critical for building great web applications. Together, they enable developers to build fast and reliable user experiences that are valuable to users.
In general, web analytics tries to answer three questions:
- How did the user find my website?
- What is the user doing on my website?
- What experience did the user have on my website?
We can answer all of these questions AND maintain user privacy by following three principles.
1. Limit Data Collection
What questions do you want to ask about your users, and what data do you need to answer them? By limiting the data you gather from the user, you ensure they will remain more private and more anonymous. Consider questions like:
- How often is my pricing page visited?
- How many users complete this form?
- What are the most common screen sizes used on my site?
There are only a few data points we would need to answer these questions, and users would understand our need to know this.
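As a rough illustration of how few data points those three questions need, here is a minimal sketch of an allowlist filter applied before anything is stored. The field names (`event`, `path`, `screen_width`) and the `minimal_event` helper are assumptions made up for this example, not part of any particular analytics tool.

```python
from typing import Any, Dict

# Hypothetical allowlist: the only fields needed to count pricing-page visits,
# form completions, and screen sizes. Everything else is discarded.
ALLOWED_FIELDS = ("event", "path", "screen_width")


def minimal_event(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event containing only allowlisted fields."""
    return {key: raw[key] for key in ALLOWED_FIELDS if key in raw}
```

An allowlist fails closed: a field someone adds to the payload later, such as an IP address or a full user agent string, is dropped unless it is deliberately added to the list.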
Data collection also has a scope. Mostly, we only care what users do on our own website. But by using “free” third-party tools, you inadvertently broaden the scope of data collection to the millions of sites where those tools are installed.
2. Avoid Implicit User Identification
If the user hasn’t identified themselves, we shouldn’t attempt to identify them either. This seems obvious, and most people would find such identification “creepy”. Yet it is very common in web analytics tools.
For some web applications, we know who the user is. They have logged in or provided an email address. They have explicitly identified themselves, and it’s okay to link their actions together.
But anonymous users expect to remain so. Technologies like device fingerprinting, tracking cookies, and third-party iframes implicitly identify the user using the volumes of data being collected about them.
Many governments are trying to address this problem through legislation like the European GDPR and Californian CCPA, but it is still common practice today.
3. Aggregate Data When Possible
Most of the questions we have about our users on public websites are not about specific users, but rather aggregates of users:
- “How many users visited my checkout page yesterday?”
- “How many times has this video been watched?”
- “How many times are mobile users visiting my site?”
We can answer these questions by aggregating a count of events rather than storing specific events for each user. By counting the number of visitors to a page in each hour, like old-school web counters did, we can answer many of our questions without compromising user privacy at all.
Of course, this doesn’t work for every case. We have a lot of questions about usage in the Request Metrics application where we need to know specifically who did what.
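For the questions that aggregation can answer, the hourly counting described above might look something like the following sketch. It is an assumption-laden illustration, not how Request Metrics implements it: the in-memory `page_views` store and the `record_page_view` and `views_for` helpers are hypothetical names, and a real system would persist the counts somewhere durable.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical in-memory store: (hour bucket, page path) -> number of views.
page_views: Counter = Counter()


def hour_bucket(when: datetime) -> str:
    """Truncate a timezone-aware timestamp to its UTC hour."""
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z")


def record_page_view(path: str, when: datetime) -> None:
    """Increment the count for this page and hour; no visitor data is kept."""
    page_views[(hour_bucket(when), path)] += 1


def views_for(path: str, hour: str) -> int:
    """Look up the aggregate count for a page in a given hour bucket."""
    return page_views[(hour, path)]
```

Because only a count per page and hour is stored, there is no per-visitor record to link, leak, or hand to a third party. The trade-off is the one noted above: questions about what a specific user did cannot be answered from these counts.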
Supporting a Fast and Private Web
At Request Metrics, we didn’t think trading visitor privacy for “free” analytics was a good trade for our users or their visitors. All our performance monitoring data is kept in our own system, aggregated and non-identifiable. We have no idea who your users are or where they will go next. We just know how fast your site was for them.
We also built an internal product analytics system using Elastic to keep our own usage and engagement data private. We hope to adapt it for anonymous web analytics and put it on this static website as well.
There is no perfect solution, and there are valid reasons to use Facebook’s SDK or Google Analytics. But as developers, we should also consider ways to send less data to third parties. That might mean using plain-old links to social networks instead of their widgets, or leaving out Google Analytics and opting for a commercial option instead.