If you run a website with a decent amount of traffic, two things are very likely true. First, you are providing data to Google through Analytics, Google Authentication, or Advertising. Second, your pages carry at least one tracking cookie, such as one set by the Facebook Pixel or by a mobile app.

This data is collected server-side and fed into massive databases of audience-building marketing data. Data management and planning play a vital role in digital marketing, and we are probably already collecting a lot more information from our users than most of us are aware of.

This article shows you a few techniques to peek under the hood. You are then encouraged to share what you find by joining the Privacy Topic in the forums.

The data you can collect from your users is bountiful. Initially, it may feel a bit creepy to learn how much information is tracked about you personally. However, we're going to focus on how users are tracked and what is recorded, so that you will be better positioned to respect the level of consent your users give you.

How Data Tracking Works

First, let's take a look at how we can collect all this information and make use of it. Second, let's look at some examples of this currently being done. Finally, let's explore privacy and consent. Ultimately, when we understand precisely how data feeds into usable intelligence, we are better positioned to plan our data collection policies. We are also in a better position to protect our users and respect their right to consent to different levels of data collection.

On the left, you see the client request. On the right is the webserver log.

In the above example, we can see the server response on the left. This is the basic HTTP information that a typical web server sends back with each request. It is interesting, but we're not going to talk about it much, so you can ignore it. On the right-hand side, you can see the server log. Where it shows 127.0.0.1, it would generally show the user's IP address (here it is my localhost IP). Further to the right, "curl/7.68.0" is the User-Agent. The User-Agent string contains information about the user's device and software.
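To make that concrete, here is a minimal sketch of pulling the IP address and User-Agent out of a single log line. It assumes the widely used "combined" access log layout; the sample line is made up for illustration (only 127.0.0.1 and curl/7.68.0 come from the example above), and `parse_log_line` is a name I chose, not part of any server software.

```python
import re

# Assumed layout: the "combined" access log format
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical log line for illustration only
sample = (
    '127.0.0.1 - - [01/Jan/2021:12:00:00 +0000] '
    '"GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"'
)
entry = parse_log_line(sample)
print(entry["ip"], entry["user_agent"])
```

Every request your server logs can be broken down this way, which is all the raw material the lookups in the next figures need.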

So what?

This figure is an IP lookup on a random IP, for demonstration only (https://whatismyipaddress.com).
This figure is a user agent lookup on my own browser (http://useragentstring.com/).

Starting with only these two pieces of information, I was able to learn a lot more.

Every interaction that happens on the web at its very basic level is a connection. These are all requests and responses which, by their nature, share information about you. At a minimum, they share an IP address.

When someone loads your web page, for example, they send an HTTP request to your web server, and your server responds with HTML that often references further resources the browser needs in order to render the page. Images, CSS files for styling, and JavaScript files are some examples of these subrequests. Each one is another connection.
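As a rough sketch of where those subrequests come from, the snippet below scans a piece of HTML and lists the URLs a browser would go on to fetch. The page markup and the analytics.example.com host are invented for illustration, and the tag list is deliberately incomplete (a real browser also follows fonts, iframes, stylesheet imports, and more).

```python
from html.parser import HTMLParser


class SubresourceCollector(HTMLParser):
    """Collects URLs referenced by a few common resource-loading tags."""

    # Tag -> attribute that holds the URL (a small, non-exhaustive selection)
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def find_subresources(html):
    """Return the subresource URLs found in the HTML, in document order."""
    collector = SubresourceCollector()
    collector.feed(html)
    return collector.urls


# Hypothetical page for illustration only
page = """
<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="https://analytics.example.com/tag.js"></script>
</head><body>
<img src="/static/logo.png">
</body></html>
"""
print(find_subresources(page))
```

Note that one of the three URLs points at a different host than the page itself. That request carries the visitor's IP address and User-Agent to a third party.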

A Real-world Example

On the left, you can see the webpage I loaded, and on the right, you can see all the subrequests made when I loaded this page.
On the right, I expanded (just) one of these requests to show you all the information I provided along with all of these subrequests. If you look closely, you will see a lot of these requests don't even go to the website - they go to Google Analytics and ad servers.

It's fairly common for websites to have Google Analytics or a Facebook Pixel embedded in their HTML. Each of these triggers a specific kind of request whose purpose is tracking.

Have you installed one of these on your website?

The Facebook Pixel

In the request headers that your users send to load your web pages, the cookie section (look closely at the image above on the right) is full of fingerprint strings that help track visitors as they navigate through your pages and around the web. Sending these requests through tracking tags and pixels to providers like Google and Facebook helps you advertise much more effectively.

By including these small snippets of code, we add resources to the page load so that the user's browser sends a request to Facebook or Google for an image or a JavaScript file. The image is usually a 1×1 pixel (a tracking pixel). When the recipient loads it, the request sends data back to the tracking system, including the IP address, User-Agent, and cookies.
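To show how little is involved on the receiving end, here is a simplified sketch of what a pixel endpoint does: note what the request reveals, then answer with a tiny image. This is my own illustration, not how Facebook or Google implement theirs. `handle_pixel_request` and the `hits` list are hypothetical, and a real endpoint would sit behind a web framework and a database.

```python
import base64
from datetime import datetime, timezone

# A commonly used 1x1 transparent GIF, base64-encoded
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def handle_pixel_request(client_ip, headers, hits):
    """Record what the request reveals, then return the image response."""
    hits.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "ip": client_ip,
        "user_agent": headers.get("User-Agent", ""),
        "cookie": headers.get("Cookie", ""),
    })
    return 200, {"Content-Type": "image/gif"}, PIXEL_GIF


# Hypothetical request values for illustration only
hits = []
status, response_headers, body = handle_pixel_request(
    "203.0.113.7",
    {"User-Agent": "ExampleBrowser/1.0", "Cookie": "visitor_id=abc123"},
    hits,
)
print(status, hits[0]["ip"], hits[0]["cookie"])
```

The visitor never sees the image. The useful part of the exchange is the row that lands in `hits`.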

In my business, it's common not to have the full picture right away. What you need to do is "stitch" the data together. You need two known data points shared between separate data sets to tie those sets together.
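Here is one way stitching might look in code, reading "two known data points" as two fields that both data sets share. In this sketch I use the IP address and User-Agent as the shared pair. The records and the `stitch` helper are invented for illustration, and matching on these two fields alone can produce false matches (shared offices, mobile carriers), so treat it as a sketch rather than a recipe.

```python
def stitch(left, right, keys=("ip", "user_agent")):
    """Merge records from two data sets that agree on every key field."""
    index = {}
    for record in right:
        index.setdefault(tuple(record[k] for k in keys), []).append(record)

    stitched = []
    for record in left:
        for match in index.get(tuple(record[k] for k in keys), []):
            stitched.append({**record, **match})
    return stitched


# Hypothetical data sets for illustration only
page_views = [
    {"ip": "203.0.113.7", "user_agent": "ExampleBrowser/1.0", "page": "/pricing"},
    {"ip": "198.51.100.4", "user_agent": "OtherBrowser/2.0", "page": "/blog"},
]
email_opens = [
    {"ip": "203.0.113.7", "user_agent": "ExampleBrowser/1.0", "email": "reader@example.com"},
]

profiles = stitch(page_views, email_opens)
print(profiles)
```

After the merge, an anonymous visit to /pricing is attached to an email address, even though neither data set held that connection on its own.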

Email marketing is full of these techniques. Email is often formatted with images and HTML markup. All you need to do is catch the request when one of these images loads and tie it to the sent email. Links in the email can include identifying information in the URL itself. Take a look in your inbox right now and hover over a link. It's likely that it will look somewhat cryptic. Most email marketing tools will do this automatically.
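A small sketch of the link-tagging idea: append an identifier for the recipient to each URL before the email goes out, then read it back when the click arrives. The `rid` parameter name and both helper functions are hypothetical, and real email tools usually go further by routing the click through their own redirect domain.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(url, recipient_id, campaign):
    """Append recipient and campaign identifiers to a link's query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # "rid" is a made-up parameter name for this sketch
    query += [("rid", recipient_id), ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_tag(url):
    """Recover the recipient identifier from a clicked link, if present."""
    return dict(parse_qsl(urlsplit(url).query)).get("rid")


# Hypothetical link and identifiers for illustration only
link = tag_link("https://example.com/offer?ref=mail", "u-48151623", "spring-sale")
print(link)
print(read_tag(link))
```

Because the identifier travels in the URL, the click can be tied to one specific recipient without any cookie being involved.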

Dynamic Number Insertion (DNI) is a call tracking feature where a unique phone number is tied to each ad source. This helps marketers analyze offline behaviour much in the same way they track online behaviour with the help of cookies.
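At its core, DNI is a lookup table used in both directions. The sketch below assumes the ad source arrives in a `utm_source` query parameter on the landing URL. The phone numbers are fictional, and the function names are mine rather than any call tracking vendor's API.

```python
from urllib.parse import parse_qs, urlsplit

# Fictional numbers, one per ad source
NUMBERS_BY_SOURCE = {
    "google": "+1-202-555-0101",
    "facebook": "+1-202-555-0102",
    "newsletter": "+1-202-555-0103",
}
DEFAULT_NUMBER = "+1-202-555-0100"


def number_for_visit(landing_url):
    """Pick the phone number to display, based on the visit's ad source."""
    source = parse_qs(urlsplit(landing_url).query).get("utm_source", [""])[0]
    return NUMBERS_BY_SOURCE.get(source.lower(), DEFAULT_NUMBER)


def source_for_call(dialed_number):
    """Work backwards from the number a caller dialed to the ad source."""
    for source, number in NUMBERS_BY_SOURCE.items():
        if number == dialed_number:
            return source
    return "unknown"


shown = number_for_visit("https://example.com/?utm_source=facebook")
print(shown, source_for_call(shown))
```

The phone number itself plays the role of the cookie: whoever dials it has told you which ad they saw.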

Take a second look at your current marketing setup. You are probably already collecting a vast amount of data.

Now take the perspective of one of your visitors. Be honest and think about what data is required to provide them with personalization features to make their visit more enjoyable - things like enabling dark mode on your site. Then take a look at it from another perspective and imagine you got hacked and everything was exposed. Is there anything in there that your customers wouldn't want the hackers to see?

Finally, look at how well you handle compliance. Do you have a way for people to opt in to and opt out of additional data collection?
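If you do not have such a mechanism yet, the core of it can be quite small. The sketch below is one possible shape, not a compliance solution: the purpose names, the `Consent` class, and `record_event` are all hypothetical, and where consent is stored and how it is presented to the user are left out entirely.

```python
from dataclasses import dataclass, field

ESSENTIAL = "essential"  # assumed to be needed for the site to function


@dataclass
class Consent:
    """The purposes a single visitor has agreed to."""

    granted: set = field(default_factory=lambda: {ESSENTIAL})

    def opt_in(self, purpose):
        self.granted.add(purpose)

    def opt_out(self, purpose):
        # In this sketch, essential processing cannot be switched off
        if purpose != ESSENTIAL:
            self.granted.discard(purpose)

    def allows(self, purpose):
        return purpose in self.granted


def record_event(consent, purpose, event, store):
    """Store the event only if the visitor consented to this purpose."""
    if not consent.allows(purpose):
        return False
    store.append({"purpose": purpose, **event})
    return True


# Hypothetical visitor and events for illustration only
store = []
visitor = Consent()
blocked = record_event(visitor, "advertising", {"page": "/pricing"}, store)
visitor.opt_in("advertising")
allowed = record_event(visitor, "advertising", {"page": "/pricing"}, store)
print(blocked, allowed, len(store))
```

The design choice that matters is that the check happens at the point of collection, so data a visitor has not agreed to is never written in the first place.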

Even if you have no business in the EU yet, its data protection legislation (the GDPR) makes sense as a policy. Take a look at its principles as a guide:

  • lawfulness, fairness and transparency
  • purpose limitation
  • data minimization
  • storage limitation
  • integrity and confidentiality
  • accountability

Conclusion

I would love to get your feedback on this article. I know it may seem a little long (or short in some parts), but I have done a lot of research on the subject and would love to garner more interest in the topic. I apologize for any grammatical or language errors caused by trying to squeeze these blogs in after hours, in that short window of time after the children are asleep and before I must sleep.

Whether you want to learn more about how tracking is done and how it can be improved, how to audit and verify your pixel or tracking tag setups, or even how to determine what risks you expose your users to, I would love to connect. (Oh and yes, I have old-school friends on IRC too lol).