November 10, 2023

A Complete Privacy Guide To Metadata

One of the most important things to understand about privacy is that the information that you give away freely is far more powerful than you think.

You might think you’re being careful. You use all of the latest privacy and security software on the market. You never use your real name on public forums. You carefully scrub any mention of your country or region from any comment before you post anything on social media. You’ve completely separated your work life and your home life, leaving no links between the two. You might even use a P.O. box instead of your home address whenever possible.

But none of these things matter if you aren’t paying attention to your metadata.

Every time you post on social media, every time you create and share a picture, every time you create a recording of any kind, every time you generate a new document, every time you send a text or E-mail message… you’ve potentially generated metadata.

And that metadata can give away some of the most critical information about your life: Identity, location, technical details about your devices, the software you have installed, the companies you do business with, the social media you use… yes, even the ‘anonymous’ accounts.

Simply put: This metadata can be used by advertisers and governments to track you. But it can also be used by people who want to embarrass, exploit, or hurt you. It can expose your private activity to friends and peers. It can be used as blackmail material. It can be used to track or stalk you, with the intent of robbing your home when you aren’t around. Or worse.

This complete guide to metadata will attempt to cover all of the more common ways that hidden data is generated and sent out into the world, intentionally or unintentionally. It will address the privacy issues relating to metadata, as well as the safety and security concerns that it can pose if left unmonitored and unedited. And it will discuss tools and methods for checking, editing, and erasing metadata in several popular mediums.


What Is Metadata?

Metadata is data about data: anything that provides deeper context for the data generated by everyday tasks, communication, and file sharing.

Generally speaking, metadata isn’t something that gets presented as a ‘main feature’ of the generated data. In other words, it exists in the form of hidden fields, tertiary tags, and often-ignored header information.

Imagine a filing cabinet. It has all sorts of paperwork in it, organized by a label name and the subject matter covered by any particular folder. On the surface, everything looks normal.

But if you open any given folder and look at the inside cover, there’s a host of information that was added by the dedicated record keeper: Names of people who worked on the file, where it was originally created, when it was last edited, which offices have access to this information, security clearance levels required, how much of the contents are available for use by public relations, and even redaction and censorship details.

That additional ‘inside cover’ information is akin to metadata. It’s hidden from the world unless you specifically crack open the file and look at those ‘meta’ details. Most people won’t know about them. And even those who know they exist won’t really care. But to the select few, this information can be a goldmine.

Metadata can come in dozens of different categories: Structural, statistical, referential, descriptive… it’s a catchall. By default, anything that might be useful to someone is captured, as well as any information useful to an app or operating system. In this way, metadata can be seen as forensically documenting the journey of a piece of data, providing as much useful context as possible in as small a space as possible.

That’s why most metadata is just text. Text is small, it’s easily compressed, easily transmitted, and understandable by both man and machine. Hide a little lump of text within a file, and it isn’t likely to have a noticeable effect on the file size or transmission time of any modern medium.

Think about all of the ways that you can sort a group of files on your device: By creator, creation date, last edit date, size, name, extension.

Now think about all the ways you can sort the music in your playlist: By artist, album date, genre, number of times played, the last date played, bit rate or sample quality, beats per minute… and those are just some of the common metrics.

What about the details when you right-click on a simple text file? You get security details, creation details, the document owner, the computer hosting the file, when it was last accessed, and even version information.

All of these things are examples of metadata. That information needs to be stored somewhere, otherwise sorting and search functions wouldn’t be accurate. Similar data points or ‘metatags’ are tacked on to just about every file or form of communication that you can imagine in the modern world.
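To see how much of this an operating system tracks on its own, here is a minimal Python sketch that reads a few filesystem metadata fields with the standard library’s os.stat. It assumes Python 3 and shows timestamps in UTC; the set of available fields (creation time in particular) differs between Windows, macOS, and Linux, so it sticks to ones every platform reports.

```python
import os
import tempfile
from datetime import datetime, timezone

def file_metadata(path: str) -> dict:
    """Collect a few metadata fields the file system keeps about a file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1],
        "size_bytes": info.st_size,
        # Timestamps are stored as seconds since the Unix epoch; shown here in UTC.
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "last_accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc).isoformat(),
        # Numeric ID of the owning user on POSIX systems.
        "owner_uid": info.st_uid,
    }

if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("hello")
        path = handle.name
    for key, value in file_metadata(path).items():
        print(f"{key}: {value}")
    os.remove(path)
```

None of these fields are part of the file’s contents, yet every one of them travels with it on your disk.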

A Brief History Of Metadata

Metadata was around long before computers; what’s new is the way it has been weaponized.

In the past, filing systems of all sorts used metadata to help track and organize paper data collection. The example given above of things written on the inside cover of folders, or a file cover sheet, is just one example. Perhaps one of the most common forms of metadata for some people is a library card catalog. This data was kept as an organizational standard, a register of all bibliographic and physical attributes of a book.

But it wasn’t until the modern era that metadata really started to be used as a weapon. Intelligence agencies led the way: members of Five Eyes (the intelligence-sharing alliance between the U.S., U.K., Australia, Canada, and New Zealand) were revealed in 2013 to be collecting bulk metadata from the undersea cables that carry Internet backbone traffic. Other governments created their own versions of these programs. The practice eventually expanded to online forums and social media as they became popular. Metadata became the go-to tracking and behavioral analysis tool, allowing governments to see not only big-picture trends, but individual taboo acts, crimes, and social infractions that could be used against people.

But it wasn’t just the government. Online advertisers have been collecting metadata for decades. At first they used it to channel the most appropriate ads to individuals, but eventually they had enough data to make predictive models for large groups of people.

People have been unwittingly sharing personal metadata in digital form for decades now. And as we’re about to discuss, that information is in the hands of all kinds of awful people.

How Is Metadata Used Against You?

There are tons of ways that the metadata you accidentally or unknowingly leak can be used against you. Some of the effects are just annoying. Others are oppressive. But the worst consequences of leaking your metadata to the wrong people all result in harm: Financial, legal, and sometimes even physical.

Leaked Location Data:

Location data is one of the most common metadata leakages. It can be present in social media posts, phone records, text messages, camera and other image data, GPS sharing, and, to a more limited extent, things like E-mail and networking logs.

Why is this important? Everyone does things that are either deeply private or technically a rules infraction of some sort. This can range from aspects of their social life, to after-school or after-work activities, to the things they do in the privacy of their own home that they wouldn’t share with just anyone.

But if law enforcement, hackers, or just plain cruel people get hold of your location data, they can correlate the activities in your life with the rules and customs in the area and then arrest, fine, or blackmail you. Anyone standing up for their rights in a particularly authoritarian place could serve jail time if they’re discovered. An obsessive stalker or someone who wishes you harm could use the data to find you as well.

Leaked File Sharing Data:

As you know by now, every media file comes with a host of metadata to help categorize the type of movie, music, or image that it is. Metadata stores the genre information, media play length, resolution information, format specification, default dimensions, and a ton of other things.

But hidden within this information, there might be a poison pill.

Record labels, rights management organizations, movie studios, and distributors can use metadata to trap people. They hide a unique code or phrase within media that they create themselves and put up on torrent sites. Often the file will be a slightly different size than the original media, which makes tracking that much easier.

Then, as the file gets distributed, they can map its progress across the various torrent search engines. After that, all it takes is a subpoena, or an agreement with major ISPs. Then thousands of people get treated to expensive copyright violation fees. Anyone who was particularly instrumental to the distribution of these files might earn special attention from law enforcement (such as the FBI’s intellectual property team in the U.S.).

It’s important to note that in many cases, VPNs won’t help you to avoid these consequences. Other techniques, such as web browser or device fingerprinting, can be used to verify your identity. And it doesn’t even matter if you own a copy of the physical media for whatever you grabbed, because the specific version you downloaded was slightly different in some way. And of course, the metatags will come back to haunt you in court, since they uniquely identify the file as the one downloaded via torrent.
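The reason owning the physical media doesn’t help is that files are identified by their exact bytes, not by their title. A cryptographic hash works as a fingerprint: identical bytes always produce the same digest, and changing even one byte produces a completely different one. BitTorrent itself identifies each torrent by a hash of this kind (its ‘info hash’). The Python sketch below uses SHA-256 from the standard library; the tag appended to the ‘marked’ copy is a hypothetical stand-in for whatever marker a rights holder might embed.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of some bytes, used here as a content fingerprint."""
    return hashlib.sha256(data).hexdigest()

# Stand-in bytes for a media file (hypothetical, for illustration only).
original = b"...video frames..."
# A hypothetical marked copy: the same content plus a few bytes of embedded tag data.
marked = original + b"TRACKING-TAG:0001"

print(fingerprint(original) == fingerprint(original))  # True: same bytes, same fingerprint
print(fingerprint(original) == fingerprint(marked))    # False: a tiny change, a new fingerprint
```

Your disc’s rip and the marked download can play identically and still have entirely different fingerprints, which is exactly what makes the marked copy traceable.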

Honeypot operations have become more common over the last decade, and entire torrent sites have been known to sell out to anti-piracy companies for at least as long. The metadata you feed them now could linger and be relevant for years.

Medical Metadata

Most people don’t know that in many countries throughout the world, pharmacies collect your metadata. They link everything that they possibly can to your identity: Prescription information, payment methods, any forms that need to be filled out, and programs applied for.

They then sell that information to healthcare organizations… and in particular, insurance agencies. The rules vary widely, but in many jurisdictions some form of this is legal. Check your local laws for specifics about your area.

How can this impact you? Well, in the long run, this practice can kill you. Insurance companies can use every piece of information, every link they can find between your medical metadata, your purchase methods, and your identity, to argue that you knew about preexisting conditions that you didn’t disclose to them. They can then refuse coverage when you actually need it, trotting out the metadata in court to show a pattern of self-treatment suggesting you knew something was wrong before you were ever insured with them.

This goes beyond the annoying drug companies pushing tailored ads your way, and into the realm of complete economic and lifestyle ruination. All so that the pharmacies can make a few extra bucks and insurance companies can please their stockholders.

Social Media Metadata

Anyone paying attention to the news probably knows that people post evidence of their crimes on social media. Perhaps they think they’re anonymous, or perhaps they don’t think the activity can be associated with them in ‘real life’.

But the metadata attached to a social media post can include a staggering amount of information by default: What kind of device it was posted from, the device’s location, exact timestamps and time zone, and the application used to generate the post, just for starters.

Even without the additional metadata generated by images or audio clips attached to a post, most default social media posts contain enough information to place you at the scene of a crime. Make a post at the wrong place and time, and the social media metadata will make you a suspect.

One of the most high-profile examples followed the January 6, 2021 attack on the U.S. Capitol. Within a year, more than 725 people had been arrested, and around a quarter of them had pleaded guilty. Much of the evidence came from social media posts, photos, and videos, along with the metadata attached to them.

It should be noted: Even the most innocent of social media posts can generate metadata that can and will be used against you in a court case. That’s why it’s important to curate metadata before making any post online, either manually or by using privacy software.

Document Metadata

As boring as it might sound, document metadata is one of the most useful types for hackers to get hold of. Any time a document is produced, it records quite specific information about the production process. That includes the version of the software it was edited in, the operating system used, the username of the creator, the computer name of the creator, the last modification date, the last print date, and much more.
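Modern Microsoft Office files (.docx, .xlsx, .pptx) are ZIP archives, and their author and date fields live in a small XML part named docProps/core.xml. The Python sketch below reads a few of those fields with the standard library. It builds a tiny in-memory stand-in document so it runs without a real file; the sample username ‘jdoe’ is made up, and a real document also stores details such as the application name and version in a separate part (docProps/app.xml) that this sketch doesn’t read.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used in docProps/core.xml of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly name -> prefixed XML tag for a few identifying fields.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
    "revision": "cp:revision",
}

def read_core_properties(docx) -> dict:
    """Return the identifying fields found in an Office file (path or file object)."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[name] = element.text
    return found

def make_sample_docx() -> io.BytesIO:
    """Hypothetical stand-in: a ZIP holding only a minimal core.xml part."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}" xmlns:dcterms="{dcterms}">'
        "<dc:creator>jdoe</dc:creator>"
        "<cp:lastModifiedBy>jdoe</cp:lastModifiedBy>"
        "<dcterms:modified>2023-01-01T09:30:00Z</dcterms:modified>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    ).format(**NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer

if __name__ == "__main__":
    print(read_core_properties(make_sample_docx()))
```

Pointing read_core_properties at one of your own .docx files is a quick way to see what you would be handing out with it.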

All of this is useful to hackers, particularly versioning information. But the real goldmine comes in the form of the phishing and social engineering information that can be scraped from document metadata.

Someone posing as a tech support person is far more convincing when they have an accurate username, computer name, and software details at hand. Combine this with typical location metadata mentioned above and social media information, and the hacker or scammer will have everything they need to make a targeted spear phishing attack.

And of course, this metadata is also useful to copyright organizations if they bring a case against you. That might be for a license violation, such as using the free version of their software commercially when its terms of service require a commercial license. Or it might be to track pirated or illegitimate copies of the software used to create these documents, whether or not you thought they were valid.

Either scenario can be an expensive proposition, with consequences ranging from court cases, to credit fraud, to identity theft.

Network Metadata

This is a broad category that covers everything from connecting to an ISP to using a mobile phone network to utilizing network services like a VPN.

In a nutshell, most network metadata is stored in log files. Yes, even at some supposedly ‘no log’ VPNs, a number of which have been caught keeping logs despite their claims. These logs track everything from bandwidth usage, to which sites have been accessed, to which cell towers are being used, and everything in between. This is valuable information to law enforcement agencies, who are more than happy to subpoena or raid a company until they get the information they want. Sometimes that isn’t even needed, since some providers have standing arrangements that give agencies the information they’re seeking on request.

Being online and browsing the wrong website will generate metadata that can be used as ‘probable cause’ to launch an investigation against you.

But for most people, that’s not the worst-case scenario. The worst-case scenario is that your ISP gets hacked, or your VPN gets breached. Then all of that metadata, everything you thought was private, gets leaked to the hacker community. And if you don’t pay a large ransom, it will get leaked to the world.

Or maybe they sit on the information and use it as part of an impersonation or identity theft scheme. They know when you log in, what sites you visit, what services you use, potentially work and bank details, and who your friends are. Congratulations, you’ve just become the perfect patsy.

Network metadata defines so much of what we do online, recording the core interactions we have on the Internet. It’s certainly not something you want falling into the wrong hands.

How Do I Scrub My Metadata To Maintain Privacy?

In order to remove metadata from files before you share them, you’ll need to go through a short process. It’s easier on some operating systems than it is on others.

Images

Images in Windows: Simply open up your File Explorer (or type ‘file’ in the OS search bar and press Enter), and browse to the location of the file that you want to scrub clean of metadata.

Right-click and select Properties. Go to the Details tab and, at the very bottom, click ‘Remove Properties and Personal Information’.

Follow the prompts. You’ll have the option of scrubbing the original file, or making a scrubbed copy instead.

Images in macOS: If you only want to wipe location data from a picture, you can do that on a Mac without third-party software. If you want to wipe any other metadata from the photo, or wipe metadata from any other file type, you’ll need to install third-party software.

To remove location data, open the picture in Preview, then press Command-I to open the Inspector. Click the More Info tab, then click GPS.

Now click on Remove Location Info at the very bottom of the window.

Images in Android: There is no native way to remove all metatags on Android. On many devices you can, however, change the location data and the file’s date and timestamp from the built-in Gallery app; the exact menus vary by manufacturer. For everything else, you’ll want to use a third-party app.

To change location data or the date and timestamps, open Gallery. Touch and hold on to the image that you want to edit until a checkmark appears. Tap the three dots in the lower right and select the function that you want.

Follow the prompts that pop up, or navigate the map that appears. For the location, you can change it to the North Pole… unless you’re at the North Pole, then change it to the middle of an ocean. For the date and time, you can be exact.
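Whatever the operating system, these tools work on the same principle. In a JPEG, for example, metadata sits in separate ‘segments’ in front of the compressed image data, so it can be dropped without re-encoding the picture. The Python sketch below removes the segments that most often carry personal details: APP1 (Exif and XMP, including GPS coordinates), APP13 (Photoshop/IPTC), and COM (comments). It works under simplifying assumptions: it copies everything after the first start-of-scan marker unchanged, handles only JPEG, and, because dropping Exif also drops the orientation flag, some photos may display rotated afterward. A maintained tool like ExifTool is the safer choice for real use.

```python
import struct
import sys

# Segment markers (the byte after 0xFF) that commonly hold personal metadata.
STRIP_MARKERS = {
    0xE1,  # APP1: Exif (camera, timestamps, GPS) and XMP
    0xED,  # APP13: Photoshop/IPTC (captions, credits, locations)
    0xFE,  # COM: free-text comments
}

def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with APP1, APP13 and COM segments removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing start-of-image marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker; skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):
            # Start of scan (compressed pixels follow) or end of image:
            # copy the rest of the file verbatim.
            out += data[pos:]
            break
        # Other header segments store a 2-byte big-endian length that
        # counts the length field itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in STRIP_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    return bytes(out)

if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as src:
        cleaned = strip_jpeg_metadata(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(cleaned)
```

Saved under a file name of your choosing, for example strip_jpeg.py (hypothetical), it would run as `python strip_jpeg.py photo.jpg photo_clean.jpg`, leaving the original untouched.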

Other Files

Other Files in Windows: Most other file types can be cleaned with the same Properties process described above for images. Certain document types have additional metadata editing options in their official applications, but these generally require a licensed version of the product (like Adobe Acrobat Pro). For more extensive editing options, a tool like ExifTool is preferable.

Other Files in Other Operating Systems: For all other file types with no native metadata support in the OS, check out ExifTool. It works in Linux, macOS, and Android (the Android link is in the links section at the bottom of the ExifTool page).
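If you’d rather script ExifTool than run it by hand, a thin Python wrapper is enough. The sketch below assumes the exiftool command is installed and on your PATH, and uses three standard ExifTool options: -j prints metadata as JSON, -all= deletes all writable metadata, and -gps:all= deletes only the GPS group. Writing the scrubbed result to a new file with -o leaves the original untouched, much like the ‘scrubbed copy’ option in Windows.

```python
import json
import shutil
import subprocess
import sys

def inspect_cmd(path: str) -> list:
    # -j: print all readable metadata as a JSON array (one object per file)
    return ["exiftool", "-j", path]

def strip_all_cmd(src: str, dst: str) -> list:
    # -all= deletes all writable metadata; -o writes the result to a new file
    return ["exiftool", "-all=", "-o", dst, src]

def strip_gps_cmd(path: str) -> list:
    # -gps:all= deletes only GPS tags; by default ExifTool keeps a backup
    # of the untouched file with "_original" appended to its name
    return ["exiftool", "-gps:all=", path]

def read_metadata(path: str) -> dict:
    result = subprocess.run(inspect_cmd(path), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)[0]

if __name__ == "__main__" and len(sys.argv) == 3 and shutil.which("exiftool"):
    src, dst = sys.argv[1], sys.argv[2]
    print("Before:", sorted(read_metadata(src)))
    subprocess.run(strip_all_cmd(src, dst), check=True)
    print("After:", sorted(read_metadata(dst)))
```

Comparing the ‘Before’ and ‘After’ tag lists is a simple way to confirm that the scrub did what you expected.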

As far as hiding your location and device data when posting to social media goes, traditional methods don’t work. There are more complex ways to change your device type and user agent information, but they involve scripting and using developer APIs for some social media platforms. By default, VPNs pass user agent information through untouched.
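To see why a VPN alone doesn’t hide your device, consider the User-Agent header that a browser sends with every web request. Because it is part of the request itself, the website receives it unchanged whether or not a VPN carries the traffic. The rough Python sketch below pulls the platform and browser out of a sample Chrome-on-Windows string; the sample is illustrative, and real-world parsing libraries handle far more cases.

```python
import re

# Browser tokens, checked in priority order: Edge and Chrome strings also
# contain "Safari", and Edge strings also contain "Chrome".
BROWSER_TOKENS = ("Edg", "Firefox", "Chrome", "Safari")

def summarize_user_agent(ua: str) -> dict:
    """Roughly extract the platform and browser from a User-Agent string."""
    platform = re.search(r"\(([^)]*)\)", ua)  # first parenthesized block
    summary = {"platform": platform.group(1) if platform else None,
               "browser": None, "version": None}
    for token in BROWSER_TOKENS:
        match = re.search(rf"\b{token}/([\d.]+)", ua)
        if match:
            summary["browser"], summary["version"] = token, match.group(1)
            break
    return summary

sample = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36")
print(summarize_user_agent(sample))
# {'platform': 'Windows NT 10.0; Win64; x64', 'browser': 'Chrome', 'version': '119.0.0.0'}
```

Combined with your IP address, timing, and other browser traits, a header like this narrows down who you are far more than most people expect.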

So other than downloading a bunch of different utilities and creating custom scripts or API hooks, what’s the solution? A complete proxy setup that uses virtual browsers and streams back the results on a tab-by-tab basis is the best solution. That, in addition to using ExifTool on any attachments sent, should completely obscure everything that matters: User agent, device type, you name it.

A privacy app such as Hoody can accomplish this. It uses a method called Phantom Browsing to create custom web browsers on their own encrypted VMs for every tab that you open or web app that you use. These tabs are entirely independent of one another; there’s no correlation between them unless you specify it. That means you could use one tab for Google Docs while logged in, and another for YouTube while logged out, and neither would be the wiser.

Unlike a VPN which only changes your IP address and encrypts data between your computer and the exit node, every aspect of Hoody browsing takes on the fingerprint of a freshly created virtual machine and newly compiled custom web browser. So no device or browser fingerprint correlation, and the next tab you open will have a completely different signature.

With Hoody hiding your device data, and ExifTool scrubbing the metadata from all of your attachments, the only thing left to worry about is your ISP or mobile phone provider.

ISP Metadata

ISP metadata can never be completely scrubbed. You need to make some sort of handshake with them, and they can (and will) completely shut you down if you try to use their network without identifying yourself, even if the source of the traffic is from a house that has a paid-up subscription. You can, however, minimize the amount of metadata that they can log by not using their provided router. Buy your own, get instructions from them on how to configure it, and take it with you if you switch services. Don’t let them flash it with any of their own firmware. Even with this precaution and end-to-end encryption, you’ll still leave metadata on authentication information, data usage statistics, and the like.

Mobile Phone Metadata

Mobile network metadata is hard to scrub in any realistic or legal way. The problem is, each handset will be assigned an IMEI number, and you really aren’t supposed to mess with that. Getting a black market IMEI will rarely get you a ‘clean’ one, and even making the attempt might bring the feds to your door depending on other crimes committed with that ID. A blocked IMEI is useless to you as well, and that’s likely what will happen to the original if you try to swap it.

The truth is, mobile phone metadata is inherently chatty and has been for a long time. In order for it to function, you need to bounce a tracked signal off of towers. You can change your SIM, but the underlying handset information remains the same. And messing with any of that likely breaks several laws in your country.

In short: Assume any signal created by or relayed from a mobile phone is traceable and not at all private. At least as far as core hardware metadata and handset tracking are concerned.

Medical Metadata

As much as supporting local businesses is important, that goes out the window if they’re reselling your metadata.

The best way to keep your medical metadata out of the hands of pharmacies is to fill your prescriptions online, preferably across the border in other territories or countries. This is realistic (and perhaps even cheaper) for some people, but not for others. Urgency and convenience both enter into the picture.

You want to maximize your ability to get what you need while minimizing the ability of insurance companies to positively correlate your data. Even multinational insurers sometimes have problems getting data across borders if they can even source that information legally in certain countries or states.

Paying with cash might help a bit, as far as not linking certain credit or debit cards to your insurance profile. Similarly, paying with crypto might help avoid linking a payment method to the rest of your medical metadata.

But if you use pharmacies frequently and pick up prescriptions regularly, your only long-term hope is that you’re in a place that disallows the selling of metadata to insurance companies. Otherwise, it might just be a matter of time.

File Sharing Metadata

The problem here is that once you’ve downloaded a ‘poisoned’ file, it might be too late. Yes, you can change the metadata on your end, even in mainstream apps like iTunes and VLC. You can deactivate metadata linking from those programs as well if you’re worried about third-party tracking.

But there will potentially be a record of the download itself. Depending on the site or service used, and what fingerprinting information they have on you, you might be in jeopardy.

Privacy software like Hoody is once again your best bet. It removes all possibility of device or browser fingerprinting, masks your IP without leaks, and even has its own torrent utility. By using the Hoody encrypted network for the download, and immediately editing the metadata to standardize it, you’re as safe as you can reasonably be.

To be safe, you can also consider publicly available services for music and books, and take advantage of the hundreds of thousands of projects that artists all around the world release for free in the hopes that it will increase sales of their commercial works.

The Rest

For any mediums that we haven’t covered, you’ll need to do some research. For open source projects, the most helpful resource will be their help system or message boards.

For everything else, there are search engines. Two helpful search phrases that you can use in your research are ‘remove metadata automatically’ or ‘remove metadata manually’ plus the name of your device, OS, or app.

For example: Remove metadata automatically Microsoft Word

This is particularly useful if you tend to create and send out the same kind of documents or media on a regular basis. Automating the process can be a huge time saver, of course. But if that method isn’t available, learning how to do it manually so that it becomes second nature is the next-best choice.
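As one example of automating the job, here is a small Python sketch that walks a folder and writes a scrubbed copy of every image into a separate output folder, using ExifTool’s -all= and -o options. It assumes exiftool is installed and on your PATH; the folder arguments and the extension list are placeholders to adjust, and the list is limited to image formats ExifTool can write. Writing copies instead of overwriting keeps the originals intact in case something goes wrong.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Image formats ExifTool can write; adjust to the kinds of files you share.
EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

def select_files(paths, extensions=EXTENSIONS):
    """Keep only the paths whose extension is in the list (case-insensitive)."""
    return [p for p in paths if p.suffix.lower() in extensions]

def scrub_command(src, src_root, dst_root):
    """Build an ExifTool command that writes a metadata-free copy of src under dst_root."""
    target = dst_root / src.relative_to(src_root)
    return ["exiftool", "-all=", "-o", str(target), str(src)]

def scrub_folder(src_root, dst_root):
    """Scrub every matching file under src_root; return how many were processed."""
    files = select_files(p for p in sorted(src_root.rglob("*")) if p.is_file())
    for src in files:
        target = dst_root / src.relative_to(src_root)
        target.parent.mkdir(parents=True, exist_ok=True)  # mirror the folder layout
        subprocess.run(scrub_command(src, src_root, dst_root), check=True)
    return len(files)

if __name__ == "__main__" and len(sys.argv) == 3 and shutil.which("exiftool"):
    count = scrub_folder(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"Wrote {count} scrubbed copies to {sys.argv[2]}")
```

ExifTool won’t overwrite an existing file when writing with -o, so point the script at an empty output folder each time you run it.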

When Do You Just Edit Metadata Instead Of Removing It?

There are certain situations where the metadata serves a useful purpose, and stripping the media, documents, or posts of the relevant information can harm the goal and intention of sharing them in the first place.

One particular reason to edit or curate your metadata instead of wiping it out completely is searchability. For applications such as advertising something you’ve made, branding a website, or releasing useful information to a larger community, editing the metadata to be more SEO-friendly is preferable.

Another reason might be correct categorical or regional grouping. For example, releasing a music track goes much smoother if the right genre, artist, title, and album information are present. Making a post on social media that will automatically get sorted into certain regional interest categories can benefit from correct geotagging.

This is another job for the handy ExifTool, or something similar. You can edit out personal information, and leave the right information to get picked up by search engines and sorted by the right content filters.
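The key design choice when curating is to keep fields from an explicit allowlist rather than deleting known-bad fields from a blocklist: any field you didn’t anticipate (a serial number, an owner name, GPS coordinates) gets dropped by default. The Python sketch below shows the idea on a plain dictionary of tags. The tag names are hypothetical examples loosely modeled on common photo and music fields; in practice you’d apply the resulting choices with ExifTool or your media app.

```python
# Hypothetical tag names, loosely modeled on common photo/music metadata fields.
KEEP = {"Title", "Artist", "Album", "Genre", "Copyright", "Keywords"}

def curate(tags: dict, keep=KEEP, additions=None) -> dict:
    """Keep only allowlisted tags, then add or overwrite any public-facing ones."""
    curated = {name: value for name, value in tags.items() if name in keep}
    if additions:
        curated.update(additions)
    return curated

raw = {
    "Title": "Night Drive",
    "Artist": "Example Band",
    "Genre": "Synthwave",
    "GPSLatitude": "48.8584",
    "SerialNumber": "123456789",
    "OwnerName": "Jane Doe",
}
print(curate(raw, additions={"Keywords": "synthwave, free download"}))
# {'Title': 'Night Drive', 'Artist': 'Example Band', 'Genre': 'Synthwave', 'Keywords': 'synthwave, free download'}
```

The additions argument is where search-friendly keywords or a project note would go, while everything personal falls away.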

It can also be useful to add notes in the ‘Miscellaneous’ or ‘Notes’ metadata fields. These fields, when available, can help to define the intent of the file, or link it to a bigger effort or project. That might be raising awareness for a charity, or giving context to something that might be misinterpreted.

Of course, this can be taken too far. In some databases with small entries, there’s more stored metadata than actual data! This can really drag performance down. As with all things in life, moderation is key: Keep the metadata that is useful, and either stick the rest in cold storage (if it might become useful one day) or scrap it.

Conclusion

Metadata is dangerous in the wrong hands. Some people use it to track your habits. Others use it to try to catch you stepping out of line. But a lot of people will just use it to try to rob you blind or hurt you in some way.

So use the tips and tricks above to protect yourself from potential scammers, blackmailers, stalkers, and law enforcement officers with nothing better to do. Scrub your content before you post it, and use a privacy tool like Hoody that will completely mask your browser fingerprint and device metadata on the web.

Will R
Hoody Editorial Team

Will is a former Silicon Valley sysadmin and award-winning non-functional tester. After 20+ years in tech, he decided to share his experience with the world as a writer. His recent work involves documenting government hacking methods while probing the current state of privacy and security on the Internet.
