I know the term “meta” is cool with the kids right now, but seriously, talking about data that can be found in data and that relates to data itself? That’s just so meta, man 🙂
If you are new to the term metadata, here is a brief definition. Metadata is often defined as data about data. There are many different standards for defining metadata, but we won’t go into those now. Essentially there are two common forms of metadata: structural and descriptive. Structural metadata provides contextual information about how data is put together, such as the number of pages in a chapter. Descriptive metadata provides information that helps with discovery and identification, such as version information, creation date, author, etc. Depending on the information, you can also find administrative metadata that provides details of who can access information, as well as information relating to retention periods.
So that’s all well and good (aka boring), but what does this mean from an attacker / defender standpoint? The short version is that it’s yet another passive intelligence gathering approach that can be used to learn more about an organisation: its individuals, software and hardware, locations, timings, etc. There is a lot of coverage in the media about government agencies and their use of metadata. A key thing to point out is that metadata is often without context. You can tell when something was created, where, how many revisions it has had, etc., but you don’t necessarily know why it was created, and depending on the type of data you may not be able to access the content either. So, like all intel gathering approaches, it helps build up a picture, but that picture may or may not be complete. I should also say that metadata can be very handy internally for some of the points mentioned above, but in general, when you are sharing files with the outside world, you should strip the metadata from them to minimise data leakage. There are various manual and automated tools to achieve this, but that’s not something we will cover here.
So what information can we find in certain types of files?
This isn’t a full list, but it gives an indication of the information available to the outside world.
In Pictures / Photos – Date and time the image was taken / created, the device used, potentially GPS location information.
In Documents (PDF & Office Documents) – Date and time of creation, user account / ID information associated with creators / modifiers, version of software used.
Websites – Titles, descriptions and keywords, the data search engines use to rank and index sites.
There are other interesting places you can find data, but strictly speaking that isn’t metadata.
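To make the document example above a bit more concrete, here is a minimal Python sketch that pulls the creator, last modifier, timestamps and software version out of a modern Office file (.docx, .xlsx, .pptx). It relies on those files being ZIP archives that normally carry a docProps/core.xml and a docProps/app.xml part. The function name `read_office_metadata` and the output labels are my own choices for illustration, and real files may be missing any of these fields.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the metadata parts of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Package part -> {output label (illustrative): element to look up}.
PARTS = {
    "docProps/core.xml": {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
        "revision": "cp:revision",
    },
    "docProps/app.xml": {
        "application": "ep:Application",
        "app_version": "ep:AppVersion",
    },
}


def read_office_metadata(source):
    """Return whichever metadata fields are present in an Office file.

    `source` can be a file path or a binary file-like object.
    """
    found = {}
    with zipfile.ZipFile(source) as package:
        for part_name, fields in PARTS.items():
            try:
                xml_bytes = package.read(part_name)
            except KeyError:
                continue  # this part is optional and may have been stripped
            root = ET.fromstring(xml_bytes)
            for label, tag in fields.items():
                element = root.find(tag, NS)
                if element is not None and element.text:
                    found[label] = element.text
    return found
```

Calling `read_office_metadata("report.docx")` (the file name is just a placeholder) returns a plain dictionary, so running it over your own published documents is a quick way to see what they give away. An empty result suggests the properties have already been stripped.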
Why care about metadata?
All information that is leaked and gathered becomes another puzzle piece in building an informed picture of an organisation and / or its individuals. By gathering metadata from the sources above we have a lot of information that could be used in a social engineering attack. We know the type of camera or phone used to take a picture, so perhaps we can make a call claiming to be from the manufacturer about a recall or issue. There is potentially GPS location data identifying where the individual works, where they visit regularly or even where they live. From documents we potentially have their username and the username format, and also the version of software used to create and modify the document and potentially the operating system itself. With this information, potential vulnerabilities can be identified for exploitation, and usernames can be used in brute-force attacks.
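To show why a single leaked username matters, here is a small Python sketch of how a username format could be worked out from one name and username pair and then applied to another name. The format labels and the function names are made up for illustration, and the list of patterns is only a handful of common conventions, not anything exhaustive. It also assumes the name has at least one word in it.

```python
def _candidates(full_name):
    """Map each (hypothetical) format label to the username it would produce."""
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    return {
        "first.last": f"{first}.{last}",
        "flast": f"{first[0]}{last}",
        "firstl": f"{first}{last[0]}",
        "first_last": f"{first}_{last}",
        "firstlast": f"{first}{last}",
    }


def infer_username_format(full_name, username):
    """Return the label of the first pattern that matches, or None."""
    for label, candidate in _candidates(full_name).items():
        if candidate == username.lower():
            return label
    return None


def apply_username_format(label, full_name):
    """Build the username a given format would produce for another name."""
    return _candidates(full_name)[label]
```

For example, if a document's author field reads jsmith and the company website lists a Jane Smith, `infer_username_format("Jane Smith", "jsmith")` matches the first-initial-plus-surname pattern, and the same pattern can then be applied to every other name on the staff page. From a defender's side, that is a good reason to treat one leaked username as if the whole format has leaked.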
This information may seem minute, but small pieces of information like this can provide key pivot points in the game of attack and defence, so any actions that can be taken to minimise information leakage and increase the effort required of the attacker are of benefit.
Tools to check out in your metadata hunt!