Social Media Data

Yesterday I wrote a post that argued that we need to move beyond a focus on what parents ‘share or not share on social media’ and instead consider children’s social media data flows. But how can we really understand social media data?

The truth is that we can’t. Social media data is extraordinarily complex: it includes not only the data that is collected, but also the data that is inferred by the machines, companies and individuals that bring social media data together to draw conclusions about an individual’s psychology or behavioural patterns.

Here I will focus on the data that is collected; in a future post I will discuss data inference in more detail.

During the Child | Data | Citizen project, I carried out a ‘platform analysis’ of four different social media companies: Facebook, YouTube (Google), Snapchat, and Twitter. The analysis, as explained in my forthcoming book, consisted of a study of the promotional cultures, business models and data policies of the different platforms. It also involved following the news coverage of patent applications and privacy scandals. The platform analysis was largely qualitative and ethnographically informed, in the sense that I found myself analysing these platforms simultaneously as a researcher and as a ‘concerned parent’ who wanted to find out how my children’s social media data (and my own) were being collected.

My Social Media Data 

By reading the different data policies in detail during the research, I came to the simplified understanding that, broadly speaking, there are five different types of data flows that social media companies admit to gathering in their data policies:

Registration and Log-in Details: These include the name, date of birth and email address used to set up the account. At times people also register other important details such as workplace, education etc.

Activity: Social media companies collect a lot of data about ‘what people do’ when they use their services. Of course, activity data varies from platform to platform. Yet broadly speaking it includes “How you interact with content”; “Voice and audio information when you use audio features”; “Purchase activity”; “People with whom you communicate or share content”; and “search activity”.

Content: Social media companies collect a lot of data on the content we produce. Google, for example, states that it collects different kinds of content: “email you write and receive, photos and videos you save, docs and spreadsheets you create, and comments you make on YouTube videos”. All four companies had very similar terms of service when it came to the issue of content ownership. They all made sure to clarify that ‘users retained data ownership and intellectual property rights’ over the content they produced. Yet by agreeing to their terms of service, you also grant the company (and its “affiliates”, “partners”, or “those who they work with”) “a worldwide, royalty-free, sublicensable, and transferable license to host, store, use, display, reproduce, modify, adapt, edit, publish, and distribute that content” (Snap Inc., Terms of Service, 2019; similar wording is used by all four social media companies, including Google).

Device Data: Social media companies gather a great deal of personal information from devices. This is personally identifiable data (e.g. IP address, unique identifiers, device IDs, and other identifiers, such as from games, apps or accounts you use, and Family Device IDs; the latter, as we shall see in the future, are very interesting). They also gather “information you allow them to receive through device settings you turn on, such as access to your GPS location, camera or photos” and other data such as “battery level, … browser type, app and file names; cookie data etc.” Yet what the companies are also trying to collect is more specific behavioural data: “information about operations and behaviors performed on the device, such as whether a window is foregrounded or backgrounded, or mouse movements” (Facebook, Data Privacy, 2019).

Data-Broking Offline: All companies admitted to collecting data on social media users offline, through third-party partners, even when users do not have an account. Facebook, for instance, lets its users know that “Partners provide information about your activities off Facebook—including information about your device, websites you visit, purchases you make, the ads you see, and how you use their services—whether or not you have a Facebook account or are logged into Facebook […] We also receive information about your online and offline actions and purchases from third-party data providers who have the rights to provide us with your information” (Facebook, Data Privacy, 2019). Twitter also mentions that it receives information when you view content on or otherwise interact with its services, which it refers to as “Log Data,” even if you have not created an account (Twitter, Data Privacy, 2019).

I realise that the above information is overwhelming. This is because the amount of data that social media companies collect is enormous. When we think about social media data, however, we should not only think about the incredible amount of data that is produced, collected and archived; we also need to think about the multiple ways in which social media companies are creating a single system where all the “distinctive data signatures” can be mapped to a single person. This, according to the authors of Data-Driven: Harnessing Data and AI to Reinvent Customer Engagement, one of the latest books for businesses on data harnessing, is what successful companies should be aspiring to (2019:26). The question thus arises: are they trying to create unique profiles of individuals from childhood?

Children’s Social Media Data and Family Data

None of the companies allows individuals below the age of 13 to open social media profiles, in order to comply with COPPA (the Children’s Online Privacy Protection Act). Yet it is clear that companies are trying, directly or indirectly, to harness the data of children under the age of 13, either by designing technologies that are explicitly targeted at them or by trying to gather information from family and household profiles (I will explore these themes in more detail in the future, yet I have already talked about Amazon household profiles).

Key examples of the ways in which these companies are directly gathering children’s data can be found in the creation of technologies such as YouTube Kids or Facebook’s Messenger Kids. If one reads the extent of their data collection, it is not surprising that in the last few years both technologies have been heavily criticized by child campaign groups. In the book, I analyse these policies in comparative perspective, along with the debates that followed their introduction.

Yet in the book I also draw attention to the fact that we know very little about other, indirect forms of children’s data collection on these platforms. It is clear that most social media platforms are collecting and storing immense amounts of children’s data. All the pictures shared on Facebook by parents and friends, all the cute videos on YouTube, the hashtags on Twitter and the ‘snaps’ on Snapchat are collected according to terms that generally apply to the adult profile. In this way the data of children is integrated with the data of adults. Yet these companies have, or at least are developing, the means and technologies to try to separate profiles and harness that data.

An example of this can be found in the fact that, in November 2018, Facebook filed a patent for ‘PREDICTING HOUSEHOLD DEMOGRAPHICS BASED ON IMAGE DATA’. The technology, which will rely on facial recognition, will enable Facebook to profile “photos posted by the user and photos posted by other users socially connected with the user”, as well as textual data (e.g. captions such as “thankful for my family”), to “build more information about the user and his/her household in the online system, and provide improved and targeted content delivery to the user and the user’s household” (Bullock et al, 2018 Patent Application).

When we break down exactly what type of data, and how much of it, is being collected by social media companies, we certainly start grasping its complexity. Yet focusing on what type of data companies collect only scratches the surface, because the real question that we need to ask ourselves is not about data collection but about data inference.

The danger of social media data is not really that a mother posts an image of her child eating ice-cream and that the company collects that image, but that more and more organisations are relying on algorithms and automated decision making on the basis of social media data. This is the main risk of social media data: not the data itself, not the fact that a parent chooses to share images of a child, but the fact that as a society we have started to believe that we can draw conclusions about people on the basis of social media data.

More to come.