Data Mining Facebook

A lot has happened in the past month.

Mark Zuckerberg (founder of Facebook) has been caught out “calling users who joined his social networking site ‘dumb f---s’ for trusting him”… 


(Stephen Hutcheon, Sydney Morning Herald, 14 May 2010)  



Ahh, but wasn’t it obvious that this was his attitude already? 

Surely the awesome popularity of Facebook was accidental. 
How was Mark to know he was such a clever dude and would end up enlisting 400 million people into his venture, 
unwittingly deluding them into trusting him and his colleagues with their most personal details and relationship ties?  

He didn’t ask for that. 

It just happened with a little bit of prompting. 


We made Facebook into the monster, and the amazing social engine, that it is today. We deserve both the congratulations and the blame for its success. 
It is an extension of how we work as social creatures, and it has amplified our natural tendencies to connect and to gossip tenfold.

What has happened as a result of this intimate social media, and the ties it has created between us and company servers and databases, 
is the very slow and imperceptible breakdown of the anarchic Internet as we have known it for the last 25 years.

Targeted advertising is a baby step towards a much more controlled, monopolistic version of what was once a beautiful and free-love version of the Internet. 

Proactive searches are giving way to passively receiving what we “should” be looking at on the net. 
Free movement and freedom of research, the things that we value so much about the unfettered terrain of the Internet, 
seem to be ebbing away as we are tracked further and further into our most intimate and vulnerable hiding places on our computers. 

We trust the firmament of the Internet a little bit too much, I fear. And the stars hurtling along inside it are driven by any number of motives: there are those who wish to convey information and communication, and those who just want to make exponential profits. At the end of the day, nothing is for free. Unfortunately.


We have the most advanced online applications ever imaginable, with the best answers and liberating educational tools at our fingertips. 

But we are being watched every step of the way, and I do not like the feeling.
 


Below is some information that you might want to look at about what’s happening at the moment. It comes from Joseph Bonneau, a PhD candidate in the Security Group at Cambridge. 

Joseph Bonneau is a savvy, good guy who understands the field of data mining intimately. He first published findings on loopholes that allowed third parties to mine Facebook data in his 2009 paper “Prying Data Out of a Social Network”, presented at the 2009 International Conference on Advances in Social Network Analysis and Mining.

Here is the abstract of his paper:

"Preventing adversaries from compiling significant amounts of user data is a major challenge for social network operators. We examine the difficulty of collecting profile and graph information from the popular social networking website Facebook and report two major findings. First, we describe several novel ways in which data can be extracted by third parties. Second, we demonstrate the efficiency of these methods on crawled data. Our findings highlight how the current protection of personal data is inconsistent with user's expectations of privacy."

Joseph also contributes to “Light Blue Touchpaper”, the blog of the security research group at Cambridge University. 

I have included links to his articles here:

http://www.lightbluetouchpaper.org/2009/03/31/facebook-giving-a-bit-too-much-away/ 

This sounds like small fry nowadays, but it shows that the problem was known long before it was made public. 

A very good example of how personal data appears within the code:

http://www.lightbluetouchpaper.org/2009/06/09/how-privacy-fails-the-facebook-applications-debacle/ 

A bit more on the enthusiastic nature of Facebook applications:

http://www.lightbluetouchpaper.org/2010/02/04/the-need-for-privacy-ombudsmen/ 

Also a bit of activism regarding Google. We will address Google in depth soon:

http://www.lightbluetouchpaper.org/2009/06/16/open-letter-to-google/ 


Joseph’s great. Here’s his homepage if you want to find out more: http://www.jbonneau.com/

 

 

Filed under  //  Data Mining Facebook   Julia Burns   Privacy Artist  

Some Questions Answered

There are a lot of formats I could use for this blog. I think I'm going to start by showing you a bit of my raw approach.

 

Here is a copy of email correspondence I had with Tamir Israel, Staff Lawyer at CIPPIC (Canadian Internet Policy and Public Interest Clinic). These guys are cool and know their stuff:

 

 

How does Facebook make money?
 

Facebook commercializes the information of its users.  It allows advertisers to select information criteria, and then charges them for advertisements served to individuals who meet those criteria.  For example, if you go to this page, http://www.facebook.com/ads/create/, and enter some basic info, you'll see how it works.

 

Basically, Facebook will let you enter a geographic region and age group, and select any of the numerous keywords FB users have in their profiles to choose who you want to see your advertisements.  It will tell you at the outset how many users meet those criteria.

 

So, for example, I can tell Facebook that I wish to send my advert to all 13-19 year old girls in Sydney or Melbourne, Australia whose birthday is today, who are in a relationship, and who like Miley Cyrus (there are about 40). An advertiser can then send these 40 girls an ad saying happy B-day, come sign up to our Miley Cyrus website and we'll give you 50% off your first purchase.
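The targeting described above comes down to two steps. First, filter user profiles against a set of criteria. Then report how many profiles match.

The Python sketch below illustrates that idea on a made-up, simplified profile record. It is hypothetical throughout:

- `Profile`, `matches` and `audience_size` are names invented for illustration.
- They are not Facebook's actual data model or ad tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Profile:
    # Hypothetical, simplified profile fields (not Facebook's real schema).
    age: int
    gender: str
    city: str
    birthday: date
    relationship_status: str
    likes: set[str]


def matches(p: Profile, today: date) -> bool:
    """True if a profile meets the example criteria from the email."""
    return (
        13 <= p.age <= 19
        and p.gender == "female"
        and p.city in {"Sydney", "Melbourne"}
        and (p.birthday.month, p.birthday.day) == (today.month, today.day)
        and p.relationship_status == "in a relationship"
        and "Miley Cyrus" in p.likes
    )


def audience_size(profiles: list[Profile], today: date) -> int:
    # The ad tool reports how many users match, not who they are.
    return sum(1 for p in profiles if matches(p, today))
```

Every extra criterion narrows the audience. That narrowing is how a broad "teenage girls in two cities" audience shrinks to a few dozen people.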

 

How Does Data Mining on Facebook Work? Who/What kind of companies do data mining? What software do they use to do it?


How is this data analysed and categorised? What is the value of different kinds of information that can be gathered from Facebook? (e.g. does birthday info sell for a better price than favourite-movie statistics?)

 

Facebook does not conduct data mining analysis on its own users' info. Individual companies/advertisers do their own research and decide who they wish to target.  They then go onto Facebook and purchase advertisements based on who they think is most likely to buy their stuff.

 

Facebook provides two models for payment.  One is CPM (cost per thousand impressions), the other CPC (cost per click).  Under the CPM model, advertisers pay $x per 1000 impressions (every time 1000 people who meet the advertiser's criteria see the advertiser's ad, the advertiser is charged $x).  Under the CPC model, the advertiser pays $x each time someone clicks on its advertisement.  The CPC rate is typically higher than the CPM rate, because a click means someone has actually interacted with the advertisement in question.  The x value for each, however, will depend on the number of individuals who meet your criteria. CPC is used for smaller but highly targeted audiences (40 Miley Cyrus fans), whereas CPM is used for larger, but still targeted, audiences (e.g. all 14-16 year old girls in Australia).
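The arithmetic behind the two pricing models can be sketched in a few lines of Python. Two things here are assumptions, not from the email:

- The rates are invented for illustration and are not Facebook's actual prices.
- `effective_cpm` is an extra comparison metric: the cost per 1,000 impressions a campaign actually achieved.

```python
def cpm_cost(impressions: int, rate_per_thousand: float) -> float:
    """CPM: the advertiser pays `rate_per_thousand` for every 1,000 impressions."""
    return impressions / 1000 * rate_per_thousand


def cpc_cost(clicks: int, rate_per_click: float) -> float:
    """CPC: the advertiser pays `rate_per_click` each time someone clicks the ad."""
    return clicks * rate_per_click


def effective_cpm(total_cost: float, impressions: int) -> float:
    """What a campaign actually cost per 1,000 impressions, whatever the pricing model."""
    if impressions == 0:
        raise ValueError("no impressions served")
    return total_cost / impressions * 1000


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    print(f"CPM campaign: ${cpm_cost(50_000, 0.50):.2f}")  # 50 blocks of 1,000 x $0.50
    print(f"CPC campaign: ${cpc_cost(10, 0.75):.2f}")      # 10 clicks x $0.75
    spend = cpc_cost(10, 0.75)
    print(f"Effective CPM of the CPC campaign: ${effective_cpm(spend, 2_000):.2f}")
```

A worked comparison with these hypothetical rates:

- 50,000 impressions at $0.50 CPM cost $25.00.
- 10 clicks at $0.75 CPC cost $7.50.
- If those 10 clicks came from 2,000 impressions, the CPC campaign's effective CPM is $3.75.

That effective CPM can be compared directly against a CPM offer.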

 

The value in a given item of info is difficult to quantify.  It will, as you may have gathered, change with the eye of the beholder.  Age (date of birth), gender and location will almost ALWAYS be important for advertisers, and a captive audience with such data attached to it is invaluable for marketers, but the true value comes in more targeted campaigns, where marketers begin to make assumptions about which users will want their products based on level of education, or on the movies they like, etc.  For example, a company selling toy guns may assume people who like war movies will like their products.  These types of assumptions are often wrong, but that doesn't stop companies from believing that with the proper amount of research they can reach precisely the customers most likely to buy their products.

 

Now, more recently (see: http://www.ft.com/cms/s/2/3578fb70-4b14-11df-a7ff-00144feab49a.html), Facebook has announced it will start providing targeting criteria based on activities its users take on other websites.  So, for example, if you visit www.x.com while logged in to Facebook, and interact with the website, Facebook will record that, and allow advertisers to target you based on that.

Where does the scraped information go? What is the commercial relationship between information gatherers and marketing departments of big corporations? How does the transformation from data to useful information work in terms of personal information and statistics? 

 

Now, to date, the FB model does NOT provide external companies with any personal information DIRECTLY.  It merely permits advertisers to select criteria, and then Facebook serves the advertisements itself.
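In the intermediary arrangement described here, the advertiser supplies criteria and an ad, while Facebook does the matching and serving. It can be modelled as a small Python sketch.

`AdPlatform`, `add_user` and `run_campaign` are hypothetical names, not Facebook's real system. The point is only that the advertiser gets back an aggregate count, never the identities of the users who saw the ad.

```python
from dataclasses import dataclass, field


@dataclass
class AdPlatform:
    """Hypothetical intermediary: holds profiles, serves ads, reports only aggregates."""
    _profiles: dict[str, dict] = field(default_factory=dict, init=False)    # user_id -> profile fields
    _inbox: dict[str, list[str]] = field(default_factory=dict, init=False)  # user_id -> ads shown

    def add_user(self, user_id: str, profile: dict) -> None:
        self._profiles[user_id] = profile
        self._inbox[user_id] = []

    def run_campaign(self, criteria: dict, ad_text: str) -> dict:
        """Serve `ad_text` to every user whose profile matches all `criteria`.

        The advertiser receives a count of impressions, never the matching user IDs.
        """
        served = 0
        for user_id, profile in self._profiles.items():
            if all(profile.get(key) == value for key, value in criteria.items()):
                self._inbox[user_id].append(ad_text)
                served += 1
        return {"impressions": served}
```

Because the profile data never leaves the platform in this model, the privacy risk shifts to whatever information users have made public. That is the concern raised in the next paragraph.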

 

Now, however, Facebook has pushed most of its users to make much of their information 'public'.  This means that advertisers are able to collect it indirectly, not through Facebook.  It also means that, in any situation where an advertiser interacts directly with a Facebook user, they are able to simply visit that user's profile page and collect whatever information is there. 

 

How do people/companies make money out of this information? Where is it stored? How long is it stored for? Is it on-sold?

 

As noted above, Facebook does not DIRECTLY provide this information to companies.  Most traditional advertisers make their money by selling products, and assume that more targeted advertisements will lead to more sales.

 

However, ever since the privacy transition described in our document (I gave you the link above), all the information Facebook has made 'publicly available' is open to anyone to collect.  Advertisers, data miners, anyone, really, can mine Facebook profiles to get information such as name, location, favourite movies, etc.  This occurred recently, so we have not seen the impact of it yet.  However, data brokers in general collect data such as this and try to connect it with other information on individuals collected from other sources.  These profiles are very valuable to advertisers, particularly if the data can be linked to a real life mailing address or phone number.
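Linking scraped public profile data to records from other sources is a record-linkage problem. The Python sketch below shows the simplest possible version: an exact match on a normalized (name, city) key.

The function names, field names and sample records are hypothetical. Real-world linkage is usually fuzzier than this.

```python
def normalize(s: str) -> str:
    # Lower-case and collapse whitespace so "Jane  Smith" and "jane smith" compare equal.
    return " ".join(s.lower().split())


def link_records(public_profiles: list[dict], other_records: list[dict]) -> list[dict]:
    """Naive exact-match record linkage on a normalized (name, city) key.

    If several other_records share a key, the last one wins.
    """
    index = {(normalize(r["name"]), normalize(r["city"])): r for r in other_records}
    linked = []
    for profile in public_profiles:
        match = index.get((normalize(profile["name"]), normalize(profile["city"])))
        if match is not None:
            # Merge the two sources into one richer profile.
            linked.append({**profile, **match})
    return linked
```

Once the keys agree, a favourite-movies list from a public profile ends up attached to a mailing address. That is the combination the email describes as especially valuable to advertisers.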

 

What do Facebook Applications really do?  And what happens to the personal information and statistics that are gathered from these?

 

No one knows.  Most applications are likely harmless, merely intended for entertainment.  Most will use the same advertisement serving processes as Facebook (i.e. they will not directly collect/keep user data, will not pass it along to others).  Facebook's contracts actually prevent application developers from doing otherwise.  Two major caveats, however.

 

a.) Since Facebook's recent privacy transition (described in our document above), a large amount of personal information is considered 'publicly available to everyone'.  It is not clear (and not likely the case) that application developers are limited in ANY way from doing anything they want with such data.

 

b.) Even for other data, Facebook tells application developers they shouldn't collect/use such information, but does little to actually prevent them from doing so.  There are hundreds of thousands of application developers with millions of applications, so the potential for abuse is quite significant.  

 

ANYONE (not just companies: government agencies, jealous ex-boyfriends, your parents) can make an application, for any reason.  Once they've done so, Facebook gives them a key to its API (the interface through which most user information is accessed), and those developers are able to collect a lot of information on their users.  Given the vast numbers of these, it is truly difficult to get any sense of what occurs. 

 

Why is a person with 100 Facebook friends more valuable to Facebook than a person with 5 friends? Why do they encourage us to 'find our friends' so much? How does this help Facebook, and external parties (need examples) to make money?

 

More interactions = more engaged users. More engaged users will fill in their profiles with more details ($ from advertisers), will do more activities on Facebook ($ from advertisers), and will visit FB more frequently to interact with their many friends ($ from advertisers: FB can boast it has xx+1 million users who visit its site every single day). They are also more likely to encourage other friends to join ($ from advertisers), more likely to put Facebook on their mobile phones, and more likely to use Facebook to set up accounts on other external websites, so they can interact with their hundreds of friends on those websites as well, etc.

 

Can people/companies get access to the information in your locked, private Facebook account? How? Who is doing this? What is done with this information?

 

Information designated 'only friends' cannot be accessed by anyone other than designated friends and Facebook itself.  Companies can still target 'private' criteria when deciding what ads to serve.  This is justified on the basis that the company is never told WHO the criteria belong to, as Facebook acts as an intermediary and serves the ads itself.  Unfortunately, after Facebook's recent privacy transition, 65%+ of its users now have MUCH of their information designated as available to EVERYONE, meaning anyone can get it.

 

Even information designated 'only friends' can be accessed in a number of ways.  First, application developers can access much of it in myriad ways, as long as you or any one of your friends have interacted with the application.  Second, this information is often provided under court order to government agencies (law enforcement, etc.), or under court order if you get sued.  Finally, this article describes how easily Facebook employees are able to get at that information: http://therumpus.net/2010/01/conversations-about-the-internet-5-anonymous-facebook-employee/ 

 

 

Best,

 

Tamir Israel

Staff Lawyer

CIPPIC


 

See my artworks about privacy online and why it's important here: http://juliaburns.com


The Value of Privacy

I know it might sound crazy for a privacy artist to start a blog.

I never thought I’d do it actually.

But I’ve realised, like a lot of people, that social media is an amazing way to share opinions and catalyse discussion on important issues.  It is possible to use these tools without giving up your right to privacy.

Marketing departments do it, for instance.

Most of the blogs and forum discussions that you see out there are actually driven by Internet and social media marketing companies trying to improve the keyword rankings of corporations. Maybe Tom, who wrote an entry about his fabulous trip to Bali, didn’t really go to Bali at all?

This is obvious stuff, as is most of the other dodgy activity that goes on within the grimy and glorious Internet.

There are a lot of ways companies can use what you do on the Internet for their commercial advantage. They can also manipulate your choices online and in the real world.

I’m not talking about sneaky pervs here guys, I’m talking about money-making giants.

Don’t worry about the little guys so much! It’s the big cloud-like entities which scare the dickens out of me.

I'm sick of being told to 'be careful' on the Internet, and to watch out because it's going to catch up with me in the future.

It's just like smoking, taking drugs, drinking alcohol and partying. We don't care, it's fun and hasn't killed us yet.

My real axe to grind stems from other people making money from me for free.

That annoys me.

It's like stealing a piece of my art, or my arm, or a piece of clothing, and hawking it to a third party!

How dare they, I say!

That's my property, and if I want to sell it, cool. But if I don't, then goddammit, that's my right.

This is how I feel about my personal information.

It's a matter of untapped financial resources. And when I get really strapped for cash - I want to know that I can sell my privacy freely and that no one will have already mined my assets before me.

Now there's a new way to think about the value of your privacy.

http://juliaburns.com
