How Much Data Do You Want To Share?

One of the recurring topics in recent discussions with friends and colleagues has been the amount of data the big online players (e.g. Facebook, Google) are collecting on their users, and whether and how exactly users have control of that data. By now we are hopefully all aware that Google records all our searches, whether we are logged in to Gmail or not. It’s great that you can easily share the photos from last night’s party with your Facebook “friends”, but do you have full control over who gets to see them, or even share and distribute them further? If you are interested, you should take a look at Facebook’s privacy terms.

I am planning to write a longer post on the topic after a bit more research, as I believe this is one of the important issues that will drive the future of the web, or at least the way we interact online. I think we have only just scratched the surface of how to deal with this issue. Its importance has exploded due to the new ways people interact online: Google and Facebook, but also Twitter, Foursquare, or a random review you posted on a site using Facebook Connect, which leaves a virtual footprint (for eternity?). Supporters of a rational approach to this might say: “Well, let’s just look at how much value we can derive from the data we share … and then it’s simply a matter of comparing the cost of privacy/sharing with the benefits …” I doubt it’s really that easy and straightforward, particularly as certain benefits might only arise over time. There are certainly opinions on both ends of the spectrum (from “I share nothing about myself online” to “I don’t think there is any harm in sharing all kinds of information about me online”).

I want to share the infographic below, which was recently posted here. It’s a nice first dive into the numbers on how much data is shared, but it is far from answering some of my pressing questions. Please let me know what thoughts you have or what you are curious about, and I’ll add them to the list of things to investigate and think about.

Good or Evil

Google buys Metaweb to move beyond Keyword Search

Yesterday, the MIT Entrepreneurship Review posted the first article in my series on Semantic Technologies, available here. I asked whether Semantic Technologies had crossed the chasm yet and my assessment was positive yet cautious:

“Overall, there seems to be consensus that as semantic technologies move out of the purely technical corner and beyond the innovators and early adopters in academia and government, content-heavy organizations and users like publishers or e-commerce sites will help these technologies cross the chasm as they see the largest benefit in applying the technology. As pointed out earlier, companies like The New York Times or Best Buy have already begun to build and rely on semantic technologies. As more and more companies start adopting linked data standards and share data in the linked data cloud, we will see more businesses created to derive value from aggregating data across different datasets to provide value to their users.”

I had quoted Will Hunsinger, CEO of Evri, who pointed out to me that he had seen increasing activity in the past year, and that transactions such as Apple’s acquisition of Siri or Microsoft’s acquisition of Powerset, as well as Evri’s acquisition of Radar Networks, are all indicators of increased focus on semantic technologies and of potential exits for start-ups in the space.

Today, Google surprised the world by announcing the acquisition of Metaweb, the company behind Freebase. The official press release says that “we believe working together we’ll be able to provide better answers”. This is yet another big breakthrough for semantic technologies.

To understand the impact of this acquisition, have a look at the following video, which I think does a great job of explaining Metaweb/Freebase and hints at the power of combining this with Google.

While Google has done a tremendous job of improving its search engine, in particular with respect to providing contextual results, access to Metaweb’s Freebase and its 12 million entities will move Google even faster beyond keyword search.

I wonder, though, how the acquisition by Google will impact Freebase’s growth from individual contributors: will people continue to contribute voluntarily at the rate they have been? Google said in the press release that they will keep Freebase going and “plan to contribute to and further develop Freebase and would be delighted if other web companies use and contribute to the data”.

Also, parts of Bing’s search are powered by Freebase. As Jamie Taylor, Minister of Information at Metaweb, pointed out to me, “you definitely see freebase data appearing on Bing” (see the upcoming interview in the MIT Entrepreneurship Review). I wonder what happens to this now.

Facebook, Have You Seen This?

I like how Google is always able to surprise me. Just when everybody has been writing off Google’s efforts in the social graph space (e.g. see one of many posts on the topic here), with Google Buzz not making so much buzz after all, they come out with a 216(!)-page, commented slide deck which is based on years of Google research in the field and goes a long way toward explaining how online social networks will have to be designed to better map our real-life social networks and be more useful.

Paul Adams, creator of the slide deck and member of Google’s UX team, does a great job of using real-life examples to show the limits of, for example, Facebook when it comes to mapping offline interactions to online interactions.

In my opinion, these are the major take-aways from the slide deck with respect to where online social networks are headed:

1. Friend <> Friend: We Need A Better Way For Mapping Real World Relationships Online

Social networking sites like Facebook currently do not offer the tools to segment your friends into distinct groups and to communicate with each group in distinct ways. Paul uses the example of a young woman named Debbie: the kids she teaches, who have friended her on Facebook, are able to see pictures from a night at a bar with some of her friends, simply because she commented on them. The fact that people have multiple groups of friends in the real world [according to the research: 4-6 independent groups with 2-10 friends each, formed around hobbies, shared experiences, life stages] is currently not fully reflected in the design of online social networks.
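To make the idea of segmented friend groups a bit more concrete, here is a minimal sketch in Python of what per-group sharing could look like. It is purely illustrative and not how Facebook or Google implement anything; the `Profile` class, the `can_see` helper, the group names and the contacts "Alice" and "Tim" are all assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A user whose contacts are split into named, independent groups (hypothetical model)."""
    name: str
    groups: dict[str, set[str]] = field(default_factory=dict)

    def add_to_group(self, group: str, contact: str) -> None:
        # Create the group on first use, then add the contact to it.
        self.groups.setdefault(group, set()).add(contact)

    def audience(self, shared_with: set[str]) -> set[str]:
        """Return everyone in the groups a post was shared with, and nobody else."""
        allowed: set[str] = set()
        for group in shared_with:
            allowed |= self.groups.get(group, set())
        return allowed


def can_see(owner: Profile, viewer: str, shared_with: set[str]) -> bool:
    """A viewer sees a post only if they belong to one of the groups it was shared with."""
    return viewer in owner.audience(shared_with)


# Debbie keeps her bar friends and her students in separate groups.
debbie = Profile("Debbie")
debbie.add_to_group("bar friends", "Alice")  # hypothetical friend
debbie.add_to_group("students", "Tim")       # hypothetical student
```

With this model, `can_see(debbie, "Tim", {"bar friends"})` is false: photos shared with the bar friends stay out of the students’ view, which is exactly the separation missing in Debbie’s case.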

2. Strong Ties, Weak Ties, Temporary Ties

A lot has been written about the theory of strong ties and weak ties (e.g. this is a good book I recently read on the topic). Paul explains how research has shown that we typically have 2-6 strong ties, i.e. people we communicate with on a daily basis. We are also able to manage up to 150 weak ties, i.e. people we keep in touch with but do not necessarily speak to frequently. However, strong and weak ties alone are not able to fully capture the breadth of our online relationships. Paul introduces a category called temporary ties, which includes people “we have no recognized relationship with, but that [we] temporarily interact with”. In my opinion this extension of the current framework will prove very useful. Currently the information content from these relationships is lost: for example, a positive interaction with a friendly store assistant is typically not captured anywhere, but could be valuable for me or other people in the future.

Source: http://www.slideshare.net/padday/the-real-life-social-network-v2
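The three categories of ties described above can also be written down as a simple classification rule. This is only a rough sketch under my own assumptions: the function name `classify_tie`, its parameters, and the cut-off of seven contacts per week as a stand-in for "daily" are mine, not from Paul’s slides.

```python
from enum import Enum


class Tie(Enum):
    STRONG = "strong"
    WEAK = "weak"
    TEMPORARY = "temporary"


def classify_tie(is_known_contact: bool, contacts_per_week: float) -> Tie:
    """Classify a relationship as a strong, weak, or temporary tie (illustrative rule)."""
    # Temporary ties: people we interact with but have no recognized relationship with.
    if not is_known_contact:
        return Tie.TEMPORARY
    # Assumption: "communicating on a daily basis" is approximated as >= 7 contacts per week.
    if contacts_per_week >= 7:
        return Tie.STRONG
    # Everyone else we know and keep in touch with less frequently is a weak tie.
    return Tie.WEAK
```

For instance, the friendly store assistant would come out as `Tie.TEMPORARY` regardless of how often I visit the store, because there is no recognized relationship to begin with.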

3. Online Trust – Privacy Matters!

In my recent blog post, Three Pillars of Online Trust, I talked about how the rise of online social networks, and the ability to leverage them, will impact the way we establish trust online. Paul makes the point that, with more information becoming available in online social networks, we’ll increasingly rely on our social graph to make decisions. He also points out some of the reasons why current review systems tend to be biased and/or broken. We also tend to be influenced by temporary ties: for example, people tend to give the same rating online as the people before them. People care about what others think of them, and research has shown that anonymous ratings tend to be 20% lower than ratings where people provided their real name. Also, I agree with Paul’s assessment that the role of influentials is “overestimated”, and I see some similarities to the way the role of strong ties was overestimated at first.

Privacy is one of the most heavily discussed topics and definitely a dimension that will shape online social networks going forward. Paul says that “We think people care less about privacy because they misunderstand complicated privacy settings”. The big difference between online and offline is that online “conversations are persistent”. Paul points out that privacy and trust are tightly linked. I agree with Paul that, as a business, the way you handle people’s private, sensitive data will impact the way people trust you and do business with you in the long term. Given the huge amounts of personal data already online and coming online over the next few years, I think data privacy and privacy management should be a key item on the agenda of every company that is active online.

Want to know more about it?

I believe that anyone who is even marginally interested in social networks and in understanding how they should be designed should take a look at the slide deck, posted here. Great job, Paul Adams. I can’t wait to read your book, Social Circles (coming out Aug 30, 2010).