Task force/Analytics/2010-05-19 Andreas Weigend
Andreas Weigend visited the Wikimedia Foundation on May 19, 2010 for an informal "data day." Our goal was to talk about how Wikimedia could better think about and use data to help with its strategic priorities. Fifteen Foundation employees and six guests (including Andreas) participated. Notes were taken collaboratively on an instance of Etherpad.
Facebook vs. Twitter followers
As an opening exercise, each participant was asked to report (based on memory) their number of Facebook friends and of Twitter followers.
Then each of us was asked to check our answers, followed by some discussion of what the differences meant.
Facebook/Twitter experiment results
What is a fundamentally private service? Intimacy
Andreas: "Privacy was a blip in history."
Who cares about privacy?
Daniel J. Solove, "'I've Got Nothing to Hide' and Other Misunderstandings of Privacy."
- Catalogues multiple notions of privacy; hiding embarrassing information is only one component. Others include control over how information is used and shared, and the ability to correct it.
- Notion of privacy has changed (Facebook and Internet services are pushing this; also other technological advances, e.g. in genetics)
Moka: difference of web-literate populations that have instincts about what to share, what not to share; understand how information flows
Andreas: people are lazy and don't care until something big happens.
Moka: not laziness, ignorance.
Miller from Princeton published a paper in Scientific American in the 1980s: literacy has changed. Americans were "at the bottom" of general literacy.
Knowing vs understanding (factual knowledge vs. strategic thinking).
External representation of who you are by an external party vs. self-representation; self-representation was not possible before. Erving Goffman's work in the 1950s, The Presentation of Self in Everyday Life.
- (not brought up in conversation, but relevant to privacy conversation: http://bynamite.com/blog/2010/03/12/privacy-and-stupidity/ from robla)
danah boyd on privacy in social networks, particularly how it pertains to children/adolescents.
Steven recently deleted his Facebook profile. Steven: "Facebook is not on the side of its users" and "interactions in person diluted by using FB so much."
Neil: interesting that Steven is leaving school and FB at the same time, sees FB as a tool for adults to keep in touch even without a strong in-person community to keep them connected.
"How do social networks change your notion of friendship?" Andreas looked at it in China and in the U.S.
Relevant blog posts from Andreas:
Wikimedia
Howie noted the lack of visibility of people on Wikipedia and asked for experiences from others.
- RobLa: At Second Life, whether a user made a friend was a big predictor of whether they would stay active
- EEKim: according to some in the wiki community, 'retention' is not as important as becoming a quality contributor
- Others volunteer that they stayed Wikipedians because of relationships with admins
Wikipedia/Amazon parallel. Does it matter whether something is primarily social?
Barry: as we make decisions about what direction to take with the development of Wikimedia projects, can you help us think about how to consider data?
Ethics about data gathering and privacy; commitment to experimental rigor.
Andreas's PHAME framework for thinking about data:
- Problem
- Hypotheses
- Actions
- Metrics
- Experiments
Amazon gets $100 for each co-branded credit card, gives $30 to the end user.
Hypothesis: giving it to them right away gives them incentive to spend.
Is it more effective to give the incentive money up front (immediately), or after the first purchase (a delayed incentive, but one that can be an incentive in itself: "oh, I have a credit!")? Given the set of data, what insights could we get from that?
Nobody finds great datasets anymore, they create great datasets through experiments
A/B testing especially
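As a rough illustration of how an A/B comparison such as the up-front versus delayed incentive question could be evaluated, here is a minimal Python sketch of a two-proportion z-test. The function name and all counts are hypothetical placeholders; they are not data from Amazon or Wikimedia, and nothing in the notes says this particular test was used.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference in conversion rates B - A.

    conv_a / conv_b: number of users who converted in each arm.
    n_a / n_b: number of users assigned to each arm.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, computed via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical example: arm A = delayed credit, arm B = up-front credit.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The sketch uses only the standard library so that it stays self-contained; a real experiment would also need a pre-chosen sample size and a decision about which metric counts as a "conversion".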
Howie & Micah mention difficulty of getting data at Wikipedia, or predicting secondary social effects
You can't afford NOT to do experiments
Andreas: "you can't afford not to do experiments"
Strategic planning process expressed a strong preference for an experimental approach
Eugene: Wikimedia is at an inflection point. Up till now, culture has not been so supportive of an experimental approach.
that's not really true; both usability projects have started to use a research-driven approach
- I think Eugene meant prior to that even. He tends to acknowledge the data brought in by usability projects as an example of what's done RIGHT.
as a sidenote: research has been one of the main topics discussed by the UX team during the business planning process; unfortunately, it's been hard to convince the executive staff to dedicate resources to research, because they feel there is a lot of unfinished dev work and technical issues to fix first
- Pete: This is a tough bit for me to follow. It seems to me that Wikipedia exists ONLY because of a vast network of interrelated experiments. I could understand if the point is adopting some central notion about best practices with experimentation.
- It's not the experiment itself, it's the type of data that's created as part of that experiment that's the issue for people. We can experiment all day, but we're limited in what data we can capture.
- That point makes sense, but it's not what I heard Eugene saying. Maybe I misunderstood him.
Wikimedia privacy policy: "When a visitor requests or reads a page, or sends email to a Wikimedia server, no more information is collected than is typically collected by web sites. The Wikimedia Foundation may keep raw logs of such transactions, but these will not be published or used to track legitimate users."
There's a different Wikimedia Donor privacy policy.
Publicly available data can be found at: http://stats.wikimedia.org
- Hits (any number of different actions can be found)
Other resources:
- http://en.wikipedia.org/wiki/Wikipedia:Editing_frequency
- http://en.wikipedia.org/wiki/User:Dragons_flight/Log_analysis
- http://en.wikipedia.org/wiki/Wikipedia:Statistics
Privately held data lives in places such as the fundraising team's database, the reader relations database of calls, the OTRS databases, and the volunteer database.
Data vs Infrastructure (plumbing vs
Questions
- Role of social?
- PROBLEMS THAT DATA / METRICS / EXPERIMENTS CAN HELP WITH
- Fund raising
- Quality of content
- How to measure it?
- Breadth of content
- Editors / Contributors
- number
- who they are, where they come from, why they leave, what they expect, what their mental model is -- to design the best experience for them
- Engagement?
- Lifecycle
- Comfy / friendly place to hang out at and contribute to
- Keep barriers to entry as low as possible
Scientific method (pete's memory)
- Formulate a hypothesis
- Design an experiment
- Execute the experiment
- Gather the results
- Interpret the results
- Draw a conclusion
- (Formulate a new hypothesis)
http://en.wikipedia.org/wiki/Hypothetico-deductive_model
1. Gather data (observations about something that is unknown, unexplained, or new).
2. Hypothesize an explanation for those observations.
3. Deduce a consequence of that explanation (a prediction). Formulate an experiment to see if the predicted consequence is observed.
4. Wait for corroboration. If there is corroboration, go to step 3. If not, the hypothesis is falsified. Go to step 2.
reader
anon edit vs register
Erik: (hypothesis): one of the reasons to create an account may be to customize their *reading* experience, not necessarily a desire to edit
Andreas: From a reader's perspective, what makes them simply make an IP edit?
English Wikipedia: 32% of edits are from non-registered users. Dutch is 10%.
Neil: How does vandalism relate?
Robert: The majority of unregistered edits (~80%) are legitimate, compared to ~95% of registered edits being helpful.
Howie: with a marketing campaign, you generate a certain type of user. Our current view is pretty monolithic.
Barry:
Rebecca: Are we assuming that registered users have a higher retention rate?
- (could reason be to avoid harassment?)
(Pete: Danny Horn at Wikia is a good person to talk to about this)
Hypotheses:
- The cost of signing in is significant
- The cost of signing up is significant
Moka: developing incentives
Eugene: Wikipedians have done some research by making fake accounts and making edits. (User:WereSpielChequers)
Ed Chi attempted to normalize for vandalism.
Anonymous versus logged-in edits
- How to measure quality? Edit persistence?
- What about mistakenly anonymous edits (the editor forgot they were not logged in)?
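One possible reading of "edit persistence" is the share of edits that are not later reverted. The Python sketch below compares that share for anonymous and logged-in edits. The record fields (`registered`, `reverted`) and the sample data are hypothetical; they are not a MediaWiki schema, and deciding when an edit counts as "reverted" is left open here, as it was in the discussion.

```python
from collections import defaultdict

def survival_rate_by_group(edits):
    """Return the fraction of non-reverted edits per group.

    edits: iterable of dicts with boolean keys 'registered' and 'reverted'
    (hypothetical field names chosen for this sketch).
    """
    totals = defaultdict(int)
    survived = defaultdict(int)
    for edit in edits:
        group = "registered" if edit["registered"] else "anonymous"
        totals[group] += 1
        if not edit["reverted"]:
            survived[group] += 1
    return {group: survived[group] / totals[group] for group in totals}

# Made-up sample records, for illustration only.
sample_edits = [
    {"registered": True, "reverted": False},
    {"registered": True, "reverted": False},
    {"registered": True, "reverted": True},
    {"registered": False, "reverted": False},
    {"registered": False, "reverted": True},
]
for group, rate in survival_rate_by_group(sample_edits).items():
    print(f"{group}: {rate:.0%} of edits not reverted")
```

A non-reverted edit is only a proxy for quality, and mistakenly anonymous edits would be counted in the anonymous group unless they can be identified separately.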
Weigend suggests surfacing the benefits of being logged in to the user (lower likelihood of reversion, etc.)
Ariel: what if being logged in becomes the new normal, and then new users are reverted at a similar rate
We don't do a good job of expressing to end-users the benefits of getting an account
We discussed some graphs that Erik Zachte sent on editing and reverts.
- 2010 edit trends on German Wikipedia.
- 2010 edit trends across all Wikipedias.
- Revert trends on Dutch Wikipedia.
Closing Comments
- (ASW) how can we understand the real barriers to contribution?
- (ASW) Most people just come to get some information. Parallel: the airline passenger who just wants to go on that trip with a friend
- (ASW) who really knows who the "good" editors are?
- (Neil K) Experiments > anecdotes & theories. Let's talk about getting better experiment infrastructure
- Moka: What is the life-cycle of an editor?
- Pete Forsyth:
- let's do experiments, but carefully examine assumptions, do it with the scientific method. (Let's esp. look at first step of scientific method, "use your experience to formulate a hypothesis/experimental question")
- SHARED OWNERSHIP: Wikipedia is getting to the point where it's a major institution, and people have a desire to help that may transcend individual incentives. Let's leverage this to persuade editors
- Howie: 375M uniques per month! We're lucky to have such a base for experimentation. Our community loves data and information.
- Micah's talking point picture: http://www.flickr.com/photos/35034358900@N01/4622563354/sizes/o/
- Looking at the "funnel" from reading -> click edit -> click submit -> downstream results (reversions, quality...)
- Rebecca: Register/not register. The #1 answer to "why did you donate" is "because you asked." There may be an important parallel here. Figure out how to invite people.
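The reading -> click edit -> click submit -> downstream results funnel mentioned in the comments above could be summarized as step-to-step conversion rates. The following Python sketch shows one way to do that; the stage names and counts are made-up placeholders, not Wikimedia measurements.

```python
def funnel_rates(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair.

    stages: ordered list of (name, count) tuples, widest stage first.
    """
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against an empty stage to avoid division by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates

# Hypothetical counts, for illustration only.
example_stages = [
    ("read", 10000),
    ("click edit", 500),
    ("click submit", 200),
    ("not reverted", 150),
]
for src, dst, rate in funnel_rates(example_stages):
    print(f"{src} -> {dst}: {rate:.1%}")
```

Looking at each step separately makes it easier to see where an experiment (for example, a change to the edit page) would be expected to move the numbers.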
Andreas's story: Airline collecting data on whether people wanted to drink Coke or Pepsi. One student said, "All I care about is getting to Paris and having a good time with my girlfriend." Moral: Don't get too caught up with unimportant questions.
Suggestion
- (ASW) Figure out the simplest thing you might want to know. Then ask something 10x simpler
- (ASW) Surface positive and negative effect on the community, consider "social capital" predictor