Proposal:Defining and enhancing analytics from Wikipedia data

Status

The status of this proposal is:
Request for Discussion / Sign-Ups

Every proposal should be tied to one of the strategic priorities below.

Edit this page to help identify the priorities related to this proposal!

  1. Achieve continued growth in readership
  2. Focus on quality content
  3. Increase Participation
  4. Stabilize and improve the infrastructure
  5. Encourage Innovation

If not English, in what language is this proposal submitted?:


There is a lot to learn from Wikipedia's users and how they use Wikipedia. We'd like to define some semi-specific questions that interest the community, so that they can guide subsequent technical work to answer them through analysis of different data streams. These questions help shape where we would like to go, and help match what is interesting with what is possible.


Set up a wiki page to capture and discuss questions that the community may want to explore further through data analysis. In addition, provide technical feedback on how difficult each question would be to answer using existing Wikipedia data (data dumps, squid logs, search queries, etc.). For now, the idea is to capture the questions we would like to answer, and to identify and specify potential technical means of answering them with datasets that already exist and are available for analysis.


Usability, the Wikimedia strategy team, and others are motivated to extract new insights from existing data sources.

Key Questions

The key question is: What are your questions?! Please be semi-specific, as this will help us discuss and eventually determine whether an answer is technically possible. For example: What have users been looking for, by country, in the last year or month, categorized by subject? Do most users use Wikipedia in their native language, or in another? Does this change by geographic region or by a country's economic status?
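
To illustrate how a semi-specific question could be mapped to a technical answer, here is a minimal Python sketch for a question like "which language editions do readers in each country use?". It is only an illustration under stated assumptions: the `Request` record and its `country` and `language` fields are hypothetical and do not describe the actual format of the squid logs or any other existing dataset.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Request(NamedTuple):
    """Hypothetical, already-anonymized page request (not a real log format)."""
    country: str   # assumed: country code already resolved for the request
    language: str  # assumed: language edition requested, e.g. "en"


def language_share_by_country(
    requests: Iterable[Request],
) -> Dict[str, Dict[str, float]]:
    """Return, per country, the fraction of requests going to each language edition."""
    counts: Dict[str, Counter] = {}
    for request in requests:
        # One counter of language editions per country.
        counts.setdefault(request.country, Counter())[request.language] += 1

    shares: Dict[str, Dict[str, float]] = {}
    for country, per_language in counts.items():
        total = sum(per_language.values())
        shares[country] = {
            language: n / total for language, n in per_language.items()
        }
    return shares


# Made-up sample records, purely to show the shape of the input and output.
sample = [
    Request("DE", "de"), Request("DE", "de"),
    Request("DE", "de"), Request("DE", "en"),
    Request("BR", "pt"), Request("BR", "en"),
]
shares = language_share_by_country(sample)
print(shares)
```

A question phrased this specifically makes the technical requirements visible: answering it needs a per-request country and language edition, so the discussion can then focus on whether the existing data actually provides both.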

Potential Costs

To be determined


Community Discussion

Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal talk:Defining and enhancing analytics from Wikipedia data.

Want to work on this proposal?

  1. Sign your name here!