Overview

In 2005, founder Jimmy Wales articulated an overall vision for Wikipedia:

"Wikipedia is first and foremost an effort to create and distribute a free encyclopedia of the highest possible quality to every single person on the planet in their own language. Asking whether the community comes before or after this goal is really asking the wrong question: the entire purpose of the community is precisely this goal" [1]


Thus, the goal of Wikipedia has always been to not just provide content, but to provide high quality content. The two summary issues below attempt to provide an overview of the available research and data into how much progress has been made towards this goal, as well as to highlight both remaining challenges and potential opportunities. Summary Issue 1 digs into the state of Wikipedia as a reference resource, while Summary Issue 2 explores the issue of quality content as it pertains to the other content areas Wikimedia has expanded into through both Wikipedia and sister projects. For both, quality is defined along several dimensions:

  • Breadth: The scope and depth of content covered within the project
  • Reliability: The completeness and accuracy of content coverage
  • Quality Assurance: The usage of mechanisms to monitor and improve content reliability


Two important distinctions exist for analyzing and discussing quality content. First, the relative importance of quality may be different for a mature Wikipedia such as the English or German versions than it is for an emerging Wikipedia that is still trying to build a solid base of content. At the same time, quality is always the eventual goal, and it seems likely that all Wikipedias will face similar quality issues as they grow and mature. Relatedly, most of the data and research on these topics is available only for the English Wikipedia. However, as the largest and most mature Wikipedia, the English version may be the most appropriate language Wikipedia to benchmark against both Community-identified goals for content breadth (e.g. vital articles) and traditional resource materials. This comparison can serve as an indicator of where Wikipedia, at its most robust, is highly successful and where opportunities still exist to be the most comprehensive and usable free online encyclopedia for the world.


Second, while the same quality metrics apply to the different types of content provided by the various Wikimedia projects, their exact definitions and relative importance may differ. In addition, the overall quality bar may be higher for some types of content (e.g. education) than it is for others. Whenever possible, this overview will attempt to highlight those differences.

Summary Issue 1: As an online reference resource, Wikipedia is impressive, yet unfinished

Note: for a more detailed look at Wikipedia as a reference resource, please see Opportunities to improve core reference content


By almost any measure, Wikipedia has made impressive progress towards the vision articulated by Jimmy Wales and become the most successful online encyclopedia in the world. Its mass collaboration model and active community of volunteers have created 269 language projects and a total of 14M articles. Wikipedia is also the cornerstone of the Wikimedia portfolio, receiving 96% of all page views from the over 330 million visitors to Wikimedia each month. English Wikipedia alone is the 5th most popular website, and its 3M articles make it second only to Hudong’s Chinese-language encyclopedia when it comes to content breadth. As Wikipedia projects continue to evolve and dominate the online reference landscape, they are being held by contributors, readers, and external parties to ever-increasing levels of quality as measured by the breadth, reliability, and assurance of quality content.


Breadth

The biggest advantage Wikipedia has over other online and traditional encyclopedias is the breadth of the content it provides. enWikipedia covers 22 million categories with over 50% of all articles providing reference on topics in Culture and Arts, People, and Geography (although Science has seen the largest growth in articles over the last three years).[2]

Figure 1 Topic distribution for enWikipedia
 


However, when measured against a vital articles standard (the 1,000 articles deemed by the Community to be necessary for any encyclopedia), enWikipedia has gaps of 20% to 30%, particularly in the areas of Science, Technology, Social Science, and Math. For the category of Arts and Culture, 50% of vital articles are C or Start Class. Measurement is the category most in need of quality vital articles, with less than 30% of its articles rated Featured or Good.[3]

Figure 2 Status of vital articles
 


This data, combined with other studies suggesting that enWikipedia has content gaps relative to best-in-class traditional encyclopedias, suggests that even a Wikipedia as mature as the English version cannot yet be considered "done".

Reliability

enWikipedia also has an opportunity to provide more structure and rigor, enabling readers to navigate and use the content for enhanced research and reference purposes. According to Wikipedia's own entry on encyclopedias: "Some systematic method of organization is essential to making an encyclopaedia usable as a work of reference. There have historically been two main methods of organizing printed encyclopaedias: the alphabetical method (consisting of a number of separate articles, organised in alphabetical order), or organization by hierarchical categories. The former method is today the most common by far, especially for general works. The fluidity of electronic media, however, allows new possibilities for multiple methods of organization of the same content. Further, electronic media offer previously unimaginable capabilities for search, indexing and cross reference."


Wikipedia certainly benefits from its digital, wiki-based model. Not only is content dynamic and current, but users have a variety of options when it comes to searching and browsing for content and easy access to related internal and external content. However, other aspects of the Wikipedia model make it difficult for users to take full advantage of these benefits and easily find the information that they need. There currently appears to be a proliferation of browsable reference lists (including overviews, featured content, topic lists, topic portals, glossaries, and timelines). In addition, the lack of a single, internally consistent classification system has resulted in thousands of potential categories that an article can belong to and a category logic that is not intuitive to the average person. The FAQs on article categorization even state that "articles are not usually placed in every category to which they logically belong" and "different parts of Wikipedia use different schemes for organizing articles into categories."


At the article level, observers of Wikipedia agree that the project can produce quality articles that are as comprehensive, accurate, and reliable as traditional encyclopedias. A 2005 comparison by the magazine Nature found that Wikipedia's science entries come close to the Encyclopedia Britannica's in terms of accuracy (2.92 mistakes per article for Britannica and 3.86 for Wikipedia) [4]. However, the complete body of research into quality, as well as Wikipedians themselves, suggests that the quality of Wikipedia projects is inconsistent across subjects, topics, and individual articles.

Assurance

Wikipedia has evolved a series of content policies and quality assurance measures (from bots and volunteers dedicated to reverting vandalism to an internal quality rating and review system) intended to improve quality and make quality processes more transparent to both readers and contributors. However, these processes are not always implemented consistently, and they do not always scale easily. As the data below shows, fewer than 0.5% of articles in enWikipedia have been vetted and stamped as quality articles:

 


These quality assurance gaps can lead to a perception of lower quality, and make it difficult for readers and contributors to easily recognize potential trouble spots and tell the good from the bad. deWikipedia was the first to institute a more robust quality assurance engine when it introduced flagged revisions in 2008. enWikipedia has recently announced that similar protections will be placed on articles about living people, which has sparked much debate among Wikipedians, observers, and users.

Summary Issue 2: Expanding beyond core reference has the potential of furthering Wikimedia's mission

Over the years, Wikipedia itself has grown to include more than just core reference content. As analysis of page hit data for the top 100 pages shows, non-traditional encyclopedic content in such areas as pop culture and current events is very popular across several of the largest Wikipedias, and the site is increasingly becoming a go-to source for breaking news ("Michael Jackson", "Ted Kennedy" and "Swine Flu" were among the most popular pages in enWikipedia for July 2009). [5] Likewise, through the creation of a variety of "sister projects", Wikimedia has expanded into such areas as primary source content (e.g. Wikisource and Wikiquote) and educational resources (e.g. Wikibooks and Wikiversity). Currently, there are over 740 Wikimedia projects that cross 8 broader initiatives (including "Wikispecial projects" such as meta and Commons). Several of these initiatives (e.g. Wikibooks, Wiktionary, Wikisource) have been around almost as long as Wikipedia, yet none has achieved a similar size and global presence. Together, these projects represent 33% of total articles and receive just 4% of page views. [6]

Figure 4 Overview of Wikimedia projects

 


The expansion of both Wikipedia and these sister projects beyond traditional encyclopedia content has happened in a rather ad hoc manner, and some Wikipedians have started to question whether they can, or even should, all be successful. Below is a potential frame for understanding where and how Wikimedia projects should strategically expand. It builds on comments from a WMF board member, who suggested that it is important to think about how projects fit with both the mission and the "wiki way". In addition, he suggested that it would be helpful to think about two types of projects: those that support core reference resources (such as Commons), and those that expand Wikimedia's presence into different, but related, types of primary content. Currently, there seem to be two subsets of the primary content projects. The first is Wikinews, which expands Wikimedia’s "sum of all knowledge" efforts into current events and other topical content. The second is Wikibooks and Wikiversity, which extend Wikimedia’s knowledge base into educational resources such as textbooks and course materials.

Figure 5 Wikimedia projects across topical, reference, and education content

 


Supporting content

Many of the supporting projects (especially Wikisource, Wiktionary, and Commons) appear to have developed solid community traction and a strong base of primary source content to serve as further resources for Wikipedia's contributors and readers. However, members of the community have expressed concern that Wikimedia might not have the best platform for such content, especially since such projects divert valuable resources from further developing Wikipedia. In particular, Wikipedians question whether these projects should be provided by Wikimedia or whether other reliable sources exist (e.g. Project Gutenberg or government archives), especially where content cannot be edited (e.g. source texts and quotes). There is also the concern that current Wikipedia articles cover the same content, although there is no data that clearly shows content overlap between Wikipedia and these supporting projects. There is also no data that shows how these sister projects support Wikipedia or how they provide a greater knowledge experience to the world, furthering Wikimedia's mission of bringing the sum of all knowledge to all people. Greater understanding of the role and potential of these projects to further Wikimedia's mission is necessary work for the Expanding Content Task Force.

Topical content

Note: for a more detailed look at Wikimedia and topical content, please see Opportunities to expand content - topical


The topical end of the content landscape is changing in important ways. Users are gaining more control over the creation and consumption of news and other local information, and a new media ecosystem is emerging that combines news, data, and direct opinion. All of this may be happening at the expense of traditional media outlets, which appear to be losing their dominant position in an increasingly fragmented arena.


Wikipedia has already started the process of expanding into topical content, and available information suggests that it is playing a critical role in the way that the landscape is changing. A look at the most popular pages reveals that topical content is popular with users (~60% of page hits for the top 100 pages in enWikipedia are in pop culture or current events, and the statistics are similar for other languages), and the wiki platform and mass collaboration model appear well-suited to provide it. Until recently, most traditional news organizations would have considered Wikimedia an unlikely competitor. Now, however, it has become impossible to ignore Wikipedia's popularity when it comes to content that has historically been the domain of professional journalists and media companies. In the four weeks after Michael Jackson's death, Google News and Wikipedia each received approximately 7% of total page views, the largest shares of any sites, while CNN, the largest traditional media outlet, placed 10th with 1.5%. In fact, Wikipedia has become such a popular source of news that it has essentially rendered Wikinews unnecessary. In the words of a recent New York Times article, "So indistinct has the line between past and present become that Wikipedia has all but strangled one of its sister projects - Wikinews." [7]


The key to Wikipedia's success lies in its content model and structure: every topic has a single, comprehensive, and consistent topic page, and at any given time a group of editors is collaborating to pull in or link to all relevant information. As a Google VP explained in recent testimony to Congress, "Today, in online news, publishers frequently publish several articles on the same topic, each at their own URL. The result is parallel Web pages that compete against each other in terms of authority and placement in links and search results. Consider instead how the authoritativeness of news articles might grow if an evolving story were published under a permanent, single URL as a living, changing, updating entity. We see this practice today in Wikipedia’s entries and in the topic pages at NYTimes.com. The result is a single authoritative page with a consistent reference point that gains clout and a following of users over time." [8] There are signs that traditional media organizations are adjusting their own strategies to be more aligned with the Wikipedia model [9], and the best example of success so far is the New York Times' topic pages. "I absolutely think that news organizations are paying a lot of attention to Wikipedia page rank and wondering how they can get that," explained Matt Thompson (a researcher, journalist, and expert on next generation news sites). "The notion of topic pages has become very current in the world of news organizations; they are all mulling or implementing topic page strategies."


Wikipedia also has an advantage when it comes to topical content because its digital platform and mass collaboration model enable it to provide users with easier access to the long tail of local information (news stories and local reference) that traditional media usually cannot afford to cover. This is an especially powerful model for localized Wikipedia projects that have access to a wide enough base of relevant online content (both in terms of language and the focus of the localization) for the community to compile, link to, and reference. Conversely, the model may be less viable where it would potentially be valued most: places that do not even have that foundational content to build on (an example here would be Sub-Saharan Africa, where there is a notable lack of content available in Swahili, a widely spoken regional language).


If a base of local content is a prerequisite for the start of a viable topical content model, then so is an active local community that is large enough to effectively curate, improve, and maintain that content in line with Wikipedia processes and standards. There are already concerns that the Wikipedia community lacks diversity, and that this lack of diversity both skews and limits content growth. As Wikipedia scales down to the local level, these concerns become even more pronounced: with few editors, inherent bias or opinion may surface. As Matt Thompson explains, "I am concerned that at the local level, you find that who the editors are starts to take on more prominence. The community that forms around the project has more visible markings, and current editing structures start to break down." There are signs that the community is headed down the localization path (some local projects already exist within Wikipedia), but scaling such efforts would require more structured and concerted efforts to cultivate and sustain community interest and growth at the local level.

Educational resources

Note: for a more detailed look at Wikimedia and educational content, please see Opportunities to expand content - education


Open Educational Resources (OER) are commonly defined as "digitized materials offered freely and openly for educators, students, and self-learners to use and reuse for teaching, learning, and research. OER includes learning content, software tools to develop, use and distribute content, and implementation resources such as open licenses." [10] The OER movement has been gaining momentum, with a growing number of organizations working to provide educators and students with access to a wide variety of open learning resources, and a growing number of politicians and administrators recognizing the value of increasing usage. Despite this momentum, however, there are still important barriers to movement growth, including a lack of awareness among educators, differing local needs and conditions, and significant gaps in high quality content across subjects and languages.


Wikipedia was one of the first, and remains one of the largest, providers of open educational content. Yet despite the initial "spark" that it provided for the movement, and its impressive breadth of content, Wikipedia appears to have relatively low traction with educators, and is not among the first names mentioned (MIT OpenCourseWare, OER Commons, Connexions) when it comes to organizations that are currently doing the most to advance the OER cause. Instead of serving as a powerful resource for teachers and students, its use is often banned or restricted in classrooms. Other Wikimedia projects dedicated explicitly to providing educational resources (Wikibooks and Wikiversity) have also failed to really catch fire with contributors and more casual users. While more research is required to determine the root cause behind this low traction, there are some important points that surface from examining where Wikimedia stands relative to challenges and opportunities facing the OER movement as a whole.


On the content end, the experience of Wikibooks suggests that the wiki platform might not be ideal for the creation of textbooks. As a report on OER from the Organization for Economic Cooperation and Development explains, "The possibility of contributing small modules of content has helped ensure the success of Wikipedia, while the Wikibook project has not had the same success. This may be because book chapters cannot be divided into small enough parts; if the bits are small, the process of compiling individual contributions into chapters is probably more time-consuming than writing the book oneself." [11]


Wikiversity aims to provide educators and learners with a wider variety of learning resources (including activities, lesson plans, lectures, and courses), and appears to have done a better job at building a solid community and content foundation. However, even a quick glance at the overview of potential resource types suggests that there are several content areas that still have nothing at all. In addition, Wikiversity appears to struggle with one of the main challenges facing the OER movement as a whole: a lack of relevant, diverse content. According to the OECD report, "The vast majority of OER is in English and based on Western culture, and this limits their relevance and risks consigning less developed countries to playing the role of consumers." [12] Perhaps it should not be surprising, then, that the list of current Wikiversity projects looks like this: English, French, Spanish, German, Italian, Portuguese, Czech, Greek, Finnish, and Japanese. And while Wikiversity has a wide variety of content roughly organized by education level (preschool, primary, secondary, tertiary), it would seem that the bulk of Wikimedia's content (available through Wikipedia) is not structured and formatted for classroom use.


Building on its content, there are several roles that Wikimedia could look to play in the OER movement, including:

  • Providing content for reuse and adaptation by other OER platforms and communities
  • Providing content that is already structured and formatted for classroom use
  • Providing both content and a platform with the features and functionality teachers need to adapt and share content themselves


There are important questions that Wikimedia will need to address if it wants to further explore the potential and feasibility of these roles and successfully increase and sustain inroads with teachers and students. The quality bar, both actual and perceived, for educational content is high, and whether or not Wikimedia has the quality control and assurance mechanisms necessary to support teachers directly (as opposed to through another OER platform) is an issue that probably warrants serious consideration. Some teachers have gone so far as to ban Wikipedia use in the classroom, citing both perceived quality problems and student plagiarism. Understanding the needs of teachers and educational institutions is also key to making further inroads with educational content and to surfacing whether the current Wikimedia platform and model provide enough of the features and functionality that teachers need to take full advantage of the content that is made available through relevant Wikimedia projects.

Task Forces

Improve Wikipedia's Quality Task Force

The goal of Wikipedia has always been to provide high quality content. Over the years, there have been many conversations about quality and many efforts to improve it across Wikimedia projects. This strategic planning process provides an opportunity to reflect on shared goals related to improving quality, what approaches have been tried, and what the priorities should be for improving quality going forward. Specifically, this task force will focus on developing recommendations to improve the quality of Wikipedia as an encyclopedia. See Improve Wikipedia's Quality Task Force for the rationale for this focus, critical questions associated with this task force, and specific supporting materials.

Expanding Content Task Force

Wikipedia and its sister projects have already expanded far beyond the core encyclopedic content with which Wikimedia began. Wikipedians have mixed opinions about the value of expansion into other forms of content (e.g., dictionary definitions, quotes, current events, educational materials), and also about whether that expansion should happen within Wikipedia or under the aegis of other projects. No new projects have been launched in recent years, suggesting that the bar to launch new projects is quite high. Some argue that Wikipedia's sister projects get too little attention and resources, while others believe that investing in these projects would be a distraction from the core mission of Wikimedia.

In Phase II, a Task Force will be set up to further investigate the opportunities and challenges for expanding content beyond encyclopedic core reference content. See Expanding Content Task Force for the list of critical questions associated with this Task Force, as well as specific supporting materials.

Additional information and resources

Overview of Wikimedia projects and the content landscape

Opportunities to improve core reference content

Opportunities to expand content - topical

Opportunities to expand content - education

Quality control and assurance deep dive

References

  1. [1]
  2. “What’s in Wikipedia: Mapping Topics and Conflict Using Socially Annotated Category Structure”
  3. Analysis from data found at Wikipedia:Vital Articles
  4. Nature 438, 900-901 (15 December 2005) | doi:10.1038/438900a; Published online 14 December 2005
  5. [2]
  6. Data from Wikistats as of September 11, 2009. Page hits data can be seen at Overview of Wikimedia projects and the content landscape
  7. The New York Times, "All the News That's Fit to Print Out" [3]
  8. [4]
  9. [5] An article about how the AP is attempting to better position itself to compete with Wikipedia
  10. [6]
  11. [7]
  12. [8]