Task force/Wikipedia Quality/Summary of Archive 1

This is a summary of Archive 1. It excludes threads which did not explore the taskforce issues, including the draft weekly report thread.


Opening

Philippe opened the taskforce asking: "What have we agreed on in terms of quality? Where is the community in terms of the quality discussion. What do we NOT agree on? What have we not discussed about quality, as a community? What sort of information would be useful, in terms of helping us think this through?"

Discussion initially moved to the Five Pillars, newcomer guidance, and the nature of reliable sourcing, before branching out into other threads.

The "Five Pillars" and newcomers

Woodwalker noted the Five Pillars of Wikipedia (Verifiable, Neutral, Balanced, Findable, Encyclopedic) were a sound basis but could be improved and better promoted, and that it is highly demotivating when some users don't understand these requirements. FT2 asked "whether it's appropriate and fair - to newcomers and the project - to throw users unfamiliar with encyclopedia writing into the project, without clear guidance", noting we make high demands of their learning curve and standards. He had started writing a "newcomers manual".

Hillgentleman thought the 5 pillars were commonsense; Woodwalker felt they were not obvious to many users and that educating newcomers was a significant drain on time for experienced editors. He suggested that guidelines, 'newcomers manuals', etc should be more visible and easy to find, which in turn allows experienced users to assume they are known and treat such users "with less patience". Woodwalker also noted tools such as Flagged Revisions whose effect on quality was unknown and needed further examination, and suggested looking at "the sharing of all information and initiatives among projects" - while local communities made their own choices they should be aware of the choices and work of others (eg in manuals, tools, etc). Yaroslav Blanter added (later) that perhaps "completeness" should be considered a pillar.

Wizards

FT2 noted that an enwiki user has written a "new article wizard" (updated v2.0). He observed that other organizations seeking mass use of powerful tools educate people as they go - "it's worth noting the effort they put into interfaces, wizards, help systems, and newcomer guidance" (with an "advanced" setting to disable these functions). This was strongly supported by other taskforce members and examples considered or proposed, with Piotrus suggesting such initiatives be prioritized.

Examples and uses were considered. Woodwalker suggested wizards to help remove bad content or deal with POV and other problem text; FT2 suggested a "report a problem" suggestion wizard (via link/icon) on every page, and suggested other examples where wizards might be viable included new account creation (names and multiple accounts), edit war detection, insufficient/absent citation detection and advice, controversial topic editing, and requesting/proposing page deletion. He commented that we should not "demand users read most policies, or hit them with a stick for not doing so. Instead, guide them when they have actual need for the information... The popup doesn't need to say everything covered in the policy, just what's relevant to the action they're doing". He proposed a Wizard extension to allow communities to design and customize wizards with little help from WMF.

Sources

Primary/secondary/tertiary sources were discussed next. Woodwalker noted that these terms were deemed ambiguous (primary sources, as understood in academia and on Wikipedia especially, differ in meaning): "In real life, most peer-reviewed scientific papers are considered primary sources. I believe many wikipedians consider them to be secondary". He commented that "hundreds of references to primary sources" even in featured articles is a "plague" at enwiki, makes editing more difficult, and wrongly implies that the content is trustworthy. He contrasted the guidance at enwiki that wasn't working well with the possibly overstrict guidance at dewiki which (he felt) does generally work well. FT2 gave examples of different sources as a discussion point; Bhneihouse was still "at a loss" on how the terms were being used and commented that good information can come from these different kinds of source. Woodwalker felt this issue, although valid, would not lead to anything in the top 2-4 recommendations.

Sjc noted that "[Lack of critical consideration and t]he current obsession with citing sources has led to rafts and reams of citations which fulfill policy criteria at the marked expense of quality", a view endorsed by FT2 and Woodwalker. The latter noted only "specialist users" should use primary sources; others unfamiliar with the necessary approach and rigor should be "actively discouraged" from doing so.

Bhneihouse advised working from standard terms, which would help with the academic community. She suggested identifying tools and documents to focus upon, such as the Wikipedia articles and project pages on primary/secondary/tertiary sources. Woodwalker suggested the German version of that page. Piotrus agreed they were important but the communities were "doing fine" and other areas were a more important priority.

Slrubenstein (at the end of the thread) adds that a "functional definition" of sources is easier:

  • Wikipedia "provides people with a balanced, proportionate, and accurate account of the state of knowledge on a particular topic. It is not a means for forwarding new analytic or synthetic arguments, or interpretations" [tertiary]
  • "Wikipedia is based on... reliable sources that DO present new analysis, synthesis, or interpretation, but are considered reliable according to the criteria of specialists in that topic." [secondary]
  • These in turn "are based on primary sources, that is, whatever it is that they are analyzing, synthesizing, explaining, or interpreting."

Quality v. editability

Woodwalker drew the conversation back to its starting point: "most larger and medium-sized communities" have agreed their requirements for quality, general approach, and page layout, but "there seems to be some friction between the anyone-can-edit principle and the quality principle, and our communities often don't agree where to draw the line". He wanted to know more about "the relation between quality and editability", and suggested looking at ways to reduce friction between them (including already-mentioned wizards).

Bhneihouse suggested that on an encyclopedia "anyone can contribute to" perhaps users could be paired (less experienced with more experienced) to help learning; if the guidance is good then people can teach each other. She also noted different demographic groups (eg children) may have different workflow approaches to bear in mind.

Assessing expert consensus

Bhneihouse stated there was a problem with defining "neutrality" as the broad consensus of experts/specialists in contentious areas, or "balance" in terms of proportion to importance given by experts. In some areas there is no such consensus or an important counterpoint exists, and in others: "[T]he issue here is that the experts may or may not be handling an issue or the parts of an issue in proportion to it's impact, i.e. Nozick on Rawls (Nozick so criticizes Rawls without understanding his argument clearly that his treatise is disproportionate) or dissemination of information about H1N1 (swine) flu -- many experts actually have misinformation, unless you are referencing the CDC or NIH itself you are probably quoting the wrong information". She felt the definition was good but needed care in application.

Woodwalker noted: "If a debate is raging in the literature, the fact should be pointed out. Giving each view proportionally requires knowing the relative importance of the view. There are ways to 'weigh' scientific literature: citation indexes and impact factors.. [if in doubt] we should still try to be as balanced and neutral as possible and cite the literature as precisely as possible. In a couple of years the debate may be decided and another user can then remove the refuted views easily." He added that creating easy-to-use guidance about sources in specialist fields would need experts from those areas.

Bhneihouse noted not all debates happen in the literature; limiting sources to formal literature might misrepresent some areas: "How do we handle information that comes from what is currently considered non-standard channels that academics already accept?". Woodwalker commented the debates might happen there but they were not accepted as references per se. Bhneihouse agreed (verifiability, reliability) and asked if we should document what was and was not appropriate source material from scientific inquiry. She raised a concern that "non academics may have trouble with academic guidelines, no matter how sensible they may be. How do we convey academic standards to non academics in a way they can understand?", which in turn led to a further discussion.

Meta-discussion of the above, and taskforce focus

FT2 opined in reply to the previous question that "we don't". He stated:

"We set up a better editing environment, with better odds of more success, and changes that will probably catalyze and feed through into such areas and issues... and then focus on getting our basic raw editors and basic "not fit to eat" articles up to a basic standard. best odds all round. We can't afford to do much more... that has to be our first focus... then general improvements for established editors, and we can debate what else we can squeeze in. However much we value experts, and suffer when they leave, they leave because of general issues with the editing environment and poor editorship and disputes, all general issues. Focus on those, which also affect everyone else too. Don't try to...fix specialist issues with academic sourcing. Wikipedia's standards and community aren't yet at a point where these... are a "top 5" focus... We have to get the basics needed for improvement. that means specialisms such as experts, FAs and the like are not a priority (this time around). Ruthless but hard choices."

Woodwalker felt while this was not controversial and all articles needed bringing up to a "plateau" (ie, a general basic standard), there is "a serious problem growing inside the larger projects concerning citation of (academic?) literature" with "hundreds of references" that upon review are "simply unsound" - they are irrelevant, contradict or don't confirm the article text they are intended to support, or are otherwise inappropriate. This undermines efforts to be trustworthy in education and academia. He blamed in part the "cite needed" template, which encouraged users to trawl for any web page stating the point, but often without checking or full understanding. He expressed concerns given we "simply don't have enough expert users" to check the cites involved. He suggested as an interim approach a guideline not to use a scientific source unless one understands it oneself.

FT2 suggested a separate category and sourcing guideline for topics where "the topic and availability of references means that core topic material should be cited primarily from strict and limited sources". He suggested a process for doing this, including selection, handling by small groups of editors and with bot help if required, and wizards and feeds to "push these along", and noted this showed the power of suggestions made elsewhere, which could "open whole new quality doorways across the wiki" to help expert topics as well as basic/beginner issues. (Proposal) This concept (ie a Citation/Citation Check wizard) was strongly supported by Woodwalker who made further suggestions (link).

Final words

Slrubenstein notes that Wikipedia has grown. He states that "More people contribute... than are possible to socialize into the Wikipedia way", and that:

"People could easily argue their interpretations of 'NPOV' or 'vandal' or 'troll' and... "global" consensus... is now practically impossible to achieve. I think the number of newbies who do not understand or care about our core policies seriously degrades the quality of articles and increases the number of conflicts. I believe the problem is ignorance, not bad will. So I agree that early mentoring or very user-friendly wizards or tutorials is a great idea. This is a very constructive suggestion"

Starting point

Is this the main working page for the group? Yes.

Clarity, working tools, and the user experience

Bhneihouse stated the medium (Liquid Threads) was impeding the process: "multiply the frustration I am feeling 100 fold for a newbie who might wish to contribute... we are losing all those who cannot function in this medium. How about people with disabilities? How does quality control affect them? ... I see what is happening right here on this page as part of the problem".

She asked how people could interact with information more efficiently, what Wikipedia interface(s) would take different means of interaction into account, and how to streamline this or address demographic groups such as the disabled who may have different ways to approach information, editing, contributing.

Most taskforce members seemed to agree that Liquid Threads (currently in development) was not ideal but that it was the status quo and offered help; some liked it a lot.

Bhneihouse felt let down by being given beta collaboration software for this task and noted she and others would assume they were being given high quality collaboration tools to work with. Clarity of direction and clear foundations were not provided -- and were not being provided elsewhere in the project either.

FT2 commented that "we should sit down with a blank piece of paper and ask 'what should the Wikipedia experience be like': - I'm a new user, how should I be guided and introduced? I see a problem, what should my experience be? I'm a fanatic on something, what should my experience be? (ie, how should I find myself gradually being brought to a halt)". Bhneihouse strongly agreed.

Starting point

FT2 observed that "originally no thought was given to basics like practicalities of 'consensus' and decision-making, the very wide spectrum of editors' views, etc. Quality improvement recommendations may have limited effect, if communities don't simultaneously have better ways to handle large scale decision-making and differences of opinion".

Specifically, if communities "[lacked] good ways to make large scale decisions and [handle] large scale divisions of opinion... the mechanism to choose [useful options] doesn't exist in the first place". On enwiki this affects (for example) disagreements about reliability of sources, balance of mainstream views, editor processes, deployment of new proposed tools, and proposals for ways to handle conflict and disruption better. Quality related matters should not stall under such problems as 'too difficult to get a clear decision' or 'vocal fringe'. Other difficult decisions (around content, processes, self-governance, improvement, etc) discourage many people from even trying to propose improvements, or can waste a lot of volunteer energy.

A possible solution was noted: to ensure that communities have their attention drawn to this issue, so that it will be taken more seriously as a quality factor:

"[We can tell communities, i]f you focus on it, and figure a way to improve consensus seeking on big decisions in your wiki, then you can do quality improvement easier yourselves. If that's not a problem holding back your wiki now, then judging by bigger projects it will be in future."

Bhneihouse's view

Bhneihouse asked how to poll contributors and make them stakeholders in the process: "what are the decision rules?". She noted that the online community is generally "more independent and less rule oriented", top-down rules don't work, and "no matter how hard you work, many will still resist change, even when they can fully see the benefit". There are two main approaches: mandate rules (and those who can work with them will turn up), or involve the community in creating value based rules that take into account their preferences and current methodologies. She assumed there would be a "play book" to guide handling if someone "doesn't play nice".

Piotrus' view

Piotrus thought the point valid but saw the main problems as being in different areas. (Piotrus' essay "Why good users leave and why civility is key"). He saw the core problem as being:

"[G]ood, productive, experienced users leave due to attacks and stressful incidents; quality relates to number of editors; easily reachable users are mostly reached (geeks who think it's fun); other groups are harder to reach; as a result we are burning through experienced users faster than we are adding to them and quality will suffer [...] Conclusion: we need to understand why editors are leaving, address the issues causing them to leave, reach out to those who left and try to bring them back, and support plans to tap new pools of editors".

Randomran's view (from the community health taskforce)

Randomran noted that quality is highly correlated with the number of editors working on an article, and that the loss of editors risks making even basic maintenance an issue (per Ortega (file)). He felt there is "a real systemic problem with how disputes are settled" and notes Ortega's concern about "consensus-building falling off and more people pushing their personal opinion":

"And yes, those two person debates escalate into WikiProject discussions or RFCs or policy discussions, which attract entire cartels of opinion pushers. And sadly, many of those opinion pushers are not interested in consensus building. They've actually realized that they can accomplish more by filibustering debate than by compromising. So you keep on having the same debate over and over, hoping to make progress, only to see it fall apart because the different extremes unified in their desire to keep the debate going forever. So not only does quality never get a decisive resolution... you actually lose good contributors who just got sick of trying to mediate between the different fanatics."

Woodwalker's view

  1. We don't have a sound definition of "quality" (Ortega uses Featured Articles which does not really capture quality for a project, nor is it valid on many projects). He later amended this to note "[The data] only shows featured articles have been edited by many users, not that content with high quality was generally edited by many users" (ie this is correlation, not necessarily causation).
  2. We assume number of editors equates to growth/decline in quality. Quality editors matter, but many editors are not quality editors and their contributions have much less impact on quality, and some users actually detract from quality.
  3. Editors have a lifetime and will leave. They tend as a group to be "not so good at communicating with others", an issue fostered by anonymity and exploited by other projects such as Citizendium.
  4. Quality editors may be better communicators but equally may burn out sooner (due to lower tolerance for poor communication).
  5. Projects tend to foster "maintenance" editors and respect these (essential for adminship etc too), but users adding quality have it less easy, and may have less influence and less final say in decisions. Another negative factor.
  6. Numbers of quality and "maintenance" editors can grow if help functions are improved (eg via wizards as discussed)
  7. Quality users are often found in isolation at small projects, surrounded by maintenance users and vandals. When they leave, the quality they added is at risk of being destroyed again.

Woodwalker therefore sees the key questions as:

  1. "How can we ensure quality users get more respect and/or influence in communities (and in that way prolong their life time)?"
  2. "How can we get quality users at local projects out of their isolation?"
  3. "[And] we may want to explore the questions how to decrease the amount of vandal users and increase the amount of maintenance users too, since this increases quality as well."

General discussion followed:

The best content is often written by one (or few) users and is easily eroded

Piotrus has written about 20 Featured Articles. On most he was the sole editor (or sole substantive editor), and collaborative FAs and GAs ("good articles") are the tiny exception. Randomran concurs but notes sometimes small "tag teams" of a few editors do form around an article.

Yaroslav Blanter states after a certain quality is reached, further edits mostly reduce quality. Protecting the quality that is achieved is a major issue.

Setting and enforcing fair expectations matters, especially in retaining capable users

Randomran notes that core editors on enwiki suddenly vanish, suggesting they became fed up (described by Piotrus as "That's it, I'm outta here!") rather than losing time and interest. Bhneihouse concurs that "discord" is a major cause of loss, and that experts are likely to have zero tolerance for unnecessary conflict, adding: "I think it likely that is a common thread for the exact people we want to attract and retain as contributors and editors. So the whole supermajority/consensus issue is probably more important than I originally thought". FT2 concurred, noting it applies to "most mature and capable users" and warning that many who have zero tolerance may consider themselves worthy of exception and not accept others having zero tolerance of their own behaviour: [As] said elsewhere, even experts need induction/newcomer handling, "This is how we work, these are the expectations..." Bhneihouse agreed about letting people know the expectations up front: "Let people know up front what the expectations are, let them know that Wikipedia takes protecting their rights and contributions seriously, and we will attract and keep the mature and capable users".

How to help "quality users" and "quality content"

Piotrus provided his analysis.

  1. 10% of Wikipedia editors produce 90% of content. Overall, the number of editors does matter, but the number of high end contributors of course matters even more.
  2. Quality users don't contribute to wiki-projects because they want to change policy, but because they want to show the world their knowledge. What all wiki-users crave is some sort of 'applause'. This is why positive reinforcement is important, and negative reinforcement is so bad. He feels this is the major cause of editors vanishing.
  3. Maintenance editors are crucial for stability, content creators are crucial for growth. Creating new content is usually more contentious than retaining existing stable content.
  4. Quality users could get more respect and influence by giving "special powers" to non-anonymous editors. This would also help them not to be overruled by anonymous but experienced POV pushers and could help improve Wikipedia's standing. Such users should not be allowed to be disruptive, but should be much better protected from (successful) harassment and given positive reinforcement and support.
  5. (Blatant) vandals are not an issue, POV pushers are: "They are harder to identify as disruptive, yet they are the ones primarily responsible for creating bad atmosphere and driving others out of this project. In my experience, it is those type of editors that are primarily responsible for driving good editors away, and I am not seeing any signs the Wikipedia system is able to deal with this issue. In fact, the danger to quality in the future is not only the loss of contributors - it is the possibly shifting proportion between good content editors and POV pushers"

Woodwalker agrees that "rewarding or at least protecting non-anonymous users is a very good idea, as long as constructive critique remains possible", although noting that anonymous users can be quality users too. He notes "it is easier to 200 times revert a vandal than it is to remove POV from one long article" and that "the difference in the amount of 'award' the system gives for maintenance (enough) and editing content (too little) is a problem".

Woodwalker adds further that erosion (quality work gradually being degraded by well meaning maintenance edits) is a problem; good content needs active management. Smaller communities suffer disproportionately from this problem and small project quality content is at higher risk of erosion. It's also a concern that if such users stop editing, the smaller non-English project often lacks anyone to watch quality on its best articles, and they degrade. Although degradation at smaller projects is slower, it is ultimately far more destructive of quality too. Outreach to quality editors (especially on smaller projects), and helping them to work on larger projects too, may be beneficial and could extend their editing enjoyment and lifetime.

FT2 felt that having "a way to formally recognize 'trusted [or senior] content editors' is actually the one biggest change that could help" (on these and other related issues), since the taskforce would be producing only 2-4 recommendations. Reasons and discussion were in another thread and met broad positive support.

Yaroslav Blanter stated that interwiki mobility (translation and dissemination) of good and featured articles would be an immediate and valuable aid to quality. Users willing to translate or improve wording are easy to find but may not know good content exists elsewhere.

Consensus and community

Bhneihouse felt consensus seeking was a "hard sell", and focus should be on "understanding what users do, how they use this creation we call Wikipedia, and what they are really looking to get out of it, both the experience and the information... If you [reach] for a true consensus... you will derail the project in its entirety". She felt a supermajority was often enough and that one underlying question is "what is Wikipedia growing up to be?"

Slrubenstein felt that "the anarchic nature of the Wikipedia community has both advantages and disadvantages" and should not be abandoned. He concurs there is a difference between topics where clear mainstream consensus exists and where it does not, but feels that consensus processes are not the major problem. Policies about "disruptive editors/editing" can help address those, though they can take days or weeks ("it really isn't a long time").

Randomran (providing input from the community health taskforce) feels by contrast that consensus seeking is a core problem, and is interested that NPOV/V/NOR are seen as a viable way to settle many disputes. He sees many users not understanding or caring about them, possibly due to learning curve issues; "Maybe one really effective thing that the board could do is give backing to the core content policies? It could actually help".

BarryN (Bridgespan Group) feels this is a valuable thread, and that "getting consensus for change in a well-established community is the hardest part... it is rare that there is an emergent consensus from the diffuse community that change is needed [yet] a top-down change mandate rarely works either unless there is a huge crisis". He sees consensus on the "vision" but "a growing consensus that the way the community is engaging around article quality is becoming an obstacle both in terms of the ability to reach high quality consensus on articles (too much 'last person standing wins') and on acceptance/cultivation of new [capable/expert] editors". He feels change is more likely to come from many small changes rather than "[one] big bang shift in the culture".

Some quality points raised by FT2 and various comments by others.

  1. Newcomers - there is a pyramid effect (fewer expert editors, many new/inexperienced editors). Observations:

    Guidance and basics for newcomers is significant (affects a significant proportion of users and edits, and may improve the efficiency of established users)

    Piotrus says "Addressing newcomers (with tools such as the article creation wizard) is important as we do need more editors, and I think at this point we have tapped the pool of those who can and want to master the current wiki syntax".
  2. Established editors - observations:

    Personal improvement and development is self-initiated and self-guided. There are no quality related classes, courses, or masterclasses in anything. It's "pick it up for yourself". Users can edit for years and not be exposed to some ideas. Exposure to better working methods or (good) new ideas used by others, would probably be well received and adopted, and improve quality considerably.

    Also we need to ensure editing stays enjoyable for these users, by looking at ways it can cease to be fun. As in the real world, it's easier to retain people who like you, than attract newcomers, and they have a higher motive to work to a high quality. If it stays enjoyable, they'll mostly continue editing (subject to real-world constraints).

    Piotrus says "Ensuring that editing stays enjoyable, by looking at ways it can cease to be fun" is a KEY ISSUE [emphasis in original].
  3. Patrolling - an important quality tool. The same principle could apply more widely if we could generate some kind of feed or list. This might be the most effective way to cover missing content, for example automatic or user-originated feeds for "stuff to add", "facts to reference", or "sections and topics to write", in small doses so patrolling is interesting, or to encourage patrolling generally.
    Piotrus says bots can ultimately only do a few things better than people.
  4. Interwiki efficiency/mobility - If content was quality rated, could editors on an article/topic on one wiki be kept informed if the same article/topic on another wiki is better written or referenced in some areas, so wikis can cherrypick the best of other projects' work, and so that good work or updated information can migrate easily between wikis.
    Woodwalker is in favour of exchange of ideas, but cautions against reducing diversity, and fragmenting the experts and content discussions on a topic. Some projects don't want rating systems - nlwiki recently deleted and forbade the rating of articles. [Note: find out more]
    Yaroslav Blanter would not translate all content, but in some cases (eg an article on a US state copied from enwiki or Paris metro from frwiki) it is an obvious approach. But users should be aware that even so their quality is not always good.
    Piotrus adds information on this from the Polish Wikipedia: "I know pl wiki for example has a way to add mini-templates to interwiki links to indicate articles in other languages have been FAed or GAed. This could and should be improved to reflect the entire spectrum of quality changes, from stub to FA, and possibly a flag to indicate something like 'this article has been recently edited'."
  5. Ratchet effect - it's easy for edits to reduce good quality rather than improve it. Ideally we want a kind of ratchet effect, where if something's decent quality then its next step is better quality (eg peer reviewed against older versions). Can we achieve this?
    Woodwalker agrees and notes "flagged revisions" is a way to do this, but more needs to be known before any use since it has potential drawbacks too: "There are some other similar systems too, like patrolled revisions, which is used at wp-nl a couple of years now, or the systems used at external projects like Citizendium. It's not simply a choice between two options, there's a whole spectrum of systems available."
    Piotrus says "perhaps" but it's too close to flagged revisions, which he doesn't endorse.
  6. Other areas - problem editing (deliberate or wanton poor editing), social efficiency (disputes, arguments, resolving differences of opinion, and other matters that arise between editors), automation (where we can better use automation or technology to help editors).
    Woodwalker notes flagged revisions would solve problem editing.
    Piotrus says problem editing "seems under control" but social efficiency is important; the key issues are enforcing civility and a good editing environment.

Piotrus answers the first two questions asked of the taskforce: what approaches have been tried, and what measurement methods could be used.

Piotrus' answers:

  • Quality could be assessed against the known and recognized Featured Article criteria, together with discussion of how to increase the number of such articles. But other metrics exist.
  • If looking at quality overall, we need to decide what exactly we're looking at, which ties into this.

No further discussion.

Starting point

Woodwalker considered how quality might be defined or measured. The starting point was his essay "On Quality" covering assessment of quality, edits, and editors, and the concept of "content erosion":

The Dutch Writing Contest criteria list seven factors for assessing quality (link): Lede ("lead"), article structure, page layout, article content, style, verifiability, findability.

Woodwalker proposes a more general set of factors:

  1. Content requirements
    • encyclopedicity, verifiability, neutrality, balance
  2. Reader requirements
    • interesting, completeness, depth/specialism level/reader level, relevance of content to subject matter
  3. Requirements of form
    • correct (or good) language, encyclopaedic tone, text style and readability, clear and easy article structure, clear and easy layout
  4. Broader project requirements
    • findability of topics, and coverage, balance, consistency, and completeness of topics in general (across the project).

He also divides edits and editors by their impact on quality:

  • Edits that only add quality, those that only remove it, those that are neutral (eg US English to UK English), and those that add quality in some ways but remove it in others.
  • Editors who add quality by adding new content; editors who add quality by maintaining (preventing degradation, and making minor improvements to existing content - a continuum exists); editors who remove quality; and "problematic users" who add quality in some ways or work hard generally, but come to be seen as harmful or negative overall in quality terms.

He then describes the (important) concept of content erosion:

"If we assume the rate of change and the percentages of all four edit types to stay constant, the rates of quality increase and quality decrease are constant too. This means any page in the project is subject to a slow decay in quality, which I call wiki-erosion. Quality is guarded by the community though. The ability to revert destructive edits of all kinds is thus related to the amount of knowledge in the community. This means destruction can only go as far as a certain quality level. If the community is larger, that level will be higher, if it is smaller, the level will be lower. Thus, in the long term, quality of any sort will stand a larger risk of being destroyed at small projects, even though the wiki-erosion rate is much smaller."

Woodwalker feels current analysis of quality is lacking, but perhaps by analyzing these components, the statistical team can find better ways to assess quality. It may also lead to practical ways to improve specific quality factors, which projects can tailor to their culture, rather than unhelpful generalized suggestions.

General discussion of quality measurement

Piotrus argued metrics are overrated. Virtually all scholarly publications agree quality is high and rising, and internal categorizations (Featured/good/assessed/stub etc) are adequate for the rest, for now. He drew attention to areas where this scheme could be improved, including articles that fall outside any WikiProject and hence are not rated, and low-activity WikiProjects without adequate discussions or members to support rating. He sees more (and more active) WikiProjects as a core quality tool.

FT2 states three kinds of metrics are useful and attainable: 1/ crude metrics such as computer assessments based on tagging, cite to word ratios etc (and some calculation based on these) which can be used to crudely identify major issues and assess articles up to a simple baseline; 2/ article progression and conversion metrics based upon article standing (new -> baseline quality -> good -> featured) and time taken between these, and article stability; 3/ assessments based on user and reader feedback ("rate this article").
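
The first kind of metric (crude, computed from tagging and cite to word ratios) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the discussion: the function names, the list of maintenance tags, and the thresholds are hypothetical, and the parsing is deliberately naive (it ignores self-closing references and nested templates).

```python
import re

# Hypothetical maintenance tags; each project uses its own templates.
PROBLEM_TAGS = ("{{citation needed}}", "{{unreferenced}}", "{{pov}}")


def crude_metrics(wikitext: str) -> dict:
    """Return rough, automatically computable indicators for one article."""
    text = wikitext.lower()
    # Drop reference bodies and templates so they do not count as prose.
    prose = re.sub(r"<ref[^>]*>.*?</ref>|\{\{.*?\}\}", " ", text, flags=re.S)
    words = len(re.findall(r"\w+", prose))
    cites = len(re.findall(r"<ref[\s>]", text))
    tags = sum(text.count(tag) for tag in PROBLEM_TAGS)
    return {
        "words": words,
        "cites": cites,
        "cites_per_100_words": 100 * cites / words if words else 0.0,
        "problem_tags": tags,
    }


def below_baseline(m: dict, min_words: int = 150, min_cites_per_100: float = 0.5) -> bool:
    """Flag an article that misses any (assumed) baseline threshold."""
    return (
        m["words"] < min_words
        or m["cites_per_100_words"] < min_cites_per_100
        or m["problem_tags"] > 0
    )


sample = "Paris is the capital of France.<ref>Some source</ref> It has a metro.{{citation needed}}"
metrics = crude_metrics(sample)
print(metrics, below_baseline(metrics))
```

A score like this can only identify major issues against a simple baseline; it says nothing about balance, completeness, or the quality of the sources cited.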

Baseline quality and the "low fruit"

FT2 observed that "the 'low fruit' is appealing [to focus upon] -- metrics relating to substandard articles that don't meet an agreed baseline for quality, or measuring how long they take -- because there's lots of them, they make a big impression, they are easy to identify and quantify the issues, and they are easy to fix. Maybe for now, we should recommend focusing on that."

Incentivizing and promoting quality

FT2 explored this topic in several thread posts. Comments included:

  • Adding a baseline quality and crude automated ratings would "capture basic issues that are a concern and could flag them to the author and the community. If we take care of the worst articles then over time the average will improve. Nobody is more motivated to work on an article than those who have already edited it, so they may be interested in a simple "score" plus information why it's low."
  • Giving an editor even a crude rating on an article ("This article is rated as 5.4, click here to see what's needed to improve it") will incentivize and stretch users ("We need quality things to be pushed, incentivized, fun, enjoyable, and desirable to go for.... Suggesting incremental ways to do better... [even] a crude automated evaluation of an article's weaknesses [can help]"). A wizard/popup was proposed to embody this approach [1].
  • Other major organizations (McDonald's, Coke, Nike, etc) promote by "mak[ing] it simple, easy, intuitive -- and plaster things (tastefully) wherever they can that channel people towards the ways that help that organization. We're no different in a way. We want readers to be nudged to check out possible corrections and facts to cite, and we want to make that really easy and obvious... we want editors who write an article to have it made really simple and attractive to revisit it to get it one more notch up a crude quality number... and so on."
  • AndyZ's assessment tool, one of several quality rating tools identified by Piotrus, "has real potential if it could prioritize the key issues and suggest them, and if it was made simple with an integrated interface thing that was "once click away" on each page. Every last article that's not GA/FA could have a little tasteful slow-blink icon saying "Improvements we want on this article", listing 2 or 3 selected improvements the article needed and a "Let's fix it!" button... That would get the wider public's involvement."

Piotrus felt this sounded good, advocated simplicity, and asked whether Andy's tool (or one like it) should be recommended for future development.

File uploads

There was a short subthread about file uploads. Bhneihouse commented "the fact that anyone on this team said they "hope xxx works" is a huge statement about reliability factors on Wikipedia. That is a quality statement right there".

Branding

Bhneihouse discussed the idea of "branding" in the broad sense of purpose, mission or "being-ness" (as opposed to just "visual identity"), and how the brand (roughly, what Wikipedia is) drives what Wikipedia does:

"What we are talking about here is really about what Wikipedia is, and thus how it does what it does... we cannot have a conversation about quality without starting the conversation with brand... Brand is intangible but is expressed through that which is tangible, whether it be a mark/logo, or the way customer service responds to a customer or the way that a user experiences Wikipedia".

She also added that "a consistent framework would serve Wikipedia's goals" and remove guesswork. She stated:

"[W]henever Wikipedia allows that which is not consistent with its brand to exist as Wikipedia, it dilutes the brand... Wikipedia is about accurate knowledge. Standards in keeping with Wikipedia's core ideals and values keep Wikipedia being Wikipedia".

User feedback mechanisms

Woodwalker asked about obtaining reader or user feedback, and suggested "neutral, balanced, complete, well cited" as the four axes. FT2 noted Flagged revisions has such a tool already, and suggested "balanced, sources, coverage (completeness), up to date" as axes, adding that capturing the reader's knowledge level on the topic (casual editor | knowledgeable | very knowledgeable | formally qualified) would be extremely valuable (it would show how readers at different knowledge levels rate the article, and the existing knowledge levels of the readership).

There was some agreement (FT2, Woodwalker) that "rate this" popups would be seen as "spammy" compared to a toolbar, and agreement about the value of obtaining the reader's knowledge level in terms of "actual audience".
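
The idea of pairing per-axis ratings with the reader's self-declared knowledge level can be sketched as a simple aggregation. This is only an illustration under stated assumptions: the axis names follow FT2's suggested list, while the 1-5 scale, the record layout, and the function name `summarize_feedback` are hypothetical and were not specified in the discussion.

```python
from collections import defaultdict
from statistics import mean

# Axes follow FT2's suggestion; the 1-5 scale is an assumption.
AXES = ("balanced", "sources", "coverage", "up_to_date")


def summarize_feedback(responses):
    """Group reader scores by declared knowledge level.

    responses: iterable of dicts holding a "level" key plus a score
    for any subset of AXES. Returns {level: {axis: mean score}}.
    """
    by_level = defaultdict(lambda: defaultdict(list))
    for response in responses:
        for axis in AXES:
            if axis in response:
                by_level[response["level"]][axis].append(response[axis])
    return {
        level: {axis: mean(scores) for axis, scores in axes.items()}
        for level, axes in by_level.items()
    }


feedback = [
    {"level": "casual", "balanced": 4, "coverage": 5},
    {"level": "casual", "balanced": 2},
    {"level": "formally qualified", "balanced": 1, "coverage": 2},
]
print(summarize_feedback(feedback))
```

Keeping the levels separate, rather than averaging everything together, is what lets a gap between casual and qualified readers become visible.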

Article rating

Woodwalker states (later) that assessment may not help smaller projects lacking the skills to rate quality, and that rating systems vary between projects, stating in summary, "having more editors is important, but let's be fair: for quality we especially love to have more 'quality' editors". Bhneihouse stated that she felt the quality framework needed consistency and buy-in across projects, not a "pick and choose" structure. Woodwalker felt WMF's capabilities to force change on all projects are very limited (eg BLP) (see below), and that offering options if they wanted to improve this or that aspect was more respectful and likely to obtain higher uptake. He noted that "all projects should in the end have the same quality goals, but they may be in different project phases and therefore need different approaches".

WMF's abilities to force change and working in the "real world"

This was a significant point of taskforce philosophy.

Woodwalker felt WMF has only a very limited ability to force change on communities (see: BLP). FT2 agreed, expressing concern that some ideas would be "dead in the water" in any practical sense, and only certain things can be effectively shaped or altered.

Woodwalker commented he was "philosophizing for perfect world", and that what projects did (or were able to do in practice) with it "isn't up to us".

FT2 argued the taskforce's aim was to maximize effect, which meant designing for the real world of the projects, not a perfect world ("the criteria is 'what's most likely to deliver given these things' "). Anticipating or considering possible issues and practicalities that could affect likely productivity was part of the job:

"[I]t has to be a path that's got a good chance of people following it, otherwise it's pointless. So our optimum result might be categorized somewhat openly as the best path that has a good chance of enough people following it to make the necessary difference. Human nature, variety of views, and inertia, will ultimately limit what we can achieve in any given "bite" at the quality cherry. Best to respect there are limits on the achievable (although not giving in lightly), and see what's the most we can progress quality for this time."

Decision making

Woodwalker stated that paralysis in decision making within projects meant that new ways needed to be found, rather than just trying to modify old ways, but felt that frustration with 'red tape' would eventually force progress.

Sjc commented that "It's not just the amount and complexity of 'red tape', it is also the fact that 'red tape' is a tool which can be bent to purpose by all and sundry. In fact, a creative editor can make the red tape mean exactly what he or she wants it to mean. Red tape is in this respect more of a liability than an asset, where edit wars can be won by the editor most adept at bending the laws of reality to their own intent than others". He (Sjc) advocated removal of policy ("appears to be a fundamental enemy of content and quality of content") in favour of unhindered good faith editing.

(Sjc's essay on this and related points)

Summing up

Woodwalker closed the thread to date, commenting that:

  1. We've found that quality isn't easy to measure by simple metrics, perhaps it's impossible unless we would have some form of feedback from the reader.
    Piotrus states this isn't impossible. Rather, there are several different metrics for doing that, and what may be impossible is selecting the "best one".
    Woodwalker disagreed, noting that while subjective feedback helps, there is no clear way to quantify key goals like "completeness", "balance" or "structure"; even reverting can be for both good and bad reasons.
  2. We've come across some nice ideas how this feedback can be obtained. Important also is to know who gives the feedback (expert/interested person/school kid?).
  3. Piotrus suggests Wikiprojects can play a part and we need more; Bhneihouse suggests Wikipedia can only become a quality brand when there is a consistent basic level of quality across Wikimedia projects (I assume this is not just about Wikipedia); FT2 thinks our recommendations should be in the form of realistic suggestions likely to make the biggest positive effect on quality as they perpetuate, allowing for where the communities are today [as amended]. I suggest this feedback thing could become our second practical recommendation (after 1. creating more manuals/wizards).

Starting point

FT2's impressions at 26 November.

Points arising:

  1. Categorizing the community - newcomers need guidance (wizards, hand-holding, won't know norms); experienced users are already involved, so for this group we need to learn what demotivates and discourages them and how to encourage uptake of other tasks. Featured content writers tend to work independently.
  2. Consensus seeking needs improvement if it is to be viable long term.
  3. Quality is not easy to metricate (beyond "substandard"), though ideas have been proposed. Metrics should mirror what we want people to pay attention to (tags, cite ratios, promotions to higher ratings, stability, time taken to resolve identified issues, etc). There isn't a way to capture reader concerns on articles and thereby focus editorial attention.

Questions and issues:

  1. Highs or lows - should we emphasize raising fewer articles to featured level, or should we emphasize the need for all articles to quickly reach some kind of "reasonable/good" level? Does the world assess us more by featured content (clear successes) or poor content (clear failures)?
  2. Effective community - we don't know how to make a large community operate consensus effectively.
  3. Specific tasks - we have specific tasks where we need to get more input, but in a volunteer community they don't get enough attention.
  4. Better guidance - we have specific demands for editors and increasingly high expectations, but need to better guide editors (the anarchic "go figure it yourself" is limiting).
  5. Locking in good content - we haven't worked out how to "lock in" good content, or review lost content. So articles erode unless curated by users who actively watch for poor edits. The watching process is affected by OWNership/tendentiousness/attrition by POV warriors and user departure.

Shortlist for possible 2 - 4 recommendations:

  1. Guiding and handholding newcomers: We need to guide newcomers better, and provide more for experienced users, not just "figure it yourself". For newcomers we need wizards, guidance, "question marks" you can click to see an explanatory popup of a key community term, automated user explanatory notes, etc
  2. Recognition and enhancement for established users: A "Good Editor" standard, more insight into other wiki-specialisms, masterclasses, better help in disputes, etc. Migrating newcomers to established users is half the deal; keeping them enjoying their editing is the other half. We need to value them - and act like it, and show it.
  3. Reader feedback: "Completing the loop" by allowing review and feedback and ways for editors to be notified where to pay attention.
  4. Focus on the "low lying fruit": Focus on articles that are visibly substandard, as a starting point. Set defined quality baselines that articles should aim to reach within 1 - 3 months of creation, and metrics and information that specifically focus on that gap (measuring crudely via tags and clickable user feedback for issues). That's a useful standard for readers, much easier to measure, they're easy to fix, and this affects far more articles than any other likely targeted change. It's also easy to incentivize authors to fix such issues, and feeds through to GA/FA articles and editorship.
  5. Key tasks under-attended: We need to solve the problem that key tasks get too little attention. Some kind of "task manager" feed where simple matters get added (missing cites for facts, bias queries etc) and which users and readers could filter based on their preferred areas, interests, or articles they are reading/watching. ("This article has requests for help that match your filters, do you want to read them?" may get a much better response than a mere "citation needed".)
  6. Consensus mechanisms: Draw communities' attention to the overriding importance of deciding to improve their own consensus-building and difference-resolving methods, as a way to empower themselves on quality.
  7. Quality lock-in and erosion: We need to find a better way to "lock in" quality, against erosion, users who habitually watch articles but then leave, long term steady flow of poor/misinformed editors or deliberate edit warriors or people with long term agendas, etc. Flagged revisions is one way but a number of users object to it. How else?
  8. Automated crude quality ratings: If users can see their article is rated at 3.2, and what's causing it to get that grade, they have a clearer incentive to get it to grade 4 or 5, and patrollers can focus on poor articles to improve the long tail at the bottom. It needs to just reflect basics that matter, as a spur to mediocre or poor articles and to inform and encourage their authors or readers. ("This article is only rated at 2.1 for quality. Click to see if you can help Wikipedia with any of these issues". Leveraging engagement is crucial.)
  9. Interwiki workflow: We need to improve the ability of knowledge to flow between wikis. Being able to see the quality ratings of the same article on other wikis (not just their existence) is one possible way.

Bottom line questions - What changes have the most significant effects long term? Which things most impede other changes?

General discussion

Randomran thinks it's a good summary but stepping back to the bigger questions will also help: "Why has quality not improved at the rate that we hope it would? What are the biggest barriers to quality on Wikipedia?"

He feels that teaching newcomers, improving consensus mechanisms and defining quality better are all valuable. But also "ask people why they think it's so hard to make Wikimedia projects into environment of quality. The recommendations will be much more persuasive (and effective) if they're matched with big problems."

Piotrus agrees with the suggested recommendations. He comments that in his view Wikipedia is judged predominantly by the places it fails or falls short, not by its highs (more below), but that consensus wouldn't be a problem if good faith and civility (conduct norms) were followed. He singles out "Recognition and enhancement for established users" and "Focus on the low lying fruit" as "very valid", but "Quality erosion" may need a closer look because sometimes it's the quality standards that change, not the editorial context.

Is Wikipedia judged more by its successes, or its failures?

Piotrus states that in his view Wikipedia is judged more by poor content and failures than by successes. "Many readers don't really distinguish between GA or FA; they are rather annoyed/confused by poorly written (or non-existent) articles". There is general consensus on this as a key point.

FT2 states that reducing clearly substandard items is a universally recognized core means of quality improvement:

"The wiki probably [has] 2 million articles we could easily get pile-on help to improve to a recognized and reasonable quality baseline with ease.
We have hundreds of thousands of members of the public who'd love us to make [it] so smooth they'd take to it as a fish to water, knowing when they needed guidance or ideas on improvement it was "just there".
We can make huge strides by addressing the easy but vast majority cases... [they are] very amenable to automation, they scale, and they encourage incremental improvement in other ways..."

He notes we would still need to support established users and address low quality ones, but GA/FA will flourish due to these other actions.

A baseline quality standard

Randomran feels what we may need (and is being described) is a "basic 'safe enough to eat' quality standard". He asks what this gets us and how we address articles that don't meet the baseline.

FT2 agrees, describing it as a communally agreed "baseline for quality" (example criteria). What it gets us: it is much easier to drive or promote basic improvements, and it educates the new users who write such articles, so it is likely to make good inroads, automatically, on a large scale, while engaging the wider public and staying relevant to what critics most notice. As for sub-baseline articles, they would be hit hard (next section).

How baseline standard can be attained

FT2 describes his view:

"We set up automated systems, "Help fix this!" buttons when someone views an article, feeds for individual substandard issues, "Fix a random issue" button, everything we darn can, and drive like hell that EVERYONE can help fix basics in articles, readers, people who've never used Wikipedia before. ANYONE.
  • "You can look up a citation, here's how!" ...
  • "You can check a fact, or if a statement/section is fairly tagged, here's how!" ...
  • "You just wrote an article, and I noticed some improvements that will help it stay on Wikipedia. Here are the top 2 items!" ...
  • "This article has requests for help that match your filters, do you want to read them?" ...
  • "This article is only rated at 2.1 for quality. Click to see if you can help Wikipedia with any of these issues" ...
We push like hell for it, using automated methods, to get this kind of work automatic. That's what we do."

Starting point

MissionInn.Jim suggested users could rate articles (more weight to recent ratings), and that a Wikibibliography of sources (also rateable) might be valuable. Articles could then be rated based on the quality of sources used. Similarly editors could be rated too, based on quality of edits.
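
Giving more weight to recent ratings can be done in many ways; the thread did not specify one. The sketch below assumes an exponential decay in which a rating's weight halves after a fixed number of days. The function name, the 90-day half-life, and the example scores are all hypothetical.

```python
from datetime import datetime, timedelta


def recency_weighted_rating(ratings, now, half_life_days=90.0):
    """Average (score, timestamp) pairs, favouring recent ratings.

    A rating's weight halves every half_life_days (an assumed scheme).
    Returns None when there are no ratings.
    """
    total = 0.0
    weight_sum = 0.0
    for score, when in ratings:
        age_days = max((now - when).total_seconds() / 86400, 0.0)
        weight = 0.5 ** (age_days / half_life_days)
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else None


now = datetime(2009, 12, 1)
ratings = [
    (2, now - timedelta(days=90)),  # older rating, weight 0.5
    (5, now),                       # fresh rating, weight 1.0
]
print(recency_weighted_rating(ratings, now))
```

With these two ratings the plain average would be 3.5, while the weighted result leans towards the newer score. A scheme like this also partly addresses the later concern about what to do with ratings after a page has been heavily edited, since old ratings fade rather than persist.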

General discussion

FT2 agreed with rating of articles by users, but disagreed with rating editors due to incendiary potential and pressure to "game" (make oneself and friends look good or, worse, make opponents look bad). He felt a sourcing index might have issues too: sources contain good and bad material, making generalization hard; cites can be gamed like nobody's business; and overall it would be a lot of work for questionable hard benefit.

(Slrubenstein states he "agrees completely with FT2 here" but it's not entirely clear what this relates to, and it might relate to the distinctions noted below on "trusted/senior editors")

MissionInn.Jim suggested editor rating issues could perhaps be mitigated, and a catalog of rated sources "could be a challenge, but I think it is worth exploring".

Article rating, editor rating, and experts

Slrubenstein notes that our consumers are also our producers, meaning we are not simply "providing a service" as such. Users who visit to read an article may not know enough to assess it for quality. His main point is:

"One index of Wikipedia's poor standing is the number of university professors who discourage students from using Wikipedia as a source. I think one reason why Wikipedia has quality problems is because too few of our editors are experts (e.g. university professors) on the relevant topics. When more professors are editors, more professors will judge article content highly and encourage their students to use it".

Slrubenstein noted that while more professors (etc) are contributing, the rise in non-experts/non-academics is much faster. We need (he feels) to get more experts on board to balance the community and improve our ability to rate articles to a high standard.

Piotrus stated that users/editors rating articles is good, but that care is needed over what happens to ratings after the page is edited - and especially after "major edits".

BarryN (Bridgespan) stated that rating content was "a really productive area" and agreed there were good ways to crowdsource suitable quality information. One approach would be to obtain simpler feedback and have a team correlate the feedback received with expert assessments, which would (probably) quickly allow a "simple content rating tool" to be set up which would partly compensate for the lower proportion of experts.

He also thought that such a tool might provide a basis for user rating, basing user ratings upon the changes in quality in each of their many edits in some manner; "As the ratings are generated for each article, they could become part of a portfolio that provides for recognition of the contributor's work. [T]his would have positive synergies with the community health work as it would reward positive contributions more clearly".

Randomran cautioned that "making it too subjective will just make it gamey. You'll see different political ideologies, different religions, using it as a way to express disapproval over the perceived "bias". ... that's if they haven't already gotten there first to give the article a ten, and use it as an excuse to prevent the article from improving. ("My 'Criticism of Barack Obama' [article] was rated a ten, so you have no right to start changing it")." He feels rating would be "a bad idea if there aren't some checks and balances". Woodwalker felt that feedback on different areas would allow the POV data to be separated from other signals.

Brya also agreed that any kind of rating system would be "gamey". "[T]he emphasis should be on reader feedback (readers outnumber editors by a huge margin), not on ratings by users, but... A very likely scenario is that articles that get good ratings will attract attention from editors, with deleterious consequences."

Editor rating and trusted/senior editors

MissionInn.Jim asked FT2 how disagreement with editor rating was consistent with the idea (elsewhere) of recognizing trusted/senior editors ("How would you arrive at trusted / high quality users if they were not rated in some manner?").

FT2 clarified that rating editors automatically or via a formal schema would be a target for gaming. But a "trusted/senior user" system would be just one level that's granted or not:

"Users aren't being 'rated' [in that proposal]. It's a means of recognition of trust. A user who "sometimes" edit wars but "mostly" edits well, or "usually" adds cites but "a few times" has acted improperly in content work per consensus, doesn't get a "slightly lower" rating. They get no "trusted content editor" standing at all".

Sources, cites, and trust

Woodwalker states that "The poor state of the verifiability principle is probably the main reason why Wikipedia isn't seen as a trustworthy source. The problem is not the quantity of sources, but the quality of sources and [their] balance".

Starting point

FT2 followed up an earlier post which stated that "creating an official usergroup of trusted (or good quality, or senior) content editor might be the single biggest step towards helping here that we can make":

Suppose a community process existed for recognizing editors who are trusted in their content work. This means (eg) they consistently work well on content, edit neutrally, don't edit war, collaborate instead of filibuster, make generally good edits, cite well, improve content, debate issues instead of attacking personalities, and have a good general ability. No "extra rights" but...

  1. They have an investment in extra standing (reputation-wise), which is valuable and a source of reward. It will tend to be guarded and incentivize people.
  2. We can patrol articles and edit wars easier by highlighting these users in the article history (valuable information).
  3. Not all editors will want or seek adminship, this is a parallel way to recognize those who edit content. If it can be made something "everyone should aim for" we might have many thousands of editors flagged this way, a strong impetus for a quality based community.
  4. In an entrenched area or difficult edit war, Arbcom or the community can now say "any trusted content editor may edit the article. Others = talk page only". Many many trusted content editors, so no real POV or "too narrow editors" issue here. Anyone who wants to edit the mainspace and can get community agreement they edit content well, may join in (others only on the talk page). Instant stability, good decisions, consensus, and quality, on problem and edit war articles! No harm done, no bias added, articles still edited by a wide pool that anyone can join.
  5. (added later) While wiki isn't a "profession" this provides a way users can stretch their skills and a means of self evaluation and development as editors. A "recognized wiki-editor" qualification would also be good for ethos.

If we only have 2 - 4 recommendations, this might be one with profound scope to help in many ways - stability, quality, entrenched edit wars, experienced editor enjoyment, and incentive to gain good content editing skills and edit well.

General discussion

Randomran felt it was a good idea if it could be de-politicized and factional gaming avoided. Multiple FAs and a good track record of civility and consensus building would be good criteria.

Piotrus felt it made some good points but expressed concern that a user who was reliable in one area might have trouble in another, or that users editing in divisive areas would be unable to gain such standing due to mud slinging by other editors and cliques. He suggested that once a large number of trusted/senior editors were appointed, the task could be left to these users (who would presumably be less prone to mud-slinging): "In other words - I can trust the quality editors to make quality decisions, but I am increasingly disappointed with flaming and mistrust-sowing comments from "the peanut gallery" in various discussions I see".

FT2 responded: 1/ it would also need an effective and hard-to-game removal process; 2/ "appropriate recognition and self management where they have COI or other problems/strong views...[the] idea implies good editorship on articles the user cares about, too"; 3/ pure self selection can encourage divisiveness, where we're aiming for mass involvement and good standards. A two-step process might resolve the issue.

Bhneihouse liked the "overall feel" and wanted to further refine it.

Non-anonymous experts

Piotrus noted that giving non-anonymous experts extra standing could be another (or complementary) approach. FT2 noted that experts don't always make desirable editors and have the same learning curve as any other person on how to edit appropriately:

"This is about recognizing people who know how to edit and are accepted as good at doing so. Nurture those, and [everyone benefits] including for experts":
"I'd tell the expert that (like joining any new project) they need to learn how to edit Wikipedia, which will be different from how they edit their own papers. But they can edit freely, and (since they are bright and used to scientific collaboration) they'll surely be recognized as a trusted content editor for our purposes nice and quickly, if they take a few minutes to understand how we work here. In fact I'd make that part of the "New user wizard" ("Do you have formal credentials in any field you plan to edit?" + guidance)."

Bhneihouse suggested a list of known credentials may be useful for screening.

Sue Gardner's post

Sue Gardner (WMF) stated that this would be a useful and good idea, for both new editors (to identify trusted editors) and for not-new editors. She expressed a strong preference that such recognition should be assessed automatically rather than manually to address scaling and "popularity contests", and save time.

FT2 felt automation could help and could cut down the cases needing review, but that the benefits flowed from the idea that "some users can generally be trusted to do right, in article and article discussions, of any kind. Those are the users we want in this pool, because once identified, they provide a large population of experienced content writers not needing much guidance or checking, and capable of being given heavy disputes to put into good editorial order... The benefits here flow from their acknowledged trust to do right (broadly speaking) on any content matter, up to and including self-management of bias, interaction style [etc]". He felt this could not be assessed without human involvement.

Bhneihouse agreed about non-automation, and agreed with Piotrus and Sue on "popularity contests". She noted that "Perhaps a pivotal point in quality control is how Wikipedia "approves" and trusts editors? Perhaps another pivotal point is the actual "structure" of this process?"

Sue Gardner commented in response that this would be

"a marker/label for people who are particularly trusted to have good judgment. Probably these would be people who've been around for a while, and understand the policies well, who are reasonable and thoughtful. I think that's a great idea. I think new editors would really appreciate being able to tell at-a-glance if an editor they didn't know was someone they should trust and listen to..."

She also noted the distinction would greatly help newcomers, who could understand the editing better and seek reputable advice, as well as helping those who deal with editorial behaviour and disputes. It would extend the usual network of trust, which usually doesn't scale well. She did not like the term "trusted editor" though (implies others are not trusted) and suggested "senior editor".

She also worried that the system for gaining the standard would be gameable or (per Piotrus) lock out editors in controversial areas. She felt the decision should be made by "thoughtful... experienced people" and editing criteria, not by simple voting, and perhaps a "trusted team" to identify such users. She concluded it was a good idea she would like to see work.

How would such users be selected?

Piotrus suggested a user should show quality content (1 FA/5 GAs/50 DYKs), and be trusted only in areas where he has a reputation (WikiProject based), and this would be hard to game or disrupt. FT2 stated that writing certain content would not necessarily correlate with trusting a user's editorial approach generally, and would not show neutrality or good conduct on their "pet subjects" or appropriate talk page approaches. It could at best be evidence.

Randomran agreed some processes are "vulnerable to whim and personal opinion" and saw FA as one of the few that is hard to game as an indication that a user understands quality writing, adding later that "The human component can be there as a screen, as a veto, but people should really be judged by accomplishments that the widest number of Wikipedians cannot deny".

FT2 noted that good content writers contain a "fair proportion of users who couldn't meet the kind of role we're talking about" and suggested two possible approaches:

  1. Specific criteria (user presents a portfolio meeting set criteria; objections must meet defined and evidenced criteria too). It's prescriptive and somewhat gameable but much harder to game. (details)
  2. A two-stage, highly automated process whereby users only needed 50% community approval (hard to game and mass participation) and a 75% "trusted user" approval (high standards, veto, allows existing senior editors to see any community views and concerns), both parts held via SecurePoll for efficiency and to prevent "popularity contests". (details, also discussed below)
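
The thresholds in the second approach can be sketched as a simple check. This is only an illustration under stated assumptions: the function names, the (support, oppose) inputs, treating both thresholds as inclusive, and failing a stage with no votes are all hypothetical choices that the discussion did not settle. It does not model SecurePoll itself.

```python
from fractions import Fraction

# Thresholds taken from the proposal; inclusiveness is an assumption.
COMMUNITY_THRESHOLD = Fraction(50, 100)
TRUSTED_THRESHOLD = Fraction(75, 100)


def approval_share(support, oppose):
    """Return the share of supporting votes, or None if nobody voted."""
    total = support + oppose
    if total == 0:
        return None
    return Fraction(support, total)


def passes_two_stage(community, trusted):
    """Check a candidate against both stages.

    `community` and `trusted` are (support, oppose) tuples for the
    first (community) and second (trusted user) stage respectively.
    """
    community_share = approval_share(*community)
    if community_share is None or community_share < COMMUNITY_THRESHOLD:
        return False  # stage 1 failed, stage 2 is never consulted
    trusted_share = approval_share(*trusted)
    return trusted_share is not None and trusted_share >= TRUSTED_THRESHOLD
```

For example, `passes_two_stage((60, 40), (30, 10))` passes both stages, while a candidate with 40% community approval fails regardless of the second-stage result.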

Woodwalker was "not against formalizing the status of quality users" but was concerned that expertise in one area might not equate to expertise in another. FT2 highlighted that a generally well reputed and skilled editor could be more of an asset in the contexts under discussion due to issues of foibles and bias, collaboration and mass editing skills and the non-expert knowing how to help others make the best of their input (including experts). "If we're assessing what kind of editor can be broadly trusted to work on all kinds of difficult articles unsupervised... in a proper way... [then these qualities] will get you that person... a PhD won't". (link). Woodwalker agreed but suggested not calling them experts if they were not, and preferred a criteria-based approach. Piotrus suggested looking at their activity record, and only considering concerns from the last 6 months (the criteria had included a 9 month cutoff).

FT2 noted a panel would have the same issues of gaming and politics, and that we should trust the wider community; it is easier since all issues (including most alliances and gaming) would be visible and public. Rather than trying to be perfect on selection, create a method that is 95% valid but "slightly able to be gamed", along with a "clear and standardized removal process" and "some kind of scrutineers panel [for] cases claimed to be grossly affected by bias and canvassing, or where the results don't reflect appropriately on the user". Woodwalker agreed that community not panel was appropriate, and Piotrus agreed that addressing "popularity contests" was very important indeed. Philippe endorsed designing for 90% ("good enough"), and Bhneihouse stated the two-stage process sounded good and that a process now needed to be added to handle the exceptions.

Further discussion on the 2 stage proposal

FT2 described the latter as "a hybrid of enwiki Mediation Committee's nomination method (filter[s] good quality users and operates historically with no drama whatsoever) and a modification of the SecurePoll tool already in place... a bit more involved than 100% automation, but it is simple (once set up) and keeps almost all the benefits of automation, all the benefits of user involvement, and very little of the drawbacks of either, when merged." Its design goals were stated as "automation, low gameability, simplicity of experience to users, very low scope for politicking/dramatizing/popularity contests, and low time needed by participants".

Some discussion took place on its talk page:

Randomran stated he had concerns that it could be gamed and that a threshold was needed to weed out bad applicants and prevent filibustering.

FT2 noted that this would not help a clique, because all the community can do is 1/ stop someone getting 50% (which is still an easy level for a decent editor in such circumstances) and 2/ raise concerns, and filibustering doesn't work because it isn't a "debate". Community concerns will be publicly visible after the 1st stage and inform the second, but the 2nd stage is not easily influenced by partisan cliques because its constituency is editors already considered to be high quality and of good judgement.

Piotrus agreed this "sounds plausible and indeed should not be very easy to game".

There was some discussion how to bootstrap the process (the users who will operate the 2nd stage at the start) - Randomran suggested using 2+ FA writers. FT2 stated it was a once-off issue and needed the highest quality of content editors to get a good start; he suggested using the subset of FA/GA writers who have also passed RFA (the latter attesting to other areas of trust, awareness and judgement). Since RFA often requires content work this is a substantial pool of FA/GA writers and "probably enough" to start it off.

Randomran agreed that 2+ FA writers who had also passed RFA would be a strong pool, but (being devil's advocate) was concerned that it might be taken as a cabal ("I'm not sure this is a bad thing in practice. But in principle, a lot of people just hate cabals"); that it might still just reflect popularity; and might still exclude editors in controversial areas. He felt it could perhaps be strengthened against these issues.

Randomran noted some useful data for the taskforce (link).

Starting point

Piotrus summed up some rough agreed points on quality editor departure and negative reinforcement (discouragement) and his early conclusions and recommendations:

  1. Internal wiki processes with regards to improving content quality (eg assessment/FA/GA) are functioning well and are not in need of any serious reform.
  2. Number of editors correlates with article quality, and quality is threatened by a decline in editorship. In his view "the biggest danger to the project's quality and very survival is this decline, and recommendations should focus on stopping and reversing it".
  3. We should endorse solutions to increase editor uptake and decrease editor burnout or discouragement
  4. We should gather more information to survey departed editors, software to facilitate easy project-wide surveys in future, issue a call for academic studies on the observed decline in users; and encourage a peer-review outlet on Wikipedia research.
  5. People "edit Wikipedia because it if fun and otherwise personally rewarding"; negative discouragement must be strongly countered and limited (better than at present) to allow quality to be established.
  6. Many editors leave after attacks; those attacking tend to remain. It is rare that civility is properly enforced.

Specific recommendations:

  1. "we need to be more active with dealing with personal attacks and other civility violations. Telling editors to grow thicker skin is resulting in them leaving the project... [The] "Personal attacks noticeboard" needs to be recreated, and admins must be coached to treat personal attacks and other civ[ility] violations as seriously as 3RR".
  2. Editors who edit non-anonymously should be as well protected as BLP subjects. (Even one unfounded or exaggerated accusation against them under their real name could be a concern in some cases)
  3. More surgical remedies are needed than simple bans. Blocking users for a few substandard edits, when in fact their overall pattern is a concern, is common but sub-optimal. "[S]olutions like article / topic bans and probations, 1RR restrictions, civility restrictions, talk page bans, and such should be used more often. That said, an important qualification to their use is whether the editor sanctioned [sincerely recognizes] s/he did something wrong..."
  4. Harsher remedies may be a factor in sudden departures; more mentoring and a lot more focus on positive enforcement (recognition, awards, mediation, counselling, etc) would help.
  5. "It is very rare for [enwiki] ArbCom to recognize that editors who were in some ways disruptive are also constructive. Often, FoF/remedies will treat an editor who did few errors harshly, ignoring his good contributions and issuing a generalizing findings that he was disruptive, damaging his wikicareer, and ignoring his attempts to reform. I think it is imperative that ArbCom reevaluates its approach to editors and in particular, tries to be more constructive than destructive, in line with our policy that sanctions should be preventative, not punitive."

General discussion

FT2 agreed with "the sentiments" but was concerned that suggestions at this level would be very limited in terms of ability to change culture, and would quickly become "yesterdays news" -- "I like the ideas you have, but for my money we need to focus on what would make the most significant change[s] to progress quality, that once set up achieves the most, with littlest communal 'pushing'... anything that relies on telling a large number of users "do this instead" ("this" = some ongoing behavior change) is doomed to failure -- it'll be old news in a week, 99% won't hear in the first place, culture won't be changed by it, new users won't hear it"

Piotrus affirmed that "the biggest problem facing us is the erosion to due to hostility/incivility/crude sanctions, and our recommendations need to be very clear on the need to decrease negative reinforcement and increase positive ones" and that the suggestions were intended to guide specific (mainly positive reinforcement and support based) solutions.

FT2 was concerned that merely "saying nice things" and "too narrow proposals" would not help much for a user on the way out, and suggestions directed at Arbcom would not be of much impact either -- he felt we should focus on the community and identify recommendations with a permanent and pervasive effect that does not rely on a general statement "telling people what to do".

Piotrus was concerned that this focused on getting new editors but ignored the need to prevent erosion of established editors.

Bhneihouse wanted to see a bigger picture and felt this was too detailed at this point in time. Piotrus suggested the more general suggestions should be retained with a view to ways to make them practical; which was generally agreed.

Starting point

MissionInn.Jim proposed a process for the taskforce (link):

  1. Agree on a framework for the taskforce, and how recommendations will be developed, including what subpages will be needed
  2. Define the objectives of the taskforce, and the definition of Quality using Woodwalker's definition as a starting point
  3. Itemize all known Quality Issues (QI) and Barriers to Quality
  4. Prioritize the list of issues
  5. Itemize the solutions that have already been proposed, including those here
  6. Identify and assign research tasks.
  7. Associate each proposal with one or more issues
  8. Develop one proposal at a time, starting with the proposal that will best address the highest priority issue.

Randomran noted a similar process was being followed at the Community Health taskforce, and that ensuring issues and solutions were backed up with data would also help.

Starting point

Discussion of a newspaper article quote about Wikipedia:

"The loudest voices and most obsessive contributors become the arbiters of truth"

General discussion

Bhneihouse affirmed this was common in education and that "[s]ometimes the loudest voices actually are correct, many times they are not". She posed two questions - how to vet contributors, and how to encourage the softer voices.

FT2 opined that the decision rule for removing a user from a discussion was incorrect. Rather than being "unless egregiously horrible allow them to continue" it should be "upon request show a good standard of conduct or must leave". This would lead to a basic rule change: "Any admin can require any participant in a discussion to show better quality of editing/interaction -- failing which can remove them from that discussion thread unilaterally for a while". (Ie, rather than blocking for bad conduct, any admin could require that a user improves their contribution in a thread or leaves it alone.)

Bhneihouse agreed strongly with the idea "ops can demand better conduct" stating "Aim for the high benchmark and expect everyone to live up to it". Using children's conduct as an analogy she explained "[In] a group of kids, if a supervising adult allows any bad behavior, the [group] dynamic exponentially changes... If expectations are communicated up front, the kids are more likely to behave and when non-behaving kids are taken aside, the remaining kids tend to support the adult's decision. The key is knowing the rules up front".

FT2 stated the present dynamic is "very slanted towards bad behavior [being] okay provided not toooo bad"; being a content writer is seen as a "get out of jail free" card, and this gets pushed hard. Admins get "slammed" for it, leading to reluctance to enforce and an ideal setup for gaming. "[A] sea change to "above average conduct, no excuses, and any admin can enforce it" would be huge. It would really be a community sea change. But it may well be what's needed. Problem is how do you get to there".

Randomran concurred with these views:

"As long as I don't personally attack you, and as long as I have a few other people to back my side up, there's no way you can get your changes through. And if I refuse to cooperate with you, and say that your point of view is destroying Wikipedia, and I will oppose you until my last breath, there are no consequences. And if you block my changes on one article, it's not to say I couldn't find enough people to get my changes through on another article and slip it passed you. You can just imagine how well this works out for controversial content areas, and what it does for quality."

Bhneihouse commented perhaps such people should not be part of a "collaborative community". She asked:

"How does a person who is ego driven fit into what Wikipedia IS? Maybe they dont, and maybe that is OKAY... If they dont buy into the brand, maybe they shouldnt be playing in the "game". Why do you think I keep talking about brand? Because it drives everything else. It even drives getting rid of people who gum up the works whose actions are totally inconsistent with the brand/mission/vision/goal."
"There just seems to be this idea that Wikipedia will take everyone and their ideas regardless of how they act or what they think. And that idea drives a lot of behavior on Wikipedia. But I do not think that idea is consistent with what Wikipedia is. You cant have a world class encyclopedia that is free that allows people to abuse the privilege of sharing information with the world.
Do you see the shift? We go from accepting everything that everyone wants to give us to only that (conduct, information, content) which is consistent with what Wikipedia is and where Wikipedia is going. That isnt copping out or leaving anyone out, that is making the choice that needs to be made in order for Wikipedia to do what it set out to do"

FT2 noted he would reach the same question ("should we accept just anyone") by a different route. Bhneihouse explained her view on brand as a driver and "a bit like a soul", and how it touches everything and permeates all decision-making.

Starting point

FT2 suggested a list of "friends and enemies of quality" for brainstorming, categorized as:

  1. Related to well-intentioned content editors
    • factors affecting skilled contributions,
    • stability/erosion,
    • capable editors with collaboration issues
  2. Related to other editors
    • users who are ignorant about (or have their own view) what Wikipedia "should" be for.
    • Most contributors aren't experts (how to maximize the benefit of their work?).
  3. Related to content consensus
    • Content battles never end (no way to get a final agreement), but equally too much fixity or finality would kill the ability of wikis to be self-correcting, fossilize consensus, and make topics into battles for "Wikipedia's official view".
    • The looked-for "meeting of minds" often doesn't work.
    • It can remove skilled as well as "fringe" views.
    • It's a lot of time, energy and work (Eg, rejustifying basic PhD knowledge to a lay-person)
  4. Related to problem conduct
    • Tendentiousness is easy.
    • There are capable content users who are bad community members.
    • Some users learn to "play a game" and can enforce their view or wear out/exclude well intentioned users.
  5. Related to conduct enforcement
    • Admin standards and expectations vary.
    • Admins role is to address bad behavior but they get attacked, so they only address obvious bad behavior... leaving it easy to engage in bad-but-not-egregious behavior.
  6. Related to perception
    • Public perception matters. Readers and possible (and actual) editors may decide their trust levels and level of involvement (if any) based on public perceptions in circulation. So things that matter to the public perception should get a very high priority in mattering to us, even if technically other things might matter more in some ways. ("Expectation management").
    • Wikimedia isn't a "fad" any more, so it needs to work for its recognition and share of public time, against the "next big thing" out there.
  7. Related to wiki-cultural philosophy
    • Successful ventures reinvent or reorient themselves, and are not afraid to say "this direction doesn't work any more". We may need to do that on some old cherished beliefs, even if their foundational principles are valid.

Starting point

Bhneihouse felt a "big picture" and "framework" was important. The dialog looks (to her) like people talking about what they feel important, rather than commenting upon what works and doesn't work in the framework. It is unclear if questions are being answered, or even if the right questions are being asked. The key questions (in her view) are for example:

  1. What is Wikipedia?
  2. What does Wikipedia provide?
  3. How does Wikipedia provide it?
  4. What works/doesn't work about how Wikipedia provides it? (this is where most of what you all have been typing in would go.)
  5. What needs to change in how it is provided? (here is where the conversation about policies goes)
  6. How do we build buy-in to the changes, especially when people are used to doing it a certain way?
  7. How do we communicate the Wikipedia "brand" through all of this?
  8. What is the big picture for quality?
  9. What does quality NOT touch?

She asked that a page be set up to outline a more complete quality framework based upon these kinds of questions, and commented that policy is an effective tool to get certain results, but policies cannot take the place of principles.

The document on branding is relevant to this thread (File:Branding flowchart full version.pdf).

General discussion

FT2 commented we are one of several groups, so we can probably afford a simpler structure, and take a number of areas for granted (what Wikimedia is, goals, audience, what quality of content is). We can then focus more directly on the aspects of [content] quality that will have maximum effect, how to enhance them (or reduce the problematic issues), and look at long term structural threats to quality such as "political" editors, or fixed (arbitrated) content.

MissionInn.Jim felt we need to get beyond threaded discussion, organize the information, and move on to a process of producing "real deliverables" and how we will approach quality improvement overall.

FT2 felt the dialog so far was producing firm deliverables and hard discussion. Bhneihouse felt we had ideas and fragments, but (as yet) no cohesive framework, and suggested starting with "a new beginning" to put these into some kind of orderly structure. MissionInn.Jim noted that the input was good, but felt:

"[W]e need to be more organized and methodical about how we approach this problem (i.e. the problem of how to improve quality). Discussion lists are good for discussion, but they do not provide a summarized and organized presentation of the material... threads are popping up arbitrarily based on whatever comes to mind of each Task Force member. It is unclear to me what our... priorities are, or should be, or why I should spend my time on one particular thread versus another"

FT2 felt that although threads might start on "whatever came to mind", a convergence was likely; "[T]he landscape itself is holographic. Wherever we explore, if we do it well, we'll probably come to the same (or similar) final conclusions". He felt that starting by exploring a variety of approaches and significant issues would in the end inform a more useful structure (based on reality and experience rather than theory or assumption).

Bhneihouse felt the discussions could be highly valuable in a thinktank, but to remember we were not just a thinktank; we were required to produce a few concrete and well selected recommendations by a fixed date. A framework could help ensure coherence.

MissionInn.Jim agreed this was his concern too: "Regardless of the specifics and what we call it, we need some kind of structured approach to make sure we are focusing on the right issues and doing the right research. Unless we do that, we will just continue discussing issues endlessly. The discussion and ideas presented so far really show a lot of creativity, but we need to harness it"

Bhneihouse felt taking anything for granted was unwise and advocated a "zero" starting point based on branding: "If I do not fully understand Wikipedia's brand in the same way as the rest of the team, then how can I understand or agree to what the structural threats to quality are? ... [I]f Wikipedia is trying to be a well vetted source of reliable, neutral information, then anyone that gets in the way of that doesnt belong in the community."

FT2 noted this led to a related ("on its head") question (covered in another thread, link):

"How much quality should we be prepared to sacrifice, to allow what level of public editorship? Or,
What degree of cost to the community is "too much", ie, the point where we cannot seek more quality without undue harm in some other area?"

Starting point

Woodwalker considered how quality might be defined or measured. The starting point was his essay "On Quality" covering assessment of quality, edits, and editors, and the concept of "content erosion":

The Dutch Writing Contest criteria list seven factors for assessing quality (link): Lede ("lead"), article structure, page layout, article content, style, verifiability, findability.

Woodwalker proposes a more general set of factors:

  1. Content requirements
    • encyclopedicity, verifiability, neutrality, balance
  2. Reader requirements
    • interesting, completeness, depth/specialism level/reader level, relevance of content to subject matter
  3. Requirements of form
    • correct (or good) language, encyclopaedic tone, text style and readability, clear and easy article structure, clear and easy layout
  4. Broader project requirements
    • findability of topics, and coverage, balance, consistency, and completeness of topics in general (across the project).

He also divides edits and editors by their impact on quality:

  • Edits that only add quality, those that only remove it, those that are neutral (eg US English to UK English), and those that add quality in some ways but remove it in others.
  • Editors who add quality by adding new content; editors who add quality by maintaining (preventing degradation, and making minor improvements to existing content - a continuum exists); editors who remove quality; and "problematic users" who add quality in some ways or work hard generally, but come to be seen as harmful or negative overall in quality terms.

He then describes the (important) concept of content erosion:

"If we assume the rate of change and the percentages of all four edit types to stay constant, the rates of quality increase and quality decrease are constant too. This means any page in the project is subject to a slow decay in quality, which I call wiki-erosion. Quality is guarded by the community though. The ability to revert destructive edits of all kinds is thus related to the amount of knowledge in the community. This means destruction can only go as far as a certain quality level. If the community is larger, that level will be higher, if it is smaller, the level will be lower. Thus, in the long term, quality of any sort will stand a larger risk of being destroyed at small projects, even though the wiki-erosion rate is much smaller."

Woodwalker feels current analysis of quality is lacking, but perhaps by analyzing these components, the statistical team can find better ways to assess quality. It may also lead to practical ways to improve specific quality factors rather than unhelpful generalized suggestions, that projects can tailor to their culture.

General discussion of quality measurement

Piotrus argued metrics are overrated. Virtually all scholarly publications agree quality is high and rising, and internal categorizations (Featured/good/assessed/stub etc) are adequate for the rest, for now. He drew attention to improvements of this scheme, including articles outside WikiProjects and hence not rated, and low activity Wikiprojects without adequate discussions or members to support rating. He sees more (and more active) Wikiprojects as a core quality tool.

FT2 states three kinds of metrics are useful and attainable: 1/ crude metrics such as computer assessments based on tagging, cite to word ratios etc (and some calculation based on these) which can be used to crudely identify major issues and assess articles up to a simple baseline; 2/ article progression and conversion metrics based upon article standing (new -> baseline quality -> good -> featured) and time taken between these, and article stability; 3/ assessments based on user and reader feedback ("rate this article").

Baseline quality and the "low fruit"

FT2 observed that "the 'low fruit' is appealing [to focus upon] -- metrics relating to substandard articles that don't meet a agreed baseline for quality, or measuring how long they take -- because there's lots of them, they make a big impression, they are easy to identify and quantify the issues, and they are easy to fix. Maybe for now, we should recommend focusing on that."

Incentivizing and promoting quality

FT2 explored this topic in several thread posts. Comments included:

  • Adding a baseline quality and crude automated ratings, would "capture basic issues that are a concern and could flag them to the author and the community. If we take care of the worst articles then over time the average will improve. Nobody is more motivated to work on an article than those who have already edited it, so they may be interested in a simple "score" plus information why it's low."
  • Giving an editor even a crude rating on an article ("This article is rated as 5.4, click here to see what's needed to improve it") will incentivize and stretch users ("We need quality things to be pushed, incentivized, fun, enjoyable, and desirable to go for.... Suggesting incremental ways to do better... [even] a crude automated evaluation of an article's weaknesses [can help]"). A wizard/popup was proposed to embody this approach [2].
  • Other major organizations (McDonald's, Coke, Nike, etc) promote by "mak[ing] it simple, easy, intuitive -- and plaster things (tastefully) wherever they can that channel people towards the ways that help that organization. We're no different in a way. We want readers to be nudged to check out possible corrections and facts to cite, and we want to make that really easy and obvious... we want editors who write an article to have it made really simple and attractive to revisit it to get it one more notch up a crude quality number... and so on."
  • AndyZ's assessment tool, one of several quality rating tools identified by Piotrus, "has real potential if it could prioritize the key issues and suggest them, and if it was made simple with an integrated interface thing that was "once click away" on each page. Every last article that's not GA/FA could have a little tasteful slow-blink icon saying "Improvements we want on this article", listing 2 or 3 selected improvements the article needed and a "Let's fix it!" button... That would get the wider public's involvement."

Piotrus felt this sounded good, advocated simplicity, and asked whether Andy's tool (or one like it) should be recommended for future development.

File uploads

There was a short subthread about file uploads. Bhneihouse commented "the fact that anyone on this team said they "hope xxx works" is a huge statement about reliability factors on Wikipedia. That is a quality statement right there".

Branding

Bhneihouse discussed the idea of "branding" in the broad sense of purpose, mission or "being-ness" (as opposed to just "visual identity"), and how the brand (roughly, what Wikipedia is) drives what Wikipedia does:

"What we are talking about here is really about what Wikipedia is, and thus how it does what it does... we cannot have a conversation about quality without starting the conversation with brand... Brand is intangible but is expressed through that which is tangible, whether it be a mark/logo, or the way customer service responds to a customer or the way that a user experiences Wikipedia".

She also added that "a consistent framework would serve Wikipedia's goals" and remove guesswork. She stated:

"[W]henever Wikipedia allows that which is not consistent with its brand to exist as Wikipedia, it dilutes the brand... Wikipedia is about accurate knowledge. Standards in keeping with Wikipedia's core ideals and values keep Wikipedia being Wikipedia".
User feedback mechanisms

Woodwalker asked about obtaining reader or user feedback, and suggested "neutral, balanced, complete, well cited" as the four axes. FT2 noted Flagged Revisions has such a tool already, and suggested "balanced, sources, coverage (completeness), up to date" as axes, adding that capturing the reader's knowledge level on the topic (casual editor | knowledgeable | very knowledgeable | formally qualified) would be extremely valuable (it would show the ratings that readers at different levels give the article, as well as readers' existing knowledge levels).
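
As a rough illustration only, the kind of feedback record FT2 describes could be sketched as below. The axis names follow FT2's list; the 1-5 score scale, the `Rating` class and the `mean_by_level` helper are assumptions made for this sketch and do not describe the Flagged Revisions tool or any existing software.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Axes as suggested by FT2; the 1-5 scale used below is an assumption.
AXES = ("balanced", "sources", "coverage", "up_to_date")

# Self-reported reader knowledge levels, as listed in the discussion.
LEVELS = ("casual", "knowledgeable", "very knowledgeable", "formally qualified")


@dataclass
class Rating:
    """One reader's feedback on one article (hypothetical structure)."""
    level: str    # one of LEVELS
    scores: dict  # axis name -> integer score


def mean_by_level(ratings):
    """Return {level: {axis: mean score}} for levels with at least one rating."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rating in ratings:
        if rating.level not in LEVELS:
            raise ValueError(f"unknown knowledge level: {rating.level}")
        for axis in AXES:
            grouped[rating.level][axis].append(rating.scores[axis])
    return {
        level: {axis: mean(values) for axis, values in axes.items()}
        for level, axes in grouped.items()
    }
```

Grouping by knowledge level keeps the two things FT2 wanted visible at once: how each kind of reader rates the article, and which kinds of reader are actually responding.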

There was some agreement (FT2, Woodwalker) that "rate this" popups would be seen as "spammy" compared to a toolbar, and agreement about the value of obtaining the reader's knowledge level in terms of "actual audience".

Article rating

Woodwalker stated (later) that assessment may not help smaller projects lacking the skills to rate quality, and that rating systems vary between projects, stating in summary, "having more editors is important, but let's be fair: for quality we especially love to have more 'quality' editors". Bhneihouse stated that she felt the quality framework needed consistency and buy-in across projects, not a "pick and choose" structure. Woodwalker felt WMF's capabilities to force change on all projects are very limited (eg BLP) (see below), and that offering options if they wanted to improve this or that aspect was more respectful and likely to obtain higher uptake. He noted that "all projects should in the end have the same quality goals, but they may be in different project phases and therefore need different approaches".

WMF's abilities to force change and working in the "real world"
This was a significant point of taskforce philosophy.

Woodwalker felt WMF has only a very limited ability to force change on communities (see: BLP). FT2 agreed, expressing concern that some ideas would be "dead in the water" in any practical sense, and only certain things can be effectively shaped or altered.

Woodwalker commented that he was "philosophizing for perfect world", and that what projects did with it (or were able to do with it in practice) "isn't up to us".

FT2 argued the taskforce's aim was to maximize effect, which meant designing for the real world of the projects, not a perfect world ("the criteria is 'what's most likely to deliver given these things' "). Anticipating or considering possible issues and practicalities that could affect likely productivity was part of the job:

"[I]t has to be a path that's got a good chance of people following it, otherwise it's pointless. So our optimum result might be categorized somewhat openly as the best path that has a good chance of enough people following it to make the necessary difference. Human nature, variety of views, and inertia, will ultimately limit what we can achieve in any given "bite" at the quality cherry. Best to respect there are limits on the achievable (although not giving in lightly), and see what's the most we can progress quality for this time."
Decision making

Woodwalker stated that paralysis in decision making within projects meant that new ways needed to be found, rather than just trying to modify old ways, but felt that frustration with 'red tape' would eventually force progress.

Sjc commented that "It's not just the amount and complexity of 'red tape', it is also the fact that 'red tape' is a tool which can be bent to purpose by all and sundry. In fact, a creative editor can make the red tape mean exactly what he or she wants it to mean. Red tape is in this respect more of a liability than an asset, where edit wars can be won by the editor most adept at bending the laws of reality to their own intent than others". He (Sjc) advocated removal of policy ("appears to be a fundamental enemy of content and quality of content") in favour of unhindered good faith editing.

(Sjc's essay on this and related points)
Summing up

Woodwalker closed the thread to date, commenting that:

  1. We've found that quality isn't easy to measure by simple metrics, perhaps it's impossible unless we would have some form of feedback from the reader.
    Piotrus stated this isn't impossible; rather, there are several different metrics for doing that, and what may be impossible is selecting the "best one".
    Woodwalker disagreed, noting that while subjective feedback helps, there is no clear way to quantify key goals like "completeness", "balance" or "structure"; even reverting can be for both good and bad reasons.
  2. We've come across some nice ideas how this feedback can be obtained. Important also is to know who gives the feedback (expert/interested person/school kid?).
  3. Piotrus suggests Wikiprojects can play a part and we need more; Bhneihouse suggests Wikipedia can only become a quality brand when there is a consistent basic level of quality across Wikimedia projects (I assume this is not just about Wikipedia); FT2 thinks our recommendations should be in the form of realistic suggestions likely to make the biggest positive effect on quality as they perpetuate, allowing for where the communities are today [as amended]. I suggest this feedback thing could become our second practical recommendation (after 1. creating more manuals/wizards).
Starting point

FT2 observed that quality has a cost (ie it is a cost-benefit issue). Perfect quality would restrict editorship so much as to make the project non-viable, so examining the balance between the two was important:

"[We can ask] how much quality should we be prepared to sacrifice, to allow what level of public editorship. We can stack the odds in a huge number of ways - filtering who can edit... educate and guide... mitigate the problems of bad editors..."
"[We can] accept problems now for benefits long term... change some ground rules that will be painful but necessary; discriminate quality areas where we have to guard ruthlessly from those we can't guard so well at this time...".

But the question remains:

  • "How much quality should we be prepared to sacrifice, to allow what level of public editorship?"

Or:

  • "What degree of cost to the community is "too much", ie, the point where we cannot seek more quality without undue harm in some other area?"
General discussion

Randomran felt we had to be ready to make tradeoffs, but questioned whether there was an inherent conflict requiring tradeoffs, or whether solutions existed that would not require them. FT2 agreed that some answers would improve both, but expressed concern that if we "demand too much of [quality] without careful thought... other things come under strain".

Bhneihouse felt this was an oversimplification: there can be several "goals" but only one "brand", and we need to ask what would help Wikipedia stay "genuine" and get it where it is going. Then "maybe all these growing pains I believe I am seeing as people struggle with what needs to go and what needs to remain at Wikipedia, would cease being such a struggle".

Starting point
Warning: Long thread ahead!

Bhneihouse posted a description of her framework, which she titled "Barriers to Quality" (details).

Summary:

  1. Narrow demographics of communities probably lead to an imbalance in project coverage (both information demand and overall quality content)
  2. Quality users become disappointed and leave
    Rude behaviour of other users
    Technical, "political" and informational elitism may have a discouraging effect on less active or technically adept users, partly due to models of policy- and maintenance-driven approaches.
    Status is defined by quantity, not quality of contributions.
    Quantity and rapidity of posting dominate discussions, rather than the quality of points made
  3. Software toolset is limited, user-unfriendly and discouraging
  4. Quality of content erodes in a manner related to community size ("content erosion")

Positives related to Wikipedia culture:

  1. Everyone welcome
  2. Anyone can start anywhere
  3. No prior expertise needed, just an (encouraged) ability to put together good evidence
  4. Personal opinions trumped by facts and evidence
  5. Hard work or minimal participation valued
  6. No "sacred cows" in content areas
  7. Everyone online can benefit
  8. Seeks to "do good"

Negatives related to Wikipedia culture:

  1. Everyone is welcome even if badly behaved; hence problem users return, some users become adept at gaming the system
  2. Significant content can be wiped out, as if it didn't matter. If restored (questionable) that also takes time
  3. Technically skilled people can gain more control; technical and logical skill and "picking apart arguments" can trump everything else
  4. Focus on value of contributions rather than "just being part of the community" (and conversely users who don't contribute are not part of the community)
  5. Award system: rewards and recognition are needed to "move up in the chain of responsibility" (however recognition is often not provided via a balanced objective or consensus approach)
  6. Some disruptive behaviours are allowed or hard to eradicate - infighting, flaming, vandalism, reactive responses, picking apart arguments without personally checking facts, forcing users to defend knowledge (often against people with limited knowledge or an agenda)
  7. Civility issues have a number of flavours where the user is civil but... (tendentious, intentionally or recklessly makes untruthful statements, wants Wikipedia to be something it isn't, seeks control by exhausting others, "travels with a pack of like-minded editors who can overpower an article or discussion")
  8. Even if the problem editor is addressed the harm they have done to the community (other users) or content is hard to repair

Toolset issues:

  1. Poorly suited for the job
    • Clunky, linear, coding involved, not user friendly, hard for newcomers, impedes contribution
  2. Limited
    • Media linked across projects, unreasonably constrained, cannot replicate other websites
  3. All communication requires writing
    • Slow and cumbersome, omits other communications (body language, expressions, voice tone, etc) -- however does ensure documentation.

Information issues:

  1. Not well rounded by topic or language
    • How should coverage vary by language? Should all languages have the same kind of coverage? How should focus areas be determined? There are known to be big gaps in coverage, but how can these be identified?
  2. How does Wikipedia ask its users what else they need (or want) on Wikipedia?
    • Users may have ideas they lack skill or knowledge to create pages for
    • How to reach a younger audience and get them involved in creation/ideas?

Overall recommendations:

  1. Wikipedia is at a crossroads
    • Needs to decide its brand (and what it's doing, and how it's doing it), take a "hard look" at how it's been operating and what does or doesn't work.
    • Consistent with that, all policies and approaches must be brought into line with that brand (not just lip service), and Wikipedia needs to lead by example.
  2. To be a world encyclopedia means reaching people who are offline, gaining a more diverse contributor base, and being more balanced and rounded in information coverage
    • May involve furnishing computers to people and areas who can't get access
  3. Younger users need to be brought into the contributing circle
    • Minors (children) need knowledge (eg for quality of life, self-betterment, education, etc). Early user involvement teaches them how to use and contribute, they can master the learning curve early, will naturally help and show others how to benefit (eg older people). Also provides safe ways to learn valuable life-skills.
  4. Wikipedia needs to show by example
    • "Gold standard of collaboration"
    • Rules and policies need to be forged that make it a safe place to collaborate (and learn how to collaborate) but also improve overall culture via knowledge and skills
General discussion - scope and remit of Quality Taskforce

FT2 felt this explained Bhneihouse's outlook better, and observed that the umbrella topic of "Quality" split into overlapping areas of external (reader/researcher) quality, internal (editor) quality, perception of quality (what impacts the perception of quality by readers and the media), project quality (overall quality of the project), and objectively what we feel quality should cover. He understood the core remit of the team as "quality of content". A wider focus might be difficult to cover in the time allowed.

Bhneihouse liked the idea of categorizing quality but felt the overall framework was important; no formal statement said only to look at one part of quality. [We asked about this, see below]. She felt that a framework like this was crucial:

"I want to address what is standing in the way of making Wikipedia what it wants to be because that is the ultimate quality issue.... Then I would like for us to craft a set of recommendations... craft specific "fixes" for problem areas, for example: identifying what teenage girls want on Wikipedia, what kind of tools they might want/need to utilize and crafting a Wikipedia in the Schools program to expose kids to engaging in learning on Wikipedia."
"All of this is going to drive new inroads... For example: while rules and policies need to be consistent, there may be a different mentoring/rule approach to teen centered content or teen users/contributors. They may get a bit more oversight until they get the hang of how they should appropriately contribute.
For self identified technophobes, we may have a mentoring/wizard approach that allows them to do very little other than type and choose something graphically, and those people may need editors who look at the content/wikis for different things such as obvious newbie non tech types of mistakes. [This is] how defining who and what you are, and what your purpose is drives actions... it is those actions that create or negate quality on Wikipedia."
Clarification of taskforce focus

FT2 felt that if this degree of uncertainty existed, there was an urgent need to check the remit with the strategy team. He distinguished quality of content as a sub-topic within quality of project, and suggested these should not be conflated; perhaps setting up a second taskforce to look at the other was best.

Woodwalker concurred in requesting clear guidance on this point.

On asking (by email), the taskforce was informed that content quality was intended to be the core focus. However, this did not preclude looking at other aspects, which was encouraged as well.

General discussion

Sue Gardner strongly supported this thread.
Randomran felt it was "fantastic".
FT2 suggested he might try to do similar for "quality of content".
Bhneihouse suggested a "quality related to wiki-culture" subtopic as well ("addressing making the culture more amenable to new users and offsetting its deficits"), noting all of these would dovetail into quality in the end.
FT2 wrote up a summary of content quality at Task force/Wikipedia Quality/Content quality (discussed separately).

Woodwalker commented on Bhneihouse's final recommendations, agreeing with almost all, with a question over "providing computers to people lacking them" which was a nice ideal but probably untenable for serious impact as a current target.

Bhneihouse responded with further comments (details). Notable points:

  • "This is a long term project. There is a lot of information in the world. The Guttenberg project has something similar going on with literature and from seeing their results I know that this is doable. Perhaps we need to section topic areas off and try to fill in a bit of each and then more of each and then…until those topic areas are better populated. I think this is an area worth discussion - how to approach more comprehensive content - as appropriate for a quality team. Breadth of content creates better quality."
  • "[I want] Wikipedia to be [a] host for working groups for school and university kids. Why not a Wikipedia style "wave"? and Wiki groups? (of course once we get more basic issues under better control.)"
  • "Some of my prof[essor]s won't allow us to discuss our grades on a test or a paper for 24 hours so we really think about what it is they have "said" and what we really want to say ... People who regularly can be shown to not actually read content yet react to it can eventually get a negative rating score of some sort..."
  • "Teaching people the power of forgiveness, of flexibility, of all the tools they need to be good collaborators is truly KEY if Wikipedia is going to exist in ten years. This how we become the gold standard - by teaching. If knowledge is power then those who teach are truly great. We teach others what we know about how to work together. We figure it out, we define it, we create wizards, we create videos, we do whatever it is we need to do to ensure that people can learn what it is they need to know to participate. And we remove barriers to participation."
  • "[T]his is why Wikipedia is powerful. I think it may be possible to challenge the existing community to be teachers. I think there may be a myriad of ways to entice them to play by rules not of their own making to create a just result. We just haven't explored them all. So while we are adding to the community, we can also heal the community. Each of us has a teacher inside of us".

Woodwalker commented:

  • Lists could be made of missing material for specific areas of expertise. Some knowledge is universal. This would help smaller projects and guide new users where content would be useful.
  • Disagrees with blocking as a norm unless there is clear bad intent. Prefers deal-making ("don't do X or this will happen"): "Most of these well-intending problematic users can become good contributors when they just have a little less freedom/choice where and when to edit. The most surprising aspect to me is that many of them were afterwards thankful".
  • "Editing Wikipedia shouldn't be a matter of simply signing in and then having total freedom, it should be a learning process that only begins with signing in. I understand this doesn't seem compatible with the liberal ideals and mentality of the founders and many of the current incrowd of Wikipedia, yet I think this is a misconception. I think guiding contributors to more constructive ways is in a way even more compatible with our ideals than the total freedom to behave, edit and do as you like we have now."
  • "[Everyone agrees] something has to be done about behaviour. The question is mainly how to convince the communities and how to make a recommendation that will have impact."

Piotrus noted the point about "disruptive behaviours being allowed", adding that there is an increasingly poisonous atmosphere due to failure to effectively address flaming, harassment, etc.

Accepting there is a need for substantive change

Bhneihouse felt that:

"[T]he community itself needs to radically change, it needs to diversify, it needs to feel empowered to police itself and it needs to have policies and guidelines in place that support the community members in acting correctly in the first place and empowers them to police themselves... it is tricky when the body that is "voting" is the one causing the problems... it was a small group that originally got [Wikipedia] started. If something external needed to rein it in and reshape it in order for it to become what it was originally intended to be, that might not be a bad thing".
"[T]he demographics I see at Wikipedia are a bit like an implosion waiting to happen. If something doesn't change, Wikipedia has the potential to be the best idea with the worst implementation people ever heard of. Sometimes, we give up some of our "freedoms" i.e. the freedom for every user to have a say in every decision, in order to get what we wanted in the first place.
I fear that many Wikipedians aren't willing to engage in trade offs.... All of us have seen people with the wrong mindset destroy projects with great potential. We have a choice. We figure out how to fix it and make it better for a lot of other people, in this case, billions of people; or we say it's too difficult to make it work".

FT2 supported this view. Woodwalker felt it was "too pessimistic" and that rating talk edits could "probably" solve conduct issues by social pressure.

User conduct

Bhneihouse drew attention to comments about IP editing and similar pages, to shed light on past proposals tried or rejected.

Woodwalker stated he was saddened due to a current wiki-dispute:

"[W]hat I can't stand is people being too stupid to understand what a discussion is about yet contributing in those discussions in an often aggressive, personally directed and provocative way. I can't stand to be framed every time, being called names even after leaving a discussion... I think it's the idiots that can't communicate in a constructive way we should worry about, not the IPs. As I said, vandalism is probably not the worst problem for quality, the worst problem is wide-spread ignorance combined with self-over-estimation among the most active users."

Randomran felt there were too many users who were "opinionated AND[/OR] self-serving", noting "there's no downside to being opinionated and self-serving so long as you can find a few passive aggressive people to support and protect you".

Bhneihouse felt that including wider demographics would cause more people with "good" motivations to join and that "empowering [administrators] to be vocal, to teach, to ban on actionable offenses, etc. will start to change the culture to 'this behavior is not tolerated'."

Randomran noted the actionable issues were fine; it was the unactionable offenses (passive-aggressive behaviour, claims a user was not attacking but stating a firm point, ongoing borderline behaviour that doesn't formally cross a line, "fellow travelling team" support) that are the problem.

Bhneihouse felt insults should be met by a warning and a "respect" culture implemented ("I didn't cross the line" doesn't carry weight if disrespectful). Supporters of poor conduct should face the same result as the conduct they sponsored. Users who don't wish to be part of a well behaved culture get told their access will be moderated.

"In a short time Wikipedia can go from a community where many seem to not respect anyone to a community where people know there are rules and policies they must abide by... Perhaps it is time to stop supporting anarchy and time to make the community responsible to itself. After all, what good is a limitless encyclopedia that supports an online space where users are abusive to each other?".

Randomran felt these were "strong recommendations [with] the potential to work". But he warned that it would be harder in practice than theory:

"[Y]ou have to keep in mind, they usually have 3 or 4 members who do the dirty work, and 15 members who have solid reputations to back them up. They're only human. Sometimes we forgive someone who fights hard for something we believe in, and get angry at someone who politely pushes for something we are against. So the problem isn't that 20 people are all disruptive, but that 3 disruptive people can be forgiven because there are enough good people who kind of appreciate what they're doing."

Philippe noted the Community Health taskforce reached convergent views, a sign (per Bhneihouse) that there was probably validity "brewing".

Woodwalker questioned whether "true believers" (POV warriors) meant the notion of assuming good faith was bankrupt. He concluded it was a vital point, and should be backed up with a more strict principle that such people should be topic banned immediately. They aren't because (he says):

"POV is more difficult to remove than vandalism. It requires more time to analyse. We're all volunteers, no admin wants to spend time on something that is likely to be unrewarding, unpopular or tiresome.
We have been talking about having a "senior editor status". I think it would help, if true believers have no access to or will be stripped of this status. It sounds harsh and elitist, but I think in the end better distinction between user types will raise quality."

Bhneihouse agreed, and noted that it was POV warring that was "elitist" here, because such users presume they have better knowledge than anyone else. "Wikipedia [should] start to quickly identify problem users and hold a hard line about acceptable and unacceptable behavior. It is not that not everyone can play, it's that everyone can play as long as they play nicely."

Randomran noted there were practical issues. POV warriors believe they are doing right, and if a POV warrior and another user argue, "who is the true believer? The one who insists that Wikipedia needs to allow original research? Or the one who insists on reverting and removing original research? A really stubborn editor is going to get a really stubborn response... so which stubborn person are you going to ban? I'm being devil's advocate. But I have direct experience with these kinds of editors... you can't underestimate how hard it is find a solution."

Bhneihouse commented (long post) on these various points, noting agreement on most of them, and adding the following:

  • Lack of explicit statement of Wikipedia's brand permits POV warriors to assert they are doing right
  • Banning users who do not follow rules is fair provided rules are equitable and fairly applied
  • "The key here is to keep "testing" policies/rules/procedures against the brand... I believe that Wikipedia has ideas and beliefs that it follows. I want to see... not some hollow rhetoric about how cool it is to build a comprehensive online encyclopedia, but a real, living, breathing brand statement that encompasses all the values that Wikipedia has or should have".
  • Meanwhile, let's do some short term work on behavioral problems... by putting in places rules/policies/procedures that are consistent with what the brand SEEMS to be. That way, we attack the problem top down and bottom up. (Bhneihouse states this view is via FT2)
Behavioral problems

Philippe asked Bhneihouse how she thought such policies might be put in place.

Bhneihouse proposed the following:

  • Distill all policies to 6 principles/behaviours that are short, memorable, enforceable and widely publicized (disrespect, unfounded original cites, etc)
  • Empower all admins to work on a 3 warning basis (warn, caution, last chance) followed by a block with 2nd chance only if appropriate. Fair even application to all would be critical, no variance or wavering. Those who can't or won't enforce it should not be admins.
  • Content is created by discussion page first. Once the basics of the content are covered an actual article can be created, which would ensure that content pages are verifiable and neutral from the start. (As opposed to creating the article first, then considering whether it's good quality and who will fix it afterwards)
  • By doing this, everyone learns to collaborate, there are clear rules, a loose "team" and some trust exists, articles get created once there is some proposed content and some basic consensus/review, the teamwork will itself fight POV warring somewhat, there is a shift from just contributing, to teaching/educating/mentoring, making the project more collaborative and less competitive.
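
Purely as an illustration, the "3 warning" escalation in the second point can be read as a fixed ladder of responses. The step names below come from the proposal; the `next_action` helper and the idea of counting recorded offences are assumptions for this sketch, and the conditional "2nd chance" after a block is deliberately left out because the proposal does not spell it out.

```python
# Escalation steps named in the proposal, in order.
LADDER = ("warn", "caution", "last chance", "block")


def next_action(prior_offences: int) -> str:
    """Return the response to a user's next offence.

    prior_offences is the number of offences already recorded for the user
    (a hypothetical counter; the proposal does not say how this is tracked).
    """
    if prior_offences < 0:
        raise ValueError("prior_offences cannot be negative")
    # Once the ladder is exhausted, every further offence leads to a block.
    return LADDER[min(prior_offences, len(LADDER) - 1)]
```

Because the response depends only on the recorded count, every admin applying the ladder reaches the same answer for the same history, which is the "no variance or wavering" property the proposal asks for.
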
Taskforce focus

Philippe commented that proposals which would be rejected by communities or were otherwise impractical would be wasted. There are already thousands of admins and hundreds of thousands of users, who use policies they have developed daily. Their independence of policy making was already part of the mores or embedded philosophies of the communities (and hence brand), as was non-editorial policy making by the Board and its staff. These needed to be honoured. If changing that principle was going to be a taskforce recommendation then so be it (and fine), but in any event, any proposed changes to communal policies would need a clear and communally acceptable migration path. It could not start from a purely "zero" basis.

Woodwalker felt that as all major wikis had their own self-developed policies in place (and perhaps different main issues), the 2 - 4 recommendations should be proposals likely to significantly benefit all projects. But they should not be too general either.

Bhneihouse stated (to Philippe) that the direction things needed to go included thorough brainstorming, a "time consuming process". If the end direction was not understood then the value of taskforce members contributing became questionable. There would be many non-actionable ideas before the best ideas emerged, and "It takes guts to stand up and persuade people that what you are suggesting for them is right for them. It also takes grounding in the background tradition or brand". She requested review of the outline she had proposed (previous section) to see if the proposals made sense, seemed reasonable, were worth actioning, seemed to reflect the brand, and could be developed as a migration path. There were already people who didn't enjoy the current culture or couldn't edit well in it; so it was questionable whether major changes would significantly increase the "issues" in any practical sense:

"The people who care about what Wikipedia is and what it was meant to be may be smarter than you think when it comes to understanding enforcement of existing policies. The people who will likely rebel the loudest are the people who are vandalizing and spreading non neutral POV.
It's not like this is a free speech area where rights are respected. They cannot yell about Wikipedia denying them free speech - that is not a right on Wikipedia due to its mandate as an encyclopedia. So are you saying that Wikipedia is worried that people will say "Oh, no, you cannot curb my right to do X?" Wikipedia already does, it's just that nobody enforces it.
Personally I do not think admins enforcing rules in a volunteer community with a shared purpose when the rules make it easier to do your "job" will create a rebellion. I think, as I said above, that Wikipedians are already rebelling. The Wikipedia brand doesn't say "we won't have any rules." The brand says "we will be the best, most comprehensive online encyclopedia"."

She felt more time was needed. "It took Wikipedia how long to get into this mess? Can't we have a little more time to figure out how to get Wikipedia out of it?"

Philippe noted that funding and deadlines were budget driven, and due to these financial constraints the original hard limit on timing (mid-January) existed for recommendations.

Wikipedia has a strong anti-authoritarian community component

Randomran stated to Bhneihouse:

"I'm actually really sympathetic to your vision. But as someone who has tried to fix policy and fix the culture, I think you have to recognize just how hostile that Wikipedia has been towards authority, and it's a more than a significant minority. There are already numerous people who feel that we shouldn't have administrators, let alone arbcom... or at least that we shouldn't entrust them with very much power.
Arbcom is absolutely terrified of tackling any content or policy issue, aside from enforcing the behavioral norms that already exist. And they're elected by the Wikipedians! Can you imagine their reaction if a few random Wikipedians on an unelected task force got to change the rules to whatever they thought would work? ... For better or for worse, it's a non-starter. (And I honestly believe it's for the worse.)
So knowing that the trustees are not going to make Wikipedia a markedly more authoritarian place... what can we get the trustees to do? Maybe empower some administrators? Create some processes that will let the community settle issues more effectively, so that true believers will have a harder time obstructing the process? Change the organizational structure a little? We have to take a surgical approach. Can we achieve a big impact with something small and strategic?"

Philippe concurred, noting that the suggestions were already close to administrators' current work. He offered to walk other taskforce members through a typical administrator's workload so they could see for themselves what was done and what issues were coming up, to allow a shared understanding of the role: "I'm not 100% sure that you're operating off a totally informed viewpoint about what administrators do, because I'm not sure you've walked the mile in their shoes".

Bhneihouse's post before leaving

Bhneihouse wrote a post expressing her frustrations and views on the overall strategy process, in which she stated she was done and could be blocked or banned if needed. (link):

  1. No comprehensive induction material/URLs, poor collaborative tools, poor management, disconnect in relation to timeline, seeking hard recommendations shortly after commencement, "people who know how to nurture" and to "provide everything we need" missing, team demands to work in a hurry dictated by budgetary requirements.
  2. Wikimedia is adrift, needing people willing to make tough decisions to ensure the idea stays workable, and admission of mistakes by existing participants where required.
  3. "If Wikipedia really wants to be the best online encyclopedia it may just have to suck it up and admit that it needs rules and regulations and policies and procedures to ensure a liveable workspace.... [Y]ou go anywhere else offline and psychologists and psychiatrists will tell you that people need boundaries... veritable "home rule" at Wikipedia is [not] consistent with its brand or its mission".
  4. Proposal - WMF should work with a brand consultancy agency to "assess, qualify and state its brand" (noting this may take time as there are "millions of stakeholders"), then create strategy and engage change consultants to implement the brand across its projects.
  5. The needed work cannot be done in 60 days, especially with a newly formed team figuring out the basics.

Randomran agreed the process could be frustrating at times but said that "we have to do the best that we can". Wikipedia had become too tolerant of problems and was hindered by poor boundary setting, but excessive new boundaries were perhaps not the answer. He gave this example:

"If you want to program a flock of mechanical birds to fly in a realistic way, it's tempting to get really complicated. Birds must follow a leader. To avoid everyone crashing, you create a hierarchy. Bird one takes the lead, bird two and three follow close behind, until you create a flying V. They must follow the trajectory. They're not allowed to stray too far. To create realism, randomize the velocities and distances within a tight range. Randomly swap their places on occasion.
But one thing computer scientists figured out was that you could actually achieve flocking behavior with a few simple rules: steer towards the average heading of your neighbors, and avoid crowding. Somehow, these two rules manage to create a very realistic flock of birds.
I think the lesson is that yes, task forces will need to produce new rules. But can we find just a few rules that have a huge impact?"
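
The two rules in Randomran's example can be illustrated with a minimal Python sketch. It is not taken from the discussion: the function name `flock_step`, the parameters `radius`, `min_gap` and `turn`, and the representation of a bird as a dict with a position and a heading angle are all assumptions made for illustration.

```python
import math

def flock_step(birds, radius=10.0, min_gap=1.0, turn=0.5):
    """Advance every bird one tick using only the two rules from the example.

    Hypothetical representation: each bird is a dict with a position
    ("x", "y") and a heading angle "h" in radians.
    """
    new_birds = []
    for bird in birds:
        here = (bird["x"], bird["y"])
        neighbours = [o for o in birds
                      if o is not bird and math.dist(here, (o["x"], o["y"])) < radius]
        heading = bird["h"]
        if neighbours:
            # Rule 1: steer towards the average heading of the neighbours.
            mean_x = sum(math.cos(o["h"]) for o in neighbours)
            mean_y = sum(math.sin(o["h"]) for o in neighbours)
            target = math.atan2(mean_y, mean_x)
            # Smallest signed angle from the current heading to the target.
            diff = math.atan2(math.sin(target - heading), math.cos(target - heading))
            heading += turn * diff
        dx, dy = math.cos(heading), math.sin(heading)
        # Rule 2: avoid crowding by pushing away from birds that are too close.
        for o in neighbours:
            gap = math.dist(here, (o["x"], o["y"]))
            if 0 < gap < min_gap:
                dx += (bird["x"] - o["x"]) / gap
                dy += (bird["y"] - o["y"]) / gap
        new_birds.append({"x": bird["x"] + dx, "y": bird["y"] + dy, "h": heading})
    return new_birds
```

Calling `flock_step` repeatedly on a list of birds moves the flock forward; no bird is a leader and no hierarchy is coded anywhere, which is the point of the analogy.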

He concluded that "we can empower the community to solve its own problems" (a "surgical approach"):

"Something has gone wrong with our community's processes. Maybe the processes didn't scale very well to the explosion in volunteers, or maybe people have found new ways to abuse those processes. But either way, the processes broke down, and the community could no longer adapt to new problems. I think the most effective thing we can do is fix the processes so that the community can adapt once again. We don't need to direct the community's evolution. We simply need to remove the obstacles that are preventing the community from evolving on its own".

Bhneihouse commented:

"Then you are saying that the brand of Wikipedia is: "the largest most comprehensive online encyclopedia built by a self governing, self correcting community"
Up to now, Wikipedia has said: "the largest most comprehensive online encyclopedia" and mention of the community has been omitted. Do you see how knowing your brand drives your actions?
I care about doing it right. And if doing it right is distilling all of this down to the two rules that will get the flock to "fly right" then I believe that the task forces should have the time to figure those two rules out. I also believe that if, across task forces, we are all coming up with that the solution to quality content is that everyone needs to "fly right" that it is foolish to continue to focus on "quality content" as a subset and force the "answers" down that path. Instead, the different groups should shift their focus to figuring out the two rules to get the community to "fly right"."

Randomran asked, reflectively, how you get a "giant" to move: "Very slowly, and only if you can trick them into believing that's where they were already planning on going".
Bhneihouse responded "[If I can take] five months to do actual research (original and secondary) prior to making recommendations of how to and why to move a giant of about 50,000 people, then why wouldn't Wikipedia give more than two months to move its giant of millions of people?"

Disruptive users again

Randomran noted that we would want to spend time on research, but that in practical terms, "under the circumstances, we can only do our best". He noted two key problems:

  • A lot of issues are grey. Example: "Disruptive [users] can survive on Wikilawyering and the support of a good cabal. Someone relies heavily on a company's press releases to write an article that promotes their product. Someone tries to delete it as original research, and as an advert. They respond that "the information is verified, and it's written in a neutral tone instead of an advert." All hell breaks loose over an issue like that. Have they crossed the line?"
  • Policies are descriptive not prescriptive. They can be rewritten by anyone who can obtain consensus; they aren't written in stone or entrenched by Wikipedia's founders. So there is a difficult and very uncertain line between a user being "opinionated" and "disruptive", which is not conducive to deciding when to take administrator action. "Is it fair to exclude someone because they disagree with a policy the way that it is now? What if they fight that policy tooth and nail, and swear up and down that we should "ignore all rules" because they truly believe they are enhancing the encyclopedia? Are they being disruptive?..."

Woodwalker felt this was always evident in behaviour: "when discussions follow the form of intelligent inquiry, consensus will eventually always shift towards higher quality (more neutral, more balanced). It's the contributors that prevent intelligent inquiry (by being rude, by editing against the consensus, etc) that form the real barrier. I am opinionated in many subjects, so are all of us. Yet I keep that in mind when I edit, or simply don't edit the subjects I think I'm not neutral in".

Randomran felt the grey area was a major factor. For example, "a new user who thought something was wrong or untrue on-wiki and said "stop pushing lies" or "I'm an expert, you need to step aside and listen to me" - would that be that offensive? What if a user is one of many people who support a given stance, so that when an admin says a point is "original research" or otherwise insufficient quality he is overwhelmed by furious users?" He commented that at present, anything but obvious stuff is very difficult to address because of these types of issues. In such cases there are often many users who will claim it's right, or "fair game".
Philippe concurred.

Woodwalker felt that Bhneihouse's "brand" was similar to his term "factors of quality". He felt that "Edits that kill neutrality and/or balance are by far worse [than vandalism, which we block for anyway], because they are so difficult to recognize. We should be prepared to be at least as punitive in such cases. That said, I'm always in favour of second chances, and third, as long as the user shows to have gained insight in why his behaviour was wrong". Bhneihouse concurred.

End note - communication

MissionInn.Jim pointed to the Wikipedia article on "communication" and specifically to the research showing that some 93% of communication was non-verbal, which "partly explains why there is so much discord on Wikipedia": "People read too much or too little into what is being said, or they just don't grasp intended subtleties because the reader applies their own emphasis on words and interpretation based on their own state of mind. The writer's vocal and physical cues are cut out of the conversation. Misinterpretation quickly escalates into arguments and negative behavior. I find it is critical that I divorce myself from all emotion whenever I write or read something on Wikipedia...."

Woodwalker concurred, noting when Wikipedia stops being "a nice hobby" people leave or take a break. "if we really want to be able to raise quality to a higher level than the ruder part of our contributors is able to understand, we have to somehow get over the barrier of discussions being decided by ignorance and self-over-estimation".

Eekim (Eugene, Strategic Planning team) stated he was pleased with the general directions (30 November) and advised using the wiki structure and other pages under development to assist the team.

Pages noted include:

Starting point

Bhneihouse asked Philippe and Eugene (WMF Strategy team) to clarify the scope of "Quality" for the taskforce team. She proposed a Quality taskforce for the "big picture" (quality of project) and sub-taskforces for Content quality, or else to address quality of project as the main area of focus. "If we address only content quality in a vacuum we could end up building on a faulty foundation".

She had discussed with FT2 an approach that would create a skeleton framework that would provide multiple approaches simultaneously.

Bhneihouse continued that although the data was provided, she did not feel sufficiently guided in finding it. It paralleled other assumptions, that users would figure matters out themselves and tolerate a lot in doing so. She felt accommodations she needed were not provided, and that deep assumptions were not being recognized, including assumptions that prevented people seeing important questions. If that was so here, then "how will Wikipedia respond to finer points such as how a newbie needs to be treated in order to maximize the quality of their user experience and to break the barriers to their participation and contribution...?"

She asked:

"[I]s there any way to engage in this process with a clean slate... a way to effectively include people who are new to Wikipedia, who don't know where to find information, to prepare them, give them information, guide them (while respecting their acumen), and without them feeling like they are being put through torture to get something simple done? Is there an entirely new way to work... that leaves behind old unworkable assumptions, and that accommodates not only disabilities but differing working styles so each team member is able to be as effective as they are capable of being?"

General discussion

Woodwalker agreed that the taskforce could not self-limit by ignoring some entire aspects of quality, although focusing on aspects of content was not itself a problem: "Philippe/Eugene's answer is an extra reason to make content quality our main priority, but I believe we can't do that if we don't address the project/form/demand aspects of quality as well".

Bhneihouse stated she would review further statistical data and try to get some ideas about user expectation. She endorsed ideas by FT2 on "a choice of interfaces with a choosable level of help" that had been tentatively mentioned off-wiki.

Philippe confirmed "[the team's] mandate is to address the issue of content quality: that's the thing that the Board and our partners at Bridgespan and our internal staff have identified for this task force. You're welcome to address other issues as well, but we'd like that not to be at the expense of content quality. If you want to go in other directions, fine, but please make content quality your priority for the January deadline".

Bhneihouse stated that as "content" was not specifically stated in the original mandate, "we are free to make this task force what we feel is appropriate. Personally I am going to take advantage of the current state of anarchy to work in what I estimate is in everyone's best interest and keep the focus on big picture quality because without it, quality of content is IMPOSSIBLE. However, I am quite amenable to working side by side on content quality issues".

Philippe noted the mandate was a part of Priority 3, "Improve Quality Content", and referenced pages and data related to quality of content such as Improve quality content/Overview of Wikimedia projects and the content landscape, Improve quality content/Opportunities to improve core reference content and Improve quality content/Quality control and assurance deep dive. Content quality was a major issue and the taskforce was asked strongly ("implored") to work on that as its primary task.

Bhneihouse stated that users had been told they were "free to do as we see fit regarding quality" and asked how content could thrive in a poor general environment:

"Some of those assumptions may be so deep that people do not even realize they have those assumptions. Try changing your thought pattern to call the color blue the word "red" and you will have an idea of what a deep assumption looks and feels like. By their nature, they are things that are taken for granted as givens."

She suggested "Perhaps it is time to stop plugging quality of content, as many comments... keep citing bad behavior as inhibiting good content", and that "bad behavior may actually be the single largest factor inhibiting or blocking quality content", because it keeps well meaning users from contributing, obscures appropriate content (in her words "the truth"), interferes with Wikipedia's mandate (and the content itself), and prevents Wikipedia being a welcoming community.

She added (later) that "being able to be effective [in these circumstances] grinds to a halt... magnify that 100 or 1,000 times for a newbie who isnt used to sorting out complexities and digesting them quickly. How do we get THAT person to (comfortably) add quality content?... What about ordinary Joe's and Jane's who have something valuable to contribute? How do they have any hope of creating quality content if it's difficult for them to express their thoughts on here? Or if their level of frustration is so high that they just choose to NOT contribute...?"


Randomran noted that Philippe was not saying to ignore behavioural issues; he was saying not to lose sight of the main goal of the taskforce. The Community Health taskforce had a lot of overlap on this, but he "would hate it if there was no task force that wasn't making quality their top priority". He noted that all these aspects were interconnected, but that "the value of giving us different focuses is that we might notice different things. If we all focus on the same issues equally, we limit the value of the process".

Philippe agreed:

"While... these things are interconnected, I really want to leave community health issues, as much as possible, to the community health task force... It doesn't make sense for every task force to emerge with exactly the same set of recommendations, all centered around community health. We recognize that community health is a problem. That's why there's a task force... working on it.[...] [P]lease monitor their stuff and know that they're working on it, and drawing almost exactly the same conclusions you are. My hope is that you can feel freed of that particular aspect and move on to the other things that affect quality of content."

Yaroslav Blanter noted that content quality concerns existed more or less similarly on all projects, but other issues such as hostile behaviour varied greatly and could not be this taskforce's main focus. He suggested focusing entirely on content quality issues.

Woodwalker stated he had written something similar elsewhere.

Starting point

Bhneihouse noted FT2's user page statement describing Wikipedia's ethos, and felt it captured succinctly a lot of "who and what Wikipedia is and how it does what it does", ie, Wikipedia's brand:


"Writing for an encyclopedia is not the same as writing for a newspaper, or even an academic paper. In a way, it's more like writing the bibliography for an academic paper. In a way, we aren't even trying to decide (as experts would) what is "true" and what isn't, because that's not what this is."

"We are summarizing a field, creating a balanced collation of multiple perspectives and views. There's few decisions to make, few opinions to form, other than to observe which views seem to be more or less relevant views of note, and to understand each (and its sources) well enough to document."

"We care that we document each view carefully and with understanding. That is the "truth" we work to here. That, and that alone. Our truth is the truth of the bibliography, and the measure is, have we represented collectively in summary the multiple verifiable sources of note. Drawing editorial conclusions from all of them is the end-use of an encyclopedia, not the work of encyclopedists." (Emphasis by Bhneihouse.)


This provides (she felt) a "good snapshot" of the brand, and a "really good place to start for a brand statement":

Brand characteristics:
  1. What we are
    1. Wikipedia is an encyclopedia
    2. Wikipedia is a bibliography or reference material
    3. Wikipedia is not a newspaper or an academic paper
  2. What we do
    1. We offer facts
    2. We do not form opinions
    3. We do not offer opinions
    4. We do not say who is right or who is wrong
    5. We summarize a field
    6. We collate multiple perspectives and views
    7. We pay attention to which views seem to be more or less relevant based on available research, and the quantity of this research that supports each point of view (POV)
    8. We verify sources and vet information as much as is possible/practicable
  3. What we believe
    1. We believe in educating, hence being an encyclopedia
    2. We believe that it is the user's role to draw editorial conclusions
    3. We believe it is not Wikipedia's job to draw editorial conclusions
    4. We believe in collectively representing in summary multiple verifiable sources of information
    5. We believe we cannot tell others what the truth is, that they have to figure that out themselves from the facts Wikipedia presents and any other available sources
  4. Who we are
    1. We are a loosely organized group of people around the world who use computers to help each other learn
    2. We are an encyclopedia that cares about fairness and equity
    3. We care that each view is carefully documented with a focus on ensuring each view is understandable
    4. We are not arbiters of truth
    5. We are providers of information

FT2 added a second analogy he uses (not on his user page), the map-territory relationship:

"The acid test of an article's balance, focus and coverage is whether a reasonably capable user would gain a balanced, informed, understanding of the topic. The article should map to its topic like a map to its territory."

General discussion

Brya felt it was a "great statement" and a pity most content was written from "an entirely different perspective".

Bhneihouse looked at a specific article in her field, with a view to considering recommendations for improving the quality of content.

She felt from this attempt that Wikipedia needed an interface for new users: "I have realized how spread out the information is and how difficult it is to find much less understand all of it. All users, not just new users, need to constantly have access to what we are and how we work (and thus what is acceptable.)"

Philippe confirmed that new user wizards were a likely recommendation of other taskforces too.

Starting point

DGG asked how editors might be able to access resources usually available at academic libraries and other subscription sites, to allow higher quality editing and sourcing of material.

The typical subscription model for these sites is that "[a]ll licenses are based on the existence of a limited and known body of users with discernible use expectations. Sometimes the number is very large, like the population of a city, but the assumption is made that a very small percentage of them will ever use it."

By contrast, "Wikipedia is open to everyone, and all our resources need to be available equally; we have no pre-built class elite users that could form a delimited group".

DGG wondered whether:

"[users] could form such a group, with a reasonable number--in terms of negotiating a reasonable sum, I would try to keep within the range of a small college, 2 or 3 thousand. I can certainly see how to select our 2000 most active mainspace editors, but we'd need to be accessible to people writing their first articles also. My current thought is, the first 2000 people who apply. I think actually we might not have that many who would actually use it, but we'd need to cope with the many who would only think they would. Publisher willingness to do it at all would depend on their opinion of whether it would cannibalize their existing markets. I know some who will not consent for any reasonable sum of money; I know some who might, at least as a 1-year experiment."

But (he said), the situation for publishers was deteriorating, so they might be more willing to experiment or try new avenues.

He later added further information:

"There would be no caste formation if the arranged limit is not used, which is what I expect. Removing people from the list who do not use the resource is an entirely reasonable way of keeping the list open to newcomers. It might of course happen that 10,000 people would want to use these resources to write Wikipedia articles. This cannot be accommodated by my proposal, but if they actually did so, WP would be so much better off in multiple ways that we would well be able to find the money to renegotiate the contract; the publisher would be delighted to do so. All such contracts are renegotiated from year to year (or 2 or 3 -year periods) based on actual use. If the use goes up, more money is necessary. If it remains very low, the question is whether the resource is worth paying for at all. As for the legality or advisability of confidential negotiations about the cost, the entire industry works in this manner. Most librarians are unhappy with this, but almost all publishers insist. If people attempt to cover all problems, people never agrees on a contract. I imagine every publisher would see this as an experiment, and so would we."

He concluded that "if it turns out the WMF is seriously interested, please let me know privately. The entire possibility relies on the exact arrangements and costs being confidential. I am prepared to attempt to broker such arrangements and consult, but not engage in the routine work that would follow".

General discussion

Randomran felt the idea would be exceptionally valuable.

FT2 noted that allowing certain classes of editor access to online database subscriptions via a WMF service had been proposed at one point on the WikiEN mailing list. He suggested that we might have fixed criteria for agreed (unpaid) access, and paid access for users who did not meet the criteria.

Bhneihouse agreed it would be of "huge benefit to content" but was concerned that "if [Wikipedia] is free to anyone, regardless of resources, to participate, then [this idea] risks building or reinforcing an elite who has access to other resources that many cannot afford [and forming a kind of club].[...] A club is that which includes but also excludes. I think that is a very dangerous slope for Wikipedia to stand in or to stand near... As soon as there are "classes" of users in a "free" encyclopedia where "everyone can participate" you are destroying... Wikipedia's brand".

FT2 disagreed, noting that "everyone can edit" does not mean "everyone has equal access to all facilities" and highlighting many existing examples of this. He did not see an inherent problem in saying "we have made arrangements with these suppliers to provide access to sourcing material as a paid service to anyone meeting criteria X. We also make it freely available to Wikimedia editors who meet criteria Y".

DGG noted that the key factors for a publisher were:

  1. A preset limit (key factor)
  2. The overall use (number of places available times the probability that any one of them will be used)
  3. Whether it will draw away the basis of other subscriptions (eg, if everyone at a particular college decided to get access by becoming active Wikipedia editors). He felt this was "unlikely".

Randomran felt it was a good idea that seemed to find ways to satisfy all parties, but that if there were issues a quota system could be used while experimenting with demand: remove people who don't use the facility, and keep 66% for "trusted users" and 33% for "our [other] most active editors", with any unused balance offered to others on a first come, first served basis.

Woodwalker felt this was a "marvellous" idea.

Yaroslav Blanter was concerned that some would sign up just to gain free access to data, while on the other hand many had such access via their workplace. He asked how common it was that a non-academic needed academic access to write an article and whether existing methods would be able to address this.

DGG felt the number in pure sciences would not be high but the numbers seeking access on humanities, social sciences and newspapers might be significant. He estimated half the references on these were by users who had not been able to access the actual articles.

Brya felt most who would be able to understand and process their content would have access already, and others would include a high proportion who wanted them more as "ammunition". There would need to be an effective way to deselect editors, and this must not be popularity-based.

Woodwalker felt this was inaccurate. Many specialist contributors were (for example) retired researchers who no longer had access. As well, mal-citation was a "plague" (discussed elsewhere) and many users were citing based on abstracts only since that was all they could see. Allowing access to full data for some users (especially in conjunction with a guideline that academic papers should only be used by those who understand them) would go a long way to address this problem. Yaroslav Blanter felt this was a good idea but "probably impossible to implement".

Woodwalker was concerned that while collaboration can work well, the replacement of bad with good references is overshadowed by the addition of bad citations. He feared that "we will never be able to become more trustworthy and useful for academic usage, if we don't address this problem".

Randomran noted that the Wiki process handled most of these issues - even one user with access could field queries for co-editors on similar articles or in the same WikiProjects. He also felt the right tools could greatly help on this issue:

"We can make this even more powerful with the right tools. Using some sort of scholarly service would definitely help elevate the quality of sources, and settle disputes with less reputable sources. I always thought that Wikipedia should have its own internal search engine, which weeds out personal websites and other BS, and only searches peer reviewed sites with a reputation for fact checking. That would settle these issues quickly and effectively."

Others (Woodwalker, Yaroslav Blanter) broadly agreed with the idea of some kind of internal bibliographies.

Woodwalker added that as well as a bibliography system, project-wide WikiProjects could help by creating "needed articles" lists (coverage), and acting as a central discussion point where any user can ask questions about the WikiProject's field of interest.

Sue Gardner cross-posted from Clay Shirky's talk.

She stated that like other online hobbyist activities, four main factors probably motivate Wikipedia editing. They could be used as a framework for motivation:

  1. Autonomy (nobody assigned me to do it, I wanted to do it)
  2. Competence (I am good at it, and by practicing I get better, which is fun)
  3. Feedback (I get more useful feedback than before, which helps me improve faster, which makes me happy)
  4. Reputation/respect (I can show off, and be publicly rewarded/honoured for being competent)


Comments by others

FT2 and Piotrus "agreed completely".

'Ad-hoc'racies (or adhocracies) - groups that come together without pre-planning.

Piotrus asked "what makes adhocracies work well? And what destroys them? There are answers to those issues in existing literature, and we may be well-advised to read up a little on it".

Sue Gardner noted from our existing article, the characteristics of an adhocracy:

  • Organic - highly organic structure
  • Low formalization - little formalization of behavior
  • Specialization - job specialization based on formal training
  • Functionality for housekeeping but project based for "work" - a tendency to group the specialists in functional units for housekeeping purposes but to deploy them in small, market-based project teams to do their work
  • Reliance on liaison devices - a reliance on liaison devices to encourage mutual adjustment, the key coordinating mechanism, within and between these teams
  • Low standardisation and role definition - low standardization of procedures, because they stifle innovation; roles are not clearly defined
  • Selective decentralization
  • Work organization rests on specialized teams
  • Power-shifts to specialized teams
  • Horizontal job specialization
  • High cost of communication (dramatically reduced in the Networked Age)
  • Culture based on democratic and non-bureaucratic work

Starting point

Sue Gardner noted that the Board lacked precedent and role to mandate project-wide changes within communities. Nor was it well suited to evaluate community based proposals. Faced even with the best proposals, it could at most recommend them. It could not mandate adoption.

She suggested that:

"There is no group that currently exists, with the authority and the ability to mandate the kind of change your group is moving towards recommending. Because there is no body that is reasonably reflective of the full breadth of projects and languages, and would therefore have the necessary credibility and moral authority.
This suggests to me that, rather than focusing your energy on the development of recommendations for new meta-level (cross-project, cross-language) policy changes (or maybe in addition to it)......... your group might better focus energy on developing a recommendation for a meta-level body that would have the necessary credibility and moral authority to mandate changes (or at least to strongly, confidently recommend them).
What would such a body look like? How would its membership be established? What level of "representation" would be required for it to be credible? How much "hard" authority would such a body ideally be granted - or should it just have the ability to recommend? What kind of support would it need, to do its job well? ... If your group sees a need for a meta-level body, I would be happy to carry that message to Movement Roles so we can support it."

She noted that this was the sole taskforce explicitly tasked to look at "quality", and that it should "mainly focus there".


General discussion

Piotrus (as an IP) stated that he was concerned that any proposals may end up pointless if few read them and none acted on them. He did not wish to see further bureaucracy, but rather, a simple statement (KISS) of recommendations in simple terms, heavily promoted. It should be "as useful as possible", which often correlated with KISS. But it should not be forced upon people.

Woodwalker stated he had reached a similar conclusion to Sue Gardner, that "we should form some kind of lobbying group/wiki-party pushing for quality improvement on a meta scale". This body should push for:

  1. A 'brand', containing the basic aspects of quality;
  2. Advertising, stimulation and education of these aspects of quality on as many Wikimedia projects as possible;
  3. More statistical research specifically directed at these aspects of quality and the proposals/systems which might best stimulate them;
  4. More guidance and teaching of new or inexperienced contributors;
  5. A technical addition that enables feedback for articles and talk page contributions;
  6. Admins to be more active against rude, demotivating behaviour;
  7. A 'senior editor' status to be created to give quality users more influence in wiki-politics and more authority in discussions;
  8. Measures that keep the not-understanding-but-wanting-to-comment types out of discussions;
  9. WikiProjects to function at a meta scale; apart from sharing specialist knowledge, such groups should create lists of the universal information that all Wikipedias should have.

Yaroslav Blanter added a note that these would not all be directed for actioning at the same target audience.

Starting point

Yaroslav Blanter felt it would be useful to note the similar discussion ruwiki was having on content quality (ru:Википедия:Проект:Качество/Проблемы):

  1. Low quantity concerns generally, including those articles where lay editors lack the specialist knowledge to contribute - science, social sciences, humanities
  2. Sometimes even areas of general but niche interest are not covered in a satisfactory way (eg bios of Hungarian artists)
  3. Review articles are often sub-standard or non-existent (eg: "French music")
  4. Content erosion: for example, even FAs that have been refereed are prone to being rewritten by new users, and there is no set approach if the user is uncooperative
  5. No guarantee that even uncontested articles will be "correct"
  6. Articles created by newcomers are often sub-standard
  7. Pictures are often not available
  8. Info on current or changing matters is not always up to date
  9. FA and GA are so difficult that many do not attempt them, which diminishes quality efforts.

General discussion

Woodwalker felt this mirrored his view on content erosion. He thought several of the points related to "project completeness" and wondered if there were any reasons the ruwiki community doubted "eventualism" (the philosophy that eventually all content will be covered by someone or other). He also wondered if there was much translation to and from that language, and asked how Flagged Revisions had impacted the wiki (size of community and quality of content).

Yaroslav Blanter stated that Flagged Revisions was configured with very simple requirements (no vandalism, categories, one internal link, and "a couple more requirements"), which was very useful and saved "a ton of vandalism fighting". There had been no noticeable increase in editors as a result (he did not mention a loss of editors either). Ruwiki had about 500 users who could flag articles and another 500 whose edits were flagged automatically. As a project, it translated a lot of content from enwiki (a "standard second language for Russian speaking users") and a little from frwiki/dewiki, but lacked speakers of, for example, Japanese. A Meta project targeting this difficulty "in a systematic way" (ie more than top 1000 articles) would be nice.

Woodwalker suggested that, in line with his post on content erosion, solutions could include moving WikiProjects to Meta and searching for specialist writers on other projects to join these project-wide WikiProjects. He noted that the same issues exist elsewhere.

Yaroslav Blanter felt facilitation of translation (talk page and article) would be needed if the project-wide WikiProjects used English as a common language. Perhaps articles could be refereed in some areas as well. He did not agree with "eventualism" as a reliable approach, because in some areas only 100 - 500 people in the world might be able to write an article to reasonable quality, and likely none were editors or would ever become one. He suggested 3 possible solutions:

  1. To approach individual experts and ask them to create an article (top-down approach).
  2. To see when someone creates a stub and then approach these experts and ask them to referee the article and to help with building up the structure (bottom-up approach).
  3. To raise Wikipedia's status to such a level that creation of, say, an FA would be equated in status to publishing in Nature (the most prestigious scientific journal). Then people would do it themselves.

He felt the last of these was the most desirable, but "not very directly dependent on us".

Content quality framework (mirror for reference)

This is a copy of FT2's outline framework for "Content Quality", mirroring the "Barriers to Quality" summarized earlier on this page, to provide an "all in one place" reference:

Focus - What is the focus of a discussion about content quality, and which aspects of content quality are 'important'?
Metrics - What metrics best reflect content quality?
Issues - What issues impede or assist content quality?
Priorities - What priorities must be addressed to maximize long term improvement in content quality?
Articles - The norms and expectations that apply to articles themselves (in-house style, use of links and templates, balance and neutrality, sources and cites, topic coverage, page layout, etc). Is any norm a significant concern for quality?
Editor profile - If certain types of editor are especially valuable to content, how do we attract and retain them?
Editor diversity - If editors tend to overrepresent certain demographics, how does this skew content?
Long term basis - What decisions need to be made to best underpin long term quality (for example, a self-sustaining community of good quality editors, steady long term improvement, etc)?
Community - If quality is dependent on fostering a vibrant healthy community, what recommendations for the community might be most significant in their effect on content quality? (This overlaps the Community Health taskforce but may offer additional insights due to the different "angle".)
Detrimental behavior - If certain types of human interaction and agenda are detrimental to content quality, how do we reduce or minimize them?
Structures and beliefs - Do existing structures and beliefs best facilitate content quality, and are any 'sacred cows' impeding it?
External interfacing - What aspects of external interfacing are relevant to content quality (for example user feedback and guidance)?
Best compromises - Where any of these inherently conflict, what payoffs and compromises might be best, and why?
Leverage and feedback loops - The effect of a single report on hundreds of thousands of editors (including future editors) is uncertain. What recommendations can be embodied into actions that will spark desirable feedback loops ("positive and negative feedback") or best leverage what we already have, and may create self-perpetuating pervasive changes to help us? Colloquially, what would maximize "bang per buck"?

Other significant issues worth highlighting:

Retaining existing quality content - Anyone can edit, editors change, and everything can be re-edited. Once we have good content, how can we best lock it in or perpetuate it? (ie, so-called "erosion")
Perception - What does the public (and the media that informs it) consider important in regard to quality, as opposed to any reality of what is important? We need to take their views on what matters as a high priority.
Task coverage - Not all quality-related tasks get the attention they need.