Task force/Offline/SJ Q&A

I asked SJ some questions (in italics), drawing on his experience with One Laptop Per Child. This is a slightly abridged transcript of the email giving his answers.

1. Do schools represent a good distribution point for Wikimedia content in general?

Definitely. Teachers love it in moderation [though when students get distracted by knowledge they sometimes drift from group class activities]. Students love the ability to explore useful information while studying; it's much better than dry textbooks that never let you find out about what really catches your fancy. Administrators like the idea of an encyclopedia for every class.

It's definitely a great way to get teachers and students excited about the projects, and about encyclopedias as a learning tool.

2a. Do you expect OLPC usage of Wikimedia content to grow in the next 2-5 years? If yes, then would it be by a lot, or a little?

It will at least double in the next year. Out of the 1.2M students using OLPCs, we can confidently say that 400,000 students are using WP in Spanish today, and another 400k will be by this time next year, plus perhaps 100k using it in English. The practical use, beyond the number of people who have access to it, is also growing, at an unknown rate. Right now WP is part of ~20 national curriculum modules in Peru and some supplementary activities in Uruguay and other countries; it is only in casual use in Rwanda, Nepal, Mexico and other countries.

2b. Do you (realistically!) expect OLPC to become a major method of distribution of Wikimedia content in developing countries, or will it be unimportant & small?

Inexpensive netbooks priced for smaller children and school districts are starting to dominate distribution to primary school students and families in poorer areas. Wikimedia should be directly engaging every major manufacturer; Asus and Acer have a supermajority of the commercial market in those regions, and they ship their own default content and software with their builds.

OLPC is currently dominant in Latin America, but weaker in Asia. It has a major community of free content/software producers for children, so other players watch what OLPC does, and it has close ties with education ministries; both are useful for ripple effects among second adopters.

Outside the classroom & for read-only purposes, cell phones/smartphones will remain by far the most active distributor of WP and other knowledge.

3. What does OLPC want from the Wikipedia community, other WM projects and the WMF? What format would OLPC like the content to be in? XML? OpenZIM?

We're working to unify code libraries to use OpenZIM.

It would be helpful for "everything" to be available in "a suite of formats". Everything should probably include full current-content dumps, select dumps (as defined by projects such as Schools WP and WikiBrowse), and select image dumps. The suite of formats should include XML and OpenZIM, and perhaps raw HTML.

Having multiple formats available lets people who are playing with new code try it on someone else's testbed dataset, for direct comparison.

4. What indexing would you like to see? Categorization? What metadata do you think you need? (We're considering the UDC category system; see Task_force/Offline/UDC_categorisation.)

Metadata: 'freshness', 'last reviewed revision', 'popularity' and 'topical significance' (you might be able to get a crude measure of the latter from tools such as wikiosity).

5. What content would you like to be available for OLPC, besides Wikipedia? Wiktionary has been mentioned a lot - are there other things you'd like to see?

We currently ship:

  • a few Wikibooks (completed WikiJunior and other low-literacy-friendly or multiple-language-friendly books)
  • a featured image collection (could be improved with better metadata and offline interface)
  • topical Wikislices (say, about chemistry and physics)
  • a dictionary (drawn from various free sources; not really from Wiktionary, which wasn't in a convenient structure)

OLPC users have asked for things like:

  • A snapshot of major historical source material for historical classics
  • A broader variety of Wikibooks. Modules of popular existing books such as the "Light and Matter" physics collection or the UNDP-APDIP books, ready as a bookshelf for offline reading. (Not all of these are even on Wikibooks yet.)
  • Topical WP slices for a hundred major topics
  • Multilingual dictionary content, with links to relevant images
  • Topical image collections (say, for biology, agriculture, machine shop)


  • Most OLPC requests are for English, French and Spanish (en, fr, es).
  • Annual updates would suffice; that's all many people have patience for. Of course, once the toolchain is in place, that may not be the bottleneck.
  • Cross-linking between different wiki modules is sometimes requested: "make [Wiktionary] available as a resource to all other programs"; "link module A to Wikipedia articles if the Wikipedia module is also available". This is something that Answers.com and others have integrated very well (for their own knowledge products) into Windows desktops; it's something to consider for Wikimedia content in general when used offline. The solution isn't obvious, but this is definitely something we need to do to make material more useful and self-reinforcing offline.

6a. Given the space limitations on the XO, how do you expect to select the content? Will you want to include thumbnail pictures, which are obviously important for kids but which fill up lots of space? If yes, how would they be selected?

We select by popularity, in an iterative process. This metric is crude and can be improved. Generally the most popular articles, or articles linked from a 'main page' by virtue of being Significant (even if not Popular), get a single thumbnail, up to some space limit. A description is on the WikiBrowse page (wiki.laptop.org/go/wikibrowse).
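
The selection rule described above can be sketched as one greedy pass over the candidate articles. This is a minimal illustration under stated assumptions, not the actual WikiBrowse code: the `Article` fields, the byte budgets and the `select` function are hypothetical, and ranking 'significant' articles ahead of merely popular ones is an assumption about how the two criteria combine. The iterative process mentioned above would presumably rerun such a pass with adjusted budgets or scores.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record; field names are illustrative only.
    title: str
    popularity: float   # e.g. page views (assumed metric)
    significant: bool   # linked from a curated 'main page'
    text_bytes: int
    thumb_bytes: int    # size of its single thumbnail; 0 if none


def select(articles, text_budget, thumb_budget):
    """Greedily pick articles by priority until the text budget is used,
    giving each chosen article one thumbnail while the image budget lasts."""
    # Significant articles first, then by descending popularity (assumption).
    ranked = sorted(articles, key=lambda a: (not a.significant, -a.popularity))
    chosen, with_thumb = [], []
    text_used = thumb_used = 0
    for a in ranked:
        if text_used + a.text_bytes > text_budget:
            continue  # too big; a smaller, lower-ranked article may still fit
        chosen.append(a.title)
        text_used += a.text_bytes
        if a.thumb_bytes and thumb_used + a.thumb_bytes <= thumb_budget:
            with_thumb.append(a.title)
            thumb_used += a.thumb_bytes
    return chosen, with_thumb
```

Skipping (rather than stopping at) an article that does not fit lets small articles fill leftover space; a real tool would also weigh the crudeness of the popularity metric noted above.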

6b. Would you have use for a collection of lede paragraphs on less important topics?

Yes. A microcosm of Wikipedia and Wiktionary entries with a million terms would make a useful standalone resource, and a tool that captured more or less of an article based on its importance would be most useful.
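
A tool that captures more or less of an article based on its importance could start as simply as the sketch below: it always keeps the lede paragraph and adds further paragraphs as an importance score rises. The `excerpt` function, the score in [0, 1] and the `max_paragraphs` cap are hypothetical choices for illustration, not an existing OLPC or Wikimedia tool.

```python
def excerpt(paragraphs: list[str], importance: float, max_paragraphs: int = 10) -> list[str]:
    """Return the first few paragraphs of an article, scaled by importance.

    Hypothetical helper: `importance` is assumed to be a score in [0, 1],
    e.g. derived from popularity or topical significance.
    """
    importance = min(max(importance, 0.0), 1.0)  # clamp to [0, 1]
    # Always keep the lede (first paragraph); add up to max_paragraphs - 1 more.
    n = 1 + int(importance * (max_paragraphs - 1))
    return paragraphs[:n]
```

With importance 0 every article contributes only its lede, which gives the "collection of lede paragraphs" from the question.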

7. What do you think would be the space allocation on an OLPC computer by 2012?

Space limitations are easing; however, different audiences need different things, and the likelihood of people having the patience to download an update drops dramatically as module size grows. A selection of 100 MB modules is always preferable to making everyone use the same 500 MB module.

Assume 2 GB is available for wiki content of all sorts in 2012.

8. Most Wikimedia content is aimed at adults, not kids. Do you have any ideas on how to make the content more "kid-friendly"?

Improve Simple English! This was a common request from ESL speakers, some of whom found Simple English very useful despite its quirks. Have a 'simple language' version of each article that gets updated, if more slowly, over time. Having this in languages other than English would be similarly useful.

Have a kids' Wikipedia in each language, perhaps combined with the above; at any rate, a safe space for kids to read and comment on articles.

9. How would you see the offline content being "updatable"? Could it be that the school might serve as an internet hub, where students could get a monthly update to their article collection? (I'm thinking of 2012 onwards, by which time the internet may be more available in remote areas.)

Peer-to-peer (P2P) updates are likely once one person has an update. The first problem to solve here: Make Incremental Updates Work.
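
One common way to make incremental updates work, sketched here as an assumption rather than anything OLPC has built, is for the receiving laptop to send a manifest of per-article content hashes, so that a peer holding a newer snapshot sends only the articles that changed plus a list of removals. The snapshot representation (a dict from title to article text) and the `manifest`, `make_delta` and `apply_delta` helpers are hypothetical.

```python
import hashlib


def digest(text: str) -> str:
    # Content hash used to detect changed articles without sending the text.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def manifest(snapshot: dict[str, str]) -> dict[str, str]:
    """Title -> hash summary that the receiver sends to the peer."""
    return {title: digest(text) for title, text in snapshot.items()}


def make_delta(receiver_manifest: dict[str, str], newer: dict[str, str]) -> dict:
    """Run on the peer: list new or changed articles, and titles to drop."""
    changed = {t: x for t, x in newer.items() if receiver_manifest.get(t) != digest(x)}
    removed = sorted(t for t in receiver_manifest if t not in newer)
    return {"changed": changed, "removed": removed}


def apply_delta(snapshot: dict[str, str], delta: dict) -> dict[str, str]:
    """Run on the receiver: return an updated copy (input is not modified)."""
    result = dict(snapshot)
    result.update(delta["changed"])
    for title in delta["removed"]:
        result.pop(title, None)
    return result
```

This only covers the diffing step; a real deployment would also need to handle images, compression and interrupted transfers.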

10a. Related to that, it would be great to see OLPC users writing contributions for WP (and perhaps other projects too). I've heard you mention that before as a goal. How would this be done in practice?

Best solution: Offline editing, periodic synchronization.

Simple solution: Encourage large-group newbie editing. Find ways to identify blocks of new users, make sure they are welcomed by mentors and watchers rather than vandal-fighters and spelling zealots.

Support school projects to edit WP, especially about local topics.

Encourage mentorship of young editors; build guidelines that don't reflexively delete anything the reader hasn't heard of as non-notable (NN). Improve notability guidelines by making it easier to write about people, places and events that have no web presence for lack of connectivity.

10b. …upload or edit content? How would the WP community coordinate this and handle contributions (including some vandalism or general silliness) from kids from a very different culture? Would these students be able to contribute in a local language, to help grow these smaller language projects?

Find mentors for each new community interested in solving that problem. Don't dictate from on high; support the idea and encourage the growth of these networks.

Encourage local editing through contests and other high-profile events that make editing a cool thing to do within a community, at universities and schools, where you can find mentors for kids and others from that community in the future. [See the Kiswahili Wikipedia Challenge for an example I've been working on recently.]

10c. I must say, when I look at articles like http://en.wikipedia.org/wiki/Kiffa and http://en.wikipedia.org/wiki/N%C3%A9ma and think about what local people might be able to contribute, I find it very exciting; however, you have to consider how we would handle "reliable sources" for such material. I've looked for information on towns in

We shouldn't hold articles about new and hard-to-source topics, where we want to expand, to the standards we use for commonly known topics with thousands of references. Perhaps use a different article-page template indicating that this is an article about a new and developing topic: of extra interest but with lessened verifiability, and particularly in need of corroborating citations, sources, and edits.

Preventing those articles from starting is the wrong way to build global knowledge in such areas.

10d. interesting questions, but we need to consider this if we are not to alienate communities who see their contributions arbitrarily deleted by some college student in the US! Maybe we need a sort of "holding area" for such content, so it isn't lost even if it's deleted from WP?

Right. More than a holding area, I think: a different set of categories, policies, templates, and monitors.

11a. Can you offer any insights on other methods for distribution of Wikimedia content? We are proposing to use cellphones as a principal method for distribution to adults across the developing world - do you

I think ultra-cheap books remain the best short-term way; we need much, much better printing and publishing contacts.

Cell phones are already a popular method, so you can't go wrong there. However, that still privileges the rich. I like the idea of distribution on USB sticks; that way you can get your own copy of WP for a few dollars and use it on any computer or in any internet cafe.

11b. In those places where there is no internet, and no OLPC project, do you think a set of 30,000 articles in book form (à la World Book) would be popular? Would it be feasible to use publishers and printers in the "target" country?

Yes and yes. Even where OLPC and the like exist, power is often hard to come by. You won't always be able to leave your shared phone on for an hour while reading. Print is the cheapest way to produce a readable copy of Wikipedia for areas that have no reliable computing. India-style textbook printers that use newsprint could make a 30k-article encyclopedia in paper form for a few dollars, and it could be used at any time of day or night, unlike a USB key plus a 'net cafe. You'd have to use local printers; otherwise shipping costs would match production costs.

Thanks for the questions; be well; let me know if you have any followups!