Task force/Offline/IRC/2010-01-12

< walkerma> jeblad - different meeting I think!  This is the Offline group
< hejko> hi all
< Amgine> <chuckles> We haven't really gotten going. But the kettle is going in my house too.
< wizzy> are we happy with recommendations 1 and 2 ?
< Amgine> I didn't look at the agenda yet... I wasn't wizzy, but I am now.
< Amgine> You know, I usually expect people to argue with me before we reach a compromise.
< wizzy> good - let's stick to 3 and 4 this meeting
< walkerma> I was wondering about no. 1.  I think the others are pretty non-technical, but no. 1 is full of technical stuff.
< Amgine> Did anyone look at my comments on #1?
< walkerma> Do we want to "tone that down" at all?  Or is it OK as is?  I was hoping we could get Philippe's feedback.  I've been at work this 
            morning and had little chance to review the very latest changes
-!- pm27_ [n=chatzill@91.68.208.8] has quit [Read error: 60 (Operation timed out)]
< Amgine> http://strategy.wikimedia.org/wiki/Talk:Task_force/Recommendations/Offline_1#Issues_with_strategy_section
< hejko> please give me a second to scan the comments
< walkerma> I don't understand a word of them, sorry!
< wizzy> My take is the only thing they (recipients of these docs) will really read are the questions and strategy
< walkerma> Actually that's untrue - I do get "Provide parsed semantically annotated XML rather of article text."
< hejko> I would erase the parser stuff, since it is wrong
< Amgine> I removed the 'rather' just now. Poor english.
< hejko> Maybe I'd write: There should be a reference implementation of a parser
< wizzy> hejko: what parser stuff ?
< Amgine> <nod> I suggest "Write/publish a Mediawiki parser specification."
< hejko> Strategy 3. "A parser should be written ..."
< wizzy> from comments, or recommendations ?
< wizzy> well, it should
< Amgine> If we create a specification, anyone can write a parser.
< hejko> Amgine: No, I think about providing a reference implementation to parse the dump. But the stuff below is about a writer, not the parser
< wizzy> ok
< hejko> wizzy: Recommendations
< Amgine> I'm not certain I understand "a reference implementation", so better we are sure what we're talking about before going further Hejko.
< wizzy> I am very keen we have *one* master dump, and generated output, including standard HTML
< hejko> Nonetheless, templates, redlinks are important and should be annotated/expanded in the dump
-!- Philippe|Wiki [n=Philippe@wikimedia/Philippe] has joined #wikimedia-strategy
< walkerma> I think we need to consider what is "strategy" (big picture), and what are "tactics"(details of how).  I think details of implementation 
            shouldn't be in the strategy, but could be moved lower down among the assertions where we can say "This could be done this way".  From 
            what I understand, we will be asked to come up with tactics after this stuff has been reviewed
< hejko> standard html is easy once there is a proper dump
< Philippe|Wiki> happy time-of-day everyone, and my apologies for being late.
< walkerma> I think the strategy part should be clear to an intelligent person who is not an expert in computer science
< wizzy> hejko: Amgine freaked me a while back, saying pictures had not been considered. As long as I get my HTML dump, I am pretty happy
< walkerma> Philippe - great!  I was wondering if you had some feedback on our four draft recommendations, before we make them final?
< hejko> reference implementation: a piece of software that parses the dump and gives examples how to extract data. e.g. written in python. other 
         developers could derive libraries for other languages.
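A minimal sketch of the kind of "reference implementation" hejko describes here: a short Python script that reads an XML dump and shows how to pull out each page's title and text. The sample data, the `iter_pages` and `local_name` helpers, and the namespace URI are assumptions for illustration only; nothing in this meeting specifies them, and a real dump would be read from a file rather than an inline string.

```python
import io
import xml.etree.ElementTree as ET

# Made-up two-page sample in the general shape of a MediaWiki XML export.
# The namespace URI is an assumption; the code below does not depend on it.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Africa</title>
    <revision><text>'''Africa''' is a [[continent]].</text></revision>
  </page>
  <page>
    <title>NMR spectroscopy</title>
    <revision><text>'''NMR''' is used in [[chemistry]].</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for every <page> element in the dump stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the page once handled; real dumps are very large


pages = dict(iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
print(sorted(pages))
```

Note that this only extracts the raw wikitext; turning that wikitext into annotated XML or HTML is the separate parser problem the group goes on to discuss.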
< walkerma> We're discussing no. 1 right now.  Two things - Amgine's comments here:
< walkerma> http://strategy.wikimedia.org/wiki/Talk:Task_force/Recommendations/Offline_1#Issues_with_strategy_section
< Amgine> hejko: A specification is similar: it's the written requirements of the parser, so anyone may implement it in any language.
< Amgine> By avoiding any single language, it avoids language bias.
< Philippe|Wiki> walkerma: To be honest, they're all so far above my head that I can barely see the bottom of them.  I'm a facilitator, not a 
                 programmer.  With that in mind, my impression is that they're on target for expectations.  I encourage you to remain strategic and 
                 high level, and trust that the tactical details will follow.  (You'll probably be pulled into that phase as well.)
< wizzy> I would like a reference implementation
< Amgine> Wizzy: WMF devs will point at Mediawiki and say "there's a reference implementation, in PHP".
< hejko> the problem with the PHP parser is, that it is not really a parser. really!
< jeblad> at least keep it on a spec, not an implementation..
< hejko> that is why it is hard to derive a spec
< walkerma> And I made a more general point about strategy (big picture) vs tactics (details), but it's hard for us to know how technical to be.  If 
            Erik and Tim Starling are reviewing this stuff, great - but if it's you and Sue Gardner, maybe it's too technical and too detailed in 
            parts?
< Amgine> <nods> If there is a spec, the Devs will be forced to limit the ongoing accretions to the "parser"
< hejko> nonetheless the PHP code while not being a parser could be modified to emit XML that is parsable by any XML parser
< Philippe|Wiki> walkerma: the next phase will include the Board and experts as needed.  The business planning piece (which would have to fund 
                 recommendations) will certainly not be done without engaging technical staff.
< walkerma> Hard for me to know, since I can't reliably discriminate technical big picture from technical details - but I like to ask the dumb 
            questions!  I know that technical people like to get into details (we chemists are just as bad!)
< hejko> I agree with Philippe|Wiki that we should avoid technical stuff but rather be strategic.
< Amgine> (For non-techies, the specification is like a set of laws that must be followed; other implementations can add more rules, but everyone 
          must follow the minimum set of laws.)
< Amgine> That way everyone will turn something into a link to the article something.
< wizzy> Amgine: Strategy 1 is my stab at it. If it should be writer as well, fine - but after all of this, I want to run a program/makefile over 
         the XML dump and get what I want.
< hejko> I think it is sufficient, if we assert that things are technically solvable
< walkerma> One danger if we get too specific, is if we say in our strategy statement: "We need to do X, and so we need A, B and C" they may turn 
            around and say "We don't want to do B and C, so we'll ignore this"
 * Philippe|Wiki nods at both hejko and walkerma 
< Philippe|Wiki> a broad assertion is what we're looking for.  From that we can figure out how to fund it to fix/improve/change things.
-!- pm27 [n=chatzill@80.125.173.60] has joined #wikimedia-strategy
< Amgine> That's the status quo, hejko. I think the specification is the hingepin to pretty much everything else we've discussed.
< walkerma> However, if we just say, "We need to do X" as our strategy statement, then in our assertions we say "This could be done using A, B and 
            C" they may say "Yes, but we want to use A, D and E"
< pm27> hello all
< wizzy> we need to be able to generate the output that BozMo and Kelson (and wizzy) want.
< Amgine> Hullo pm27
< Philippe|Wiki> walkerma: that's also possible.  The thing I want to make clear:  don't get hung up on the details.  Get me an overview.  Details 
                 are fine, but broad strokes are what i need today.
< walkerma> wizzy: I think that's our basic strategy in one sentence!
< walkerma> Hi pm27
< walkerma> We're reviewing http://strategy.wikimedia.org/wiki/Task_force/Recommendations/Offline
< wizzy> Amgine: can you fix 1, and we move on ? I don't want to get stuck again
< walkerma> So is no. 1 at the right level right now?
< pm27> humm yes i think we lose a very important point
< pm27> it's how to test the dump
< wizzy> walkerma: no - seems it is too specific, and detailed.
< pm27> before sending them out in a torrent
< wizzy> pm27: suggestions ?
< pm27> i think they should be tested by the community
< hejko> another issue: we are always talking about dumps. but many re-users (as us) are querying the API to retrieve pages. so I would rewrite this 
         to: Provide WMF content as easily parseable annotated XML (in dumps and the PAPI).
< hejko> API
-!- Philippe|Wiki changed the topic of #wikimedia-strategy to: Task force recommendations should be completed today (2010-01-09) ... 
    http://strategy.wikimedia.org/wiki/Task_force/Recommendations
< pm27> because a lot of people contact me about bad files : 
        http://blog.wikiwix.com/fr/2009/05/14/recherchons-beta-testeur-pour-okawix/comment-page-2/#comment-7269 ( see the comment 60 )
< pm27> sorry in 61
< pm27> I think it's very stupid for people to download about 20 G of data and then find the file will not open
< Amgine> hejko: agree, but I think that would be assumed. RoanKattouw would implement any article xmlparsing probably as the beta test.
< hejko> pm27: I'd hope that there would be automated tests that guarantee the validity of the output. but I'd not put that into our recommendation 
         as it is obvious SW development best practice.
< pm27> no hejko
< walkerma> pm27: Is the overall strategy OK? We need to submit our core strategy today, so we're supposed to be focussing on the "big picture"
 * Philippe|Wiki nods :)
< pm27> no problem walkerma , it was just my last suggestion :)
< Kelson> Quality check is a SW detail... pretty easy to do... I have a feature request about that: 
          http://sourceforge.net/tracker/?func=detail&aid=2901059&group_id=175508&atid=873518
< walkerma> wizzy: Can you rewrite the strategy as you think appropriate, then we can take a look afterwards?
< hejko> So do we skip #3 and all below from the strategy section?
< hejko> I'd also remove the "xml dumps could be parsed to include project-specific DTD extensions. Examples: " stuff.
< wizzy> Can we add that we want a reference implementation ?
< Amgine> Possible substitution for #3
< Amgine> # A parser specification should be published.
< Amgine> ## A non-Mediawiki reference implementation, maintained by WMF developers, might be an alternative.
< hejko> Amgine: which parser? the wiki text parser or the dump parser?
< Amgine> They are the same, in effect.
< hejko> if there is no real PHP parser for performance reasons, that would be no problem if parsable XML is provided
< hejko> Amgine: ?
< Amgine> How would they *produce* that xml without parsing the wiki syntax?
< hejko> Yes, but if there is an option get annotated XML the need for a wiki text spec is reduced.
< hejko> nobody would care as everybody simply uses the easily parsable XML
< Amgine> Mmm, well, fewer people would care, but it would not encourage third-party development of uses for the data.
< Amgine> For example, if they aren't producing the syntax tags *I* need, I still end up needing to write a parser.
< wizzy> haven't we said we want XML, not wikitext ?
< hejko> True, but they should annotate everything in the first place.
< Amgine> They can't. There are too many possible re-interpretations of the data.
< hejko> So this should not happen. Otherwise you would contribute to the PHP to XML code.
< walkerma> OK, we're almost 40 minutes in, and we're still tossing this around - can we move the discussion to on-wiki?  I'd like us to cover all 
            four recommendations today, and we've spent much more time already on no. 1 than on the others!
< Amgine> Okay
< walkerma> Can wizzy try a rewrite, and then we'll all take a look afterwards?
< hejko> ok.
< wizzy> eek - what I want is there already
< wizzy> I want that
< walkerma> wizzy - do you mean you're happy with the wording as is?
< Amgine> hejko: you and I can discuss the tech points on the talk page.
< wizzy> but I am not explaining it right, apparently, or am too detailed on implementation
< Amgine> The latter.
< Amgine> (imo)
< wizzy> I take hejko's point of writer vs parser, but otherwise it stands
< wizzy> I want all the narrowing stuff left in
< hejko> maybe we can take this to a subpage detailing stuff?
< walkerma> hejko: How would you rewrite it, to make sure we don't tie ourselves down to one specific line?  Or is it about right?
< jeblad> If you are too detailed on translating into xml _and_ replacing templates you run into trouble as this will be very difficult.. I think..
< wizzy> I say Amgine or hejko take a crack at it, on the main page
< walkerma> OK - shall we move on to no. 2?
< wizzy> yes
< Amgine> I'm fine with 2
< hejko> Do you know Sue Gardner? We should write it in a way that she can make sense of it?
< Amgine> I've chatted with her, but no I don't know her.
< Philippe|Wiki> Sue will call in experts if she doesn't understand something - but the key is this... keep it high level :)
< walkerma> hejko: Yes, exactly.  She's a journalist, used to be on Canadian TV.  She's smart, but I doubt she would understand more than I do
< Amgine> Mmm, she was in charge of the CBC website as well.
< hejko> My point is: She will ask the question: What are the effects/benefits and what does it cost?
< Amgine> <nods> cba is always the final arbiter.
< walkerma> Amgine - sorry, didn't know that!  If the main reviewers don't understand our core strategies, they may well set them aside - and this 
            is too important to ignore
< walkerma> What do people think about the cellphone recommendation?
< walkerma> No. 2?
< Philippe|Wiki> Amgine is correct: Sue had ownership of cbc.ca     So she's technical enough to know what she doesn't know, but smart enough to 
                 engage people who can explain it to her.
< jeblad> Point 2 on strategy, in Norway we have a company delivering Wp on SMS. Seems stupid as we have a lot of internet on cell phones but it 
          still is BIG.
< wizzy> walkerma: fine by me - not sure how we support proprietary offline solutions except with a detailed XML spec
< Amgine> The cellphone recommendation is really important, but it is dependent on the parser issue.
< Amgine> <grin> Same thought, wizzy.
< walkerma> wizzy: That may be true, but we need to write things hierarchically, so the main point isn't lost
< jeblad> Its very big now, but its assumed to live on only for a few years
< hejko> I like #2 except that i'd omit Strategy #1 and move Strategy #2 after Strategy #4 and skip strategy #5
< walkerma> jeblad: Can you give us a URL for the Norwegian company's product?
< hejko> To have a big impact, deals to preload the data on cellphones is the key.
< wizzy> I am an engineer - I like #1 - we need results
< jeblad> Its a sms service, not a downloadable product
< hejko> wizzy: I am an engineer too.
< jeblad> but the company site is http://www.askadam.no
< hejko> the phone companies will supply funds to the developers once they are convinced to implement the strategy
< wizzy> walkerma: I am happy with recommendation 2, all strategies
< jeblad> There is also another that work on similar solutions http://www.1881.no
< walkerma> hejko: Do you like no. 2?  jeblad - we should add that in at a relevant place
< hejko> wizzy: what does it help if the technology is there, but there is no one who utilizes it?
< jeblad> Note that hybrid solutions are also very interesting, where a "compact" form of wp is downloaded offline and extra material is added 
          through the cell network
< hejko> walkerma: as I wrote before. delete Strategy #1 #5 and move #2 down
< wizzy> hejko: then we got it wrong. 
< walkerma> Sorry, I misunderstood you
< wizzy> I could drop #5 - I was just tying it back
< walkerma> Did you mean just drop "These dumps to be created per Recommendation 1."
< wizzy> yes
< hejko> we can keep #1 but "Convince networks & manufacturers to pre bundle WMF content" is the key IMHO.
< Amgine> <nods> I think we can do so more easily if we can point to a company that has a standing product.
< wizzy> strategy #1 is the reference implementation from Recommendation 1
< jeblad> ..preferably in standardized format..
< jeblad> ah, one thing, rec 1 should say something about rewritten short intros to articles
< wizzy> hejko: I think third world is much less dependent on m/f and networks - just enable them to do it themselves
< walkerma> I think #1 (Support third party developers/providers of open offline storage standards..) is needed.  They provide the essential link 
            between the Wikipedia world and people like Nokia and Verizon - setting up formats, building connections between the corporate world and 
            the WP community.  Don't you think, pm27?!
< walkerma> I don't mind moving it further down the list if you think #3 should be first, though
< wizzy> jeblad: "rewritten short intros to articles" is in strategy #4, badly written, perhaps
< jeblad> oh, I didn't recognize it.. O_O
< hejko> wizzy: how do you get WP loaded on your ultra low cost phone in the 3rd world? either it is preloaded and you will reach or 3 billion users 
         or it is available and you get 30 MIO users at best.
< pm27> I don't see why Nokia or Verizon
< wizzy> on a smartphone, you put it on a micro-SD card and plug it in. Super-basic phones use SMS
< walkerma> hejko: Good point, hejko!
< jeblad> you use memory card or some local network (bluetooth)
< jeblad> There is one option that is interesting, broadcast messages
< jeblad> but I never heard about anyone using it for such purposes
< Philippe|Wiki> I'm going to have to step away to hit another meeting... i'll be back shortly. :)
< Amgine> The assumption of OEM preload assumes new generation phones; what are the capacities of ultra low cost new generation phones?
-!- Philippe|Wiki is now known as Philippe|Away
< wizzy> jeblad: no offence, but its too late for musings
< hejko> Amgine: we don't know. but if we propose this strategy the WMF will find out how to implement it.
< wizzy> bluetooth can move quite a lot of data
< Amgine> Yes, I agree hejko.
< wizzy> yes, we should talk to the networks. But in India, they buy a knockoff phone with identical IMEI #s at the local market, and then get a SIM 
         - they hardly talk to the network or m/f
< walkerma> By all means give some of the likely implementations further down in the recommendation, in the assertions and sub assertions - they can 
            provide a lot of good support for our overall strategy.
< wizzy> something like 30% of their phones are grey market cheapies
< walkerma> wizzy: We'll know we've succeeded when THOSE phones come preloaded with Wikipedia...:)
< Amgine> Okay, I think #2 is getting beat to death by the choir. (We are mostly in agreement, so we are talking too much). What is the next topic?
< wizzy> they might buy a pre-loaded SD card, or bluetooth it (with reader) from a friend
< wizzy> Amgine: schools. I didn't work that one over
< Amgine> Mmm, I looked at it, but not in-depth.
< wizzy> we need strategies - they are dressed up as assertions further down the page
< hejko> yes. currently it is "work with schools"
< hejko> they might want to, but how?
-!- GerardM- [n=chatzill@dhcp-077-250-053-164.chello.nl] has joined #wikimedia-strategy
< wizzy> I feel like a broken record, but I think an uncompressed HTML dump that can be put on a webserver is the best way
< walkerma> wizzy is right - I put the broad strategy in the start, so it's obvious, then I put the details of how to implement that lower down as 
            assertions
-!- lyzzy [n=lyzzy@wikipedia/Lyzzy] has joined #wikimedia-strategy
< walkerma> wizzy: By all means insert that, if it's not in already.  SJ and Kul like the USB key idea - we have room for both, and books also, I 
            think
-!- lyzzy [n=lyzzy@wikipedia/Lyzzy] has left #wikimedia-strategy []
< walkerma> Once we begin to work more with schools, we can adapt to their particular needs
< hejko> last time we talked about human proxies.
< hejko> there is no such recommendation.
< wizzy> books, targeted selections (science, math) 
< hejko> i think teachers need a central point where they can address their requirements.
< wizzy> hejko: we dropped offline editing - it is too hard
< walkerma> I need to pop down and switch off a variable temp NMR experiment, I'll be back soon.  wizzy is right - see the main talk page, Amgine 
            and wizzy both felt it was too hard for now
< walkerma> http://en.wikipedia.org/wiki/NMR_spectroscopy in case you wondered!
< hejko> ok, but we concluded that we have no exact knowledge which content delivery mechanism is required by schools. and they might benefit if 
         they had a point of contact.
< hejko> content & delivery mechanism
< wizzy> delivery is USB stick.
< Amgine> <nods> Okawix, for example, has both usb and downloadable.
< Amgine> m.wikipedia was suggesting an API which could create on-the-fly specialty dumps.
< hejko> Amgine: I like this idea
< Amgine> (I think that would likely be too server intensive, but possibly using cached versions for commonly requested formats)
< wizzy> especially for books - we forget how many pages 4000 articles is
< hejko> not directly for schools but for publishers of books or DVDs or USB sticks
< jeblad> what do you want to dump, and why?
< Amgine> Subscription/approved access?
< hejko> again I think it is essential to support the ecosystem of projects that plan to provide materials for schools
< Amgine> Right, good point jeblad, we're back to talking about implementation and not higher level recommendations.
< pm27> hejko: DVD is a very bad idea
< wizzy> jeblad: special dump for chemistry, or maths
< pm27> I have got 4000 DVDs of wikipediaondvd.com
< pm27> with the old version
< hejko> brb
< jeblad> the system for producing pdf's is quite good, but the pdf rendering is bad
< walkerma> back
< wizzy> pm27: what was wrong ?
< pm27> a lot of confusion between my company and the WMF
< pm27> like how to communicate
< pm27> how to manage the reproduction
< wizzy> pm27: ??
< pm27> a lot of journalists want to reproduce the CD for their magazines
< wizzy> pm27: selling and distribution ?
< walkerma> wizzy: The collection was just a TEST release - only 2000 articles,  not enough for a really viable collection.  But it was critical in 
            showing that en:WP could produce an offline release
< pm27> the trademark was very problematic
< hejko> back
< Amgine> ah, now I remember that pm27.
< walkerma> The USB sticks are much better, because you can always just produce as many as you are selling
< pm27> yes walkerma
< Amgine> But difficult to stick in the pages of a magazine.
< wizzy> Amgine: yes
< pm27> not really, it was just the trademark on the logo
< pm27> no trademark is much better for distribution
< walkerma> OK, so what's the verdict on recommendation no.3? Any big changes?  Small changes?
< hejko> IMHO whoever wants to distribute the content on a DVD should easily be able to. at least if distributing the content is our goal.
< wizzy> walkerma: I volunteer to rewrite the schools section - emphasis on book-friendly, otherwise it is the same as a regular dump ?
< Amgine> We need it rewritten quickly, though.
< Philippe|Away> just as a time check, folks, we have a meeting in this room in 35 minutes :)
< wizzy> Is there anything else special about schools ?
< Amgine> (has no real issues with the content.)
< walkerma> wizzy: Yes, and also explaining the emphasis on kids as users and teachers as facilitators
< hejko> I'd reformulate "Work through schools" to "Provide the necessary infrastructure to enable organizations to easily provide WMF content to 
         schools"
< walkerma> That emphasis is the reason for including this in part of a strategy
< wizzy> walkerma: what strategy changes around teachers as facilitators
< walkerma> hejko: I come from the "basic is better" school of writing - but if the consensus is for something longer, that's fine with me!
< Amgine> walkerma: Strunk?
< walkerma> wizzy: Things like needing to consider national curricular, more scholarly topics, etc
< walkerma> curricula
< hejko> walkerma: I think it is not the WMF who will provide books, DVDs USB sticks but other entities. And those should be supported in doing so.
< walkerma> hejko: Definitely.  Please rewrite if that's not clear!
< hejko> :)
< jeblad> for schools there should be some mechanism for reusing curriculum
< jeblad> Perhaps even from one language to another if there is no such curriculum defined
< walkerma> (Amgine: No, I think a lot of it comes just from working in academia and on WP)
< hejko> Strategy
< hejko> Work through schools. Provide the necessary data and infrastructure to organizations that aim to bring WMF content to schools.
< Amgine> http://bartleby.com/141/ <- Best. English. textbook.
< walkerma> jeblad: They will have the curriculum set by their national or state board of education.  Our job is to make sure we allow selections to 
            match those curricula
< walkerma> That's what BozMo did with "Wikipedia for Schools" - all of the books that are required reading in the UK curriculum are included
< walkerma> (Even though the collection is only 5500 articles)
< wizzy> what time must this be done ? hours from now ?
< Amgine> 28 minutes from now another event is taking place here.
< wizzy> I was figuring we would do it on the wiki ?
< walkerma> wizzy: I'm assuming midnight PST, which is in about 12 hours, but I'd like to get things pretty much resolved now - we don't want 
            someone making a controversial edit when half of us are asleep!
< wizzy> I will take a shot when this IRC is done, unless someone else wants to
< walkerma> And if I'm wrong and it's UTC, we really need to get working!
< walkerma> wizzy, sounds good
< walkerma> OK, should we look at no 4?
< wizzy> its 9:30PM here - so you can fix later
-!- pm27 [n=chatzill@80.125.173.60] has quit [Read error: 60 (Operation timed out)]
< hejko> i won't have any time to fix stuff on the wiki as it is late here. but i think i left my comments already here.
< wizzy> for #4 - ditto strategy hiding in assertions
< wizzy> also I think there is a lot of overlap between rec. #1 and rec. #4
< wizzy> or shall we move my parser stuff in #1 to here ?
< Amgine> <nods>
-!- Pharos [i=47f952b7@gateway/web/freenode/x-cyakcvchiaztvoer] has joined #wikimedia-strategy
< hejko> #4 is  a selection tool that emits custom dumps of data. hopefully easily parsable.
< walkerma> I agree that there is overlap; however, I think the central strategy point is different
< wizzy> hejko: that was the thrust of my strategy #3 in rec. #1
< walkerma> hejko: It also requires a community contribution, that's critical.  Articles don't assess themselves.
< hejko> wizzy: But it is not a parsers job to support article selection.
< hejko> walkerma: did you read my thoughts :)
< hejko> I'd propose a tool to select content w/o community input.
< wizzy> I put "# The parser must be able to accept a narrow selection, a list of articles to be output" - figuring that you give the tool a 
         big-fat-list
< hejko> E.g. a tool to get custom dumps based on certain subcategories, assessment scores/popularity (if available)
< hejko> but it should be really a self service online tool.
< Amgine> I just wrote something like that recently, called GNSM.
< Amgine> It is a Special: page.
< hejko> no need to involve the community
< Amgine> So, a form of API.
< walkerma> hejko: That would be very hard to write, unless you want to ignore quality issues - and that would be politically unacceptable on en:WP
< wizzy> I would like your "self service online tool" to just give me the article names, which I then feed to the dump-maker
< Amgine> http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/GoogleNewsSitemap/
< hejko> why? such a tool could use the same strategies as the selection bot does
< wizzy> so I can ask your tool twice - once for africa, second for history, and get an interlinked dump
< hejko> wizzy: even for category (and sub) africa and history
< walkerma> hejko: The selection bot is riding on the shoulders of thousands of PEOPLE who assessed the articles
< hejko> or even the intersection
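The difference between asking the selection tool for Africa plus history and asking for "the intersection" can be sketched with plain sets. This is only an illustration under stated assumptions: the `CATEGORY_MEMBERS` table and the `select_articles` function are hypothetical, and a real tool would look category membership up on the wiki rather than in a hard-coded table.

```python
# Hypothetical category membership, hard-coded for the example only.
CATEGORY_MEMBERS = {
    "Africa": {"Kenya", "Nile", "History of Africa", "Great Zimbabwe"},
    "History": {"History of Africa", "Great Zimbabwe", "Roman Empire"},
}


def select_articles(categories, mode="union"):
    """Return a sorted list of article titles drawn from the given categories.

    mode="union" keeps articles found in any category (Africa OR History);
    mode="intersection" keeps only those found in all (Africa AND History).
    """
    sets = [CATEGORY_MEMBERS[name] for name in categories]
    if not sets:
        return []
    if mode == "union":
        chosen = set().union(*sets)
    elif mode == "intersection":
        chosen = set.intersection(*sets)
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    return sorted(chosen)


either = select_articles(["Africa", "History"], mode="union")
both = select_articles(["Africa", "History"], mode="intersection")
print(either)
print(both)
```

The resulting list of titles is what wizzy describes feeding to the dump-maker: the selection step produces names only, and the dump step turns those names into output.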
< Philippe|Away> 15 minutes warning... :-)
< hejko> walkerma: yes i know
< walkerma> wizzy: The new 1.0 bot will be launched in less than two weeks, and it will allow Africa AND history, at least in WikiProjects
< hejko> but from the perspective of an end user I simply want to use the data and the human scores
< walkerma> hejko: On en:WP you can do that; on es you can't
< wizzy> will it give me Africa OR history ?
< walkerma> There is no 1.0 scheme on es
< walkerma> wizzy: Not sure
< wizzy> I don't want african history articles, I want all of africa annotated with history
< hejko> walkerma: right.
< Amgine> wizzy: my tool can, and it could select from a list of articles added to (for example) Category:1.0
< hejko> will we at least get wiki trust on all wps?
< Amgine> hejko: FlaggedRevs?
< Amgine> That will depend how it survives en.WP
< walkerma> hejko: Hard to say at this point.  I suspect there is some jockeying between WikiTrust and Flagged Revs
< wizzy> so - rec. #4 is about leveraging the crowd to help us get the right articles, and mixing in stats ?
< walkerma> wizzy: Right
< hejko> Puh, we have flagged revs in the german WP and they are more about vandalism than about quality
< wizzy> Is the "self service online tool" something different ? rec #5 ?
< walkerma> hejko: I think WikiTrust is mainly about that too
< hejko> walkerma: e.g. implementing  your assessment project at all WPs?
< Amgine> hejko: there is a section which used to be part of FlaggedRevs, called QualityPages.
< walkerma> wizzy - we're only allowed 4!  But really I saw that as 4(c) or something similar!
< wizzy> or is it #1(c) ?
-!- AlexandrDmitri [n=Alexandr@41.249.64.224] has quit ["Booted by Maroc Télécom again"]
< walkerma> hejko: I wouldn't put it that way - because I don't want to foist my approach on say de:WP if they don't like it - but SOMETHING is needed
< walkerma> Please rewrite my wording if that isn't clear
< wizzy> walkerma: the work on Importance can be used inter-wiki - trust/flagged cannot
< hejko> walkerma: I'd love to see your approach used in more wikis. but this is certainly a luxury for most projects.
< walkerma> wizzy - true, though any language can implement the WikiTrust stuff, or Flagged Revs, or both.  Personally I'd like to see both - with 
            the checkers on Flagged Revs using WikiTrust
< hejko> anyway, if the assessment data is available it should be usable when selection custom dumps.
< hejko> selecting
< walkerma> hejko: What we found on en is that once the 1.0 bot became available, everyone WANTED to use it - we didn't need to do any persuading!
< walkerma> hejko: Good point, I'd missed that
< hejko> walkerma: I'd add this to the facts of the proposal.
< walkerma> I'd personally like to see WikiTrust or FR select vandalism free versions for dumps
< wizzy> who is re-wring rec. #4 strategies ?
< wizzy> re-writing
< walkerma> Shall we relocate to irc://irc.freenode.net/#wikipedia-1.0 ?