Talk:Task force/Recommendations/Offline 1

Potential survey

Survey goals: Two-fold:

  1. Continue to define areas of interest and themes among WMF content re-users, focusing particularly on hurdles and challenges facing implementations of database dump use.
  2. Begin developing a quantitative instrument to establish an objective baseline measure.

Survey recruitment strategy:

  1. Token distribution via e-mail invitation
  2. Invitation list developed by Strategy Offline Task Force members, targeting developers and project managers who have experience with, or are planning, projects involving MediaWiki database dumps. (Invitees are presumed to be drawn from TF interviews and related TF research.)
  3. Predicted n ~ 20, convenience sample.


Using dumps?

  • Using Wikimedia Foundation database dumps?
    • No, not planning to either.
    • Not yet, but planning to do so within 1-2 years.
    • Currently developing to do so.
    • Using Wikimedia Foundation generated content
  • Can you give us the name and a brief description of your project?
    • [fill in the blank]

If not planning to

  • Which, if any, of the following influenced your decision not to use Wikimedia Foundation database dumps?
    • Content is unreliable/inaccurate/contains vandalism.
    • Content is not relevant to my project.
    • Content is unsuitable (contains objectionable material).
    • Live mirroring of content is not allowed.
    • Content does not include Semantic tags
    • Content does not include Geotags
    • Content does not include structural markup
    • Content does not include images/media
    • Content is not provided as xhtml.
    • Content is not provided as xml.
    • Content is not provided as JSON.
    • Content is not provided as Yaml.
    • Content is not provided as sql.
    • Content is not provided as other format [fill in the blank]
    • Database dumps require parsing/post-parsing processing.
    • Database dumps are significantly lagged.
    • Licensing issues with GFDL/CC-by-sa/CC-by.
    • Philosophic differences with the Wikimedia Foundation/Wikis in general.
    • Other reason [fill in the blank]

[thank you for filling out this survey][end]

All others

  • What are some of your concerns with using WMF database dumps?
    • Content is unreliable/inaccurate/contains vandalism.
    • Content is not relevant to my project.
    • Content is unsuitable (contains objectionable material).
    • Live mirroring of content is not allowed.
    • Content does not include Semantic tags
    • Content does not include Geotags
    • Content does not include structural markup
    • Content does not include images/media
    • Content does not include other [fill in the blank]
    • Content is not provided as xhtml.
    • Content is not provided as xml.
    • Content is not provided as JSON.
    • Content is not provided as Yaml.
    • Content is not provided as sql.
    • Content is not provided as other format [fill in the blank]
    • Database dumps require parsing/post-parsing processing.
    • Database dumps are significantly lagged.
    • Licensing issues with GFDL/CC-by-sa/CC-by.
    • Philosophic differences with the Wikimedia Foundation/Wikis in general.
    • Other reason [fill in the blank]

If any 'does not include' are set

  • If the database dumps were to include more information or tags, what would you want in the schema/output?
    • [fill in the blank]

If any 'content is not provided as' are set

  • If the database dumps were provided in the format you prefer, what would you want in the schema/output? (If you've already provided this information, remind us that you did.)
    • [fill in the blank]

Planning to w/in 1-2 years

  • Can you provide us with a link to your project's development website?
    • [fill in blank]

If planning to && parsing/post-parsing

  • What are you planning to use to parse Mediawiki syntax?
    • A custom Mediawiki parser.
    • A third-party Mediawiki parser.
    • Regular expressions to extract data.
    • Mediawiki.
    • Other [fill in the blank]
  • If your project requires parsing/post-parse processing of Wikimedia Foundation database dumps, what could the Foundation do to make your tasks easier or less complex, or to reduce the development or computer time they require?
    • [fill in the blank]

Developing to use WMF data dumps

  • Can you give us a link to your project using WMF generated data?
    • [fill in blank]

If currently && parsing/post-parsing

  • What are you using to parse Mediawiki syntax?
    • A custom Mediawiki parser.
    • A third-party Mediawiki parser.
    • Regular expressions to extract data.
    • Mediawiki.
    • Other [fill in the blank]
  • If your project is parsing/post-parse processing Wikimedia Foundation database dumps, what could the Foundation do to make your tasks easier or less complex, or to reduce the development or computer time they require?
    • [fill in the blank]

Currently using WMF data dumps

  • Can you give us a link to your project using Wikimedia Foundation generated data?
    • [fill in blank]

If currently && parsing/post-parsing

  • What are you using to parse Mediawiki syntax?
    • A custom Mediawiki parser.
    • A third-party Mediawiki parser.
    • Regular expressions to extract data.
    • Mediawiki.
    • Other [fill in the blank]
  • If your project is parsing/post-parse processing Wikimedia Foundation database dumps, what could the Foundation do to make your tasks easier or less complex, or to reduce the development or computer time they require?
    • [fill in the blank]
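
The branching in the draft above (the "If ..." headings) could be sketched roughly as follows. This is only an illustration of the skip logic, not a survey-tool configuration: the function name `sections_for`, the answer keys ("no", "planning", "developing", "using"), the "missing:" and "format:" prefixes, and the "Parser questions" label are all made up for this sketch.

```python
# Hypothetical answer keys for the "Using dumps?" question, mapped to the
# section headings used in the draft.
USAGE_SECTIONS = {
    "planning": "Planning to w/in 1-2 years",
    "developing": "Developing to use WMF data dumps",
    "using": "Currently using WMF data dumps",
}


def sections_for(usage, concerns=()):
    """Return the survey sections a respondent would see, in order.

    usage    -- "no", "planning", "developing" or "using" (assumed keys)
    concerns -- ticked concern boxes, e.g. "missing:geotags",
                "format:json", "parsing" (assumed keys)
    """
    if usage == "no":
        # Respondents not planning to use dumps get one page, then the end.
        return ["If not planning to"]

    sections = ["All others"]
    if any(c.startswith("missing:") for c in concerns):
        sections.append("If any 'does not include' are set")
    if any(c.startswith("format:") for c in concerns):
        sections.append("If any 'content is not provided as' are set")

    sections.append(USAGE_SECTIONS[usage])
    if "parsing" in concerns:
        # The parser questions are only shown when the parsing box is ticked.
        sections.append("Parser questions")
    return sections


example = sections_for("planning", {"format:json", "parsing"})
```

Writing it out this way makes it easy to check that every answer combination leads to a defined path before the survey is built in an actual tool.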

Issues with strategy section

I have some issues with the strategy section in its current form.

  • Current xml dumps are real xml; there's no need to imply they aren't.
    Suggested alternate wording: Provide parsed semantically annotated XML of article text.
  • Current xml dumps are standardised and documented. However, they do not include xml parsing of article content, which is a previous strategy statement.
    Suggested alternate wording: Create and maintain article xml DTD standard and documentation, plus stylesheet, per-project.
  • The only extant parser for Mediawiki is Mediawiki. This is an immovable object unless a prior step is taken: Write a Mediawiki Parser Specification. If this were the sole recommendation of this task force it would justify the entire strategy process, imo.
    Suggested alternate wording: Write/publish a Mediawiki parser specification.

- Amgine 00:12, 12 January 2010 (UTC)
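
To illustrate the first two points in the comment above: a minimal sketch, using Python's standard `xml.etree.ElementTree`, of reading pages from an export-style XML document. The `SAMPLE` string is a hand-written, heavily simplified stand-in for a real dump (real dumps carry many more elements), and the helper names `local` and `iter_pages` are invented for this example. It shows that the container is ordinary XML, while the article body inside `<text>` arrives as unparsed wikitext.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for a dump; not copied from a real file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <revision>
      <text>'''Example''' is a [[sample]] page.</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs while streaming through the XML."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's subtree


pages = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
```

The XML layer is handled by a stock parser; everything in the second element of each pair still needs a wikitext parser, which is where the parser specification point comes in.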
