Last modified on 16 March 2011, at 05:27

Task force/Analytics/Feature prioritization

Short term priorities

Note: Domains are ordered alphabetically; the order does not imply ranking

Community Initiatives

Process owner: Rob Lanphier / Facilitator: Erik Zachte

Engineering

Process owner: Danese Cooper / Facilitator: Rob Lanphier

External Reporting

Process owner: Erik Moeller / Facilitator: Erik Zachte

Global Development

Process owner: Barry Newstead / Primary contact: Mani Pande

Infrastructure

Process owner: Danese Cooper / Facilitator: Tomasz Finc

Product Strategy

Process owner: Erik Moeller / Primary contact: Howie Fung / Facilitator: Nimish Gautam?

  • A/B testing
  • Participation analytics (A/B testing --> effects on participation)
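A participation A/B test like the one mentioned above ultimately comes down to comparing conversion rates (e.g. share of visitors who go on to edit) between two buckets. As a minimal sketch of that comparison — the actual test tooling and metrics are not specified here, and the numbers below are illustrative only:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Z statistic for comparing the conversion rate of bucket B
    against bucket A, using the pooled standard error."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical buckets: 100/1000 participants converted under the
# control, 150/1000 under the variant.
z = two_proportion_z(100, 1000, 150, 1000)
```

A |z| above roughly 1.96 corresponds to significance at the 5% level for a two-sided test.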

User Experience

Process owner: Danese Cooper / Primary contact: Parul Vora

Article Feedback

Process owner: Alolita Sharma / Primary contact: /

  • Basic analysis

Long term priority organization

Dashboard

  • Develop our own way of tracking how our projects are doing
  • Get more fidelity on how different segments of our users are doing

Analytics for Specific Projects

Ensuring our infrastructure can handle increased analytics demands

  • Formulate the role of WMF vs. the Toolserver cluster in data capture/storage/aggregation/delivery
  • Fix fragile/broken parts of the system (e.g. udp2log)
  • Deploy new tools that give us new views (e.g. OWA)
  • Increase our development speed and pay down technical debt

Process/Priorities Thoughts

How we're going to organize features, and which attributes to assign to each:

  • By data source (e.g. squid logs vs OWA data capture)
  • By tool used to implement
  • By person/team responsible for implementing
  • By priority

How to set priority:

  • What feature development priority does it inform?
  • What program priority does it inform?
  • How essential is the data to executing on feature or program priority?
  • What deadlines loom?

Features to consider (priority in parentheses)

From the requirements doc:

  • Overall dashboard of site health; dashboard per segment
    • Uniques (medium)
    • Page views (done)
    • Visits (medium)
    • PV/visit (medium)
    • Bounce rate (medium)
    • Minutes/Visit (medium)
    • Entry pages (medium)
    • Exit Pages and destinations (medium)
    • Traffic sources, referrer breakdown (high)
    • % new vs. repeat (high)
    • Geographic breakdown (high)
  • Segmentation
    • By project -- highest level; everything below applies to a particular project (high) --> top priority: project will be the primary filter, with the segments below as secondary filters
    • Reader vs. editor (and now rater) (high)
    • By geography - country (high)
    • By geography - city level (medium)
    • By referrer (high)
    • By device type (?)
  • User pathing: ability to instrument specific paths and produce fallout reports, e.g. account creation, editing flow, ratings flow (high)
    • Segmentation (per above)
  • Split A/B testing (elements of pages and pages themselves) (depends)
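To make the dashboard metrics in the list above concrete, here is a sketch of how a few of them (uniques, page views, PV/visit, bounce rate, minutes/visit) might be derived from aggregated visit records. The record fields are hypothetical; real input would come from sampled squid logs or whatever store the infrastructure work settles on, not this structure:

```python
# Hypothetical aggregated visit records (one dict per visit).
visits = [
    {"visitor": "a", "pages": 5, "minutes": 7},
    {"visitor": "a", "pages": 1, "minutes": 1},
    {"visitor": "b", "pages": 3, "minutes": 4},
]

def dashboard_metrics(visits):
    """Compute the per-segment dashboard metrics from visit records."""
    n_visits = len(visits)
    page_views = sum(v["pages"] for v in visits)
    bounces = sum(1 for v in visits if v["pages"] == 1)  # single-page visits
    return {
        "uniques": len({v["visitor"] for v in visits}),
        "page_views": page_views,
        "visits": n_visits,
        "pv_per_visit": page_views / n_visits,
        "bounce_rate": bounces / n_visits,
        "minutes_per_visit": sum(v["minutes"] for v in visits) / n_visits,
    }

metrics = dashboard_metrics(visits)
```

Segmentation (by project, geography, referrer, etc.) would then just mean filtering the visit records before calling the same function.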

Open question: requirements around the behavior x web-analytics intersection; Frank's account creation project and its impact on subsequent editing.

  • Followup work: editor account age cohorts - Diederik (in progress)
  • Analysis of effect of reverts on editor retention - Diederik (high)
  • Editing history clusters: set first edit at t=0 and follow users over time to see what patterns emerge. (Cal-IT?) (medium)
  • Tracking where readers are from, regardless of language (segmentation of web analytics data) (high)
    • Activity on one Wikipedia from users in another geography, e.g.
      • PV of English Wikipedia from India (both mobile and desktop)
      • New articles created on English Wikipedia from India, etc.
      • Editor activity on English Wikipedia from India
  • Tracking where editors are from, regardless of language (segmentation of Zachte's active editor data) (high)
  • Tracking user environment (medium)
    • Screen resolution
    • Computer horsepower
    • Browser capabilities
  • Mobile : TBD
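The "editing history clusters" item above (set first edit at t=0 and follow users over time) can be sketched as a simple cohort alignment: find each editor's first edit, rebase every edit to an offset from that date, and count how many editors remain active at each offset. The input pairs here are illustrative; real input would be parsed from the edit-history dumps:

```python
from collections import defaultdict
from datetime import date

# Hypothetical (editor, edit_date) pairs.
edits = [
    ("alice", date(2011, 1, 3)), ("alice", date(2011, 2, 10)),
    ("bob", date(2011, 1, 20)), ("bob", date(2011, 1, 25)),
    ("carol", date(2011, 2, 1)), ("carol", date(2011, 4, 5)),
]

def months_since(start, d):
    return (d.year - start.year) * 12 + (d.month - start.month)

def activity_by_cohort_month(edits):
    """Set each editor's first edit at t=0 and count, per month offset,
    how many distinct editors were active then."""
    first = {}
    for user, d in sorted(edits, key=lambda e: e[1]):
        first.setdefault(user, d)  # earliest edit per user
    active = defaultdict(set)
    for user, d in edits:
        active[months_since(first[user], d)].add(user)
    return {t: len(users) for t, users in sorted(active.items())}

cohort_activity = activity_by_cohort_month(edits)
```

Patterns (retention curves, clusters of editing trajectories) would emerge from comparing these aligned series across cohorts.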

Engineering driven work:

  • OWA evaluation (high)
  • OWA integration/testing/improvement (TBD)
  • udp2log improvements (high)
  • rsyslog deployment (possible udp2log replacement) (medium)
  • Data Warehouse
  • Hadoop/Hive/Hbase
  • Analysis of API queries
  • XML dumps/snapshots -- stable, consistent, valid, etc.