Proposal talk:Distributed backup of Wikimedia content
Creating dumps and making backups of Wikimedia content available -- whether distributed or not -- should be a very high priority. -- Phoebe 03:29, 2 August 2009 (UTC)
- Agreed. But that isn't a proposal. --Gmaxwell 22:34, 13 August 2009 (UTC)
- Because it's supposed to be a continuing function. It works well in all the projects but en.wikipedia, probably because of nothing more than its size. There are working en.wikipedia dumps without history, I'm told. 99.60.1.164 19:57, 22 August 2009 (UTC)
File use by non-WMF wikis
I have a question. Does this proposal include file sharing with non-WMF wikis? I think it could be a good step forward. Then the communities running these wikis could support us with other files and/or data.--Juan de Vojníkov 08:13, 14 August 2009 (UTC)
- It is already possible to use Wikimedia Commons via "external repositories" in the MediaWiki configuration (see the sketch below). --Millosh 13:49, 16 August 2009 (UTC)
- What are "external repositories"? Can you link to documentation and an example?--Juan de Vojníkov 22:57, 18 August 2009 (UTC)
- I think Millosh means Embedding Commons media in third party projects --Goldzahn 03:54, 19 August 2009 (UTC)
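For reference, a minimal sketch of such an "external repository" in LocalSettings.php, using MediaWiki's ForeignAPIRepo; the cache-expiry value is an illustrative assumption:

 # LocalSettings.php: use Wikimedia Commons as a foreign file repository,
 # so its files can be embedded on this wiki as if they were local.
 $wgForeignFileRepos[] = array(
     'class'               => 'ForeignAPIRepo',
     'name'                => 'commonswiki',
     'apibase'             => 'http://commons.wikimedia.org/w/api.php',
     'fetchDescription'    => true,  // also fetch file description pages
     'apiThumbCacheExpiry' => 86400, // cache thumbnails for a day (assumption)
 );

Newer MediaWiki versions also offer $wgUseInstantCommons = true; as a one-line shortcut for essentially this configuration.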
Impact?
Some proposals will have massive impact on end-users, including non-editors. Some will have minimal impact. What will be the impact of this proposal on our end-users? -- Philippe 00:08, 3 September 2009 (UTC)
- This is an infrastructure project, which means it won't have a direct (or nearly direct) massive impact on end-users. (Of course, some of its consequences may have a massive impact, but we can only guess what those consequences would be.) --Millosh 13:32, 11 September 2009 (UTC)
WikiGrid idea
http://meta.wikimedia.org/wiki/User:Mdupont/WikiGrid Mdupont 19:25, 23 September 2009 (UTC)
Proposal to merge with Proposal:Distributed_Wikipedia
The creation of a peer-to-peer distributed infrastructure, where the current Wikipedia servers are just another node participating in the network, and the placement of GUI front-ends on top of the nodes, encourage Wikimedia pages and associated images to _implicitly_ be "automatically backed up", by virtue of being "near-permanently cached".
Importantly, the combination of a distributed "back-end" and nearby (local or loopback 127.0.0.1) front-ends *automatically* makes the choice of which images (and pages) to cache a very immediate, obvious and relevant one: absolute priority should be given to the pages being requested by the users of the front-ends.
A further optimisation, for the convenience of the users, could be an enhancement of the GUI whereby a query is first submitted ("Paris", for example) and the user then clicks a button saying "please slurp every page in the top 50 hits into the local cache". Over an exceedingly, tediously slow link, the back-end is then instructed to give some priority to obtaining those pages (and images). Over a *disconnected* link, the back-end creates a list of queries which goes onto a USB memory stick (or other media); at some point (once a week, or once a month), the stick is shipped abroad (or to a nearby university with internet connectivity) and inserted into a node, the queries are run, and the responses are placed BACK onto the stick and shipped back to the outlying remote area. (No, this is not a joke.) Lkcl 21:37, 30 September 2009 (UTC)
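A minimal sketch of the "slurp" step against the standard MediaWiki web API (action=query&list=search, then action=parse); the cache directory layout is an assumption, and error handling is omitted:

 <?php
 // Fetch the top 50 search hits for a query and store the rendered
 // pages in a local cache directory.
 $api   = 'http://en.wikipedia.org/w/api.php';
 $query = 'Paris';
 
 $search = json_decode(file_get_contents(
     $api . '?action=query&list=search&format=json' .
     '&srlimit=50&srsearch=' . urlencode($query)), true);
 
 foreach ($search['query']['search'] as $hit) {
     $title = $hit['title'];
     // action=parse returns the rendered HTML of a single page.
     $page = file_get_contents(
         $api . '?action=parse&format=json&page=' . urlencode($title));
     file_put_contents('cache/' . md5($title) . '.json', $page);
 }

Over a disconnected link, the same loop would simply write the list of titles to the USB stick instead of fetching them immediately.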
- Publications in a peer-to-peer network that exist without a legal entity as the publisher are a rather childish notion. Who is legally responsible? The Wikimedia Foundation may not be able to regulate the network you imagine, and if it tried, the network might simply fork, leaving an identical copy without any publisher. A peer-to-peer distributed infrastructure has no technical advantage over an architecture using a distributed filesystem. --Fasten 19:18, 3 November 2009 (UTC)
Distributed filesystems
A different approach would be to choose a distributed filesystem that made it convenient for third parties to attach to the filesystem and mirror its content.
- Distributed File System (Microsoft)
- IBM General Parallel File System
- Amazon S3 or other cloud storage
- Ceph (available with Linux 2.6.34)
- or a simpler backup solution like rsync
A configured MediaWiki system could be delivered as an RPM or a VMware/VirtualBox appliance (at least for the lazy audience), or the pages could be pre-formatted, ready for delivery through a web server. Appliances could demand to be addressed via a DNS name granted by the Wikimedia Foundation and report their load back to the Foundation, thus creating a world-wide scalable network with potential for load balancing (e.g. through DNS load balancing). One could offer different pools (like NTP pools) with UN location codes as DNS names (e.g. nyc.us.wikipedia.org). --Fasten 19:44, 3 November 2009 (UTC)
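As an illustration of the load-reporting idea, a minimal sketch; the reporting endpoint (pool.wikimedia.org/report) is a hypothetical name, not an existing service:

 <?php
 // Hypothetical: an appliance periodically reports its hostname and
 // 1-minute load average to a central endpoint, which could feed the
 // DNS-based load balancing described above.
 $load = sys_getloadavg(); // 1-, 5- and 15-minute load averages
 file_get_contents('http://pool.wikimedia.org/report' .
     '?host=' . urlencode(php_uname('n')) .
     '&load=' . urlencode($load[0]));

Run from cron every few minutes, this would give the Foundation enough data to weight DNS responses toward lightly loaded appliances.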