Distributed Data Mining Simulator

screenshot5

Many current data mining tasks can be accomplished successfully only in a distributed setting, so the field of distributed data mining has gained importance over the last decade. Many open questions and challenges remain, however. The Distributed Data Mining Simulator lets you run distributed data mining experiments in a simple and flexible way. The experiments are not actually executed on distributed network nodes; the tool only simulates them. Simulation makes it easy to experiment with diverse network structures and communication patterns, so the best methods and parameters can be identified efficiently before the system is put into use. The network structure can, for instance, be optimized as part of the general parameter optimization. While simulation cannot replace testing the system in an actual network, it makes the development stage much more efficient.
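To illustrate what simulating a network means here, the following minimal Java sketch runs a whole "network" inside one JVM. Each node is just an array slot, and sending a message means appending a value to the receiver's inbox for the next synchronous round. The class `GossipSimulation`, its method `run` and the neighbour-averaging protocol are hypothetical illustrations chosen for brevity. They are not the simulator's API or one of its built-in algorithms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: simulates synchronous message passing between
 * network nodes inside a single JVM. Nothing is sent over a real network;
 * "sending" just appends a value to the receiver's inbox for this round.
 */
class GossipSimulation {

    /** Runs the given number of neighbour-averaging rounds and returns each node's final estimate. */
    static double[] run(double[] localValues, Map<Integer, List<Integer>> neighbours, int rounds) {
        int n = localValues.length;
        double[] estimate = localValues.clone();
        for (int r = 0; r < rounds; r++) {
            // One inbox per node; all messages of a round are delivered together.
            List<List<Double>> inbox = new ArrayList<>();
            for (int i = 0; i < n; i++) inbox.add(new ArrayList<>());
            for (int i = 0; i < n; i++) {
                for (int j : neighbours.getOrDefault(i, List.of())) {
                    inbox.get(j).add(estimate[i]); // node i "sends" its estimate to node j
                }
            }
            // Each node averages its own estimate with everything it received.
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double sum = estimate[i];
                for (double v : inbox.get(i)) sum += v;
                next[i] = sum / (1 + inbox.get(i).size());
            }
            estimate = next;
        }
        return estimate;
    }
}
```

On a regular topology such as a ring, every node's estimate converges to the global mean of the local values. On a connected but irregular topology, plain neighbour averaging converges instead to a mean weighted by each node's degree plus one. That kind of effect is cheap to explore in simulation.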

The Distributed Data Mining Simulator is provided as a plugin for the open source data mining suite RapidMiner.

The key features are:

  • 100% Java implementation
  • Simple API
  • Very easy to configure and extend
  • Combinable with all RapidMiner operators
    (e.g. clustering or ensemble learning)
  • Supports arbitrary network structures
  • Interactive communication inspection and debugging

New in Version 1.1:

  • Automatic generation of diverse network structures (e.g. scale-free networks, client/server, ...)
  • Distributed rule learning
  • Distributed feature extraction and sharing
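Version 1.1 can generate scale-free networks automatically. How the simulator does this internally is not documented here. As an illustration, the sketch below uses the standard Barabási–Albert preferential-attachment model, a common way to produce scale-free graphs. The names `ScaleFreeGenerator` and `generate(n, m, seed)` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of a Barabási–Albert (preferential attachment) network generator. */
class ScaleFreeGenerator {

    /**
     * Builds an undirected graph with n nodes. It starts from a clique of m + 1 nodes;
     * each later node links to m distinct existing nodes, chosen with probability
     * proportional to their current degree.
     */
    static List<Set<Integer>> generate(int n, int m, long seed) {
        if (m < 1 || n <= m) throw new IllegalArgumentException("need 1 <= m < n");
        Random rnd = new Random(seed);
        List<Set<Integer>> adj = new ArrayList<>();
        // Both endpoints of every edge are recorded, so a uniform draw from this
        // list picks a node with probability proportional to its degree.
        List<Integer> endpoints = new ArrayList<>();
        for (int i = 0; i <= m; i++) adj.add(new LinkedHashSet<>());
        for (int i = 0; i <= m; i++) {
            for (int j = i + 1; j <= m; j++) addEdge(adj, endpoints, i, j);
        }
        for (int v = m + 1; v < n; v++) {
            adj.add(new LinkedHashSet<>());
            Set<Integer> targets = new LinkedHashSet<>();
            while (targets.size() < m) {
                targets.add(endpoints.get(rnd.nextInt(endpoints.size())));
            }
            for (int t : targets) addEdge(adj, endpoints, v, t);
        }
        return adj;
    }

    private static void addEdge(List<Set<Integer>> adj, List<Integer> endpoints, int a, int b) {
        adj.get(a).add(b);
        adj.get(b).add(a);
        endpoints.add(a);
        endpoints.add(b);
    }
}
```

A fixed seed makes the generated topology reproducible, which helps when comparing methods on the same network across experiments.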

Download

The Distributed Data Mining Simulator is currently discontinued. An older version is still available from the RapidMiner CVS.


Documentation

The following documentation is available:

  • A PDF manual covers the use of the Distributed Data Mining Simulator.
  • The examples provided with the distribution are a good starting point for using the Distributed Data Mining Simulator.

Source Code and License

The Distributed Data Mining Simulator is provided under the GNU General Public License (GPL). If you need a different licensing scheme, please contact me. The source code of the Distributed Data Mining Simulator can be obtained from the RapidMiner CVS.


Screenshots

screenshot1

The Distributed Data Mining Simulator can be configured interactively through the RapidMiner GUI and can be combined with all RapidMiner operators. It is possible, for instance, to cluster an example set and then distribute the examples to nodes according to this clustering.
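Distributing examples according to a clustering amounts to a partitioning step. Examples with the same cluster label go to the same node, so each node sees a skewed slice of the data rather than a random sample. The sketch assumes the cluster labels are already available as integers, for example from a clustering operator, and maps cluster c to node c mod numNodes. The class `ClusterPartitioner` and this mapping rule are hypothetical, not the plugin's actual operator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: assign example indices to nodes by cluster label. */
class ClusterPartitioner {

    /**
     * Returns one list of example indices per node. All examples of cluster c
     * go to node floorMod(c, numNodes), so a cluster is never split across nodes.
     */
    static List<List<Integer>> partition(int[] clusterLabels, int numNodes) {
        if (numNodes < 1) throw new IllegalArgumentException("numNodes must be >= 1");
        List<List<Integer>> nodes = new ArrayList<>();
        for (int k = 0; k < numNodes; k++) nodes.add(new ArrayList<>());
        for (int i = 0; i < clusterLabels.length; i++) {
            nodes.get(Math.floorMod(clusterLabels[i], numNodes)).add(i);
        }
        return nodes;
    }
}
```

Replacing the cluster label with a random node index would give the usual "randomly distributed data" baseline for comparison.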

screenshot2

For debugging purposes, the communication among nodes can be inspected at any point in simulation time.
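Inspecting communication at any simulation time only requires that every message be logged together with the step at which it was sent. The sketch below shows such a log with two queries. The class `MessageLog`, its methods and the use of integer time steps are assumptions for illustration, not the simulator's debugging interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a time-stamped log of all messages exchanged in a simulation run. */
class MessageLog {

    /** One simulated message; time is the simulation step at which it was sent. */
    static final class Message {
        final int time, from, to;
        final String payload;

        Message(int time, int from, int to, String payload) {
            this.time = time;
            this.from = from;
            this.to = to;
            this.payload = payload;
        }

        @Override
        public String toString() {
            return "t=" + time + " " + from + "->" + to + ": " + payload;
        }
    }

    private final List<Message> messages = new ArrayList<>();

    void add(int time, int from, int to, String payload) {
        messages.add(new Message(time, from, to, payload));
    }

    /** Messages sent at or before the given simulation time, in the order they were logged. */
    List<Message> upTo(int time) {
        List<Message> result = new ArrayList<>();
        for (Message m : messages) if (m.time <= time) result.add(m);
        return result;
    }

    /** Messages addressed to one node at exactly the given simulation time. */
    List<Message> receivedBy(int node, int time) {
        List<Message> result = new ArrayList<>();
        for (Message m : messages) if (m.to == node && m.time == time) result.add(m);
        return result;
    }
}
```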

screenshot3

The Distributed Data Mining Simulator allows you to create arbitrary network structures and to visualize them interactively. 

screenshot4

The Distributed Data Mining Simulator allows you to generate different network structures and to visualize their properties, such as the distribution of node degrees.
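The node degree distribution is a histogram that maps each degree to the number of nodes that have it. This is a minimal sketch. It assumes the network is stored as an adjacency list of neighbour sets, which is a hypothetical representation and not necessarily the simulator's. The class name `DegreeDistribution` is likewise hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: degree distribution (degree -> number of nodes) of an undirected graph. */
class DegreeDistribution {

    static Map<Integer, Integer> of(List<Set<Integer>> adjacency) {
        // TreeMap keeps degrees sorted, which is convenient for plotting.
        Map<Integer, Integer> histogram = new TreeMap<>();
        for (Set<Integer> neighbours : adjacency) {
            histogram.merge(neighbours.size(), 1, Integer::sum);
        }
        return histogram;
    }
}
```

In a scale-free network, most nodes have small degrees while a few hubs have very large ones, so this histogram is heavy-tailed. A star with three leaves is the simplest contrast: one node of degree 3 and three nodes of degree 1.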


Contact

The Distributed Data Mining Simulator was created by Michael Wurst.


Related Links

  • For more resources on distributed data mining, refer to the Distributed Data Mining Wiki maintained by Hillol Kargupta.
  • The Distributed Data Mining Simulator is based on the RapidMiner Data Mining framework.
 