Data Tools

We will reveal the mystery data set to you 10 days before the competition on April 20, 2016. To prepare, we recommend that each member of your team select two or three tools from the lists below and learn how to use them before April 20, 2016. Please also refer to the Workshop Page for additional opportunities to learn.

Data Sources

Data Analysis  

  • Classification & Clustering
    A chapter discussing different statistical methods for analyzing data written by Leo Pekelis
  • ggplot2
    A data visualization package for the statistical programming language R
  • GGobi
    A visualization program for exploring high-dimensional data
  • Improvise
    A tool to build and browse highly-coordinated visualizations interactively
    A data analytics platform
  • Metamarkets
    A data analytics and visualization platform
  • Mondrian
    A general-purpose statistical data-visualization system
  • Orange
    A component-based data mining and machine learning software suite
  • ParVis
    A program to represent multidimensional data in the form of parallel coordinates
  • RapidMiner
    A system for data mining
  • RevoScaleR
    A package that overcomes R's memory limit; free for academic users
  • TimeSearcher
    A tool for visual exploration of time-series data
  • TreeMap
    Software for treemapping, a method of displaying hierarchical data by using nested rectangles
  • WEKA
    A popular suite of machine learning software
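
The TreeMap entry above describes treemapping as displaying hierarchical data with nested rectangles. As a rough illustration of that idea only (not how the TreeMap software itself works), here is a minimal Python sketch of the simple "slice-and-dice" layout. The tree shape, the names, and the function names are all hypothetical, and zero-sized groups are not handled.

```python
def total(node):
    """Sum of leaf sizes under a node; a node is (name, number) or (name, [children])."""
    _, payload = node
    if isinstance(payload, list):
        return sum(total(child) for child in payload)
    return payload


def slice_and_dice(node, x, y, w, h, depth=0):
    """Return (name, x, y, w, h) rectangles for every leaf.

    Children split the parent rectangle in proportion to their totals,
    alternating the cut direction at each depth of the hierarchy.
    """
    name, payload = node
    if not isinstance(payload, list):
        return [(name, x, y, w, h)]
    rects = []
    whole = total(node)
    offset = 0.0  # fraction of the parent already used by earlier siblings
    for child in payload:
        share = total(child) / whole
        if depth % 2 == 0:  # even depth: divide the width
            rects += slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1)
        else:  # odd depth: divide the height
            rects += slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1)
        offset += share
    return rects


# Hypothetical hierarchy: a root holding one leaf and one sub-group.
tree = ("root", [("a", 2), ("group", [("b", 1), ("c", 1)])])
layout = slice_and_dice(tree, 0.0, 0.0, 4.0, 2.0)
for rect in layout:
    print(rect)
```

Each leaf's area is proportional to its size, so large items stand out at a glance. Dedicated tools typically use more refined layouts that avoid long thin rectangles.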

Natural Language Processing   

Network Analysis  

  • NodeXL
    An Excel template for network graph exploration
  • Gephi
    An interactive visualization and exploration platform
    An exploratory data analysis and visualization tool for graphs and networks
  • Pajek
    A program for analysis and visualization of large networks
  • NetworkX
    A software package for the creation, manipulation, and study of complex networks
  • SNAP
    A general-purpose network analysis and graph mining library
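
To give a feel for the kind of questions the network tools above answer, here is a minimal sketch in plain Python that builds a small undirected graph, counts each node's connections, and finds a shortest path length by breadth-first search. The edge list and function names are made up for illustration; the listed libraries offer far richer and faster versions of these operations through their own APIs.

```python
from collections import deque

# Hypothetical undirected network, given as a list of edges.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("bob", "dan")]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbors."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Number of hops from source to target via breadth-first search, or None."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


graph = build_adjacency(edges)
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)
print(shortest_path_length(graph, "ann", "dan"))
```

Degree counts and path lengths are among the simplest network measures, and they are a reasonable first look at a new data set before moving on to a dedicated tool.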

Visualization

  • D3
    A JavaScript library for manipulating documents based on data
  • Flare
    A library for creating visualizations that run in the Adobe Flash Player
  • Google Fusion Tables
    A data visualization service offering pie charts, bar charts, line plots, scatterplots, timelines, and geographical maps
  • Google Spreadsheets
    Visualizations using a Google Spreadsheet as a data source
  • IBM Watson Analytics
    Cloud-based freemium data analysis service
  • Improvise
    Java system supporting coordinated views
  • InfoVis Toolkit
    A Java toolkit for the development of information visualization applications and components
  • Protovis
    A JavaScript graphical toolkit for visualization
  • Modest Maps
    A library for creating interactive maps
  • Piccolo
    A toolkit that supports the development of 2D structured graphics programs
  • Prefuse
    A set of software tools for creating rich interactive data visualizations
  • Polymaps
    A JavaScript library for image- and vector-tiled maps using SVG
  • Processing
    Programming language and environment for interactive graphics
  • Tableau Public
    A free version of a data visualization service
  • VTK
    A software system for 3D computer graphics, image processing and visualization