Python for big data
1 Basic stack
1.1 numpy
1.2 scipy
1.3 pandas
1.3.1 "Python for Data Analysis" by Wes McKinney
1.4 scikit-image
1.5 scikit-learn
1.6 statsmodels
1.7 nltk
1.8 matplotlib
2 Newer packages
2.1 Numba
2.2 wiseRF
2.3 Blaze
3 Integrated platforms
3.1 Continuum.io
3.1.1 Anaconda
3.1.2 Wakari
3.2 PiCloud
3.2.1 Python + AWS
3.3 wise.io
3.3.1 MLaaS
3.3.1.1 RandomForest
3.4 ipython
3.4.1 Notebook
3.5 Orange
4 Visualization
4.1 matplotlib
4.2 Bokeh
4.2.1 ggplot for Python
4.3 Mayavi
4.4 Nodebox
4.5 igraph
4.6 pandas
4.6.1 pandas.tools.rplot
4.7 Google APIs
4.7.1 googleVis
5 Data formats
5.1 Flat text
5.1.1 xreadlines (Python 2 only; in Python 3, iterate over the file object)
5.1.2 readlines
5.1.3 pandas
5.1.3.1 read_csv
5.1.3.2 read_fwf
5.1.4 xlrd/xlwt/xlutils
5.2 HDF5
5.2.1 PyTables
5.2.2 h5py
5.3 SQL
5.3.1 SQLAlchemy
5.3.2 pysqlite3
5.3.3 pyodbc
5.3.3.1 Vertica
5.3.3.2 Netezza
5.3.3.3 Teradata
5.4 NoSQL
5.4.1 MongoDB
5.4.1.1 PyMongo
5.4.2 CouchDB
5.4.2.1 couchdb-python
5.4.2.2 couchdbkit
5.5 JSON
5.5.1 Standard library
5.5.1.1 json
5.5.2 simplejson
5.6 XML
5.6.1 Standard library
5.6.1.1 xml
5.7 HBase
5.7.1 HappyBase
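To show how several of these formats connect, here is a minimal sketch that uses only the standard library (`csv`, `sqlite3` and `json` stand in for pandas `read_csv` and pysqlite3). It streams a flat text file one row at a time instead of loading it all with `readlines()`, loads the rows into an in-memory SQLite table, and writes an aggregate out as JSON. In Python 3, `xreadlines()` no longer exists, and iterating over the file object directly gives the same lazy, line-at-a-time behaviour. The `sales` table, the two helper functions and the sample data are hypothetical.

```python
import csv
import io
import json
import sqlite3


def load_csv_to_sqlite(lines, conn):
    """Stream CSV rows into a hypothetical `sales` table without reading the whole file at once."""
    reader = csv.DictReader(lines)  # lazily iterates over any iterable of text lines
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        ((row["region"], float(row["amount"])) for row in reader),
    )
    conn.commit()


def totals_as_json(conn):
    """Aggregate per region in SQL, then serialize the result with the json module."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return json.dumps({region: total for region, total in rows}, sort_keys=True)


# Hypothetical sample data; for a real file, pass open("data.csv", newline="") instead.
sample = io.StringIO("region,amount\neast,10.5\nwest,4\neast,2.5\n")
conn = sqlite3.connect(":memory:")
load_csv_to_sqlite(sample, conn)
result = totals_as_json(conn)
print(result)  # {"east": 13.0, "west": 4.0}
```

Pushing the aggregation into SQL keeps memory use bounded by the number of groups rather than the number of rows, which is the main reason to route large flat files through a database.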
6 MapReduce
6.1 Hadoop interface
6.1.1 Hadoop Streaming
6.1.1.1 Hadoopy
6.1.1.2 example
6.1.1.3 dumbo
6.1.1.4 mrjob
Used and developed by Yelp
6.1.2 Pydoop
6.1.2.1 uses Hadoop Pipes
6.2 disco
Used and developed by Nokia
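Hadoop Streaming can run any executable as the mapper and reducer. Each one reads lines on stdin and writes tab-separated key/value lines on stdout, and Hadoop sorts the mapper output by key before the reduce phase. The wrappers listed above under Hadoop Streaming build on this protocol. Below is a minimal word-count sketch of the protocol written by hand. The `wc.py` script name, its `map`/`reduce` arguments and the `simulate` helper are hypothetical illustrations, not part of any of those libraries. Such a script would be handed to the streaming jar through its `-mapper` and `-reducer` options.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper writes to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum counts per word; assumes input is sorted by key, as Hadoop guarantees between phases."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def simulate(text_lines):
    """Hypothetical local stand-in for `map | sort | reduce`, handy for testing before a real job."""
    return list(reducer(sorted(mapper(text_lines))))


if __name__ == "__main__":
    # Hypothetical CLI: `python wc.py map` or `python wc.py reduce`, reading stdin as Streaming does.
    mode = sys.argv[1] if len(sys.argv) > 1 else None
    if mode == "map":
        for record in mapper(sys.stdin):
            print(record)
    elif mode == "reduce":
        for record in reducer(sys.stdin):
            print(record)
```

For example, `simulate(["the cat", "The dog"])` returns `["cat\t1", "dog\t1", "the\t2"]`.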
7 Glue
7.1 rpy2
7.1.1 R
7.2 PySpark
7.2.1 Spark
7.3 ipython
7.3.1 magic
7.3.1.1 R
7.3.1.2 SQL
7.3.1.3 matlab/octave
7.3.1.4 IDL
7.4 Jython
7.4.1 Java
7.5 boto
7.5.1 Amazon Web Services
8 GPU
8.1 NumbaPro
8.2 PyCUDA
9 Parallel
9.1 ipython
9.1.1 ipcluster
9.2 pp
9.3 dispy
10 Efficiency
10.1 Cython
11 Packages
11.1 PyPI
11.1.1 30686 packages