We propose a set of features to study the effects of data streams on complex systems. This feature set is called the signature representation of a stream. It has its origin in pure mathematics and relies on a relationship between non-commutative polynomials and paths. This representation has already had a significant impact on algebraic topology, control theory, numerics for PDEs, stochastic analysis and the theory of rough paths; more recently, first steps have been taken to apply such methods to the study of big data streams. We show that the signature representation can provide an efficient summary of a stream and its effects. We then show that it can be combined with standard tools from machine learning. After introducing the signature for ...