In modern machine learning, raw data is the pre-ferred input for our models. Where a decade ago data scien-tists were still engineering features, manually picking out the details they thought salient, they now prefer the data in their raw form. As long as we can assume that all relevant and ir-relevant information is present in the input data, we can de-sign deep models that build up intermediate representations to sift out relevant features. However, these models are often domain specific and tailored to the task at hand, and therefore unsuited for learning on heterogeneous knowledge: informa-tion of different types and from different domains. If we can develop methods that operate on this form of knowledge, we can dispense with a great de...