In architecture and computer-aided design, wireframes (i.e., line-based models) are widely used as basic 3D models for design evaluation and fast design iterations. However, unlike a full design file, a wireframe model lacks critical information, such as detailed shape, texture, and materials, needed by a conventional renderer to produce 2D renderings of the objects or scenes. In this paper, we bridge the information gap by generating photo-realistic rendering of indoor scenes from wireframe models in an image translation framework. While existing image synthesis methods can generate visually pleasing images for common objects such as faces and birds, these methods do not explicitly model and preserve essential structural constraints in a wireframe model, such as junctions, parallel lines, and planar surfaces. To this end, we propose a novel model based on a structure-appearance joint representation learned from both images and wireframes. In our model, structural constraints are explicitly enforced by learning a joint representation in a shared encoder network that must support the generation of both images and wireframes. Experiments on a wireframe-scene dataset show that our wireframe-to-image translation model significantly outperforms the state-of-the-art methods in both visual quality and structural integrity of generated images.