The extensive deployment of sensors in oilfield operation and management has led to the collection of vast amounts of data, which in turn has enabled the use of machine learning models to improve decision-making. One of the prime applications of data-driven decision-making is the identification of optimum well locations for hydrocarbon recovery. This task is made difficult by the relative scarcity of high-fidelity subsurface data with which to build precise models in support of decision-making. Each well placement decision affects not only eventual recovery but also the decisions made for future wells; hence, there exists a tradeoff between recovery maximization and information gain. Existing methodologies for placing wells during the early phases of reservoir development fail to take a long-term view of reservoir profitability, focusing instead on short-term gains. While improvements in drilling technologies have dramatically lowered the cost of producing hydrocarbons from prospects and made drilling operations very efficient, these advances have also encouraged sub-optimal and haphazard placement of wells. The result can be a considerable number of unprofitable wells which, during periods of low oil and gas prices, can threaten a company's solvency. The goal of this research is to present a methodology that builds machine learning models, integrating geostatistics and reservoir flow dynamics, to determine optimum future well locations for maximizing reservoir recovery. A deep reinforcement learning (DRL) framework is proposed to address this long-horizon decision-making problem. The DRL reservoir agent employs intelligent sampling and a reward framework based on geostatistical and flow simulations. The implemented approach provides opportunities to incorporate expert information while basing well placement decisions on seismic data and prior well tests.
The effects of prior information on well placement decisions are explored, and the developed DRL-derived policies are compared to single-stage optimization methods for reservoir development. Under a similar reward framework, sequential well placement strategies developed using DRL are shown to outperform simultaneous drilling of several wells.
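The sequential decision structure described above can be illustrated with a deliberately minimal tabular Q-learning sketch: an agent places wells one at a time on a small grid, and each placement is rewarded by a synthetic recovery map minus an interference penalty for crowding earlier wells. The grid size, recovery map, penalty, and hyperparameters here are illustrative assumptions only; the actual framework in this work derives rewards from geostatistical and flow simulations, not from a hand-coded map.

```python
import random

random.seed(0)

GRID = 4                      # toy 4x4 reservoir grid (assumption for illustration)
WELLS = 2                     # wells placed sequentially per episode
CELLS = list(range(GRID * GRID))

# Synthetic property map standing in for expected recovery per cell
# (in the real framework this comes from geostatistical/flow simulation).
recovery = [((c // GRID) + (c % GRID)) / (2 * (GRID - 1)) for c in CELLS]

def reward(cell, placed):
    """Recovery at the cell minus a penalty for drilling next to existing wells."""
    r = recovery[cell]
    for p in placed:
        dist = abs(cell // GRID - p // GRID) + abs(cell % GRID - p % GRID)
        if dist <= 1:
            r -= 0.5          # illustrative interference penalty
    return r

Q = {}                        # Q[(state, action)] -> value estimate

def run_episode(eps, alpha=0.1):
    """One episode: place WELLS wells sequentially with epsilon-greedy Q-learning."""
    placed, total = (), 0.0
    for _ in range(WELLS):
        actions = [c for c in CELLS if c not in placed]
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda c: Q.get((placed, c), 0.0))
        r = reward(a, placed)
        nxt = tuple(sorted(placed + (a,)))
        # Bootstrapped value of the remaining placement decisions.
        future = 0.0
        if len(nxt) < WELLS:
            future = max(Q.get((nxt, c), 0.0) for c in CELLS if c not in nxt)
        old = Q.get((placed, a), 0.0)
        Q[(placed, a)] = old + alpha * (r + future - old)
        placed, total = nxt, total + r
    return placed, total

for _ in range(5000):
    run_episode(eps=0.2)      # train with exploration

best_placement, best_value = run_episode(eps=0.0)   # greedy rollout of learned policy
print(best_placement, round(best_value, 2))
```

Even in this toy setting, the learned sequential policy spreads the wells apart and favors high-recovery cells, which is the qualitative behavior the DRL agent is expected to exhibit at reservoir scale.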