Life experiences are increasingly intertwined with digital devices, making screens a preferred, if not indispensable, data source for behavioral studies and health interventions. Accurate text extraction from digital screenshots is therefore a key prerequisite for reliable analyses of media behavior. Screenshots constitute a unique image data set, offering the opportunity (i) to test existing image processing and text recognition methods and (ii) to identify and discuss the computational challenges specific to this use case. Our aim is to assess whether, and how, state-of-the-art methodologies can be applied to this novel data set. We show that combining OpenCV-based pre-processing with a Long Short-Term Memory (LSTM) based release of Tesseract OCR, without any ad hoc training, achieves 74% text accuracy at the character level. We discuss the implications and incidence of different error factors on the quality of the extracted text, and outline future research directions.