This article highlights an approach based on authentic data, by focusing on recent research related to collection, processing and analysis of a large French text-message corpus, entitled 88milSMS (http://88milsms.huma-num.fr/, Panckhurst, Détrie, Lopez, Moïse, Roche, Verine, 2014), including a sociolinguistic questionnaire submitted to donors (with their answers). The authors, using a pluridisciplinary approach (linguistics/ language sciences, computer science, Natural Language Processing), explain why they chose to give the scientific community and the general public access to the SMS corpus.Nous présentons notre approche fondée sur les données authentiques, en nous concentrant sur des recherches récentes, portant sur le recueil, le traite...