In this article, I present the main research objectives, questions, and data elicitation methods of a project on the acquisition of word-internal geminates in Italian as a second/foreign language (L2). Beyond geminate consonants, I propose a corpus-based approach to documenting various segmental and suprasegmental phenomena in L2 Italian. The article introduces the research protocol, which comprised multiple tasks designed to collect both perception and production data efficiently within a manageable timeframe. I describe in detail the individual tasks, the selection of materials and participants, and the practical aspects of data collection; I also discuss the limitations and challenges associated with collecting data on the acquisition of L2 speech. While the primary focus is on L2 Italian, the method can also be adopted for research on first language (L1) varieties of Italian, which broadens its applicability. The ultimate aim of the project is to present a transparent and adaptable methodological approach that can serve as a foundation for future research.