Minsk, Belarus
Metadata extraction and processing are crucial for social media analysis because they systematize information about publications, authors, and audience engagement. This article introduces OmniTrack, a web application for the automated collection and parameterization of metadata from Russian-language and English-language travel blogs. The application combines web scraping with a multi-layer, modular architecture, which provides scalability, reproducibility, and extensibility even when external platform interfaces change. The backend is implemented in Python (Flask); the frontend uses HTML, CSS, and JavaScript to provide an interactive user experience. Data extraction is handled by independent modules: undetected_chromedriver handles TikTok’s dynamically rendered pages through browser emulation, yt-dlp retrieves YouTube metadata directly in JSON format, and Instaloader provides high-level access to Instagram’s object model. Collected metadata are normalized to a unified schema and exported to Excel with the openpyxl library, which facilitates subsequent statistical analysis. In usability testing, 42 participants processed 400 posts and rated installation simplicity, processing speed, and interface intuitiveness. The mean ease-of-use score was 4.9 out of 5. Critical issues identified during testing, including an incompatible pywebview backend module and incorrect handling of shortened TikTok links, were resolved. OmniTrack provides a robust framework for building a representative metadata corpus to support further linguistic research into the discursive, genre, and communicative features of Russian-language and English-language travel blogs.
web application, metadata, social media, web scraping, data automation
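The normalization step mentioned in the abstract can be illustrated with a minimal sketch. It is not OmniTrack's actual code: the `PostRecord` columns, the function names, and the sample values are hypothetical, since the abstract does not list the real schema. The YouTube keys follow the info dictionary returned by yt-dlp, and the Instagram keys mirror attribute names of Instaloader's `Post` object, passed here as a plain dictionary to keep the example self-contained.

```python
from dataclasses import dataclass, astuple, fields
from datetime import datetime
from typing import Any, Optional


@dataclass
class PostRecord:
    """Hypothetical unified schema; the real OmniTrack columns may differ."""
    platform: str
    post_id: str
    author: str
    published: str          # ISO date, YYYY-MM-DD
    views: Optional[int]
    likes: Optional[int]
    comments: Optional[int]
    url: str


def from_youtube(info: dict[str, Any]) -> PostRecord:
    # yt-dlp reports upload_date as a "YYYYMMDD" string.
    published = datetime.strptime(info["upload_date"], "%Y%m%d").date().isoformat()
    return PostRecord(
        "youtube", info["id"], info.get("uploader", ""), published,
        info.get("view_count"), info.get("like_count"),
        info.get("comment_count"), info.get("webpage_url", ""),
    )


def from_instagram(post: dict[str, Any]) -> PostRecord:
    # Keys mirror Instaloader Post attributes; date_utc is a datetime.
    return PostRecord(
        "instagram", post["shortcode"], post["owner_username"],
        post["date_utc"].date().isoformat(),
        post.get("video_view_count"), post.get("likes"), post.get("comments"),
        f"https://www.instagram.com/p/{post['shortcode']}/",
    )


def to_rows(records: list[PostRecord]) -> list[list[Any]]:
    # First row is the header, then one row per post.
    header = [f.name for f in fields(PostRecord)]
    return [header] + [list(astuple(r)) for r in records]


def write_xlsx(rows: list[list[Any]], path: str) -> None:
    # Requires the third-party openpyxl package.
    from openpyxl import Workbook
    workbook = Workbook()
    sheet = workbook.active
    for row in rows:
        sheet.append(row)
    workbook.save(path)


# Made-up sample inputs, for illustration only.
yt = {"id": "abc123", "uploader": "TravelChannel", "upload_date": "20240315",
      "view_count": 1000, "like_count": 50, "comment_count": 7,
      "webpage_url": "https://www.youtube.com/watch?v=abc123"}
ig = {"shortcode": "XyZ", "owner_username": "wanderer",
      "date_utc": datetime(2024, 3, 16, 12, 0), "likes": 20, "comments": 3}

rows = to_rows([from_youtube(yt), from_instagram(ig)])
```

Calling `write_xlsx(rows, "metadata.xlsx")` would then save the table as a workbook. Keeping one adapter function per platform reflects the modular approach the abstract describes: if a platform changes its interface, only its adapter needs to change.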




