Typology of Tasks of Machine Analysis of Texts in Contemporary Sociology

  • Roman Kyrychenko, Taras Shevchenko National University of Kyiv
Keywords: computational analysis of texts, content proximity analysis, topic modeling, sentiment analysis


This article considers the possibilities of using modern natural language processing methods for sociological analysis. It focuses on three tasks that computational text analysis can currently solve: analysis of semantic proximity, topic modeling, and sentiment analysis. The methods discussed here allowed us to fully automate the detection of semantic shifts in law enforcement-related words over the past twenty years. In recent years, natural language processing methods have progressed to the point where sociologists can automatically record the semantics of texts, compare them over time, and group texts by similarity. These methods also make it possible to scale the analysis to large collections of documents, which opens a new page in the development of content analysis: manual coding of documents is becoming less necessary, leaving researchers free to focus on the research itself. We demonstrate these capabilities on news from the outlet «Ukrainska Pravda» for 2001–2020. We also grouped the outlet's police-related news into its main topics and analyzed whether attitudes towards the police changed over this period.

