Typology of Tasks of Machine Analysis of Texts in Contemporary Sociology
DOI:
https://doi.org/10.29038/2306-3971-2021-02-41-48
Keywords:
computational analysis of texts, content proximity analysis, topic modeling, sentiment analysis
Abstract
This article considers the possibilities of using modern methods of computational text processing for sociological analysis. The main focus is on three tasks that can currently be solved with computational text analysis: semantic proximity analysis, topic modeling, and sentiment analysis. The methods discussed in this article allowed us to fully automate the tracking of semantic shifts in law-enforcement-related words over the past twenty years. In recent years, natural language processing methods have progressed so far that they allow sociologists to automatically record the semantics of texts, compare them over time, and group them by similarity. They also make it possible to scale the analysis to large collections of documents, which opens a new chapter in the development of content analysis: manual coding of documents can largely be abandoned, and researchers can focus on interpretation. We demonstrate these capabilities on news from the resource «Ukrainska Pravda» for 2001–2020. We also grouped the news items into the main topics of police coverage in the publication's materials and analyzed whether attitudes towards the police changed over this period.
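The three tasks named above map onto standard NLP tooling. The following is a minimal Python sketch, not the authors' actual pipeline, illustrating each step with the gensim library: word embeddings for semantic proximity, LDA for topic modeling, and a dictionary-based sentiment score using a tonal lexicon such as the Ukrainian tonal dictionary cited in the references. The file names (up_news_2001_2020.txt, tone_dict_uk.tsv), the query word, and all parameter values are hypothetical placeholders.

```python
# Minimal sketch of the three tasks; not the authors' pipeline.
# Assumes a hypothetical pre-tokenised corpus: one news item per line, tokens space-separated.
from gensim.models import Word2Vec, LdaModel
from gensim.corpora import Dictionary

with open("up_news_2001_2020.txt", encoding="utf-8") as f:  # hypothetical file
    docs = [line.split() for line in f]

# 1. Semantic proximity: train word embeddings, compare words by cosine similarity.
w2v = Word2Vec(sentences=docs, vector_size=100, window=5, min_count=5, workers=4)
# Nearest neighbours of an example query word ("police"); raises KeyError if the word is absent.
print(w2v.wv.most_similar("поліція", topn=10))

# 2. Topic modeling: LDA over a bag-of-words representation of the same documents.
dictionary = Dictionary(docs)
dictionary.filter_extremes(no_below=10, no_above=0.5)
corpus = [dictionary.doc2bow(doc) for doc in docs]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=20, passes=5)
for topic_id, words in lda.print_topics(num_topics=5, num_words=8):
    print(topic_id, words)

# 3. Sentiment analysis: simple dictionary-based scoring with a tonal lexicon,
#    here a hypothetical tab-separated {word -> polarity} file.
tone = {}
with open("tone_dict_uk.tsv", encoding="utf-8") as f:
    for line in f:
        word, score = line.rstrip("\n").split("\t")
        tone[word] = float(score)

def doc_sentiment(tokens):
    """Mean polarity of the tokens found in the tonal dictionary (0.0 if none match)."""
    scores = [tone[t] for t in tokens if t in tone]
    return sum(scores) / len(scores) if scores else 0.0

print(doc_sentiment(docs[0]))
```

For the diachronic part of the analysis (semantic shifts over twenty years), such embeddings would be trained per time slice and aligned across slices, in the spirit of the temporal word-embedding approaches cited below (Bamler & Mandt, 2017; Di Carlo et al., 2019; Yao et al., 2018).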
References
Shekhovtsov, S., Chaplynskyi, D., Petriv, O. Tonal dictionary of the Ukrainian language. Retrieved March 28, 2021 from https://lang.org.ua/uk/dictionaries/
Angelov, D. (2020). Top2Vec: Distributed Representations of Topics. arXiv. Retrieved August 19, 2020 from http://arxiv.org/abs/2008.09470
Bamler, R., Mandt, S. (2017). Dynamic word embeddings. 34th International Conference on Machine Learning, ICML 2017, 1, 607–621. Retrieved August 19, 2020 from http://arxiv.org/abs/1702.08359
Blei, D. M., Ng, A. Y., Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Bobichev, V., Kanishcheva, O., Cherednichenko, O. (2017). Sentiment analysis in the Ukrainian and Russian news. 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON). doi: 10.1109/ukrcon.2017.8100410
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Retrieved May 24, 2019 from http://arxiv.org/abs/1810.04805
Di Carlo, V., Bianchi, F., Palmonari, M. (2019). Training Temporal Word Embeddings with a Compass. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 6326–6334. doi: 10.1609/aaai.v33i01.33016326
DiMaggio, P., Nag, M., Blei, D. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics, 41(6), 570–606. https://doi.org/10.1016/j.poetic.2013.08.004
Flores, R. D. (2017). Do Anti-Immigrant Laws Shape Public Sentiment? A Study of Arizona’s SB 1070 Using Twitter Data. American Journal of Sociology, 123(2), 333–384. https://doi.org/10.1086/692983
Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162. https://doi.org/10.1080/00437956.1954.11659520
Hofmann, T. (1999). Probabilistic latent semantic indexing. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval – SIGIR ’99. doi: 10.1145/312624.312649
Kozlowski, A. C., Taddy, M., Evans, J. A. (2018). The Geometry of Culture: Analyzing Meaning through Word Embeddings. American Sociological Review, 84(5), 905–949. https://doi.org/10.1177/0003122419877135
Lemke, M., Wiedemann, G. (2016). Text mining in den sozialwissenschaften. Springer Fachmedien Wiesbaden. https://doi.org/10.1007/978-3-658-07224-7
Lindstedt, N. C. (2019). Structural Topic Modeling For Social Scientists: A Brief Case Study with Social Movement Studies Literature, 2005–2017. Social Currents, 6(4), 307–318. https://doi.org/10.1177/2329496519846505
Mikolov, T., Chen, K., Corrado, G., Dean, J. (2013). Efficient estimation of word representations in vector space. Retrieved May 22, 2019 from http://arxiv.org/abs/1301.3781
Pennington, J., Socher, R., Manning, C. (2014). Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi:10.3115/v1/d14-1162
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. Retrieved January 1, 2020 from https://github.com/codelucas/newspaper
Rothschild, J. E., Howat, A. J., Shafranek, R. M., Busby, E. C. (2019). Pigeonholing Partisans: Stereotypes of Party Supporters and Partisan Polarization. Political Behavior, 41(2), 423–443. https://doi.org/10.1007/s11109-018-9457-5
Stone, P. J., Dunphy, D. C., Smith, M. S., Ogilvie, D. M. (1966). The general inquirer: A computer approach to content analysis. MIT Press.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5999–6009. Retrieved December 6, 2017 from http://arxiv.org/abs/1706.03762
Yao, Z., Sun, Y., Ding, W., Rao, N., Xiong, H. (2018). Dynamic word embeddings for evolving semantic discovery. WSDM 2018 – Proceedings of the 11th ACM International Conference on Web Search and Data Mining, 673–681. https://doi.org/10.1145/3159652.3159703
Yin, W., Kann, K., Yu, M., Schütze, H. (2017). Comparative study of CNN and RNN for natural language processing. CoRR, abs/1702.01923. Retrieved February 7, 2017 from http://arxiv.org/abs/1702.01923
Zhang, H. (2019). Dynamic Word Embedding for News Analysis. UCLA. ProQuest ID: Zhang_ucla_0031N_18000. Merritt ID: ark:/13030/m5wh7p2f. Retrieved January 1, 2020 from https://escholarship.org/uc/item/9tp9g31f
License
Copyright (c) 2021 Roman Kyrychenko
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.