1

By reading the relevant articles on this board, starting with ‘Weizenbaum’s nightmares: how the inventor of the first chatbot turned against AI’ (Tarnoff, 2023), I learned about the two main schools of thought on the dangers of AI in the world today. The first, influenced by Weizenbaum, focuses on the risks posed in the present: how large language models of the kind that sit beneath ChatGPT can echo regressive viewpoints, such as racism and sexism, because of the data they are trained on. The second is the “existential risk” of possible future harm, of AI reaching a state beyond human control, where it becomes so intelligent that it eventually annihilates humans (Tarnoff, 2023).

The paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” (Bender et al., 2021) was then analysed, together with Weil’s (2023) commentary on it, for an in-depth understanding of the dangers in the development of large-scale language models, and the following information was extracted:

 

The impact on the environment

The article opens with the environmental impact of human activity as a whole and argues that large-scale language modelling needs to take into account the environmental costs it entails. The most important reason for this is that these models are being developed at a time of unprecedented environmental change around the world: “From monsoons caused by changes in rainfall patterns due to climate change affecting more than 8 million people in India, to the worst fire season on record in Australia killing or displacing nearly three billion animals and at least 400 people, the effect of climate change continues to set new records every year. It is past time for researchers to prioritize energy efficiency and cost to reduce negative environmental impact and inequitable access to resources — both of which disproportionately affect people who are already in marginalized positions.”
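To make the point about energy efficiency and cost concrete, the rough back-of-the-envelope sketch below estimates the energy use and emissions of a single training run from hardware power draw, training time, and grid carbon intensity. All of the figures in it are placeholder assumptions chosen for illustration, not numbers taken from the paper.

```python
# Rough back-of-the-envelope estimate of training energy and CO2 emissions.
# All figures below are illustrative assumptions, not values from Bender et al.

NUM_GPUS = 512            # assumed number of accelerators used for training
GPU_POWER_KW = 0.3        # assumed average draw per accelerator, in kilowatts
TRAINING_DAYS = 30        # assumed wall-clock training time
PUE = 1.5                 # assumed data-centre power usage effectiveness
CO2_PER_KWH_KG = 0.4      # assumed grid carbon intensity, kg CO2e per kWh

hours = TRAINING_DAYS * 24
energy_kwh = NUM_GPUS * GPU_POWER_KW * hours * PUE
emissions_tonnes = energy_kwh * CO2_PER_KWH_KG / 1000

print(f"Estimated energy: {energy_kwh:,.0f} kWh")
print(f"Estimated emissions: {emissions_tonnes:,.1f} t CO2e")
```

Even with these modest placeholder numbers, a single run lands in the hundreds of megawatt-hours, which is why the authors insist that efficiency and cost be reported alongside accuracy.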

 

Language to destabilize dominant narratives

The author then describes how “A central aspect of social movement formation involves using language strategically to destabilize dominant narratives and call attention to underrepresented social perspectives.”, i.e. the importance of large-scale language modelling and its immediate ideological impact on society is emphasised. Why does language collected at this scale create problems of misuse? Because social movements that are poorly documented and do not receive media attention are not captured in the data, and those that are reported tend to be dramatic or violent. LMs are not likely to produce more positive outputs by learning from such content, as most e-books with more positive perspectives are protected by copyright and ChatGPT is not able to collect data from them. And at this stage it has been confirmed that large LMs exhibit a variety of biases, including stereotypes or negative portrayals of particular groups.

 

Human bias with seemingly coherent language

Based on the web content referred to above, large LMs can, because of features such as the automated generation of text, mix human bias with seemingly coherent language, thereby increasing the likelihood of the amplification of automated bias, of intentional misuse, and of hegemonic worldviews. “Technology makers assume that their reality accurately represents the world, thus creating many different types of problems. ChatGPT’s training data is believed to include most or all of Wikipedia, pages linked from Reddit, and a billion words crawled from the Internet. (It can’t include e-book copies of all the Stanford Library’s content, for example, because books are protected by copyright law.) The people who wrote all those words on the internet overrepresent whiteness. They overrepresent men. They overrepresent wealth. More importantly, we all know what’s on the Internet: a vast swamp of racism, sexism, homophobia, Islamophobia, neo-Nazism.”

 

A stochastic parrot

A stochastic parrot: a random patchwork of a large number of observations from its training data. Since LM-generated text is not based on communicative intent, does not involve any understanding of the world or of the reader’s mental state, and does not share ideas, the seemingly intelligent communication we perceive in it is an illusion.
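To make the metaphor concrete, the following is a minimal sketch of the idea: a toy bigram model that stitches word sequences together purely from the co-occurrence statistics it has observed, with no reference to meaning. It is an illustrative toy built on invented data, not the architecture of any real LM.

```python
import random
from collections import defaultdict

# Toy corpus; a real LM is trained on billions of words scraped from the web.
corpus = "the parrot repeats the words the parrot has heard before".split()

# Record which word follows which: pure surface statistics, no meaning attached.
transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

def parrot(start: str, length: int = 8) -> str:
    """Generate text by randomly following observed word-to-word transitions."""
    word, output = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:          # dead end: no observed continuation
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

print(parrot("the"))  # plausible-looking word sequence, with no communicative intent
```

The output can look fluent, but nothing in the program knows what a parrot is; it only knows which words it has seen next to which.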

2

Building on the risks present in text generated by LMs, the article then sets out the specific behavioural risks that follow from them, at both the personal and the societal level.

Reproduce and amplify the prejudice

Because racism, misogyny, ableism, and similar attitudes are over-represented in the training data for LMs, those data already encode bias. Given that the text-generating LM can be expected to reproduce and even amplify that prejudice, further stereotypes will be introduced as people disseminate the newly generated text. Either way, harm will ensue.

Damages the reputation of any person or organisation

Biased content also damages the reputation of any person or organisation that is perceived to be the source of the text.

Gaining access to training data for bad purposes

A sufficiently motivated user could also use an LM to extract clues about its training data for their own purposes, albeit potentially illegitimate ones.

Although Birhane and Prabhu note that “Feeding AI systems on the world’s beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy”, in the context of anticipating these large-scale linguistic risks the authors suggest that “In summary, LMs trained on large, uncurated, static datasets from the Web encode hegemonic views that are harmful to marginalised populations. We thus emphasise the need to invest significant resources into curating and documenting LM training data. In this, we follow Jo et al., who cite archival history data collection methods as an example of the amount of resources that should be dedicated to this process, and Birhane and Prabhu, who call for a more justice-oriented data collection methodology.”

Personal viewpoint

Studying this knowledge board, and especially after carefully reading this paper and the related materials, I have a new perspective on, and understanding of, ChatGPT. It reminds me of my last final exam, when two boys behind me were discussing using ChatGPT for revision, and one of them said: you know, if you search for the answer to a question on ChatGPT, the answer you get at different times will never be the same, and that made revision a little harder for me.

I didn’t think much of this conversation at the time, because to me it was something to be taken for granted: that is simply the way the system is set up, it is a given fact, and it is up to us as its users to work with that fact. But through studying this chapter I have learnt that, while that is indeed part of it, the more important issue is that the text output is not fully intelligent. As the article puts it, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning, which explains this problem directly.
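The classmate’s observation that the same question yields different answers at different times follows directly from this: an LM samples its next word from a probability distribution rather than always choosing the single most likely one. The minimal sketch below illustrates the general mechanism of temperature-based sampling with invented probabilities; it is not a description of how ChatGPT is actually configured.

```python
import math
import random

# Invented next-token probabilities for one prompt (purely illustrative numbers).
next_token_probs = {"Paris": 0.60, "London": 0.25, "Berlin": 0.15}

def sample_next_token(probs: dict, temperature: float = 1.0) -> str:
    """Sample a token from the distribution; a higher temperature flattens it,
    so repeated calls with the same prompt can return different answers."""
    weights = [math.exp(math.log(p) / temperature) for p in probs.values()]
    return random.choices(list(probs.keys()), weights=weights, k=1)[0]

# Asking the "same question" several times yields varying outputs.
for _ in range(5):
    print(sample_next_token(next_token_probs, temperature=1.2))
```

The variation the two students noticed is therefore not the model “changing its mind”; it is the same probabilistic stitching process landing on different word sequences each run.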

The issues to be addressed

The authors conclude by saying that “unsupervised pre-training is an integral part of many language understanding systems.” However, the investment in personnel and the curation of training data that this implies is a very challenging matter; how an LM is to keep pace with real-world developments also demands serious consideration; and so long as the machine’s output involves no conscious activity of communicating with a human being, and only appears to think in the human’s direction, the emergence of seemingly intelligent text is not true AI.



Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S., 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21).
Tarnoff, B., 2023. Weizenbaum’s nightmares: how the inventor of the first chatbot turned against AI. The Guardian.
Weil, E., 2023. You Are Not a Parrot [WWW Document]. Intelligencer. URL https://nymag.com/intelligencer/article/ai-artificial-intelligence-chatbots-emily-m-bender.html (accessed 12.15.23).