What is ChatGPT? ChatGPT, an artificial intelligence tool, has recently taken off, flashing like lightning across a somewhat dull tech landscape. It also seems to have lit a beacon for the investment community, and many business people are starting to feel a “stir” of excitement.
In this post, you will learn how humanity achieved amazing AI in 66 years. ChatGPT’s achievements are unprecedented: it is reportedly the fastest-growing consumer internet application ever, reaching 100 million users just two months after launch. It is now built into Microsoft’s Bing search engine, is poised to take Google down a notch, and may well bring the biggest change to the shape of search engines since they were invented. But ChatGPT didn’t come out of nowhere. This chatbot is the most sophisticated in a line of large language models developed over several years, and a brief history shows that it was preceded by many iterations of technology and theory that paved the way for its creation.
The term “artificial intelligence” was coined for a 1956 workshop at Dartmouth College. Since then, the field has gone through “three ups and two downs,” alternating between “summers” and “winters,” with several major events turning once-dormant AI research into a hot topic again. ChatGPT’s success rests on this long accumulation of AI technology, above all deep learning. At the 1956 Dartmouth workshop, scientists including John McCarthy, Marvin Minsky, Claude Shannon, Allen Newell, and Herbert Simon gathered to discuss how machines could simulate human learning and other aspects of intelligence. That year is widely regarded as the birth year of artificial intelligence.
The field of artificial intelligence has two main schools: symbolic AI and subsymbolic AI, the latter exemplified by perceptrons. Symbolic AI assumes that intelligent problem-solving can be reduced to “symbolic reasoning.” The idea traces back to pioneers of mechanical computing such as the French scientist Blaise Pascal and the German mathematician Gottfried Leibniz, and the machines that truly embodied it grew out of the pioneering work of Charles Babbage and Alan Turing in England.
Subsymbolic AI grew out of behaviorist cognitive theory and its “stimulus-response” view of the mind. The American neurophysiologist Warren McCulloch and logician Walter Pitts proposed a mathematical model of the neuron in 1943, and the psychologist Frank Rosenblatt later proposed the perceptron; together these laid the foundation for neural networks.
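To make the perceptron concrete, here is a minimal sketch in Python of a single perceptron trained with the classic error-correction rule. It is illustrative only: the function names, the learning rate, and the toy task (learning logical AND) are our own assumptions, not a reconstruction of Rosenblatt’s original machine.

```python
# Minimal perceptron sketch (illustrative; names and task are our own choices).

def step(z: float) -> int:
    """Threshold activation: output 1 ("fire") only if the weighted sum is positive."""
    return 1 if z > 0 else 0


def predict(weights: list[float], bias: float, x: list[float]) -> int:
    """Weighted sum of the inputs plus a bias, passed through the step function."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron learning rule: after each mistake, shift the weights toward
    (or away from) the misclassified input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Learn logical AND, a linearly separable task a single perceptron can solve.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # → [0, 0, 0, 1]
```

Because the weights start at zero, the learning rate here only rescales them; the sequence of mistakes, and therefore the final predictions, would be the same with a smaller rate.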
Early neural networks, including McCulloch-Pitts (MCP) neurons, perceptrons, and feedforward networks, already followed the core idea of deep learning: processing information layer by layer, with each layer forming more abstract representations. They consist of many interconnected artificial “neurons” that process information, inspired by the way connected neurons in the human brain exchange signals.
In the 1960s and 1970s, however, artificial intelligence stagnated in both the symbolic and the perceptron directions. Hubert Dreyfus, who taught at MIT and UC Berkeley, published the 1965 report “Alchemy and Artificial Intelligence,” which likened the AI research of the day to alchemy and pointed out that climbing to the top of a tree is not the same as climbing to the moon. In the UK, the 1973 Lighthill Report criticized symbolic AI, concluding that the discoveries made so far had not produced the major impact that had been promised, and AI hit its first low point. The expert systems and neural networks that emerged in the 1980s also failed to achieve substantial breakthroughs, held back by limited computing power and a limited understanding of intelligence, and AI sank into a second trough. Even so, the seeds sown in the 1980s would later grow into a big tree.
Understanding and using natural language is one of the biggest challenges facing artificial intelligence. Language is often ambiguous and highly context-dependent, and communicating in it usually requires a great deal of shared background knowledge. As in other areas of AI, research on natural language processing focused for decades on symbolic, rule-based approaches, without very good results. Recurrent neural networks (RNNs) changed that.
ChatGPT is based on a conversational version of a GPT-3.5 model, a refined successor to OpenAI’s large language model GPT-3, which is a neural network trained on vast amounts of text. Because text consists of sequences of letters and words of varying length, a language model needs a neural network that can “understand” this kind of sequential data. Recurrent neural networks, invented in the 1980s, can process sequences of words. However, they were slow to train and tended to forget earlier words in a sequence.
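The “forgetting” problem can be seen even in a toy recurrent network. The sketch below is a one-unit Elman-style RNN in Python with arbitrary, hand-chosen weights (an assumption; real RNNs learn their weights and use many units). It measures how much flipping the very first input changes the final hidden state as the sequence grows longer.

```python
import math


def rnn_final_state(inputs, w_x=1.0, w_h=0.5, b=0.0):
    """Run a one-unit recurrent network over a sequence and return its last state.

    At every step the new hidden state mixes the current input with the
    previous state:  h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    """
    h = 0.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h


def first_input_influence(length):
    """How much does flipping the first input (+1 vs -1) change the final state?"""
    rest = [0.0] * (length - 1)
    return abs(rnn_final_state([1.0] + rest) - rnn_final_state([-1.0] + rest))


for n in (1, 5, 20):
    # The influence of the first word shrinks rapidly as the sequence grows.
    print(n, first_input_influence(n))
```

With these weights, the first input’s effect shrinks roughly by half at every step once the state is small. During training, the analogous effect shows up as vanishing gradients, one reason plain RNNs struggle with long-range dependencies.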
In 1997, computer scientists Sepp Hochreiter and Jürgen Schmidhuber addressed this problem by inventing the long short-term memory (LSTM) network, a recurrent neural network with a special memory cell, controlled by learned “gates,” that lets information from earlier in the input sequence be retained much longer. LSTMs can handle text sequences several hundred words long, but their language abilities remained limited.
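The LSTM’s key trick is that its memory cell is updated additively, c_t = f_t · c_{t-1} + i_t · g_t, with a forget gate f_t deciding how much of the old memory to keep. The minimal one-unit sketch below, in Python, uses hand-picked, hypothetical weights (a trained LSTM would learn them) to show that the first input can still be read out after many steps, unlike in a plain RNN, whose memory of early inputs decays quickly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked, hypothetical weights (a trained LSTM would learn these).
# Each gate computes  w * x_t + u * h_{t-1} + b  before its activation.
PARAMS = {
    "forget": (0.0, 0.0, 5.0),  # bias 5 -> forget gate ~0.993: keep almost all memory
    "input": (0.0, 0.0, 0.0),   # input gate fixed at 0.5
    "cell": (2.0, 0.0, 0.0),    # candidate content tanh(2 * x): zero when x is 0
    "output": (0.0, 0.0, 0.0),  # output gate fixed at 0.5
}


def gate_preactivation(gate, x, h, params=PARAMS):
    w, u, b = params[gate]
    return w * x + u * h + b


def lstm_final_state(inputs, params=PARAMS):
    """Run a one-unit LSTM over a sequence and return its final hidden state."""
    h, c = 0.0, 0.0
    for x in inputs:
        f = sigmoid(gate_preactivation("forget", x, h, params))  # how much old memory to keep
        i = sigmoid(gate_preactivation("input", x, h, params))   # how much new content to write
        g = math.tanh(gate_preactivation("cell", x, h, params))  # candidate content
        o = sigmoid(gate_preactivation("output", x, h, params))  # how much memory to expose
        c = f * c + i * g  # memory cell: mostly carried forward, not overwritten
        h = o * math.tanh(c)
    return h


def first_input_influence(length):
    """How much does flipping the first input (+1 vs -1) change the final state?"""
    rest = [0.0] * (length - 1)
    return abs(lstm_final_state([1.0] + rest) - lstm_final_state([-1.0] + rest))


print(first_input_influence(20))  # still large after 20 steps
```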
As natural language AI approached its major breakthrough, neural networks and machine learning broke into the mainstream in 2016, when Google’s AlphaGo, developed by DeepMind, gave the world a crash course in artificial intelligence by defeating Go champion Lee Sedol; it went on to beat Ke Jie in 2017. Shane Legg, one of the founders of DeepMind, has predicted that AI surpassing human level will emerge around 2025, while Ray Kurzweil, a director of engineering at Google, has put forward his startling “singularity” theory: machines that fully pass the Turing test will appear by 2029, and an intelligence explosion driven by strong artificial intelligence will occur by 2045.
In 2017, a team of Google researchers introduced the Transformer in the now-famous paper “Attention Is All You Need.” This neural network keeps track of where each word or phrase appears in a sequence and how it relates to the others, enabling the breakthrough behind today’s generation of large language models. The meaning of a word usually depends on the words that come before or after it. By tracking this contextual information, a Transformer can process longer strings of text and capture word meanings more accurately. For example, “hot dogs” means very different things in “Hot dogs should be given plenty of water” and “Hot dogs should be eaten with mustard.” Because a Transformer processes all positions of a sequence in parallel rather than one word at a time, it trains much faster than recurrent networks, and its attention weights offer some insight into how it relates words to one another, which makes it somewhat more interpretable. On release, the Transformer was the most advanced deep learning model of its time, setting state-of-the-art results on benchmarks including machine translation and English constituency parsing. It has profoundly shaped the trajectory of AI ever since: within a few short years, its influence spread across the field, from a wide variety of natural language models to AlphaFold2, which predicts protein structures.
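The mechanism that lets a Transformer weigh context is scaled dot-product attention: each word’s query vector is compared with every word’s key vector, the scores are turned into weights with a softmax, and the output is a weighted average of the value vectors. The sketch below is a simplified, pure-Python illustration that reuses made-up 2-dimensional embeddings directly as queries, keys, and values; a real Transformer adds learned projection matrices, multiple attention heads, and positional encodings.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence at once.

    Returns one output vector per position plus the attention weights, where
    weights[i][j] says how much position i draws on position j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[dim] for wj, v in zip(w, values)) for dim in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Made-up 2-d embeddings for a three-word sequence (hypothetical numbers).
words = ["hot", "dogs", "mustard"]
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
outputs, weights = self_attention(X, X, X)
for word, row in zip(words, weights):
    print(word, [round(x, 2) for x in row])
```

Every position attends to every other position in a single step, which is why a Transformer can process a whole sequence in parallel instead of word by word.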
About a year after the Transformer appeared, the AI research lab OpenAI introduced GPT-1, a model with 117 million parameters. GPT stands for Generative Pre-trained Transformer, i.e., a Transformer-based model pre-trained on large amounts of data. OpenAI wants to develop versatile, general-purpose artificial intelligence and believes that large language models are a key step toward that goal.
GPT combines the Transformer with unsupervised learning, a way of training machine learning models on data that has not been labeled in advance. This lets the software find patterns in the data on its own, without being told what it is looking at. Much of machine learning’s earlier success relied on supervised learning with labeled data, but labeling data by hand is slow, which limits the size of the datasets available for training. After pre-training on unlabeled text and fine-tuning on each task, GPT outperformed earlier models on four kinds of language tasks: question answering, semantic similarity, natural language inference (judging whether one sentence implies another), and text classification, setting a new state of the art. In 2019, Microsoft invested $1 billion in OpenAI. That same year, OpenAI announced GPT-2, a 1.5-billion-parameter model with essentially the same architecture as GPT-1; the main difference is that GPT-2 is more than ten times larger. OpenAI also published a paper introducing the model, “Language Models are Unsupervised Multitask Learners.”
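Learning “without being told what it is looking at” works because raw text supplies its own labels: the target for each position is simply the next word. The sketch below, in Python, turns an unlabeled sentence into (context, next word) training pairs and fits the simplest possible language model, a bigram counter. It illustrates the idea only; GPT itself trains a Transformer over subword tokens by gradient descent rather than by counting words, and the corpus and function names here are hypothetical.

```python
from collections import Counter, defaultdict


def next_word_examples(text, context_size=2):
    """Turn raw, unlabeled text into (context, target) training pairs.
    The 'label' for each example is simply the word that comes next."""
    words = text.lower().split()
    return [
        (tuple(words[i - context_size:i]), words[i])
        for i in range(context_size, len(words))
    ]


def train_bigram_model(text):
    """Count which word follows which: the simplest possible language model."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None


corpus = "the cat sat on the mat . the cat ate the fish ."
print(next_word_examples(corpus)[:3])
# → [(('the', 'cat'), 'sat'), (('cat', 'sat'), 'on'), (('sat', 'on'), 'the')]
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # → cat
```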
For GPT-2, OpenAI built its own new dataset, WebText, from the text of millions of web pages. Unsurprisingly, GPT-2 set new state-of-the-art results on several language modeling benchmarks, causing an even bigger stir. But OpenAI said it was so concerned that people would use GPT-2 “to produce deceptive, biased or abusive language” that it initially withheld the full model. GPT-2 was impressive, but its successor, GPT-3, made an even bigger splash with a huge leap in its ability to generate human-like text. GPT-3 can answer questions, summarize documents, and write stories in different styles, and it is remarkably good at imitation. One of the most striking takeaways is that GPT-3’s gains came from massively scaling up existing techniques rather than inventing new ones. With 175 billion parameters, GPT-3 is far larger than the first two GPT models, yet its architecture is not fundamentally different from GPT-2’s. Its training data came from a filtered version of the Common Crawl web dataset plus other web text (about 429 billion tokens), Wikipedia articles (3 billion tokens), and two book corpora (67 billion tokens in total).
GPT-3 launched without a broad public interface: users had to apply and be approved before they could sign up, so few people had direct experience with the model. After early testing, OpenAI commercialized GPT-3, letting paying users connect to it through an application programming interface (API) and use it for the language tasks they needed. In September 2020, Microsoft obtained an exclusive license to GPT-3, giving it exclusive access to the underlying model and code, while everyone else could use the model only through OpenAI’s API.
Meanwhile, the risks of large language models came under closer scrutiny. Timnit Gebru, co-lead of Google’s Ethical AI team, co-authored a paper highlighting the potential harms of large language models, and the paper was unpopular with senior managers at the company. In December 2020, Gebru was fired.
To make GPT-3 better at following instructions, OpenAI developed InstructGPT, released in January 2022. Human labelers wrote example answers to prompts, and OpenAI used this data to fine-tune GPT-3 with supervised training. Labelers then ranked sample answers generated by the fine-tuned model; these rankings trained a reward model, which OpenAI used, together with more labeled data, to keep optimizing the fine-tuned language model over several iterations. InstructGPT is better at following human instructions and produces less offensive language, less misinformation, and fewer mistakes overall.
A common problem with large language models is the cost of training them, which puts them within reach of only the wealthiest labs. This raises concerns that such powerful AI is being developed behind closed doors by small corporate teams, without proper scrutiny or input from the wider research community. In response, several collaborative projects built large language models and released them free of charge to any researcher who wanted to study and improve the technology. Meta built OPT, a reimplementation of GPT-3, and made it freely available to researchers, while Hugging Face led a consortium of about 1,000 volunteer researchers to build and release BLOOM.
Finally, on November 30, 2022, OpenAI released ChatGPT. Like InstructGPT, ChatGPT is a chatbot that OpenAI built by fine-tuning a GPT-3.5 model; according to OpenAI’s website, it is a sibling model to InstructGPT. Also like InstructGPT, ChatGPT was trained with reinforcement learning on feedback from human testers, who rated its responses and rewarded it for being a fluent, accurate, and harmless conversational partner. Within about two months, some 100 million people worldwide were chatting with it.
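The reward-model step behind this kind of training can be sketched in a few lines. In OpenAI’s InstructGPT paper, the reward model is trained on human comparisons with a pairwise loss of the form -log σ(r(chosen) - r(rejected)), which is small when the preferred answer scores higher. The Python sketch below applies that loss to a hypothetical linear reward model over two made-up features; in practice the reward model is itself a large language model scoring full responses, and the later reinforcement-learning step that optimizes the chatbot against it is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(weights, features):
    """Hypothetical linear reward model: score = weights · features."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_loss(weights, chosen, rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the preferred answer scores higher."""
    return -math.log(sigmoid(reward(weights, chosen) - reward(weights, rejected)))


def train_reward_model(comparisons, n_features, lr=0.5, epochs=200):
    """Plain gradient descent on the pairwise loss over (chosen, rejected) pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -(1 - sigmoid(margin)), so step each weight
            # in the direction of (chosen - rejected), scaled by that factor.
            scale = 1.0 - sigmoid(margin)
            weights = [w + lr * scale * (c - r) for w, c, r in zip(weights, chosen, rejected)]
    return weights


# Hypothetical features per answer: [follows_instruction, contains_insult].
comparisons = [
    ([1.0, 0.0], [0.0, 0.0]),  # labelers preferred the answer that followed the instruction
    ([1.0, 0.0], [1.0, 1.0]),  # ... and the polite answer over the insulting one
    ([0.0, 0.0], [0.0, 1.0]),
]
w = train_reward_model(comparisons, n_features=2)
print(w)  # first weight positive, second weight negative
```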
Conversations that users have posted on social media show ChatGPT handling a wide range of everyday text-based tasks, including writing and debugging code, translating text, writing novels and business copy, creating recipes, doing homework, and grading assignments. One way ChatGPT improves on GPT-3 is that its answers feel more like a conversation with the user, whereas GPT-3 is better at producing long articles and lacks a conversational tone. ChatGPT’s overnight popularity drew enormous attention worldwide, and some industry insiders believe it will affect areas including search engines, advertising, and education. In December 2022, Google reportedly declared a “code red” internally and began an emergency response. In an interview with TIME, ChatGPT itself said that it still has many limitations, but that humans should be prepared to deal with AI.