April 4, 2022

Posted by Sharan Narang and Aakanksha Chowdhery, Software Engineers, Google Research

In recent years, large neural networks trained for language understanding and generation have achieved impressive results across a wide range of tasks. GPT-3 first showed that large language models (LLMs) can be used for few-shot learning and can achieve impressive results without large-scale task-specific data collection or model parameter updating. More recent LLMs, such as GLaM, LaMDA, Gopher, and Megatron-Turing NLG, achieved state-of-the-art few-shot results on many tasks by scaling model size, using sparsely activated modules, and training on larger datasets from more diverse sources. Yet much work remains in understanding the capabilities that emerge with few-shot learning as we push the limits of model scale.

Last year Google Research announced our vision for Pathways, a single model that could generalize across domains and tasks while being highly efficient. An important milestone toward realizing this vision was to develop the new Pathways system to orchestrate distributed computation for accelerators. In “PaLM: Scaling Language Modeling with Pathways”, we introduce the Pathways Language Model (PaLM), a 540-billion parameter, dense decoder-only Transformer model trained with the Pathways system, which enabled us to efficiently train a single model across multiple TPU v4 Pods. We evaluated PaLM on hundreds of language understanding and generation tasks, and found that it achieves state-of-the-art few-shot performance across most tasks, by significant margins in many cases.

As the scale of the model increases, the performance improves across tasks while also unlocking new capabilities.

Training a 540-Billion Parameter Language Model with Pathways

PaLM demonstrates the first large-scale use of the Pathways system to scale training to 6144 chips, the largest TPU-based system configuration used for training to date. The training is scaled using data parallelism at the Pod level across two Cloud TPU v4 Pods, while using standard data and model parallelism within each Pod. This is a significant increase in scale compared to most previous LLMs, which were either trained on a single TPU v3 Pod (e.g., GLaM, LaMDA), used pipeline parallelism to scale to 2240 A100 GPUs across GPU clusters (Megatron-Turing NLG) or used multiple TPU v3 Pods (Gopher) with a maximum scale of 4096 TPU v3 chips.
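
As a rough illustration of these two levels of parallelism, the toy NumPy sketch below splits a weight matrix column-wise across hypothetical "model-parallel" shards and averages gradients across "data-parallel" replicas standing in for the two pods. It is a conceptual sketch only, not the Pathways or PaLM training code; the shapes, shard counts, and loss are made up for illustration.

```python
import numpy as np

d_model, d_ff = 8, 16
n_model_shards, n_replicas = 4, 2    # stand-ins for chips within a pod / the two pods
rng = np.random.default_rng(0)
W = rng.normal(size=(d_model, d_ff))

# Model parallelism: each shard holds a column slice of the weight matrix.
shards = np.split(W, n_model_shards, axis=1)

def forward(x):
    # Each shard computes its slice of the output; the concatenation stands in
    # for the on-device all-gather the real system performs.
    return np.concatenate([x @ s for s in shards], axis=1)

def grad_wrt_W(x):
    # Gradient of loss = mean(x @ W) with respect to W.
    dy = np.full((x.shape[0], d_ff), 1.0 / (x.shape[0] * d_ff))
    return x.T @ dy

# Data parallelism: each replica processes a different micro-batch, and the
# per-replica gradients are averaged (an all-reduce across pods in the real system).
batches = [rng.normal(size=(4, d_model)) for _ in range(n_replicas)]
y = forward(batches[0])              # (4, 16), numerically the same as batches[0] @ W
avg_grad = np.mean([grad_wrt_W(b) for b in batches], axis=0)
W -= 1e-3 * avg_grad                 # one synchronous update shared by every replica
```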

PaLM achieves a training efficiency of 57.8% hardware FLOPs utilization, the highest yet achieved for LLMs at this scale. This is due to a combination of the parallelism strategy and a reformulation of the Transformer block that allows for attention and feedforward layers to be computed in parallel, enabling speedups from TPU compiler optimizations.
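
Concretely, the parallel formulation computes y = x + MLP(LayerNorm(x)) + Attention(LayerNorm(x)) instead of applying the two sublayers one after the other, so both large matrix multiplications read the same normalized input and can be fused and issued together. The NumPy sketch below contrasts the two formulations; it uses simplified single-head attention with no masking, dropout, or the other details of PaLM's actual layers, and all shapes are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Simplified single-head self-attention, no masking.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def mlp(x, W1, W2):
    return np.maximum(x @ W1, 0.0) @ W2   # ReLU as a stand-in activation

def serial_block(x, p):
    # Standard formulation: attention, then feedforward, each with its own residual.
    y = x + attention(layer_norm(x), p["Wq"], p["Wk"], p["Wv"])
    return y + mlp(layer_norm(y), p["W1"], p["W2"])

def parallel_block(x, p):
    # Parallel formulation: both sublayers read the same normalized input,
    # so their matrix multiplications can run concurrently.
    h = layer_norm(x)
    return x + attention(h, p["Wq"], p["Wk"], p["Wv"]) + mlp(h, p["W1"], p["W2"])

rng = np.random.default_rng(0)
d, d_ff, seq = 16, 64, 8
p = {"Wq": rng.normal(size=(d, d)) * 0.1, "Wk": rng.normal(size=(d, d)) * 0.1,
     "Wv": rng.normal(size=(d, d)) * 0.1, "W1": rng.normal(size=(d, d_ff)) * 0.1,
     "W2": rng.normal(size=(d_ff, d)) * 0.1}
x = rng.normal(size=(seq, d))
y_serial, y_parallel = serial_block(x, p), parallel_block(x, p)
```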

PaLM was trained using a combination of English and multilingual datasets that include high-quality web documents, books, Wikipedia, conversations, and GitHub code. We also created a “lossless” vocabulary that preserves all whitespace (especially important for code), splits out-of-vocabulary Unicode characters into bytes, and splits numbers into individual tokens, one for each digit.
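
The sketch below illustrates the rules called out above: digit-by-digit splitting of numbers, a byte-level fallback for out-of-vocabulary characters, and exact preservation of whitespace. The tiny token inventory and splitting regex are hypothetical stand-ins, not PaLM's actual SentencePiece vocabulary.

```python
import re

# A tiny, hypothetical vocabulary: the real model uses a learned SentencePiece
# vocabulary with hundreds of thousands of entries; this is only to show the rules.
KNOWN_PIECES = {"def", "return", " ", "\n", "\t", "+", "(", ")", ":", "x", "y"}

def tokenize(text):
    tokens = []
    # Split into single digits, single whitespace characters, word runs, or
    # single symbols, so whitespace is preserved exactly (important for code).
    for piece in re.findall(r"\d|\s|\w+|.", text):
        if piece.isdigit():
            tokens.append(piece)                  # numbers: one token per digit
        elif piece in KNOWN_PIECES:
            tokens.append(piece)                  # in-vocabulary piece
        else:
            # Out-of-vocabulary: fall back to UTF-8 bytes so nothing is lost.
            tokens.extend(f"<0x{b:02X}>" for b in piece.encode("utf-8"))
    return tokens

print(tokenize("def f(x):\n\treturn x + 42"))
# The out-of-vocabulary "f" falls back to its UTF-8 byte, the whitespace
# (including the tab) survives exactly, and "42" becomes ["4", "2"].
```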


Breakthrough Capabilities on Language, Reasoning, and Code Tasks

PaLM shows breakthrough capabilities on numerous very difficult tasks. We highlight a few examples for language understanding and generation, reasoning, and code-related tasks below.


Language Understanding and Generation

We evaluated PaLM on 29 widely-used English natural language processing (NLP) tasks. PaLM 540B surpassed the few-shot performance of prior large models, such as GLaM, GPT-3, Megatron-Turing NLG, Gopher, Chinchilla, and LaMDA, on 28 of 29 tasks that span question-answering tasks (open-domain closed-book variant), cloze and sentence-completion tasks, Winograd-style tasks, in-context reading comprehension tasks, common-sense reasoning tasks, SuperGLUE tasks, and natural language inference tasks.

PaLM 540B performance improvement over prior state-of-the-art (SOTA) results on 29 English-based NLP tasks.

In addition to English NLP tasks, PaLM also shows strong performance on multilingual NLP benchmarks, including translation, even though only 22% of the training corpus is non-English.

We also probe emerging and future capabilities of PaLM on the Beyond the Imitation Game Benchmark (BIG-bench), a recently released suite of more than 150 new language modeling tasks, and find that PaLM achieves breakthrough performance. We compare the performance of PaLM to Gopher and Chinchilla, averaged across a common subset of 58 of these tasks. Interestingly, we note that PaLM’s performance as a function of scale follows a log-linear behavior similar to prior models, suggesting that performance improvements from scale have not yet plateaued. PaLM 540B 5-shot also does better than the average performance of people asked to solve the same tasks.

Scaling behavior of PaLM on a subset of 58 BIG-bench tasks. 

PaLM demonstrates impressive natural language understanding and generation capabilities on several BIG-bench tasks. For example, the model can distinguish cause and effect, understand conceptual combinations in appropriate contexts, and even guess the movie from an emoji.

Examples that showcase PaLM 540B 1-shot performance on BIG-bench tasks: labeling cause and effect, conceptual understanding, guessing movies from emoji, and finding synonyms and counterfactuals.

Reasoning

By combining model scale with chain-of-thought prompting, PaLM shows breakthrough capabilities on reasoning tasks that require multi-step arithmetic or common-sense reasoning. Prior LLMs, like Gopher, saw less benefit from model scale in improving performance.

Standard prompting versus chain-of-thought prompting for an example grade-school math problem. Chain-of-thought prompting decomposes the prompt for a multi-step reasoning problem into intermediate steps (highlighted in yellow), similar to how a person would approach it.
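
To make the contrast concrete, the sketch below assembles the two prompt styles as plain strings around a single worked exemplar (the paper's setup uses several, e.g., 8-shot). The exemplar text is illustrative, and the `generate` function is a hypothetical placeholder for whatever API serves the model.

```python
# One worked exemplar followed by the question we actually want answered.
QUESTION = ("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?")

standard_prompt = (
    "Q: There are 15 trees in the grove. Grove workers plant trees today. "
    "After they are done, there are 21 trees. How many trees did they plant?\n"
    "A: The answer is 6.\n\n"
    f"Q: {QUESTION}\nA:"
)

chain_of_thought_prompt = (
    "Q: There are 15 trees in the grove. Grove workers plant trees today. "
    "After they are done, there are 21 trees. How many trees did they plant?\n"
    "A: There were 15 trees originally. After planting there were 21, "
    "so they planted 21 - 15 = 6 trees. The answer is 6.\n\n"
    f"Q: {QUESTION}\nA:"
)

def generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM serving API."""
    raise NotImplementedError

# With the chain-of-thought exemplar, the model is nudged to emit its own
# intermediate steps before the final answer, which is where the gains on
# GSM8K-style problems come from.
```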

We observed strong performance from PaLM 540B combined with chain-of-thought prompting on three arithmetic datasets and two commonsense reasoning datasets. For example, with 8-shot prompting, PaLM solves 58% of the problems in GSM8K, a benchmark of thousands of challenging grade school level math questions, outperforming the prior top score of 55% achieved by fine-tuning the GPT-3 175B model with a training set of 7500 problems and combining it with an external calculator and verifier.

This new score is especially interesting, as it approaches the 60% average of problems solved by 9- to 12-year-olds, who are the target audience for the question set. We suspect that the separate encoding of digits in the PaLM vocabulary helps enable these performance improvements.

Remarkably, PaLM can even generate explicit explanations for scenarios that require a complex combination of multi-step logical inference, world knowledge, and deep language understanding. For example, it can provide high-quality explanations for novel jokes not found on the web.

PaLM explains an original joke with two-shot prompts.

Code Generation

LLMs have also been shown [1, 2, 3, 4] to generalize well to coding tasks, such as writing code given a natural language description (text-to-code), translating code from one language to another, and fixing compilation errors (code-to-code).

PaLM 540B shows strong performance across coding tasks and natural language tasks in a single model, even though it has only 5% code in the pre-training dataset. Its few-shot performance is especially remarkable because it is on par with the fine-tuned Codex 12B while using 50 times less Python code for training. This result reinforces earlier findings that larger models can be more sample efficient than smaller models because they transfer learning from both other programming languages and natural language data more effectively.

Examples of a fine-tuned PaLM 540B model on text-to-code tasks, such as GSM8K-Python and HumanEval, and code-to-code tasks, such as Transcoder.
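
For text-to-code benchmarks such as HumanEval, a completion is typically scored by functional correctness: it counts as solved only if the generated code passes the task's unit tests. The sketch below shows that criterion in its simplest form; the candidate completion and tests are toy examples, and a real evaluation harness sandboxes execution far more carefully.

```python
# A minimal sketch of the functional-correctness criterion: a completion is
# correct only if it passes the task's unit tests. Naive, unsandboxed execution.
candidate_completion = """
def add(a, b):
    return a + b
"""

unit_tests = """
assert add(2, 3) == 5
assert add(-1, 1) == 0
"""

def passes_tests(completion: str, tests: str) -> bool:
    namespace: dict = {}
    try:
        exec(completion, namespace)   # define the generated function
        exec(tests, namespace)        # run the benchmark's assertions
        return True
    except Exception:
        return False

print(passes_tests(candidate_completion, unit_tests))  # True for this toy example
```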

We also see a further increase in performance by fine-tuning PaLM on a Python-only code dataset, which we refer to as PaLM-Coder. For an example code repair task called DeepFix, where the objective is to modify initially broken C programs until they compile successfully, PaLM-Coder 540B demonstrates impressive performance, achieving a compile rate of 82.1%, which outperforms the prior 71.7% state of the art. This opens up opportunities for fixing more complex errors that arise during software development.

An example from the DeepFix Code Repair task. The fine-tuned PaLM-Coder 540B fixes compilation errors (left, in red) to a version of code that compiles (right).
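
Measuring compile rate for a DeepFix-style evaluation reduces to asking the compiler whether each proposed fix is accepted. The sketch below shows one way such a check might look, assuming gcc is available locally; the C snippets are toy examples, and the benchmark's actual harness and compiler flags may differ.

```python
import os
import subprocess
import tempfile

def compiles(c_source: str) -> bool:
    # Write the program to a temporary file and ask gcc to syntax-check it.
    with tempfile.NamedTemporaryFile(suffix=".c", delete=False, mode="w") as f:
        f.write(c_source)
        path = f.name
    try:
        result = subprocess.run(["gcc", "-fsyntax-only", path],
                                capture_output=True, text=True)
        return result.returncode == 0
    finally:
        os.remove(path)

broken   = "int main() { int x = 5 return x; }"     # missing semicolon
repaired = "int main() { int x = 5; return x; }"    # model-proposed fix
print(compiles(broken), compiles(repaired))          # False True

# Compile rate is then the fraction of benchmark programs for which the
# model's proposed fix compiles successfully.
```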

Ethical Considerations

Recent research has highlighted various potential risks associated with LLMs trained on web text. It is crucial to analyze and document such potential undesirable risks through transparent artifacts such as model cards and datasheets, which also include information on intended use and testing. To this end, our paper provides a datasheet, model card and Responsible AI benchmark results, and it reports thorough analyses of the dataset and model outputs for biases and risks. While the analysis helps outline some potential risks of the model, domain- and task-specific analysis is essential to truly calibrate, contextualize, and mitigate possible harms. Further understanding of risks and benefits of these models is a topic of ongoing research, together with developing scalable solutions that can place guardrails against malicious uses of language models.


Conclusion and Future Work

PaLM demonstrates the scaling capability of the Pathways system to thousands of accelerator chips across two TPU v4 Pods by training a 540-billion parameter model efficiently with a well-studied, well-established recipe of a dense decoder-only Transformer model. Pushing the limits of model scale enables breakthrough few-shot performance of PaLM across a variety of natural language processing, reasoning, and code tasks.

PaLM paves the way for even more capable models by combining the scaling capabilities with novel architectural choices and training schemes, and brings us closer to the Pathways vision:


“Enable a single AI system to generalize across thousands or millions of tasks, to understand different types of data, and to do so with remarkable efficiency.”


Acknowledgements

PaLM is the result of a large, collaborative effort by many teams within Google Research and across Alphabet. We’d like to thank the entire PaLM team for their contributions: Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, and Jason Wei. PaLM builds on top of work by many, many teams at Google and we would especially like to recognize the T5X team, the Pathways infrastructure team, the JAX team, the Flaxformer team, the XLA team, the Plaque team, the Borg team, and the Datacenter networking infrastructure team. We’d like to thank our co-authors on this blog post, Alexander Spiridonov and Maysam Moussalem, as well as Josh Newlan and Tom Small for the images and animations in this blog post. Finally, we would like to thank our advisors for the project: Noah Fiedel, Slav Petrov, Jeff Dean, Douglas Eck, and Kathy Meier-Hellstern.
