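Stanford University has developed a language model called "Alpaca". This is big news not because Stanford is fond of alpacas, but because this "Alpaca" may let even the cheapest small device around you run an AI comparable to OpenAI's text-davinci-003.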
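Stanford fine-tuned the 7B LLaMA model on only 52K examples and reached results similar to text-davinci-003 (OpenAI's flagship GPT model, and the most expensive one on the OpenAI API). The model can also run on consumer devices such as the Raspberry Pi, and someone has already gotten it working.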
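In the words of the well-known blogger Orange.ai: "This model has not received any ethics training, which means it may say things that break taboos in many countries. If everyone ends up with their own local language model, censorship will stop working altogether. Its training cost is remarkably low: the data generation process produced 52K unique instructions and corresponding outputs for less than $500 using the OpenAI API, and fine-tuning a 7B LLaMA model on eight 80GB A100s takes 3 hours, which costs less than $100 on most cloud computing providers."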
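The compute figure in the quote can be sanity-checked with simple arithmetic: 8 GPUs for 3 hours is 24 A100-hours, so a bill under $100 implies a price below roughly $4.17 per A100-hour. A minimal sketch, assuming the job is billed linearly per GPU-hour (actual cloud pricing varies by provider and instance type):

```python
# Back-of-the-envelope check of the quoted fine-tuning cost.
# Assumption: cost is billed linearly per GPU-hour; real cloud pricing varies.


def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total GPU-hours consumed by a job."""
    return num_gpus * hours


def max_hourly_rate(budget_usd: float, num_gpus: int, hours: float) -> float:
    """Highest per-GPU-hour price that keeps the job within the budget."""
    return budget_usd / gpu_hours(num_gpus, hours)


if __name__ == "__main__":
    total = gpu_hours(8, 3)               # eight 80GB A100s for 3 hours
    rate = max_hourly_rate(100.0, 8, 3)   # the "< $100" figure from the quote
    print(f"{total:.0f} GPU-hours, break-even rate ${rate:.2f}/GPU-hour")
```

Run as a script, this prints `24 GPU-hours, break-even rate $4.17/GPU-hour`.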
Below are the details of Alpaca (you can also head over to GitHub: https://github.com/tatsu-lab/stanford_alpaca).
This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:
A web demo to interact with our Alpaca model
The 52K data used for fine-tuning the model
The code for generating the data
The current Alpaca model is fine-tuned from a 7B LLaMA model [1] on 52K instruction-following data generated by the techniques in the Self-Instruct [2] paper, with some modifications that we discuss in the next section. In a preliminary human evaluation, we found that the Alpaca 7B model behaves similarly to the text-davinci-003 model on the Self-Instruct instruction-following evaluation suite [2].
Alpaca is still under development, and there are many limitations that have to be addressed. Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless. We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.
Our initial release contains the data generation procedure, dataset, and training recipe. We intend to release the model weights if we are given permission to do so by the creators of LLaMA. For now, we have chosen to host a live demo to help readers better understand the capabilities and limits of Alpaca, as well as a way to help us better evaluate Alpaca's performance on a broader audience.
Please read our release blog post for more details about the model, our discussion of the potential harm and limitations of Alpaca models, and our thought process of an open-source release.
[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1
[2]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560
alpaca_data.json contains the 52K instruction-following examples we used for fine-tuning the Alpaca model. This JSON file is a list of dictionaries; each dictionary contains the following fields:
instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
input: str, optional context or input for the task. For example, when the instruction is 'Summarize the following article', the input is the article. Around 40% of the examples have an input.
output: str, the answer to the instruction as generated by text-davinci-003.
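The field description above is easy to check mechanically. A minimal sketch, assuming every record stores all three keys as strings, with an absent input stored as an empty string (which is how the prompt templates distinguish "non-empty" from "empty" input fields); the function names are hypothetical, not part of the repo:

```python
import json
from typing import Any

# Field names as documented; every value is expected to be a string.
REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_records(records: list[dict[str, Any]]) -> None:
    """Raise ValueError on a missing/non-string field or a repeated instruction."""
    seen: set[str] = set()
    for i, rec in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not isinstance(rec.get(field), str):
                raise ValueError(f"record {i}: field {field!r} is missing or not a string")
        # The dataset description says each of the 52K instructions is unique.
        if rec["instruction"] in seen:
            raise ValueError(f"record {i}: duplicate instruction")
        seen.add(rec["instruction"])


def fraction_with_input(records: list[dict[str, Any]]) -> float:
    """Share of records whose optional input is non-empty (around 40% per the description)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["input"]) / len(records)


def load_records(path: str) -> list[dict[str, Any]]:
    """Load a JSON list of instruction/input/output dicts and validate it."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    validate_records(records)
    return records
```

Usage would look like `records = load_records("alpaca_data.json")` followed by `fraction_with_input(records)`, with the path adjusted to wherever the file was downloaded.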
We used the following prompts for fine-tuning the Alpaca model:
for examples with a non-empty input field:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
for examples with an empty input field:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
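Choosing between the two templates depends only on whether the input field is empty. A minimal sketch of that selection; the template text is copied from above, but the blank line between sections ("\n\n") is an assumption, since the exact whitespace is not spelled out here, and `build_prompt` is a hypothetical helper name:

```python
# Template wording copied from the prompts above.
# Assumption: sections are separated by a blank line ("\n\n").
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(example: dict[str, str]) -> str:
    """Pick the template based on whether the example's input field is non-empty."""
    if example.get("input", ""):
        return PROMPT_WITH_INPUT.format(
            instruction=example["instruction"], input=example["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])
```

During fine-tuning, the model's target would be the record's `output` text following the `### Response:` line.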
We built on the data generation pipeline from self-instruct and made the following modifications:
We used text-davinci-003 to generate the instruction data instead of davinci.
We wrote a new prompt (prompt.txt) that explicitly spells out the instruction-generation requirements for text-davinci-003.
We adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation.
We simplified the data generation pipeline by discarding the difference between classification and non-classification instructions.
We only generated a single instance for each instruction, instead of 2 to 3 instances as in Self-Instruct [2].
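Why batch decoding cuts cost can be seen with a simple token-count model: the long generation prompt is billed once per request, so asking for 20 instructions per request spreads that cost over 20 outputs. A minimal sketch; the 1,000-token prompt and 50-token instruction sizes are hypothetical, and it assumes prompt and completion tokens are billed at the same rate and that no generated instructions are filtered out:

```python
import math


def tokens_per_instruction(prompt_tokens: int, tokens_per_output: int, batch_size: int) -> float:
    """Billed tokens per generated instruction when one request returns `batch_size` of them.

    The shared prompt is paid once per request; each generated instruction adds its own tokens.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return (prompt_tokens + batch_size * tokens_per_output) / batch_size


def requests_needed(target_instructions: int, batch_size: int) -> int:
    """API requests needed to reach the target, assuming nothing is filtered out."""
    return math.ceil(target_instructions / batch_size)


if __name__ == "__main__":
    # Hypothetical sizes: 1,000-token prompt, 50 tokens per generated instruction.
    for batch in (1, 20):
        print(batch, tokens_per_instruction(1000, 50, batch), requests_needed(52_000, batch))
```

With these made-up sizes, the per-instruction cost drops from 1050.0 tokens (one instruction per request) to 100.0 tokens (20 per request), and the request count drops from 52,000 to 2,600.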
This produced an instruction-following dataset of 52K examples at a much lower cost (less than $500). In a preliminary study, we also found our 52K generated examples to be much more diverse than the data released by Self-Instruct. We plotted the figure below (in the style of Figure 2 in the Self-Instruct paper) to demonstrate the diversity of our data. The inner circle of the plot represents the root verb of the instructions, and the outer circle represents their direct objects.
We fine-tune our model using standard Hugging Face training code with the following hyperparameters:
| Hyperparameter | Value |
|---|---|
| Batch size | 128 |
| Learning rate | 2e-5 |
| Epochs | 3 |
| Max length | 512 |
| Weight decay | 1 |
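A global batch size of 128 is usually reached by combining per-GPU micro-batches with gradient accumulation. A minimal sketch of that arithmetic, assuming 8 GPUs (the A100 count quoted earlier) and a hypothetical 4 examples per GPU, and using 52,000 as a round stand-in for "52K"; `FinetuneConfig` is an illustrative name, not the repo's code:

```python
import math
from dataclasses import dataclass


@dataclass
class FinetuneConfig:
    # Values taken from the hyperparameter table.
    global_batch_size: int = 128
    learning_rate: float = 2e-5
    epochs: int = 3
    max_length: int = 512
    weight_decay: float = 1.0
    # Hypothetical hardware split: 8 GPUs, 4 examples per GPU per forward pass.
    num_gpus: int = 8
    per_device_batch_size: int = 4

    def grad_accum_steps(self) -> int:
        """Micro-batches to accumulate so each optimizer step covers global_batch_size examples."""
        per_pass = self.num_gpus * self.per_device_batch_size
        if self.global_batch_size % per_pass != 0:
            raise ValueError("global_batch_size must be divisible by num_gpus * per_device_batch_size")
        return self.global_batch_size // per_pass

    def optimizer_steps(self, num_examples: int) -> int:
        """Total optimizer steps, counting a final partial batch in each epoch as one step."""
        return math.ceil(num_examples / self.global_batch_size) * self.epochs


if __name__ == "__main__":
    cfg = FinetuneConfig()
    print(cfg.grad_accum_steps(), cfg.optimizer_steps(52_000))
```

This prints `4 1221`. In Hugging Face `transformers`, these values map onto `TrainingArguments` fields such as `per_device_train_batch_size`, `gradient_accumulation_steps`, `learning_rate`, `num_train_epochs` and `weight_decay`, while the max length is applied when tokenizing.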
We are waiting for Hugging Face to officially support the LLaMA models (i.e., for this PR to be merged) before we release a stable version of the fine-tuning code.
All grad students below contributed equally and the order is determined by random draw.
Rohan Taori
Ishaan Gulrajani
Tianyi Zhang
Yann Dubois
Xuechen Li
All advised by Tatsunori B. Hashimoto. Yann is also advised by Percy Liang and Xuechen is also advised by Carlos Guestrin.
Please cite the repo if you use the data or code in this repo.
@misc{alpaca,
author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
title = {Stanford Alpaca: An Instruction-following LLaMA model},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].
Reader Q&A (tool sharing)
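A friend asked how I manage to get first-hand AI news. I suspect our habits differ: I like to keep many browser tabs open and check what has been updated. Google's own browser (Chrome) does consume a lot of resources, so you could try other browsers such as SigmaOS or Sidekick. They use fewer resources, have a more compact tab layout, and, being based on Chrome, can install Chrome extensions.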
Here is my Sidekick invite link:
https://join.meetsidekick.com/hgike