罗布斯 - 生活无捷径，每步皆机会。

# 背景在许多问答应用程序中，我们希望允许用户进行来回的对话，这意味着应用程序需要对过去的问题和答案进行某种 “记忆”，并将这些问题和答案纳入当前思维中。需要做到两件事：提示：更新我们的提示，以支持历史消息作为输入。情境化问题：添加一个子链，它接受最新的用户问题，并在聊天历史的上下文中重新表达。这是需要的，以防最新的问题引用了过去消息中的某些上下文。例如，如果用户问一个后续问题，如 “你能详细说明第二点吗？" ，如果没有先前消息的上下文，这是无法理解的。因此，我们不能有效地执行检索这样的问题。 # 实战代码 # 所需依赖 12345678# BeautifulSoup (通常缩写为：BS4) 用于解析 HTML 和 XML 文件。它提供了从 HTML 和 XML 文档中提取数据的功能，以及对解析树进行导航的工具。import bs4from langchain import hubfrom langchain_community.document_loaders import WebBaseLoaderfrom langchain_communit ...

LLM

LangChain

未读

Q&A RAG 快速入门

发表于2024-04-082024-07-18 AI LLM 大语言模型 LangChain RAG

# 什么是 RAG RAG 全称：Retrieval-Augmented Generation RAG 是一种使用额外数据增强 LLM 知识的技术。最强大的应用程序 LLMs 之一是复杂的问答（Q&A）聊天机器人。这些应用程序可以回答有关特定源信息的问题。这些应用程序使用一种称为检索增强生成（RAG）的技术。 LLMs 可以对广泛的主题进行推理，但他们的知识仅限于公共数据，直到他们接受培训的特定时间点。如果要构建可以推理私有数据或模型截止日期后引入的数据的 AI 应用程序，则需要使用模型所需的特定信息来增强模型的知识。引入适当信息并将其插入模型提示符的过程称为检索增强生成（RAG）。 # RAG 架构典型的 RAG 应用程序有两个主要组件：索引：用于从源引入数据并对其进行索引的管道。这通常发生在离线状态。检索和生成：实际的 RAG 链，它在运行时接受用户查询并从索引中检索相关数据，然后将其传递给模型。 # Indexing 索引加载：首先我们需要加载数据。这是使用 DocumentLoaders 完成的。拆分：文本拆分器将大 Documents 块拆分为 ...

GPT-4 trained on YouTube transcripts GPT-4 在 YouTube 记录上进行训练

news

最新AI新闻

未读

GPT-4 trained on YouTube transcripts GPT-4 在 YouTube 记录上进行训练

发表于2024-04-072024-07-18 AI 新闻

# GPT-4 trained on YouTube transcripts GPT-4 在 YouTube 成绩单上进行训练 Based on the information provided in the search results, OpenAI reportedly used transcriptions of over a million hours of YouTube videos to train GPT-4, its most advanced large language model. This was part of their effort to gather high-quality training data, which is crucial for the development and improvement of AI models like GPT-4. The company developed its Whisper audio transcription model to assist in this process, which allo ...

news

最新AI新闻

未读

2026 AI data drought 2026年AI数据干旱

发表于2024-04-032024-07-18 AI 新闻

# 2026 年 AI 数据干旱 The potential for a data drought in 2026 is a significant concern for the artificial intelligence (AI) industry, as highlighted by various sources. This situation arises from the rapid consumption of high-quality language data by AI systems, such as ChatGPT, which are trained on extensive datasets compiled from the internet. The demand for this data is outpacing the rate at which it is being produced, leading to predictions that the stock of language data suitable for training A ...

news

最新AI新闻

未读

Cheap AI data poisoning 廉价AI数据中毒

发表于2024-04-012024-07-18 AI 新闻

Data poisoning is a cybersecurity threat that targets the integrity of machine learning (ML) and artificial intelligence (AI) systems by deliberately manipulating the data used to train these models. This manipulation can lead to incorrect or biased outcomes from AI systems, making data poisoning a significant concern for the reliability and security of AI applications. The concept of data poisoning is not new, but its implications are becoming increasingly critical as AI and ML technologies bec ...

LLM

LangChain

未读

如何将多个提示(prompts)组合在一起

发表于2024-04-012024-07-18 AI LLM 大语言模型 LangChain prompt

# 如何将多个提示组合在一起。当您想要重用部分提示时，这可能很有用。这可以通过 PipelinePrompt 完成。PipelinePrompt 由两个主要部分组成：最终提示：返回的最后一个提示管道提示：元组列表，由字符串名称和提示模板组成。每个提示模板都将被格式化，然后作为同名变量传递给将来的提示模板。 12345678910111213141516171819202122232425262728293031323334353637# Pipelinefrom langchain.prompts.pipeline import PipelinePromptTemplatefrom langchain.prompts.prompt import PromptTemplatefull_template = """{introduction}{example}{start}"""full_prompt = PromptTemplate.from_template ...

LLM

LangChain

未读

LangChain LCEL 链式调用

发表于2024-04-012024-07-18 AI LLM 大语言模型 LangChain MoonshotAI 月之暗面 kimi chain

# 准备大模型实例 123456789101112131415161718import osos.environ["OPENAI_API_KEY"] = 'moonshot api key'os.environ["OPENAI_API_BASE"] = 'https://api.moonshot.cn/v1/'from langchain_openai import ChatOpenAIapi_key = os.getenv("OPENAI_API_KEY")base_url = os.getenv("OPENAI_API_BASE")print(api_key, base_url)model = ChatOpenAI( openai_api_base=base_url, openai_api_key=api_key, model_name="moonshot-v1-8k", temperature=1,) # 案例 1 1 ...

LLM

LangChain

未读

Chat Models 聊天模型

发表于2024-04-012024-07-18 AI LLM 大语言模型 LangChain prompt

# 概述聊天模型是 LangChain 的核心组件。聊天模型是一种语言模型，它使用聊天消息作为输入，并将聊天消息作为输出返回（而不是使用纯文本）。 LangChain 与许多模型提供商（OpenAI、Cohere、Hugging Face 等）集成，并公开了一个标准接口来与所有这些模型进行交互。 LangChain 允许您在同步、异步、批处理和流模式下使用模型，并提供其他功能（例如，缓存）等。 # 初始化 123456789101112131415161718192021222324252627import osfrom langchain_openai import OpenAIfrom langchain_openai import ChatOpenAIos.environ["OPENAI_API_KEY"] = 'api-key'os.environ["OPENAI_API_BASE"] = 'https://api.moonshot.cn/v1/'api_key = os.getenv(&quo ...

LLM

LangChain

未读

LangChain 完美兼容适配 MoonshotAI 国内AI，无需魔法即可调用，完美替代 OpenAI

发表于2024-03-292024-07-18 AI LLM 大语言模型 LangChain MoonshotAI 月之暗面 kimi

# 必知概念官方概念：（地址：https://python.langchain.com） LangChain 是一个开发由语言模型驱动的应用程序的框架。它使应用程序能够：上下文感知：将语言模型连接到上下文源（提示指令、少量示例、响应内容等）依靠语言模型进行推理（关于如何根据提供的上下文回答，采取什么行动等）说人话：类似于 Java 的 SpringBoot 框架，而 LangChain 就是 AI 界的 “SpringBoot” 框架 # 必要条件创建月之暗面的 api 账号，月之暗面 == MoonshotAI 官网地址：https://platform.moonshot.cn 现在注册新用户还会赠送15元钱模型计费单位价格 moonshot-v1-8k 1M tokens ¥12.00 moonshot-v1-32k 1M tokens ¥24.00 moonshot-v1-128k 1M tokens ¥60.00 此处 1M = 1,000,000 价格比 OpenAI 的相当的实惠了，不过免费的 15 块已经 ...