Pipeline

1. Basic concepts

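The most basic object in the Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, so that we can directly input any text and get an intelligible answer: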


from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")
# Or pass several sentences at once
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", 
     "I hate this so much!"]
)

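By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the classifier object. If you rerun the command, the cached model is used and nothing is downloaded again.

Three main steps are involved when you pass text to a pipeline:
- The text is preprocessed into a format the model can understand.
- The preprocessed inputs are passed to the model.
- The model's predictions are post-processed into a result that humans can understand.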
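
The three steps above can be pictured with a toy example. The sketch below is not how the library is implemented: the vocabulary, the weights, and the function names are all invented for illustration, and the "model" is just a sum of per-word scores.

```python
import math

# Toy vocabulary and weights, invented purely for illustration.
VOCAB = {"love": 0, "great": 1, "hate": 2, "awful": 3}
WEIGHTS = [1.5, 1.0, -2.0, -1.5]  # one score contribution per vocabulary entry


def preprocess(text):
    """Step 1: turn raw text into ids the toy model understands."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [VOCAB[w] for w in words if w in VOCAB]


def forward(input_ids):
    """Step 2: run the 'model' (here, a sum of per-word weights) to get a raw score."""
    return sum(WEIGHTS[i] for i in input_ids)


def postprocess(logit):
    """Step 3: convert the raw score into a label and a probability."""
    prob_positive = 1.0 / (1.0 + math.exp(-logit))
    if prob_positive >= 0.5:
        return {"label": "POSITIVE", "score": prob_positive}
    return {"label": "NEGATIVE", "score": 1.0 - prob_positive}


def toy_pipeline(text):
    return postprocess(forward(preprocess(text)))


result = toy_pipeline("I hate this so much!")
print(result)
```

A real pipeline replaces each stage with a tokenizer, a neural network, and task-specific decoding, but the data flows in the same order.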

Some of the currently available pipelines are:


feature-extraction (get the vector representation of a text)
fill-mask
ner (named entity recognition)
question-answering
sentiment-analysis
summarization
text-generation
translation
zero-shot-classification

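Zero-shot classification: classifies text directly with a pretrained model. You specify the candidate labels yourself, so you do not have to rely on the labels the model was trained with.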


from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the 🤗 Transformers library",
    candidate_labels=["education", "politics", "business"],
)

This pipeline is called zero-shot because you don't need to fine-tune the model on your data to use it.

Text generation: you provide a prompt and the model auto-completes it by generating the remaining text.


from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

You can control how many different sequences are generated with the num_return_sequences argument, and the total length of the output text with the max_length argument.

Mask filling: this task fills in the blanks in a given text.


from transformers import pipeline
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

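The top_k argument controls how many candidates are displayed. Note that here the model fills in the special <mask> word, which is often referred to as the mask token. Other models may use a different mask token, so always verify the correct mask token when exploring other models.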
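
Because the mask token differs between model families, it can help to keep it out of the prompt text. The sketch below uses a hypothetical helper, build_masked_prompt, and a small hand-written table of two well-known mask tokens; with a loaded pipeline you would normally read the token from unmasker.tokenizer.mask_token instead.

```python
# Mask tokens of two well-known model families; other checkpoints may differ.
KNOWN_MASK_TOKENS = {
    "roberta": "<mask>",
    "bert": "[MASK]",
}


def build_masked_prompt(template, mask_token):
    """Insert the model-specific mask token into a template with one {mask} slot."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one {mask} placeholder")
    return template.format(mask=mask_token)


template = "This course will teach you all about {mask} models."
for family, mask_token in KNOWN_MASK_TOKENS.items():
    print(family, "->", build_masked_prompt(template, mask_token))
```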

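Named entity recognition: named entity recognition (NER) is a task in which the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.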


from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

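Here the model correctly identified that Sylvain is a person (PER), Hugging Face an organization (ORG), and Brooklyn a location (LOC).

We pass the option grouped_entities=True when creating the pipeline to tell it to regroup the parts of the sentence that correspond to the same entity. (This option is deprecated in favor of aggregation_strategy, described under NerPipeline in section 4.3.)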

Question answering: the question-answering pipeline answers questions using information from a given context.


from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

Note that this pipeline works by extracting information from the provided context; it does not generate the answer from scratch.

Summarization: the task of reducing a text to a shorter one while keeping the main (important) information of the original.


from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

As with text generation, you can specify a max_length or a min_length for the result.

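Translation: you can use a default model if you provide a language pair in the task name (such as "translation_en_to_fr"), but the easiest way is to pick the model you want from the Model Hub. Here we translate from French to English: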


from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

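As with text generation and summarization, you can specify a max_length or a min_length for the result.

2. The pipeline abstraction

The pipeline abstraction is a wrapper around all the other available pipelines.

A pipeline can be called on a single item, a list of items, or even an entire dataset: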


from transformers import pipeline
pipe = pipeline("text-classification")

pipe("This restaurant is awesome")
pipe(["This restaurant is awesome", "This restaurant is awful"])

import datasets
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)

transformers.pipeline(): a utility function that builds a Pipeline. A Pipeline is made of three parts: a tokenizer, a model, and some task-specific post-processing.
```
transformers.pipeline( 
  task: str = None, 
  model: typing.Optional = None, 
  config: typing.Union[str, transformers.configuration_utils.PretrainedConfig, NoneType] = None, 
  tokenizer: typing.Union[str, transformers.tokenization_utils.PreTrainedTokenizer, transformers.tokenization_utils_fast.PreTrainedTokenizerFast, NoneType] = None, 
  feature_extractor: typing.Union[str, ForwardRef('SequenceFeatureExtractor'), NoneType] = None, 
  framework: typing.Optional[str] = None, 
  revision: typing.Optional[str] = None, 
  use_fast: bool = True, 
  use_auth_token: typing.Union[bool, str, NoneType] = None, 
  device: typing.Union[int, str, ForwardRef('torch.device'), NoneType] = None, 
  device_map = None, 
  torch_dtype = None, 
  trust_remote_code: typing.Optional[bool] = None, 
  model_kwargs: typing.Dict[str, typing.Any] = None, 
  pipeline_class: typing.Optional[typing.Any] = None, 
  **kwargs ) -> Pipeline
```
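Parameters:
- task: a string specifying the task type. Currently supported tasks include:
  - "audio-classification": returns an AudioClassificationPipeline.
  - "automatic-speech-recognition": returns an AutomaticSpeechRecognitionPipeline.
  - "conversational": returns a ConversationalPipeline.
  - "feature-extraction": returns a FeatureExtractionPipeline.
  - "fill-mask": returns a FillMaskPipeline.
  - "image-classification": returns an ImageClassificationPipeline.
  - "question-answering": returns a QuestionAnsweringPipeline.
  - "table-question-answering": returns a TableQuestionAnsweringPipeline.
  - "text2text-generation": returns a Text2TextGenerationPipeline.
  - "text-classification" (alias "sentiment-analysis"): returns a TextClassificationPipeline.
  - "text-generation": returns a TextGenerationPipeline.
  - "token-classification" (alias "ner"): returns a TokenClassificationPipeline.
  - "translation": returns a TranslationPipeline.
  - "translation_xx_to_yy": returns a TranslationPipeline.
  - "summarization": returns a SummarizationPipeline.
  - "zero-shot-classification": returns a ZeroShotClassificationPipeline.
- model: a string, PreTrainedModel, or TFPreTrainedModel specifying the model. If not provided, the default model for the task is used.
- config: a string or PretrainedConfig specifying the configuration used to instantiate the model. If not provided, the model's default configuration file is used.
- tokenizer: a string or PreTrainedTokenizer specifying the tokenizer. If not provided, the model's default tokenizer is used.
- feature_extractor: a string or PreTrainedFeatureExtractor specifying the feature extractor (used for non-NLP models such as speech, vision, and multimodal models). If not provided, the model's default feature extractor is used.
- framework: a string specifying the framework, either "pt" for PyTorch or "tf" for TensorFlow. If not provided, the installed framework is used; when both are installed, the framework of the model is used, or PyTorch if no model is given.
- revision: a string specifying the model version to use. It can be a git branch name, a git tag name, or a git commit id. Defaults to "main".
- use_fast: a boolean indicating whether to use a fast tokenizer.
- use_auth_token: a string or a boolean. If True, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used. If a string, it is used as the token itself.
- device: an integer, string, or torch.device that defines the device, e.g. "cpu", "cuda:1", "mps", or 1. The pipeline is allocated on this device.
- device_map: a string or a dict, sent as part of model_kwargs. When the accelerate library is installed, set device_map="auto" to compute the most optimized device_map automatically.
  Do not use device_map and device at the same time, as they conflict.
- torch_dtype: a string or torch.dtype, sent as part of model_kwargs. It specifies the precision to use for the model (torch.float16, torch.bfloat16, or "auto").
- trust_remote_code: a boolean indicating whether to allow custom code defined on the Hub. Only set this to True for repositories you trust and whose code you have read, because it executes code from the Hub on your local machine.
- model_kwargs: a dictionary of keyword arguments passed to the model's from_pretrained(..., **model_kwargs) function.
- kwargs: keyword arguments passed to the initializer of the specific pipeline.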

We can take advantage of the pipeline's batching ability when passing a list, a Dataset, or a generator:


from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)

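However, turning on batching may make things faster or slower, depending on the hardware, the data, and the actual model used.

Some rules of thumb:

- Measure performance on your own workload, with your own hardware, and judge on real data.
- If you are latency-constrained, don't batch.
- If you are using a CPU, don't batch.
- If you want throughput over a pile of static data, use a GPU.
- If you don't know the size of sequence_length, don't batch by default.
- Once you enable batching, make sure you can handle OOM errors gracefully.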
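
One reason batching can hurt is padding: every sequence in a batch is padded to the longest one, so a single outlier inflates the work for its whole batch. The sketch below counts processed tokens under that assumption; the helper name and the sequence lengths are invented for illustration, and real timings also depend on the hardware and the model.

```python
def padded_token_count(lengths, batch_size):
    """Total tokens processed when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


# Hypothetical token counts for eight inputs: seven short ones and one outlier.
lengths = [4, 5, 4, 6, 5, 4, 400, 5]

useful = sum(lengths)
for batch_size in (1, 4, 8):
    processed = padded_token_count(lengths, batch_size)
    print(f"batch_size={batch_size}: {processed} tokens processed, "
          f"{processed - useful} of them padding")
```

With these numbers, a batch size of 8 processes several times more tokens than the unbatched run, which is why measuring on real data matters more than any rule of thumb.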

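3. Implementing a new pipeline

Start by inheriting from the base class Pipeline. Four methods need to be implemented: preprocess, _forward, postprocess, and _sanitize_parameters.

This decomposition exists to support CPU/GPU relatively seamlessly, while allowing pre/postprocessing to run on the CPU in separate threads.

- preprocess: takes the originally defined inputs and turns them into something that can be fed to the model.
- _forward: an implementation detail that is not meant to be called directly (the forward method is the one to call).
- postprocess: turns the output of _forward into the final output.
- _sanitize_parameters: lets users pass any parameters whenever they wish, either at pipeline initialization or when the pipeline is called.
  _sanitize_parameters returns three dicts of kwargs, which are passed to preprocess, _forward, and postprocess respectively.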


import torch
from transformers import Pipeline


class MyPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # Route user-supplied kwargs to preprocess / _forward / postprocess.
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs, maybe_arg=2):
        # Assumes `inputs` is a dict that already holds token ids.
        model_input = torch.tensor(inputs["input_ids"])
        return {"model_input": model_input}

    def _forward(self, model_inputs):
        # model_inputs == {"model_input": model_input}
        outputs = self.model(**model_inputs)
        # Maybe {"logits": Tensor(...)}
        return outputs

    def postprocess(self, model_outputs):
        # Softmax turns the logits into class probabilities.
        probabilities = model_outputs["logits"].softmax(-1)
        return probabilities
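
To make the role of the three dictionaries concrete, here is a simplified stand-in for the call flow. SketchPipeline is not the real transformers.Pipeline and ScaleAndRound is an invented subclass; the sketch only shows how parameters given at construction time become defaults and how call-time parameters override them for one call.

```python
class SketchPipeline:
    """Simplified stand-in for the call flow; not the real transformers.Pipeline."""

    def __init__(self, **init_kwargs):
        # Parameters given at construction time become the defaults.
        self._defaults = self._sanitize_parameters(**init_kwargs)

    def __call__(self, inputs, **call_kwargs):
        # Parameters given at call time override the defaults for this call only.
        overrides = self._sanitize_parameters(**call_kwargs)
        pre, fwd, post = ({**d, **o} for d, o in zip(self._defaults, overrides))
        model_inputs = self.preprocess(inputs, **pre)
        model_outputs = self._forward(model_inputs, **fwd)
        return self.postprocess(model_outputs, **post)


class ScaleAndRound(SketchPipeline):
    def _sanitize_parameters(self, scale=None, digits=None):
        preprocess_kwargs = {} if scale is None else {"scale": scale}
        postprocess_kwargs = {} if digits is None else {"digits": digits}
        return preprocess_kwargs, {}, postprocess_kwargs

    def preprocess(self, inputs, scale=1.0):
        return [x * scale for x in inputs]

    def _forward(self, model_inputs):
        # Stand-in for a model call: average the inputs.
        return sum(model_inputs) / len(model_inputs)

    def postprocess(self, model_outputs, digits=2):
        return round(model_outputs, digits)


pipe = ScaleAndRound(scale=10)
print(pipe([0.1234, 0.5678]))            # uses scale=10 from construction
print(pipe([0.1234, 0.5678], digits=0))  # digits applies to this call only
```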

To register your new task in the list of supported tasks, you have to add it to PIPELINE_REGISTRY:


from transformers import AutoModelForSequenceClassification
from transformers.pipelines import PIPELINE_REGISTRY

PIPELINE_REGISTRY.register_pipeline(
    "new-task",
    pipeline_class=MyPipeline,
    pt_model=AutoModelForSequenceClassification,
)

You can also specify a default model, in which case it should come with a specific revision (a branch name or a commit hash) and a type.


PIPELINE_REGISTRY.register_pipeline(
    "new-task",
    pipeline_class=MyPipeline,
    pt_model=AutoModelForSequenceClassification,
    default={"pt": ("user/awesome_model", "abcdef")},
    type="text",  # current support type: text, audio, image, multimodal
)

Sharing your pipeline on the Hub:

First save your custom Pipeline subclass in a Python file, for example my_pipeline.py.

Then import and register your pipeline:


from my_pipeline import MyPipeline
from transformers.pipelines import PIPELINE_REGISTRY
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification

PIPELINE_REGISTRY.register_pipeline(
    "new-task",
    pipeline_class=MyPipeline,
    pt_model=AutoModelForSequenceClassification,
    tf_model=TFAutoModelForSequenceClassification,
)

Then use a pretrained model with the custom pipeline:


from transformers import pipeline
classifier = pipeline("new-task", model="sgugger/finetuned-bert-mrpc")

Then share it on the Hub with the save_pretrained method:


from huggingface_hub import Repository

repo = Repository("test-dynamic-pipeline", clone_from="{your_username}/test-dynamic-pipeline")
classifier.save_pretrained("test-dynamic-pipeline")
repo.push_to_hub()

Finally, anyone can use this pipeline:


from transformers import pipeline

classifier = pipeline(model="{your_username}/test-dynamic-pipeline", trust_remote_code=True)

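Adding a pipeline to Transformers: add a new module containing your pipeline's code to the pipelines submodule, then add it to the list of tasks in pipelines/__init__.py.
Next, add tests: create a new file tests/test_pipelines_MY_PIPELINE.py. The first two tests below are required; the last two are optional:
- test_small_model_pt: define a small model for the pipeline and test the pipeline's outputs. The results should be the same as in test_small_model_tf.
- test_small_model_tf: define a small model for the pipeline and test the pipeline's outputs. The results should be the same as in test_small_model_pt.
- test_large_model_pt: test the pipeline on a real model; optional.
- test_large_model_tf: test the pipeline on a real model; optional.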

4. API

class transformers.Pipeline: the parent class of all pipelines.

The pipeline workflow is defined as Input -> Tokenization -> Model Inference -> Post-Processing (task dependent) -> Output, and it supports running on CPU or GPU.


class transformers.Pipeline(
  model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel')],
  tokenizer: typing.Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None,
  feature_extractor: typing.Optional[ForwardRef('SequenceFeatureExtractor')] = None,
  modelcard: typing.Optional[transformers.modelcard.ModelCard] = None,
  framework: typing.Optional[str] = None,
  task: str = '',
  args_parser: ArgumentHandler = None,
  device: typing.Union[int, str, ForwardRef('torch.device')] = -1,
  binary_output: bool = False,
  **kwargs
)

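Parameters:

model/tokenizer/feature_extractor/framework/task/device: see transformers.pipeline().
modelcard: a string or ModelCard describing the model used by the pipeline.
num_workers: an integer, the number of workers used by the DataLoader. Defaults to 8.
batch_size: an integer, the batch size used by the DataLoader. Defaults to 1.
args_parser: an ArgumentHandler, a reference to the object in charge of parsing the supplied pipeline parameters.
binary_output: a boolean indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.

Methods:

check_model_type(supported_models: typing.Union[typing.List[str], dict]): checks whether the model class is supported by the pipeline.
Parameters: supported_models: a list of strings or a dict, giving the models supported by the pipeline, or a mapping from model names to model classes.

device_placement(): a context manager that allocates tensors on the user-specified device in a framework-agnostic way.

Example: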


# Explicitly ask for tensor allocation on CUDA device :0
pipe = pipeline(..., device=0)
with pipe.device_placement():
    # Every framework specific tensor allocation will be done on the request device
    output = pipe(...)

ensure_tensor_on_device(**inputs) -> Dict[str, torch.Tensor]: ensures that inputs (PyTorch tensors) are placed on the appropriate device.
Parameters: inputs: the tensors to place on self.device. Only torch.Tensor values are considered.
postprocess(model_outputs: ModelOutput, **postprocess_parameters: typing.Dict): post-processing. Receives the raw outputs of the _forward method (generally tensors) and reformats them into something more friendly.
predict(X): scikit-learn / Keras interface to the pipeline. This method forwards to __call__().
preprocess(input_: typing.Any, **preprocess_parameters: typing.Dict): pre-processing. Takes input_ and returns a dictionary containing everything _forward needs to run properly.
save_pretrained(save_directory: str): saves the pipeline's model and tokenizer.
transform(X): scikit-learn / Keras interface to the pipeline. This method forwards to __call__().

4.1 Audio

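class transformers.AudioClassificationPipeline(*args, **kwargs): an audio classification pipeline using any AutoModelForAudioClassification. It predicts the class of a raw waveform or an audio file. For audio files, ffmpeg should be installed to support multiple audio formats.
This pipeline can be loaded from pipeline() with the task identifier "audio-classification".
Parameters: see transformers.Pipeline.
Methods:
- __call__(inputs: typing.Union[numpy.ndarray, bytes, str], **kwargs) -> a list of dict: classifies inputs.
  Parameters:
  - inputs: a raw waveform at the correct sampling rate (an np.ndarray of shape (n,)), or a string (the filename of an audio file, which is read at the correct sampling rate with ffmpeg to get the waveform), or bytes (the content of an audio file, interpreted by ffmpeg in the same way).
  - top_k: an integer, the number of top labels returned by the pipeline. If None, or higher than the number of labels available in the model configuration, it defaults to the number of labels.
  Returns a list of dictionaries with the following keys:
  - label: the predicted label (a string).
  - score: the probability of that label (a float).

Example: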


from transformers import pipeline
classifier = pipeline(model="superb/wav2vec2-base-superb-ks")
classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")
# [{'score': 0.997, 'label': '_unknown_'}, {'score': 0.002, 'label': 'left'}, {'score': 0.0, 'label': 'yes'}, {'score': 0.0, 'label': 'down'}, {'score': 0.0, 'label': 'stop'}]

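class transformers.AutomaticSpeechRecognitionPipeline(feature_extractor: typing.Union[ForwardRef('SequenceFeatureExtractor'), str], *args, **kwargs): a pipeline that extracts the spoken text contained in some audio. The input can be a raw waveform or an audio file. For audio files, ffmpeg should be installed to support multiple audio formats.
Parameters:
- model/tokenizer/framework/device: see transformers.Pipeline.
- feature_extractor: a SequenceFeatureExtractor used to encode the waveform for the model.
- chunk_length_s: a float, the input length of each chunk. chunk_length_s=0 disables chunking. Only available for CTC models such as Wav2Vec2ForCTC. Defaults to 0.
- stride_length_s: a float, the length of the stride on the left and right of each chunk. This lets the model see more context and infer letters better than without it, but the pipeline discards the stride bits at the end so that the final reconstruction is as faithful as possible.
- decoder: a pyctcdecode.BeamSearchDecoderCTC specifying the decoder.
Methods:
- __call__(inputs: typing.Union[numpy.ndarray, bytes, str], **kwargs) -> Dict: transcribes inputs to text.
  Parameters:
  - inputs: one of the following:
    - a string, the filename of the audio file. The file is read at the correct sampling rate with ffmpeg to get the waveform.
    - bytes, the content of an audio file.
    - a dict of the form {"sampling_rate": int, "raw": np.array}, giving the sampling rate and the raw audio. It may optionally contain "stride": (left: int, right: int), which asks the pipeline to ignore the first left samples and the last right samples when decoding. Only use stride with CTC models.
  - return_timestamps: a string, only available for pure CTC models.
    - "char": the pipeline returns a timestamp for every character in the text, along with the text.
    - "word": the pipeline returns a timestamp for every word in the text, along with the text.
  Returns a dict with the following keys:
  - text: the recognized text (a string).
  - chunks: when return_timestamps is used, a list of the text chunks, such as [{"text": "hi ", "timestamps": (0.5, 0.9)}, {"text": "there", "timestamps": (1.0, 1.5)}]. The original text can be recovered with "".join(chunk["text"] for chunk in output["chunks"]).
Example: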
```
from transformers import pipeline
transcriber = pipeline(model="openai/whisper-base")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")
# {'text': ' He hoped there would be stew for dinner, turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick, peppered flour-fatten sauce.'}
```
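
When return_timestamps is used, the chunks list described above can be post-processed with plain Python. The sketch below works on a hand-written stand-in for a pipeline result (the values are invented, and the key name "timestamps" follows the description above; check the key your installed version actually returns).

```python
# Hand-written stand-in for a pipeline result; the values are invented.
output = {
    "text": "hi there",
    "chunks": [
        {"text": "hi ", "timestamps": (0.5, 0.9)},
        {"text": "there", "timestamps": (1.0, 1.5)},
    ],
}

# Rebuild the full transcript from the chunks.
transcript = "".join(chunk["text"] for chunk in output["chunks"])

# Total time covered by recognized chunks, in seconds.
spoken = sum(end - start for start, end in (c["timestamps"] for c in output["chunks"]))

print(transcript)
print(round(spoken, 2))
```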

4.2 Computer vision

class transformers.DepthEstimationPipeline(*args, **kwargs): a depth estimation pipeline using any AutoModelForDepthEstimation. It predicts the depth of an image.

This pipeline can be loaded from pipeline() with the task identifier "depth-estimation".

Parameters: see transformers.Pipeline.

Methods:

__call__(images: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]], **kwargs): runs prediction on the inputs.
Parameters:
- images: a string, a list of strings, a PIL.Image, or a list of PIL.Image specifying the input image(s). A string may be an http link pointing to an image or a local path to an image; a PIL image can be passed directly.
  Either a single image or a batch of images (a list of strings or a list of PIL.Image) is accepted.
- top_k: see AudioClassificationPipeline.__call__().

Example:


from transformers import pipeline
depth_estimator = pipeline(task="depth-estimation", model="Intel/dpt-large")
output = depth_estimator("http://images.cocodataset.org/val2017/000000039769.jpg")
# This is a tensor with the values being the depth expressed in meters for each pixel
output["predicted_depth"].shape
# torch.Size([1, 384, 384])

class transformers.ImageClassificationPipeline(*args, **kwargs): an image classification pipeline using any AutoModelForImageClassification. It predicts the class of an image.

This pipeline can be loaded from pipeline() with the task identifier "image-classification".

Parameters: see transformers.Pipeline.

Methods:

__call__(images: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]], **kwargs): runs prediction on the inputs.
Parameters: see DepthEstimationPipeline.

Example:


from transformers import pipeline
classifier = pipeline(model="microsoft/beit-base-patch16-224-pt22k-ft22k")
classifier("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
# [{'score': 0.442, 'label': 'macaw'}, {'score': 0.088, 'label': 'popinjay'}, {'score': 0.075, 'label': 'parrot'}, {'score': 0.073, 'label': 'parodist, lampooner'}, {'score': 0.046, 'label': 'poll, poll_parrot'}]

class transformers.ImageSegmentationPipeline(*args, **kwargs): an image segmentation pipeline using any AutoModelForXXXSegmentation. It predicts masks of objects in an image and their classes.

This pipeline can be loaded from pipeline() with the task identifier "image-segmentation".

Parameters: see transformers.Pipeline.

Methods:

__call__(images: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]], **kwargs): runs prediction on the inputs.
Parameters:
- images: see DepthEstimationPipeline.
- subtask: a string, the segmentation task to perform: "semantic", "instance", or "panoptic". If not set, the pipeline tries to resolve it in the following order: panoptic, instance, semantic.
- threshold: a float, the probability threshold used to filter out predicted masks. Defaults to 0.9.
- mask_threshold: a float, the threshold used when binarizing the predicted masks. Defaults to 0.5.
- overlap_mask_area_threshold: a float, the mask overlap threshold used to eliminate small, disconnected segments. Defaults to 0.5.

Example:


from transformers import pipeline
segmenter = pipeline(model="facebook/detr-resnet-50-panoptic")
segments = segmenter("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
len(segments)
# 2
segments[0]["label"]
# 'bird'
segments[1]["label"]
# 'bird'
type(segments[0]["mask"])  # This is a black and white mask showing where is the bird on the original image.
# <class 'PIL.Image.Image'>
segments[0]["mask"].size
# (768, 512)

class transformers.ObjectDetectionPipeline(*args, **kwargs): an object detection pipeline using any AutoModelForObjectDetection. It predicts bounding boxes of objects in an image and their classes.

This pipeline can be loaded from pipeline() with the task identifier "object-detection".

Parameters: see transformers.Pipeline.

Methods:

__call__(images: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]], **kwargs): runs prediction on the inputs.
Parameters:
- images: see DepthEstimationPipeline.
- threshold: a float, the probability a detection needs in order to be returned as a prediction. Defaults to 0.9.

Example:


from transformers import pipeline
detector = pipeline(model="facebook/detr-resnet-50")
detector("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
# [{'score': 0.997, 'label': 'bird', 'box': {'xmin': 69, 'ymin': 171, 'xmax': 396, 'ymax': 507}}, {'score': 0.999, 'label': 'bird', 'box': {'xmin': 398, 'ymin': 105, 'xmax': 767, 'ymax': 507}}]
# x, y  are expressed relative to the top left hand corner.
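
The returned list of dictionaries is easy to post-process. The sketch below reuses the two detections printed above and derives box sizes plus a stricter score filter; box_size and keep_confident are hypothetical helpers written for this example, not part of the library.

```python
# Detections copied from the example output above.
detections = [
    {"score": 0.997, "label": "bird", "box": {"xmin": 69, "ymin": 171, "xmax": 396, "ymax": 507}},
    {"score": 0.999, "label": "bird", "box": {"xmin": 398, "ymin": 105, "xmax": 767, "ymax": 507}},
]


def box_size(box):
    """Width and height in pixels; the origin is the top-left corner of the image."""
    return box["xmax"] - box["xmin"], box["ymax"] - box["ymin"]


def keep_confident(results, threshold):
    """Keep only detections whose score reaches the threshold."""
    return [r for r in results if r["score"] >= threshold]


for det in keep_confident(detections, threshold=0.998):
    width, height = box_size(det["box"])
    print(det["label"], width, height, width * height)
```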

class transformers.ZeroShotImageClassificationPipeline(**kwargs): a zero-shot image classification pipeline using CLIPModel. It predicts the class of an image when you provide an image and a set of candidate labels.

This pipeline can be loaded from pipeline() with the task identifier "zero-shot-image-classification".

Parameters: see transformers.Pipeline.

Methods:

__call__(images: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]], **kwargs): runs prediction on the inputs.
Parameters:
- images: see DepthEstimationPipeline.
- candidate_labels: a list of strings, the candidate labels.
- hypothesis_template: a string used together with the candidate labels: each label replaces the placeholder in the sentence, and the likelihood is then estimated from logits_per_image. Defaults to "This is a photo of {}".

Example:


from transformers import pipeline
classifier = pipeline(model="openai/clip-vit-large-patch14")
classifier(
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
# [{'score': 0.965, 'label': 'animals'}, {'score': 0.03, 'label': 'humans'}, {'score': 0.005, 'label': 'landscape'}]
classifier(
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    candidate_labels=["black and white", "photorealist", "painting"],
)
# [{'score': 0.996, 'label': 'black and white'}, {'score': 0.003, 'label': 'photorealist'}, {'score': 0.0, 'label': 'painting'}]
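
The hypothesis_template parameter is plain string formatting applied to each candidate label. The sketch below shows only that text-side step with a hypothetical helper, build_hypotheses; scoring each resulting sentence against the image is done by the model and is not reproduced here.

```python
def build_hypotheses(candidate_labels, hypothesis_template="This is a photo of {}"):
    """Substitute each candidate label into the template's placeholder."""
    return [hypothesis_template.format(label) for label in candidate_labels]


labels = ["animals", "humans", "landscape"]
for sentence in build_hypotheses(labels):
    print(sentence)
```

A template closer to the domain of your images (for example, one that mentions drawings or satellite views) may change the scores, so it is worth trying more than one.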

class transformers.ZeroShotObjectDetectionPipeline(**kwargs): a zero-shot object detection pipeline using OwlViTForObjectDetection. It predicts bounding boxes of objects when you provide an image and a set of candidate labels.

This pipeline can be loaded from pipeline() with the task identifier "zero-shot-object-detection".

Parameters: see transformers.Pipeline.

Methods:

__call__(image: typing.Union[str, ForwardRef('Image.Image'), typing.List[typing.Dict[str, typing.Any]]], candidate_labels: typing.Union[str, typing.List[str]] = None, **kwargs): runs prediction on the inputs.
Parameters: see ZeroShotImageClassificationPipeline.

Example:


from transformers import pipeline

detector = pipeline(model="google/owlvit-base-patch32", task="zero-shot-object-detection")
detector(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    candidate_labels=["cat", "couch"],
)
# [{'score': 0.287, 'label': 'cat', 'box': {'xmin': 324, 'ymin': 20, 'xmax': 640, 'ymax': 373}}, {'score': 0.254, 'label': 'cat', 'box': {'xmin': 1, 'ymin': 55, 'xmax': 315, 'ymax': 472}}, {'score': 0.121, 'label': 'couch', 'box': {'xmin': 4, 'ymin': 0, 'xmax': 642, 'ymax': 476}}]
detector(
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    candidate_labels=["head", "bird"],
)
# [{'score': 0.119, 'label': 'bird', 'box': {'xmin': 71, 'ymin': 170, 'xmax': 410, 'ymax': 508}}]

4.3 NLP

class transformers.Conversation: a utility class holding a conversation and its history. It is meant to be used as the input to ConversationalPipeline.
```
class transformers.Conversation(
  text: str = None, conversation_id: UUID = None, past_user_inputs = None, generated_responses = None
)
```
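A Conversation contains a number of utility functions to manage new user inputs and generated model responses. Before being passed to ConversationalPipeline, a conversation needs to contain an unprocessed user input (either passed at initialization, or added afterwards with conversation.add_user_input("input")).
Parameters:
- text: a string, the initial user input that starts the conversation. If not provided, a user input must be added manually with add_user_input() before the conversation can begin.
- conversation_id: a uuid.UUID, the unique identifier of the conversation. If not provided, a random UUID4 id is assigned.
- past_user_inputs: a list of strings, the history of the user's side of the conversation. You don't need to pass it manually if you use the pipeline interactively. If you want to recreate the history, you need to pass both past_user_inputs and generated_responses, as lists of strings of equal length.
- generated_responses: a list of strings, the history of the model's side of the conversation. You don't need to pass it manually if you use the pipeline interactively.
Methods:
- add_user_input(text: str, overwrite: bool = False): adds a user input for the next conversation round. This populates the internal new_user_input field.
  Parameters:
  - text: a string, the user input.
  - overwrite: a boolean, whether to overwrite an existing unprocessed user input.
- append_response(response: str): appends a response to the list of generated responses.
  Parameters: response: a string, the response generated by the model.
- iter_texts(): iterates over all the blobs of the conversation.
  Returns: an iterator of (is_user, text_chunk) pairs in chronological order of the conversation. is_user is a boolean, text_chunk is a string.
- mark_processed(): marks the conversation as processed (moves the content of new_user_input to past_user_inputs) and empties the new_user_input field.
Example: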
```
from transformers import Conversation

conversation = Conversation("Going to the movies tonight - any suggestions?")

# Steps usually performed by the model when generating a response:
# 1. Mark the user input as processed (moved to the history)
conversation.mark_processed()
# 2. Append a model response
conversation.append_response("The Big lebowski.")

conversation.add_user_input("Is it good?")
```
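
The state changes behind these methods can be sketched without the library. SketchConversation below is a simplified stand-in written for illustration, not the library class; in particular, ignoring a new input while one is still pending unless overwrite=True is an assumption based on the description of the overwrite parameter above.

```python
class SketchConversation:
    """Simplified stand-in mirroring the documented behaviour; not the library class."""

    def __init__(self, text=None):
        self.new_user_input = text
        self.past_user_inputs = []
        self.generated_responses = []

    def add_user_input(self, text, overwrite=False):
        # An unprocessed input is only replaced when overwrite=True.
        if self.new_user_input is None or overwrite:
            self.new_user_input = text

    def mark_processed(self):
        # Move the pending input into the history and clear the field.
        if self.new_user_input is not None:
            self.past_user_inputs.append(self.new_user_input)
        self.new_user_input = None

    def append_response(self, response):
        self.generated_responses.append(response)

    def iter_texts(self):
        # Yield (is_user, text) pairs in chronological order.
        for user_text, response in zip(self.past_user_inputs, self.generated_responses):
            yield True, user_text
            yield False, response
        if self.new_user_input is not None:
            yield True, self.new_user_input


conversation = SketchConversation("Going to the movies tonight - any suggestions?")
conversation.mark_processed()
conversation.append_response("The Big lebowski.")
conversation.add_user_input("Is it good?")

for is_user, text in conversation.iter_texts():
    print("user:" if is_user else "bot: ", text)
```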

class transformers.ConversationalPipeline(*args, **kwargs): a multi-turn conversational pipeline.

This pipeline can be loaded from pipeline() with the task identifier "conversational".

The models this pipeline can use are models fine-tuned on a multi-turn conversational task, currently "microsoft/DialoGPT-small", "microsoft/DialoGPT-medium", and "microsoft/DialoGPT-large".

Parameters:

model/tokenizer/modelcard/framework/task/num_workers/batch_size/args_parser/device/binary_output: see transformers.Pipeline.
min_length_for_response: an integer, the minimum length (in tokens) of each response. Defaults to 32.
minimum_tokens: an integer, the minimum number of tokens to leave for a response.

Methods:

__call__(conversations: typing.Union[transformers.pipelines.conversational.Conversation, typing.List[transformers.pipelines.conversational.Conversation]], num_workers = 0, **kwargs) -> Conversation or List[Conversation]: generates responses for the given conversation(s).
Parameters:
- conversations: a Conversation or a list of Conversation, the conversations to generate responses for.
- clean_up_tokenization_spaces: a boolean, whether to clean up potential extra spaces in the text output.
- generate_kwargs: keyword arguments passed to the model's generate method.

Example:


from transformers import pipeline, Conversation

chatbot = pipeline(model="microsoft/DialoGPT-medium")
conversation = Conversation("Going to the movies tonight - any suggestions?")
conversation = chatbot(conversation)
conversation.generated_responses[-1]
# 'The Big Lebowski'
conversation.add_user_input("Is it an action movie?")
conversation = chatbot(conversation)
conversation.generated_responses[-1]
# "It's a comedy."

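class transformers.FillMaskPipeline(*args, **kwargs): a mask filling pipeline using any ModelWithLMHead.

This pipeline can be loaded from pipeline() with the task identifier "fill-mask".

The models this pipeline can use are models trained with a masked language modeling objective, which includes the bidirectional models in the library.

This pipeline only works for inputs with exactly one token masked.

Parameters:

model/tokenizer/modelcard/framework/task/num_workers/batch_size/args_parser/device/binary_output: see transformers.Pipeline.
top_k: an integer, the number of predictions to return. Defaults to 5.
targets: a string or a list of strings. When passed, the model limits the scores to these targets instead of the whole vocabulary. If a target is not in the model vocabulary, it is tokenized and its first resulting token is used.

Methods:

__call__(inputs, *args, **kwargs) -> a list or a list of list of dict: fills the masked token in inputs.
Parameters:
- inputs: a string or a list of strings, the masked text(s) to predict.
- targets/top_k: see the constructor.
Returns: a list of dicts, each with the following keys:
- sequence: the input with the mask token replaced by the prediction (a string).
- score: the probability of the prediction (a float).
- token: the predicted token id that replaces the masked one (an integer).
- token_str: the predicted token (a string).

Example: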


from transformers import pipeline

fill_masker = pipeline(model="bert-base-uncased")
fill_masker("This is a simple [MASK].")
# [{'score': 0.042, 'token': 3291, 'token_str': 'problem', 'sequence': 'this is a simple problem.'}, {'score': 0.031, 'token': 3160, 'token_str': 'question', 'sequence': 'this is a simple question.'}, {'score': 0.03, 'token': 8522, 'token_str': 'equation', 'sequence': 'this is a simple equation.'}, {'score': 0.027, 'token': 2028, 'token_str': 'one', 'sequence': 'this is a simple one.'}, {'score': 0.024, 'token': 3627, 'token_str': 'rule', 'sequence': 'this is a simple rule.'}]

class transformers.NerPipeline(*args, **kwargs): a named entity recognition pipeline using any ModelForTokenClassification.

This pipeline can be loaded from pipeline() with the task identifier "ner".

Parameters:

model/tokenizer/modelcard/framework/task/num_workers/batch_size/args_parser/device/binary_output: see transformers.Pipeline.
ignore_labels: a list of strings, the labels to ignore. Defaults to ["O"].
grouped_entities: deprecated; use aggregation_strategy instead.
aggregation_strategy: a string, the strategy used to fuse tokens based on the model prediction:
- "none": no aggregation; the raw results from the model are returned.
- "simple": attempts to group entities following the default schema. (A, B-TAG), (B, I-TAG), (C, I-TAG), (D, B-TAG2) (E, B-TAG2) ends up as:
```
[{"word": "ABC", "entity": "TAG"}, {"word": "D", "entity": "TAG2"}, {"word": "E", "entity": "TAG2"}]
```
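  Note that B-TAG marks the beginning of an entity and I-TAG marks its inside. Two consecutive B tags end up as different entities.
- "first": uses the "simple" strategy, except that a word cannot end up with different tags. When there is ambiguity, the word simply takes the tag of its first token. Only for word-based models.
- "average": uses the "simple" strategy, except that a word cannot end up with different tags. When there is ambiguity, the scores are averaged across the word's tokens and the label with the highest average is applied. Only for word-based models.
- "max": uses the "simple" strategy, except that a word cannot end up with different tags. When there is ambiguity, the word takes the label of its token with the maximum score. Only for word-based models.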
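
The "simple" strategy can be sketched in a few lines. This is a simplified illustration, not the library's implementation: group_simple is a hypothetical helper, it assumes tokens labeled "O" were already filtered out (see ignore_labels), and it joins words by plain concatenation, whereas the pipeline decodes tokens with the tokenizer.

```python
def group_simple(tagged_tokens):
    """Group (word, tag) pairs: B- starts an entity, I- with the same type extends it."""
    groups = []
    for word, tag in tagged_tokens:
        prefix, _, entity = tag.partition("-")
        extends_previous = prefix == "I" and groups and groups[-1]["entity"] == entity
        if extends_previous:
            groups[-1]["word"] += word
        else:
            groups.append({"word": word, "entity": entity})
    return groups


tagged = [("A", "B-TAG"), ("B", "I-TAG"), ("C", "I-TAG"), ("D", "B-TAG2"), ("E", "B-TAG2")]
print(group_simple(tagged))
```

Running it on the example above reproduces the grouping shown earlier: A, B, and C merge into one TAG entity, while D and E stay separate because each starts with a B tag.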

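Methods:

__call__(inputs: typing.Union[str, typing.List[str]], **kwargs) -> a list or a list of list of dict: runs prediction.
Parameters: inputs: one or several texts for token classification.
Returns: a dict or a list of dicts. Each dict contains the following keys:
- word: a string, the token/word classified. It is obtained by decoding the selected tokens. If you want the exact string from the original sentence, use start and end.
- score: a float, the corresponding probability for the entity.
- entity: a string, the entity predicted for that token/word (named entity_group when entities are aggregated, as in the example output below).
- index: an integer, the index of the corresponding token in the sentence. Only present when aggregation_strategy="none".
- start: an integer, the start index of the corresponding entity in the sentence.
- end: an integer, the end index of the corresponding entity in the sentence.
aggregate_words(entities: typing.List[dict], aggregation_strategy: AggregationStrategy): overrides the tags of the tokens within a word so that they agree at the word level.
For example, micro|soft| com|pany| B-ENT I-NAME I-ENT I-ENT is rewritten with the "first" strategy as microsoft| company| B-ENT I-ENT.
Parameters:
- entities: a list of dicts, the predictions of the pipeline.
- aggregation_strategy: see the constructor.
gather_pre_entities(sentence: str, input_ids: ndarray, scores: ndarray, offset_mapping: typing.Union[typing.List[typing.Tuple[int, int]], NoneType], special_tokens_mask: ndarray, aggregation_strategy: AggregationStrategy): fuses the various numpy arrays into dicts holding all the information needed for aggregation.
group_entities(entities: typing.List[dict]): finds and groups adjacent tokens with the same predicted entity.
Parameters: entities: a list of dicts, the predictions of the pipeline.
group_sub_entities(entities: typing.List[dict]): groups adjacent tokens with the same predicted entity.
Parameters: entities: a list of dicts, the predictions of the pipeline.

Example: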


from transformers import pipeline

token_classifier = pipeline(model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple")
sentence = "Je m'appelle jean-baptiste et je vis à montréal"
tokens = token_classifier(sentence)
tokens
# [{'entity_group': 'PER', 'score': 0.9931, 'word': 'jean-baptiste', 'start': 12, 'end': 26}, {'entity_group': 'LOC', 'score': 0.998, 'word': 'montréal', 'start': 38, 'end': 47}]

token = tokens[0]
# Start and end provide an easy way to highlight words in the original text.
sentence[token["start"] : token["end"]]
# ' jean-baptiste'

# Some models use the same idea to do part of speech.
syntaxer = pipeline(model="vblagoje/bert-english-uncased-finetuned-pos", aggregation_strategy="simple")
syntaxer("My name is Sarah and I live in London")
# [{'entity_group': 'PRON', 'score': 0.999, 'word': 'my', 'start': 0, 'end': 2}, {'entity_group': 'NOUN', 'score': 0.997, 'word': 'name', 'start': 3, 'end': 7}, {'entity_group': 'AUX', 'score': 0.994, 'word': 'is', 'start': 8, 'end': 10}, {'entity_group': 'PROPN', 'score': 0.999, 'word': 'sarah', 'start': 11, 'end': 16}, {'entity_group': 'CCONJ', 'score': 0.999, 'word': 'and', 'start': 17, 'end': 20}, {'entity_group': 'PRON', 'score': 0.999, 'word': 'i', 'start': 21, 'end': 22}, {'entity_group': 'VERB', 'score': 0.998, 'word': 'live', 'start': 23, 'end': 27}, {'entity_group': 'ADP', 'score': 0.999, 'word': 'in', 'start': 28, 'end': 30}, {'entity_group': 'PROPN', 'score': 0.999, 'word': 'london', 'start': 31, 'end': 37}]

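class transformers.QuestionAnsweringPipeline(*args, **kwargs): a question answering pipeline that uses any ModelForQuestionAnswering.
This pipeline can be loaded from pipeline() with the "question-answering" task identifier.
Parameters: see transformers.Pipeline.
Methods:
- __call__(*args, **kwargs) -> A dict or a list of dict: runs prediction on inputs.
  Parameters:
  - inputs: a SquadExample or a list of SquadExample, each holding a question and its context.
  - X: a SquadExample or a list of SquadExample, each holding a question and its context. It plays the same role as inputs.
  - data: a SquadExample or a list of SquadExample, each holding a question and its context. It plays the same role as inputs.
  - question: a string or a list of strings giving the question(s). Must be used together with context.
  - context: a string or a list of strings giving the context(s). Must be used together with question.
  - topk: an integer giving the number of answers to return (chosen by likelihood). Defaults to 1. Fewer than topk answers may be returned if the context does not contain enough options.
  - doc_stride: an integer. If the context is too long, it is split into several chunks with some overlap; this parameter controls the size of the overlap. Defaults to 128.
  - max_answer_len: an integer giving the maximum length of a predicted answer. Defaults to 15.
  - max_seq_len: an integer giving the maximum length of each chunk (context + question, counted in tokens). The context is split into several chunks if needed. Defaults to 384.
  - max_question_len: an integer giving the maximum length of the question (counted in tokens). The question is truncated if needed. Defaults to 64.
  - handle_impossible_answer: a boolean specifying whether "impossible" (no answer) is accepted as an answer.
  - align_to_words: a boolean specifying whether to try to align the answer to real words. This can improve quality for space-separated languages but may hurt for languages that are not space-separated (such as Japanese or Chinese).
  Returns a dict or a list of dicts. Each dict contains the following keys:
  - score: a float, the probability of the answer.
  - start: an integer, the position where the answer starts in the input (in the tokenized input).
  - end: an integer, the position where the answer ends in the input (in the tokenized input).
  - answer: a string, the answer to the question.
- create_sample(question: typing.Union[str, typing.List[str]], context: typing.Union[str, typing.List[str]]) -> One or a list of SquadExample: converts question and context into SquadExample objects.
  Parameters: question and context, as described for __call__().
  Returns: a SquadExample or a list of SquadExample.
- span_to_answer(text: str, start: int, end: int) -> dict: maps token indices back to the original context and returns the answer together with its start and end positions.
  Parameters:
  - text: a string, the original context.
  - start: an integer, the starting token index of the answer.
  - end: an integer, the end token index of the answer.
  Returns: a dict with the keys 'answer', 'start', 'end'.
Example:
```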
from transformers import pipeline

oracle = pipeline(model="deepset/roberta-base-squad2")
oracle(question="Where do I live?", context="My name is Wolfgang and I live in Berlin")
# {'score': 0.9191, 'start': 34, 'end': 40, 'answer': 'Berlin'}
```
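The interplay between max_seq_len and doc_stride can be pictured as a sliding window over the context tokens. The sketch below is only an illustration of that windowing: the function name split_with_overlap is hypothetical, and the real pipeline delegates chunking to the tokenizer and also reserves room for the question and special tokens, which is ignored here.

```python
from typing import List


def split_with_overlap(tokens: List[str], window: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most `window` items that share `overlap` items."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the current window already reaches the end of the context
        start += step
    return chunks


# Toy context of 10 "tokens"; window plays the role of the per-chunk context budget,
# overlap plays the role of doc_stride.
context = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(context, window=4, overlap=2)
```

With these toy values the context is covered by four chunks, and each chunk starts with the last two tokens of the previous one, so an answer that straddles a chunk boundary still appears whole in at least one chunk.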
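class transformers.SummarizationPipeline(*args, **kwargs): a text summarization pipeline.
This pipeline can be loaded from pipeline() with the "summarization" task identifier.
It works with models that have been fine-tuned on a summarization task, currently: "bart-large-cnn", "t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b".
Parameters: see transformers.Pipeline.
Methods:
- __call__(*args, **kwargs) -> A list or a list of list of dict: runs prediction.
  Parameters:
  - documents: a string or a list of strings, the article(s) to summarize.
  - return_text: a boolean specifying whether to include the decoded text in the output. Defaults to True.
  - return_tensors: a boolean specifying whether to include the predicted tensors (token ids) in the output. Defaults to False.
  - clean_up_tokenization_spaces: a boolean specifying whether to clean up potential extra spaces in the text output. Defaults to False.
  - generate_kwargs: additional keyword arguments passed to the model's generate method.
  Returns: a dict or a list of dicts. Each dict contains the following keys:
  - summary_text: a string, the summary text. Present when return_text=True.
  - summary_token_ids: a tensor, the token ids of the summary. Present when return_tensors=True.
Example:
```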
from transformers import pipeline

# use bart in pytorch
summarizer = pipeline("summarization")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)

# use t5 in tf
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
```

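class transformers.TableQuestionAnsweringPipeline(*args, **kwargs): a table question answering pipeline that uses a ModelForTableQuestionAnswering. PyTorch only.

This pipeline can be loaded from pipeline() with the "table-question-answering" task identifier.

Parameters: see transformers.Pipeline.

Methods:

__call__(*args, **kwargs) -> A dictionary or a list of dictionaries containing results: runs prediction.
Parameters:
- table: a pd.DataFrame or a dict (a dict is converted to a DataFrame), the table to query.
- query: a string or a list of strings, the query or queries.
- sequential: a boolean specifying whether to run inference sequentially or as a batch. Batching is faster, but conversational models such as SQA need sequential inference in order to extract relations across the sequence of queries.
- padding: a boolean, a string, or a PaddingStrategy, the padding strategy. One of:
  - True or 'longest': pad to the longest sequence in the batch. No padding if only a single sequence is provided.
  - 'max_length': pad to the maximum length given by the max_length argument, or to the maximum input length the model accepts if that argument is not provided.
  - False or 'do_not_pad' (default): no padding.
- truncation: a boolean, a string, or a TapasTruncationStrategy, the truncation strategy. One of:
  - True or 'drop_rows_to_fit': truncate to the maximum length given by the max_length argument, or to the maximum input length the model accepts if that argument is not provided. Truncation proceeds row by row, removing rows from the table.
  - False or 'do_not_truncate' (default): no truncation.
Returns: a dict or a list of dicts. Each dict contains the following keys:
- answer: the answer text for the query. If there is an aggregator, the answer is prefixed with "AGGREGATOR >".
- coordinates: the cell coordinates of the answer (List[Tuple[int, int]]).
- cells: a list of strings made up of the answer's cell values.
- aggregator: if the model has an aggregator, the aggregator that was applied (a string).

Example:
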
from transformers import pipeline

oracle = pipeline(model="google/tapas-base-finetuned-wtq")
table = {
    "Repository": ["Transformers", "Datasets", "Tokenizers"],
    "Stars": ["36542", "4512", "3934"],
    "Contributors": ["651", "77", "34"],
    "Programming language": ["Python", "Python", "Rust, Python and NodeJS"],
}
oracle(query="How many stars does the transformers repository have?", table=table)
# {'answer': 'AVERAGE > 36542', 'coordinates': [(0, 1)], 'cells': ['36542'], 'aggregator': 'AVERAGE'}

Note that the pipeline accepts several call forms:

pipeline(table, query)
pipeline(table, [query])
pipeline(table=table, query=query)
pipeline(table=table, query=[query])
pipeline({"table": table, "query": query})
pipeline({"table": table, "query": [query]})
pipeline([{"table": table, "query": query}, {"table": table, "query": query}])

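class transformers.TextClassificationPipeline(*args, **kwargs): a text classification pipeline that uses any ModelForSequenceClassification.
This pipeline can be loaded from pipeline() with the "sentiment-analysis" task identifier.
If there are multiple class labels (model.config.num_labels >= 2), the pipeline runs a softmax over the results; if there is a single label, it runs a sigmoid over the result.
Parameters:
- model/tokenizer/modelcard/framework/task/num_workers/batch_size/args_parser/device/binary_output: see transformers.Pipeline.
- return_all_scores: a boolean specifying whether to return all prediction scores or only the score of the predicted class. Defaults to False.
- function_to_apply: a string naming the function applied to the model outputs. One of:
  - "default": applies sigmoid to the output if the model has a single label, and softmax if the model has several labels.
  - "sigmoid": applies the sigmoid function to the output.
  - "softmax": applies the softmax function to the output.
  - "none": applies no function to the output.
Methods:
- __call__( *args, **kwargs ) -> A list or a list of list of dict: runs prediction.
  Parameters:
  - args: a string, a list of strings, or a dict, the text(s) to classify. A dict may use the keys {"text", "text_pair"}.
  - top_k: an integer giving the number of results to return. Defaults to 1.
  - function_to_apply: see the initializer parameters above.
  Returns: a dict or a list of dicts. Each dict contains the following keys:
  - label: the text of the predicted label.
  - score: the corresponding probability.
Example:
```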
from transformers import pipeline

classifier = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")
classifier("This movie is disgustingly good !")
# [{'label': 'POSITIVE', 'score': 1.0}]

classifier("Director tried too much.")
# [{'label': 'NEGATIVE', 'score': 0.996}]
```
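The effect of function_to_apply can be reproduced on raw logits with a few lines of plain Python. This is a minimal sketch that mirrors only the rule described above (sigmoid for a single label, softmax otherwise); the helper name apply_function is hypothetical and the logits are made up.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_function(logits: List[float], function_to_apply: str = "default") -> List[float]:
    """Turn raw logits into scores following the chosen post-processing function."""
    if function_to_apply == "default":
        # One label -> sigmoid, several labels -> softmax.
        function_to_apply = "sigmoid" if len(logits) == 1 else "softmax"
    if function_to_apply == "sigmoid":
        return [sigmoid(x) for x in logits]
    if function_to_apply == "softmax":
        return softmax(logits)
    if function_to_apply == "none":
        return list(logits)
    raise ValueError(f"unknown function_to_apply: {function_to_apply!r}")


single_label = apply_function([0.0])            # sigmoid path
multi_label = apply_function([1.0, 2.0, 3.0])   # softmax path
raw = apply_function([1.0, 2.0], "none")        # logits returned unchanged
```

Softmax scores sum to 1 across labels, whereas sigmoid scores each logit on its own, which is why "sigmoid" is the natural choice when labels are not mutually exclusive.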
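class transformers.TextGenerationPipeline(*args, **kwargs): a language generation pipeline that uses any ModelWithLMHead.
This pipeline can be loaded from pipeline() with the "text-generation" task identifier.
It works with models that have been trained with an autoregressive language modeling objective.
Parameters: see transformers.Pipeline.
Methods:
- __call__( text_inputs, **kwargs ) -> A list or a list of list of dict: runs prediction.
  Parameters:
  - args: a string or a list of strings, the prompt(s).
  - return_tensors: a boolean specifying whether to output tensors (token ids). Defaults to False.
  - return_text: a boolean specifying whether to output the decoded text. Defaults to True.
  - return_full_text: a boolean. If False, only the newly generated text is returned; otherwise the full text is returned. Only meaningful when return_text=True. Defaults to True.
  - clean_up_tokenization_spaces: a boolean specifying whether to clean up potential extra spaces in the text output. Defaults to False.
  - prefix: a string, a prefix added to the prompt.
  - handle_long_generation: a string. By default the pipeline does not handle long generation (inputs that exceed the model's maximum length). The available strategies are:
    - None: the default strategy, which does nothing special.
    - "hole": truncates the left of the input and leaves a gap wide enough for generation.
  - generate_kwargs: additional keyword arguments passed to the model's generate method.
  Returns: a dict or a list of dicts. Each dict contains the following keys:
  - generated_text: the generated text. Present only when return_text=True.
  - generated_token_ids: the token ids of the generated text. Present only when return_tensors=True.
Example:
```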
from transformers import pipeline

generator = pipeline(model="gpt2")
generator("I can't believe you did such a ", do_sample=False)

# [{'generated_text': "I can't believe you did such a icky thing to me. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I"}]

# These parameters will return suggestions, and only the newly created text making it easier for prompting suggestions.
outputs = generator("My tart needs some", num_return_sequences=4, return_full_text=False)
```
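The "hole" strategy for handle_long_generation amounts to keeping only the rightmost part of the prompt so that the new tokens still fit in the model's window. The sketch below illustrates that arithmetic on a plain list of token ids; the function name truncate_left and its arguments are assumptions for the example, not the pipeline's own API.

```python
from typing import List


def truncate_left(input_ids: List[int], model_max_length: int, max_new_tokens: int) -> List[int]:
    """Keep the rightmost prompt tokens so that max_new_tokens still fit in the window."""
    keep = model_max_length - max_new_tokens
    if keep <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Slicing from the right drops the oldest (leftmost) tokens first.
    return input_ids[-keep:]


long_prompt = list(range(10))  # 10 made-up token ids
trimmed = truncate_left(long_prompt, model_max_length=8, max_new_tokens=3)
short_prompt = truncate_left([1, 2, 3], model_max_length=8, max_new_tokens=3)
```

A prompt that already fits is returned unchanged, while a longer prompt loses its earliest tokens, which is usually acceptable because the most recent context matters most for continuation.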

class transformers.Text2TextGenerationPipeline(*args, **kwargs): a text-to-text generation pipeline that uses seq2seq models.

This pipeline can be loaded from pipeline() with the "text2text-generation" task identifier.

Parameters: see transformers.Pipeline.

Methods:

__call__( *args, **kwargs ) -> A list or a list of list of dict: runs prediction.
Parameters:
- args: a string or a list of strings, the input text for the encoder.
- return_tensors/return_text/clean_up_tokenization_spaces: see TextGenerationPipeline.__call__().
- truncation: see TableQuestionAnsweringPipeline.__call__().
Returns: see TextGenerationPipeline.__call__().
check_inputs(input_length: int, min_length: int, max_length: int ): checks whether the given input might be problematic for the given model.

Example:

from transformers import pipeline

generator = pipeline("text2text-generation", model="mrm8488/t5-base-finetuned-question-generation-ap")
generator(
    "answer: Manuel context: Manuel has created RuPERTa-base with the support of HF-Transformers and Google"
)
# [{'generated_text': 'question: Who created the RuPERTa-base?'}]

class transformers.TokenClassificationPipeline(*args, **kwargs): a named entity recognition pipeline that uses any ModelForTokenClassification.
This pipeline can be loaded from pipeline() with the "ner" task identifier.
Parameters: see NerPipeline.
Methods: see NerPipeline.
Example: see NerPipeline.
class transformers.TranslationPipeline(*args, **kwargs): a pipeline for translation.
This pipeline can be loaded from pipeline() with the "translation_xx_to_yy" task identifier.
Parameters: see transformers.Pipeline.
Methods:
- __call__( *args, **kwargs ) -> A list or a list of list of dict: runs prediction.
  Parameters:
  - args: a string or a list of strings, the text(s) to translate.
  - return_tensors/return_text/clean_up_tokenization_spaces: see TextGenerationPipeline.__call__().
  - src_lang: a string, the language of the input. May be required for multilingual models; has no effect for single-pair translation models.
  - tgt_lang: a string, the language of the output. May be required for multilingual models; has no effect for single-pair translation models.
  - generate_kwargs: additional keyword arguments passed to the model's generate method.
  Returns: a dict or a list of dicts. Each dict contains the following keys:
  - translation_text: a string, the translation. Present only when return_text=True.
  - translation_token_ids: a tensor, the token ids of the translation. Present only when return_tensors=True.
Example:
```
from transformers import pipeline

en_fr_translator = pipeline("translation_en_to_fr")
en_fr_translator("How old are you?")
```
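class transformers.ZeroShotClassificationPipeline(*args, **kwargs): an NLI-based zero-shot classification pipeline that uses a ModelForSequenceClassification trained on natural language inference (NLI) tasks.
Compared with the text-classification pipeline, these models do not need a hardcoded number of potential classes; the classes can be chosen at runtime. This usually makes them slower but more flexible.
Any combination of sequences and labels can be passed in. Each combination is posed as a premise/hypothesis pair and fed to the pretrained model. The logit for entailment is then taken as the logit for the candidate label being valid. Any NLI model can be used, but the id of the entailment label must be included in the model config's transformers.PretrainedConfig.label2id.
This pipeline can be loaded from pipeline() with the "zero-shot-classification" task identifier.
Parameters: see transformers.Pipeline.
Methods:
- __call__(sequences: typing.Union[str, typing.List[str]], *args, **kwargs ) -> A list or a list of list of dict: runs prediction.
  Parameters:
  - sequences: a string or a list of strings, the sequence(s) to classify. Sequences that are too long may be truncated.
  - candidate_labels: a string or a list of strings, the set of candidate labels for each sequence. It can be a single label, a string of comma-separated labels, or a list of labels.
  - hypothesis_template: a string, the template used to turn each label into an NLI-style hypothesis. The template must contain a {} (or similar syntax) where the candidate label is inserted. The default template is "This example is {}." .
    With the candidate label "sports", the string fed to the model would look like "<cls> sequence to classify <sep> This example is sports . <sep>" .
    The default template works well in many cases, but depending on the task setting it may be worth trying different templates.
  - multi_label: a boolean specifying whether more than one candidate label can be true. If False, the scores are normalized so that the label likelihoods for a given sequence sum to 1. If True, the labels are treated as independent and each label receives its own probability.
  Returns: a dict or a list of dicts. Each dict contains the following keys:
  - sequence: a string, the sequence that this output belongs to.
  - labels: a list of strings, the labels sorted by likelihood.
  - scores: a list of floats, the probability of each label.
Example:
```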
from transformers import pipeline

oracle = pipeline(model="facebook/bart-large-mnli")
oracle(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)
# {'sequence': 'I have a problem with my iphone that needs to be resolved asap!!', 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'], 'scores': [0.504, 0.479, 0.013, 0.003, 0.002]}

oracle(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["english", "german"],
)
# {'sequence': 'I have a problem with my iphone that needs to be resolved asap!!', 'labels': ['english', 'german'], 'scores': [0.814, 0.186]}
```
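How hypothesis_template and multi_label shape the scores can be illustrated without loading a model. In the sketch below, the helper names build_hypotheses and zero_shot_scores are hypothetical, the logits are made up, and the multi_label=True branch (a softmax over contradiction vs. entailment for each label separately) reflects my understanding of the usual NLI-based recipe rather than anything stated in the text above.

```python
import math
from typing import List, Tuple


def build_hypotheses(labels: List[str], template: str = "This example is {}.") -> List[str]:
    """Turn each candidate label into an NLI-style hypothesis."""
    return [template.format(label) for label in labels]


def zero_shot_scores(nli_logits: List[Tuple[float, float]], multi_label: bool) -> List[float]:
    """nli_logits holds one (contradiction, entailment) logit pair per candidate label."""
    if multi_label:
        # Each label is scored on its own: entailment vs. contradiction.
        return [math.exp(e) / (math.exp(c) + math.exp(e)) for c, e in nli_logits]
    # Labels compete with each other: softmax over the entailment logits.
    exps = [math.exp(e) for _, e in nli_logits]
    total = sum(exps)
    return [x / total for x in exps]


hypotheses = build_hypotheses(["sports", "politics"])
logits = [(0.0, 2.0), (0.0, 1.0)]  # made-up logits, one pair per label
exclusive = zero_shot_scores(logits, multi_label=False)
independent = zero_shot_scores(logits, multi_label=True)
```

With multi_label=False the two scores sum to 1, so raising one label lowers the other; with multi_label=True both labels can score above 0.5 at the same time.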

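4.4 Multimodal

class transformers.DocumentQuestionAnsweringPipeline(*args, **kwargs): a document question answering pipeline that uses any AutoModelForDocumentQuestionAnswering.
Its inputs and outputs are similar to those of the extractive question answering pipeline, except that it takes an image rather than text as input.
This pipeline can be loaded from pipeline() with the "document-question-answering" task identifier.
Parameters: see transformers.Pipeline.
Methods:
- __call__(image: typing.Union[ForwardRef('Image.Image'), str], question: typing.Optional[str] = None, word_boxes: typing.Tuple[str, typing.List[float]] = None, **kwargs ) -> A dict or a list of dict: runs prediction.
  A document here is defined as an image plus, optionally, a list of (word, box) tuples that represent the text in the document. For LayoutLM-like models, which require word_boxes as input, the pipeline automatically runs the Tesseract OCR engine to extract the words and boxes when word_boxes is not provided. For Donut, no OCR is run.
  Parameters:
  - image: a string or a PIL.Image, the image. It can be a string containing an http link to the image, a string containing a local path to the image, or an image loaded directly with PIL.
    A single image or several images can be passed in. A single image is broadcast across multiple questions.
  - question: a string, the question.
  - word_boxes: a List[str, Tuple[float, float, float, float]], the list of words and bounding boxes (boxes normalized to the range 0 -> 1000). If you provide word_boxes, the pipeline uses these words and boxes instead of running OCR on the image to derive them. This lets you reuse OCR results across several calls to the pipeline without rerunning OCR each time.
  - top_k: an integer giving the number of answers to return. Defaults to 1. Fewer than top_k answers are returned if the context does not contain enough options.
  - doc_stride/max_answer_len/max_seq_len/max_question_len/handle_impossible_answer: see QuestionAnsweringPipeline.__call__().
  - lang: a string, the language to use when running OCR. Defaults to English.
  - tesseract_config: a string, additional flags passed to tesseract when running OCR.
  Returns a dict or a list of dicts. Each dict contains the following keys:
  - score/start/end/answer: see QuestionAnsweringPipeline.__call__().
  - words: a list of integers, the indices of the word/box pairs that make up the answer.
Example:
```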
from transformers import pipeline

document_qa = pipeline(model="impira/layoutlm-document-qa")
document_qa(
    image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png",
    question="What is the invoice number?",
)
# [{'score': 0.425, 'answer': 'us-001', 'start': 16, 'end': 16}]
```
The pipeline can be called in any of the following ways:
```
pipeline(image=image, question=question)
pipeline(image=image, question=question, word_boxes=word_boxes)
pipeline([{"image": image, "question": question}])
pipeline([{"image": image, "question": question, "word_boxes": word_boxes}])
```
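When you supply your own word_boxes, the boxes are expected on a 0 -> 1000 scale rather than in pixels. The sketch below shows one way to rescale pixel boxes before passing them in; the OCR output format, the image size, and the helper name normalize_box are assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Rescale a pixel box to the 0-1000 range relative to the image size."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    )


# Hypothetical OCR output for an 800x400 pixel image: (word, pixel box) pairs.
ocr_words: List[Tuple[str, Box]] = [
    ("Invoice", (50, 20, 150, 40)),
    ("us-001", (400, 20, 500, 40)),
]
word_boxes = [(word, normalize_box(box, width=800, height=400)) for word, box in ocr_words]
```

Computing word_boxes once and reusing it for every question about the same document avoids rerunning OCR on each call.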

class transformers.FeatureExtractionPipeline(*args, **kwargs): a feature extraction pipeline that uses no model head. It extracts the hidden states from the base transformer so they can be used as features in downstream tasks.

This pipeline can be loaded from pipeline() with the "feature-extraction" task identifier.

Parameters:

return_tensors: a boolean specifying whether to return a tensor (otherwise a list is returned).
model/tokenizer/modelcard/framework/task/args_parser/device: see transformers.Pipeline.

Methods:

__call__(*args, **kwargs) -> A nested list of float: extracts the features.
Parameters: args: one or more texts to extract features from.
Returns: a nested list of floats.

Example:

from transformers import pipeline

extractor = pipeline(model="bert-base-uncased", task="feature-extraction")
result = extractor("This is a simple test.", return_tensors=True)
result.shape  # This is a tensor of shape [1, sequence_length, hidden_dimension] representing the input string.
# torch.Size([1, 8, 768])

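class transformers.ImageToTextPipeline(*args, **kwargs): an image-to-text pipeline that uses an AutoModelForVision2Seq. It predicts a caption for a given image.
Parameters: see transformers.Pipeline.
This pipeline can be loaded from pipeline() with the "image-to-text" task identifier.
Methods:
- __call__(images: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]], **kwargs) -> A list or a list of list of dict: runs prediction.
  Parameters:
  - images: a string, a list of strings, a PIL.Image, or a list of PIL.Image, the image(s) to process. Each can be a string containing an http link to the image, a string containing a local path to the image, or an image loaded directly with PIL.
  - max_new_tokens: an integer, the maximum number of tokens to generate.
  - generate_kwargs: keyword arguments passed directly to the generate method.
  Returns: a dict or a list of dicts. Each dict contains the following key:
  - generated_text: the generated text string.
Example:
```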
from transformers import pipeline

captioner = pipeline(model="ydshieh/vit-gpt2-coco-en")
captioner("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
# [{'generated_text': 'two birds are standing next to each other '}]
```

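class transformers.VisualQuestionAnsweringPipeline(*args, **kwargs): a visual question answering pipeline that uses an AutoModelForVisualQuestionAnswering. Currently available in PyTorch only.

Parameters: see transformers.Pipeline.

This pipeline can be loaded from pipeline() with the "visual-question-answering" or "vqa" task identifier.

Methods:

__call__(image: typing.Union[ForwardRef('Image.Image'), str], question: str = None, **kwargs) -> A dictionary or a list of dicts: runs prediction.
Parameters:
- image: a string, a list of strings, a PIL.Image, or a list of PIL.Image, the image(s) to process. Each can be a string containing an http link to the image, a string containing a local path to the image, or an image loaded directly with PIL.
- question: a string or a list of strings, the question(s). A single question is broadcast across multiple images.
- top_k: an integer, the number of top-probability labels to return. Defaults to 5. If it exceeds the number of labels available in the model, the number of available labels is used instead.
Returns: a dict or a list of dicts. Each dict contains the following keys:
- answer: a string, the label identified by the model.
- score: a float, the probability of that label.

Example:
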
from transformers import pipeline

oracle = pipeline(model="dandelin/vilt-b32-finetuned-vqa")
image_url = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/lena.png"
oracle(question="What is she wearing ?", image=image_url)
# [{'score': 0.948, 'answer': 'hat'}, {'score': 0.009, 'answer': 'fedora'}, {'score': 0.003, 'answer': 'clothes'}, {'score': 0.003, 'answer': 'sun hat'}, {'score': 0.002, 'answer': 'nothing'}]

oracle(question="What is she wearing ?", image=image_url, top_k=1)
# [{'score': 0.948, 'answer': 'hat'}]

oracle(question="Is this a person ?", image=image_url, top_k=1)
# [{'score': 0.993, 'answer': 'yes'}]

oracle(question="Is this a man ?", image=image_url, top_k=1)
# [{'score': 0.996, 'answer': 'no'}]

This pipeline supports the following call forms:

pipeline(image=image, question=question)
pipeline({"image": image, "question": question})
pipeline([{"image": image, "question": question}])
pipeline([{"image": image, "question": question}, {"image": image, "question": question}])