FastBertTokenizer 0.4.67

FastBertTokenizer

NuGet version (FastBertTokenizer) .NET Build codecov

A fast and memory-efficient library for WordPiece tokenization as it is used by BERT. Tokenization correctness and speed are automatically evaluated in extensive unit tests and benchmarks.

Goals

  • Enabling you to run your AI workloads on .NET in production.
  • Correctness - Results that are equivalent to HuggingFace Transformers' AutoTokenizer's in all practical cases.
  • Speed - Tokenization should be as fast as reasonably possible.
  • Ease of use - The API should be easy to understand and use.

Getting Started

dotnet new console
dotnet add package FastBertTokenizer
using FastBertTokenizer;

var tok = new BertTokenizer();
await tok.LoadFromHuggingFaceAsync("bert-base-uncased");
var (inputIds, attentionMask, tokenTypeIds) = tok.Encode("Lorem ipsum dolor sit amet.");
Console.WriteLine(string.Join(", ", inputIds.ToArray()));
var decoded = tok.Decode(inputIds.Span);
Console.WriteLine(decoded);

// Output:
// [101,19544,2213,12997,17421,2079,10626,4133,2572,3388,1012,102]
// [CLS] lorem ipsum dolor sit amet. [SEP]

example project

Comparison to BERTTokenizers

Note that while BERTTokenizers handles token type incorrectly, it does support input of two pieces of text that are tokenized with a separator in between. FastBertTokenizer currently does not support this.

Created by combining https://icons.getbootstrap.com/icons/cursor-text/ in .NET brand color with https://icons.getbootstrap.com/icons/braces/.

Showing the top 20 packages that depend on FastBertTokenizer.

Packages Downloads
SmartComponents.LocalEmbeddings
Experimental, end-to-end AI features for .NET apps. Docs and info at https://github.com/dotnet-smartcomponents/smartcomponents
1
SmartComponents.LocalEmbeddings
Experimental, end-to-end AI features for .NET apps. Docs and info at https://github.com/dotnet-smartcomponents/smartcomponents
2

https://github.com/georg-jung/FastBertTokenizer/releases/tag/v0.4.67

.NET 6.0

  • No dependencies.

.NET 8.0

  • No dependencies.

Version Downloads Last updated
1.0.28 1 13.01.2025
0.5.18-alpha 0 21.12.2023
0.4.67 1 12.12.2024
0.3.29 0 18.09.2023
0.2.7 0 14.09.2023