Natural language processing (NLP) models are susceptible to inheriting biases from their training data, so
AI models that generate text can cause harm. To
address these concerns, a multinational team of about a thousand
mostly academic volunteers developed the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM), an NLP
model built with $7 million in allocations for computer time. The
BLOOM model is comparable to models from OpenAI and Google
but, unlike them, is multilingual and open source. BLOOM’s 176
billion parameters are on par with OpenAI’s GPT-3 model. The
model’s 341-billion-word training dataset was built by hand-selecting
two-thirds of it from 500 multilingual sources. The
sources were finalized through community workshops to ensure
language diversity and eliminate potential biases. BLOOM’s
multilingual capacity may give it a deeper
language awareness that facilitates more complex and diverse tasks.
Researchers can download BLOOM freely; running it, however, requires
sophisticated hardware. BLOOM’s uses may extend beyond AI
research, for example to the analysis of historical texts.
This work is licensed under a Creative Commons Attribution 4.0 International License.