Good morning, dear readers of Tecnogalaxy. Today we will talk about BloombergGPT, Bloomberg’s new language model focused on finance.

The paper Bloomberg published last week describes its BloombergGPT machine learning model in considerable technical depth. The model applies the same kind of artificial intelligence techniques that GPT uses to financial datasets.


Bloomberg’s terminal has been the reference source of financial market data for the world of trading and finance for over four decades. As a result, Bloomberg has acquired or developed a large number of proprietary and curated datasets. In many ways, this data is Bloomberg’s crown jewel, and in this version of BloombergGPT it is used to create an unprecedented financial research and analysis tool.

The large language models that power these artificial intelligence experiments capture both syntactic and semantic patterns, and they are used to predict new text based on the existing relationships within and between the source texts.
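To make the idea of predicting new text from relationships in the source text more concrete, here is a minimal sketch using a toy word-pair (bigram) counter. This is only an illustration: real large language models such as BloombergGPT use neural networks over tokens, not simple word counts, and the function names and the tiny corpus below are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, for illustration only.
corpus = "the market rose today . the market fell yesterday . the market rose again"
model = train_bigram(corpus)
print(predict_next(model, "market"))
```

In this toy corpus "rose" follows "market" more often than "fell" does, so that is the word the sketch predicts.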

Machine learning algorithms learn from source data and produce a model, a process known as “training”. Training the BloombergGPT model took about 53 days of computation on 64 servers, each containing 8 NVIDIA A100 GPUs with 40 GB of memory. By contrast, when we use ChatGPT, we provide an input, known as a prompt, to a model that has already been trained, and the model then produces an output, much like providing input to a formula and observing the result. Building these models requires huge amounts of computing power, and so Bloomberg collaborated with NVIDIA and Amazon Web Services to produce the BloombergGPT model.
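To get a feel for the scale of that training run, the figures can be turned into a rough GPU-hour estimate. The sketch below assumes 8 GPUs per server, as reported in the Bloomberg paper, and assumes all GPUs ran for the full 53 days, so the result is an approximate upper bound rather than an official figure. The function name is hypothetical.

```python
def gpu_hours(servers, gpus_per_server, days):
    """Return (total GPUs, total GPU-hours) for a training run."""
    total_gpus = servers * gpus_per_server
    return total_gpus, total_gpus * days * 24


# Figures from the article: 64 servers, about 53 days.
# 8 GPUs per server is taken from the paper.
total_gpus, hours = gpu_hours(servers=64, gpus_per_server=8, days=53)
print(f"{total_gpus} GPUs, about {hours:,} GPU-hours")
```

Under these assumptions the run used 512 GPUs for roughly 650,000 GPU-hours.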

The Bloomberg team used PyTorch, a popular free and open source deep learning package based on Python, to train the BloombergGPT model.

In the case of BloombergGPT, the source datasets include weighted proportions of financial news, corporate financial documents, press releases and Bloomberg News content, all collected and curated by Bloomberg over decades. In addition to these finance-specific sources, BloombergGPT also incorporates general-purpose datasets such as The Pile, the Colossal Clean Crawled Corpus (C4) and Wikipedia.
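One common way to combine sources in weighted proportions is to sample each training document's source according to a fixed weight. The sketch below shows that idea only. The weights are hypothetical placeholders, not Bloomberg's actual proportions, and the function and source names are invented for the example.

```python
import random
from collections import Counter

# Hypothetical mixing weights, NOT Bloomberg's actual proportions.
SOURCE_WEIGHTS = {
    "financial_news": 0.30,
    "company_filings": 0.20,
    "press_releases": 0.10,
    "the_pile": 0.25,
    "c4": 0.10,
    "wikipedia": 0.05,
}


def sample_sources(weights, n, seed=0):
    """Draw n source names at random, in proportion to their weights."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    names = list(weights)
    picks = rng.choices(names, weights=[weights[s] for s in names], k=n)
    return Counter(picks)


counts = sample_sources(SOURCE_WEIGHTS, n=10_000)
for name, count in counts.most_common():
    print(f"{name:16s} {count / 10_000:.3f}")
```

With enough draws, the share of each source in the sample approaches its weight, which is how a mixture like this controls how much finance-specific versus general text the model sees.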

The Bloomberg data used for training covers the period from 1 March 2007 to 31 July 2022, and Bloomberg refers to this collection of financial data as FINPILE. FINPILE consists of five main sources of financial content, namely:

  • Financial web. General web content is used, but restricted to specific sites that can be classified as financial.
  • Financial news. Although the web crawl already covers websites of a financial nature, sites that produce news require special attention.
  • Company filings. Anyone who conducts research on a public company must consider studying the company’s filings.
  • Press releases. A company’s formal public communications can often contain financial information.
  • Bloomberg News. Since Bloomberg is also a media company, its news content has been included in BloombergGPT’s training data.

BloombergGPT can describe an organization’s structure and the links between an individual and multiple companies. Since company names and executives’ names are included in the BloombergGPT model, it is quite possible that it can be queried at least about an organization’s executive-level structure.


BloombergGPT represents a significant leap forward for the financial and AI communities. Currently, the model is not publicly available and there is no API, let alone a chat interface, to access it. It is unclear when or if public access will become available, or even whether the current incarnation of BloombergGPT will see further revisions. The BloombergGPT team concludes in their paper that they “err on the side of caution and follow the practice of other LLM developers in not releasing our model”, so the model will not be made available to the public for now, until it is considered safe.

