JG 2022-12-01

Article #4: Tips for starting a machine learning project

As I have tried to demonstrate throughout this series, machine learning is capable of drawing accurate and useful conclusions from large amounts of data, quickly and economically. But how does a mining company actually start a machine-learning project?

Here are a few tips:

Tip 1: Work with your teams

A recent report by McKinsey begins by stating that, “In a capital-intensive industry like mining, productivity improvements can have a major bottom-line impact. For that reason, advanced analytics can generate immense value, helping leaders optimize processes, reduce downtime, and inform on-site decision making.” At the same time, it also says that “Advanced analytics can drive value only if employees use them to make decisions. But adoption is often the biggest stumbling block in analytics initiatives.”

To overcome this obstacle, a mining company determined to start a machine learning project should:

involve its own subject matter experts in designing the project and building machine-learning models
support key employees in acquiring new skills where necessary
ensure all stakeholders understand the technology and what the project’s overall objectives are, and
make project data transparent and accessible to all stakeholders.

Tip 2: Build your digital backbone

Machine learning requires large volumes of data from multiple sources, which means it also requires a strong digital backbone.

A digital backbone is the fundamental structural support on which all of a company’s systems, networks, and applications depend — and which determines a company’s ability to keep up with today’s rapid technological advancements.

For a mining company that has not yet gone through digital transformation, building this backbone may require a big commitment of time and money, but it will be worth it. Digital transformation is key not only to unlocking the potential of machine learning but also to staying competitive in virtually every area of business, from pit to port.

Tip 3: Evaluate the need for additional data

Sometimes, a mining company may not have enough of its own data, or enough well-organized data (see Tip 4), to provide the large amounts needed for a machine-learning project. In this case, the company may decide to collect more data with support from third-party providers. However, that can be expensive, so it is important to establish a solid use case before acquiring additional data.

If you cannot justify additional data acquisition, a better idea might be to scale down or postpone the machine-learning project so that it uses only the high-quality data that your company already owns or will own at some point. For example, if your company is at the beginning of an exploration program, you are probably too early in the data collection stage to start a machine-learning project. If you are in the midst of a brownfield project, you might want to focus your machine-learning project on the areas where you have ample high-quality, well-structured data.
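One way to make this evaluation concrete is a quick audit of how many complete records you already hold. The following is a minimal sketch in Python, not a definitive method: the field names (`hole_id`, `depth_m`, `cu_pct`) and the thresholds (`min_complete`, `min_ratio`) are hypothetical placeholders that your own subject matter experts would need to set.

```python
# Hypothetical drill-hole assay fields; the names are illustrative only.
REQUIRED_FIELDS = ("hole_id", "depth_m", "cu_pct")


def audit_records(records, required=REQUIRED_FIELDS):
    """Count how many records have a value for every required field."""
    total = len(records)
    complete = sum(
        1
        for record in records
        if all(record.get(field) not in (None, "") for field in required)
    )
    ratio = complete / total if total else 0.0
    return {"total": total, "complete": complete, "ratio": ratio}


def is_ready(audit, min_complete=1000, min_ratio=0.9):
    """Assumed placeholder thresholds, not industry standards."""
    return audit["complete"] >= min_complete and audit["ratio"] >= min_ratio


records = [
    {"hole_id": "DH-001", "depth_m": 12.5, "cu_pct": 0.42},
    {"hole_id": "DH-001", "depth_m": 15.0, "cu_pct": None},  # missing grade
    {"hole_id": "DH-002", "depth_m": 8.0, "cu_pct": 0.31},
    {"hole_id": "", "depth_m": 20.0, "cu_pct": 0.55},        # missing hole ID
]

audit = audit_records(records)
print(audit)
print("Ready for a machine-learning project:", is_ready(audit))
```

In this toy sample only two of the four records are complete, so the check reports that the dataset is not ready. A real audit would run against your full database and use criteria agreed with your geologists and data team.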

Tip 4: Prepare your data

The large volumes of data that machine learning depends on must be clean, well-organized and accessible.

Analytics company SAS defines big data as “large, hard-to-manage volumes of data — both structured and unstructured — that inundate businesses on a day-to-day basis.” However, while big data is critical for discovering “new insights that improve decisions and give confidence for making strategic business moves,” the company also warns that “these data sets are so voluminous that traditional data processing software just can’t manage them.”

To use big data effectively in a machine learning project, a mining company will need sophisticated data management technology (such as that provided by Dassault Systèmes' virtual twin system) that can aggregate and clean its raw data, fixing or removing data that does not belong in the dataset. This includes incorrect or incorrectly formatted data, corrupted or incomplete data, and data that may have been affected by human biases.
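To illustrate what this kind of cleaning involves, here is a minimal sketch in Python using only the standard library. It is not how any particular data management product works. The field names, the decimal-comma handling and the valid ranges are assumptions made for the example, and it does not attempt to detect human bias, which needs expert review.

```python
import math


def to_float(value):
    """Convert a raw value to a finite float, or return None if it is not numeric."""
    try:
        # Assumption: some sources use a decimal comma, e.g. "0,42".
        number = float(str(value).strip().replace(",", "."))
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def clean_records(raw_records):
    """Drop duplicate, incomplete, badly formatted and out-of-range rows."""
    cleaned, seen = [], set()
    for row in raw_records:
        hole_id = str(row.get("hole_id") or "").strip().upper()
        depth = to_float(row.get("depth_m"))
        grade = to_float(row.get("cu_pct"))
        if not hole_id or depth is None or grade is None:
            continue  # incomplete or incorrectly formatted
        if depth < 0 or not 0 <= grade <= 100:
            continue  # physically implausible values
        key = (hole_id, depth)
        if key in seen:
            continue  # duplicate sample
        seen.add(key)
        cleaned.append({"hole_id": hole_id, "depth_m": depth, "cu_pct": grade})
    return cleaned


raw = [
    {"hole_id": "dh-001 ", "depth_m": "12.5", "cu_pct": "0,42"},
    {"hole_id": "DH-001", "depth_m": 12.5, "cu_pct": 0.42},   # duplicate
    {"hole_id": "DH-002", "depth_m": "n/a", "cu_pct": 0.31},  # bad format
    {"hole_id": "DH-003", "depth_m": 30, "cu_pct": None},     # incomplete
    {"hole_id": "DH-004", "depth_m": 18.0, "cu_pct": 250},    # out of range
    {"hole_id": "DH-005", "depth_m": 8.0, "cu_pct": 0.31},
]

cleaned = clean_records(raw)
print(f"Kept {len(cleaned)} of {len(raw)} records")
```

Of the six sample rows, only the first DH-001 sample and the DH-005 sample survive. In practice you would log the rejected rows rather than silently discard them, so that your subject matter experts can review why each one was removed.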

Once your big data is cleaned, sorted, located and formatted in a way that is convenient to users, you will be ready to perform tasks aided by automation or algorithms.

Tip 5: Choose your machine learning model

From “A Survey and Perspective on Artificial Intelligence for Security-Aware Electronic Design Automation,” Koblah et al. (2022)

There are a number of different machine learning models. The one you need depends on the problem you want to solve, the size of your datasets, and the type of data you want to evaluate in order to generate a model.

Here are brief descriptions of four common machine learning models:

Recursive Partitioning Forest

Works well for small to medium datasets (approximately 50K data records).
Produces high-quality models for both regression and classification problems.
Hyperparameter tuning is usually not necessary and data preprocessing is minimal.
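
For readers curious about what "recursive partitioning" means in practice, the sketch below shows a single partitioning step: finding the threshold on one variable that best separates two classes. A tree repeats this step on each resulting subset, and a forest combines many such trees. This is an illustration only, assuming Gini impurity as the split criterion and a made-up copper grade variable; it is not the implementation used by any specific product.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    candidates = sorted(set(values))
    # Try the midpoint between each pair of neighbouring values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [label for v, label in zip(values, labels) if v <= threshold]
        right = [label for v, label in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Illustrative copper grades (%) with hypothetical ore/waste labels.
cu_pct = [0.125, 0.25, 0.375, 0.625, 0.75, 0.875]
labels = ["waste", "waste", "waste", "ore", "ore", "ore"]

threshold, impurity = best_split(cu_pct, labels)
print(f"Split at cu_pct <= {threshold} (weighted impurity {impurity})")
```

On this toy data the best split falls at 0.5, which separates ore from waste perfectly. Real datasets are noisier, which is why many trees are combined into a forest.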

XGBoost

Efficient for medium to large datasets (more than 1M data records).
Produces high-quality regression and classification models.
Requires cross-validation (CV) and hyperparameter tuning, but its training speed can compensate for that effort.
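
The cross-validation and hyperparameter tuning mentioned above generally follow the pattern sketched below: split the data into folds, score each candidate setting on held-out folds, and keep the best one. To stay runnable with only the Python standard library, this sketch uses a simple nearest-neighbour model as a stand-in rather than XGBoost itself; the data is synthetic and the grid of `n_neighbors` values is an arbitrary assumption.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def knn_predict(train_x, train_y, x, n_neighbors):
    """Stand-in model: average the targets of the nearest training points."""
    pairs = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))
    nearest = pairs[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


def cv_error(xs, ys, n_neighbors, k=5):
    """Mean absolute error of the stand-in model, averaged over k folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        errors = [
            abs(knn_predict(train_x, train_y, xs[i], n_neighbors) - ys[i])
            for i in test_idx
        ]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


# Synthetic data: a noisy linear trend, for illustration only.
rng = random.Random(42)
xs = [i / 2 for i in range(40)]
ys = [3.0 * x + rng.gauss(0, 1.0) for x in xs]

grid = [1, 3, 5, 9]  # candidate hyperparameter values (assumed)
scores = {n: cv_error(xs, ys, n) for n in grid}
best_n = min(scores, key=scores.get)
print(scores, "best n_neighbors:", best_n)
```

With a gradient-boosting model, the same loop would be run over settings such as tree depth or learning rate, which is where the tuning effort comes from.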

Bayesian classification

Suits large datasets.
Usually gives good results when very fast classification is needed.
Efficient compared to other methods in terms of computational resources.
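
To show why this family of methods is light on computational resources, here is a minimal sketch of one common Bayesian classifier, Gaussian naive Bayes. The choice of this particular variant is an assumption on my part, and the two features and the ore/waste labels are invented for the example. Training only requires a mean and a variance per feature and class.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate per-class priors, feature means and feature variances."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""

    def log_posterior(prior, means, variances):
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return math.log(prior) + log_likelihood

    return max(model, key=lambda label: log_posterior(*model[label]))


# Invented samples: two numeric features per sample, with hypothetical labels.
rows = [[0.2, 1.1], [0.3, 0.9], [0.25, 1.0], [0.9, 2.1], [1.1, 1.9], [1.0, 2.0]]
labels = ["waste", "waste", "waste", "ore", "ore", "ore"]

model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, [0.95, 2.05]))
print(predict_gaussian_nb(model, [0.22, 1.05]))
```

The first sample is classified as ore and the second as waste. Because the model is just a handful of summary statistics, both training and prediction stay fast even as the number of records grows.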

Deep Learning

Rapidly evolving, ever-growing literature and understanding of methods.
Best method for image data; its advantage is less clear in other areas.
Good for most specific classification problems.

Tip 6: Contact us for more information

Machine learning enables us to efficiently process big data from a range of sources — both within and beyond the traditional mining value chain — to support real-time decision-making and future projections. If you would like to learn more about it, please get in touch with our team.

About the Author

José González is a geologist from the University of Chile. He has experience in modeling and has participated in projects in economic geology, geostatistics and sustainability in mining. He applies his knowledge of data science to the innovation of digital solutions. José has conducted trainings in the LATAM region and supported numerous projects, and his role is oriented to the search for and adoption of new solutions that can strengthen the mining services portfolio.
