280 points antidnan | 1 comments
folli ◴[] No.41917385[source]
From the paper's method section, a bit more about which type of ML algo was used:

An RF machine-learning model was developed to predict lithium concentrations in Smackover Formation brines throughout southern Arkansas. The model was developed by (i) assigning explanatory variables to brine samples collected at wells, (ii) tuning the RF model to make predictions at wells and assess model performance, (iii) mapping spatially continuous predictions of lithium concentrations across the Reynolds oolite unit of the Smackover Formation in southern Arkansas, and (iv) inspecting the model for explanatory variable importance and influence. Initial model tuning used the tidymodels framework (52) in R (53) to test XGBoost, K-nearest neighbors, and RF algorithms; RF models consistently had higher accuracy and lower bias, so they were used to train the final model and predict lithium.

Explanatory variables used to tune the RF model included geologic, geochemical, and temperature information for Jurassic and Cretaceous units. The geologic framework of the model domain is expected to influence brine chemistry both spatially and with depth. Explanatory variables used to train the RF model must be mapped across the model domain to create spatially continuous predictions of lithium. Thus, spatially continuous subsurface geologic information is key, although these digital resources are often difficult to acquire.

Interesting to me that RF performed better than XGBoost; I would have expected at least a similar outcome if tuned correctly.

1. jofer ◴[] No.41919565[source]
Put another way, this is pretty similar to the interpolation approaches that would normally be used for datasets like this in the world of mineral exploration. Kriging/co-kriging (i.e. Gaussian processes) is the more commonly used approach in this particular field, due to both its long history and the hyperparameters it exposes for things like spatial anisotropy.
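To make the anisotropy point concrete, here is a minimal sketch of a geometrically anisotropic covariance function of the kind a kriging setup might use. Everything in it is an assumption for illustration: the function names, the default ranges, the azimuth convention (counter-clockwise from the x-axis), and the choice of an exponential model with the common "practical range" factor of 3. It is not taken from the paper or from any particular geostatistics package.

```python
import math


def anisotropic_distance(p, q, azimuth_deg, major_range, minor_range):
    """Range-scaled separation between 2D points p and q.

    azimuth_deg is the direction of greatest continuity, measured
    counter-clockwise from the x-axis (a convention assumed here).
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    theta = math.radians(azimuth_deg)
    # Rotate the separation vector into the major/minor axis frame.
    along = dx * math.cos(theta) + dy * math.sin(theta)
    across = -dx * math.sin(theta) + dy * math.cos(theta)
    # Scale each component by its range, so 1.0 means "one range away".
    return math.hypot(along / major_range, across / minor_range)


def exponential_covariance(p, q, sill=1.0, azimuth_deg=0.0,
                           major_range=10.0, minor_range=2.0):
    """Exponential covariance with geometric anisotropy (hypothetical defaults)."""
    h = anisotropic_distance(p, q, azimuth_deg, major_range, minor_range)
    # The factor of 3 is the usual practical-range convention.
    return sill * math.exp(-3.0 * h)


# Same physical separation, very different correlation depending on direction.
c_major = exponential_covariance((0.0, 0.0), (5.0, 0.0))
c_minor = exponential_covariance((0.0, 0.0), (0.0, 5.0))
```

The azimuth and the two ranges are the sort of hyperparameters meant here: two samples the same distance apart are treated as strongly or weakly related depending on whether they line up with the direction of continuity.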

However, kriging is really quite difficult to use with non-continuous inputs. RF is a lot more forgiving there. You don't need to develop a covariance model for discrete values (or a covariance model for how the different inputs relate, either).
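As a rough illustration of why tree-based models are forgiving with discrete inputs, here is a minimal sketch of a single regression-tree split on a categorical variable. The helper names and the tiny dataset are made up for the example (the values are synthetic, not from the paper), and a real random forest would build many such splits over bootstrapped samples and random feature subsets.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_categorical_split(categories, targets):
    """Return (category, error) for the best one-vs-rest split.

    Each candidate split sends one category left and all others right;
    the winner minimizes the total within-group squared error.
    """
    best = None
    for cat in sorted(set(categories)):
        left = [t for c, t in zip(categories, targets) if c == cat]
        right = [t for c, t in zip(categories, targets) if c != cat]
        if not right:
            continue  # a split that leaves one side empty is useless
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            best = (cat, err)
    return best


# Synthetic labels and target values, for illustration only.
rock_type = ["oolite", "oolite", "shale", "shale", "sandstone", "sandstone"]
target = [310.0, 290.0, 40.0, 60.0, 55.0, 45.0]

split = best_categorical_split(rock_type, target)
```

Note that nothing here asks how "close" oolite is to shale. The split only needs group membership, which is why no covariance model for the discrete variable is required.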