Session 5D: Machine Learning/Computing
Time: 10:00 AM to 12:00 PM
Description
Machine learning is rapidly becoming more advanced and more useful to the geothermal industry. These recent advancements create new opportunities for innovation and system optimization while helping to drive down the overall cost of geothermal energy. This session will highlight recent improvements and applications of machine learning in the geothermal industry.
1. Empowering Geothermal Research: The Geothermal Data Repository's New AI Research Assistant (10:00 AM - 10:20 AM)
Description
The Department of Energy’s (DOE) Geothermal Data Repository (GDR) team has integrated a Large Language Model (LLM) with the metadata and supporting documents associated with GDR datasets to create an Artificially Intelligent (AI) research assistant. By leveraging work done to make GDR metadata machine-readable and an open-source LLM integration model called the Energy Language Model, developed by the National Renewable Energy Laboratory, AskGDR serves as a virtual research assistant to GDR users. It provides answers to a variety of user-provided questions using natural language processing and generative machine learning. Users can get answers to questions about specific datasets, including inquiries about the equipment, assumptions and methodologies used in the origination of the data; or more abstract questions, such as the applicability of data to specific research fields. AskGDR improves the discoverability of geothermal data by helping guide users to datasets beyond simple keyword searches. It enables users to find data based on properties of the data, discover information contained within supporting documents, and explore data from projects related to their research objectives.
This paper will outline the development, integration, output, and efficacy of the AskGDR LLM, including adherence to scientific rigor through improvements designed to increase the accuracy of generated answers, avoid speculation, and provide proper references for all resources used.
2. Separating Signals in Elevation Data Improves Supervised Machine Learning Predictions for Hydrothermal Favorability (10:20 AM - 10:40 AM)
Description
A recent study identified topography (land surface elevation above sea level) as an important input dataset (feature) for predicting the location of hydrothermal systems in the Great Basin in Nevada. Yet, topography is generally a result of more than one geological process and may consequently contain multiple distinct signals. For example, the geologic evolution of the Great Basin has produced both crustal thickening (i.e., regional-scale trends in elevation) and thinning via Basin and Range extensional faulting (i.e., valley-scale topographic relief). We postulate that these geologic processes may affect the occurrence of hydrothermal systems differently. Therefore, we separate the regional trend from the valley-scale signal in the Great Basin, and then use them separately to evaluate the importance of each as predictors for hydrothermal favorability.
Our prior work applying supervised machine learning (ML) using the data from the Nevada Machine Learning Project demonstrated that employing a training strategy that randomly selects negative training sites produces better performing models for predicting hydrothermal favorability than a training strategy that used expert-selected negatives. The models created using both training strategies exhibited a west-east geographic trend in the predictions for the favorability of hydrothermal resources. These models generally predicted higher favorability in western Nevada and lower favorability in eastern Nevada. This west-east trend in predicted favorability correlates with elevation across the Great Basin, which trends higher from west to east.
By separating the original elevation feature into distinct features for elevation trend (i.e., regional-scale topography) and detrended elevation (i.e., valley-scale or local relative topography), we find that models using the separated topographic signals consistently outperform competing models that use the original elevation feature. Although western Nevada still exhibits higher favorability than eastern Nevada, using separated signals for regional elevation and local structure reduces the west-east prediction trend in the region and emphasizes structures associated with hydrothermal upflow. This work emphasizes how carefully engineering features to represent geological conditions relevant to hydrothermal systems allows ML algorithms to detect important patterns to predict hydrothermal resource favorability and leads to better model performance.
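The idea of splitting one elevation feature into a regional trend and a detrended residual can be sketched in a few lines. The abstract does not state which separation method was used, so the centered moving average below is an assumption chosen only for illustration, as are the function names, the window size, and the made-up west-to-east profile.

```python
def moving_average_trend(elevation, half_window):
    """Centered moving average used here as a stand-in for the regional trend.

    Near the ends of the profile the window is truncated to the samples
    that are actually available.
    """
    n = len(elevation)
    trend = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = elevation[lo:hi]
        trend.append(sum(window) / len(window))
    return trend


def separate_signals(elevation, half_window):
    """Split elevation into (trend, detrended) so that trend + detrended == elevation."""
    trend = moving_average_trend(elevation, half_window)
    detrended = [z - t for z, t in zip(elevation, trend)]
    return trend, detrended


# Hypothetical profile in metres: a steady eastward rise plus
# alternating basin-and-range style relief. Not real Great Basin data.
profile = [1200.0 + 10.0 * i + (150.0 if i % 4 < 2 else -150.0) for i in range(40)]
trend, detrended = separate_signals(profile, half_window=4)
```

In a workflow like the one described, `trend` and `detrended` would then be supplied to the model as two separate features in place of the original elevation.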
3. Unsupervised Machine Learning For Assessing Geothermal Heat Exchanger Performance (10:40 AM - 11:00 AM)
Description
Artificial intelligence and machine learning applications for above-ground geothermal energy operations are limited despite volumes of data being generated in real time by facilities all over the world. Operational data must be extensively annotated before it is ready for tasks such as condition monitoring, fault detection, and performance forecasting. In this study, we propose the application of unsupervised machine learning to automatically label fifteen (15) years of performance data for a geothermal heat exchanger. The framework implemented in this study relies on well-established techniques in systematic time-series feature engineering, feature selection, and cluster analyses. The trained models were able to identify data groups that represented periods of time before and after a cleaning activity was done on the heat exchangers. The resulting data labels can be used to assess the effectiveness of different heat exchanger cleaning methods and to support the development of an optimal maintenance plan. The developed approach can also be applied in other parts of the geothermal value chain, such as using unsupervised machine learning to develop an effective maintenance plan for scaled-up production wells.
4. Development of Artificial Neural Networks for Estimating Static Formation Temperature (11:00 AM - 11:20 AM)
Description
This study looks at using artificial neural networks (ANNs) for estimating the static formation temperature (SFT) in geothermal wells. SFTs are conventionally estimated from temperature recovery data using some form of linear regression, such as the Horner method. However, there is a multitude of regression models to choose from, and the optimal one is not settled and may be situation-specific. Moreover, commonly used linear regression methods, like the Horner method, require suitably long shut-in periods for their linearization assumptions to apply. As an alternative, machine learning methods have been considered previously to some extent for estimating SFTs. Nevertheless, the development of machine learning alternatives has been limited by the availability of field data, and especially of accurate SFT data. In this study, the ANNs estimate the SFT from transient temperature recovery data. For training the ANNs, a large set of synthetic temperature recovery data was generated using a wellbore simulator called GEOTEMP2. The GEOTEMP2 simulations describe the temperature in a wellbore during drilling and the following recovery period after drilling. The developed ANN models were evaluated by comparing their estimation accuracy with the Horner method for synthetic validation data.
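For readers unfamiliar with the baseline, the Horner method can be sketched as an ordinary least-squares fit of measured temperature against the logarithm of the Horner time ratio (t_c + Δt)/Δt, where t_c is the circulation time and Δt the shut-in time. The fitted line is extrapolated to infinite shut-in time, where the ratio equals 1 and its logarithm is 0, so the intercept is the SFT estimate. The function name and the sample values below are hypothetical; the data are constructed to lie exactly on a Horner line and are not GEOTEMP2 output.

```python
import math


def horner_sft(circulation_time, shut_in_times, temperatures):
    """Estimate static formation temperature with a Horner-style linear fit.

    Regresses temperature on ln((t_c + dt) / dt) and returns
    (intercept, slope); the intercept is the SFT estimate.
    """
    x = [math.log((circulation_time + dt) / dt) for dt in shut_in_times]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(temperatures) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, temperatures))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up example: 10 h of circulation, readings at 6 to 48 h of shut-in,
# generated from an assumed SFT of 150 degC and a slope of -20 degC.
t_c = 10.0
shut_in = [6.0, 12.0, 18.0, 24.0, 36.0, 48.0]
temps = [150.0 - 20.0 * math.log((t_c + dt) / dt) for dt in shut_in]
sft_estimate, horner_slope = horner_sft(t_c, shut_in, temps)
```

Because the sample points follow the assumed line exactly, the fit recovers the assumed SFT. Real recovery data deviate from this line at short shut-in times, which is the limitation the abstract refers to.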
5. Hard Rock Drilling Efficiency Mapping using Machine Learning (11:20 AM - 11:40 AM)
Description
It is crucial to understand how drilling efficiency changes as the operating parameters and confining pressure environment change, since this can enable improved drilling performance and optimization. The rate of penetration (ROP) and torque (T) required to penetrate hard rock significantly impact the efficiency of the cutting action of the Polycrystalline Diamond Compact (PDC) cutters. As weight on bit (WOB) is applied incrementally, the drilling process passes through three distinct drilling efficiency phases: Phase I, inefficient drilling; Phase II, efficient drilling, where WOB and rotations per minute (RPM) are strongly correlated with ROP and T; and Phase III, where ROP and T are limited with further increases in WOB and RPM. This study analyzed the different efficiency phases for a PDC bit design under different pressure environments and applied parameters, together with the resulting T and ROP. An in-house rig was used for the testing, with the capability of collecting data at frequencies of 200 Hz. Tests were performed in Sierra White Granite (SWG) with a 3 ¾” bit design. The operating parameters applied were RPM (80-160 RPM), WOB (0-6 klbs), and confining pressure (0-2,000 psi). The RPM was held constant during each test, and the WOB was increased stepwise, with data collected at each step. For each WOB range and step, heat maps were created showing the correlations between all the different parameters (applied and resultant) to inform the machine learning (ML) feature selection method. Distinct correlations between applied and resultant parameters were seen in the different drilling efficiency phases and within the different phases for different applied parameters. The method shows promise for real-time drilling applications, where it could help identify the most efficient drilling parameters.
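The correlation heat maps mentioned above are built from a matrix of pairwise correlation coefficients for one weight-on-bit step. A minimal sketch of that matrix follows, assuming Pearson correlation (the abstract does not name the coefficient used). The column names and values are hypothetical and are not measurements from the study.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Return a nested dict: matrix[row_name][col_name] -> correlation."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical samples from a single weight-on-bit step.
step = {
    "wob_klbs": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rop_ft_per_hr": [2.0, 4.1, 6.2, 7.9, 10.1],
    "torque_ft_lbs": [50.0, 95.0, 160.0, 210.0, 240.0],
}
corr = correlation_matrix(step)
```

Each entry of `corr` corresponds to one cell of a heat map. Computing one such matrix per step and per confining pressure would let the strength of the WOB-to-ROP relationship be compared across the efficiency phases.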
6. Fostering Geothermal Machine Learning Success: Elevating Big Data Accessibility and Automated Data Standardization in the Geothermal Data Repository (11:40 AM - 12:00 PM)
Description
The Department of Energy’s (DOE) Geothermal Data Repository (GDR) has implemented improvements to both its data lakes and its data standards and automated data pipelines. The GDR data lakes have reduced storage and compute-related barriers to using large geothermal datasets, enabling these large datasets to be accessed by anyone with a modern computer and internet access. More recently, the GDR has been working to further reduce barriers through streamlining the data intake process, educating users on the process and requirements, and aiding users in accessing data from the data lakes. These improvements have augmented the quantity of datasets the GDR is able to accept into its data lakes and have enabled users who are new to cloud tools to access these datasets more easily, overall increasing the accessibility of big geothermal data for use in machine learning and other projects. In addition, the GDR now has built-in data standards and pipelines for drilling data, geospatial data, and distributed acoustic sensing (DAS) data. These standardization efforts aim to enhance the real-world applicability of geothermal machine learning outcomes by improving the quality of training data. Specifically, through standardizing high-value datasets, the GDR is reducing project-specific data curation requirements, thus allowing more time for actual research. By automating this process, the burden of standardization is lifted from the user, ultimately increasing the availability of standardized data.
This paper provides an update on recent improvements made to the GDR’s data lakes and automated data pipelines, including: (1) the streamlining of the data lake intake process, (2) better educating users on the process and requirements through a new data lakes page, (3) the addition of data lake direct access links to GDR data lake submission pages, (4) the implementation of a DAS data pipeline to convert DAS data uploaded in SEG-Y format to a standardized hierarchical data format v5 (HDF5), (5) the extension of this pipeline to encompass data in the GDR data lake, (6) added metadata requirements for geospatial data, (7) user interface (UI)/user experience (UX) enhancements to the data pipelines’ documentation pages, and (8) improvements to the GDR’s data standards and pipelines pages to better guide users in ensuring their data is standardized by the GDR’s automated data pipelines.