Dinkum Journal of Natural & Scientific Innovations (DJNSI)

Publication History

Submitted: May 19, 2024
Accepted:   May 30, 2024
Published:  January 31, 2025

Identification

D-0340

DOI

https://doi.org/10.71017/djnsi.4.1.d-0340

Citation

Md Arafat Al Ajmir Sarker, Ayon Sen & Amit Barai (2025). Fields of Future: Utilizing Supervised Ml for Barley Yield Prediction. Dinkum Journal of Natural & Scientific Innovations, 4(01):01-10.

Copyright

© 2025 The Author(s).

Fields of Future: Utilizing Supervised Ml for Barley Yield Prediction

Original Article

Md Arafat Al Ajmir Sarker 1*, Ayon Sen 2, Amit Barai 3

  1. Computer Science & Engineering, Daffodil International University, Bangladesh.
  2. Computer Science & Engineering, East West University, Bangladesh.
  3. Computer Science & Engineering, Daffodil International University, Bangladesh.

* Correspondence: mdarafatsarker00@gmail.com

Abstract: Barley holds a unique position in the agricultural landscape of Bangladesh. Although it is not as widely cultivated as rice or wheat, barley plays a crucial role in diversifying cropping systems and providing alternative food sources. The crop is rich in essential nutrients, including protein, dietary fiber, and minerals, making it a valuable addition to the local diet. This study proposes using supervised machine learning (ML) to predict barley production in Bangladesh. Data on weather patterns, barley output, and other significant factors were gathered from sources including the Bangladesh Bureau of Statistics (BBS) and the Bangladesh Meteorological Department (BMD). During preprocessing, the dataset undergoes feature scaling and missing value imputation. Several ML models, including the decision tree regressor (DTR), random forest regressor (RFR), linear regression (LinR), XGBoost regressor (XGBR), and gradient boosting regressor (GBR), are trained and assessed using metrics such as mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). The LinR model outperforms the other models with an accuracy of 97.23%. This study aims to explore the use of supervised ML techniques for barley yield prediction in future fields, in line with the evolving landscape of agriculture. It also aims to help optimize agricultural practices, support informed decisions, and ensure food security. The rationale for the study is presented, along with the benefits it offers the agricultural sector in the future. The conclusions of this research will assist Bangladeshi farmers, officials, and the agricultural sector in making decisions that improve barley yield. To advance agricultural practices further, future research can also examine how ML techniques might be used to forecast crop yields across a range of regions and crops.

Keywords: Machine learning, Barley, Yield Prediction, Bangladesh

  1. INTRODUCTION

Barley is frequently used as food and animal feed, and the brewing, food processing, and feed manufacturing industries all use it extensively. As the demand for beer rises, so does the demand for barley [1]. Bangladesh’s production of barley decreased gradually over the years, from 20,000 tons in 1971 to 7,000 tons in 2020 [2]. Bangladesh exported $9.96k worth of barley in 2021, ranking as the 80th highest barley exporter in the world. In the same year, barley ranked 648th in Bangladesh’s list of top exports. The top two destinations for Bangladesh’s barley exports are Malaysia ($9.9k) and the Netherlands ($64). Bangladesh imported $53.7k of barley in 2021, ranking as the 114th highest barley importer globally [3]. Barley was Bangladesh’s 1055th most imported product that same year, and India ($53.7k) is the country from which Bangladesh imports barley most frequently [4]. With access to a reliable yield prediction tool, barley farmers in Bangladesh may be able to make better decisions and boost production in this important agricultural sector [5]. In this investigation, we offer a supervised ML method to predict Bangladeshi barley yield. Our approach applies five ML models: linear regression (LinR), DTR, RFR, XGBR, and GBR [6]. To train and assess these models, we use a dataset that comprises barley yield together with a variety of factors including year, region, and climatic features. The dataset was obtained from the BBS [7] and spans the years 1968 to 2021. The DTR and RFR models help identify the most important factors, allowing forecasts to be built around them. XGBR, which uses parallel tree boosting, is also employed [8]. We evaluate the models’ performance using measures such as the coefficient of determination, RMSE, MAE, and MSE. RMSE and MAE measure the average difference between the predicted and actual barley yield.
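For reference, the evaluation measures named above have the following standard definitions. The notation is an assumption introduced here, not taken from the paper: $y_i$ is the observed yield, $\hat{y}_i$ the predicted yield, $\bar{y}$ the mean observed yield, and $n$ the number of records being evaluated.

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\end{align}
```

Because MSE squares each error before averaging, it and RMSE penalize large individual errors more heavily than MAE does, so the three measures need not rank a set of models in the same order.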
Our investigation’s findings show that the LinR model outperforms the other methods in terms of accuracy [9]. However, estimating barley output is still challenging because of the complex interplay between factors such as rainfall, temperature, location, and cultivated area. If barley production is forecast properly, farmers may be better able to make decisions about crop output, resource allocation, and overall profitability [10]. Barley, a versatile cereal crop, holds significant importance in the global food and beverage industry. Its production and yield play a crucial role in ensuring food security and economic stability in many regions [11]. Accurate prediction of barley yield is essential for farmers and policymakers because it aids decision-making related to crop management, resource allocation, and market forecasting [12]. Traditionally, yield prediction has relied on historical data and expert knowledge. ML algorithms provide valuable insights into barley yield by analyzing historical climate data, soil characteristics, and agronomic factors [13]. This information can be used to optimize agricultural practices so as to maximize yield, reduce resource wastage, and minimize environmental impact. Accurate barley yield prediction enables efficient resource allocation, reducing costs and improving productivity [14]. It also allows farmers to plan for potential market demand and adjust production accordingly, leading to a more stable and profitable agricultural system. ML is used to develop models that incorporate climate data and assess its impact on barley yield, providing valuable insights into adaptation strategies [15]. Farmers can anticipate the effects of changing climate patterns and implement measures to minimize crop losses. ML techniques help develop sustainable agricultural practices by analyzing large datasets and identifying correlations between agronomic factors and yield [16].
This can reduce chemical inputs, optimize water usage, and encourage the adoption of precision agriculture techniques, contributing to long-term agricultural resilience and biodiversity conservation. ML algorithms can provide real-time information on yield predictions, market conditions, and optimal management practices, enabling stakeholders to make informed decisions that improve productivity, profitability, and sustainability in the agricultural sector [17]. With advances in data collection, storage, and processing capabilities, the application of ML techniques has gained prominence in agriculture. This research paper aims to explore the potential of supervised ML algorithms for barley yield prediction and their implications for the Fields of Future. The main contributions of this investigation are the creation of a carefully gathered dataset, the application of several techniques (LinR, GBR, XGBR, RFR, and DTR), and a new avenue of study for the research community. This study aims to explore the use of supervised ML techniques for barley yield prediction in future fields, in line with the evolving landscape of agriculture. It also aims to help optimize agricultural practices, support informed decisions, and ensure food security. The rationale for the study is presented, along with the benefits it offers the agricultural sector in the future.

  2. MATERIALS AND METHODS

This study of barley yield prediction using supervised ML focuses on developing and evaluating ML models that accurately predict barley yield from input variables such as climate data, soil properties, agronomic practices, and other relevant factors. The objective is to improve the accuracy and efficiency of barley yield prediction, enabling farmers and stakeholders to make informed decisions for optimizing agricultural practices and resource allocation. Research on barley yield prediction using supervised ML requires several instruments and tools. The key components of the instrumentation are as follows. Data Collection and Management Instrument: Effective data collection is crucial for developing accurate prediction models. Instruments such as weather stations, soil sensors, and remote sensing technologies are used to collect climate data, soil properties, and other relevant environmental information. Data management tools, including databases and data processing software, are needed to store, clean, integrate, and preprocess the collected data. ML Algorithms Instrument: A wide range of ML algorithms can be employed for barley yield prediction, including decision trees, random forests, support vector machines, neural networks, and gradient boosting models. Instrumentation involves implementing these algorithms using the Python programming language and ML libraries such as scikit-learn and TensorFlow. Feature engineering techniques are used to transform the collected data and derive new features that enhance prediction performance. Instrumentation involves implementing feature engineering methods such as feature scaling, dimensionality reduction, and feature selection algorithms to identify the most informative features for barley yield prediction. Instrumentation also involves splitting the available data into training, validation, and testing sets for model training and evaluation.
The chosen ML algorithms are trained using the training set, and their performance is evaluated using appropriate evaluation metrics such as mean squared error, root mean squared error, or the coefficient of determination. Cross-validation techniques, such as k-fold cross-validation, are employed for robust model evaluation. To enhance the performance of ML models, instrumentation includes model optimization and hyperparameter tuning. Techniques such as grid search, random search, or Bayesian optimization are used to search for optimal hyperparameters, such as the learning rate, regularization parameters, or the number of layers and neurons in neural networks. Decision Support System Integration: To translate the developed prediction models into practical applications, instrumentation involves integrating the models into decision support systems. This includes designing user-friendly interfaces, developing software applications or web-based platforms, and implementing real-time data feeds and visualization tools that provide actionable insights and support decision-making for farmers and stakeholders. The data collection procedure for real-time barley production on Bangladeshi climate data using supervised learning is broken down into the following steps. The first step is to identify the variables required to predict barley production. These include climate variables such as temperature, rainfall, humidity, wind speed, and solar radiation, as well as barley production variables such as the area under cultivation, yield per unit area, and total production. Once the relevant variables have been identified, the next step is to identify the data sources. Climate data is obtained from sources such as the Bangladesh Meteorological Department, and barley production data is obtained from the Bangladesh Bureau of Statistics or from individual barley farmers. The BBS provided the information needed to create this dataset [18].
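The k-fold cross-validation mentioned in this section can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function name `k_fold_indices`, the choice of k = 5, and the seed are assumptions, and in practice a library routine such as scikit-learn's `KFold` would normally be used instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 1160 matches the dataset size reported in the paper; k = 5 is an assumption.
folds = list(k_fold_indices(n_samples=1160, k=5))
```

Each record appears in exactly one test fold, so every model is scored on data it was not trained on, and averaging the per-fold scores gives a steadier estimate than a single split.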
The dataset covers barley yields from 23 localities in Bangladesh over the 54-year period from 1968 to 2021. Its eleven features are location, barley production, area in hectares, year, wind speed, sunshine, rainfall, minimum and maximum temperature, humidity, and cloud cover [19]. The collected data is cleaned to remove missing and erroneous values and preprocessed so that all variables are on the same scale. The data is then split into training and testing sets for applying the algorithms. The collected and preprocessed data is stored in an Excel sheet saved on Google Drive for easy access and management. Real-time data collection and monitoring is maintained to ensure that the predictive models remain up to date and accurate. This involves the use of automated data collection and processing methods [20]. Producing an algorithmic prediction of barley yield requires several procedures, and this section explains the entire mechanism. The hardest part of the research investigation is gathering the necessary information about barley yield and the prediction methodology. First, a custom dataset is carefully constructed, and its preparation is completed before any model is applied. Creating the dataset and applying the models are the most challenging aspects of this research process, and it can be difficult to choose the model that best matches the dataset. The methodology workflow diagram is as follows.
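The preprocessing described above can be sketched as follows. The paper mentions missing value handling and feature scaling but does not say which methods were used, so mean imputation and min-max scaling are assumptions here, as are the function names and the toy rainfall values.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical rainfall readings with one missing value (not from the dataset).
rainfall = [120.0, None, 80.0, 100.0]
filled = impute_mean(rainfall)   # the gap is filled with the mean of 120, 80, 100
scaled = min_max_scale(filled)   # every value now lies between 0 and 1
```

Scaling matters mainly for models that are sensitive to feature magnitude; tree-based models such as DTR, RFR, GBR, and XGBR are largely unaffected by it.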

Figure 01: Methodology workflow diagram

Figure 01 gives an overview of the framework used for data analysis, covering data collection, preprocessing, and exploration, as well as the creation and evaluation of ML models for barley yield prediction, with a focus on the models’ accuracy. Before the initial stage begins, the problem is identified. The input and output variables are then chosen; the output variable represents the desired result of the prediction. We compared the results of various ML techniques in our model. The components of this investigation that proved most difficult to complete were data gathering, planning, and implementation. Developing our plan also required clear objectives and resources. We analyzed the custom dataset and produced the statistical findings on the Google Colaboratory platform, which provides additional compute resources through its GPU and TPU runtimes. The coding was carried out in Python. Weather stations that record historical weather data, together with sources of soil and other properties, are examples of important data sources. We used the Python modules Pandas and NumPy to handle missing values, standardize data, and engineer features. The supervised learning algorithms used here (linear regression, DTR, RFR, XGBR, and GBR) are available in well-known Python libraries such as scikit-learn, and matplotlib is used for plotting. A correlation matrix is used to display the correlation coefficients between the variables in the barley yield dataset. Each cell in the matrix represents the correlation between two variables. The matrix summarizes the results and also serves as an input for more in-depth analysis and diagnosis in subsequent research.

Figure 02: Correlation Matrix
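As an illustration of how each cell of a correlation matrix is obtained, the following standard-library sketch computes Pearson correlation coefficients for every pair of columns. The column names and toy values are hypothetical and are not taken from the paper's dataset; in a Pandas workflow the same matrix is normally produced with `DataFrame.corr()`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy values chosen so the relationships are exactly linear.
data = {
    "rainfall": [1.0, 2.0, 3.0, 4.0],
    "yield": [2.0, 4.0, 6.0, 8.0],
    "max_temp": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(data)
```

Values near +1 or -1 indicate a strong linear relationship between two variables, while values near 0 indicate little linear relationship; the coefficient says nothing about non-linear dependence.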

Figure 02 above shows the correlation matrix of the full dataset. The plot was generated with Python code in Google Colab using matplotlib. Selecting the training and testing approach is one of the most crucial phases of any research study. The custom dataset is separated into training and testing components: 810 of the 1160 records were used to train the models, while the remaining 350 were used to test them, so roughly 70% of the data goes to training and 30% to testing. The outcome of the analysis depends on how accurate each method is. The ML models in this work are validated using accuracy, MAE, MSE, and RMSE. Several models, namely LinR, XGBR, GBR, RFR, and DTR, are used to forecast and analyze the dataset. Each model has particular strengths that add to the overall analysis and forecasting of the data. By using these models, we aim to capture the dataset’s complexity and determine the best strategy for achieving the research goals.
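The split described above can be sketched as a shuffled hold-out. This is an illustration under stated assumptions rather than the authors' procedure: the function name `holdout_split`, the seed, and the use of integers as stand-ins for dataset rows are all hypothetical, and the paper does not say whether the split was random or ordered.

```python
import random


def holdout_split(records, n_test, seed=0):
    """Shuffle the records and hold out exactly n_test of them for testing."""
    if not 0 < n_test < len(records):
        raise ValueError("n_test must be between 1 and len(records) - 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(1160))   # stand-ins for the 1160 dataset rows
train, test = holdout_split(records, n_test=350)

train_share = len(train) / len(records)   # fraction of rows used for training
test_share = len(test) / len(records)     # fraction of rows used for testing
```

With 810 training rows and 350 test rows the shares come out at about 69.8% and 30.2%, which is why the 70/30 description is approximate rather than exact.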

Figure 03: Accuracy vs Model

Figure 03 above shows the accuracy of each algorithm applied. The results were LinR 97.23%, XGBR 95.03%, GBR 95.45%, DTR 95.70%, and RFR 95.99%. The highest accuracy, 97.23%, is achieved by LinR, and the lowest, 95.03%, by XGBR. A comparative analysis follows, setting the methodology and performance of the present technique for predicting barley yields from Bangladeshi climatic data with supervised learning against other similar studies.

Table 01: Current, distinctive crop yield ML research.

| Reference | Published year | Algorithm | Dataset area | Parameters | Model and Result | Findings |
|---|---|---|---|---|---|---|
| [8] | 2018 | DNN | Bangladesh | 46 | Best efficiency of 97.7% | Utilizing various crops |
| [9] | 2020 | NB, DTR | India | 5 | Accuracy of 84% and 88% | Collected a dataset consisting of 3101 observations |
| [12] | 2019 | ANN, linear regression | India | 8 | 82% accuracy and very little loss | Applies forward and backward propagation and evaluates metrics |
| [14] | 2021 | SVM, RFR | India | 7 | Support vector machine 99.47% | Compiled data from multiple books and websites |

The studies in this table address various crops such as rice, potato, paddy, and wheat, whereas the key focus of our work is barley yield. Nonetheless, their algorithms are closely related to those used in this paper, and their reported accuracy is high. Because the working methods in these articles are closely associated with this paper, they are summarized in the table above.

  3. RESULTS & DISCUSSION

This study uses a dataset of 1160 records collected per region and per year, each containing the region, area, production, year, cloud cover, humidity, maximum temperature, minimum temperature, rainfall, sunshine, and wind speed. Several ML models were implemented: LinR, XGBR, GBR, RFR, and DTR. Each model was evaluated using MAE, MSE, RMSE, R2, and accuracy on both the training set and the test set. The results were LinR 97.23%, XGBR 95.03%, GBR 95.45%, DTR 95.70%, and RFR 95.99%. The highest accuracy, 97.23%, is achieved by LinR, and the lowest, 95.03%, by XGBR.

Figure 04: MAE vs Model

Figure 05: MSE vs Model

Figure 04 above shows the MAE for all models: RFR, XGBR, GBR, DTR, and LinR obtain 36.1935, 41.0406, 42.046, 36.2787, and 66.5288, respectively. LinR has the highest MAE and RFR the lowest. Figure 05 above shows the MSE for all models: LinR, XGBR, GBR, RFR, and DTR obtain 25060.76, 45096.8, 41259.8, 36352.09, and 39030.02, respectively. XGBR has the highest MSE and LinR the lowest.
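Since RMSE is by definition the square root of MSE, the MSE values reported for Figure 05 can be checked against the RMSE values the paper reports for Figure 06. The short sketch below does this arithmetic; the dictionaries simply transcribe the paper's reported numbers, and the variable names are introduced here for illustration.

```python
import math

# MSE values as reported in the text for Figure 05.
reported_mse = {
    "LinR": 25060.76,
    "XGBR": 45096.8,
    "GBR": 41259.8,
    "RFR": 36352.09,
    "DTR": 39030.02,
}

# RMSE values as reported in the text for Figure 06.
reported_rmse = {
    "LinR": 158.3,
    "XGBR": 212.36,
    "GBR": 203.12,
    "RFR": 190.66,
    "DTR": 197.56,
}

# Derive RMSE from MSE and measure the largest disagreement with the reported values.
derived_rmse = {model: math.sqrt(mse) for model, mse in reported_mse.items()}
max_gap = max(abs(derived_rmse[m] - reported_rmse[m]) for m in reported_mse)

best_model = min(derived_rmse, key=derived_rmse.get)    # lowest error
worst_model = max(derived_rmse, key=derived_rmse.get)   # highest error
```

The derived and reported RMSE values agree to within about 0.01, a gap consistent with rounding, and both place LinR lowest and XGBR highest.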

Figure 06: RMSE vs Model

Figure 06 above shows the RMSE for all models: LinR, XGBR, GBR, RFR, and DTR obtain 158.3, 212.36, 203.12, 190.66, and 197.56, respectively. The highest RMSE belongs to XGBR and the lowest to LinR. After analyzing the MAE, MSE, and RMSE graphs, LinR gives the best overall performance among the algorithms used in this research project: it has the lowest MSE and RMSE, although its MAE is the highest. The findings of a supervised learning analysis of barley production on Bangladeshi climate data will depend on the specific variables and models used in the analysis. However, some possible findings of interest include the following:

  • Improved Prediction Accuracy: Supervised ML models, such as decision trees, random forests, support vector machines, and neural networks, have shown promising results in predicting barley yield compared to traditional statistical methods. These models have demonstrated higher prediction accuracy and the ability to capture complex relationships between input variables and barley yield.
  • Importance of Climate Variables: Climate variables, such as temperature, precipitation, and solar radiation, have been identified as crucial predictors of barley yield. ML models have been successful in leveraging these variables to capture the impact of weather conditions on barley growth and yield.
  • Soil Properties and Agronomic Practices: Incorporating soil properties, such as soil moisture, pH, and nutrient levels, along with agronomic practices, such as planting density, fertilization, and irrigation, has improved the predictive performance of ML models. These factors play a significant role in determining barley yield and should be considered when developing prediction models.
  • Feature Selection and Engineering: Feature selection and engineering techniques have been found to enhance the prediction accuracy of barley yield models. Identifying the most informative features and transforming or deriving new features from the available data have improved the models’ ability to capture the underlying patterns and factors influencing barley yield.
  • Model Interpretability: Interpretability of the prediction models has been a challenge, especially for complex models like deep neural networks. While these models may offer high prediction accuracy, understanding the factors driving the predictions and explaining them to end users is essential for their acceptance and practical application in future fields.
  • Scaling and Generalization: It is crucial that the developed prediction models scale to large datasets and generalize well to different geographical regions and time periods. Models that can handle diverse environmental conditions and provide accurate predictions across different farming systems have practical implications for the adoption of supervised ML in future barley yield prediction.

  4. CONCLUSION

Agriculture is very important in Bangladesh, yet there has been little technological advancement in this area. By using ML approaches to forecast barley yields under various regional and climatic conditions, this work seeks to close this gap. Regression techniques are used in the study to find the best model for a dataset with 1160 entries and 11 variables. The LinR model exhibits the most promising results among the examined models. Future research will concentrate on examining regional soil features and using deep learning (DL) models to increase prediction accuracy. This study has the potential to extend forecasting to other crops, in addition to supporting a customized recommendation system for barley production and distribution. Integrating crop data with soil properties would facilitate the development of a user-friendly website or mobile application that helps farmers make informed crop selection decisions. Although this is a small start, it has the potential to lead to major breakthroughs in the agriculture industry.

  5. RECOMMENDATIONS

The scope for further developments of barley yield prediction utilizing supervised ML in future fields is extensive, offering numerous opportunities for advancements and improvements. Here are some key areas of focus for future development:

  • Further advancements can be made in data collection methods, including the integration of real-time data streams from weather stations, soil sensors, drones, and satellite imagery. Improving data quality, expanding data coverage, and integrating data from various sources can enhance the accuracy and robustness of prediction models.
  • Future developments can explore the use of advanced ML techniques, such as deep learning, reinforcement learning, or ensemble methods, to enhance the predictive performance of barley yield models. These techniques can capture complex patterns and interactions in data and potentially improve the interpretability of the models.
  • Future research can focus on integrating data at multiple scales, such as field-level, regional, and global data, to account for spatial variability in yield predictions. This can involve incorporating high-resolution satellite imagery, GIS data, and other geospatial information to capture localized variations in soil properties, climate patterns, and agronomic practices.
  • Incorporating domain knowledge and expert insights into the prediction models can enhance their accuracy and relevance. Collaborations between ML experts and agricultural scientists can help develop hybrid models that combine data-driven approaches with domain-specific knowledge, resulting in more accurate and actionable predictions.
  • Considering the socio-economic aspects of barley yield prediction is important for addressing the needs and constraints of different stakeholders. Future developments should consider the economic viability, scalability, and potential barriers to adoption to ensure the practical implementation of the prediction models in diverse agricultural systems.
  • The scope for further developments in barley yield prediction utilizing supervised ML in future fields is broad, encompassing data collection and integration, advanced ML techniques, model interpretability, decision support system integration, scalability, and socio-economic considerations. Future research efforts should focus on addressing these areas to maximize the potential of supervised ML for accurate and practical barley yield prediction.

REFERENCES

  1. Bangladesh Bureau of Statistics, http://www.bbs.gov.bd/, last accessed 23/05/2023.
  2. The Economic Times, https://economictimes.indiatimes.com/definition/barley, last accessed 23/05/2023.
  3. Rizwana Mehmood, Rabia Mustafa & Tania Ijaz (2023). Optimizing Water Utilization Effectiveness in Rice Agriculture: An All-Inclusive Examination of Cutting-Edge Irrigation Technology. Dinkum Journal of Natural & Scientific Innovations, 2(11):719-730.
  4. OEC, https://oec.world/en/profile/bilateral-product/barley/reporter/bgd, last accessed 23/05/2023.
  5. Durga Karki & Arun Kumar Shrestha (2024). An Analysis of the Ground water for Irrigation Purpose of Morang, Nepal. Dinkum Journal of Natural & Scientific Innovations, 3(03):316-333.
  6. BURHAN, Hasan Arda. “Crop Yield Prediction by Integrating Meteorological and Pesticides Use Data with ML Methods: An Application for Major Crops in Turkey.” Ekonomi Politika ve Finans Araştırmaları Dergisi 7.IERFM Özel Sayısı: 1-18.
  7. Nishant, Potnuru Sai, et al. “Crop yield prediction based on Indian agriculture using ML.” 2020 International Conference for Emerging Technology (INCET). IEEE, 2020.
  8. Islam, Tanhim, Tanjir Alam Chisty, and Amitabha Chakrabarty. “A deep neural network approach for crop selection and yield prediction in Bangladesh.” 2018 IEEE Region 10 Humanitarian Technology Conference (R10-HTC). IEEE, 2018.
  9. Kumar, Y. Jeevan Nagendra, et al. “Supervised ML approach for crop yield prediction in agriculture sector.” 2020 5th International Conference on Communication and Electronics Systems (ICCES). IEEE, 2020.
  10. Champaneri, Mayank, et al. “Crop yield prediction using ML.” Technology 9 (2016): 38.
  11. Gandhi, Niketa, et al. “Rice crop yield prediction in India using support vector machines.” 2016 13th International Joint Conference on Computer Science and Software Engineering (JCSSE). IEEE, 2016.
  12. Kale, Shivani S., and Preeti S. Patil. “A ML approach to predict crop yield and success rate.” 2019 IEEE Pune Section International Conference (PuneCon). IEEE, 2019.
  13. Pandith, Vaishali, et al. “Performance evaluation of ML techniques for mustard crop yield prediction from soil analysis.” Journal of scientific research 64.2 (2020): 394-398.
  14. Bondre, Devdatta A., and Santosh Mahagaonkar. “Prediction of crop yield and fertilizer recommendation using ML algorithms.” International Journal of Engineering Applied Sciences and Technology 4.5 (2019): 371-376.
  15. Veenadhari, S., Bharat Misra, and C. D. Singh. “ML approach for forecasting crop yield based on climatic parameters.” 2014 International Conference on Computer Communication and Informatics. IEEE, 2014.
  16. Ahamed, AT M. Shakil, et al. “Applying data mining techniques to predict annual yield of major crops and recommend planting different crops in different districts in Bangladesh.” 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). IEEE, 2015.
  17. Mahdi, Mostafa Didar, et al. “A Deep Gaussian Process for Forecasting Crop Yield and Time Series Analysis of Precipitation Based in Munshiganj, Bangladesh.” IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2020.
  18. Mamun, Shahriar, et al. “JuteBangla: A Comparative Study on Jute Yield Prediction using Supervised ML Approach based on Bangladesh Perspective.” 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 2022.
  19. Haque, Kohinoor, Md Khairul Islam, and Abdus Sattar. “Wheat Production Forecasting in Bangladesh Using Deep Learning Techniques.” 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 2022.
  20. Bhattacharyya, Debnath, et al. “Hybrid CNN-SVM Classifier Approaches to Process Semi-Structured Data in Barley Yield Forecasting Production.” Agronomy 13.4 (2023): 1169.
