Department of Computer Science
Browsing Department of Computer Science by Title
- A Churn Prediction System for Telecommunication Company Using Random Forest and Convolution Neural Network Algorithms (EAI Endorsed Transactions on Mobile Communications and Applications, 2022-07-27) Sulaiman Olaniyi Abdulsalam; Jumoke Falilat Ajao; Bukola Fatimah Balogun; Micheal Olaolu Arowolo. INTRODUCTION: Customer churn, the migration of customers from one service provider to another, is a severe problem. Because churn directly affects a company's sales, companies are trying to develop strategies to identify prospective churners. It is therefore necessary to examine the factors that influence customer churn in order to find effective ways to minimize it. OBJECTIVES: The main purpose of this work is to create a churn prediction model that helps telecom operators anticipate which clients are most likely to churn. METHODS: The experimental strategy applies machine learning techniques to the telecom churn dataset, employing an improved Relief-F feature selection algorithm to extract relevant features from the large dataset. RESULTS: The results show that the CNN has a high prediction capability of 94 percent, compared with 91 percent for the Random Forest classifier. CONCLUSION: The results are highly relevant to the telecommunication business in managing churners and loyal clients.
- A Clinical Decision Support System for Predicting Psychotic Disorder using Random Forest Algorithm (Published by Faculty of Engineering and Technology, Adeleke University, Ede, Osun State, 2023-05-19) Babatunde Roneke Seyi. Psychotic disorders exert a profound effect on one's perception of reality, cognitive abilities, emotions, and conduct. By precisely pinpointing individuals at risk or in the initial phases of a psychotic disorder, healthcare providers can promptly implement interventions, employ suitable treatment approaches, and offer essential support. This proactive approach holds the potential to impede the advancement of the disorder and mitigate its long-term impact on the individual's well-being. Earlier automated predictive systems are limited in prediction accuracy and ease of use, and are computationally complex. This work implements a Clinical Decision Support System (CDSS) utilizing the Random Forest algorithm. A dataset consisting of medical records from 500 carefully selected psychotic patients at Yaba Psychiatric Hospital in Lagos, Nigeria, spanning a period of five years (January 2010 to December 2014) was obtained. The dataset provides crucial insights into the demographic and clinical characteristics of the patients, enabling the Random Forest algorithm to capture relevant patterns and relationships associated with psychotic disorders. Following model training and evaluation using cross-validation and metrics such as F1 score, recall, and precision, the predictive model achieves an accuracy of approximately 95%. By harnessing machine learning and the Random Forest algorithm, the CDSS holds great potential for enhancing psychiatric diagnoses. The accuracy attained by the predictive model indicates its reliability and effectiveness as a decision support tool for healthcare professionals. This technology has the capacity to expedite diagnosis, enabling timely interventions and personalized treatment plans for patients with psychotic disorders.
- A COMPARATIVE ANALYSIS OF DEEP LEARNING TECHNIQUES FOR INSIDER THREAT DETECTION (FUW Trends in Science & Technology Journal, 2024-06-28) A.O. Akinlolu; E.R. Jimoh; I.A. Alabere; O. Ekundayo. An insider threat refers to any malicious activity carried out by employees, contractors, or vendors who have authorised access to an organisation’s IT assets, resulting in significant negative impacts on its data and information resources. Existing literature reviews have identified shortcomings in utilizing diverse domains (such as system log files, file processes, logon records, HTTP, email, external drives, and the Lightweight Directory Access Protocol (LDAP)) to develop techniques capable of identifying insiders posing a threat to an organisation with minimal false positives. In contrast, this study opts for a more robust domain, specifically LAN data packets, to assess the activities of users within the organisation's Local Area Network. It monitors deviations from the normal flow of data packets on the network and classifies these anomalies as either malicious or benign. The synthetic dataset KDDCUP’99, evaluated in this study, is widely recognized as one of the few publicly accessible datasets for network-based anomaly detection systems. The proposed stacked ensemble model demonstrates superior predictive performance, achieving an accuracy of 98%, compared to the 91.88% and 98.58% accuracy of the individual classifiers Naïve Bayes and KNN, respectively. The model can be improved by incorporating additional user behaviours such as email communication, browser activity, and file access to enhance accuracy and applicability.
- A Comparative analysis of Texture feature and 3D colour Histogram for Content Based Image Retrieval (U6+ Consortium of African Universities Proceedings. Maiden Edition on Harnessing African Potentials for Sustainable Development, Calabar, Nigeria, 2019-09-15) Babatunde, R. S., and Ajao, J. F. (2019). The visual content of an image can be used to search a large database for similar images based on user interest, a technique known as Content Based Image Retrieval (CBIR). In CBIR systems, the user presents a query image to the system, and the system obtains its feature vector, which is compared with the corresponding features of the images in the database. The adequacy of the extracted feature vectors for retrieving the appropriate and exact image from the database is an open issue that calls for continued research. This work involved the retrieval of similar images based on image content by comparing the feature vectors of the query image with those of the image database using the Mahalanobis distance measure. Textural feature vectors of the images were obtained using the local binary pattern, and 3D colour histogram feature vectors were also extracted. The performance of these two feature vectors was compared based on the distance metric to determine their suitability. The similar images in the database are displayed along with their similarity distance values, with the minimum distance indicating the best match. The CBIR system was implemented using a locally acquired database of over 600 different images. Various images were used to query the CBIR system, which successfully returned similar images. The simulation results revealed that, for the images used in the experiment, textural feature vectors are more adequate than the 3D colour histogram in terms of speed and accuracy in content-based image retrieval.
- A comparison of Boosting techniques for Classification of Microarray data (Published by Faculty of Computing and Information Systems, University of Ilorin, 2023-04-13) Babatunde Ronke Seyi. Context: Advancements in technology, particularly microarrays, have played a pivotal role in genomics and bioinformatics. These advancements have significantly contributed to illness diagnosis, the evaluation of therapy response in patients, and cancer research. Microarray data often contains extraneous and duplicative factors, which introduce noise into the dataset. Consequently, scrutinizing the data to identify significant patterns for diagnosis can be daunting when employing conventional statistical approaches. Numerous studies are currently being conducted to improve the analysis of microarray data, with the aim of increasing performance, prediction accuracy, and speed. Most of these earlier methods are limited in their predictive capacity and are characterised by high computational time and algorithm complexity. Objective: This research addresses some of these issues by implementing the classification of microarray data using Boosting algorithms. Method: Using a publicly available benchmark dataset, the microarray data was cleaned and normalised, and salient features carrying essential information were obtained. Three state-of-the-art boosting algorithms (AdaBoost, Gradient Boost, and XGBoost) were used to classify the microarray data, and the performance of each was compared. Results: The experimental findings indicate that XGBoost demonstrates superior performance compared to the other boosting approaches, with a classification accuracy of 98.18% and a training time of 11 seconds. Conclusions: The novelty of the experiment compared to earlier work lies in the reported training time, which is information not frequently made explicit in other reports of findings.
- A Deep Neural Network-Based Yoruba Intelligent Chatbot System (iSTEAMS, 2022-06-14) Babatunde, A.N., Oke, A.A., Balogun, B.F., AbdulRahman, T.A. & Ogundokun, R.O. Two Artificial Intelligence software systems, Bot and Chatbot, have recently debuted on the internet. These initiate communication between the user and a virtual agent. The modeling and performance of deep learning (DL) computation for an Assistant Conversational Agent (chatbot) are presented in this research. The deep neural network (DNN) technique is used to respond to a large number of tokens in an input sentence with more appropriate dialogue. The model was created to do Yoruba-to-Yoruba translations. The major goal of this project is to improve the model's perplexity and learning rate, as well as to find a BLEU score for translation in the same language. Keras, which is written in Python, is used to run the experiments. The data was trained using a deep learning-based algorithm. With the use of training examples, a collection of Yoruba phrases with various intentions was produced. The results demonstrate that the system can communicate in basic Yoruba terms and that it would be able to learn simple Yoruba words. When evaluated, the system showed an accuracy rate of 80%. Keywords: Chatbot, Natural Language Processing, Deep Learning, Artificial Neural Network, Yoruba Language
- A Generalized Neuron Model (GNM) Based Human Age Estimation (Advances in Multidisciplinary & Scientific Research Journal, 2017-10-20) Babatunde, R.S. and Yakubu, I.A. (2017). In real-world applications, the identification characteristics of face images have been widely explored, in National ID cards, international passports, and driving licenses, amongst others. In spite of the numerous investigations of person identification from face images, there exists only a limited amount of research on detecting and estimating the demographic information contained in face images, such as age, gender, and ethnicity. This research aims at detecting the age or age range of an individual based on the facial image. A generalized neuron (GN), which is a modification of the simple neuron, is used to overcome some of the problems of the artificial neural network (ANN) and improve its training and testing performance. The GN is trained with discrete wavelet transform (DWT) features obtained after applying the Canny edge detection algorithm to the face image. Validating the technique on FGNET face images reveals that the frequency domain features obtained using the DWT capture the wrinkles in the face region, which are a distinguishing factor of the face as humans grow older. The empirical results demonstrate that the GN outperforms the simple neuron, with a detection rate of 93.5%, training time of 96.30 seconds, matching time of 14 seconds, and root mean square error of 0.0523. The experimental results suggest that the GN model performs comparably and could be adopted for detecting human ages.
- A Hybrid Approach for Face Morphing Detection (KASU Journal of Computer Science, 2024) Saka Kayode Kamil; Abdulrauf Uthman Tosho; Aro Taye; Seriki Aliu Adebayo; Sulaiman Olaniyi Abdulsalam. Background: In biometrics, one of the most popular study topics is the detection of face morphing attacks. However, because present methods are unable to capture significant feature changes, they are unable to strike the correct balance between accuracy and complexity. Survey investigation and analysis have shown that existing methods of face morphing detection take longer to detect the image attack because of the heavy computation required by facial feature extraction approaches. Further study is therefore needed to develop a model that improves the computational time and accuracy of current face morphing recognition methods. The paper developed a hybrid model for face morphing detection. The FERET database was created to aid in the evaluation and development of algorithms. Local Binary Pattern (LBP) was used as the feature extraction algorithm, and the Residue Number System (RNS) was introduced to reduce the lengthy computational time of LBP during feature extraction from the images. A classification accuracy of 98% was achieved for the FERET database, while an accuracy of 96% was achieved for the FRGCv2 database. An average training time of 0.0532 seconds was recorded for the FERET database, while an average training time of 0.0582 seconds was achieved for the FRGCv2 database. The study concluded that the high dimensionality of LBP was well reduced and optimized by the RNS algorithm, which improved the performance of face morphing recognition.
- A Hybridized Feature Extraction Model for Offline Yorùbá Document Recognition (Asian Journal of Research in Computer Science, 2023-03-22) Jumoke F. Ajao, Rafiu M. Isiaka and Ronke S. Babatunde. Document recognition is required to convert handwritten and text documents into digital equivalents, making them more easily accessible and convenient to store. This study combined feature extraction techniques for recognizing Yorùbá documents in an effort to preserve the cultural values and heritage of the Yorùbá people. Ten Yorùbá documents were acquired from Kwara State University’s Library, and ten indigenous literate writers wrote the handwritten versions of the documents. These were digitized using an HP Scanjet300 and pre-processed. The pre-processed image served as input to the Local Binary Pattern (LBP), Speeded-Up Robust Features (SURF) and Histogram of Oriented Gradients (HOG) extractors. The combined extracted feature vectors were input into the Genetic Algorithm (GA). The reduced feature vector was fed into a Support Vector Machine. A 10-fold cross-validation was used to train the models: LBP-GA, SURF-GA, HOG-GA, LBP-SURF-GA, HOG-SURF-GA, LBP-HOG-GA and LBP-HOG-SURF-GA. LBP-HOG-SURF-GA for Yorùbá printed text gave 90.0% precision, 90.3% accuracy and 15.5% FPR. LBP-HOG-SURF-GA for handwritten Yorùbá documents showed 80.9% precision, 82.6% accuracy and 20.4% FPR. LBP-HOG-SURF-GA for CEDAR gave 98.0% precision, 98.4% accuracy and 2.6% FPR. LBP-HOG-SURF-GA for MNIST gave 99% precision, 99.5% accuracy, 99.0% and 1.1% FPR. The results of the hybridized feature extraction (LBP-HOG-SURF) demonstrated that the proposed work improves significantly on the various classification metrics.
- A KNN and ANN model for predicting heart diseases (Explainable Artificial Intelligence in Medical Decision Support Systems, 2024-07-03) Abdulsalam, Sulaiman Olaniyi; Arowolo, Micheal Olaolu; Udofot, Enobong Chidera; Sanni, Ayodeji Matthew; Popoola, Damilola David; Adebiyi, Marion Olubunmi. The heart is the single most important organ in the human body. Patients, professionals, and medical systems are all bearing the brunt of heart failure’s devastating effects on contemporary society. Since cardiac arrest may well be demonstrated as a better understanding or conceivably go unobserved, particularly in the vast population of clients that have other cardiovascular disorders, the true prevalence of heart failure is likely to be underestimated, accounting for only 1–4% of all hospitalized patients as test procedures in developed nations. A person with heart failure has a heart that is unable to circulate sufficient blood through the body, but the term “heart failure” does not explain why this happens. The clinical picture is confusing since there are several possible causes of heart problems, many of which are diseases in and of themselves. Many cases of heart failure can be avoided if the underlying medical conditions that cause them are identified and treated promptly. The study and prediction of cardiac conditions must be precise because numerous diseases have been connected to the cardiovascular system. The resolution of this problem requires intensive online research on the relevant topic. Since incorrect illness prognoses are a leading cause of death among heart patients, learning more about effective prediction algorithms is crucial. This research utilizes K-nearest neighbor (KNN) and artificial neural network (ANN) to assess cardiovascular diseases using data collected from Kaggle. The highest accuracy (96%) was achieved by the ANN trained with the standard scaler. Medical experts, specialists, and academics can all benefit greatly from this study. Based on the results of this study, cardiologists will be able to make more knowledgeable decisions about the inhibition, analysis, and handling of heart disease.
- A Kohonen Self Organizing Map (KSOM) Technique for Classification of Electrocardiogram (ECG) Signals (Published by Faculty of Computing and Informatics, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, 2024-03-15) Babatunde Ronke Seyi. Electrocardiogram (ECG) signals are crucial in diagnosing cardiovascular diseases. Noisy ECG data, which is common in real-world situations, makes accurate classification a critical task. Because ECG signals are faint and easily disrupted, classification accuracy can be poor, hence the need to improve the recognition accuracy of automatic ECG categorization systems. The Kohonen Self Organising Map (KSOM) is known for its ability to cluster high-dimensional data in a low-dimensional space, hence its adoption in this research. The procedure employed includes the collection and pre-processing of diverse ECG data, including normal and abnormal cardiac rhythms. Inherent noise was removed from the data to ensure better-quality input to the classification algorithm. A Kohonen SOM neural network (MiniSom model) was trained using the preprocessed ECG data. The KSOM organizes ECG signals into clusters on a topological map, preserving similarities and dissimilarities between different cardiac rhythms. Subsequently, the trained SOM serves as a reference model for classifying unseen ECG signals, indicating the corresponding cardiac rhythm. The classification performance of the technique was evaluated on two different benchmark datasets. Cross-validation was done to assess the model's robustness and generalizability. Comparative analysis was conducted to measure the effectiveness and efficiency of the SOM-based approach against other common ECG signal classification techniques based on accuracy, precision, recall and F1-score. The results show that the MiniSom model, with an average accuracy of 94.2%, precision of 83%, recall of 100% and F1-score of 91%, outperformed the other models.
- A Logistic Regression-Based Technique for Predicting Type II Diabetes (Journal of The Faculty of Computational Sciences & Informatics, 2024) Ronke Seyi Babatunde; Akinbowale Nathaniel Babatunde; Shuaib Babatunde Mohammed. In recent years, diabetes has emerged as one of the leading causes of death. The spread of unhealthy foods, sedentary lifestyles, and poor eating habits have all contributed to the annual increase in the incidence of diabetes. A diabetes prediction model can help with clinical management decision-making. Diabetes prevention may be aided by awareness of potential risk factors and early detection of high-risk individuals. Numerous diabetes prediction models have been created. The size of the data set was an issue in earlier research, but more recent studies have used high-quality, trustworthy data sets, such as the Vanderbilt and PIMA Indians data sets. Recent research has demonstrated that a few variables, including glucose, pregnancy, body mass index (BMI), the diabetes pedigree function, and age, can be used to predict Type II diabetes. Machine learning models built on these parameters can be used to predict the chance of the disease occurring, as investigated in this study. To predict Type II diabetes, this study used the machine learning method Logistic Regression.
- A Logistic Regression-Based Technique for Predicting Type II Diabetes (Published by Academic City University College, Accra, Ghana, 2024-01-29) Babatunde, R.S., Babatunde, A.N., Balogun, B.F., Abdulrahman, T.A., Umar, E., Ajiboye, R.A., Mohammed, S.B., Oke, A.A. and Obiwusi, K.Y. (2024). In recent years, diabetes has emerged as one of the leading causes of death. The spread of unhealthy foods, sedentary lifestyles, and poor eating habits have all contributed to the annual increase in the incidence of diabetes. A diabetes prediction model can help with clinical management decision-making. Diabetes prevention may be aided by awareness of potential risk factors and early detection of high-risk individuals. Numerous diabetes prediction models have been created. The size of the data set was an issue in earlier research, but more recent studies have used high-quality, trustworthy data sets, such as the Vanderbilt and PIMA Indians data sets. Recent research has demonstrated that a few variables, including glucose, pregnancy, body mass index (BMI), the diabetes pedigree function, and age, can be used to predict Type II diabetes. Machine learning models built on these parameters can be used to predict the chance of the disease occurring, as investigated in this study. To predict Type II diabetes, this study used the machine learning method Logistic Regression.
- A Machine Learning Approach to Dropout Early Warning System Modeling (International Journal of Advanced Studies In Computer Science And Engineering, 2019-10-18) Isiaka R.M., Babatunde R. S., Ajao F.J. and Abdulsalam S.O. (2019). Dropout has been identified as a social problem with immediate and long-term consequences for the socioeconomic development of society. A major element of a conceptual framework for a dropout monitoring and control system is the prevention component. The component specifies the need for an automatic dropout early warning system. This paper presents the procedure for building the adaptive model that is the core element for realizing the prevention component of the framework. The model development is guided by the knowledge of domain experts. An experimental approach was used to identify K-Nearest Neighbors (KNN) as the best of the six algorithms considered for the adaptive model. The other algorithms explored are Logistic Regression (LR), Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Gaussian Naive Bayes (NB) and Support Vector Machines (SVM). A historical dataset from a State University in the North Central region of Nigeria was used for building and testing the adaptive model. The average performance of the model on the full attributes and on the first-year academic attributes compared favourably, with Precision, Recall and F1-Scores of 0.98. In contrast, the Precision, Recall and F1-Scores on the entry attributes are 0.90, 0.95 and 0.92 respectively. The implication is that academic performance is a very significant factor in dropping out of the educational system. The developed model can provide an early signal about students with the propensity to drop out, thereby serving as an advisory system for students, parents and school management towards curtailing the menace.
- A Neuro-Fuzzy-based Approach to Detect Liver Diseases (Published by University of Lahore, Lahore, Pakistan. Pakistan Engineering Council (PEC), 2024-03-22) Babatunde, R. S., Babatunde, A. N., Balogun, B. F., Abdulahi, A. T., Umar, E., Mohammed, S. B., Ajiboye, A. R., Obiwusi, K. Y., and Oke, A. A. (2024). The liver is a crucial organ in the human body and performs vital functions essential for overall health, including metabolism, immunity, digestion, detoxification, and vitamin storage. Detecting liver diseases at an early stage is challenging because the liver can function adequately despite partial damage. Early detection is crucial, as liver diseases have significant clinical and socio-economic impacts, affect other organ systems, and require timely intervention to improve patient survival rates. Classical diagnostic methods for liver disorders may not always produce satisfactory results, thus necessitating more advanced and accurate diagnostic systems. Intelligent systems such as predictive modeling and decision support systems have shown promising results in disease detection in recent years and are aiding medical practitioners. In this research, a neuro-fuzzy system integrating neural networks and fuzzy logic (FL) was implemented. Based on the risk factors present in the dataset used to benchmark the algorithm, the system achieved a classification accuracy of 97%, which is comparable with existing systems in the literature. This study creates a neuro-fuzzy system for early liver disease identification, addressing diagnostic issues and offering healthcare improvements. The proposed system is validated by presenting the simulation results.
- A Neuro-Fuzzy-based Approach to Detect Liver Diseases (Pakistan Journal of Engineering and Technology, 2024) Ronke Seyi Babatunde; Shuaib Babatunde Mohammed; Akinbowale Nathaniel Babatunde. The liver is a crucial organ in the human body and performs vital functions essential for overall health, including metabolism, immunity, digestion, detoxification, and vitamin storage. Detecting liver diseases at an early stage is challenging because the liver can function adequately despite partial damage. Early detection is crucial, as liver diseases have significant clinical and socio-economic impacts, affect other organ systems, and require timely intervention to improve patient survival rates. Classical diagnostic methods for liver disorders may not always produce satisfactory results, thus necessitating more advanced and accurate diagnostic systems. Intelligent systems such as predictive modeling and decision support systems have shown promising results in disease detection in recent years and are aiding medical practitioners. In this research, a neuro-fuzzy system integrating neural networks and fuzzy logic (FL) was implemented. Based on the risk factors present in the dataset used to benchmark the algorithm, the system achieved a classification accuracy of 97%, which is comparable with existing systems in the literature. This study creates a neuro-fuzzy system for early liver disease identification, addressing diagnostic issues and offering healthcare improvements. The proposed system is validated by presenting the simulation results.
- A Neuro-Fuzzy-based Approach to Detect Liver Diseases (Pakistan Journal of Engineering and Technology, 2024) Ronke Seyi Babatunde; Akinbowale Nathaniel Babatunde; Shuaib Babatunde Mohammed. The liver is a crucial organ in the human body and performs vital functions essential for overall health, including metabolism, immunity, digestion, detoxification, and vitamin storage. Detecting liver diseases at an early stage is challenging because the liver can function adequately despite partial damage. Early detection is crucial, as liver diseases have significant clinical and socio-economic impacts, affect other organ systems, and require timely intervention to improve patient survival rates. Classical diagnostic methods for liver disorders may not always produce satisfactory results, thus necessitating more advanced and accurate diagnostic systems. Intelligent systems such as predictive modeling and decision support systems have shown promising results in disease detection in recent years and are aiding medical practitioners. In this research, a neuro-fuzzy system integrating neural networks and fuzzy logic (FL) was implemented. Based on the risk factors present in the dataset used to benchmark the algorithm, the system achieved a classification accuracy of 97%, which is comparable with existing systems in the literature. This study creates a neuro-fuzzy system for early liver disease identification, addressing diagnostic issues and offering healthcare improvements. The proposed system is validated by presenting the simulation results.
- A New Hash Function Based on Chaotic Maps and Deterministic Finite State Automata (IEEE, 2020-06-16) Moatsum Alawida; Je Sen Teh; Damilare Peter Oyinloye; Musheer Ahmad; Rami S Alkhawaldeh. In this paper, a new chaos-based hash function is proposed based on a recently proposed structure known as the deterministic chaotic finite state automata (DCFSA). Out of its various configurations, we select the forward and parameter permutation variant, DCFSA_FWP, due to its desirable chaotic properties. These properties are analogous to hash function requirements such as diffusion, confusion and collision resistance. The proposed hash function consists of six machine states and three simple chaotic maps. This particular structure of DCFSA can process larger message blocks (leading to higher hashing rates) and optimizes its randomness. The proposed hash function is analyzed in terms of various security aspects and compared with other recently proposed chaos-based hash functions to demonstrate its efficiency and reliability. Results indicate that the proposed hash function has desirable statistical characteristics, elevated randomness, optimal diffusion and confusion properties as well as flexibility.
- A novel approach to outliers removal in a noisy numeric dataset for efficient Mining (Published by Faculty of Computing and Information Systems, University of Ilorin, 2016-02-16) Ajiboye, A. R, Adewole, K. S., Babatunde, R. S. and Oladipo, I. D. (2016). Data pre-processing is a key task in the data mining process. The task generally consumes the largest portion of the total data engineering effort involved in unveiling useful patterns from datasets. Basically, data mining is about fitting descriptive or predictive models to data. However, the presence of outliers sometimes reduces the reliability of the models created. It is, therefore, essential to have raw data properly pre-processed before exploring it for mining. In this paper, an algorithm that detects and removes outliers in a numeric dataset is proposed. To establish the effectiveness of the proposed algorithm, the clean data obtained through the proposed approach is used to create a prediction model. Similarly, the clean data obtained through one of the existing techniques, a clustering method, is also used to create a prediction model. Each of the models is simulated using a set of untrained data, and the error associated with each model is measured. The prediction model created using the output of the proposed algorithm has an error of 0.38, while the prediction model created using the cleaned data from the clustering method gives an error of 0.61. Comparison of these errors shows that the proposed algorithm is suitable for cleaning numeric datasets. The results of the experiment also show that the proposed approach is efficient and can be used as an alternative to other existing cleaning methods.
- A novel smartphone application for early detection of habanero disease (Scientific Reports, 2024-01-23) Babatunde, R. S., Babatunde A. N, Ogundokun R.O, Yusuf O.K, Sadiku P.O and Shah M.A (2024). Habanero plant diseases can significantly reduce crop yield and quality, making early detection and treatment crucial for farmers. In this study, we discuss the creation of a smartphone app, based on a modified VGG16 (MVGG16) Deep Transfer Learning (DTL) model, for identifying habanero plant diseases. With the help of the smartphone application, growers can quickly diagnose the health of a habanero plant by taking a photo of one of its leaves. We trained the DTL model on a dataset of labelled images of healthy and infected habanero plants and evaluated its performance on a separate test dataset. The MVGG16 DTL algorithm had an accuracy, precision, F1-score, recall and AUC of 98.79%, 97.93%, 98.44%, 98.95% and 98.63%, respectively, on the testing dataset. The MVGG16 DTL model was then integrated into a smartphone app that enables users to upload photographs, get diagnosed, and explore a history of earlier diagnoses. We tested the software on a collection of photos of habanero plant leaves and discovered that it was highly accurate at spotting infected plants. The smartphone software can boost early identification and treatment of habanero plant diseases, resulting in higher crop output and higher-quality harvests.
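
Several abstracts in this listing report precision, recall and F1-score side by side. As a reading aid only, the sketch below shows how an F1-score follows from a precision and recall pair, using figures quoted in three of the entries above. It assumes the reported F1 is the plain harmonic mean of the reported precision and recall; the original papers may average per class, which could give slightly different values. The function name `f1_score` and the dictionary `reported` are illustrative and do not come from any of the listed works.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstracts above (illustrative labels).
reported = {
    "KSOM ECG classification": (0.83, 1.00),
    "Dropout early warning (entry attributes)": (0.90, 0.95),
    "Habanero disease app (MVGG16)": (0.9793, 0.9895),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.4f}")
```

Under that assumption, the computed values round to the F1-scores stated in the corresponding abstracts (91%, 0.92 and 98.44%).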