Browsing by Author "Yahya W. B."
- A Test Procedure for Ordered Hypothesis of Population Proportions Against a Control (Turkiye Klinikleri Journal of Biostatistics, 2016)
  Yahya W. B.; Olaniran O. R.; Garba M. K.; Oloyede I.; Banjoko A. W.; Dauda K. A.; Olorede K. O.
  Objective: This paper aims to present a novel procedure for testing a set of population proportions against an ordered alternative with a control. Material and Methods: The distribution of the test statistic for the proposed test was determined theoretically and through Monte-Carlo experiments. The efficiency of the proposed test method was compared with the classical Chi-square test of homogeneity of population proportions using their empirical Type I error rates and powers at various sample sizes. Results: The new test statistic developed for testing a set of population proportions against an ordered alternative with a control was found to have a Chi-square distribution with non-integer degrees of freedom v that depend on the number of population groups k being compared. A table of values of v for comparing up to 26 population groups was constructed, while an expression was developed to determine v for cases where k > 26. Further results showed that the new test method is capable of detecting the superiority of a treatment, for instance a new drug type, over some of the existing ones in situations where only qualitative data on users' preferences of all the available treatments (drug types) are available. The new test method was found to be relatively more powerful and more consistent at estimating the nominal Type I error rates (α), especially at smaller sample sizes, than the classical Chi-square test of homogeneity of population proportions. Conclusion: The new test method proposed here could find applications in pharmacology, where a newly developed drug might be expected to be preferred by users over some of the existing ones. This kind of test problem can equally exist in medicine, engineering and the humanities in situations where only qualitative data on users' preferences of a set of treatments or systems are available.
- A Trade-Off between the PLSR and PCR Models for Modelling Data with Collinear Structure (Nigerian Association of Mathematical Physics, 2017-01)
  Yahya W. B.; Olorede K. O.; Garba M. K.; Banjoko A. W.; Dauda K. A.
  This paper investigates the partial least squares regression (PLSR) and principal component regression (PCR) methods as versatile alternative regression techniques when the ordinary least squares method breaks down. The emphasis is on the situation where the predictor variables are evidently correlated. Data sets with Gaussian non-orthogonal predictor variables were simulated at different sample sizes ranging from 20 to 1000 to examine the performance of the two regression types under varying situations. The data were randomly partitioned into training and test sets, with both PLSR and PCR models constructed on the training sets while their performances were evaluated on the test sets using the mean square error of predictions and other indices. At each fit of the models, the leave-one-out cross-validation technique was employed to enhance the efficiency and stability of the fitted models. Results from the simulation studies revealed the goodness of the two regression methods, but at varying degrees of accuracy. More importantly, the results show that although both the PLSR and PCR techniques yielded good regression models, the PLSR technique is consistently more efficient on the test data in terms of good predictions than the PCR method, irrespective of sample size. Also, in terms of model parsimony, the PLSR technique yielded efficient regression models with relatively fewer latent components than the PCR method. Data sets on the performance of M.Sc. graduates from the Department of Statistics, University of Ilorin, Nigeria during the 2012 academic session were used to validate the results from the Monte Carlo studies.
- Efficient Support Vector Machine Classification of Diffused Large B-Cell Lymphoma and Follicular Lymphoma mRNA Tissue Samples (Annals Computer Science Series, 2015)
  Banjoko A. W.; Yahya W. B.; Garba M. K.; Olaniran O. R.; Dauda K. A.; Olorede K. O.
  In this study, an efficient Support Vector Machine (SVM) algorithm is presented that incorporates a feature selection procedure for efficient identification and selection of gene biomarkers that are predictive of Diffuse Large B-Cell Lymphoma (DLBCL) and Follicular Lymphoma (FL) cancer tumor samples. The data employed were published real-life microarray cancer data that contained 7,129 gene expression profiles measured on 77 biological samples, comprising 58 DLBCL and 19 FL tissue samples. The dimension reduction approach of the Welch statistic was employed at the feature selection phase of the SVM algorithm. The cost and kernel parameters of the SVM model were tuned over a 10-fold cross-validation to improve the efficiency of the SVM classifier. The entire sample was randomly partitioned into 95% training and 5% test samples. The SVM classifier was trained using the Monte Carlo cross-validation approach with 1000 replications. The performance of this classifier was assessed on the test samples using the misclassification error rate (MER) and other performance measures. The results showed that the SVM classifier is quite efficient, yielding very high prediction accuracy of the tumor samples with fewer differentially expressed genes. The gene biomarkers selected in this work can be subjected to further clinical screening to determine their biological relationship with the DLBCL and FL tumour subgroups. However, more studies with large samples might be needed in future to validate the results from this work.
- Multiclass Feature Selection and Classification with Support Vector Machine in Genomic Study (Professional Statisticians Society of Nigeria (PSSN), 2017)
  Banjoko A. W.; Yahya W. B.; Garba M. K.; Olaniran O. R.; Amusa L. B.; Gatta N. F.; Dauda K. A.; Olorede K. O.
  This study proposes an efficient Support Vector Machine (SVM) algorithm for feature selection and classification of multiclass response groups in high-dimensional (microarray) data. The feature selection stage of the algorithm employed the F-statistic of the ANOVA-like testing scheme at some chosen family-wise error rate (FWER) to control the detection of false positive features. In a 10-fold cross-validation, the hyper-parameters of the SVM were tuned to determine the appropriate kernel using the one-versus-all approach. The entire simulated dataset was randomly partitioned into 95% training and 5% test sets, with the SVM classifier built on the training sets while its prediction accuracy on the response class was assessed on the test sets over 1000 Monte-Carlo cross-validation (MCCV) runs. The classification results of the proposed classifier were assessed using the Misclassification Error Rates (MERs) and other performance indices. Results from the Monte-Carlo study showed that the proposed SVM classifier was quite efficient, yielding high prediction accuracy of the response groups with fewer differentially expressed features than when all the features were employed for classification. The performance of this new method on some published cancer data sets shall be examined vis-à-vis other state-of-the-art machine learning methods in future works.
- Partial Least Squares-Based Classification and Selection of Predictive Variables of Crimes against Properties in Nigeria (Professional Statisticians Society of Nigeria (PSSN), 2017)
  Olorede K. O.; Yahya W. B.; Garuba A. O.; Banjoko A. W.; Dauda K. A.
  In this study, the state-of-the-art Partial Least Squares (PLS) based models (PLS-Discriminant Analysis (PLS-DA), Sparse PLS-DA (SPLS-DA) and Sparse Generalized PLS (SGPLS)) were employed to model and classify the rate of crimes (low or high) committed against properties across the 36 states in Nigeria and the Federal Capital Territory (FCT). The core variables that are predictive of this crime type in Nigeria were identified using the LASSO penalty method via the PLS. Data on occurrences of cases of offences against property, obtained from the database of the Nigerian Police Force, were utilized in this study. The missing values due to non-occurrence or non-reportage of crime cases were imputed using the technique of multivariate imputation by chained equations. The complete data set was partitioned into training and test sets using an 80:20 holdout scheme. The 80% training set was used to build the PLS-based models, which were in turn used to predict the overall crime rates of Nigerian cities in the 20% held-out test data over 200 Monte-Carlo cross-validation runs. All the PLS-based models yielded good classification of unseen test samples into either of the two qualitative classes of high and low crime rates, with an average Correct Classification Rate (CCR) of 94%. Other performance metrics, including sensitivity, specificity, positive and negative predictive values, balanced accuracy and diagnostic odds ratio, were estimated to further examine their classification efficiencies. The SGPLS identified fewer (just 3 out of 12) core relevant crime variables that are predictive of the overall crime rates in Nigerian states, with the highest CCR, than the SPLS, which selected 9 such variables to achieve about the same feat.
- Sufficient Dimension Reduction Based Classification of Nigerian Cities by Crimes against Properties Safety (Professional Statisticians Society of Nigeria (PSSN), 2020)
  Olorede K. O.; Yahya W. B.
  This study adopts the method of Sufficient Dimension Reduction (SDR) to estimate sufficient predictors for visualizing the data on crimes against properties in Nigerian cities and for training statistical classification models that are capable of efficiently detecting the true safety status of such cities without losing information. A modified version of the sliced inverse regression (SIR) method was adopted by replacing the usual maximum likelihood covariance estimator with the Hybridized Smoothed Maximum Entropy Covariance Estimator (HSMEC) proposed by Olorede and Yahya in the dimension reduction step. All seven statistical classifiers achieved excellent results based on the first sufficient predictor estimated by the modified SIR (SIR HSMEC), with the k-Nearest Neighbour model with one optimal neighbour achieving a false positive rate of 0% and 100% classification accuracy, sensitivity, specificity, and area under the curve.
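
Several of the abstracts above assess classifiers by Monte-Carlo cross-validation: the data are repeatedly split at random into training and test sets, a model is fitted on the training part, and the misclassification error rate (MER) on the test part is averaged over the runs. The following is a minimal Python sketch of that evaluation loop only. It is not the authors' code: the synthetic two-class data, the nearest-centroid classifier (standing in for the SVM and PLS-based models) and all function names are assumptions made for illustration.

```python
import random
import statistics


def make_data(n_per_class=40, shift=2.0, seed=0):
    """Synthetic two-class data (an assumption): 2-D Gaussian clouds."""
    rng = random.Random(seed)
    data = []
    for label, mu in ((0, 0.0), (1, shift)):
        for _ in range(n_per_class):
            point = (rng.gauss(mu, 1.0), rng.gauss(mu, 1.0))
            data.append((point, label))
    return data


def fit_centroids(train):
    """Nearest-centroid 'model': the mean point of each class."""
    centroids = {}
    for label in {y for _, y in train}:
        pts = [x for x, y in train if y == label]
        centroids[label] = tuple(statistics.fmean(c) for c in zip(*pts))
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def mccv_error(data, test_fraction=0.2, runs=200, seed=1):
    """Average misclassification error rate over random holdout splits."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(data)))
    errors = []
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)  # a fresh random partition on every run
        test, train = shuffled[:n_test], shuffled[n_test:]
        centroids = fit_centroids(train)
        wrong = sum(predict(centroids, x) != y for x, y in test)
        errors.append(wrong / n_test)
    return statistics.fmean(errors)


if __name__ == "__main__":
    mer = mccv_error(make_data())
    print(f"Average MER over 200 MCCV runs: {mer:.3f}")
```

The defaults mirror the 80:20 holdout with 200 runs described in one abstract; setting `test_fraction=0.05` and `runs=1000` would correspond to the 95%/5% scheme with 1000 replications mentioned in the SVM studies.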