(49-6) 36 * << * >> * Russian * English * Content * All Issues
Pseudo-Boolean Polynomial Method for InterpreTab. Dimensionality Reduction: A Paradigm Shift from Abstract to Meaningful Feature Extraction
T.M. Chikake 1, B.I. Goldengorin 1,2, P.M. Pardalos 3,4
1 Department of Discrete Mathematics,Phystech School of Applied Mathematics and Informatics,
Moscow Institute of Physics and Technology,
Institutsky lane 9, Dolgoprudny, 141700, Russia;
2 The Scientific and Educational Mathematical Center "Sofia Kovalevskaya Northwestern Center for Mathematical Research"
in Pskov State University,
Sovetskaya Ulitsa, 21, Pskov, 180000, Russia;
3 Center for Applied Optimization, Department of Industrial and Systems Engineering,
University of Florida,
Gainesville, FL 32611, USA;
4 LATNA, National Research University Higher School of Economics,
20 Myasnitskaya Street, Moscow 101000, Russia
PDF, 1960 kB
DOI: 10.18287/COJ1815
Pages: 1191-1201.
Full text of article: English language.
Abstract:
We present a general-purpose, training-free framework for dimensionality reduction and clustering based on per–sample pseudo–Boolean polynomials (PBP). The method constructs compact, interpreTab. features without model fitting and is evaluated under a standardized protocol that compares PBP to PCA, t-SNE, and UMAP using identical inputs and metrics: clustering alignment (V-measure, Adjusted Rand Index), cluster geometry (Silhouette coefficient, Calinski–Harabasz index, Davies–Bouldin index), and supervised probes (linear separability and boundary complexity (1–NN error)). Across 11 diverse datasets spanning tabular, signal, and ecological domains, PBP leads on linear separability in 5/11 datasets and achieves lower boundary complexity in 2/11 datasets, while remaining competitive on clustering metrics. We report best-performing aggregation and sorting configurations per dataset and provide guidance on when PBP should be preferred for interpreTab. analysis and reproducible evaluation.
Keywords:
dimensionality reduction, pseudo-Boolean polynomials, clustering, interpretable features, sample independence, feature selection.
Citation:
Chikake TM, Goldengorin BI, Pardalos PM. Pseudo-Boolean polynomial method for interpretable dimensionality reduction. Computer Optics 2025; 49(6): 1191-1201. DOI: 10.18287/COJ1815.
Acknowledgements:
First of all we thank both reviewers for their important feedback on our paper leading to essential improvements of this paper structure and clarity of our explanations. Tendai Mapungwana Chikake and Boris Goldengorin's research was supported by Ministry of Science and Higher Education of the Russian Federation (Goszadaniye), project No. FSMG-2024-0011. Boris Goldengorin acknowledges Scientific and Educational Mathematical Center "Sofia Kovalevskaya Northwestern Center for Mathematical Research" for financial support of the present study (agreement No. 075-02-2025-1607, 27.02.2025). The work of Panos Pardalos was conducted within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE).
References:
- AlBdaiwi BF, Ghosh D, Goldengorin B. Data aggregation for p-Median problems. Journal of Combinatorial Optimization 2011; 21: 348-363. DOI: 10.1007/s10878-009-9251-8.
- Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data 2021; 8(1): 53-53. DOI: 10.1186/s40537-021-00444-8.
- Anthony M, Boros E, Crama Y, Gruber A. Quadratization of symmetric pseudo-Boolean functions. Discrete Applied Mathematics 2016; 203: 1-12. DOI: 10.1016/j.dam.2016.01.001.
- Boros E, Hammer PL. Pseudo-boolean optimization. Discrete Applied Mathematics 2002; 123(1-3): 155-225. DOI: 10.1016/S0166-218X(01)00341-9.
- Chikake TM, Goldengorin B, Samosyuk A. Pseudo-boolean polynomials approach to edge detection and image segmentation. In Book: Goldengorin B, Kuznetsov S, eds. Data Analysis and Optimization. In Honor of Boris Mirkin 80th Birthday. International Conference Data Analysis, Optimization and Their Applications on the Occasion of Boris Mirkin's 80th Birthday. Cham: Springer Optimization and Its Applications; 2023: 73-87. DOI: 10.1007/978-3-031-31654-8_5.
- Chikake TM, Goldengorin B. Image edge detection using pseudo–Boolean polynomials. In Book: Osten W, ed. Sixteenth International Conference on Machine Vision (ICMV 2023), volume 13072. International Society for Optics and Photonics, SPIE, 2024: 130720O. DOI: 10.1117/12.3023452.
- Crama Y, Hammer PL. Boolean Functions: Theory, Algorithms, and Applications. Cambridge: Cambridge University Press; 2011. DOI: 10.1017/CBO9780511852008.
- Fisher RA. The use of multiple measurements in taxonomic problems. Annals of eugenics 1936; 7(2): 179-188. DOI: 10.1111/j.1469-1809.1936.tb02137.x.
- Freedman D, Drineas P. Energy minimization via graph cuts: Settling what is possible. In Book: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), CVPR '05. IEEE Computer Society, 2005: 939-946. DOI: 10.1109/CVPR.2005.143.
- Goldengorin B, Krushinsky D, Pardalos PM. Cell Formation in Industrial Engineering, volume 79 of Springer Optimization and Its Applications. New York, NY: Springer New York; 2013. ISBN: 978-1-4614-8001-3, DOI: 10.1007/978-1-4614-8002-0.
- Ishikawa H. Transformation of general binary MRF minimization to the first-order case. IEEE Transactions on Pattern Analysis and Machine Intelligence 2011; 33(6): 1234-1249. DOI: 10.1109/TPAMI.2010.91.
- Jegelka S, Bilmes J. Submodularity beyond submodular energies: coupling edges in graph cuts. In Book: CVPR 2011, 2011: 1897-1904. DOI: 10.1109/CVPR.2011.5995589.
- Knuth DE. The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1, 1st ed. Upper Saddle River, NJ: Addison–Wesley Professional; 2011. ISBN: 0201038048.
- Kolmogorov V, Zabih R. What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence 2004; 26(2): 147-159. DOI: 10.1109/TPAMI.2004.1262177.
- Street WN, Wolberg WH, Mangasarian OL. Nuclear feature extraction for breast tumor diagnosis. In Book: Acharya RS, Goldgof DB, eds. Biomedical Image Processing and Biomedical Visualization, volume 1905. International Society for Optics; Photonics; SPIE, 1993: 861-870. DOI: 10.1117/12.148698.
- Taherdoost H. Deep learning and neural networks: Decision-making implications. Symmetry 2023; 15(1723). DOI: 10.3390/sym15091723.
- Warren HS. Hacker's Delight, 2nd ed. Upper Saddle River, NJ: Addison–Wesley Professional; 2012. ISBN: 0321842685.
- Wolberg W, Mangasarian O, Street N, Street W. Breast Cancer Wisconsin (Diagnostic), 1993. Source: <https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic>, DOI: 10.24432/C5DW2B.
- Lohweg V. Banknote Authentication, 2012. Source: <https://archive.ics.uci.edu/dataset/267/banknote+authentication>, DOI: 10.24432/C55P57.
- Bartley C. Replication Data for: Pima Indians Diabetes, 2016. Source: <https://doi.org/10.7910/DVN/XFOZQR>, DOI: 10.7910/DVN/XFOZQR.
- Lyon RJ, Stappers BW, Cooper S, Brooke JM, Knowles JD. Fifty years of pulsar candidate selection: from simple filters to a new principled real-time classification approach. Monthly Notices of the Royal Astronomical Society 2016; 459(1): 1104-1123. DOI: 10.1093/mnras/stw656.
- Sigillito VG. Ionosphere. UCI Machine Learning Repository, 1989. Source: <https://archive.ics.uci.edu/dataset/52/ionosphere>, DOI: 10.24432/C5W01B.
- Jolliffe I. Principal Component Analysis. Springer Series in Statistics. Springer, 2002. ISBN: 9780387954424, DOI: 10.1007/978-1-4757-1904-8.
- Little M. Parkinsons. UCI Machine Learning Repository, 2007. Source: <https://archive.ics.uci.edu/dataset/174/parkinsons>, DOI: 10.24432/C59C74.
- McInnes L, Healy J, Saul N, Großberger L. UMAP: Uniform manifold approximation and projection. Journal of Open Source Software 2018; 3(29): 861-861. DOI: 10.21105/joss.00861.
- Horst AM, Hill AP, Gorman KB. Palmer archipelago penguins data in the palmerpenguins r package – an alternative to anderson's irises. The R Journal 2022; 14(1): 244-254. DOI: 10.32614/rj-2022-020.
- Sarveniazi A. An Actual Survey of Dimensionality Reduction. American Journal of Computational Mathematics 2014; 04(02): 55-72. DOI: 10.4236/ajcm.2014.42006.
- Schölkopf B, Smola A, Müller KR. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation 1998; 10(5): 1299-1319. DOI: 10.1162/089976698300017467.
- Charytanowicz M, Niewczas J, Kulczycki P, Kowalski PA, Lukasik S, Zak S. Seeds, 2010. Source: <https://archive.ics.uci.edu/dataset/236/seeds>, DOI: 10.24432/C5H30K.
- Sejnowski T, Gorman R. Connectionist Bench (Sonar, Mines vs. Rocks). UCI Machine Learning Repository, 1988. Source: <https://archive.ics.uci.edu/dataset/151/connectionist+bench+sonar+mines+vs+rocks>, DOI: 10.24432/C5T01Q.
- Downey G, Briandet R, Wilson RH, Kemsley EK. Near- and mid-infrared spectroscopies in food authentication: Coffee varietal identification. Journal of Agricultural and Food Chemistry 1997; 45(11): 4357-4361. DOI: 10.1021/jf970337t.
- van der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research 2008; 9(nov): 2579–2605.
© 2009, IPSI RAS
151, Molodogvardeiskaya str., Samara, 443001, Russia; E-mail: journal@computeroptics.ru ; Tel: +7 (846) 242-41-24 (Executive secretary), +7 (846) 332-56-22 (Issuing editor), Fax: +7 (846) 332-56-20