
Information-theoretic bounds for accuracy of letter encoding and pattern recognition via ensembles of datasets
M.M. Lange 1, A.M. Lange 1

Federal Research Center "Computer Science and Control" RAS,
119333, Moscow, Russia, Vavilova 42


DOI: 10.18287/2412-6179-CO-1362

Pages: 460-470.

Full text of article: Russian language.

Abstract:
In this paper, we study stochastic models of discrete letter encoding and object classification based on ensembles of datasets of different modalities. For these models, the minimal average mutual information between a given ensemble of datasets and the corresponding set of possible decisions is constructed as a monotonically decreasing function of a given admissible error probability. We present examples of such functions for a scheme of coding independent letters represented by pairs of observation values subject to errors, as well as for a scheme of classifying composite objects given by pairs of face and signature images. Inverting the obtained functions yields lower bounds on the error probability for any amount of processed information, so these functions can be regarded as bifactor fidelity criteria for source coding and object classification decisions. Moreover, the obtained functions are analogous to the rate distortion function known in information theory.
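As a minimal illustration of the rate-distortion analogy drawn in the abstract (a standard textbook example, not the multi-modal construction of the paper): for a Bernoulli(p) source under Hamming distortion, the classical rate distortion function R(D) = h(p) - h(D) is monotonically decreasing in D, and inverting it numerically gives the smallest achievable error probability at a given information rate. A Python sketch under these assumptions:

```python
import math

def h2(p: float) -> float:
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def rate_distortion_binary(p: float, d: float) -> float:
    """R(D) = h2(p) - h2(D) for a Bernoulli(p) source under Hamming
    distortion, valid for 0 <= D <= min(p, 1 - p); zero beyond."""
    p = min(p, 1.0 - p)
    if d >= p:
        return 0.0
    return h2(p) - h2(d)

def min_error_probability(p: float, rate: float, tol: float = 1e-12) -> float:
    """Invert R(D) by bisection: smallest distortion (error probability)
    compatible with processing `rate` bits of information per letter."""
    p = min(p, 1.0 - p)
    if rate >= h2(p):
        return 0.0  # enough rate for lossless coding
    lo, hi = 0.0, p
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_distortion_binary(p, mid) > rate:
            lo = mid  # R(mid) exceeds the budget: more distortion is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, an equiprobable binary source compressed to 0.5 bit per letter cannot be reproduced with error probability below about 0.11, since 1 - h2(0.11) is approximately 0.5. The paper's bounds play an analogous role with mutual information between a dataset ensemble and the decision set in place of the coding rate.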

Keywords:
source coding, ensemble of datasets, entropy, object classification, error probability, mutual information, rate distortion function.

Citation:
Lange MM, Lange AM. Information-theoretic bounds for accuracy of letter encoding and pattern recognition via ensembles of datasets. Computer Optics 2024; 48(3): 460-470. DOI: 10.18287/2412-6179-CO-1362.

Acknowledgements:
This work was supported by the RF Ministry of Science and Higher Education within the State assignment of the FRC “Computer Science and Control” RAS.

References:

  1. Gallager RG. Information theory and reliable communication. New York: Wiley & Sons; 1968. ISBN: 0471-29048-3.
  2. Lam L, Suen CY. Application of majority voting to pattern recognition: An analysis of its behavior and performance. IEEE Trans Syst Man Cybern A Syst 1997; 27(5): 553-568. DOI: 10.1109/3468.618255.
  3. Kuncheva LI, Whitaker CJ, Shipp CA, Duin RPW. Limits on the majority vote accuracy in classifier fusion. Pattern Anal Appl 2003; 6(1): 22-31. DOI: 10.1007/s10044-002-0173-7.
  4. Dobrushin RL, Tsybakov BS. Information transmission with additional noise. IRE Trans Inf Theory 1962; 8(5): 293-304. DOI: 10.1109/TIT.1962.1057738.
  5. Berger T. Rate distortion theory: A mathematical basis for data compression. New Jersey: Prentice-Hall Inc, Englewood Cliffs; 1971. ISBN: 013-753103-6.
  6. Lange MM, Lange AM. Information-theoretic lower bounds to error probability for the models of noisy discrete source coding and object classification. Pattern Recogn Image Anal 2022; 32(3): 570-574. DOI: 10.1134/S105466182203021X.
  7. Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York: Wiley & Sons; 2001. ISBN: 978-0471056690.
  8. Djukova EV, Zhuravlev YuI, Prokofjev PA. Logical correctors in the problem of classification by precedents. Comput Math Math Phys 2017; 57(11): 1866-1886. DOI: 10.1134/S0965542517110057.
  9. Sueno HT, Gerardo BD, Medina RP. Multi-class document classification using Support Vector Machine (SVM) based on improved Naïve Bayes Vectorization technique. Int J Adv Trends Comput Sci Eng 2020; 9(3): 3937-3944. DOI: 10.30534/ijatcse/2020/216932020.
  10. Brown G, Pocock A, Zhao MJ, Luján M. Conditional likelihood maximization: A unifying framework for information theoretic feature selection. J Mach Learn Res 2012; 13(8): 27-66.
  11. Xu X, Huang SL, Zheng L, Wornell GW. An information theoretic interpretation to deep neural networks. Entropy 2022; 24(1): 135. DOI: 10.3390/e24010135.
  12. Lange MM, Ganebnykh SN. On fusion schemes for multiclass classification with reject in a given ensemble of sources. J Phys Conf Ser 2018; 1096: 012048. DOI: 10.1088/1742-6596/1096/1/012048.
  13. Denisova AY, Sergeev VV. Algorithms for calculating multichannel image histogram using hierarchical data structures. Computer Optics 2016; 40(4): 535-542. DOI: 10.18287/2412-6179-2016-40-4-535-542.
  14. Lange AM, Lange MM, Paramonov SV. Tradeoff relation between mutual information and error probability in data classification problems. Comput Math Math Phys 2021; 61(7): 1181-1193. DOI: 10.1134/S0965542521070113.
  15. Distance matrices for face dataset. 2020. Source: <http://sourceforge.net/projects/distance-matrices-face>.
  16. Distance matrices for signature dataset. 2020. Source: <http://sourceforge.net/projects/distance-matrices-signature>.

© 2009, IPSI RAS
151, Molodogvardeiskaya str., Samara, 443001, Russia; E-mail: journal@computeroptics.ru; Tel: +7 (846) 242-41-24 (Executive secretary), +7 (846) 332-56-22 (Issuing editor); Fax: +7 (846) 332-56-20