Author Archives: wj.shin

Regularized Autoencoders for Isometric Representation Learning

February 3, 2022

[ Abstract ]

The recent success of autoencoders for representation learning can be traced in large part to the addition of a regularization term. Such regularized autoencoders “constrain” the representation so as to prevent overfitting to the data while producing a parsimonious generative model. A regularized autoencoder should in principle learn not only the data manifold, but also a set of geometry-preserving coordinates for the latent representation space; by geometry-preserving we mean that the latent space representation should attempt to preserve actual distances and angles on the data manifold. In this paper we first formulate a hierarchy for geometry-preserving mappings (isometry, conformal mapping of degree k, area-preserving mappings). We then show that a conformal regularization term of degree zero — i.e., one that attempts to preserve angles and relative distances, instead of angles and exact distances — produces data representations that are superior to other existing methods. Applying our algorithm to an unsupervised information retrieval task for CelebA data with 40 annotations, we achieve 79% precision at five retrieved images, an improvement of more than 10% compared to recent related work.
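As a rough illustration of what a scale-tolerant isometry regularizer can look like, the sketch below scores a decoder by how far its pullback metric J^T J is from a constant multiple of the identity. The specific measure, the finite-difference Jacobian, and the two toy decoders are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def numerical_jacobian(f, z, eps=1e-5):
    """Central-difference Jacobian of f at z, shape (n_out, n_latent)."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((f(z + step) - f(z - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def scaled_isometry_distortion(f, zs):
    """Illustrative (assumed) measure: m * E[tr(H^2)] / E[tr(H)]^2 with H = J^T J.

    By the Cauchy-Schwarz inequality this is >= 1, with equality only when
    H = c * I for one constant c at every sampled latent point, i.e. when
    the decoder preserves angles and relative distances.
    """
    m = zs.shape[1]
    tr_h, tr_h2 = [], []
    for z in zs:
        J = numerical_jacobian(f, z)
        H = J.T @ J
        tr_h.append(np.trace(H))
        tr_h2.append(np.trace(H @ H))
    return m * np.mean(tr_h2) / np.mean(tr_h) ** 2

rng = np.random.default_rng(0)
zs = rng.normal(size=(64, 2))

# Hypothetical decoder 1: an orthonormal embedding scaled by 3 (a scaled isometry).
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))
scaled_iso = lambda z: 3.0 * (Q @ z)

# Hypothetical decoder 2: stretches one latent axis and bends the surface.
stretched = lambda z: np.array([z[0], 4.0 * z[1], z[0] * z[1]])

d_iso = scaled_isometry_distortion(scaled_iso, zs)
d_stretched = scaled_isometry_distortion(stretched, zs)
print(d_iso, d_stretched)
```

The scaled embedding attains the lower bound even though it does not preserve exact distances, which is the sense in which such a term tolerates a global scale factor. In training, a term of this kind would be added to the reconstruction loss with a weight.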

Neighborhood Reconstructing Autoencoders

December 8, 2021

[ Abstract ]

Vanilla autoencoders often produce manifolds that overfit to noisy training data, or have the wrong local connectivity and geometry. Autoencoder regularization techniques, e.g., the denoising autoencoder, have had some success in reducing overfitting, whereas recent graph-based methods that exploit local connectivity information provided by neighborhood graphs have had some success in mitigating local connectivity errors. Neither of these two approaches satisfactorily reduces both overfitting and connectivity errors; moreover, graph-based methods typically involve considerable preprocessing and tuning. To simultaneously address the two issues of overfitting and local connectivity, we propose a new graph-based autoencoder, the Neighborhood Reconstructing Autoencoder (NRAE). Existing graph-based methods attempt to encode the training data to some prescribed latent space distribution, one consequence being that only the encoder is the object of the regularization. In contrast, NRAE merges local connectivity information contained in the neighborhood graphs with local quadratic approximations of the decoder function to formulate a new neighborhood reconstruction loss. Compared to existing graph-based methods, our new loss function is simple and easy to implement, and the resulting algorithm is scalable and computationally efficient; the only required preprocessing step is the construction of the neighborhood graph. Extensive experiments with standard datasets demonstrate that, compared to existing methods, NRAE reduces overfitting and improves local connectivity in the learned manifold, in some cases by significant margins. Code for NRAE is available at https://github.com/Gabe-YHLee/NRAE-public.
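The idea of reconstructing a point's neighbors through a local quadratic approximation of the decoder can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the k-nearest-neighbor graph, the unweighted average, the finite-difference derivatives, and the toy encoder and decoder are all hypothetical choices for this example. For the authors' implementation, see the repository linked above.

```python
import numpy as np

def knn_graph(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def quadratic_decoder_approx(decoder, z_center, z_query, h=1e-3):
    """Second-order Taylor approximation of decoder(z_query) around z_center.

    The first and second directional derivatives along dz are estimated
    with central finite differences.
    """
    dz = z_query - z_center
    f0 = decoder(z_center)
    f_plus = decoder(z_center + h * dz)
    f_minus = decoder(z_center - h * dz)
    first = (f_plus - f_minus) / (2 * h)            # approximates J dz
    second = (f_plus - 2 * f0 + f_minus) / h ** 2   # approximates dz^T Hess dz
    return f0 + first + 0.5 * second

def neighborhood_reconstruction_loss(X, encoder, decoder, k=3):
    """Mean squared error between each neighbor x_j and the quadratic
    approximation of the decoder, expanded at the latent code of x_i."""
    Z = np.stack([encoder(x) for x in X])
    neighbors = knn_graph(X, k)
    total = 0.0
    for i in range(len(X)):
        for j in neighbors[i]:
            approx = quadratic_decoder_approx(decoder, Z[i], Z[j])
            total += np.sum((X[j] - approx) ** 2)
    return total / (len(X) * k)

# Toy setup (hypothetical): data on a quadratic surface in R^3.
decoder = lambda z: np.array([z[0], z[1], z[0] ** 2 + z[0] * z[1]])
encoder = lambda x: x[:2]

rng = np.random.default_rng(0)
Z_true = rng.uniform(-1, 1, size=(40, 2))
X_clean = np.stack([decoder(z) for z in Z_true])
X_noisy = X_clean + 0.1 * rng.normal(size=X_clean.shape)

clean_loss = neighborhood_reconstruction_loss(X_clean, encoder, decoder)
noisy_loss = neighborhood_reconstruction_loss(X_noisy, encoder, decoder)
print(clean_loss, noisy_loss)
```

Because the toy decoder is itself quadratic, the approximation is exact on clean data and the loss is numerically zero; off-manifold noise shows up as a positive loss. Note that the loss depends on the decoder's local shape, not only on the encoder's outputs.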

Neighborhood Reconstructing Autoencoders: Supplement

December 8, 2021

Efficient neural network compression via transfer learning for machine vision inspection

October 21, 2021

[ Abstract ]

Several practical difficulties arise when applying deep learning to image-based industrial inspection tasks: training datasets are difficult to obtain, each image must be inspected in milliseconds, and defects must be detected with 99% or greater accuracy. In this paper we show how transfer learning can be leveraged to address these challenges. Whereas transfer learning is known to work well only when the source and target domain images are similar, we show that transfer learning with ImageNet as the source domain works remarkably well, even though its images differ significantly from those of our target industrial domain. For one benchmark problem involving 5,520 training images, the transfer-learned network achieves 99.90% accuracy, compared to only 70.87% for the same network trained from scratch. Further analysis reveals that the transfer-learned network produces a considerably sparser and more disentangled representation than the trained-from-scratch network. This sparsity can be exploited to compress the transfer-learned network down to 1/128 of the original number of convolution filters with only a 0.48% drop in accuracy, compared to a drop of nearly 5% when compressing a trained-from-scratch network. Our findings are validated by extensive systematic experiments and empirical analysis.
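The abstract does not say how filters are chosen for removal. Purely as an illustration of how a sparse representation makes aggressive pruning possible, the sketch below ranks convolution filters by mean absolute activation and keeps the top fraction. The scoring rule, function names, and array shapes are assumptions for this example.

```python
import numpy as np

def select_filters(activations, keep_fraction):
    """Return sorted indices of the most active filters.

    activations: array of shape (n_images, n_filters, height, width)
    holding one layer's feature maps on a sample of images.
    """
    n_filters = activations.shape[1]
    n_keep = max(1, int(round(n_filters * keep_fraction)))
    score = np.abs(activations).mean(axis=(0, 2, 3))   # one score per filter
    return np.sort(np.argsort(score)[::-1][:n_keep])

def prune_conv_weights(weights, keep):
    """Keep only the selected output filters.

    weights: array of shape (n_filters, in_channels, kernel_h, kernel_w).
    """
    return weights[keep]

rng = np.random.default_rng(0)

# Hypothetical sparse layer: 128 filters, of which only filter 7 is active.
acts = 0.01 * rng.normal(size=(16, 128, 8, 8))
acts[:, 7] += 1.0
weights = rng.normal(size=(128, 3, 3, 3))

keep = select_filters(acts, keep_fraction=1 / 128)
pruned = prune_conv_weights(weights, keep)
print(keep, pruned.shape)
```

In a real network, the input channels of the following layer would need the same slicing, and accuracy would be re-measured after pruning.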

A Riemannian geometric framework for manifold learning of non-Euclidean data

August 18, 2021

[ Abstract ]

A growing number of problems in data analysis and classification involve data that are non-Euclidean. For such problems, a naive application of vector space analysis algorithms will produce results that depend on the choice of local coordinates used to parametrize the data. At the same time, many data analysis and classification problems eventually reduce to an optimization in which the criterion being minimized can be interpreted as the distortion associated with a mapping between two curved spaces. Exploiting this distortion-minimizing perspective, we first show that manifold learning problems involving non-Euclidean data can be naturally framed as seeking a mapping between two Riemannian manifolds that is closest to being an isometry. We then propose a family of coordinate-invariant first-order distortion measures that quantify how close the mapping is to an isometry, and apply them to manifold learning for non-Euclidean data sets. Case studies ranging from synthetic data to human mass-shape data demonstrate the many performance advantages of our Riemannian distortion minimization framework.
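One way to see what a coordinate-invariant first-order distortion measure can look like is to evaluate it at a single point. The sketch below assumes a mapping with Jacobian J between a source space with metric G and a target space with metric H, and penalizes the eigenvalues of G^{-1} J^T H J for deviating from 1. The particular penalty, and the restriction to one point instead of an integral over the manifold, are assumptions for this example.

```python
import numpy as np

def isometry_distortion(J, G, H):
    """Pointwise distortion: sum_i (lambda_i - 1)^2, where lambda_i are the
    eigenvalues of G^{-1} (J^T H J).

    J: (n, m) Jacobian of the mapping at the point.
    G: (m, m) symmetric positive-definite metric on the source.
    H: (n, n) symmetric positive-definite metric on the target.
    The mapping is a local isometry at the point iff J^T H J = G,
    i.e. iff every eigenvalue equals 1.
    """
    pullback = J.T @ H @ J
    eigvals = np.linalg.eigvals(np.linalg.solve(G, pullback)).real
    return float(np.sum((eigvals - 1.0) ** 2))

rng = np.random.default_rng(0)
J = rng.normal(size=(3, 2))
B = rng.normal(size=(3, 3))
H = B @ B.T + 3.0 * np.eye(3)       # a positive-definite target metric

# Case 1: the source metric equals the pullback metric, so the map is isometric.
d_isometric = isometry_distortion(J, J.T @ H @ J, H)

# Case 2: a Euclidean source metric, generally not isometric.
G = np.eye(2)
d_euclid = isometry_distortion(J, G, H)

# Same map and metrics expressed in new source coordinates z = A w.
A = np.array([[2.0, 1.0], [0.5, 3.0]])
d_changed = isometry_distortion(J @ A, A.T @ G @ A, H)

print(d_isometric, d_euclid, d_changed)
```

Under the change of coordinates, the matrix G^{-1} J^T H J is replaced by a similar matrix, so its eigenvalues and hence the distortion value are unchanged. A naive vector-space criterion such as the squared Frobenius norm of J^T H J - I would not have this property.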

Age-group determination of living individuals using first molar images based on artificial intelligence

July 21, 2021

[ Abstract ]

Dental age estimation of living individuals is challenging, and there is no consensus method for adults with permanent dentition. Thus, we aimed to provide an accurate and robust artificial intelligence (AI)-based diagnostic system for age-group estimation by incorporating a convolutional neural network (CNN) using dental X-ray image patches of the first molars extracted via panoramic radiography. The data set consisted of four first molar images, from the right and left sides of the maxilla and mandible, for each of 1586 individuals across all age groups, extracted from their panoramic radiographs. The accuracy of the tooth-wise estimation was 89.05% to 90.27%. Performance was evaluated mainly using a majority voting system and area under the curve (AUC) scores. The AUC scores ranged from 0.94 to 0.98 for all age groups, which indicates outstanding capacity. The learned features of the CNNs were visualized as heat maps, which revealed that the CNNs focus on different anatomical parameters, including tooth pulp, alveolar bone level, or interdental space, depending on the age and location of the tooth. This provides a deeper understanding of the most informative regions for distinguishing age groups. The prediction accuracy and heat map analyses suggest that this AI-based age-group determination model is plausible and useful.
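Combining the four tooth-wise predictions into one prediction per individual by majority vote can be sketched as below. The abstract does not describe how ties are handled, so the tie-break used here (summed class probability among the tied classes) is an assumption, as are the function name and the example probabilities.

```python
import numpy as np

def majority_vote(tooth_probs):
    """Combine per-tooth class probabilities into one age-group label.

    tooth_probs: array-like of shape (n_teeth, n_classes).
    Each tooth votes for its most probable class. Ties between classes
    are broken by the summed probability over all teeth (an assumption).
    """
    tooth_probs = np.asarray(tooth_probs, dtype=float)
    votes = tooth_probs.argmax(axis=1)
    counts = np.bincount(votes, minlength=tooth_probs.shape[1])
    tied = np.flatnonzero(counts == counts.max())
    if tied.size == 1:
        return int(tied[0])
    summed = tooth_probs[:, tied].sum(axis=0)
    return int(tied[summed.argmax()])

# Hypothetical outputs for the four first molars of one individual.
clear_case = [[0.1, 0.7, 0.2],
              [0.2, 0.6, 0.2],
              [0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1]]

# Two teeth vote for class 0 and two for class 1.
tied_case = [[0.90, 0.10, 0.0],
             [0.80, 0.20, 0.0],
             [0.40, 0.60, 0.0],
             [0.45, 0.55, 0.0]]

label_clear = majority_vote(clear_case)
label_tied = majority_vote(tied_case)
print(label_clear, label_tied)
```

In the first case three of the four teeth agree on class 1. In the second case the vote is split 2 to 2, and class 0 wins because its summed probability is higher.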