Fast Text Placement Scheme for ASCII Art Synthesis

This study proposes an algorithm that creates ASCII art from a binary image. Our approach aims to generate ASCII art quickly by using multi-threaded local optimization for text placement instead of global optimization. To generate ASCII art from various images, the original image is first converted into a thinned black-and-white image suitable for ASCII art generation. We then extract pixel orientations from the input image and introduce a character similarity scheme that takes these orientations into account. We also propose a novel text placement algorithm that completes the ASCII art quickly. The resulting system can generate ASCII art using a variety of proportional fonts. Our experiments show that the proposed system generates ASCII art much faster than existing state-of-the-art techniques that use proportional fonts.


I. INTRODUCTION
Frequent transmission of large images consumes a large amount of data on networks and websites. In a slow Internet environment, ASCII art images can be of great use. Since ASCII art images consist of text, their data size is much smaller than that of general bitmap-based images. In addition, since ASCII art is a text-based image, users can easily edit and revise it. Thanks to these characteristics, ASCII art is frequently used in web pages that do not support image uploading, and some artists create cartoon-style art by properly arranging ASCII art images and text.
ASCII art can largely be classified into two categories: structure-based and tone-based (see Figure 1). Structure-based ASCII art mainly represents the structure of 2D line drawing images articulated with outlines. Tone-based ASCII art, on the other hand, expresses the brightness or color of a reference image in a more realistic manner. In general, the structure-based method is more complicated than the tone-based method, as it is necessary to understand the structure of the reference image to create the ASCII art. In this study, we propose a system for generating structure-based ASCII art.
When artists manually create a structure-based ASCII art image, they usually place the appropriate characters on top of the original image. This is a time-consuming and tedious task. Fortunately, there are several ways to convert reference images to structure-based ASCII art on behalf of the artists [2]-[5]. Although some of these systems produce output of decent quality, they take a long time to run. Recently, an ASCII art synthesis system using a Convolutional Neural Network (CNN) was proposed [6]. Based on machine learning, the system can reproduce the 'style' of the artist-created reference images. However, a machine-learning-based system cannot reproduce a style that it did not learn. This means that a set of reference images is necessary for every font type or size.
FIGURE 1. Example of structure-based ASCII art (a) and tone-based ASCII art (b). The example of tone-based ASCII art was taken from Markuš et al. [1].
In this paper, we suggest a new automatic ASCII art generation system that complements the shortcomings of other conventional methods. Our system aims to achieve the following:
• Quality: The system can create ASCII art of high quality.
• Efficiency: The results are created in a short period of time so that the artist can edit them immediately.
• Adaptability: The system can create ASCII art stably regardless of the font type or size.
Two algorithms are at the heart of our system: one extracts the orientation features of the input image and uses them to match the most suitable characters, and the other, called the text placement algorithm, places those characters on the input image stripes. The text placement suggested in this study was inspired by how artists manually make ASCII art. The suggested system does not simply place the character images from the left to the right side of the input image stripes. Instead, it places the most suitable character at the locally optimal location of the stripe and repeats the process. With this greedy algorithm, the suggested system can generate ASCII art in a short period of time using a multithreaded implementation. Since the suggested algorithm does not use a neural network or machine learning, it can immediately respond to changes in font type or font size, or changes in the type or number of characters used.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
The results generated from the suggested algorithm were compared with a state-of-the-art ASCII art generation system [5] and the CNN-based system [6]. The results show that the suggested system can generate ASCII art at a much faster rate than the existing systems. Our contributions can be summarized as follows:
• A novel character-image matching algorithm extracts orientation features from the image and finds the most suitable characters through comparison with such features.
• An efficient text placement algorithm places characters on the stripes of the image. This algorithm allows us to generate ASCII art from a 512x512 input image in up to 8.7 seconds.

II. RELATED WORKS
A. STRUCTURE-BASED ASCII ART SYNTHESIS
There have been several studies on ASCII art generation methods. Xu et al. [2] created ASCII art from an input image using the alignment-insensitive shape similarity (AISS) metric and a constrained deformation model. Miyake et al. [3] proposed real-time ASCII art generation based on a glyph matching method using Normalized Cross-Correlation (NCC) or Histogram of Oriented Gradients (HOG). However, these methods can only be used with fixed-width fonts, not with proportional fonts. To create ASCII art in a proportional font environment, Xu et al. [4] suggested a similarity metric and a dynamic programming-based optimal placement method using a multi-orientation phase congruency model. Xu et al. [5] further improved the output quality using non-classical receptive field (non-CRF) modulation to extract structures from an image. Their system can generate ASCII art from natural photographs as well as from line drawing images. However, it takes a long time to create ASCII art of decent quality with this system. To tackle this issue, we suggest a new similarity metric and a fast text placement method in this study. Akiyama suggested an ASCII art generation method using a CNN [6]. That study trained the network on ASCII art created by a human artist and on rough sketches converted from the ASCII art. Although this method was able to produce art of good quality, the network was trained not with hand-drawn original images but with system-generated images. Such a bias in the dataset may decrease the adaptability of the learned network, for example, when the original image has a dark background. Moreover, ASCII art can be generated only for particular fonts.

B. STRUCTURE LINE DETECTION
To produce structure-based ASCII art, it is necessary to detect the structure lines in the input image. One of the simplest ways to detect structure lines is to use an edge detector such as the Canny edge detector [7]. However, edge detectors depend largely on the contrast and scale of the image. Several studies have proposed methods to extract structure lines from input images. Kang et al. [8] used a flow-based difference of Gaussians (DoG) filter to create a line drawing style image from an input image. Arbelaez et al. [9] proposed a contour detector that combines multiple local cues with a spectral clustering-based globalization framework. Kokkinos [10] proposed a boundary detection system using a deep CNN. Simo-Serra [11] suggested a CNN-based framework that converts a pencil-drawn rough sketch to a clean line drawing image. Li et al. [12] proposed a deep network model that can remove the texture from a manga image with screen patterns and extract structural lines from it. This study used the method of Kang et al. [8] to generate a structure line image from an original image.

C. GENERATING FEATURES
The structure line detection algorithm can extract the structure lines of the original image, but it is difficult to use those lines directly for ASCII art synthesis. A thinning algorithm can be used to facilitate the comparison between structure lines and ASCII characters. Thinning has been studied for several decades, and a variety of thinning algorithms have been proposed [13]-[16].
An image similarity metric can be used to match the structure lines with the ASCII characters. For example, SSIM and its extensions compare the luminance, contrast, and structure of images to calculate their similarity [17]-[20]. A phase congruency-based similarity metric has also been proposed [21]. Phase congruency is used in the state-of-the-art techniques for ASCII art synthesis [4], [5]. SIFT [22] identifies key points from differences of Gaussians of an image for image matching. In this study, we used a similarity metric based on the pixel orientation of an image [23]. This method can accurately identify the orientation of a pixel using the data of the surrounding pixels, and it can be applied to fingerprint recognition and similar tasks [24].
VOLUME 4, 2016

D. IMAGE STYLIZATION
Research on image stylization, which translates input images into images of different styles, has been conducted for several years, and considerable advances have been achieved. DeCarlo and Santella [25] converted an input image to a line-drawing style image through image segmentation and edge detection, while Lu et al. [26] generated a pencil drawing style image from a natural image. Kim et al. [27] presented a GAN-based system that uses color tag information to paint the image, Chen et al. [28] suggested a fully convolutional network to perform various advanced image processing operations, Fischer et al. [29] presented stylization of an augmented reality screen, and Lin et al. [30] proposed an abstraction layout that generates a flat design style black-and-white image from a 3D model. However, few studies on dividing an image into hundreds of image patches, as in ASCII art synthesis, have been reported. This study aims to divide the input image into various types of rectangular character images.

III. IMAGE PREPROCESSING
A. STRUCTURE LINE EXTRACTION
In general, the font characters used for ASCII art are one-pixel-width binary images. However, most input images are not binary images. Therefore, it is necessary to extract the structure lines from the input image and convert them to a one-pixel-width binary image. In this study, the method of Kang et al. [8] was used to extract structure lines from the input image. With this method, the edge tangent flow and a flow-based DoG filter are used to generate a line drawing style image that preserves the structure of the input image. For this study, the parameters were adjusted so that the method generates a binary image.

B. THINNING
Since the structure line image is not a one-pixel-width image, a thinning operation is required to make it easier to compare with font characters. However, before the thinning operation, noise in the structure lines should first be removed. We used the pre-thinning method of Dong et al. [14] for this de-noising operation. Under this pre-thinning method, the value of a pixel p changes according to the values of its eight neighboring pixels (see Figure 2). The value of p changes as follows: (1)
Jang and Chin [31] suggested the m_t score to measure the convergence of a thinned skeleton S_m. m_t can be defined as follows: where Area[] counts the number of one-pixels of the skeleton and Q_k is the pattern given in Figure 4. This means that the fewer Q_k patterns S_m has, the closer S_m is to the perfect unit skeleton. In this study, KMM thinning [15] was used since its results had fewer Q_k patterns. Figure 4 shows the results of several thinning algorithms.

IV. OVERVIEW
Our system generates ASCII art from an input image that is a one-pixel-width binary image. To generate ASCII art, the suggested system uses additional character data. The system can generate ASCII art with various proportional fonts; in our lab environment, the font chosen for ASCII art was Saitamaar [32] with a height of 16px. As illustrated in Figure 5, the suggested system consists of five steps. In the feature extraction step, the local orientation of each pixel is calculated from the input image. The stripe segmentation step divides the input image and features into stripes of a specified pixel height. The character score estimation step calculates the character score for each location of the stripes, and in the text placement step, the character scores are used to place characters on the stripes. Finally, the ASCII art completion step combines all the text stripes to create one ASCII art. Details of each step are described in Section 5 below.

V. ALGORITHM
This section covers the algorithms of the suggested system in detail. Section 5.1 describes the feature extraction, Section 5.2 the character score estimation, and Section 5.3 the text placement. The stripe segmentation and ASCII art completion steps simply divide the image and combine the results again.

A. FEATURE EXTRACTION
To create ASCII art, a human artist compares an original image with a character image. To mimic a human artist, the suggested system extracts features from the input image. The state-of-the-art technique for generating ASCII art uses 6-way phase congruency as its features [5]. For more sophisticated character score calculations, the suggested system calculates pixel orientation.
We used the method suggested in [23] for the pixel orientation calculation. First, a Gaussian blur was applied to the input image I. We created a blurred image I′ using a Gaussian blur filter of size 3x3 with σ=0.7. The local orientation of the pixel p on the blurred image I′ is as follows: where G_x and G_y are the gradients of the image in the x-axis and y-axis directions and W is an image block containing p.
In this study, we used the Scharr filter to compute the gradients, and W was set to a 5x5 square block centered on p. Figure 5 (b) shows an example of the visualized pixel orientation of the input image. The pixel orientation was calculated for every pixel in I′.
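The feature extraction step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation used in this study: it assumes the widely used gradient-based estimator θ = ½·atan2(Σ 2·G_x·G_y, Σ(G_x² − G_y²)) + π/2 over the block W, which may differ in detail from the formula in [23]. The function names, the replicated borders, and the use of `None` for pixels without a defined orientation are illustrative choices.

```python
import math

# Standard 3x3 Scharr kernels for the x and y derivatives.
SCHARR_X = [[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]
SCHARR_Y = [[-3, -10, -3], [0, 0, 0], [3, 10, 3]]


def gaussian_kernel_3x3(sigma=0.7):
    """Normalized 3x3 Gaussian kernel (sigma = 0.7 as in the text)."""
    raw = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def filter3x3(img, kernel):
    """Correlate img (list of rows) with a 3x3 kernel, replicating borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = acc
    return out


def local_orientation(img, block=5, sigma=0.7):
    """Per-pixel line orientation in [0, pi); None where gradients vanish."""
    blurred = filter3x3(img, gaussian_kernel_3x3(sigma))
    gx = filter3x3(blurred, SCHARR_X)
    gy = filter3x3(blurred, SCHARR_Y)
    h, w, r = len(img), len(img[0]), block // 2
    theta = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vx = vy = 0.0
            # Accumulate gradient products over the block W around (x, y).
            for yy in range(max(y - r, 0), min(y + r + 1, h)):
                for xx in range(max(x - r, 0), min(x + r + 1, w)):
                    vx += 2.0 * gx[yy][xx] * gy[yy][xx]
                    vy += gx[yy][xx] ** 2 - gy[yy][xx] ** 2
            if abs(vx) > 1e-9 or abs(vy) > 1e-9:
                # Dominant gradient direction, rotated by 90 degrees so that
                # the angle follows the line rather than crossing it.
                theta[y][x] = (0.5 * math.atan2(vx, vy) + math.pi / 2) % math.pi
    return theta
```

With this convention a vertical stroke yields an orientation near π/2 and a horizontal stroke an orientation near 0 (equivalently π), since orientations are undirected.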

B. CHARACTER SCORE ESTIMATION
For text placement, the character score measures the similarity between a character and the part of the image stripe on which it is placed. If the character and the image match perfectly, their pixel values and pixel orientation values will be identical. The character match score can be calculated based on this idea; however, the pixel orientation is defined only around the foreground pixels, not among the background pixels. Therefore, this study defines the character match score S_m(c) only where the pixels p_c of the character image c are foreground pixels. S_m(c) is defined as follows: where p′_i and p′_c are the blurred pixel values of the input image and the character image, respectively, W_c is the character image area, F_c is the set of foreground pixels on c, and T_c is the set of pixels on c having a valid ∆θ(p) value.
However, if only the character match score is used, unsuitable characters may be selected because of overmatched scores. We therefore designed the algorithm to prevent overmatching by introducing a mismatch score. The mismatch score becomes higher as the difference between the character pixel value and the image pixel value becomes greater, and likewise as the difference between the character pixel orientation and the image pixel orientation becomes greater. As with the match score, p_c is always 1 for the mismatch score because p_c is considered only at the foreground pixels. The mismatch score S_u(c) and the character score S(c) are defined as follows: Here, ω_u and ω_m are weight parameters. The user can easily control the style of the ASCII art by changing these parameters. We set the weight values to ω_u = 0.65 and ω_m = 1 in the experiment.
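Both scores compare the pixel orientation of the character with that of the image. Because a line orientation is undirected, angles of 0 and π describe the same line, so any orientation difference should wrap around at π. The helper below is a small illustrative sketch of such a wrapped difference; the function name and the assumption that ∆θ is measured this way are ours, and it does not reproduce the score definitions of this study.

```python
import math


def orientation_diff(theta_a, theta_b):
    """Smallest angle between two undirected orientations, in [0, pi/2]."""
    d = abs(theta_a - theta_b) % math.pi
    # Orientations differing by almost pi are nearly parallel lines.
    return min(d, math.pi - d)


# A nearly horizontal stroke compared with its mirror image across the x-axis:
# the raw difference is close to pi, but the wrapped difference is small.
small = orientation_diff(0.1, math.pi - 0.1)
# Perpendicular strokes give the largest possible difference, pi / 2.
large = orientation_diff(0.0, math.pi / 2)
```

A mismatch term built on this difference grows as strokes become perpendicular and vanishes for parallel strokes, regardless of which of the two equivalent angles the orientation estimator returns.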

C. TEXT PLACEMENT
Placing the text on the image stripes is challenging. One simple way is to place the characters one by one from the left to the right of the stripes. This method is intuitive and simple, but quality may degrade due to differences in character widths. To create ASCII art in a proportional font environment, preceding studies suggested character placement schemes that use dynamic programming [4], [5]. A dynamic programming-based scheme guarantees an optimal solution for the quality objective of the ASCII art; however, this particular class of dynamic programming algorithms cannot easily be parallelized because of data dependencies between sub-problems. Therefore, this study proposes a stripe-based text placement algorithm that can generate ASCII art faster through parallelization. Our algorithm creates ASCII art text stripes from the stripes of the input image. The stripes of the input image are obtained by slicing the input image horizontally into several small images with width w and height h, where w is the width of the input image and h (= 16px) is the font height in pixels. The number of stripes N is obtained by dividing the height of the input image by a stride value (= 18px), which is slightly larger than the font height h in order to express the gaps between text lines.
The text placement algorithm works similarly to a divide-and-conquer process. To create text stripes, the character score table calculated in Section 5.2 is used as the input. For the input score table S[0, W], the system divides the table into three subsequences: S[0, l], C, and S[r, W]. Text stripes are then created recursively for each subsequence. Here, W is the width of the input stripe, C is the character with the smallest character score in the score table, and l and r are the left and right positions of C, respectively. If C is a space character, the system checks whether S can be filled with only space characters instead of dividing S into three subsequences.
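The stripe segmentation described above amounts to simple index arithmetic. The following sketch lists the row range of each stripe. It assumes that the division by the stride is an integer (floor) division and that rows left over at the bottom of the image are ignored; the function name and the half-open `(top, bottom)` convention are illustrative.

```python
def stripe_ranges(image_height, font_height=16, stride=18):
    """Row ranges (top inclusive, bottom exclusive), one per text line."""
    n = image_height // stride  # number of stripes N (floor division assumed)
    return [(k * stride, k * stride + font_height) for k in range(n)]


# Stripes for a 512-pixel-high input with the 16px font and 18px stride.
ranges = stripe_ranges(512)
```

Each stripe covers `font_height` rows, and the two rows between consecutive stripes correspond to the gap between text lines. Since the stripes do not overlap, each can be processed without reference to the others.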
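The recursive subdivision can be sketched for a single stripe as follows. This is a simplified sketch under stated assumptions rather than Algorithm 1 itself: `scores[ch][x]` is a hypothetical table holding the score of character `ch` with its left edge at column `x` (smaller is better), `widths[ch]` is the character width in pixels, the special handling of space characters is omitted, and memoization is added so that repeated intervals are not recomputed. It also includes the fallback to the next-best character when a gap cannot be filled, which the following paragraph discusses.

```python
from functools import lru_cache


def place_text(scores, widths, stripe_width):
    """Greedy divide-and-conquer placement; returns [(x, char), ...] by x."""

    @lru_cache(maxsize=None)
    def solve(lo, hi):
        if lo == hi:
            return ()  # nothing left to fill
        # Every placement that fits inside [lo, hi), best score first.
        candidates = sorted(
            (scores[ch][x], x, ch)
            for ch, w in widths.items()
            for x in range(lo, hi - w + 1)
        )
        for _, x, ch in candidates:
            left = solve(lo, x)
            if left is None:
                continue  # left gap cannot be filled: try the next-best one
            right = solve(x + widths[ch], hi)
            if right is None:
                continue  # right gap cannot be filled: try the next-best one
            return left + ((x, ch),) + right
        # A ragged right edge is tolerated, so a gap touching it may stay
        # empty; any other unfillable gap is reported as a failure.
        return () if hi == stripe_width else None

    return list(solve(0, stripe_width))


# Toy example: two characters on a 10px-wide stripe.
widths = {"|": 3, "_": 5}
scores = {ch: [1.0] * 10 for ch in widths}
scores["|"][2] = 0.0  # best score, but leaves an unfillable 2px gap on the left
scores["|"][3] = 0.1  # second best, and the 3px gap on the left can be filled
layout = place_text(scores, widths, 10)
```

In the toy example the best-scoring position is rejected because no character fits into the 2px gap it leaves, so the second-best position is used and the remaining space is filled around it.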
During the recursive operations, the system may not be able to generate text stripes for a subsequence S. For example, if the width of S is 2px and the minimum character width is 3px, the text stripes for S cannot be created. In this case, the system generates text stripes for another subdivision that uses the character with the second-smallest score, C′, instead of C. Since the right-side boundary of ASCII art is often not perfectly aligned, it is assumed that text stripes can always be created if the input subsequence contains the right-side boundary of the original sequence. Algorithm 1 is the pseudocode of the algorithm proposed in this study.
Figure 6 compares the results of left-to-right text placement with those of the proposed algorithm. With the left-to-right algorithm, the structure of the input image may not be reflected to the same degree as with the suggested algorithm. For example, if there is a space on the left side of the image, the ASCII art created will have a discrepancy. The proposed algorithm, however, uses the character scores to place the characters that best retain the structure first; hence, no such discrepancy is found in the generated ASCII art.
The suggested system in this study generates ASCII art from a one-pixel-width binary image. The Danbooru dataset [33] and Akiyama's input data [6] were used as the input images. Each original image was converted to a one-pixel-width binary image via the preprocessing described in Section 3. A total of 752 characters of the Saitamaar font [32] with a height of 16px were used to create ASCII art. Figure 7 shows the original image, the preprocessed image, and the ASCII art generated by the suggested system.

A. PERFORMANCE
The suggested ASCII art generation system was built using C++ and the OpenCV library [34]. Our lab environment was a PC with an AMD Ryzen 7 CPU (3700MHz), 32GB of main memory, and an NVIDIA GeForce GTX 1080Ti. The system performance was measured on randomly selected images with a resolution of 512x512 pixels. Table 1 shows the runtime for each step of the system. The total runtime of the system is shown in Table 4. The suggested system can generate ASCII art from the input image in an average of 45.5 seconds in a single-threaded environment and 10.3 seconds in a multithreaded environment.
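The multithreaded configuration can be pictured as a worker pool that processes stripes independently. The sketch below uses Python's standard `concurrent.futures` module only to illustrate the structure; `place_stripe` is a hypothetical per-stripe function, and the threading details of the C++ implementation are not described in the text. Note that in CPython, threads do not speed up CPU-bound pure-Python code, so a process pool or native code would be needed to see a speedup comparable to the one reported.

```python
from concurrent.futures import ThreadPoolExecutor


def render_stripes(stripes, place_stripe, workers=8):
    """Apply place_stripe to every stripe in parallel, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the input stripes,
        # so the text lines can be joined top to bottom afterwards.
        return list(pool.map(place_stripe, stripes))


# Toy usage: each "stripe" is a string and the worker just upper-cases it.
lines = render_stripes(["ab", "cd", "ef"], str.upper, workers=2)
ascii_art = "\n".join(lines)
```

This structure works only because no stripe depends on the result of another, which is the property that the dynamic programming formulations lack within a stripe.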

B. COMPARISON
This section compares our suggested system with existing state-of-the-art ASCII art generation algorithms. For the comparison, the systems proposed by Xu et al. [5] and Akiyama [6] were used as the state of the art. Although it was not possible to obtain the source code or an executable program for Xu et al.'s system, a brief performance summary for the system was available. For Akiyama's system, we ran the program on a PC with the same lab environment as ours.
TABLE 2 (partial). Xu et al. [5]: CPU 2.0GHz, 305 characters, runtime 18min.
The performance of the Xu et al. system was measured on a PC with a 2GHz CPU and 8GB of memory. Table 2 compares the performance of the suggested system with that of Xu et al. An input image with a resolution of 1024x1024 pixels was used in this comparison. Although the CPU clock speed in our lab environment was only 1.85 times higher than that of Xu et al., the suggested system was about six times faster, even with more than 2.4 times as many character types.
Figure 8, Table 3, and Table 4 show the comparison between Akiyama's system and our suggested system. Akiyama's system uses a CNN to create ASCII art. It generates several ASCII artworks by shifting the original image one pixel at a time. In the suggested system, however, the original image is fixed and only one ASCII art is generated. An image with a resolution of 512x512 pixels was used as the input image. According to the results, our system obtained better Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) scores than Akiyama's system. This indicates that our system can preserve both the content and the structure of the input image better than the existing system. However, we also noticed that Akiyama's system preserves the thin lines of the input image better than our system in some parts of the image, although this was not quantitatively reflected in the scores. Since Akiyama's system is a CNN-based algorithm, it relies on many examples, and additional training procedures are required if the execution environment changes (e.g., when using a different font). Our suggested system can respond immediately to changes in the execution environment. Our system also runs faster; in particular, in a multithreaded environment, ASCII art can be generated within 12 seconds. This allows human artists to make additional edits in a short time.
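For reference, PSNR has a standard definition, 10·log10(MAX² / MSE), where MSE is the mean squared pixel difference. The sketch below computes it for two grayscale images stored as lists of rows. It assumes 8-bit images of equal size; how the generated ASCII art was rasterized for this comparison is not stated in the text, so the inputs here are purely illustrative.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """PSNR in dB between two equally sized grayscale images."""
    n = sum(len(row) for row in reference)
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(reference, candidate)
              for a, b in zip(row_a, row_b)) / n
    # Identical images have zero error, which gives an unbounded PSNR.
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 example in which one of four pixels is completely wrong.
ref = [[0, 255], [255, 0]]
out = [[0, 255], [255, 255]]
score_db = psnr(ref, out)
```

A higher PSNR means a smaller pixel-wise error. Because the measure is pixel-wise, it may not reward thin lines that are preserved but slightly displaced, which is in line with the observation above that some visible differences were not reflected in the scores.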

C. USING DIFFERENT FONTS
In this section, we experimentally show the robustness of the proposed algorithm using two different fonts (the Saitamaar and Afta Sans fonts). The Afta Sans font environment uses 380 characters with a height of 16px. Figure 9 shows the original image, an ASCII image created with the Saitamaar font, and an ASCII image created with the Afta Sans font.

VII. CONCLUSION AND LIMITATION
In this paper, we propose a new method for generating ASCII art in an environment using proportional fonts. Our method generates ASCII art using a character-image matching algorithm based on pixel orientation feature and a greedy text placement algorithm. The proposed method can generate ASCII art from an input image in a short time, regardless of the type of font or the character set. In particular, our algorithm can generate high quality ASCII art from images with many vertical lines or straight lines. In addition, our experiments show that ASCII art can be generated in an average of 10.3 seconds from an image with a resolution of 512x512 pixels in an environment using multithreading. This timing performance helps the artist to quickly obtain a high-quality result, allowing the user to add additional texts or highlighting effects on the fly.

A. DISCUSSION
Using more kinds of characters can improve the quality of ASCII art. Ideally, the best quality would be achieved when using many kinds of Unicode characters. However, using too many characters increases the time it takes to compare them with the input image. Adding several similarly shaped characters may bring only an insignificant improvement in quality relative to the increase in calculation time. Also, some fonts do not contain certain characters. These constraints must be considered when determining the character set to be used in the creation of ASCII art.
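The cost side of this trade-off can be estimated with a rough count of score evaluations. The sketch below assumes one score per character, per horizontal position, and per stripe, which is an upper bound that ignores positions where a character would not fit. The function name is illustrative, and the 512x512 resolution and 18px stride are taken from the experiments.

```python
def score_table_entries(num_chars, image_width=512, image_height=512, stride=18):
    """Upper bound on score evaluations for one image."""
    stripes = image_height // stride  # floor division is an assumption
    return num_chars * image_width * stripes


# Character counts of the two fonts used in the experiments.
saitamaar = score_table_entries(752)
afta_sans = score_table_entries(380)
ratio = saitamaar / afta_sans
```

Under this estimate the work grows linearly with the size of the character set, so roughly doubling the number of characters roughly doubles the score estimation time unless similar characters are pruned.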

B. LIMITATIONS AND FUTURE WORK
Our algorithm expresses the linear structure of an image well; however, there are structures that the algorithm does not handle well. For example, detailed and complex curved structures, such as human eyes, are particularly difficult to express well. Figure 10 shows the structure line input and the generated ASCII art for an actual human face. The algorithm expressed the overall structure of the input image relatively well, but some facial features were missed.
Also, our current system cannot express the tone of the image. ASCII art created by an artist often includes not only the structure of the image but also its tone. Developing a scheme for subtle tonal representation would be an interesting future research topic.
Another possible topic for future research is the development of real-time creation and authoring tools that can reflect user constraints. For example, an artist may want to put together several ASCII art pieces in an image, or prearrange certain characters in a specific location as a guide. These authoring tools can enable users to obtain desired results more quickly and intuitively.
We are also interested in combining the advantages of machine-learning-based methods. There have already been studies related to CNN-based ASCII art creation [6]. Methods, such as online transfer learning, may be able to quickly produce a network applicable to a new font-set.