Among the many artificial intelligence and machine learning models available today for image translation, image-to-image translation models using Generative Adversarial Networks (GANs) can change the style of images. These models work by using two input images: a content image, which is altered to match the style of a reference image. These models are used for tasks like transforming images into different artistic styles, simulating weather changes, improving satellite video resolution, and helping autonomous vehicles recognize different lighting conditions, like day and night.
Now, researchers from Sophia University have developed a model which can reduce the computational requirements needed to run these models, making it possible to run them on a wide range of devices, including smartphones. In a study published in the IEEE Open Journal of the Computer Society on 25 September 2024, Project Assistant Professor Rina Oh and Professor Tad Gonsalves from the Department of Information and Communication Sciences at Sophia University proposed a 'Single-Stream Image-to-Image Translation (SSIT)' model that uses only a single encoder to carry out this transformation.
Typically, image-to-image translation models require two encoders—one for the content image and one for the style image—to 'understand' the images. These encoders convert the content and style images into numerical values (feature space) that represent key aspects of the image, such as color, objects, and other features. The decoder then takes the combined content and style features and reconstructs the final image with the desired content and style.
In contrast, SSIT uses a single encoder to extract spatial features such as the shapes, object boundaries, and layouts of the content image. For the style image, the model uses Direct Adaptive Instance Normalization with Pooling (DAdaINP), which captures key style details like colors and textures while focusing on the most prominent features to improve efficiency. A decoder then takes the combined content and style features and reconstructs the final image with the desired content and style.
Elaborating further, Prof. Oh says, "We implemented a guided image-to-image translation model that performs style transformation with reduced GPU computational costs while referencing input style images. Unlike previous related models, our approach utilizes Pooling and Deformable Convolution to efficiently extract style features, enabling high-quality style transformation with both reduced computational cost and preserved spatial features in the content images."
The model is trained using adversarial training, where the generated images are evaluated by a Discriminator with a Vision Transformer, which captures patterns in images. The discriminator assesses whether the generated images are real or fake by comparing them to the target images, while the generator learns to create images that can fool the discriminator.
Using the model, the researchers performed three types of image transformation tasks. The first involved seasonal transformation, where landscape photos were converted from summer to winter and vice versa. The second task was photo-to-art conversion, in which landscape photos were transformed into famous artistic styles, such as those of Picasso, Monet, or anime. The third task focused on time and weather translation for driving, where images captured from the front of a car were altered to simulate different conditions, such as changing from day to night or from sunny to rainy weather.
In all these tasks, the model performed better than five other GAN models (namely NST, CNNMRF, MUNIT, GDWCT, and TSIT), with lower Fréchet Inception Distance and Kernel Inception Distance scores. This demonstrates that the generated images were similar to the target styles and did a better job of replicating colors and artistic details.
"Our generator was able to reduce the computational cost and FLOPs compared to the other models because we employed a single encoder that consists of multiple convolution layers only for content image and placed pooling layers for extracting style features in different angles instead of convolution layers," says Prof. Oh.
In the long run, the SSIT model has the potential to democratize image transformation, making it deployable on devices like smartphones or personal computers. It enables users across various fields, including digital art, design, and scientific research, to create high-quality image transformations without relying on expensive hardware or cloud services.
Reference
■Title of original paper:
Photogenic Guided Image-to-Image Translation With Single Encoder
■Journal:
IEEE Open Journal of the Computer Society
■DOI:
■Authors:
Rina Oh and T. Gonsalves
■Affiliations:
Department of Information and Communication Sciences, Sophia University, Japan
About Sophia University
Established as a private Jesuit affiliated university in 1913, Sophia University is one of the most prestigious universities located in the heart of Tokyo, Japan. Imparting education through 29 departments in 9 faculties and 25 majors in 10 graduate schools, Sophia hosts more than 13,000 students from around the world.
Conceived with the spirit of "For Others, With Others," Sophia University truly values internationality and neighborliness, and believes in education and research that go beyond national, linguistic, and academic boundaries. Sophia emphasizes on the need for multidisciplinary and fusion research to find solutions for the most pressing global issues like climate change, poverty, conflict, and violence. Over the course of the last century, Sophia has made dedicated efforts to hone future-ready graduates who can contribute their talents and learnings for the benefit of others, and pave the way for a sustainable future while "Bringing the World Together."
Website: https://www.sophia.ac.jp/eng/
About Project Assistant Professor Rina (Komatsu) Oh from Sophia University
Rina (Komatsu) Oh is a Project Assistant Professor at the Department of Information and Communication Sciences, Faculty of Science and Technology, Sophia University. Her research interest includes artificial intelligence, particularly in deep learning applications. She is working on implementing various deep learning model architectures focused on tasks, such as image recognition and image generation.
About Professor Tad Gonsalves from Sophia University
Tad Gonsalves (Member, IEEE) is a Full Professor at the Department of Information and Communication Sciences, Faculty of Science and Technology, Sophia University. He has authored or co-authored more than 150 papers in international conferences and journals. His research interests include bio-inspired optimization techniques and the application of deep learning techniques to diverse problems like autonomous driving, drones, digital art and music, and computational linguistics. He has also been involved in affective computing models. His research laboratory ( https://www.gonken.tokyo/ ) in Tokyo specializes in applications of deep learning and multi-GPU computing.