
[AI Engineering & Web Development] Conversational AI service

Last updated: June 3, 2022



Figure 1. SSIFI logo

 
Project Summary


Many markets and companies are adopting conversational AI services. MarketsandMarkets projected a CAGR of 21.8% for the conversational AI market and predicted that the market would reach USD 18.4 billion in 2026.


The AI architecture of SSIFI consists of Speech-to-Text (STT), Natural Language Processing (NLP), and Text-to-Speech (TTS).


SSIFI provides a conversational AI service. In addition, we have opened SSIFI's AI technology on GitHub for users who want to build a personalized SSIFI.



Figure 2. Conversational AI market (Markets and markets)


Figure 3. SSIFI AI architecture
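The three-stage architecture can be read as a simple chain: speech in, text, generated reply, speech out. The following is a minimal sketch of that chain, not SSIFI's actual code. The names `ConversationPipeline`, `fake_stt`, `fake_nlp`, and `fake_tts` are hypothetical stand-ins so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; SSIFI's real interfaces are not given in this post.
SpeechToText = Callable[[bytes], str]
TextGenerator = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class ConversationPipeline:
    stt: SpeechToText
    nlp: TextGenerator
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        text_in = self.stt(audio_in)    # speech -> text
        text_out = self.nlp(text_in)    # text -> generated reply
        return self.tts(text_out)       # reply -> speech


# Stand-in stages (assumptions for illustration only).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlp(prompt: str) -> str:
    return f"echo: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = ConversationPipeline(fake_stt, fake_nlp, fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Keeping each stage behind a plain callable means any one model could be swapped without touching the other two.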


  • Service Name : SSIFI

  • Project Duration : 11 Apr. 2022 ~ 27 May 2022

  • Number of Team members : 6

  • Role : AI Part Leader and Engineer

  • Skills


 
AI

1. Speech-to-Text (STT)


STT is an AI model that converts human speech into text through machine interpretation. As shown in Figure 4, the process is divided into pre-processing, an acoustic model, and a language model.




Figure 4. Speech-to-Text model process
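To make the pre-processing stage more concrete, here is a small sketch of a generic speech front-end: pre-emphasis, framing, and a per-frame log-energy feature. The post does not say which pre-processing SSIFI uses, so the functions, the 0.97 coefficient, and the 25 ms / 10 ms framing are all assumptions chosen because they are common defaults.

```python
import math
from typing import List


def pre_emphasis(signal: List[float], coeff: float = 0.97) -> List[float]:
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[i] - coeff * signal[i - 1] for i in range(1, len(signal))
    ]


def frame_signal(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into overlapping frames, dropping the incomplete tail."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def log_energy(frame: List[float], eps: float = 1e-10) -> float:
    """Log of the summed squared amplitude; eps avoids log(0) on silence."""
    return math.log(sum(x * x for x in frame) + eps)


# One second of a 440 Hz tone at 16 kHz, as a stand-in for recorded speech.
sample_rate = 16_000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

emphasized = pre_emphasis(tone)
frames = frame_signal(emphasized, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
features = [log_energy(f) for f in frames]
```

In a full system these per-frame features (usually richer ones, such as mel filterbank energies) would be passed to the acoustic model, whose output the language model then turns into text.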



2. Natural Language Processing (NLP)


Two models were introduced in SSIFI's NLP part. The first is a Generative Pre-trained Transformer (GPT) for text generation, and the second is GLIDE, a text-to-image model.


2-1. Generative Pre-trained Transformer (GPT)


The GPT model is built by stacking the decoder blocks of the Transformer model, so it performs well at predicting the words that follow a prompt (see Figure 5).


SSIFI provides five Korean text generation models based on GPT, including a chat bot, a reporter bot, and a novel bot.


Figure 5. Generative Pre-trained Transformer model sample
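Predicting the words after a prompt is an autoregressive loop: pick the next token, append it, and repeat. The sketch below shows that loop with greedy selection. `NEXT_TOKEN_SCORES` is a hypothetical toy table standing in for a trained model, and it only looks at the last token, whereas a real GPT conditions on the whole prefix.

```python
from typing import Dict, List

# Hypothetical next-token scores standing in for a trained decoder stack.
NEXT_TOKEN_SCORES: Dict[str, Dict[str, float]] = {
    "<bos>": {"hello": 0.9, "world": 0.1},
    "hello": {"world": 0.8, "<eos>": 0.2},
    "world": {"<eos>": 0.7, "hello": 0.3},
}


def greedy_generate(prompt: List[str], max_new_tokens: int = 10) -> List[str]:
    """Extend the prompt one token at a time, always taking the top-scoring token."""
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Unknown tokens fall back to ending the sequence.
        scores = NEXT_TOKEN_SCORES.get(tokens[-1], {"<eos>": 1.0})
        next_token = max(scores, key=scores.get)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


generated = greedy_generate(["<bos>"])
```

Sampling from the scores instead of always taking the maximum would give more varied text, which tends to matter for uses such as a novel bot.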



2-2. Text-to-Image model (GLIDE)


In addition to the text generation models, SSIFI provides GLIDE, an image generation model. GLIDE was published by OpenAI in 2021 and trained on a dataset of text-labeled images. SSIFI receives a Korean prompt and outputs a matching image.


Figure 6 shows an example prompt and output of GLIDE.



Figure 6. Example outputs of GLIDE model



3. Text-to-Speech (TTS)


SSIFI provides audio output as well as text by using a TTS model. As shown in Figure 7, the TTS model consists of an acoustic model and a vocoder, which are FastSpeech 2 and VocGAN, respectively.


The TTS model of SSIFI was trained on the Korean Single Speaker Speech (KSS) dataset from Kaggle (4.32 GB, 12 hours of audio).



Figure 7. Text-to-Speech model process
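The two-stage TTS flow can be sketched by tracking shapes: the acoustic model turns phonemes into mel-spectrogram frames, and the vocoder turns each frame into a block of audio samples. This is an illustrative sketch rather than the FastSpeech 2 or VocGAN implementation. `N_MELS`, `HOP_LENGTH`, the fixed duration of 3 frames, and the `fake_*` functions are assumptions. The only detail borrowed from FastSpeech 2 is the idea of a length regulator that repeats each phoneme's vector for its predicted duration.

```python
from typing import List, Sequence

N_MELS = 80       # assumed mel channels (a common choice, not stated in the post)
HOP_LENGTH = 256  # assumed audio samples produced per mel frame


def length_regulate(
    phoneme_states: Sequence[List[float]], durations: Sequence[int]
) -> List[List[float]]:
    """Repeat each phoneme's vector for its number of frames."""
    frames: List[List[float]] = []
    for state, duration in zip(phoneme_states, durations):
        frames.extend(list(state) for _ in range(duration))
    return frames


def fake_acoustic_model(phonemes: Sequence[str]) -> List[List[float]]:
    """Stand-in for the acoustic model: phonemes -> mel-like frames."""
    states = [[float(ord(p[0]) % 7)] * N_MELS for p in phonemes]
    durations = [3 for _ in phonemes]  # a real model predicts one duration per phoneme
    return length_regulate(states, durations)


def fake_vocoder(mel: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in for the vocoder: each mel frame becomes HOP_LENGTH silent samples."""
    return [0.0] * (len(mel) * HOP_LENGTH)


mel = fake_acoustic_model(["a", "n", "n", "y", "eo", "ng"])
waveform = fake_vocoder(mel)
```

Splitting synthesis this way lets the acoustic model and the vocoder be trained and replaced independently, as long as both agree on the mel-spectrogram format.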



 
Service Demo



SSIFI Conversational Mode


SSIFI Chat Mode
