Interests: Document AI, Multi-modal Tasks (CV+NLP), OCR, RL, ML
- Programming: Python
- Frameworks & Libraries: PyTorch, OpenCV, TensorFlow
- Deployment & Serving: Docker, Triton
- OCR: Scene Text Detection (STD), Scene Text Recognition (STR)
- NER: Named Entity Recognition
- TSR: Table Structure Recognition
- Detection: Strikethrough, Checkmark, Circled Number, Document Contour
- DLA: Document Layout Analysis
- Model deployment using Docker and Triton
- API development with FastAPI
- Synthetic Data Generation (Image Processing)
- Data Annotation Management (Label Studio, LabelMe)
- Document AI Pipeline Design & Implementation
- PoC (Proof of Concept) Support
Academic & Personal Projects
My Papers
- Deep Reinforcement Learning for Visual Dialogue Agents-2018.05, KIPS Conference
- Deep Reinforcement Learning for Optimizing Visual Questions-2018.09, Journal of ICROS
- Real-Time Visual Grounding for Natural Language Instructions with Deep Neural Network-2019.05, KIPS Conference
- LVLN: A Landmark-Based Deep Neural Network Model for Vision-and-Language Navigation-2019.09, Journal of KIPS (KTSDE)
- Landmark-based Search for Vision-and-Language Navigation-2019.12, KSC Conference
- AnoVid: A Deep Neural Network-Based Tool for Video Annotation-2020.08, Journal of KMMS
- Multimodal Joint Embedding and Backtracking Search for Vision-and-Language Navigation - Master's thesis
- Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation-2021.02, Journal of Sensors (SCIE)