Sensors Webinar | Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing

Part of the Sensors Webinar Series
28 Apr 2022, 09:00 (CEST)

Deep learning, neural networks, image/video understanding, image/video representation, 3D point clouds, image/video enhancement

Welcome from the Chair

2nd Sensors Webinar

Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing

Dear all,

Thank you for participating in this webinar promoting our Special Issue “Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing” published in the MDPI journal Sensors.

Deep learning techniques are capable of discovering knowledge from massive amounts of unstructured data and providing data-driven solutions. They have driven significant technical advances in many areas of research, such as audio-visual signal processing, communication, computer vision, and pattern recognition. In addition, as they continue to improve, deep learning techniques are expected to be incorporated into future sensors and visual systems.

Recently, with the rapid development of advanced deep learning models and the increasing demand for effective visual signal processing, new opportunities and cutting-edge research have emerged in deep-learning-based sensing, imaging, and video processing.

Today, I am happy to introduce two experts: Prof. Hanli Wang from Tongji University, whose major research areas are machine learning, intelligent visual analysis, and compression, and Dr. Junhui Hou from City University of Hong Kong, whose major research areas are deep learning and 3D visual signal processing, as well as the emerging hot topics of light field imaging and point clouds. Prof. Wang and Dr. Hou will present the two talks in this webinar.

Date: 28 April 2022

Time: 9:00 am CEST | 3:00 am EDT | 3:00 pm CST Asia

Webinar ID: 883 0728 9350

Webinar Secretariat: sensors.webinar@mdpi.com

Chair

Prof. Dr. Yun Zhang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China

Bio
Yun Zhang received his Ph.D. in computer science from the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing, China, in 2010. From 2009 to 2014, he was a Visiting Scholar with the City University of Hong Kong, Hong Kong. From 2010 to 2017, he was an Assistant Professor and then an Associate Professor with the Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences, Shenzhen, China, where he is currently a Full Professor. His research interests are in multimedia communications and visual signal processing, including image/video compression, computational visual perception, and machine learning. He has published one book and over 100 high-quality scientific research papers, more than 40 of them in top IEEE/ACM Transactions. In addition, he has filed over 40 domestic and international patents on visual signal processing. He is a Senior Member of the IEEE and serves as an Associate Editor/Editorial Board Member of Electronics Letters, Sensors, and IEEE Access.

Invited Speakers

Prof. Dr. Hanli Wang, Department of Computer Science and Technology, College of Electronics and Information Engineering, Tongji University, China

Talk
Translating an image or a video automatically into natural language is an interesting, promising, but challenging process. The task is to summarize the visual content of the image or video and to re-express it using the correct words, suitable grammar and sentence patterns, and natural human phrasing. Nowadays, the encoding–decoding pipeline is the most commonly used framework for this goal. In particular, convolutional neural networks are used as the encoder to extract the semantics of images or videos, while recurrent neural networks are employed as the decoder to generate word sequences. In this talk, the literature on image and video description is first reviewed, and preliminary research advances, including visual captioning, visual storytelling, visual dense captioning, visual sentiment captioning, and more complex visual paragraph description, are then introduced.
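To make the encoding–decoding pipeline described in the talk concrete, the following is a minimal sketch and not the speaker's method. It assumes a single convolutional layer with global average pooling as the encoder and a simple Elman recurrent network with greedy word selection as the decoder. The vocabulary, layer sizes, and all function names are hypothetical, and the weights are random and untrained, so the generated words are meaningless; only the data flow from image to feature vector to word sequence is illustrated.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

VOCAB = ["<start>", "<end>", "a", "dog", "runs"]  # hypothetical toy vocabulary
HIDDEN = 4   # size of the image feature vector and of the RNN hidden state
MAX_LEN = 6  # hard cap on caption length


def rand_matrix(rows, cols):
    """Small random weights; a real system would learn these from data."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with 'valid' padding (square kernel)."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(out_w)] for i in range(out_h)]


def encode(image, kernels):
    """Encoder: one conv layer, ReLU, then global average pooling per filter."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        activations = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(activations) / len(activations))
    return features


def matvec(matrix, vec):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def decode(features, w_xh, w_hh, w_hy):
    """Decoder: Elman RNN whose hidden state is initialised with the features."""
    hidden = features[:]
    token = VOCAB.index("<start>")
    caption = []
    for _ in range(MAX_LEN):
        one_hot = [1.0 if i == token else 0.0 for i in range(len(VOCAB))]
        pre = [a + b for a, b in zip(matvec(w_xh, one_hot), matvec(w_hh, hidden))]
        hidden = [math.tanh(v) for v in pre]
        scores = matvec(w_hy, hidden)
        token = scores.index(max(scores))  # greedy choice of the next word
        if VOCAB[token] == "<end>":
            break
        caption.append(VOCAB[token])
    return caption


kernels = [rand_matrix(3, 3) for _ in range(HIDDEN)]
w_xh = rand_matrix(HIDDEN, len(VOCAB))   # input word -> hidden
w_hh = rand_matrix(HIDDEN, HIDDEN)       # hidden -> hidden
w_hy = rand_matrix(len(VOCAB), HIDDEN)   # hidden -> word scores

# Stand-in 8x8 grayscale image with random pixel values.
image = [[random.random() for _ in range(8)] for _ in range(8)]
features = encode(image, kernels)
caption = decode(features, w_xh, w_hh, w_hy)
print(features, caption)
```

In a trained system, the encoder and decoder weights would be learned jointly from image and caption pairs, and greedy selection is often replaced by beam search.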
Bio
Hanli Wang received his B.S. and M.S. degrees in electrical engineering from Zhejiang University, Hangzhou, China, in 2001 and 2004, respectively, and his Ph.D. degree in computer science from the City University of Hong Kong, Kowloon, Hong Kong, in 2007. From 2007 to 2008, he was a Research Fellow with the Department of Computer Science, City University of Hong Kong. From 2007 to 2008, he was also a Visiting Scholar with Stanford University, Palo Alto, CA. From 2008 to 2009, he was a Research Engineer with Precoad, Inc., Menlo Park, CA. From 2009 to 2010, he was an Alexander von Humboldt Research Fellow at the University of Hagen, Hagen, Germany. Since 2010, he has been a Full Professor with the Department of Computer Science and Technology, Tongji University, Shanghai, China. His research interests include multimedia signal processing, computer vision, and machine learning. He has published more than 160 research papers in these fields. His personal website can be accessed at https://mic.tongji.edu.cn.

Dr. Junhui Hou, Department of Computer Science, City University of Hong Kong, Hong Kong, China

Talk
Three-dimensional point clouds are widely used in immersive telepresence, cultural heritage reconstruction, geophysical information systems, autonomous driving, and virtual/augmented reality. Despite the rapid developments in 3D sensing technology, it is still time-consuming, difficult, and costly to acquire 3D point cloud data with high spatial and temporal resolution and complex geometry and topology. In this talk, I will present recent studies on 3D point cloud reconstruction based on computational methods (i.e., deep learning), including sparse 3D point cloud upsampling, the temporal interpolation of dynamic 3D point cloud sequences, and adversarial 3D point cloud generation.
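As a rough illustration of what two of these reconstruction tasks ask for, the sketch below uses simple non-learned geometric baselines rather than the deep learning methods presented in the talk. Upsampling is approximated by inserting the midpoint between each point and its nearest neighbour, and temporal interpolation by a linear blend between two frames, under the strong assumption that point i in one frame corresponds to point i in the other. All function names are hypothetical. Learned methods replace these hand-written rules with networks trained on data; adversarial generation is not shown.

```python
import math


def nearest_neighbor(points, index):
    """Index of the point closest to points[index] (brute-force search)."""
    best, best_dist = None, math.inf
    for j, q in enumerate(points):
        if j == index:
            continue
        d = math.dist(points[index], q)
        if d < best_dist:
            best, best_dist = j, d
    return best


def upsample_midpoints(points):
    """Double the point count: for each point, add the midpoint to its nearest neighbour.

    Mutually nearest points produce the same midpoint twice, so the result
    can contain duplicates.
    """
    new_points = []
    for i, p in enumerate(points):
        q = points[nearest_neighbor(points, i)]
        new_points.append(tuple((a + b) / 2 for a, b in zip(p, q)))
    return list(points) + new_points


def interpolate_frames(frame_a, frame_b, t):
    """Linear blend of two frames; assumes point i in A matches point i in B."""
    return [tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(frame_a, frame_b)]


# Toy sparse cloud and a second frame shifted by one unit along x.
sparse = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0)]
dense = upsample_midpoints(sparse)
moved = [(x + 1.0, y, z) for x, y, z in sparse]
halfway = interpolate_frames(sparse, moved, 0.5)
print(len(dense), halfway[0])
```

Real dynamic point cloud sequences generally lack the point-to-point correspondence assumed here, which is one reason temporal interpolation is treated as a learning problem.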
Bio
Junhui Hou (Senior Member, IEEE) has been an Assistant Professor with the Department of Computer Science, City University of Hong Kong, since January 2017. He received his B.Eng. degree in information engineering (Talented Students Program) from the South China University of Technology, Guangzhou, China, in 2009, his M.Eng. degree in signal and information processing from Northwestern Polytechnical University, Xi'an, China, in 2012, and his Ph.D. degree in electrical and electronic engineering from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2016. His research interests fall into the general areas of multimedia signal processing, such as image/video/3D geometry data representation, processing and analysis, semi-/unsupervised data modeling, and data compression. He received the Chinese Government Award for Outstanding Students Study Abroad from the China Scholarship Council in 2015 and the Early Career Award (3/381) from the Hong Kong Research Grants Council in 2018. He is an elected member of the MSA-TC and VSPC-TC of IEEE CAS and of the MMSP-TC of IEEE SPS. He is currently an Associate Editor for IEEE Transactions on Image Processing, IEEE Transactions on Circuits and Systems for Video Technology, Signal Processing: Image Communication, and The Visual Computer. He has also served as a Guest Editor for the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing and as an Area Chair of ACM MM’19/20/21/22, IEEE ICME’20, VCIP’20/21, and WACV’21.

Program

Speaker/Presentation | Time in CEST
Prof. Dr. Yun Zhang, Chair Introduction | 9:00 - 9:05 am
Prof. Dr. Hanli Wang, Visual Translation: From Image and Video to Language | 9:05 - 9:50 am
Q&A | 9:50 - 10:00 am
Dr. Junhui Hou, Deep Learning-based 3D Point Cloud Reconstruction | 10:00 - 10:45 am
Q&A | 10:45 - 10:55 am
Prof. Dr. Yun Zhang, Closing of Webinar | 10:55 - 11:00 am

Relevant SI

Topical Collection "Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing"
A topical collection in Sensors (ISSN 1424-8220). This collection belongs to the section "Sensing and Imaging".
Collection Editors: Prof. Dr. Yun Zhang, Prof. Dr. KWONG Tak Wu Sam, Prof. Dr. Xu Long & Prof. Dr. Tiesong Zhao
