Abstract:
Bi-directional image-text retrieval and matching have attracted much attention recently. This cross-domain task demands a fine understanding of both modalities in order to learn a measure across data of different modalities. In this paper, we propose a novel position focused attention network to investigate the relation between the visual and the textual views. This work integrates prior object position information to enhance visual-text joint-embedding learning. The image is first split into blocks, which are treated as the basic position cells, and the position of an image region is inferred. Then, we propose a position attention to model the relations between the image region and the position cells. Finally, we generate a position feature that further enhances the region representation and models a more reliable relationship between the image and the sentence. Experiments on the popular datasets Flickr30K and MS-COCO show the effectiveness of the proposed method. Besides the public datasets, we also conduct experiments on our own large-scale news dataset (Tencent-News) to validate the practical value of the proposed method. As far as we know, this is the first attempt to test performance in a practical application. Our method achieves competitive performance on all three datasets.
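The block-splitting and position-attention steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `block_overlaps` and `position_feature`, the grid size, and the use of normalized overlap areas as attention weights are all assumptions made for illustration, standing in for whatever learned attention the paper actually uses.

```python
import numpy as np


def block_overlaps(box, image_size, grid=4):
    """Fraction of each grid block (position cell) covered by a region box.

    box: (x1, y1, x2, y2) in pixels; image_size: (width, height).
    Returns a flat array of length grid * grid in row-major order.
    Hypothetical helper, not taken from the paper.
    """
    width, height = image_size
    bw, bh = width / grid, height / grid  # block width and height
    x1, y1, x2, y2 = box
    overlaps = np.zeros((grid, grid))
    for r in range(grid):
        for c in range(grid):
            bx1, by1 = c * bw, r * bh
            # Intersection of the region box with this block, clipped at zero.
            ix = max(0.0, min(x2, bx1 + bw) - max(x1, bx1))
            iy = max(0.0, min(y2, by1 + bh) - max(y1, by1))
            overlaps[r, c] = ix * iy / (bw * bh)
    return overlaps.ravel()


def position_feature(box, image_size, cell_embeddings, grid=4):
    """Weighted sum of position-cell embeddings for one image region.

    cell_embeddings: array of shape (grid * grid, dim), one row per cell.
    Assumption: attention weights are the overlap areas normalized to sum
    to one; a learned attention would replace this step.
    """
    overlaps = block_overlaps(box, image_size, grid)
    total = overlaps.sum()
    if total == 0:
        # Region lies outside the image: return a zero feature.
        return np.zeros(cell_embeddings.shape[1])
    weights = overlaps / total
    return weights @ cell_embeddings


# Usage: a region covering the top half of a 100x100 image on a 2x2 grid
# attends equally to the two top cells.
cells = np.eye(4)  # placeholder embeddings, one per cell
feature = position_feature((0, 0, 100, 50), (100, 100), cells, grid=2)
```

In a full model, the resulting position feature would be combined with the region's visual feature before matching against the sentence; how that combination is done is not specified in the abstract and is left out here.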
Published in: IEEE Transactions on Multimedia (Volume: 23)
School of Software Engineering, Xi'an Jiaotong University, Xi'an, China
Department of PCG, Tencent, Shenzhen, China
Yaxiong Wang received the B.S. degree from Lanzhou University, Lanzhou, China, in 2015. He is currently working toward the Ph.D. degree with the School of Software Engineering, Xi'an Jiaotong University, Xi'an, China. He is currently a Postgraduate with SMILES Laboratory, Xi'an Jiaotong University. His current research interests include tag-based image retrieval and image-text matching.

Department of PCG, Tencent, Shenzhen, China
Hao Yang received the B.E. degree from Wuhan University, Wuhan, China, in 2012 and the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2018. He is currently working with the Department of PCG, Tencent, Beijing, China. His current research interests include cross media retrieval and video recommendation.

School of Software Engineering, Xi'an Jiaotong University, Xi'an, China
Xiuxiu Bai received the B.S. and Ph.D. degrees from Xi'an Jiaotong University, Xi'an, China, in 2009 and 2016, respectively. She visited Edinburgh University from 2017 to 2018. She is currently an Assistant Professor with the School of Software Engineering, Xi'an Jiaotong University. Her research interests include computer vision and visual neuroscience.

Ministry of Education Key Laboratory for Intelligent Networks and Network Security, School of Information and Communication Engineering, and SMILES LAB, Xi'an Jiaotong University, Xi'an, China
Xueming Qian (Member, IEEE) received the B.S. and M.S. degrees from the Xi'an University of Technology, Xi'an, China, in 1999 and 2004, respectively, and the Ph.D. degree from the School of Electronics and Information Engineering, Xi'an Jiaotong University, in 2008. He was a Visiting Scholar with Microsoft Research Asia from 2010 to 2011. He was an Assistant Professor with Xi'an Jiaotong University, where he was an Associate Professor from 2011 to 2014, and is currently a Full Professor. He is also the Director of the SMILES Laboratory, Xi'an Jiaotong University. He received the Microsoft Fellowship in 2006. He received outstanding doctoral dissertation awards from Xi'an Jiaotong University and Shaanxi Province in 2010 and 2011, respectively. His research interests include social media big data mining and search. His research is supported by the National Natural Science Foundation of China, Microsoft Research, and the Ministry of Science and Technology.

Meituan-Dianping Group, Beijing, China
Lin Ma received the B.E. and M.E. degrees in computer science from the Harbin Institute of Technology, Harbin, China, in 2006 and 2008, respectively, and the Ph.D. degree from the Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, in 2013. He was a Researcher with the Huawei Noah's Ark Laboratory, Hong Kong, from 2013 to 2016. He was a Principal Researcher with the Tencent AI Laboratory, Shenzhen, China, from 2016 to 2020. He is currently a Principal Researcher with the Meituan-Dianping Group, Beijing, China. His current research interests lie in the areas of computer vision, multimodal deep learning, specifically for image and language, image/video understanding, and quality assessment. He received the Best Paper Award from the Pacific-Rim Conference on Multimedia in 2008. He was the recipient of the Microsoft Research Asia Fellowship in 2011. He was a finalist for the HKIS Young Scientist Award in engineering science in 2012.

Department of PCG, Tencent, China
Jing Lu received the Ph.D. degree from the University of Science and Technology Beijing, Beijing, China, in 2012, and served as a Postdoctoral Researcher with Peking University in 2016. He is a Senior Researcher with News Algorithm Center, Department of PCG, Tencent, Beijing, China. His current research focuses on image-text matching, natural language processing, and recommender systems.

Department of PCG, Tencent, China
Biao Li received the B.S. and M.S. degrees from the Beijing University of Posts and Telecommunications, Beijing, China, in 2008 and 2011, respectively. He is currently a Senior Researcher with News Algorithm Center, Department of PCG, Tencent, Beijing, China. His main research interests include cross-media retrieval and news recommendation and retrieval.

Department of PCG, Tencent, China
Xin Fan received the B.S., M.S., and Ph.D. degrees from the University of Science and Technology of China, Hefei, China, in 2001, 2004, and 2007, respectively. He was a Scientist with Yahoo Labs Beijing and a Senior Architect with the Core Search Department, Baidu. He is currently the Director with the Algorithm Center of News Product and Technology Department, Tencent, Beijing, China. His current research interests include machine learning, NLP and data mining in search and recommendation.