Benchmarking shape completion methods for robotic grasping
J. Balão, Atabak Dehban, Plinio Moreno, J. Santos-Victor
2022 IEEE International Conference on Development and Learning (ICDL). DOI: https://doi.org/10.1109/ICDL53763.2022.9962226
Abstract: This paper proposes a novel benchmark for 3D shape completion methods based on their adaptability to the task of robotic grasping. First, state-of-the-art single-image shape completion methods are used to reconstruct object shapes from RGB images containing views of objects from different categories; two specific shape-reconstruction methods are selected for this study. Next, the resulting 3D reconstructions are loaded into a robotic grasp simulator, which attempts to grasp the objects from different approach directions and with different hand configurations. Unsuccessful grasps (according to a grasp quality metric) are then excluded, and the remaining ones are used to compute a grasp-related metric, the Joint Error, which evaluates the usability of the reconstructed mesh for grasping the ground-truth 3D model. Finally, based on the results of our experiments, we draw several conclusions about the performance of each method and analyze the possible correlation between the newly proposed Joint Error metric and the reconstruction quality metrics used by most shape completion methods. Our results indicate that geometry-based reconstruction metrics are largely inadequate for assessing the usability of a 3D reconstruction algorithm for robotic grasping.
{"title":"Toddler-inspired embodied vision for learning object representations","authors":"A. Aubret, Céline Teulière, J. Triesch","doi":"10.1109/ICDL53763.2022.9962190","DOIUrl":"https://doi.org/10.1109/ICDL53763.2022.9962190","url":null,"abstract":"Recent time-contrastive learning approaches manage to learn invariant object representations without supervision. This is achieved by mapping successive views of an object onto close-by internal representations. When considering this learning approach as a model of the development of human object recognition, it is important to consider what visual input a toddler would typically observe while interacting with objects. First, human vision is highly foveated, with high resolution only available in the central region of the field of view. Second, objects may be seen against a blurry background due to toddlers’ limited depth of field. Third, during object manipulation a toddler mostly observes close objects filling a large part of the field of view due to their rather short arms. Here, we study how these effects impact the quality of visual representations learnt through time-contrastive learning. To this end, we let a visually embodied agent “play” with objects in different locations of a near photo-realistic flat. During each play session the agent views an object in multiple orientations before turning its body to view another object. The resulting sequence of views feeds a time-contrastive learning algorithm. Our results show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments. We argue that this effect is caused by the reduction of features extracted in the background, a neural network bias for large features in the image and a greater similarity between novel and familiar background regions. The results of our model suggest that several influences on toddler’s visual input statistics support their unsupervised learning of object representations.","PeriodicalId":274171,"journal":{"name":"2022 IEEE International Conference on Development and Learning (ICDL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130477540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Sensorimotor Abstraction on Curricula for Learning Mobile Manipulation Skills
Oscar Youngquist, Alenna Spiro, Khoshrav Doctor, R. Grupen
2022 IEEE International Conference on Development and Learning (ICDL). DOI: https://doi.org/10.1109/ICDL53763.2022.9962221
Abstract: Developmental mechanisms in newborn animals shepherd the infant through interactions with the world that form the foundation for hierarchical skills. An important part of this guidance resides in mechanisms of growth and maturation, wherein patterns of sensory and motor recruitment constrain learning complexity while building foundational expertise and transferable control knowledge. The resulting control policies represent a sensorimotor state abstraction that can be leveraged when developing new behaviors. This paper uses a computational model of developmental learning with parameters that control the recruitment of sensory and motor resources, and evaluates how this recruitment influences sample efficiency and fitness for a specific mobile manipulation task. We find that a developmental curriculum driven by sensorimotor abstraction drastically improves learning performance and sample efficiency over non-developmental approaches, by up to an order of magnitude. Additionally, we find that the developmental policies and state abstractions offer significant robustness, enabling skill transfer to novel domains without additional training.
Towards Third-Person Visual Imitation Learning Using Generative Adversarial Networks
Luca Garello, F. Rea, Nicoletta Noceti, A. Sciutti
2022 IEEE International Conference on Development and Learning (ICDL). DOI: https://doi.org/10.1109/ICDL53763.2022.9962214
Abstract: Imitation learning plays a key role during our development, since it allows us to learn from more expert agents. This cognitive ability implies remapping observed actions into our own perspective. In robotics, however, the perspective mismatch between demonstrator and imitator is usually neglected, under the assumption that the imitator has access to the demonstrator's explicit joint configuration or that both share the same perspective of the environment. Focusing on this perspective-translation problem, we propose a generative approach that shifts the perspective of actions from third person to first person using RGB videos. In addition to the first-person view of the action, our model generates an embedded representation of it. This numerical description is learnt autonomously, following a time-consistent pattern and without the need for human supervision. In the experimental evaluation, we show that these two pieces of information can be exploited to infer robot control during the imitation phase. Additionally, after training on synthetic data, we tested our model in a real scenario.
Self-touch and other spontaneous behavior patterns in early infancy
Jason Khoury, S. T. Popescu, Filipe Gama, Valentin Marcel, M. Hoffmann
2022 IEEE International Conference on Development and Learning (ICDL). DOI: https://doi.org/10.1109/ICDL53763.2022.9962203
Abstract: Children are not born tabula rasa. However, interacting with the environment through body movements in the first months after birth is critical to building the models or representations that are the foundation for everything that follows. We present longitudinal data on the spontaneous behavior of three infants observed between about 8 and 25 weeks of age in the supine position. We combined manual scoring of video recordings with automatic extraction of motion data in order to study infants' behavioral patterns and developmental progression, including: (i) the spatial distribution of self-touches on the body, (ii) spatial patterns and regularities of hand movements, (iii) midline crossing, (iv) preferential use of one arm, and (v) dynamic movement patterns indicative of goal-directedness. From the patterns observed in this pilot data set, we can speculate on the development of first body and peripersonal space representations. Several methods for extracting 3D kinematics from videos have recently been made available by the computer vision community. We applied one of these methods to infant videos and provide guidelines on its possibilities and limitations, as a methodological contribution to automating the analysis of infant videos. In the future, we plan to use the patterns extracted from the recordings as inputs to embodied computational models of the learning of body representations in infancy.
{"title":"Identifying and localizing dynamic affordances to improve interactions with other agents","authors":"S. L. Gay, Jean-Paul Jamont, Olivier L. Georgeon","doi":"10.1109/ICDL53763.2022.9962231","DOIUrl":"https://doi.org/10.1109/ICDL53763.2022.9962231","url":null,"abstract":"Allowing robots to learn by themselves to coordinate their actions and cooperate requires that they be able to recognize each other and be capable of intersubjectivity. To comply with artificial developmental learning and self motivation, we follow the radical interactionism hypothesis, in which an agent has no a priori knowledge on its environment (not even that the environment is 2D), and does not receive rewards defined as a direct function of the environment’s state. We aim at designing agents that learn to efficiently interact with other entities that may be static or may make irregular moves following their own motivation. This paper presents new mechanisms to identify and localize such mobile entities. The agent has to learn the relation between its perception of mobile entities and the interactions that they afford. These relations are recorded under the form of data structures, called signatures of interaction, that characterize entities in the agent’s point of view, and whose properties are exploited to interact with distant entities. These mechanisms were tested in a simulated prey-predator environment. The obtained signatures showed that the predator successfully learned to identify mobile preys and their probabilistic moves, and to efficiently localize distant preys in the 2D environment.","PeriodicalId":274171,"journal":{"name":"2022 IEEE International Conference on Development and Learning (ICDL)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116176985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A kinematic study on social intention during a human-robot interaction","authors":"Jean-Marc Bah, Ghilès Mostafaoui, Laura Cohen","doi":"10.1109/ICDL53763.2022.9962213","DOIUrl":"https://doi.org/10.1109/ICDL53763.2022.9962213","url":null,"abstract":"In this study, we investigate the possible effects on the human movement kinematics of the presence of a humanoid robot during an interaction. We conducted an experiment in which 11 participants were required to grab a cube and to drop it in the hands of a robot, a human and in a rectangular box. Using this setup, we explore whether the kinematics of the participants’ gestures would be particularly influenced by the presence of the robot and whether this influence would be due to the fact that the robot is considered as a possible social partner. The results show that the condition that includes the robot partner leads to kinematic modulation that are similar to the condition including the human partner. Furthermore, there are significant differences between the condition including the robot and the one with the box. Finally, our results show that the participants pro-social behavior is correlated with the perceived agency of the robot as evaluated by a user questionnaire.","PeriodicalId":274171,"journal":{"name":"2022 IEEE International Conference on Development and Learning (ICDL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130829591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Preference Learning System for the Autonomous Selection and Personalization of Entertainment Activities during Human-Robot Interaction
Marcos Maroto-Gómez, Sara Marques-Villarroya, M. Malfaz, Álvaro Castro González, J. C. Castillo, M. Salichs
2022 IEEE International Conference on Development and Learning (ICDL). DOI: https://doi.org/10.1109/ICDL53763.2022.9962204
Abstract: Social robots assisting in cognitive stimulation therapies, physical rehabilitation, or entertainment sessions have gained visibility in recent years. In these activities, users may present different features and needs, so personalization is essential. This manuscript presents a preference learning system that allows social robots to personalize human-robot interaction during entertainment activities. Our system is integrated into Mini, a social robot dedicated to research with a wide repertoire of entertainment activities such as games, multimedia content, and storytelling. The learning model we propose consists of four stages. First, the robot creates a unique profile of each user by obtaining their defining features through interaction. Second, a preference learning algorithm predicts the user's favorite entertainment activities from these features and a database containing the features and preferences of other users. Third, the prediction is adapted using reinforcement learning as entertainment sessions take place. Finally, the robot personalizes the interaction by autonomously selecting the user's favorite activities, aiming to promote longer-lasting interactions and sustained engagement.
Active Gaze Control for Foveal Scene Exploration
Alexandre Dias, Luís Simões, Plinio Moreno, A. Bernardino
2022 IEEE International Conference on Development and Learning (ICDL). DOI: https://doi.org/10.1109/ICDL53763.2022.9962223
Abstract: Active perception and foveal vision are foundations of the human visual system. While foveal vision reduces the amount of information to process during a gaze fixation, active perception shifts the gaze direction to the most promising parts of the visual field. We propose a methodology to emulate how humans, and robots with foveal cameras, would explore a scene, identifying the objects present in their surroundings with the smallest number of gaze shifts. Our approach is based on three key methods. First, we take an off-the-shelf deep object detector, pre-trained on a large dataset of regular images, and calibrate its classification outputs to the case of foveated images. Second, a body-centered semantic map, encoding the object classifications and corresponding uncertainties, is sequentially updated with the calibrated detections, considering several data fusion techniques. Third, the next best gaze fixation point is determined based on information-theoretic metrics that aim to minimize the overall expected uncertainty of the semantic map. Compared to random selection of the next gaze shift, the proposed method increases the detection F1-score by 2-3 percentage points for the same number of gaze shifts and reduces to one third the number of gaze shifts required to attain similar performance.
Action Recognition based on Cross-Situational Action-object Statistics
Satoshi Tsutsui, Xizi Wang, Guangyuan Weng, Yayun Zhang, David J. Crandall, Chen Yu
2022 IEEE International Conference on Development and Learning (ICDL). DOI: https://doi.org/10.1109/ICDL53763.2022.9962199
Abstract: Machine learning models for visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond the trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets typically used to train action classifiers in the computer vision literature, our work provides useful insights on how to best construct datasets for efficient training and better generalization.