Interpreting Vision Models
I performed research as part of the MIT Summer Research Program on interpretable machine learning, working on improving MIT's MILAN, a model that annotates neurons. The project focused on describing the behaviors of individual neurons: MILAN takes the 15 image regions that most strongly activate a neuron and generates a compositional description of the pattern shared across those regions. Although MILAN works well, it requires too much training data to be practical for other applications. My research addressed this problem with a new implementation that uses the Contrastive Language-Image Pre-training (CLIP) model to generate one-word annotations of neurons with no additional training data.

At the end of the summer program, I presented my work in a poster presentation and a lightning talk at the MIT summer research conference. I was invited to continue the project with MIT through the fall of 2022, working on multi-word compositional descriptions and on better descriptions of abstract concepts.
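For illustration, here is a minimal sketch of how a zero-shot, one-word neuron annotation could be produced with CLIP: embed a neuron's top-activating image regions, embed a small candidate vocabulary, and pick the word whose text embedding best matches the regions on average. The vocabulary, file paths, and averaging strategy below are illustrative assumptions, not the project's actual implementation.

```python
import torch
import clip  # OpenAI CLIP package: github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # example backbone

# Hypothetical candidate vocabulary of one-word concepts.
candidate_words = ["dog", "fur", "grass", "wheel", "text", "sky"]
text_tokens = clip.tokenize([f"a photo of {w}" for w in candidate_words]).to(device)

# Hypothetical paths to one neuron's top-activating image regions.
region_paths = [f"neuron_42/region_{i}.png" for i in range(15)]
images = torch.stack(
    [preprocess(Image.open(p).convert("RGB")) for p in region_paths]
).to(device)

with torch.no_grad():
    image_features = model.encode_image(images)      # (15, d)
    text_features = model.encode_text(text_tokens)   # (vocab, d)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Average each word's similarity across all 15 regions, then pick
    # the word that best describes the set as a whole.
    sims = image_features @ text_features.T           # (15, vocab)
    mean_sims = sims.mean(dim=0)                       # (vocab,)

best_word = candidate_words[mean_sims.argmax().item()]
print(f"Zero-shot one-word annotation: {best_word}")
```

Averaging similarities across regions is just one plausible way to aggregate; the actual project may score the region set differently or use a different candidate vocabulary.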