
Stanford CS25: V2 I Represent part-whole hierarchies in a neural network, Geoff Hinton

ACM SIGKDD India Chapter

November 16, 2025 · 28 slides

This is a lecture by Geoffrey Hinton introducing a hypothetical neural network system called "Glom" [04:16]. Glom aims to address how the brain represents "part-whole" hierarchies (e.g., a face composed of a nose, mouth, etc.) [00:28]. Its core idea involves "islands of agreement" between representations at different levels [04:52]: at higher levels, different parts belonging to the same object (e.g., a "face") converge to the same vector representation [05:05]. The model draws inspiration from contrastive learning [20:43] and Transformer-like attention mechanisms [26:16] to resolve ambiguities in visual perception and form unified representations.

You can watch the full video here: http://www.youtube.com/watch?v=CYaju6aCMoQ

Summary

This lecture by Geoffrey Hinton explores a novel approach to visual perception using neural networks, drawing inspiration from the brain's ability to represent part-whole hierarchies. Hinton proposes a system called "Glom" that combines three recent advances: transformers, contrastive self-supervised learning, and a hierarchical representation inspired by cellular biology. The core idea is to create a system that can parse visual scenes into structural descriptions with intrinsic coordinate frames, similar to how humans perceive objects. Glom aims to address limitations in existing contrastive learning methods by focusing on agreement within the part-whole hierarchy, rather than enforcing identical representations for different patches of the same image. This theoretical framework, while not yet fully implemented, offers a promising direction for developing more human-like vision systems.
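To make the contrastive objective mentioned above concrete, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is not the formulation used in the lecture; the function names, the temperature value, and the toy 2-D patch embeddings are all assumptions chosen for illustration. The loss is low when two patches from the same image have matching representations and high when they do not, which is exactly the pressure that becomes a problem when two patches of one image legitimately show different content.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(patches_a, patches_b, temperature=0.1):
    """Mean InfoNCE-style loss; patches_a[i] and patches_b[i] come from image i."""
    total = 0.0
    for i, anchor in enumerate(patches_a):
        # Similarity of this patch to every candidate patch, scaled by temperature
        logits = [cosine(anchor, other) / temperature for other in patches_b]
        # Numerically stable log-sum-exp over all candidates
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the patch from the same image
        total += log_norm - logits[i]
    return total / len(patches_a)

# Hypothetical toy embeddings: two images, one patch pair each
patches_a = [(1.0, 0.0), (0.0, 1.0)]
agreeing = [(1.0, 0.0), (0.0, 1.0)]      # same-image patches look alike
conflicting = [(0.0, 1.0), (1.0, 0.0)]   # same-image patches look different

low_loss = contrastive_loss(patches_a, agreeing)
high_loss = contrastive_loss(patches_a, conflicting)
print(low_loss, high_loss)
```

In this toy setup the second case is penalized heavily even though nothing is wrong with the image itself, which illustrates why agreement at the level of shared wholes may be a better target than agreement between raw patches.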

Key points

  1. Human perception relies on part-whole hierarchies and intrinsic coordinate frames, influencing how we interpret visual information.
  2. Transformers, originally for natural language processing, can be adapted for vision by using attention mechanisms based on scalar products of activity vectors.
  3. Contrastive self-supervised learning aims to extract spatially coherent features from images by making representations of different patches similar.
  4. Glom seeks to improve contrastive learning by focusing on agreement within the part-whole hierarchy, addressing the issue of differing content in image patches.
  5. Glom's architecture is inspired by cellular biology, where each location in an image is analogous to a cell with a complete set of instructions.
  6. The lecture emphasizes the distinction between layers of a neural network and levels of representation, proposing that each location has an embedding vector refined through layers.
  7. The proposed system aims to create islands of agreement to represent the parse tree, avoiding dynamic memory allocation and offering a biologically plausible approach.
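The attention and islands-of-agreement points above can be sketched with a toy example. This is a simplified illustration under stated assumptions, not Glom itself: it models a single level, uses no learned weights, and the names `attention_step`, `islands`, `beta`, and the six 2-D embeddings are hypothetical. Each location's embedding is replaced by an average of all embeddings, weighted by a softmax over scalar products, so similar vectors pull each other together.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def attention_step(embeddings, beta=10.0):
    """Replace each embedding by a softmax(beta * scalar product) weighted average."""
    updated = []
    for query in embeddings:
        logits = [beta * dot(query, other) for other in embeddings]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - peak) for l in logits]
        norm = sum(weights)
        updated.append(tuple(
            sum(w * other[d] for w, other in zip(weights, embeddings)) / norm
            for d in range(len(query))
        ))
    return updated

def islands(embeddings, threshold=0.999):
    """Group consecutive locations whose neighbouring embeddings nearly agree."""
    groups = [[0]]
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) > threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

# Hypothetical embeddings at six image locations: the first three roughly
# agree on one "whole", the last three roughly agree on another.
initial = [(1.0, 0.1), (0.9, 0.0), (1.0, -0.1),
           (0.1, 1.0), (0.0, 0.9), (-0.1, 1.0)]

state = initial
for _ in range(10):
    state = attention_step(state)

islands_before = islands(initial)
islands_after = islands(state)
print(islands_before)
print(islands_after)
```

Before the updates every location forms its own group under the strict threshold; afterwards the locations settle into two groups of near-identical vectors. No tree structure is allocated at any point: the grouping is read off from which vectors agree.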

Slide contents

Slide 1: Introduction

The lecture introduces a new type of vision system inspired by human perception, combining recent advances in neural network research. It acknowledges that the system is theoretical but presents a compelling argument for its potential effectiveness.

Slide 2: Part-Whole Hierarchies and Coordinate Frames

This section emphasizes the psychological reality of part-whole hierarchies and coordinate frames in human perception. It challenges the notion that coordinate frames are solely related to Cartesian geometry, presenting evidence for their broader role.

Slide 3: Cube Demonstration

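A demonstration involving a wireframe cube illustrates how people struggle to perceive its complete structure, highlighting the limitations of our intuitive understanding of spatial relationships. Asked to point out the remaining corners of a cube balanced on one corner, most people indicate only four, although a cube has eight corners and six remain once the bottom and top corners are accounted for.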

Slide 4: Cube Edges

Viewed along a body diagonal, six of the cube's twelve edges form a zigzag ring, a structure often overlooked. This section shows how the same arrangement of rods can be understood in different ways, influencing what aspects of the structure are noticed.
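The zigzag ring can be checked with a few lines of geometry. The sketch below assumes a unit cube with corners at 0/1 coordinates and takes the body diagonal from (0, 0, 0) to (1, 1, 1) as the vertical axis; the helper `trace_cycle` is a hypothetical name. A corner's height along that diagonal is proportional to x + y + z, so the six edges that touch neither end corner should alternate between two heights.

```python
from itertools import product

# Corners of a unit cube and the 12 edges joining corners that differ in one coordinate
vertices = list(product((0, 1), repeat=3))
edges = [(u, v) for i, u in enumerate(vertices) for v in vertices[i + 1:]
         if sum(a != b for a, b in zip(u, v)) == 1]

bottom, top = (0, 0, 0), (1, 1, 1)

# Group corners by height along the body diagonal (proportional to x + y + z)
levels = {}
for v in vertices:
    levels.setdefault(sum(v), []).append(v)

# Edges that touch neither the bottom nor the top corner
ring = [e for e in edges if bottom not in e and top not in e]

def trace_cycle(ring_edges):
    """Walk the ring edge by edge, returning the corners in visiting order."""
    neighbours = {}
    for u, v in ring_edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    start = ring_edges[0][0]
    path, previous, current = [start], None, start
    while True:
        step = [n for n in neighbours[current] if n != previous][0]
        if step == start:
            return path
        path.append(step)
        previous, current = current, step

cycle = trace_cycle(ring)
heights = [sum(v) for v in cycle]
print(len(edges), len(ring), heights)
```

The walk visits all six intermediate corners and returns to its start, and the heights alternate between the two middle levels, which is the up-and-down pattern that gives the ring its zigzag shape. The remaining six edges are the three attached to each end corner.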

Slide 5: Parallel Lines

The perception of parallel lines is influenced by the coordinate frame used. Lines that are parallel but don't align with the rectangular coordinate frame may not be recognized as such.

Slide 6: Structural Descriptions

The same image can be parsed differently, leading to different structural descriptions. These descriptions can be represented as graphs that capture the relationships between parts.
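One way to picture such graphs is as a list of labeled edges. The sketch below is an illustration only: it assumes the object is the cube from the earlier slides, and the part names ("top face", "bottom tripod", "zigzag ring", "hexahedron") and relation labels ("has-part", "has-rod") are hypothetical. It builds two structural descriptions of the same 12 rods, one in an axis-aligned frame and one in a frame aligned with the body diagonal, and omits the spatial relations between parts that a fuller description would carry.

```python
from itertools import product

# The 12 rods (edges) of a unit cube, each a pair of corner coordinates
corners = list(product((0, 1), repeat=3))
rods = [(u, v) for i, u in enumerate(corners) for v in corners[i + 1:]
        if sum(a != b for a, b in zip(u, v)) == 1]

def describe(whole, part_of):
    """Build a structural description as (node, relation, node) triples."""
    triples = []
    for rod in rods:
        part = part_of(rod)
        if (whole, "has-part", part) not in triples:
            triples.append((whole, "has-part", part))
        triples.append((part, "has-rod", rod))
    return triples

def face_parse(rod):
    """Axis-aligned frame: group rods into two faces and the uprights between them."""
    (_, _, z0), (_, _, z1) = rod
    if z0 == z1:
        return "top face" if z0 == 1 else "bottom face"
    return "vertical edges"

def diagonal_parse(rod):
    """Body-diagonal frame: group rods by which end corner they touch, if any."""
    if (0, 0, 0) in rod:
        return "bottom tripod"
    if (1, 1, 1) in rod:
        return "top tripod"
    return "zigzag ring"

def leaves(triples):
    """The set of rods a description bottoms out in."""
    return {part for _, relation, part in triples if relation == "has-rod"}

def part_sizes(triples):
    """Number of rods assigned to each intermediate part."""
    sizes = {}
    for whole, relation, _ in triples:
        if relation == "has-rod":
            sizes[whole] = sizes.get(whole, 0) + 1
    return sizes

cube_view = describe("cube", face_parse)
diagonal_view = describe("hexahedron", diagonal_parse)
print(part_sizes(cube_view))
print(part_sizes(diagonal_view))
```

Both graphs bottom out in the same rods, but they carve them into different intermediate parts (4 + 4 + 4 versus 3 + 6 + 3), so each makes different regularities easy to notice.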
