pre: read the Python and NumPy tutorial
the problem: semantic gap (difference between the grid of numbers a computer sees and the concept a human perceives)
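sketch of the semantic gap (not from the lecture; the tiny random array below is a hypothetical stand-in for a real photo, and the +40 brightness offset is an arbitrary assumption): to the computer an image is only an array of integers, so a change a human barely notices, e.g. a uniformly brighter scene, can alter every number.

```python
import numpy as np

# Hypothetical 4x4 RGB "image": to the computer it is just integers in [0, 255].
# Values are drawn below 200 so the brightness offset never saturates at 255.
rng = np.random.default_rng(0)
image = rng.integers(0, 200, size=(4, 4, 3), dtype=np.uint8)

# Simulate a brighter scene: same content for a human, different numbers for the computer.
# Cast to a wider type first so the addition cannot overflow uint8.
brighter = np.clip(image.astype(np.int16) + 40, 0, 255).astype(np.uint8)

# Fraction of pixel values that differ between the two versions.
changed_fraction = float(np.mean(image != brighter))

print(image.shape, image.dtype)  # height x width x channels, 8-bit integers
print(changed_fraction)          # every value changed, although the "scene" is the same
```

a classifier that compares raw pixel values would treat these two arrays as different inputs, which is why the gap matters.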
challenges: viewpoint variation (camera moves) / illumination / deformation (poses and positions) / occlusion / background clutter (look similar with ba