Object detection algorithms usually sample a large number of regions in the input image, determine whether these regions contain objects of interest, and adjust the region boundaries so as to predict the ground-truth bounding boxes of the objects more accurately. Different models may adopt different region sampling schemes. Here we introduce one such method: it generates multiple bounding boxes with varying scales and aspect ratios centered on each pixel. These bounding boxes are called anchor boxes. We will design an object detection model based on anchor boxes in Section 14.7.
First, let's modify the printing precision to get more concise outputs.
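A minimal setup sketch, assuming PyTorch is the framework used for the code in this section; the precision value 2 is one reasonable choice for keeping the large anchor-box tensors below readable:

```python
import torch

# Limit printed floating-point precision to two decimal places
# so large anchor-box tensors stay readable.
torch.set_printoptions(2)

print(torch.tensor([0.1234567]))  # tensor([0.12])
```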
14.4.1. Generating Multiple Anchor Boxes
Suppose that the input image has height h and width w. We generate anchor boxes with different shapes centered on each pixel of the image. Let the scale be s ∈ (0, 1] and the aspect ratio (ratio of width to height) be r > 0. Then the width and height of the anchor box are hs√r and hs/√r, respectively. Note that when the center position is given, an anchor box with known width and height is determined.
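As a quick check of these formulas, here is a framework-free sketch with illustrative values s = 0.75, r = 2, and image height h = 400 (all three are assumptions for the example, not values from the text):

```python
import math

h = 400           # image height (illustrative)
s, r = 0.75, 2.0  # scale and aspect ratio (illustrative)

width = h * s * math.sqrt(r)
height = h * s / math.sqrt(r)

# The width-to-height ratio equals r, and the box area equals (h*s)^2
assert math.isclose(width / height, r)
assert math.isclose(width * height, (h * s) ** 2)
```

So r controls the box's shape while s controls its area, independently of each other.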
To generate multiple anchor boxes with different shapes, let's set a series of scales s1, …, sn and a series of aspect ratios r1, …, rm. When using all combinations of these scales and aspect ratios with each pixel as the center, the input image will have a total of whnm anchor boxes. Although these anchor boxes may cover all the ground-truth bounding boxes, the computational complexity is easily too high. In practice, we only consider those combinations containing s1 or r1:

(s1, r1), (s1, r2), …, (s1, rm), (s2, r1), (s3, r1), …, (sn, r1).
That is, the number of anchor boxes centered on the same pixel is n + m − 1. For the entire input image, we will generate a total of wh(n + m − 1) anchor boxes.
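This reduced set of combinations can be enumerated directly. A small sketch with illustrative values n = 3 scales, m = 3 ratios, and a 561×728 image (all assumptions chosen for the example):

```python
from itertools import product

sizes = [0.75, 0.5, 0.25]  # scales s1, ..., sn
ratios = [1, 2, 0.5]       # aspect ratios r1, ..., rm

# Keep only the (scale, ratio) pairs that contain s1 or r1
pairs = [(s, r) for s, r in product(sizes, ratios)
         if s == sizes[0] or r == ratios[0]]
assert len(pairs) == len(sizes) + len(ratios) - 1  # n + m - 1 = 5

# For a 561x728 image this yields wh(n + m - 1) anchor boxes in total
print(561 * 728 * len(pairs))  # 2042040
```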
The above method of generating anchor boxes is implemented in the multibox_prior function below. We specify the input image, a list of scales, and a list of aspect ratios, and this function will return all the anchor boxes.
#@save
def multibox_prior(data, sizes, ratios):
    """Generate anchor boxes with different shapes centered on each pixel."""
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    boxes_per_pixel = (num_sizes + num_ratios - 1)
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)
    # Offsets are required to move the anchor to the center of a pixel. Since
    # a pixel has height=1 and width=1, we choose to offset our centers by 0.5
    offset_h, offset_w = 0.5, 0.5
    steps_h = 1.0 / in_height  # Scaled steps in y axis
    steps_w = 1.0 / in_width  # Scaled steps in x axis
    # Generate all center points for the anchor boxes
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)
    # Generate `boxes_per_pixel` number of heights and widths that are later
    # used to create anchor box corner coordinates (xmin, ymin, xmax, ymax)
    w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
                   sizes[0] * torch.sqrt(ratio_tensor[1:]))) \
                   * in_height / in_width  # Handle rectangular inputs
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
                   sizes[0] / torch.sqrt(ratio_tensor[1:])))
    # Divide by 2 to get half height and half width
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
        in_height * in_width, 1) / 2
    # Each center point will have `boxes_per_pixel` number of anchor boxes, so
    # generate a grid of all anchor box centers with `boxes_per_pixel` repeats
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                           dim=1).repeat_interleave(boxes_per_pixel, dim=0)
    output = out_grid + anchor_manipulations
    return output.unsqueeze(0)
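The last two steps above (out_grid + anchor_manipulations) amount to offsetting each normalized center by half the box width and half the box height. A framework-free sketch for a single anchor, assuming a square image so the in_height / in_width correction is a no-op (the center and scale values are illustrative):

```python
import math

cx, cy = 0.5, 0.5   # normalized center of one pixel (illustrative)
s, r = 0.75, 2.0    # scale and aspect ratio (illustrative)

half_w = s * math.sqrt(r) / 2
half_h = s / math.sqrt(r) / 2
# Corner coordinates in (xmin, ymin, xmax, ymax) order
box = (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# The resulting width and height recover s*sqrt(r) and s/sqrt(r)
assert math.isclose(box[2] - box[0], s * math.sqrt(r))
assert math.isclose(box[3] - box[1], s / math.sqrt(r))
```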