This notebook details how to recognize a sudoku puzzle from a picture. We'll make use of simple image processing algorithms (edge detection, thresholds... ) and character recognition using the K-Nearest Neighbors (KNN) algorithm. It is a very simple but effective algorithm for solving multi-class classification problems. This puzzle matrix is a 9x9 array of known numbers 1-9 or 0s where the number is unknown
This ipython notebook is divided into two parts. The first part is related to computer vision and the second part is related to machine learning. To complete this task it is necessary to use a computer vision library, in this case we are going to use OpenCV library.
The first thing we need to do is to identify the puzzle. We have some challenges: The lines are not perfect, the black grid lines have similar color as a lot of elements in the image or the small squares are difficult to extract.
To simplify the problem we assumed: the puzzle is the biggest square in the image and the puzzle will be orientated reasonably correctly.
In the next picture you can see a sudoku puzzle in a newspaper. To load the image it's used imread.
%matplotlib inline import pylab as pl #import Opencv library try: import cv2 except ImportError: print "You must have OpenCV installed" #load image image_sudoku_original = cv2.imread('../../../data/ocr/sudoku.jpg') #Show Images _=pl.imshow(image_sudoku_original) _=pl.axis("off")
Once we have read the image we have to detect the lines. For this task, It used an adaptativeThreshold to extract the edge of the image (for each pixel in the image take the average value of the surrounding area). This function accepts a gray scale image, with the function cvtColor we change the color space (from RGB to gray scale). In this picture it's possible to see letters, lines or numbers. Light pixels are the paper and dark pixels are the ink.
The adaptativeThreshold function takes as first argument the input image, the second argument returns the binary image, the third argument is the non-zero value assigned to the pixels for which the condition is satisfied. The fourth is the adaptive thresholding algorithm, the fifth argument is the threshold type, the next argument is size of the pixel neighborhood and finally the last argument is a constant that is subtracted from the mean or weight mean.
#gray image image_sudoku_gray = cv2.cvtColor(image_sudoku_original,cv2.COLOR_BGR2GRAY) #adaptive threshold thresh = cv2.adaptiveThreshold(image_sudoku_gray,255,1,1,11,15) #show image _=pl.imshow(thresh, cmap=pl.gray()) _=pl.axis("off")
We have to find the connected contours in the image (using the function findContours). These function returns a vector with the corners of the contours. We'll find the sukodu in this contours.
The assumption is that the sudoku puzzle has 4 sides and it's convex. Checking the number of the contour is equal to four and using the funcion of OpenCV isContourConvex to check if the square is convex. We obtain the possible candidates.
We need to filter the candidates using the assumption: the sudoku puzzle is the biggest square in the image. We calculate the area of the possible contours (contourArea). The biggest square is the sudoku puzzle.
#find the countours contours0,hierarchy = cv2.findContours( thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE) #size of the image (height, width) h, w = image_sudoku_original.shape[:2] #copy the original image to show the posible candidate image_sudoku_candidates = image_sudoku_original.copy() #biggest rectangle size_rectangle_max = 0; for i in range(len(contours0)): #aproximate countours to polygons approximation = cv2.approxPolyDP(contours0[i], 4, True) #has the polygon 4 sides? if(not (len (approximation)==4)): continue; #is the polygon convex ? if(not cv2.isContourConvex(approximation) ): continue; #area of the polygon size_rectangle = cv2.contourArea(approximation) #store the biggest if size_rectangle> size_rectangle_max: size_rectangle_max = size_rectangle big_rectangle = approximation
In the image below it's possible to see the 4 sides of the sudoku puzzle in red
#show the best candidate approximation = big_rectangle for i in range(len(approximation)): cv2.line(image_sudoku_candidates, (big_rectangle[(i%4)], big_rectangle[(i%4)]), (big_rectangle[((i+1)%4)], big_rectangle[((i+1)%4)]), (255, 0, 0), 2) #show image _=pl.imshow(image_sudoku_candidates, cmap=pl.gray()) _=pl.axis("off")
Now we have the sudoku puzzle segmented. We have got the corner points of the puzzle. It's currently not really usable for much. The sudoku puzzle is a bit distorted. It's necessary to correct the skewed perspective of the image. We need a way to mapping from the puzzle in the original picture back into a square. Where each corner of the sudoku puzzle corresponds to a corner on the a new image.
We use a transformation that will map one arbitrary 2D quadrilateral into another. We can use a perspective transformation:
This perspective transformation maps a point $ (x, y) $ in one quadrilateral into a new $ (X, Y) $ in another quadrilateral. These two equations contain 8 unknowns, but we have 8 values. (the corners $x$ and $y$ coordinates of the puzzle). Solving these equations gives us the $a,b,c,d,e,f,g,h$ which provide us with a mapping to get our puzzle out nice and straight.
The OpenCV function getperspectivetransform resolved the perspective transformation. Calculates a perspective transform from four pairs of the corresponding points, where the first parameter are the coordinates of quadrangle vertices in the source image and the second parameter are the coordinates of the corresponding quadrangle vertices in the destination image.
To applies a perspective transformation to an image we used the function warpPerspective. This function transforms the source image using the specified matrix. We obtains the image below.
We need to sort the corner of the sudoku puzzle and then associate each point with the new image dimension.
import numpy as np IMAGE_WIDHT = 16 IMAGE_HEIGHT = 16 SUDOKU_SIZE= 9 N_MIN_ACTVE_PIXELS = 10 #sort the corners to remap the image def getOuterPoints(rcCorners): ar = ; ar.append(rcCorners[0,0,:]); ar.append(rcCorners[1,0,:]); ar.append(rcCorners[2,0,:]); ar.append(rcCorners[3,0,:]); x_sum = sum(rcCorners[x, 0, 0] for x in range(len(rcCorners)) ) / len(rcCorners) y_sum = sum(rcCorners[x, 0, 1] for x in range(len(rcCorners)) ) / len(rcCorners) def algo(v): return (math.atan2(v - x_sum, v - y_sum) + 2 * math.pi) % 2*math.pi ar.sort(key=algo) return ( ar, ar, ar, ar)
The dataset images have 16x16 pixels. The size of the new image will be 144x144, because we have to divide each row and col by 9 and we have to return the same size of the images.
#point to remap points1 = np.array([ np.array([0.0,0.0] ,np.float32) + np.array([144,0], np.float32), np.array([0.0,0.0] ,np.float32), np.array([0.0,0.0] ,np.float32) + np.array([0.0,144], np.float32), np.array([0.0,0.0] ,np.float32) + np.array([144,144], np.float32), ],np.float32) outerPoints = getOuterPoints(approximation) points2 = np.array(outerPoints,np.float32) #Transformation matrix pers = cv2.getPerspectiveTransform(points2, points1 ); #remap the image warp = cv2.warpPerspective(image_sudoku_original, pers, (SUDOKU_SIZE*IMAGE_HEIGHT, SUDOKU_SIZE*IMAGE_WIDHT)); warp_gray = cv2.cvtColor(warp, cv2.COLOR_BGR2GRAY) #show image _=pl.imshow(warp_gray, cmap=pl.gray()) _=pl.axis("off")
we've now got an undistorted square Sudoku puzzle
After remapping, we can divide the puzzle into a 9×9 grid. Each cell in the grid will correspond (approximately) to a cell on the puzzle. The correspondence might not be perfect, but it should be good enough.
Now, we know the size of the bounding box. We also know the size of the image. We can easily center the image. We create a new image that’s the same size as the original.
Sending images directly to k-Nearest Neighbors is not a good idea. A little bit of work on the image can increase accuracy. Here, we’ll center the actual digit in the image. The dataset has been made in such a way, all digits are centered by their bounding box.
To remove the noise around the number (lines in general) we delete all the pixels outside the center of the image from a radius.
def extract_number(x, y): #square -> position x-y im_number = warp_gray[x*IMAGE_HEIGHT:(x+1)*IMAGE_HEIGHT][:, y*IMAGE_WIDHT:(y+1)*IMAGE_WIDHT] #threshold im_number_thresh = cv2.adaptiveThreshold(im_number,255,1,1,15,9) #delete active pixel in a radius (from center) for i in range(im_number.shape): for j in range(im_number.shape): dist_center = math.sqrt( (IMAGE_WIDHT/2 - i)**2 + (IMAGE_HEIGHT/2 - j)**2); if dist_center > 6: im_number_thresh[i,j] = 0; n_active_pixels = cv2.countNonZero(im_number_thresh) return [im_number, im_number_thresh, n_active_pixels]
Sometimes after remove the noise of the images can appear other particles close to the number. Because of this, we find the biggest bounding box in the small squared. We make the bounding box a little more bigger to improve the matching with the dataset.
def find_biggest_bounding_box(im_number_thresh): contour,hierarchy = cv2.findContours(im_number_thresh.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE) biggest_bound_rect = ; bound_rect_max_size = 0; for i in range(len(contour)): bound_rect = cv2.boundingRect(contour[i]) size_bound_rect = bound_rect*bound_rect if size_bound_rect > bound_rect_max_size: bound_rect_max_size = size_bound_rect biggest_bound_rect = bound_rect #bounding box a little more bigger x_b, y_b, w, h = biggest_bound_rect; x_b= x_b-1; y_b= y_b-1; w = w+2; h = h+2; return [x_b, y_b, w, h]
Now we have to separate all the numbers from the grid. First we extract the numbers. If we have more active pixels than a threshold, we find the biggest bounding box in the image if the size of this bounding box is bigger than 0, we store this number in a matrix. In another case we store a matrix of zeros.
import math import numpy as np #sudoku representation sudoku = np.zeros(shape=(9*9,IMAGE_WIDHT*IMAGE_HEIGHT)) def Recognize_number( x, y): """ Recognize the number in the rectangle """ #extract the number (small squares) [im_number, im_number_thresh, n_active_pixels] = extract_number(x, y) if n_active_pixels> N_MIN_ACTVE_PIXELS: [x_b, y_b, w, h] = find_biggest_bounding_box(im_number_thresh) im_t = cv2.adaptiveThreshold(im_number,255,1,1,15,9); number = im_t[y_b:y_b+h, x_b:x_b+w] if number.shape*number.shape>0: number = cv2.resize(number, (IMAGE_WIDHT, IMAGE_HEIGHT), interpolation=cv2.INTER_LINEAR) ret,number2 = cv2.threshold(number, 127, 255, 0) number = number2.reshape(1, IMAGE_WIDHT*IMAGE_HEIGHT) sudoku[x*9+y, :] = number; return 1 else: sudoku[x*9+y, :] = np.zeros(shape=(1, IMAGE_WIDHT*IMAGE_HEIGHT)); return 0
Here you can see the segmented numbers in a grid
index_subplot=0 n_numbers=0 indexes_numbers =  for i in range(SUDOKU_SIZE): for j in range(SUDOKU_SIZE): if Recognize_number(i, j)==1: if (n_numbers%5)==0: index_subplot=index_subplot+1 indexes_numbers.insert(n_numbers, i*9+j) n_numbers=n_numbers+1 #create subfigures f,axarr= pl.subplots(index_subplot,5) width = 0; for i in range(len(indexes_numbers)): ind = indexes_numbers[i] if (i%5)==0 and i!=0: width=width+1 axarr[i%5, width].imshow(cv2.resize(sudoku[ind, :].reshape(IMAGE_WIDHT,IMAGE_HEIGHT), (IMAGE_WIDHT*5,IMAGE_HEIGHT*5)), cmap=pl.gray()) axarr[i%5, width].axis("off")
In machine learning, classification is the problem of identifying to which of a set of categories a new observation belongs. Classification is an example of the more general problem of pattern recognition. Which is the assignment of some sort of output value to a given input value. The k-Nearest Neighbors algorithm (or KNN) is a non-parametric method used for classification and regression.
In SHOGUN, you can use CKNN to perform k-Nearest Neighbors algorithm, it's a non-parametric method used for classification and regression. To construct a KNN machine, you must choose the hyper-parameter K and a distance function. Usually, we simply use the standard CEuclideanDistance, but in general, any subclass of CDistance could be used.
We need to load the training data. but the dataset values are between -1 and 1. We need to normalize the data between 0-255.
from modshogun import MulticlassLabels,RealFeatures from modshogun import KNN, EuclideanDistance from scipy.io import loadmat from numpy import random #load data mat = loadmat('../../../data/multiclass/usps.mat') Xall = mat['data'] #normalize between 0-255 normalize = np.zeros((Xall.shape, Xall.shape,1), np.uint8) normalize= cv2.normalize(Xall, normalize, 0, 255, cv2.NORM_MINMAX, cv2.CV_8UC1); Xall = np.asmatrix(normalize, np.float64) #load labels Yall = np.array(mat['label'].squeeze(), dtype=np.double) Yall = Yall - 1 #random position to train knn random.seed(0) subset = random.permutation(len(Yall)) test_images = Xall[:, subset[:5000]] test_labels = Yall[subset[:5000]]
We initialize the knn, set up the features and configure the kind of distance that we want to use (in this case Euclidean distance).
k=10 labels_numbers = MulticlassLabels(test_labels) feats = RealFeatures(test_images) dist = EuclideanDistance() knn = KNN(k, dist, labels_numbers) _=knn.train(feats)
Now we have to take all the segmented numbers and returns what digit it is.
sudoku_puzzle = np.zeros(shape=(9,9)) #Predict the number for i in range(SUDOKU_SIZE): for j in range(SUDOKU_SIZE): #has the image some active pixel? n_active_pixels = cv2.countNonZero(np.asmatrix(sudoku[i*9+j, :])) if n_active_pixels>20 : feats_test = RealFeatures( np.asmatrix(sudoku[i*9+j, :], np.float64).T) sudoku_puzzle[i,j] = knn.apply_multiclass(feats_test) else: sudoku_puzzle[i,j] = 0 print sudoku_puzzle #show image _=pl.imshow(image_sudoku_original) _=pl.axis("off")
[[ 0. 3. 0. 4. 0. 0. 0. 0. 0.] [ 4. 0. 0. 0. 0. 0. 0. 0. 2.] [ 2. 0. 0. 5. 0. 0. 7. 0. 0.] [ 0. 0. 0. 0. 0. 0. 0. 0. 0.] [ 0. 2. 0. 0. 0. 4. 0. 0. 0.] [ 3. 0. 0. 0. 0. 0. 0. 4. 0.] [ 0. 0. 0. 0. 7. 0. 0. 3. 0.] [ 0. 4. 0. 2. 0. 0. 0. 0. 8.] [ 7. 0. 0. 0. 0. 0. 4. 0. 0.]]
The recognition isn't very accurate because the training data is for handwritten text.
The accuracy of the process could be improved. In this example we have used a holistic method, all the image is introduced in the classifier. But adding new features to the classifier, for example the number of holes, number of straight lines or length of the lines, could improve the accuracy of the classifier.