Turning a bad gel electrophoresis image into a professional-grade figure

An image for a publication better be looking nice. But what if your gel electrophoresis setup looks like that:

DIY gel setup

And it produces images similar to this:

Original bad quality gel image

 Why does it look this way? Maybe your lab us very underfunded, or maybe you were lazy to remake the image, or you were in a hurry, or you outright refuse to pay thousands of dollars for the setup, that must cost 200 dollars at best (because super-complex smartphone cost only $800). Or maybe you despise manual wet lab work, but don't mind tinkering with some programming, like I do.

Regardless of the reasons, I made this tutorial so you can make a professional-looking image from a very bad quality gel picture.

I am assuming that you have some experience with experimental molecular biology, as well as basic Python skills. The work is done in Jupyter notebook environment.

Opening gel image in Jupyter notebook

To open an image, we will use OpenCV python library. Please check the separate guide for installation and basic usage. The OpenCV package will help with opening the file as well as with some basic manipulations. We well use matplotlib package to display the images. We will store images as a Numpy arrays, so you should also import numpy module.

import skimage
import scipy
from scipy.ndimage.interpolation import shift
from PIL import Image
import cv2
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

 For this tutorial, I am using the following versions of the packages: 

OpenCV: 4.5.5
MatPlotLib: 3.2.1
Numpy: 1.21.5

First, lets open the raw image:

gel = cv2.imread(r'.\data\2022-03-10 02.36.53.jpg')

The gel is the 3-dimensional numpy array, 1st and 2nd dimension represent the coordinate of a pixel, the third dimension is the color. The value at each coordinate shows the strength of each color for each point, thus forming a final picture.

The command gel.shape returns (3456, 4608, 3) 

To show only one code, we can do:

Raw gel image one color values numpy array

Rotating the picture

Obviously, you noted that the image is sideways, and it is very hard to interpret. Fortunately, it is very easy to rotate the image using OpenCV:

plt.imshow(cv2.rotate(gel, cv2.ROTATE_90_COUNTERCLOCKWISE))

Raw gel image vertical

Straightening the gel

The gel is supposed to be square-shaped, but my picture-taking setup was far from ideal (I took a picture on my cell phone while holding the UV light filter with my other hand). As a result, the cell phone plane was not parallel with the gel plane, and the gel comes out distorted.

At this step, we will fix this distortion.

To simplify the task, we first transform the gel into grayscale.

Transforming image to grayscale

gray = cv2.cvtColor(gel, cv2.COLOR_BGR2GRAY)

The gray variable contains the array that represents the following monochrome image:

plt.imshow(gray, cmap='Greys') plt.imshow(gray)
Gel monochrome Gel monochrome yellow blue

I kept the image tilted for simplicity; we will make it vertical at the future steps.

Manually finding the gel corners coordinates

To find the coordinates, we will set arbitrary (x, y) coordinates of each corner. Using those coordinates, we will make a mask that will cover everything outside the desired area. We will plot the masked picture. The coverage will likely be not ideal, so we will correct the coordinates and replot. We will continue until we are satisfied with the result.

upper_left = (160, 200)
upper_right = (4150, 100)
lower_right = (4110, 3420)
lower_left = (190, 3260)

corners_coordinates = np.array([[upper_left, upper_right, lower_right, lower_left]], dtype=np.int32)

mask = np.zeros(gray.shape, dtype="uint8")
mask = cv2.fillPoly(mask, corners_coordinates, 255)
masked = cv2.bitwise_and(gray, mask)

After numerous adjustments of the coordinates, the final masked picture looks like this:

Gel masked for top-down view stretching

 Stretching the gel image into top-down view

The transformation uses the OpenCV functions getPerspectiveTransform and warpPerspective. Collectively, they take the original image with known corner coordinates and "stretch" (or shrink) it to the new corner coordinates. To use those functions, we need to provide it with the original image, the original corner coordinates, and with the desired new corner coordinates.

Lets organize those technical manipulations into the function:

def fourPointTransform(image, top_left, top_right, bottom_right, bottom_left):

# transforming point coordinates into array
rect = np.array([top_left, top_right, bottom_right, bottom_left], dtype=np.float32)
(tl, tr, br, bl) = rect

# compute the width and height of the new image, which will be the
# maximum distance between bottom-right and bottom-left
# x-coordiates or the top-right and top-left x-coordinates
# Same for the y-coordinates.
calcLength = lambda a, b: np.sqrt(((a[0] - b[0]) ** 2) + ((a[1] - b[1]) ** 2))
maxLength = lambda a, b, c, d: max(int(calcLength(a, b)), int(calcLength(c, d)))

maxWidth = maxLength(br, bl, tr, tl)
maxHeight = maxLength(tr, br, tl, bl)

# Constructing the set of destination points to obtain a top-down view of the gel.
# Points are in the top-left, top-right, bottom-right, and bottom-left # order
dst = np.array([
[0, 0],
[maxWidth - 1, 0],
[maxWidth - 1, maxHeight - 1],
[0, maxHeight - 1]], dtype = "float32")

# compute the perspective transform matrix and then apply it
M = cv2.getPerspectiveTransform(rect, dst)
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
warped = warped.astype(np.int16)

# return the warped image
return warped

After running the function:

transformed = fourPointTransform(gray, upper_left, upper_right, lower_right, lower_left)

we receive the following corrected image:

Gel top-down view, corners stretched

Removing the background

The resulting square-shaped monochrome gel image still has a lot of artifacts.

First, most notable, is the high-intensity (yellow) stripes on the sides. I used standard blue LEDs to illuminate the gel in my hand-made transilluminator. It costed me almost nothing to build it, but now the LEDs beam strong light strobes into the gel. That light causes the gel to fluoresce even without the DNA present, resulting in a strong beams where the light intensity is the highest.

The second is even without the bright stripes, the gel background is not uniform. It is brighter on the side and darker in the center.

One way to deal with those is to create a very blurred image of the gel and subtract it from the original. The blurred image will average out the small features, such as DNA bands, but preserve the bigger features, such as the background gradients.

OpenCV can perform the operation called GaussianBlur, that requires the original image and the size of the area at which to perform the Gaussian blurring. The area does not have to be circle; rather, in this case, it is better to use the elliptic one. This way, we can capture finer features, such as the light stripes on the side.

The size of the averaged area (209, 2009) is found by the trial and error, it is better to adjust this value for every case separately.

bkgnd = cv2.GaussianBlur(transformed, (209, 2009), 0)

Gel background

The resulting background picture looks pretty much the same except the DNA bands are not present. This is what we wanted.

Now subtracting the background from the original and plotting the results.

subtr1 = transformed - bkgnd

Gel without background

As you can see, we completely removed background gradient and significantly decreased the intensity of the stripes.

Isolating the lines

The next step is to transform each gel line into the plot of fluorescence intensity vs. distance. This is what the professional bioanalyzers output, and this data will allow us to build a simulated gel picture in any format or color that we want.

It will also let us average and smooth the data, removing the remaining gel artifacts.

To do that, we calculate the mean value across all horizontal lines. The lines that has bands in them will naturally has higher average value, while the gaps between the gels will be dimmer in average.

Generating gel profile

Here is the resulting plot. Y axis represents the fluorescence intensity, while X axis is the vertical position in the gel. The teeth-like features are our gel bands. 

profile = subtr1.mean(axis=1)
plt.plot(profile)

Gel lines vertical profile 

We can also apply the noise filter to smoothen the plot. It will come handy at the next step. Scipy package has all the functionality. 

profile_smooth = scipy.signal.savgol_filter(profile, 51, 3)
plt.plot(profile_smooth)

 

Finding the coordinates of each gel line

We can find the approximate coordinate at which the gel starts and finish by finding the point at which the fluorescence changes the most. In the other words, we differentiate the profile of the gel:

differ = np.diff(profile_smooth)

We also remove the sides of the gel that does not carry any useful information for the plot:

plt.plot(differ[300:3000])

Gel image profile differentiated

Same as with the profile, we can apply the noise suppression:

differ_no_noise = scipy.signal.savgol_filter(differ, 51, 3)
plt.plot(differ_no_noise[300:3000])

Gel lines profile differentiated smoothed

Cutting the threshold:

differ_no_noise[differ_no_noise <= 0.1] = 0
plt.plot(differ_no_noise)

Gel vertical profile differentiated, thresholded

Now attempting to find the peak coordinates automatically.

peak_coords = scipy.signal.argrelextrema(differ_no_noise, np.greater)[0]

The output of the array looks like:

array([ 468, 710, 953, 1200, 1444, 1692, 1936, 2185, 2432, 3083, 3215], dtype=int64)

The last two peaks are the artifacts from the gel, we will ignore them.

peaks = peak_coords[:-2]

Output:

array([ 468, 710, 953, 1200, 1444, 1692, 1936, 2185, 2432], dtype=int64)

So here are the coordinates of where the gel lines start. We need to use slightly narrower line, to skip the irregularities at the border of each line. So, we need to add some constant to each of the peak values found before. We also need to use a certain value for the width of the gel line. We also need to agree on the length of the gel, specifying the coordinates before and after. Then, we simply slice the numpy array of the gel picture, leaving only one line. We do it with the function:

def extractLine(gel, fluor_increase_coord, line_width, dist_from_peak, start_point, end_point):
    line_start = fluor_increase_coord + dist_from_peak
    line_end = line_start + line_width
    line = gel[line_start:line_end, start_point:end_point]
    return line

After extracting the line, we need to average it over the width. This function will return a line that represents the fluorescence intensity versus the gel coordinate. It will still have some background, though.

def averageLine(line):
    return line.mean(axis=0)

We can also obtain the bacground, grabbing the empty areas around each gel line. The following function accepts two areas and calculates a mean bacground, also a line.

def averageBackground(bk1, bk2):
    bk1mean = averageLine(bk1)
    bk2mean = averageLine(bk2)
    bk = (bk1mean + bk2mean)/2.0
    return bk

The next function simply subtracts the background from the line

def bkgroundSubtract(line, background):
    return line - background

Smoothing the line:

def smoothLine(line, a=51, b=5):
    return scipy.signal.savgol_filter(line, a, b)

This one calculates the average width of a line. The result is used as an input of the extractLine function we reviewed before

def calcAverageBandWidth(peaks):
    diff_array = np.diff(peaks)
    return int(diff_array.mean())

Adding one extra "peak" which will serve for the background subtraction.

peaks = np.append(peaks, peaks[-1]+calcAverageBandWidth(peaks))

The peaks array looks like this:

array([ 468, 710, 953, 1200, 1444, 1692, 1936, 2185, 2432, 2677], dtype=int64)

Now putting it all together, and extracting every gel line from the gel. Note we will also do only right side of the gel, for the simplicity.

lines_list = []
for index, peak in enumerate(peaks):
    if index+1 < len(peaks):
        line = extractLine(subtr1, peak, 100, 40, 2150, 3100)
        bk1 = extractLine(subtr1, peak, 20, -30, 2150, 3100)
        bk2 = extractLine(subtr1, peaks[index+1], 20, -30, 2150, 3100)
        line_avg = averageLine(line)
        bkgnd_avg = averageBackground(bk1, bk2)
        line_no_bkg = bkgroundSubtract(line_avg, bkgnd_avg)
        line_smooth = smoothLine(line_no_bkg)
        lines_list.append(line_smooth)

This is what the result will look like for the first line:

First line averaged fluorescence intensity

The peaks on the left represent smaller DNA fragments, and the peaks on the right - larger ones.

Since the gel line contains 100 bp ladder, the peaks represent following DNA sizes, from right to left: 100, 200, 300, 400, 500, 600 (largest peak), followed by 700, 800, 900, 1000, 1200, 1500 and 2000 bp rightmost band.

Generating the artificial gel image

Now, using the linearized peaks, we can generate an artificial gel image, where we have a full control over band widths, size of the image, color palette etc.

We will do this by copying the fluorescence vs length array multiple times. This will result with a line. We can also add empty gaps on each side of the line, so the generated gel image looks more natural.

def genGap(length, width):
    return np.zeros((length, width))


def genGelLine(line, width, gap_width):
    line_img = np.zeros((line.shape[0], width))

    for i in range(width):
        line_img[:, i] = line[::-1]

    if gap_width > 0:
        gap = np.zeros((line.shape[0], gap_width))
        line_img = np.concatenate((gap, line_img, gap), axis=1)

    return line_img


def genGel(lines_list, width, gap_width):
    gel_lines_list = list(map(lambda x: genGelLine(x, width, gap_width), lines_list))
    gel = np.concatenate(gel_lines_list, axis=1)
    return gel

Plotting the result for one line. Looks like a gel line, but without any background artifacts!

plt.imshow(genGelLine(lines_list[0], 100, 10))

Generated gel line for the 1st line

Now plotting all the lines:

plt.imshow(genGel(lines_list, 100, 1))

Artificial gel all lines

You see, the bands are shifted against each other. However, each line contains 2000 bp DNA band (on the top), so the top bands should align.

The misalignment is likely due to slightly deformed gel.

We can normalize the lines against the top band. Lets find the line where the 2000 bp band is uppermost, and shift the other lines relatively, so their 2000 bp bands are at the same position.

def getMaxBandPos(line, start, finish):
    subline = line[start:finish]
    index = subline.argmax()
    index = start + index
    return index

band_index_list = list(map(lambda x: getMaxBandPos(x, 850, 1000), lines_list))

max_index = max(band_index_list)

aligned_lines = list(map(lambda x, y: shift(x, max_index-y, cval=0), lines_list, band_index_list))

And finally, we can annotate the image. I provided a manual set of coordinates for each text element, found by trial and error.

If your image is standard, you can also write a custom automated procedure to position the text.

fig,ax = plt.subplots(1)
ax.set_yticklabels([])
ax.set_xticklabels([])
ax.set_xticks([])
ax.set_yticks([])
textstr = "Beads : sample volume ratio"
plt.gcf().text(0.47, 0.95, textstr, fontsize=14)
plt.gcf().text(0.335, 0.90, '100bp\nladder', fontsize=14)
plt.gcf().text(0.385, 0.90, '3x', fontsize=14)
plt.gcf().text(0.42, 0.90, '1.5x', fontsize=14)
plt.gcf().text(0.465, 0.90, '1x', fontsize=14)
plt.gcf().text(0.5, 0.90, '0.8x', fontsize=14)
plt.gcf().text(0.545, 0.90, '0.7x', fontsize=14)
plt.gcf().text(0.585, 0.90, '0.6x', fontsize=14)
plt.gcf().text(0.625, 0.90, '0.5x', fontsize=14)
plt.gcf().text(0.665, 0.90, '0.4x', fontsize=14)

plt.gcf().text(0.295, 0.21, '100bp', fontsize=14)
plt.gcf().text(0.295, 0.28, '200bp', fontsize=14)
plt.gcf().text(0.295, 0.36, '300bp', fontsize=14)
plt.gcf().text(0.295, 0.415, '400bp', fontsize=14)
plt.gcf().text(0.295, 0.47, '500bp', fontsize=14)
plt.gcf().text(0.295, 0.51, '600bp', fontsize=14)
plt.gcf().text(0.295, 0.545, '700bp', fontsize=14)
plt.gcf().text(0.295, 0.58, '800bp', fontsize=14)
plt.gcf().text(0.295, 0.61, '900bp', fontsize=14)
plt.gcf().text(0.290, 0.64, '1000bp', fontsize=14)
plt.gcf().text(0.290, 0.7, '1200bp', fontsize=14)
plt.gcf().text(0.290, 0.76, '1500bp', fontsize=14)
plt.gcf().text(0.290, 0.84, '2000bp', fontsize=14)
ax.imshow(genGel(aligned_lines, 100, 1))

That's it! Let me know if you have any questions, or have your own tricks to share.

This post is written with a support from SergiLabSupplies, LLC. It is a store where scientists can purchase unreasonably expensive products for a reasonable price. Check it out: https://sergilabsupplies.com/

I am using a gel image generated as part of a routine quality control for a magnetic beads for DNA purification: https://sergilabsupplies.com/collections/spri-magnetic-beads/products/magnetic-beads-for-dna-purification-10ml