## CS474/674 Image Processing and Interpretation (Fall 2023)

Department of Computer Science and Engineering

Instructor: sinisa at eecs oregonstate edu, 2107 Kelley Engineering Center
Lectures: MWF 2:00-2:50pm, BAT 144
Office hours: T 2:30-3pm, or by appointment

## Textbooks

• M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis and Machine Vision, Cengage Learning, 2015.
• S. Birchfield, Image Processing and Analysis, Cengage Learning, 2018.

## Prerequisites

• Math Review Material (from textbook)

## Lecture Notes

• Course Overview
• PGM Image File Format
• Introduction to Image Processing
• Math Review
• Fourier Transform (see also Fourier Transform Pairs 1 and Fourier Transform Pairs 2)
• Midterm Review
• Fast Fourier Transform (FFT)
• Convolution
• Sampling and Aliasing
• Image Restoration
• Image Compression (also, see Image Compression Techniques and Survey )
• Final Review
• Short Time Fourier Transform (STFT) (study chapters 1 and 2 from Wavelet Tutorial )
• Wavelets (not covered in class, but linked here if you are interested in learning more: Wavelets; also see notes part 1 and part 2)
• Multiresolution Analysis

## Homework Assignments

• Homework 7 (Wavelets; except problems 7.13 and 7.14). Problem and page mappings: 7.9, p. 521 >> 6.44, p. 525; 7.12 and 7.13, p. 522 >> 6.36 and 6.37, p. 524; 7.14, p. 522 >> 6.38, p. 524; 7.16, p. 522 >> 6.40, p. 524; 7.19, p. 523 >> 6.41, p. 525; 7.21, p. 523 >> 6.43, p. 525; 7.24, p. 523 >> 6.45, p. 525. Solutions

## Programming Assignments

• Programming Assignment 3 (due date, extended: 11/20/2023): PowerPoint file, fft.c, documentation, Rect_128.txt
• Project 6 (Due date: 12/14/98). Download data: penny_head.gif and penny_tails.gif; nickel_head.gif and nickel_tails.gif; dime_head.gif and dime_tails.gif; quarter_head.gif and quarter_tails.gif. Test images: coins1.gif $0.36; coins2.gif $0.36 (same as coins1, different lighting); coins3.gif $0.51; coins4.gif $0.51 (same as coins3, different scale); coins5.gif $0.66; coins6.gif $0.65 (occlusion); coins8.gif $0.47; coins9.gif $0.50; coins10.gif $0.11; coins11.gif $0.11 (same as coins10, different scale); coins16.gif $0.36; coins17.gif $0.60 (different viewpoint)

## Sample Presentation Topics (Graduate Students Only)

Presentation guidelines.


## swetanjal/Digital-Image-Processing

Assignments done as part of the Digital Image Processing course at IIIT-H.

This repository contains all the assignments done as part of the 'Digital Image Processing (DIP)' course taught by Dr. Ravi Kiran during the Spring 2020 semester at IIIT-H. The goal of the course was to become comfortable with how digital images are stored and processed on a computer, and to get a feel for how an image looks after digital processing. The problem statements and solution notebooks are committed in the appropriate assignment directories.

Here is a brief description of each of the Assignments. Please check out the Problem Statements PDFs for a detailed understanding of the objectives of each of the Assignments.

• Assignment 1: Pixel Manipulation, Contrast Stretching, Bit Manipulation, Intensity Transformation, Histogram Equalization, and Histogram Transformation
• Assignment 2: Edge Detection, High-Boost Filtering, Laplacian of an Image, Convolution, Bilateral Filters, Image Restoration, Linear Spatial Filters
• Assignment 3: Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), Low-Pass Filtering, High-Pass Filtering, Band-Pass Filtering, Spatial Sampling
• Assignment 4: Morphological Operations (Erosion, Dilation), Skeletonization of an Image, Segregating Organs of Interest from CT Scans, Connected Components

For any queries related to the content, feel free to reach out to me at: [email protected]


## Image Processing: Techniques, Types, & Applications [2023]

Deep learning has revolutionized the world of computer vision—the ability for machines to “see” and interpret the world around them.

In particular, Convolutional Neural Networks (CNNs) were designed to process image data more efficiently than traditional Multi-Layer Perceptrons (MLP).

Since images contain a consistent pattern spanning several pixels, processing them one pixel at a time—as MLPs do—is inefficient.

This is why CNNs that process images in patches or windows are now the de-facto choice for image processing tasks.

## What is Image Processing?

Digital image processing is the class of methods that deal with manipulating digital images through computer algorithms. It is an essential preprocessing step in many applications, such as face recognition, object detection, and image compression.

Image processing is done to enhance an existing image or to sift out important information from it. This is important in several Deep Learning-based Computer Vision applications, where such preprocessing can dramatically boost the performance of a model. Manipulating images, for example, adding or removing objects to images, is another application, especially in the entertainment industry.

This paper addresses a medical image segmentation problem, where the authors used image inpainting in their preprocessing pipeline for the removal of artifacts from dermoscopy images. Examples of this operation are shown below.

The authors achieved a 3% boost in performance with this simple preprocessing procedure which is a considerable enhancement, especially in a biomedical application where the accuracy of diagnosis is crucial for AI systems. The quantitative results obtained with and without preprocessing for the lesion segmentation problem in three different datasets are shown below.

## Types of Images / How Machines “See” Images?

Digital images are interpreted as 2D or 3D matrices by a computer, where each value or pixel in the matrix represents the amplitude, known as the “intensity” of the pixel. Typically, we are used to dealing with 8-bit images, wherein the amplitude value ranges from 0 to 255.

Thus, a computer “sees” digital images as a function: I(x, y) or I(x, y, z) , where “ I ” is the pixel intensity and (x, y) or (x, y, z) represent the coordinates (for binary/grayscale or RGB images respectively) of the pixel in the image.
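As a toy illustration of this function view, a grayscale image can be modeled as a 2-D array, with I(x, y) reduced to plain indexing; the 4x4 image below is made up for the example:

```python
# A made-up 4x4 grayscale "image": a 2-D list of 8-bit intensities.
img = [
    [0,   50,  100, 150],
    [50,  100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
]

def I(x, y):
    """Pixel intensity at column x, row y (one common convention;
    libraries differ on whether the row or column index comes first)."""
    return img[y][x]

print(I(0, 0), I(3, 3))  # corner intensities: 0 and 255
```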

Computers deal with different “types” of images based on their function representations. Let us look into them next.

## 1. Binary Image

Images that have only two unique values of pixel intensity, 0 (representing black) and 1 (representing white), are called binary images. Such images are generally used to highlight a discriminating portion of a colored image. For example, they are commonly used for image segmentation, as shown below.

## 2. Grayscale Image

Grayscale or 8-bit images are composed of 256 intensity levels, where a pixel intensity of 0 represents black and a pixel intensity of 255 represents white. The 254 values in between are the different shades of gray.

An example of an RGB image converted to its grayscale version is shown below. Notice that the shape of the histogram remains the same for the RGB and grayscale images.
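The conversion itself is just a weighted sum of the three channels. A minimal sketch, assuming the common ITU-R BT.601 luma weights (a standard, but not the only, choice):

```python
def rgb_to_gray(r, g, b):
    """Convert one RGB pixel to a gray intensity using the ITU-R BT.601
    luma weights: green contributes most, blue least."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(rgb_to_gray(255, 255, 255))  # white stays 255
print(rgb_to_gray(255, 0, 0))      # pure red maps to a mid-dark gray, 76
```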

## 3. RGB Color Image

The images we are used to in the modern world are RGB or color images, which computers store as 24-bit matrices (three 8-bit channels). That is, 2^24 = 16,777,216 different colors are possible for each pixel. “RGB” represents the Red, Green, and Blue “channels” of an image.

Up until now, we had images with only one channel. That is, two coordinates could have defined the location of any value of a matrix. Now, three equal-sized matrices (called channels), each having values ranging from 0 to 255, are stacked on top of each other, and thus we require three unique coordinates to specify the value of a matrix element.

Thus, a pixel in an RGB image will be black when the pixel value is (0, 0, 0) and white when it is (255, 255, 255). Combinations in between produce the full range of displayable colors. For example, (255, 0, 0) is red (since only the red channel is activated for this pixel). Similarly, (0, 255, 0) is green and (0, 0, 255) is blue.

An example of an RGB image split into its channel components is shown below. Notice that the shapes of the histograms for each of the channels are different.

## 4. RGBA Image

RGBA images are colored RGB images with an extra channel known as “alpha” that depicts the opacity of the RGB image. Opacity ranges from a value of 0% to 100% and is essentially a “see-through” property.

Opacity in physics describes how much light an object blocks. For instance, cellophane is transparent, frosted glass is translucent, and wood is opaque. The alpha channel in RGBA images mimics this property. An example of this is shown below.
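A minimal sketch of how an alpha value combines a foreground and background pixel, assuming the standard linear "over" compositing rule applied per channel:

```python
def blend(fg, bg, alpha):
    """Alpha compositing for one channel: out = alpha*fg + (1-alpha)*bg,
    where alpha = 1.0 is a fully opaque foreground and 0.0 fully transparent."""
    return round(alpha * fg + (1 - alpha) * bg)

# A half-transparent white pixel over a black background:
print(blend(255, 0, 0.5))  # 128
```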

## Phases of Image Processing

The fundamental steps in any typical Digital Image Processing pipeline are as follows:

## 1. Image Acquisition

The image is captured by a camera and digitized (if the camera output is not digitized automatically) using an analogue-to-digital converter for further processing in a computer.

## 2. Image Enhancement

In this step, the acquired image is manipulated to meet the requirements of the specific task for which the image will be used. Such techniques are primarily aimed at highlighting the hidden or important details in an image, like contrast and brightness adjustment, etc. Image enhancement is highly subjective in nature.

## 3. Image Restoration

This step deals with improving the appearance of an image and is an objective operation since the degradation of an image can be attributed to a mathematical or probabilistic model. For example, removing noise or blur from images.

## 4. Color Image Processing

This step aims at handling the processing of color images (24-bit RGB or 32-bit RGBA images), for example, performing color correction or color modeling in images.

## 5. Wavelets and Multi-Resolution Processing

Wavelets are the building blocks for representing images at various degrees of resolution. Images are successively subdivided into smaller regions for data compression and for pyramidal representation.
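As a rough sketch of the pyramidal idea, each level can be built by downsampling the previous one; the 2x2 block average below is a crude stand-in for the wavelet or Gaussian filtering used in practice:

```python
def downsample(img):
    """One pyramid level: halve each dimension by averaging 2x2 blocks."""
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) // 4
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]

def pyramid(img, levels):
    """The image plus `levels` successively smaller versions of it."""
    out = [img]
    for _ in range(levels):
        out.append(downsample(out[-1]))
    return out

flat = [[8] * 4 for _ in range(4)]                 # constant 4x4 image
print([len(level) for level in pyramid(flat, 2)])  # [4, 2, 1]
```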

## 6. Image Compression

For transferring images to other devices or due to computational storage constraints, images need to be compressed and cannot be kept at their original size. This is also important in displaying images over the internet; for example, on Google, a small thumbnail of an image is a highly compressed version of the original. Only when you click on the image is it shown in the original resolution. This process saves bandwidth on the servers.
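As a minimal, illustrative example of lossless compression (not the scheme JPEG or PNG actually use), run-length encoding collapses repeated pixels into (value, count) pairs:

```python
def rle_encode(pixels):
    """Run-length encoding: collapse runs of equal pixels to [value, count]."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original pixel sequence."""
    return [v for v, n in runs for _ in range(n)]

row = [0, 0, 0, 255, 255, 0]
print(rle_encode(row))  # [[0, 3], [255, 2], [0, 1]]
assert rle_decode(rle_encode(row)) == row  # lossless round trip
```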

## 7. Morphological Processing

Image components that are useful in the representation and description of shape need to be extracted for further processing or downstream tasks. Morphological processing provides the tools (essentially mathematical operations) to accomplish this. For example, erosion and dilation shrink and expand the boundaries of objects in a binary image, respectively.
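A toy sketch of erosion and dilation on a binary image with a 3x3 structuring element (border pixels are simply left at 0 here for brevity):

```python
def erode(img):
    """Binary erosion: a pixel stays 1 only if its whole 3x3
    neighbourhood is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = min(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out

def dilate(img):
    """Binary dilation: a pixel becomes 1 if any 3x3 neighbour is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = max(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out

# A 3x3 white square centred in a 5x5 image:
square = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
          for y in range(5)]
print(sum(map(sum, erode(square))))   # 1: only the centre pixel survives
assert dilate(erode(square)) == square  # dilating grows it back
```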

## 8. Image Segmentation

This step involves partitioning an image into different key parts to simplify and/or change its representation into something more meaningful and easier to analyze. Image segmentation lets computers focus on the most important parts of the image and discard the rest, which improves the performance of automated systems.

## 9. Representation and Description

Image segmentation procedures are generally followed by this step, where the task for representation is to decide whether the segmented region should be depicted as a boundary or a complete region. Description deals with extracting attributes that result in some quantitative information of interest or are basic for differentiating one class of objects from another.

## 10. Object Detection and Recognition

After the objects are segmented from an image and the representation and description phases are complete, the automated system needs to assign a label to the object—to let the human users know what object has been detected, for example, “vehicle” or “person”, etc.

## 11. Knowledge Base

Knowledge may be as simple as the bounding box coordinates for an object of interest that has been found in the image, along with the object label assigned to it. Anything that will help in solving the problem for the specific task at hand can be encoded into the knowledge base.


Image processing can be used to improve the quality of an image, remove undesired objects from an image, or even create new images from scratch. For example, image processing can be used to remove the background from an image of a person, leaving only the subject in the foreground.

Image processing is a vast and complex field, with many different algorithms and techniques that can be used to achieve different results. In this section, we will focus on some of the most common image processing tasks and how they are performed.

## Task 1: Image Enhancement

One of the most common image processing tasks is image enhancement: improving the quality of an image. It has crucial applications in computer vision, remote sensing, and surveillance. One common approach is adjusting the image's contrast and brightness.

Contrast is the difference in brightness between the lightest and darkest areas of an image; increasing it widens that difference, making details easier to distinguish. Brightness is the overall lightness or darkness of an image; increasing it makes the whole image lighter. Both contrast and brightness can be adjusted automatically by most image editing software, or they can be adjusted manually.
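Both adjustments are linear point operations; a minimal sketch for a single 8-bit pixel:

```python
def adjust(pixel, contrast=1.0, brightness=0):
    """Linear point operation out = contrast * in + brightness,
    clipped to the 8-bit range [0, 255]."""
    return max(0, min(255, round(contrast * pixel + brightness)))

print(adjust(100, contrast=1.5, brightness=10))  # 160
print(adjust(200, contrast=2.0))                 # clipped to 255
```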

However, adjusting the contrast and brightness of an image are elementary operations. Sometimes an image with perfect contrast and brightness becomes blurry when upscaled, due to its lower pixel density (pixels per inch). To address this issue, the relatively new and much more advanced concept of Image Super-Resolution is used, wherein a high-resolution image is obtained from its low-resolution counterpart(s). Deep learning techniques are popularly used to accomplish this.

For example, the earliest example of using Deep Learning to address the Super-Resolution problem is the SRCNN model, where a low-resolution image is first upscaled using traditional Bicubic Interpolation and then used as the input to a CNN model. The non-linear mapping in the CNN extracts overlapping patches from the input image, and a convolution layer is fitted over the extracted patches to obtain the reconstructed high-resolution image. The model framework is depicted visually below.

An example of the results obtained by the SRCNN model compared to its contemporaries is shown below.

## Task 2: Image Restoration

The quality of images could degrade for several reasons, especially photos from the era when cloud storage was not so commonplace. For example, images scanned from hard copies taken with old instant cameras often acquire scratches on them.

Image Restoration is particularly fascinating because advanced techniques in this area could potentially restore damaged historical documents. Powerful Deep Learning-based image restoration algorithms may be able to reveal large chunks of missing information from torn documents.

Image inpainting, for example, falls under this category, and it is the process of filling in the missing pixels in an image. This can be done by using a texture synthesis algorithm, which synthesizes new textures to fill in the missing pixels. However, Deep Learning-based models are the de facto choice due to their pattern recognition capabilities.

An example of an image inpainting framework (based on the U-Net autoencoder) was proposed in this paper that uses a two-step approach to the problem: a coarse estimation step and a refinement step. The main feature of this network is the Coherent Semantic Attention (CSA) layer that fills the occluded regions in the input images through iterative optimization. The architecture of the proposed model is shown below.

Some example results obtained by the authors and other competing models are shown below.

## Task 3: Image Segmentation

Image segmentation is the process of partitioning an image into multiple segments or regions. Each segment represents a different object in the image, and image segmentation is often used as a preprocessing step for object detection.

There are many different algorithms that can be used for image segmentation, but one of the most common approaches is to use thresholding. Binary thresholding, for example, is the process of converting an image into a binary image, where each pixel is either black or white. The threshold value is chosen such that all pixels with a brightness level below the threshold are turned black, and all pixels with a brightness level above the threshold are turned white. This results in the objects in the image being segmented, as they are now represented by distinct black and white regions.
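Binary thresholding is a one-line point operation; a minimal sketch (the threshold value here is arbitrary):

```python
def threshold(img, t):
    """Binary thresholding: intensity >= t maps to 1 (white), else 0 (black)."""
    return [[1 if p >= t else 0 for p in row] for row in img]

img = [[12, 200],
       [180, 40]]
print(threshold(img, 128))  # [[0, 1], [1, 0]]
```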

In multi-level thresholding, as the name suggests, different parts of an image are converted to different shades of gray depending on the number of levels. This paper, for example, used multi-level thresholding for medical imaging, specifically for brain MRI segmentation, an example of which is shown below.

Modern techniques use automated image segmentation algorithms based on deep learning for both binary and multi-label segmentation problems. For example, PFNet (Positioning and Focus Network) is a CNN-based model that addresses camouflaged object segmentation. It consists of two key modules: a positioning module (PM), which mimics a predator identifying the coarse position of prey, and a focus module (FM), which refines the initial segmentation results by focusing on ambiguous regions. The architecture of the PFNet model is shown below.

The results obtained by the PFNet model outperformed contemporary state-of-the-art models, examples of which are shown below.

## Task 4: Object Detection

Object Detection is the task of identifying objects in an image and is often used in applications such as security and surveillance. Many different algorithms can be used for object detection, but the most common approach is to use Deep Learning models, specifically Convolutional Neural Networks (CNNs).

CNNs are a type of Artificial Neural Network that were specifically designed for image processing tasks since the convolution operation in their core helps the computer “see” patches of an image at once instead of having to deal with one pixel at a time. CNNs trained for object detection will output a bounding box (as shown in the illustration above) depicting the location where the object is detected in the image along with its class label.
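The sliding-window operation at the core of a convolutional layer can be sketched in a few lines; real CNN layers add padding, strides, many channels, and learned kernels, but the arithmetic is this ("valid", stride-1) cross-correlation:

```python
def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation with stride 1 and no padding:
    slide the kernel over the image and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]

box = [[1, 1], [1, 1]]  # a 2x2 "box" kernel that sums each patch
print(conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], box))
# [[12, 16], [24, 28]]
```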

An example of such a network is the popular Faster R-CNN (Region-based Convolutional Neural Network) model, which is an end-to-end trainable, fully convolutional network. The Faster R-CNN model alternates between fine-tuning for the region proposal task (predicting regions in the image where an object might be present) and then fine-tuning for object detection (detecting what object is present) while keeping the proposals fixed. The architecture and some examples of region proposals are shown below.

## Task 5: Image Compression

Image compression is the process of reducing the file size of an image while still trying to preserve the quality of the image. This is done to save storage space, especially to run Image Processing algorithms on mobile and edge devices, or to reduce the bandwidth required to transmit the image.

Traditional approaches use lossy compression algorithms, which slightly reduce image quality in order to achieve a smaller file size. The JPEG file format, for example, uses the Discrete Cosine Transform (DCT) for image compression.
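A naive, unnormalized 1-D DCT-II, the building block JPEG applies to the rows and columns of 8x8 blocks before quantizing the coefficients, can be sketched as:

```python
import math

def dct(signal):
    """Naive (unnormalized) 1-D DCT-II: coefficient k measures how much
    of cosine frequency k is present in the signal."""
    N = len(signal)
    return [sum(s * math.cos(math.pi * (n + 0.5) * k / N)
                for n, s in enumerate(signal))
            for k in range(N)]

coeffs = dct([1.0, 1.0, 1.0, 1.0])
print(coeffs[0])  # 4.0: a constant block collapses into one DC coefficient
```

All the energy of a flat block lands in the first (DC) coefficient, which is why smooth image regions compress so well after quantization.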

Modern approaches to image compression involve the use of Deep Learning for encoding images into a lower-dimensional feature space and then recovering that on the receiver’s side using a decoding network. Such models are called autoencoders , which consist of an encoding branch that learns an efficient encoding scheme and a decoder branch that tries to revive the image loss-free from the encoded features.

For example, this paper proposed a variable rate image compression framework using a conditional autoencoder. The conditional autoencoder is conditioned on the Lagrange multiplier, i.e., the network takes the Lagrange multiplier as input and produces a latent representation whose rate depends on the input value. The authors also train the network with mixed quantization bin sizes for fine-tuning the rate of compression. Their framework is depicted below.

The authors obtained superior results compared to popular methods like JPEG, both by reducing the bits per pixel and in reconstruction quality. An example of this is shown below.

## Task 6: Image Manipulation

Image manipulation is the process of altering an image to change its appearance. This may be desired for several reasons, such as removing an unwanted object from an image or adding an object that is not present in the image. Graphic designers often do this to create posters, films, etc.

An example of Image Manipulation is Neural Style Transfer , which is a technique that utilizes Deep Learning models to adapt an image to the style of another. For example, a regular image could be transferred to the style of “Starry Night” by van Gogh. Neural Style Transfer also enables AI to generate art .

An example of such a model is the one proposed in this paper that is able to transfer arbitrary new styles in real-time (other approaches often take much longer inference times) using an autoencoder-based framework. The authors proposed an adaptive instance normalization (AdaIN) layer that adjusts the mean and variance of the content input (the image that needs to be changed) to match those of the style input (image whose style is to be adopted). The AdaIN output is then decoded back to the image space to get the final style transferred image. An overview of the framework is shown below.

Examples of images transferred to other artistic styles are shown below and compared to existing state-of-the-art methods.

## Task 7: Image Generation

Synthesizing new images is another important task in image processing, especially for deep learning algorithms, which require large quantities of labeled data to train. Image generation methods typically use Generative Adversarial Networks (GANs), a distinctive neural network architecture.

GANs consist of two separate models: the generator, which generates the synthetic images, and the discriminator, which tries to distinguish synthetic images from real images. The generator tries to synthesize images that look realistic to fool the discriminator, and the discriminator trains to better critique whether an image is synthetic or real. This adversarial game allows the generator to produce photo-realistic images after several iterations, which can then be used to train other Deep Learning models.

## Task 8: Image-to-Image Translation

Image-to-Image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. For example, a free-hand sketch can be drawn as an input to get a realistic image of the object depicted in the sketch as the output, as shown below.

Pix2pix is a popular model in this domain that uses a conditional GAN (cGAN) model for general purpose image-to-image translation, i.e., several problems in image processing like semantic segmentation, sketch-to-image translation, and colorizing images, are all solved by the same network. cGANs involve the conditional generation of images by a generator model. For example, image generation can be conditioned on a class label to generate images specific to that class.

Pix2pix consists of a U-Net generator network and a PatchGAN discriminator network, which takes in NxN patches of an image to predict whether it is real or fake, unlike traditional GAN models. The authors argue that such a discriminator enforces more constraints that encourage sharp high-frequency detail. Examples of results obtained by the pix2pix model on image-to-map and map-to-image tasks are shown below.

## Key Takeaways

The information technology era we live in has made visual data widely available. However, a lot of processing is required for them to be transferred over the internet or for purposes like information extraction, predictive modeling, etc.

The advancement of deep learning technology gave rise to CNN models, which were specifically designed for processing images. Since then, several advanced models have been developed that cater to specific tasks in the Image Processing niche. We looked at some of the most critical techniques in Image Processing and popular Deep Learning-based methods that address these problems, from image compression and enhancement to image synthesis.

Recent research is focused on reducing the need for ground truth labels for complex tasks like object detection, semantic segmentation, etc., by employing concepts like Semi-Supervised Learning and Self-Supervised Learning , which makes models more suitable for broad practical applications.

If you’re interested in learning more about computer vision, deep learning, and neural networks, have a look at these articles:

• Deep Learning 101: Introduction [Pros, Cons & Uses]
• What Is Computer Vision? [Basic Tasks & Techniques]
• Convolutional Neural Networks: Architectures, Types & Examples

Rohit Kundu is a Ph.D. student in the Electrical and Computer Engineering department of the University of California, Riverside. He is a researcher in the Vision-Language domain of AI and has published several papers in top-tier conferences and notable peer-reviewed journals.




## COT 5930 Digital Image Processing - Assignment 1

Dan Zimmerman (z23590872)

Requirements:

• MATLAB 2021a or later
• Image Processing Toolbox

## Suggested steps

• Clone or fork the repository.
• Open MATLAB.
• Open the Zimmerman_Assignment01v2_0_0.mlx Live Script.
• Follow the instructions in the Live Script.
• The images used are part of MATLAB core files.

Daniel Zimmerman (2024). Digital Image Processing - Assignment 1 (https://github.com/z3301/DigImgProc_01/releases/tag/v2.0.1), GitHub. Retrieved June 22, 2024 .


• Zimmerman_Assignment01v2_0_0.mlx

Published versions: 2.0.1.0, 2.0.0.0, 1.0.1, 1.0.0.0 (see the release notes for each release on GitHub).



## Introduction to Digital Image Processing

Digital image processing basics.

• What is a Pixel?

## Image Conversion

• MATLAB | RGB image representation
• How to Convert RGB Image to Binary Image Using MATLAB?
• YIQ Color Model in Computer Graphics
• How to Convert YIQ Image to RGB Image Using MATLAB?
• How to Convert RGB Image to YIQ Image using MATLAB?
• MATLAB | RGB image to grayscale image conversion
• MATLAB | Change the color of background pixels by OTSU Thresholding
• How to Convert RGB Image to HSI Image in MATLAB?
• How to Convert HSI Image to RGB Image in MATLAB?
• How to Partially Color a Gray Image in MATLAB?
• HSV Color Model in Computer Graphics
• How to Perform Color Slicing Using HSV Color Space in MATLAB?

## Image Filtering Techniques

• Spatial Filtering and its Types
• Frequency Domain Filters and its Types
• How to Remove Salt and Pepper Noise from Image Using MATLAB?
• How to Decide Window Size for a Moving Average Filter in MATLAB?
• Noise Models in Digital Image Processing
• How to Apply Median Filter For RGB Image in MATLAB?
• How to Perform Linear Filtering Without Using the imfilter Function in MATLAB?
• Noise addition using in-built Matlab function
• Adaptive Filtering - Local Noise Filter in MATLAB
• Difference between Low pass filter and High pass filter
• MATLAB - Butterworth Lowpass Filter in Image Processing
• MATLAB - Ideal Lowpass Filter in Image Processing
• MATLAB | Converting a Grayscale Image to Binary Image using Thresholding
• Laplacian of Gaussian Filter in MATLAB
• What is Upsampling in MATLAB?
• Upsampling in Frequency Domain in MATLAB
• Convolution Shape (full/same/valid) in MATLAB
• Linear Convolution using C and MATLAB

## Histogram Equalization

• Histogram Equalization in Digital Image Processing
• Histogram Equalization Without Using histeq() Function in MATLAB
• MATLAB | Display histogram of a grayscale Image
• Color Histogram Equalization in MATLAB
• Histogram of an Image

## Object Identification and Edge Detection

• Functions in MATLAB
• Program to determine the quadrant of the cartesian plane
• How to Identify Objects Based on Label in MATLAB?
• What is Image shading in MATLAB?
• Edge detection using in-built function in MATLAB
• Digital Image Processing Algorithms using MATLAB
• MATLAB - Image Edge Detection using Sobel Operator from Scratch
• Image Complement in Matlab
• Image Sharpening Using Laplacian Filter and High Boost Filtering in MATLAB

## PhotoShop Effects in MATLAB

• What is Swirl Effect in MATLAB?
• What is Oil Painting in MATLAB?
• Cone Effect in MATLAB
• What is Glassy Effect in MATLAB?
• What is Tiling Effect in MATLAB?

## Image Geometry, Optical Illusion and Image Transformation

• Matlab program to rotate an image 180 degrees clockwise without using function
• Image Resizing in Matlab
• Nearest-Neighbor Interpolation Algorithm in MATLAB
• Black and White Optical illusion in MATLAB
• MATLAB | Complement colors in a Binary image
• Discrete Cosine Transform (Algorithm and Program)
• 2-D Inverse Cosine Transform in MATLAB
• MATLAB - Intensity Transformation Operations on Images
• Fast Fourier Transformation for polynomial multiplication
• Gray Scale to Pseudo Color Transformation in MATLAB
• Piece-wise Linear Transformation
• Balance Contrast Enhancement Technique in MATLAB

## Morphological Image Processing, Compression and Files

• Boundary Extraction of image using MATLAB
• MATLAB: Connected Component Labeling without Using bwlabel or bwconncomp Functions
• Morphological operations in MATLAB
• Matlab | Erosion of an Image
• Auto Cropping- Based on Labeling the Connected Components using MATLAB
• Run Length Encoding & Decoding in MATLAB
• Lossless Predictive Coding in MATLAB
• Extract bit planes from an Image in Matlab
• How to Read Text File Backwards Using MATLAB?
• MATLAB - Read Words in a File in Reverse Order
• How to Read Image File or Complex Image File in MATLAB?

## Image Coding, Comparison and Texture Features

• Digital Watermarking and its Types
• How To Hide Message or Image Inside An Image In MATLAB?
• How to Match a Template in MATLAB?
• Grey Level Co-occurrence Matrix in MATLAB
• MATLAB - Texture Measures from GLCM

## Difference Between

• Difference Between RGB, CMYK, HSV, and YIQ Color Models
• Difference between Dilation and Erosion

Digital image processing means processing a digital image by means of a digital computer. We can also say that it is the use of computer algorithms to obtain an enhanced image or to extract useful information from it.

Digital image processing is the use of algorithms and mathematical models to process and analyze digital images. The goal of digital image processing is to enhance the quality of images, extract meaningful information from images, and automate image-based tasks.

## The basic steps involved in digital image processing are:

• Image acquisition: This involves capturing an image using a digital camera or scanner, or importing an existing image into a computer.
• Image enhancement: This involves improving the visual quality of an image, such as increasing contrast, reducing noise, and removing artifacts.
• Image restoration: This involves removing degradation from an image, such as blurring, noise, and distortion.
• Image segmentation: This involves dividing an image into regions or segments, each of which corresponds to a specific object or feature in the image.
• Image representation and description: This involves representing an image in a way that can be analyzed and manipulated by a computer, and describing the features of an image in a compact and meaningful way.
• Image analysis: This involves using algorithms and mathematical models to extract information from an image, such as recognizing objects, detecting patterns, and quantifying features.
• Image synthesis and compression: This involves generating new images or compressing existing images to reduce storage and transmission requirements.
Digital image processing is widely used in a variety of applications, including medical imaging, remote sensing, computer vision, and multimedia.

## Image processing mainly includes the following steps:

1. Importing the image via image acquisition tools;
2. Analyzing and manipulating the image;
3. Producing output, which can be an altered image or a report based on the analysis of that image.

## What is an image?

An image is defined as a two-dimensional function, F(x,y), where x and y are spatial coordinates, and the amplitude of F at any pair of coordinates (x,y) is called the intensity of the image at that point. When x, y, and the amplitude values of F are all finite, we call it a digital image. In other words, an image can be defined by a two-dimensional array arranged in rows and columns. A digital image is composed of a finite number of elements, each of which has a particular value at a particular location. These elements are referred to as picture elements, image elements, or pixels; pixel is the term most widely used for the elements of a digital image.

## Types of an image

• BINARY IMAGE – As its name suggests, a binary image contains only two pixel values, 0 and 1, where 0 refers to black and 1 refers to white. This type of image is also known as a monochrome image.
• BLACK AND WHITE IMAGE – An image that consists of only black and white pixels is called a black-and-white image.
• 8-BIT COLOR FORMAT – This is the most common image format. It has 256 different shades and is commonly known as a grayscale image. In this format, 0 stands for black, 255 stands for white, and 127 stands for gray.
• 16-BIT COLOR FORMAT – This is a color image format with 65,536 different colors, also known as High Color format. In this format the distribution of color is not the same as in a grayscale image.

A 16-bit format is actually divided into three further components: red, green, and blue. This is the famous RGB format.

## Image as a Matrix

As we know, images are arranged in rows and columns, so an image can be written as a matrix:

    f(x,y) = [ f(0,0)      f(0,1)      ...  f(0,N-1)
               f(1,0)      f(1,1)      ...  f(1,N-1)
               ...
               f(M-1,0)    f(M-1,1)    ...  f(M-1,N-1) ]

The right side of this equation is a digital image by definition. Every element of this matrix is called an image element, picture element, or pixel.
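The matrix view of an image can be made concrete with a few lines of NumPy (an illustrative sketch; the pixel values here are invented):

```python
import numpy as np

# A tiny 3x3 "digital image": each entry f[x, y] is the intensity
# (amplitude) of the image at spatial coordinates (x, y).
f = np.array([
    [ 12,  50, 200],
    [ 34,  99, 180],
    [  0, 255, 127],
], dtype=np.uint8)

rows, cols = f.shape     # M rows, N columns
intensity = f[0, 2]      # amplitude of f at coordinates (0, 2)
```

Indexing the array at a coordinate pair reads off the pixel intensity, exactly as the matrix equation above suggests.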

## DIGITAL IMAGE REPRESENTATION IN MATLAB:

In MATLAB, indexing starts at 1 instead of 0, so f(1,1) in MATLAB corresponds to f(0,0) in the notation above; the two representations of the image are identical except for the shift in origin. In MATLAB, matrices are stored in variables such as X, x, or input_image. As in other programming languages, variable names must begin with a letter.

## PHASES OF IMAGE PROCESSING:

1. ACQUISITION – It could be as simple as being given an image that is already in digital form. The main work involves: a) scaling, b) color conversion (RGB to gray or vice versa).
2. IMAGE ENHANCEMENT – Among the simplest and most appealing areas of image processing; it is used to bring out hidden details in an image and is subjective.
3. IMAGE RESTORATION – Also deals with improving the appearance of an image, but it is objective: restoration is based on mathematical or probabilistic models of image degradation.
4. COLOR IMAGE PROCESSING – Deals with pseudocolor and full-color image processing; color models are applicable to digital image processing.
5. WAVELETS AND MULTI-RESOLUTION PROCESSING – The foundation for representing images at various degrees of resolution.
6. IMAGE COMPRESSION – Involves developing functions to reduce the amount of data needed to store an image. It mainly deals with image size or resolution.
7. MORPHOLOGICAL PROCESSING – Deals with tools for extracting image components that are useful in the representation and description of shape.
8. SEGMENTATION – Partitions an image into its constituent parts or objects. Autonomous segmentation is the most difficult task in image processing.
9. REPRESENTATION & DESCRIPTION – Follows the output of the segmentation stage; choosing a representation is only part of the solution for transforming raw data into processed data.
10. OBJECT DETECTION AND RECOGNITION – The process that assigns a label to an object based on its descriptors.

## OVERLAPPING FIELDS WITH IMAGE PROCESSING

• According to block 1, if the input is an image and the output is also an image, the process is termed Digital Image Processing.
• According to block 2, if the input is an image and the output is some kind of information or description, it is termed Computer Vision.
• According to block 3, if the input is a description or code and the output is an image, it is termed Computer Graphics.
• According to block 4, if the input is a description, keywords, or code and the output is also a description or keywords, it is termed Artificial Intelligence.

## Advantages of Digital Image Processing:

• Improved image quality: Digital image processing algorithms can improve the visual quality of images, making them clearer, sharper, and more informative.
• Automated image-based tasks: Digital image processing can automate many image-based tasks, such as object recognition, pattern detection, and measurement.
• Increased efficiency: Digital image processing algorithms can process images much faster than humans, making it possible to analyze large amounts of data in a short amount of time.
• Increased accuracy: Digital image processing algorithms can provide more accurate results than humans, especially for tasks that require precise measurements or quantitative analysis.

## Disadvantages of Digital Image Processing:

• High computational cost: Some digital image processing algorithms are computationally intensive and require significant computational resources.
• Limited interpretability: Some digital image processing algorithms may produce results that are difficult for humans to interpret, especially for complex or sophisticated algorithms.
• Dependence on quality of input: The quality of the output of digital image processing algorithms is highly dependent on the quality of the input images. Poor quality input images can result in poor quality output.
• Limitations of algorithms: Digital image processing algorithms have limitations, such as the difficulty of recognizing objects in cluttered or poorly lit scenes, or the inability to recognize objects with significant deformations or occlusions.
• Dependence on good training data: The performance of many digital image processing algorithms is dependent on the quality of the training data used to develop the algorithms. Poor quality training data can result in poor performance of the algorithms.

Digital Image Processing (Rafael C. Gonzalez)

Reference books:

• “Digital Image Processing” by Rafael C. Gonzalez and Richard E. Woods.
• “Computer Vision: Algorithms and Applications” by Richard Szeliski.
• “Digital Image Processing Using MATLAB” by Rafael C. Gonzalez, Richard E. Woods, and Steven L. Eddins.


## Assignment 5: Image Processing

• Image enhancement
• Image segmentation

## Prerequisites (Before you start)

• Read Section 12 Notes .

Tuesday 18/12/2018

## Joining to Assignment Repository

Refer to this sheet to know your group number:

• Go to the Assignment Page.
• Join a group or create another one.
• Wait until your repository is created.

## Requirements

• Load color RGB image.
• Display the image on the screen.
• Convert the image to a grayscale image and display it.
• Get the histogram of the grayscale image and display it.
• Apply histogram equalization on the grayscale image and display the result.
• Get the histogram of the image after histogram equalization. Are they similar?
• Segment the grayscale image using image thresholding and display the result.
• Report all in a Markdown file.
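The grayscale-conversion and thresholding steps above can be sketched in a few lines (Python/NumPy purely for illustration; the fixed threshold of 128 is an arbitrary choice, and the RGB weights are the standard luma coefficients):

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an HxWx3 RGB array to grayscale using the standard luma weights."""
    return (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]).astype(np.uint8)

def threshold(gray, t=128):
    """Global thresholding: pixels >= t become white (255), the rest black (0)."""
    return np.where(gray >= t, 255, 0).astype(np.uint8)

# Tiny synthetic image: one dark pixel, one bright pixel.
rgb = np.array([[[10, 10, 10], [250, 250, 250]]], dtype=np.uint8)
binary = threshold(rgb_to_gray(rgb))
```

In practice you would load a real image and may want an adaptive threshold (e.g. Otsu's method) instead of a fixed one.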

## Important Notes

• You are allowed to discuss task problems with your classmates, but your code must be your own.
• You may take code lines from the internet and include them in your own code, but you must cite the source.
• Sharing a few lines of your own code with classmates is allowed for identifying and fixing bugs; seeing others' solutions before submitting is not.
• The report must include a summary of your implementation, sample results, and the issues that you faced and how you fixed them.
• You must mention any contribution from other classmates.

## How to ask for help?

You can ask me to review your code, give advice, or help fix bugs. It is easy: just commit your buggy code, push it to GitHub, and mention me in a comment, and I will review the code.


## How to Tackle Digital Image Processing Assignments Using MATLAB

Digital Image Processing (DIP) merges technology and creativity, enabling innovative image manipulation and analysis. Whether enhancing image contrast or removing noise, MATLAB offers powerful tools to perform these tasks efficiently. This blog will guide you through solving common DIP assignments, focusing on techniques like histogram equalisation, median filtering, and image arithmetic operations. This guide will provide valuable help with your image processing assignment , ensuring you can effectively use MATLAB to achieve desired results in your projects.

Histogram equalisation enhances image contrast by spreading out intensity values, revealing hidden details. Median filtering effectively removes salt and pepper noise by replacing each pixel with the median value of its neighborhood, preserving edges while smoothing out noise. Image arithmetic operations, such as subtraction and amplification, highlight differences and extract features from images.

By understanding these operations and using MATLAB's built-in functions like imread, imshow, imhist, histeq, imnoise, and medfilt2, you can tackle complex DIP assignments with confidence. Documenting each step and comparing results helps in refining techniques and gaining deeper insights. Continuous practice with different images and noise levels will enhance your proficiency in Digital Image Processing, preparing you for diverse real-world applications.

## Understanding the Basics of Digital Image Processing

Digital Image Processing involves various techniques to enhance, manipulate, and analyze images. Common operations include histogram equalisation, which improves image contrast by redistributing intensity values, making hidden details more visible. Noise reduction using median filters is another key technique, especially effective for removing salt and pepper noise while preserving edges. Basic arithmetic operations on images, such as addition, subtraction, and multiplication, allow for detailed image analysis and feature extraction. These techniques are crucial in a wide range of applications, from medical imaging, where clear and detailed images are vital for accurate diagnosis, to computer vision, which relies on processed images for object recognition, tracking, and analysis. By mastering these operations, one can effectively enhance image quality, reduce noise, and extract meaningful information, making Digital Image Processing an invaluable tool in both scientific research and practical applications.

## Problem 1: Histogram Equalisation

Histogram equalisation is a method to enhance the contrast of an image. By spreading out the most frequent intensity values, this technique makes hidden details more visible.
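Under the hood, this amounts to remapping each gray level through the image's cumulative distribution function. A minimal from-scratch sketch (in Python/NumPy rather than MATLAB, purely for illustration):

```python
import numpy as np

def equalize(gray):
    """Histogram equalization for an 8-bit grayscale image.

    Maps each level r to round(255 * CDF(r)), spreading the most
    frequent intensity values over the full [0, 255] range.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum() / gray.size             # cumulative distribution in [0, 1]
    lut = np.round(255 * cdf).astype(np.uint8)  # lookup table: old level -> new level
    return lut[gray]

# A low-contrast image whose values cluster in [100, 103].
img = np.tile(np.array([100, 101, 102, 103], dtype=np.uint8), (4, 1))
out = equalize(img)
```

After equalization the four clustered levels are stretched across the whole intensity range, which is exactly the effect MATLAB's histeq aims for.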

Step-by-Step Solution:

1. Loading and Displaying the Image: To begin, load the image 'tire.tif' into MATLAB and display it.

When you load and display the image, observe the overall appearance. You might notice that the image has a narrow range of intensity values, making it look dull or washed out.

2. Plotting the Histogram: Next, plot the histogram of the image to analyze its intensity distribution.

The histogram shows how pixel intensities are distributed. A narrow histogram indicates low contrast, where most pixel values are clustered together.

3. Image Inversion: Invert the image using the transformation s = L − r, where L = 255. This operation flips the intensity values, turning dark areas light and vice versa.

B = imread('tire.tif');  % load the image; B is reused in the later steps

imshow(255 - B);

imhist(255 - B);

Inverting the image and plotting its histogram helps you understand the distribution of pixel values after transformation. The histogram will be a mirrored version of the original.

4. Histogram Equalisation: Perform histogram equalisation to enhance the image contrast.

C = histeq(B);

imhist(C);

After equalisation, the histogram should be more spread out, indicating a wider range of intensity values. This operation enhances the contrast, making details more visible.

## Problem 2: Median Filter

Median filtering is a nonlinear process useful for removing noise, especially salt and pepper noise, from images. It replaces each pixel value with the median value of the intensities in its neighborhood.
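The same neighborhood-median idea can be sketched from scratch (Python/NumPy for illustration; note that MATLAB's medfilt2 zero-pads the borders by default, while this sketch replicates edge pixels):

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with edge replication."""
    padded = np.pad(img, 1, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            # Replace each pixel by the median of its 3x3 neighborhood.
            out[i, j] = np.median(padded[i:i+3, j:j+3])
    return out

# A flat image with one "salt" pixel: the neighborhood median removes it.
img = np.full((5, 5), 10, dtype=np.uint8)
img[2, 2] = 255
clean = median_filter3(img)
```

Because the outlier is a minority in every 3x3 window, it vanishes entirely, while a linear averaging filter would have smeared it into its neighbors.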

1. Loading and Displaying the Image: Load the image 'eight.tif' into MATLAB and display it.

Observe the image of the four coins. Notice the clarity and any existing noise.

2. Adding Noise: Introduce salt and pepper noise to the image to simulate a noisy environment.

D = imread('eight.tif');  % the image loaded in step 1
E = imnoise(D, 'salt & pepper', 0.09);

Salt and pepper noise appears as random black and white pixels scattered throughout the image. This noise type is common in real-world scenarios, such as during transmission errors.

3. Applying Median Filter: Apply a median filter to remove the noise.

F = medfilt2(E);

After applying the median filter, the noise should be significantly reduced. The filtered image will have smoother areas where noise was previously present.

## Problem 3: Image Operation

Image arithmetic operations can reveal interesting details and differences between images. In this problem, we will perform operations on the 'eight.tif' image.

1. Loading and Displaying the Image: Re-use the previously loaded image 'eight.tif'.

2. Applying Median Filter Directly: Apply the median filter directly to the original image.

G = medfilt2(D);

The filtered image should be free of noise, similar to the previous median filter application.

3. Performing Image Arithmetic Operations: Perform and display various arithmetic operations to explore their effects.

• Subtract the filtered image from the original.

imshow(D - G);

This operation highlights the noise and details removed by the median filter. It's useful for understanding the impact of the filtering process.

• Multiply the result by 2 and display.

imshow((D - G) * 2);

Multiplying by 2 amplifies the differences, making it easier to see what was removed during filtering.

• Add the amplified result back to the original image.

imshow((D - G) * 2 + D);

This final step combines the original image with the amplified differences, which can enhance certain features or details.
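The subtract-amplify-add pipeline above can be sketched compactly (Python/NumPy for illustration; the arithmetic is done in a wide integer type so the subtraction cannot wrap around, then clipped back to 8 bits — MATLAB's uint8 arithmetic saturates automatically, NumPy's does not):

```python
import numpy as np

def amplify_detail(original, filtered, k=2):
    """Sharpen by adding back k times the detail removed by a smoothing
    filter: the (D - G) * k + D pipeline described above."""
    d = original.astype(np.int32)
    g = filtered.astype(np.int32)
    detail = d - g                              # what the filter removed
    return np.clip(d + k * detail, 0, 255).astype(np.uint8)

original = np.array([[100, 110, 200]], dtype=np.uint8)
smoothed = np.array([[100, 105, 150]], dtype=np.uint8)
sharp = amplify_detail(original, smoothed)
```

Pixels where the filter changed nothing are untouched, while large differences are amplified and saturate at 255.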

## General Approach to Similar Assignments

When faced with similar Digital Image Processing assignments, it's crucial to follow a structured approach. Here are some tips to help you succeed:

• Understand the Operations: Familiarize yourself with the core operations like histogram equalisation, median filtering, and image arithmetic. Understand their purposes and effects on images.
• Use MATLAB Functions Effectively: MATLAB offers a suite of powerful functions for image processing. Learn and use functions like imread, imshow, imhist, histeq, imnoise, and medfilt2. These functions simplify complex operations, allowing you to focus on analysis and interpretation.
• Document Your Steps: Keep thorough documentation of each step, including comments in your code and visual outputs (images and histograms). This practice helps you track your progress and makes it easier to explain your process to others.
• Compare Results: Always compare the original image with the processed images. Analyzing the differences helps you understand the impact of each operation and refine your techniques.
• Practice with Variations: Experiment with different images, noise levels, and processing parameters. Practicing with variations enhances your understanding and prepares you for diverse scenarios.

Digital Image Processing (DIP) is a vital skill in today's technology-driven world. By mastering techniques like histogram equalisation, median filtering, and image arithmetic, you can enhance, manipulate, and analyze images effectively. MATLAB provides powerful tools to perform these tasks, allowing you to tackle complex assignments with confidence.

This guide offers a comprehensive approach to DIP assignments, emphasizing the importance of understanding each operation, using MATLAB functions effectively, documenting steps, and comparing results. True mastery, however, comes from continuous practice and exploration. As you work on more assignments, you'll gain a deeper understanding and appreciation for the power and versatility of DIP.

Embark on your journey with a curious mind and a willingness to experiment. The world of image processing is vast and full of opportunities for creativity and innovation. With persistence and curiosity, you can unlock the full potential of DIP, making significant contributions to fields ranging from medical imaging to computer vision. The possibilities are endless, and your ingenuity is the key to discovering new horizons in this exciting domain.


## Computer Graphics, Spring 2005

Assignment 1: Image Processing, due on Sunday, Feb 20 at 11:59 PM.

In this assignment you will create a simple image processing program. The operations that you implement will be mostly filters which take an input image, process the image, and produce an output image.

## Getting Started

You should use the following skeleton code ( 1.zip or 1.tar.gz ) as a starting point for your assignment. We provide you with several files, but you should mainly change image.cpp :

• main.cpp : Parses the command line arguments and calls the appropriate image functions.
• image.[cpp/h] : Image processing.
• pixel.[cpp/h] : Pixel processing.
• vector.[cpp/h] : Simple 2D vector class you may find useful.
• bmp.[cpp/h] : Functions to read and write Windows BMP files.
• Makefile : A Makefile suitable for UNIX platforms.
• visual.dsp : Project file suitable for Visual C++ on the Windows 2000 platform.

You can find test images at /u/cos426/images (on the OIT accounts, such as arizona.princeton.edu ). After you copy the provided files to your directory, the first thing to do is compile the program. If you are developing on a Windows machine, double-click on image.dsw and select "build" from the build menu. If you are developing on a UNIX machine, type make . In either case, an executable called image (or image.exe ) will be created.

## How the Program Works

The user interface for this assignment was kept as simple as possible so you can concentrate on the image processing issues. The program runs on the command line: it reads an image from standard input, processes it using the filters specified by the command line arguments, and writes the resulting image to standard output. For example, to increase the brightness of the image in.bmp by 10% and save the result in the image out.bmp , you would type:

% image -brightness 0.1 < in.bmp > out.bmp

Other example invocations:

% image -help
% image -contrast 0.8 -scale 0.5 0.5 < in.bmp > out.bmp
% image -contrast 0.8 | image -scale 0.5 0.5 < in.bmp > out.bmp
% image -contrast 0.8 | image -scale 0.5 0.5 < in.bmp | xv -

## What You Have to Do

The assignment is worth 20 points. The following is a list of features that you may implement (listed roughly from easiest to hardest). The number in front of each feature corresponds to how many points it is worth. The features in bold face are required; the others are optional. Refer to this web page for more details on the implementation of each filter and example output images.

• (1/2) Random noise: Add noise to an image.
• (1/2) Brighten: Individually scale the RGB channels of an image.
• (1/2) Contrast: Change the contrast of an image. See Graphica Obscura.
• (1/2) Saturation: Change the saturation of an image. See Graphica Obscura.
• (1/2) Crop: Extract a subimage specified by two corners.
• (1/2) Extract Channel: Leave the specified channel intact and set all others to zero.
• (1) Transition: Create 90 images of a transition between yourself and a fellow classmate. These images will be taken on the first and second day of lectures, and each student will be assigned a pair of images by the TA over e-mail. We recommend implementing a standard cross-fade first and later (i.e., after completing the required components) replacing the fade with the fancier Beier-Neely morph (see below) if you desire.
• (1) Random dither: Convert an image to a given number of bits per channel, using a random threshold.
• (1) Quantize: Change the number of bits per channel of an image, using simple rounding.
• (1) Composite: Compose one image with a second image, using a third image as a matte.
• (1) Create a composite image of yourself and a famous person.
• (2) Blur: Blur an image by convolving it with a Gaussian low-pass filter.
• (2) Edge detect: Detect edges in an image by convolving it with an edge detection kernel.
• (2) Ordered dither: Convert an image to a given number of bits per channel, using a 4x4 ordered dithering matrix.
• (2) Floyd-Steinberg dither: Convert an image to a given number of bits per channel, using dithering with error diffusion.
• (2) Scale: Scale an image up or down by a real-valued factor.
• (2) Rotate: Rotate an image by a given angle.
• (2) Fun: Warp an image using a non-linear mapping of your choice (examples are fisheye, sine, bulge, swirl).
• (3) Morph: Morph two images using the method in Beier & Neely's "Feature-Based Image Metamorphosis." We also provide a stand-alone utility (only runs under Win32) for selecting line correspondences -- these are input to the morph. See this page for details.
• (up to 3) Nonphotorealism: Implement any non-trivial painterly filter. For inspiration, take a look at the effects available in programs like xv, PhotoShop, and Image Composer (e.g., impressionist, charcoal, stained glass, etc.). The points awarded for this feature will depend on the creativity and difficulty of the filter. At most one such filter will receive points.

For any feature that involves resampling (i.e., scale, rotate, "fun," and morph), you have to provide three sampling methods: point sampling, bilinear sampling, and Gaussian sampling.

By implementing all the required features, you get 13 points. There are many ways to get more points: implementing the optional features listed above; (1) submitting one or more images for the art contest; (1) submitting a .mpeg or .gif movie animating the results of one or more filters with continuously varying parameters (e.g., use the makemovie command on the SGIs); (2) winning the art contest. For images or movies that you submit, you also have to submit the sequence of commands used to create them; otherwise they will not be considered valid. It is possible to get more than 20 points. However, after 20 points, each point is divided by 2, and after 22 points, each point is divided by 4. If your raw score is 19, your final score will be 19. If the raw score is 23, you'll get 21.25. For a raw score of 26, you'll get 22.
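The Floyd-Steinberg dither mentioned above can be sketched as follows (Python/NumPy purely for illustration; the assignment itself is written in C++):

```python
import numpy as np

def floyd_steinberg(gray, bits=1):
    """Floyd-Steinberg error-diffusion dithering of an 8-bit grayscale
    image down to the given number of bits per channel."""
    img = gray.astype(np.float64)
    levels = 2 ** bits - 1
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = np.round(old / 255 * levels) / levels * 255  # quantize
            img[y, x] = new
            err = old - new
            # Diffuse the quantization error to unvisited neighbors.
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return np.clip(img, 0, 255).astype(np.uint8)

# Dither a flat mid-gray image to 1 bit: roughly half the pixels go white,
# preserving the average brightness that simple rounding would destroy.
out = floyd_steinberg(np.full((16, 16), 128, dtype=np.uint8))
```

Compare this with simple rounding (the Quantize feature), which would map the entire mid-gray image to a single level.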

## What to Submit

You should submit one archive (zip or tar file) containing: the complete source code with a Visual C project file; the .bmp images (sources, masks, and final) for the composite feature; the .mpeg movie for the movie feature (optional); the images for the art contest (optional); and a writeup. The writeup should be an HTML document called assignment1.html , which may include other documents or pictures. It should be brief, describing what you have implemented, what works and what doesn't, how you created the composite image, how you created the art contest images, and instructions on how to run the fun filters you have implemented. Make sure the source code compiles on the workstations in Friend 017. If it doesn't, you will have to attend a grading session with a TA, and your grade will suffer. Always remember the late policy and the collaboration policy .
A few hints:

• Do the simplest filters first!
• Look at the example pages here and here .
• There are functions to manipulate pixel components and pixels in image.[cpp/h] . You may find them helpful while writing your filters.
• There are functions to manipulate 2D vectors in vector.[cpp/h] . You may find them helpful while writing the morphing functions.
• Send mail to the cos426 staff.
• Stay tuned for more hints.

## Introducing Apple’s On-Device and Server Foundation Models

At the 2024 Worldwide Developers Conference , we introduced Apple Intelligence, a personal intelligence system integrated deeply into iOS 18, iPadOS 18, and macOS Sequoia.

Apple Intelligence comprises multiple highly capable generative models that are specialized for our users’ everyday tasks and can adapt on the fly to their current activity. The foundation models built into Apple Intelligence have been fine-tuned for user experiences such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations with family and friends, and taking in-app actions to simplify interactions across apps.

In the following overview, we will detail how two of these models — a ~3 billion parameter on-device language model, and a larger server-based language model available with Private Cloud Compute and running on Apple silicon servers — have been built and adapted to perform specialized tasks efficiently, accurately, and responsibly. These two foundation models are part of a larger family of generative models created by Apple to support users and developers; this includes a coding model to build intelligence into Xcode, as well as a diffusion model to help users express themselves visually, for example, in the Messages app. We look forward to sharing more information soon on this broader set of models.

## Our Focus on Responsible AI Development

Apple Intelligence is designed with our core values at every step and built on a foundation of groundbreaking privacy innovations.

Additionally, we have created a set of Responsible AI principles to guide how we develop AI tools, as well as the models that underpin them:

• Empower users with intelligent tools : We identify areas where AI can be used responsibly to create tools for addressing specific user needs. We respect how our users choose to use these tools to accomplish their goals.
• Represent our users : We build deeply personal products with the goal of representing users around the globe authentically. We work continuously to avoid perpetuating stereotypes and systemic biases across our AI tools and models.
• Design with care : We take precautions at every stage of our process, including design, model training, feature development, and quality evaluation to identify how our AI tools may be misused or lead to potential harm. We will continuously and proactively improve our AI tools with the help of user feedback.
• Protect privacy : We protect our users' privacy with powerful on-device processing and groundbreaking infrastructure like Private Cloud Compute. We do not use our users' private personal data or user interactions when training our foundation models.

These principles are reflected throughout the architecture that enables Apple Intelligence, connects features and tools with specialized models, and scans inputs and outputs to provide each feature with the information needed to function responsibly.

In the remainder of this overview, we provide details on decisions such as: how we develop models that are highly capable, fast, and power-efficient; how we approach training these models; how our adapters are fine-tuned for specific user needs; and how we evaluate model performance for both helpfulness and unintended harm.

## Pre-Training

Our foundation models are trained on Apple's AXLearn framework , an open-source project we released in 2023. It builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs. We used a combination of data parallelism, tensor parallelism, sequence parallelism, and Fully Sharded Data Parallel (FSDP) to scale training along multiple dimensions such as data, model, and sequence length.

We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.

We never use our users’ private personal data or user interactions when training our foundation models, and we apply filters to remove personally identifiable information, such as social security and credit card numbers, that is publicly available on the Internet. We also filter profanity and other low-quality content to prevent its inclusion in the training corpus. In addition to filtering, we perform data extraction and deduplication, and apply a model-based classifier to identify high-quality documents.
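
A filtering and deduplication pass of this kind can be sketched with standard-library tools. The patterns and redaction strategy below are illustrative stand-ins; the actual pipeline and its filters are not public.

```python
import hashlib
import re

# Hypothetical PII patterns: US-style SSNs and 13-16 digit card-like numbers.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def scrub_pii(text):
    """Redact PII-like substrings before a document enters the corpus."""
    text = SSN_RE.sub("[REDACTED]", text)
    return CARD_RE.sub("[REDACTED]", text)

def deduplicate(docs):
    """Drop exact-duplicate documents by content hash."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["SSN 123-45-6789 leaked", "hello world", "hello world"]
cleaned = [scrub_pii(d) for d in docs]
assert "123-45-6789" not in cleaned[0]
assert len(deduplicate(cleaned)) == 2
```

Real pipelines also use fuzzy (near-duplicate) matching and learned quality classifiers, but the shape of the pass is the same: scrub, hash, keep first occurrence.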

## Post-Training

We find that data quality is essential to model success, so we utilize a hybrid data strategy in our training pipeline, incorporating both human-annotated and synthetic data, and conduct thorough data curation and filtering procedures. We have developed two novel algorithms in post-training: (1) a rejection sampling fine-tuning algorithm with teacher committee, and (2) a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator. We find that these two algorithms lead to significant improvement in the model’s instruction-following quality.
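
The rejection sampling step can be pictured as follows: sample several candidate responses per prompt, score each with a committee of judges, and keep the best candidate for fine-tuning only when the committee agrees it is good enough. The generator and judges below are toy stand-ins, not our teacher models.

```python
# Sketch of rejection sampling with a teacher committee. All names here are
# illustrative; the actual committee models and thresholds are not public.

def rejection_sample(prompt, generate, committee, num_candidates=4, threshold=0.5):
    best, best_score = None, float("-inf")
    for i in range(num_candidates):
        candidate = generate(prompt, seed=i)
        # average the committee's scores for this candidate
        score = sum(judge(prompt, candidate) for judge in committee) / len(committee)
        if score > best_score:
            best, best_score = candidate, score
    # rejected candidates produce no training example at all
    return best if best_score >= threshold else None

generate = lambda prompt, seed: f"{prompt} answer-{seed}"
committee = [lambda p, c: 1.0 if c.endswith("answer-2") else 0.0,
             lambda p, c: 0.8 if "answer" in c else 0.0]
kept = rejection_sample("Q:", generate, committee)
assert kept == "Q: answer-2"   # highest committee score, above threshold
```

The filtering is the point: only responses the committee rates highly ever become fine-tuning targets, which raises the quality floor of the training data.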

## Optimization

In addition to ensuring our generative models are highly capable, we have used a range of innovative techniques to optimize them on-device and on our private cloud for speed and efficiency. We have applied an extensive set of optimizations for both first token and extended token inference performance.

Both the on-device and server models use grouped-query-attention. We use shared input and output vocab embedding tables to reduce memory requirements and inference cost. These shared embedding tensors are mapped without duplications. The on-device model uses a vocab size of 49K, while the server model uses a vocab size of 100K, which includes additional language and technical tokens.
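
Tying the input and output embeddings means a single table serves both directions: rows are looked up for input tokens, and output logits are the dot products of the hidden state with those same rows. A toy sketch (real vocab sizes are 49K and 100K, as above; the dimensions here are illustrative):

```python
# Sketch of tied input/output embeddings: one shared table E is used both for
# token lookup and for computing output logits, halving embedding memory.

E = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # shared table: vocab=3, dim=2

def embed(token_id):
    return E[token_id]                     # input side: row lookup

def logits(hidden):
    # output side: score each vocab entry as a dot product with its row of E
    return [sum(h * w for h, w in zip(hidden, row)) for row in E]

h = embed(1)
out = logits(h)
assert out.index(max(out)) == 2            # the row best aligned with h wins
```

With a 100K-row table this tying saves one full vocab-by-dimension matrix, which is why it matters for both memory and inference cost.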

For on-device inference, we use low-bit palettization, a critical optimization technique that meets the necessary memory, power, and performance requirements. To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.5 bits-per-weight — to achieve the same accuracy as the uncompressed models.
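
Palette-based quantization replaces each weight with the index of its nearest entry in a small shared palette: b bits per weight allows a palette of 2^b values. The sketch below models that, plus the mixed 2-bit/4-bit layout as a per-block bit width; the uniform palette and block sizes are our simplifications.

```python
# Sketch of palette-based low-bit quantization. A uniform palette is used here
# for simplicity; real palettes are typically fit to the weight distribution.

def build_palette(weights, bits):
    lo, hi = min(weights), max(weights)
    n = 2 ** bits
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def quantize(weights, palette):
    """Map each weight to the index of its nearest palette entry."""
    return [min(range(len(palette)), key=lambda i: abs(palette[i] - w))
            for w in weights]

block = [-1.0, -0.2, 0.3, 1.0]
palette = build_palette(block, bits=2)         # 4 shared values for 2 bits
idx = quantize(block, palette)
assert idx == [0, 1, 2, 3]

bit_widths = [2, 4, 4, 4]                       # hypothetical per-block mix
assert sum(bit_widths) / len(bit_widths) == 3.5 # average bits-per-weight
```

Storing only small indices plus one shared palette per group of weights is what delivers the memory and bandwidth savings.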

Additionally, we use an interactive model latency and power analysis tool, Talaria, to better guide the bit-rate selection for each operation. We also utilize activation quantization and embedding quantization, and have developed an approach to enable efficient Key-Value (KV) cache update on our neural engines.
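
The role of the KV cache is easy to show: each decoding step appends that step's keys and values so earlier tokens are never re-encoded. This minimal sketch models only the append semantics; the efficient in-place update on the neural engine is not modeled.

```python
# Minimal sketch of a Key-Value (KV) cache for autoregressive decoding.

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def update(self, k, v):
        """Append one decoding step's key and value vectors."""
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = KVCache()
for step in range(3):                 # three decoding steps
    cache.update(k=[float(step)], v=[float(step) * 2])
assert len(cache) == 3                # one cached entry per generated token
assert cache.values[-1] == [4.0]
```

Without the cache, step t would recompute attention inputs for all t previous tokens, making generation quadratic in sequence length instead of linear.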

With this set of optimizations, on iPhone 15 Pro we are able to reach a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second. Notably, this performance is attained before employing token speculation techniques, which further improve the token generation rate.
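
As a back-of-the-envelope check of what those figures mean end to end, consider a hypothetical 500-token prompt followed by 100 generated tokens:

```python
# Latency arithmetic from the reported figures: 0.6 ms per prompt token to the
# first token, then 30 tokens/second of generation. Lengths are hypothetical.

prompt_tokens, output_tokens = 500, 100
ttft_s = prompt_tokens * 0.6e-3          # 0.30 s to the first token
generation_s = output_tokens / 30        # ~3.33 s to stream the rest
total_s = ttft_s + generation_s
assert abs(ttft_s - 0.3) < 1e-9
assert round(total_s, 2) == 3.63
```

The split matters for perceived responsiveness: the first token appears in well under half a second, and the remainder streams at a comfortable reading pace.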

## Model Adaptation

Our foundation models are fine-tuned for users’ everyday activities, and can dynamically specialize themselves on-the-fly for the task at hand. We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks. For our models, we adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feedforward networks for a suitable set of the decoding layers of the transformer architecture.

By fine-tuning only the adapter layers, the original parameters of the base pre-trained model remain unchanged, preserving the general knowledge of the model while tailoring the adapter layers to support specific tasks.

We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require tens of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.
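
That size is easy to sanity-check: a rank-r LoRA on a (d_in × d_out) matrix adds r·(d_in + d_out) parameters. The layer shapes and layer count below are hypothetical stand-ins for a ~3B-parameter decoder, chosen only to show that the estimate lands in the tens of megabytes at 16 bits per parameter.

```python
# Rough adapter-size estimate. All dimensions and the layer count are
# hypothetical; only the rank (16) and precision (16-bit) come from the text.

def lora_params(d_in, d_out, rank=16):
    return rank * (d_in + d_out)

# per layer: four attention matrices plus up/down feedforward projections
per_layer_matrices = [(3072, 3072)] * 4 + [(3072, 8192), (8192, 3072)]
per_layer = sum(lora_params(di, do) for di, do in per_layer_matrices)
total_params = per_layer * 28                  # hypothetical 28 decoder layers
size_mb = total_params * 2 / (1024 ** 2)       # 2 bytes per 16-bit parameter
assert 10 < size_mb < 100                      # lands in the tens of megabytes
```

A payload that small is what makes on-the-fly loading and swapping of task-specific adapters practical.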

To facilitate the training of the adapters, we created an efficient infrastructure that allows us to rapidly retrain, test, and deploy adapters when either the base model or the training data gets updated. The adapter parameters are initialized using the accuracy-recovery adapter introduced in the Optimization section.

## Performance and Evaluation

Our focus is on delivering generative models that can enable users to communicate, work, express themselves, and get things done across their Apple products. When benchmarking our models, we focus on human evaluation as we find that these results are highly correlated to user experience in our products. We conducted performance evaluations on both feature-specific adapters and the foundation models.

To illustrate our approach, we look at how we evaluated our adapter for summarization. As product requirements for summaries of emails and notifications differ in subtle but important ways, we fine-tune accuracy-recovery low-rank (LoRA) adapters on top of the palettized model to meet these specific requirements. Our training data is based on synthetic summaries generated from larger server models, filtered by a rejection sampling strategy that keeps only high-quality summaries.

To evaluate the product-specific summarization, we use a set of 750 responses carefully sampled for each use case. These evaluation datasets emphasize a diverse set of inputs that our product features are likely to face in production, and include a stratified mixture of single and stacked documents of varying content types and lengths. Because these are product features, it was important to evaluate performance against datasets that are representative of real use cases. We find that our models with adapters generate better summaries than a comparable model.
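
A stratified mixture of this kind is built by allocating the evaluation budget across strata in proportion to their expected share of production traffic. The strata names and weights below are illustrative; only the 750-example total comes from the text.

```python
# Sketch of stratified allocation for an evaluation set. Strata and weights
# are hypothetical; the production mix is not public.

def stratified_counts(strata_weights, total=750):
    counts = {name: int(total * w) for name, w in strata_weights.items()}
    # hand any rounding remainder to the largest stratum
    remainder = total - sum(counts.values())
    largest = max(strata_weights, key=strata_weights.get)
    counts[largest] += remainder
    return counts

weights = {"single/short": 0.4, "single/long": 0.2,
           "stacked/short": 0.24, "stacked/long": 0.16}
counts = stratified_counts(weights)
assert sum(counts.values()) == 750
assert counts["single/short"] == 300
```

Fixing the mixture up front keeps evaluation results comparable across model versions, since score changes reflect the model rather than a drifting test distribution.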

As part of responsible development, we identified and evaluated specific risks inherent to summarization. For example, summaries occasionally remove important nuance or other details in ways that are undesirable. However, we found that the summarization adapter did not amplify sensitive content in over 99% of targeted adversarial examples. We continue to adversarially probe to identify unknown harms and expand our evaluations to help guide further improvements.

In addition to evaluating feature specific performance powered by foundation models and adapters, we evaluate both the on-device and server-based models’ general capabilities. We utilize a comprehensive evaluation set of real-world prompts to test the general model capabilities. These prompts are diverse across different difficulty levels and cover major categories such as brainstorming, classification, closed question answering, coding, extraction, mathematical reasoning, open question answering, rewriting, safety, summarization, and writing.

We compare our models with both open-source models (Phi-3, Gemma, Mistral, DBRX) and commercial models of comparable size (GPT-3.5-Turbo, GPT-4-Turbo)[1]. We find that our models are preferred by human graders over most comparable competitor models. On this benchmark, our on-device model, with ~3B parameters, outperforms larger models including Phi-3-mini, Mistral-7B, and Gemma-7B. Our server model compares favorably to DBRX-Instruct, Mixtral-8x22B, and GPT-3.5-Turbo while being highly efficient.

We use a set of diverse adversarial prompts to test the model performance on harmful content, sensitive topics, and factuality. We measure the violation rates of each model as evaluated by human graders on this evaluation set, with a lower number being desirable. Both the on-device and server models are robust when faced with adversarial prompts, achieving violation rates lower than open-source and commercial models.
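
The metric itself is simple: the violation rate is the fraction of adversarial prompts whose response human graders marked as violating, so lower is better. The grades below are toy data.

```python
# Violation rate over a set of graded adversarial prompts.

def violation_rate(grades):
    """grades: 1 = graders marked the response as violating, 0 = acceptable."""
    return sum(grades) / len(grades)

grades = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # 1 violation out of 10 prompts
assert violation_rate(grades) == 0.1
```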

Our models are preferred by human graders as safe and helpful over competitor models for these prompts. However, considering the broad capabilities of large language models, we understand the limitation of our safety benchmark. We are actively conducting both manual and automatic red-teaming with internal and external teams to continue evaluating our models' safety.

To further evaluate our models, we use the Instruction-Following Eval (IFEval) benchmark to compare their instruction-following capabilities with models of comparable size. The results suggest that both our on-device and server model follow detailed instructions better than the open-source and commercial models of comparable size.

We evaluate our models’ writing ability on our internal summarization and composition benchmarks, consisting of a variety of writing instructions. These results do not refer to our feature-specific adapter for summarization (seen in Figure 3), nor do we have an adapter focused on composition.

## Conclusion

The Apple foundation models and adapters introduced at WWDC24 underlie Apple Intelligence, the new personal intelligence system that is integrated deeply into iPhone, iPad, and Mac, and enables powerful capabilities across language, images, actions, and personal context. Our models have been created with the purpose of helping users do everyday activities across their Apple products, and developed responsibly at every stage and guided by Apple’s core values. We look forward to sharing more information soon on our broader family of generative models, including language, diffusion, and coding models.

[1] We compared against the following model versions: gpt-3.5-turbo-0125, gpt-4-0125-preview, Phi-3-mini-4k-instruct, Mistral-7B-Instruct-v0.2, Mixtral-8x22B-Instruct-v0.1, Gemma-1.1-2B, and Gemma-1.1-7B. The open-source and Apple models are evaluated in bfloat16 precision.

