AI prototyping
Developing applications that leverage deep learning can require a significant investment of time and resources.
One effective way to reduce costs is rapid prototyping combined with concept visualization during the product design phase.
This helps product teams discover conceptual issues early in development and maximize value for customers.
Why rapidly prototype AI applications?
I recommend developing simple Proof of Concept (PoC) applications that serve as interactive presentations during product design discussions.
These PoCs can enhance communication in multidisciplinary teams by providing a tangible representation of complex product concepts (ensuring that everyone is aligned and properly informed).
Benefits of this approach include:
Enhanced user experience:
Identifying potential usability challenges and addressing them proactively.
Accelerated time-to-market:
By validating ideas and features early, we can expedite the overall product release timeline.
Cost savings:
Early detection of design flaws, preventing costly modifications later in development.
Mitigation of design risks.
Faster development cycles, leading to quicker deployment.
Let’s explore this approach with a practical example: concept visualization of a computer vision application that classifies traffic objects as an essential component of an autonomous driving system.
Exercise 1/3 (Wireframe)
Examine the following wireframe and try to envision:
How should this application operate?
How does it address customer needs?
What are the critical risks and challenges?
Which object categories should we incorporate?
Traffic object classifier (prototype)
Exercise 2/3 (Prototype)
Play with the prototype and consider:
How should this application operate?
How does it address customer needs?
What are the critical risks and challenges?
Which object categories should we incorporate?
Exercise 3/3 (Evaluation)
Compare the results from both exercises. Which case yielded:
More insightful answers?
Easier flow of ideas?
Better clarity on how the app should (or should not) behave?
Chances are that some of your initial ideas were already implemented in the prototype, and experimenting with the interactive PoC has likely led to new insights on what needs to be addressed in the final application.
Framework for the application design with rapid prototyping:
Product design (iteration cycle):
Concept (value proposition)
Product design discussion
UX/UI (sketches + wireframes)
Product development discussion
Prototype (internal PoC)
Testing & feedback sessions
Solution adjustment decisions
Approach tailored to project complexity:
One large product
Product design:
Concept
UX/UI
Prototype
Application development:
SW development
Static + dynamic tests
Adjustments & fine-tuning
Production:
Deploying application
Monitoring & maintaining
Updating (new features)
Alternatively, the single large cycle can be replaced with a "many small products" iteration cycle:
Many small products
Concept
UX/UI
Prototype
Development
Testing
Fine-tuning
Production
Monitoring & maintaining
How to develop AI prototype?
Timespan:
The entire prototype application (including deployment to production) took approximately 3 days to develop. Once the pipeline is established for a specific task (such as image recognition), similar prototypes can be created in 4-8 hours.
Technology:
I have selected the following technology stack for rapid experimentation and PoC accessibility:
Environment: Jupyter notebook, Google Colab.
Python libraries: PyTorch, Fastai.
Model architecture: Pre-trained convolutional neural network (ResNet18 with the output layer re-trained with the custom dataset).
Training:
This is a quick & dirty approach, suitable for developing an initial prototype and discovering early the risks and challenges inherent to the particular data/task:
Collecting images via automated web search and labeling based on the search query.
Sorting images based on the prediction loss and probability.
Re-labeling misclassified items.
Re-training the CNN and exporting the pickled model.
Production:
The prototype is publicly accessible via a user-friendly web interface:
Application for inference, with an image example for each category.
The web interface was built with Gradio for ease of use.
The prototype is hosted on Hugging Face Spaces.
How to discover and address technology limitations:
Input categories
Empty city (clear road without obstacles).
Pedestrian walking (semi-predictable movement).
Animal walking (unpredictable movement).
Car (rapidly moving object).
Speed limit (a specific type of roadsign).
Roadsign (with command or information).
Traffic lights (specific static objects).
Data cleaning & model fine-tuning
Collecting images via automated search:
Labels are based on the search query ("Photo road speed limit" puts images into the "Speed limit" category).
That results in mostly garbage/noisy data, which we can nevertheless use to train our model and identify problematic areas.
Fig.1: Sample batch of input images
Sorting images by prediction loss and probability helps us to:
Quickly discover the most challenging data.
Determine specific requirements for an effective dataset.
Consider alternative approaches (such as image segmentation and categorizing objects-of-interest).
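The sorting idea can be illustrated with plain Python. In Fastai, `ClassificationInterpretation.from_learner(learn)` and its `plot_top_losses` provide this out of the box; the records below are made up:

```python
# Hypothetical per-image prediction records: (filename, loss, predicted
# probability of the assigned label).
records = [
    ("road_001.jpg", 0.12, 0.95),
    ("crosswalk_17.jpg", 2.84, 0.31),
    ("lights_03.jpg", 1.97, 0.44),
    ("car_42.jpg", 0.05, 0.99),
]

# Sort by loss (descending): the most confidently wrong images surface
# first, which is exactly where mislabeled or ambiguous data tends to hide.
worst_first = sorted(records, key=lambda r: r[1], reverse=True)

for name, loss, prob in worst_first[:2]:
    print(f"{name}: loss={loss:.2f}, prob={prob:.2f}")  # candidates for re-labeling
```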
Re-labeling misclassified items:
Helps us to train our model on features relevant for each category.
This should be reflected in improved metrics (accuracy or error rate).
Fig.2: Re-labeling problematic images dramatically improves the model's ability to correctly categorize validation images.
Re-training the model:
Returns surprisingly decent predictions on the validation dataset (considering the limited size and relatively poor quality of the input data).
Fig.3: Comparing prediction metrics for the pre-trained ResNet18, the model re-trained on collected images, and the final model re-trained on re-labeled images.
Prototype limitations (overview)
To effectively use our prototype, we must distinguish between two types of limitations:
a) Simplification limitations:
Arise due to limited time for rapid prototyping (aspects of the solution that we deliberately neglected).
They should NOT obscure the conceptual design of the application.
Examples include low accuracy, a crude interface, and internal use only.
We can accept these limitations if the prototype still provides a tangible representation of product concepts and functional design.
b) Technology-intrinsic limitations:
Stem from the task nature and input data properties (rapid analysis of multiple images in a short time, crucial objects may occupy only a small portion of an image).
Examples include object misclassification and privacy violation.
The prototype should help us discover these limitations early (possibly as part of a feasibility study).
We have to address these limits to ensure that our application functions properly, with risks mitigated to an acceptable level.
In some cases, discovered limitations may lead to a well-founded decision to halt the project before investing significantly in software development (if estimated costs exceed the desirable RoI, or the solution poses an unacceptable compliance risk).
Addressing limitations (walkthrough)
Let's focus on insights from exploring the technology-intrinsic limitations of the Traffic object classifier prototype, and address them appropriately for practical use in a final application.
PRIORITIZATION:
Fig.4: Misclassification of a traffic lights image.
Our model barely recognizes the presence of traffic lights in the image (7% probability), and the image is wrongly classified as an empty city. We can address this error by:
Sub-categorization: Roughly categorizing an entire image would result in an inconveniently large number of necessary sub-categories (Right of way/Traffic lights/Signal arrows).
Segmentation: Instead of sub-categorizing, we can pre-process images using segmentation to identify relevant objects-of-interest ('Traffic lights', 'Signal arrow', 'Give right of way', 'Forward of right', 'Empty road').
Prioritization: Establishing an order of categories where prioritized objects are processed first (Pedestrian > Animal > Car > Traffic lights > Roadsign (command) > Speed limit > Roadsign (information) > Empty city).
Metrics: Given the high cost of any false-negative detection, it is better to have more false-positive detections of prioritized objects (street lamps misclassified as traffic lights) than to miss a true-positive object even once, so we prefer performance metrics that reflect this tradeoff (higher recall at the cost of lower precision).
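A minimal sketch of the prioritization logic described above, assuming a hypothetical priority list and probability threshold:

```python
# Hypothetical priority order from the walkthrough: lower index = higher priority.
PRIORITY = ["Pedestrian", "Animal", "Car", "Traffic lights",
            "Roadsign (command)", "Speed limit",
            "Roadsign (information)", "Empty city"]

def prioritize(detections, threshold=0.2):
    """Return detected categories ordered by priority, not by raw probability.

    A deliberately low threshold favors recall: a street lamp occasionally
    flagged as traffic lights is acceptable; a missed pedestrian is not.
    """
    hits = [cat for cat, prob in detections.items() if prob >= threshold]
    return sorted(hits, key=PRIORITY.index)

# Example: traffic lights at only 25% probability still outrank a confident
# "Empty city" prediction.
detections = {"Empty city": 0.70, "Traffic lights": 0.25}
print(prioritize(detections))  # → ['Traffic lights', 'Empty city']
```

Ordering by a fixed priority list rather than by model confidence encodes the safety tradeoff directly into the decision logic.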
SYSTEM ROBUSTNESS:
Fig.5: Misclassification of a walking pedestrian image.
A pedestrian crossing the road was not prioritized in this example (3% probability), which would have fatal consequences! To mitigate the risk of such a severe error, we can consider implementing multiple measures:
Set strict criteria for high-risk aspects that the application has to meet before being approved for production deployment (pedestrian recognized AND prioritized with 99.999% accuracy).
Define requirements for a comprehensive dataset with appropriately labeled data, including many variations with even distribution (people from multiple countries, wearing various clothes, under different lighting conditions).
This prerequisite enables meeting the strict criteria.
Aggregating multi-modal inputs entering a decision algorithm can make the system more robust and resilient against a single point of failure, such as sensor malfunction or object misclassification (multiple images from different cameras, LIDAR, infrared camera or thermal scanner, GPS/Galileo, mobile/wireless device detection).
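One way to sketch such multi-modal aggregation is a weighted vote across sensors. The sensor names, confidences, and weights below are purely illustrative:

```python
def fuse(sensor_outputs, weights):
    """Weighted vote across sensors so no single input decides alone."""
    scores = {}
    for sensor, (label, conf) in sensor_outputs.items():
        scores[label] = scores.get(label, 0.0) + weights.get(sensor, 1.0) * conf
    return max(scores, key=scores.get)

# Hypothetical scenario: the front camera misses the pedestrian, but the
# other modalities outvote it.
sensor_outputs = {
    "camera_front": ("Empty city", 0.60),
    "camera_side": ("Pedestrian", 0.55),
    "lidar": ("Pedestrian", 0.80),
    "thermal": ("Pedestrian", 0.70),
}
weights = {"camera_front": 1.0, "camera_side": 1.0, "lidar": 1.2, "thermal": 1.1}

print(fuse(sensor_outputs, weights))  # → Pedestrian
```

A production system would use a far more sophisticated fusion model, but even this sketch shows how redundancy across modalities protects against a single misclassifying sensor.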
Conclusion:
The proposed rapid prototyping framework helped us identify and properly address critical challenges in developing the traffic object classifier algorithm as an essential component of a robust autonomous driving system. Key considerations for designing such a production-ready application include:
Balancing object categories and their prioritization.
Defining requirements on a comprehensive dataset, acquiring the data and validating its suitability for model training.
Selecting appropriate metrics for assessing model performance and setting strict criteria for high-risk aspects to achieve business goals without introducing an unacceptable risk.
Experimenting with suitable model designs to find the optimal technical solution:
Image segmentation + multi-target models.
Selecting model architecture & fine-tuning.
Aggregating multi-modal inputs for developing robust ensemble learning model.
Summary:
Prototyping deep learning (AI) applications can help product teams find technical and conceptual challenges early in the product design process, supporting business goals while mitigating risks.
Addressing these challenges enables teams to make informed decisions about the optimal solution design before investing significantly in software development of the final product.
The outlined framework for application design with rapid prototyping can be readily adopted by agile development teams to improve the efficacy of product discussions, by making complex technical challenges understandable to non-technical team members.