Fatih Porikli

Qualcomm AI Research is hiring AI, ML, Computer Vision researchers in San Diego, Amsterdam, Toronto, Korea, Vietnam

Areas include: generative AI, LLMs, agentic, model efficiency, deep learning, perception, and computer vision!

Applications include: auto, mobile, XR, camera, robotics, IoT, cloud, more

Contact:

fporikliqualcomm.com

Bio:

Fatih Porikli is an IEEE Fellow and Vice President of Technology at Qualcomm AI Research. With a highly regarded career spanning both academia and industry, he has held leadership positions that have driven pioneering advancements in generative AI and technology. Dr. Porikli was a tenured professor at the Research School of Engineering at the Australian National University (ANU), where he made foundational contributions to machine learning. In industry, he served as Vice President of the San Diego Device Hardware Competency Center and Chief Scientist for Autonomous Vehicles at Futurewei. Prior to these roles, he led the Computer Vision Research Group at NICTA (now part of CSIRO). Dr. Porikli also served as a Distinguished Research Scientist at Mitsubishi Electric Research Laboratories (MERL), where he played a key role in advancing the field of computer vision. Earlier in his career, he developed satellite imaging solutions at HRL Laboratories and 3D display systems at AT&T Research Labs. Dr. Porikli earned his PhD from New York University in 2002.

He has made significant contributions to generative AI, machine learning, computer vision, and perception, with expertise in large language models (LLMs), multi-modal content generation (3D, visual, and audio), and representation learning. His work spans key areas including deep learning, manifold learning, online and on-device learning, sparse optimization, multimedia processing, data analytics, detection, tracking, recognition, and image enhancement. These innovations have driven advancements in a wide range of commercial applications, including autonomous vehicles, mobile devices, AR/XR, consumer electronics, video surveillance, biometrics, radiotherapy, industrial inspection, satellite systems, and defense.

Dr. Porikli received the prestigious R&D 100 Scientist of the Year Award in 2006 and has won eight best paper awards at top conferences. He has also been honored with eight professional prizes. With over 360 coauthored publications and more than 200 patents, he is a prolific contributor to the field. Additionally, Dr. Porikli has coedited two books.

As a pioneer in the AI and computer vision research community, he has served as Associate Editor for several prestigious journals, including IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Multimedia, IEEE Signal Processing Magazine, SIAM Imaging Sciences, Springer Journal on Machine Vision Applications, EURASIP Journal on Image and Video Processing, and Springer Journal on Real-time Image and Video Processing. He has held leadership roles at major computer vision and AI conferences, including serving as General Chair for application-driven events and as Area Chair for top-tier conferences such as CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, AAAI, BMVC, ACCV, and ICASSP. Additionally, he has served on multiple NSF panels, co-organized over 20 workshops, and delivered keynotes at numerous events. He supervised more than 60 PhD students.

His h-index: 93, i10-index: 329, citation count: 45,000+. See my google scholar page

Industrial Research Series 2024 (start with part-1!)

Industrial Research for AI and Computer Vision Practitioners - Part 5

Industrial Research for AI and Computer Vision Practitioners - Part 4

Industrial Research for AI and Computer Vision Practitioners - Part 3

Industrial Research for AI and Computer Vision Practitioners - Part 2

Industrial Research for AI and Computer Vision Practitioners - Part 1

Podcast about our 16 papers at CVPR 2024

With Sam Charrington, we discuss advancements across generative AI and traditional computer vision with a focus on enhancing training and inference efficiency for mobile and edge devices, including efficient diffusion models for text-to-image generation, grounded reasoning in videos, 360° image generation for video portrait relighting, unique video-language model for situated interactions, visual reasoning model and benchmark for interpreting complex mathematical plots, and more! Plus a peek at our demos that we are presenting at the conference, including multi-modal vision-language models (LLaVA) and parameter-efficient fine tuning (LoRA) on mobile devices.

Listen or watch the full episode here

My interview at IEEE The Institute published June 2016 ...

Here is the extended version:

What interested you in computer vision?

The fact that we sense, perceive and understand the world, and even our dreams, by seeing it through our eyes has always fascinated me. When you consider, they are simply a mixture of blood, tissue, and biochemical molecules, yet they are incredible windows opening the whole universe to our being. What was the magical essence of these small organs that made them so precious that no person in right mind would exchange them for all treasures in the world? What was so addictive about seeing that we fall afraid when the light is out? How do we do it so effortlessly? I guess these and similar questions have reverberated so strongly in my moments of soul-searching for what I should be doing that eventually I channeled myself into computer vision.

You grew up in Turkey. How are robotics and AI portrayed and thought of in Turkish society?

I finished my undergraduate study in Turkey and then moved to the USA. I grew up in an eastern city that is so cold that the rumor is when a cat jumps through a fence in winter it remains frozen until the next spring. So, many months, like many others in my town, I also ended up staying indoors and finding the escape in my books. Most of those books were in science fiction by Stanislav Lem, Isaac Asimov, and Arthur Clarke, and portraying futuristic depictions of space travel, planets, and societies where intelligent machines are integral, and in many cases, disruptive part of civilizations. So, for me, AI and robotics were very familiar and natural. It should not be a surprise that, in Turkish culture, technical innovation has always been regarded as one of the best things you can accomplish, at a degree that electrical engineering and computer science were (and still are) the top choices for the majority of prospective university students.

You have made many contributions to object detection, image processing and video analytics. What are the breakthroughs you are most proud of?

Research is a collaborative effort. Even though it is sometimes tempting to label or get labeled that way, I find it unfair to designate a single person as the only leader in almost any technical research area. So, this is also my approach in evaluation of my work. Without being inspired and supported by a league of genius colleagues and Ph.D. students, I would not be able to make many contributions. Judging by people's feedback, I can say that my work in integral histograms and region covariance descriptors are two breakthroughs that had a significant impact on computer vision in the past. Yet, I am most proud of raising amazing students.

Why is computer vision important?

Computer vision is the discipline concerned with making the best use of visual data and understanding the world through signals, images, and videos. It provides enabling technologies for innovative applications and services in a wide spectrum of industry sectors including medical, information and telecommunication, infotainment, automotive, transportation, environment, agribusiness, surveillance, security, defense and aerospace, and manufacturing. Anywhere we can squeeze a camera; on a UAV or inside the human body, computer vision has the potential to provide intelligence and automation. It has a disruptive nature and constantly seeks for creating new services, values, and models. This means it promises a high return on investment that comes with high risk, i.e. longer research and development times.

What do you think are the best examples of computer vision in use today?

I think many people would notice fingerprint recognition and face detection. When you touch your mobile phone button, it automatically scans and recognizes your fingerprint. Your camera automatically detects faces in the frame to focus on and adjust several camera functions. These are two examples embedded in our daily life as the tip of an iceberg of computer vision applications. The mass is below the surface and running behind the curtain. For instance, using sophisticated computer vision algorithms, driver assistance, and collision avoidance systems provide drivers and pedestrians with life-saving safety features. Using computer vision solutions, depth cameras enable marvelous gaming experience. Using computer vision, industrial automation systems literally separate the wheat (and any grain) from the chaff. Using visual information, optical mouse lets us interact with our computers.

Deep learning is making huge waves these days. How do you define deep learning?

Yes, it is making a big impact almost at the scale and disturbance levels of a tsunami. I define it as a gentle awakening; an awakening to new possibilities. Let me first briefly describe what it is, then continue our talk about why it has the capacity of changing the research paradigm. Deep learning is a spectrum of data-driven modeling and inference solutions that use multiple processing layers composed of neural receptive fields and nonlinear transformations such as pooling. Each successive layer in the network uses the output from the previous layer as input. In this way, the network progressively derives higher level features from lower level features to form a hierarchical representation. For classification tasks, the network is ordinarily given target labels for training samples. During the training process, the network parameters are updated iteratively by propagating the difference between the target label and the network's response (this difference is also called as residual or error) to minimize the difference. For representation power, such learning networks are regularly designed to be deep with a high number of layers containing millions of neurons and coefficients.

How could deep learning boost computer vision? What could it do that other machine learning systems couldn't do?

It might worth to take a look at some of the past research activity in computer vision before the era of deep learning. For classification tasks, we all recognized the importance of designing competent feature descriptors and the necessity of having part-based object models in particular when the objects are articulated bodies, occluded, or in different poses. Another challenge that many computer vision methods attempted was the incorporation of local and global models into a comprehensive and elegant structure. These descriptors and part-based models are often hand-crafted and dictated by heuristic considerations making many assumptions on what is important and what is dispensable about the data. Incorporation of local and global models is handicapped due to the application and user-defined heuristics as well. Now, there are many compelling advantages of such deep learning networks. One is that the network learns itself what is important from the data without having a human to define them. Many deep learning networks for computer vision have convolutional layers that contain simple yet multiple filters with unknown coefficients. These coefficients are learned from data for the specific task in hand. To put simply, filters -and features- are automatically learned to optimize performance for the given task. No more limitation to human designed features. Another advantage is that higher convolutional layers of deep learning networks have larger receptive neural fields thanks to the pooling operations that combine multiple filter response images after a convolutional layer into a few numbers. For instance, pooling can be done by choosing the maximum value within a 2-by-2 cell, which effectively halves the size of the output. These output images are provided to next convolutional layer. Even if the size of the filters in the next convolution layer is same with the current layer filters, the receptive field of next layer filters will be twice as large on the image entering to the current layer. By doing so, convolutional neural networks learn the discriminative and informative parts of the target class, for example for a human face, the parts would be eyes, nose, etc. In other words, object-part models are learned and located automatically. Therefore, convolutional neural networks eliminate the dependency on user defined sub-optimal object-part models. Yet another significant advantage of such networks is their capacity to be able to incorporate nonlinearity in its models, which is critical in complicated classification tasks where data is not linearly separable. Still another important advantage is that such networks are very applicable to modern computational architecture since they are task-parallelizable and can run in a forward processing fashion without requiring search operations and recurrent loops. Two recent developments made the deep learning applications possible for computer vision. One would be the emergence of large annotated image datasets that are necessary to learn the often millions many coefficients of deep learning networks. The other is the increasing computational capability, in particular, parallel processing platforms GPUs that enabled the training task to be executed in reasonable timeframes, which otherwise would have taken years.

You organized the IEEE deepvision workshop in 2015. Did it go as you had expected?

No, it exceeded our expectations. In its first edition, the IEEE DeepVision workshop 2014 gathered over 150 attendees and more than 9000 visits to the web page, motivating us to organize again in 2015. In that edition, we could not fit in the large conference room; people were sitting on the aisle, on the tables, and standing any place they could squeeze (even though security personnel repeatedly warned many attendees not to block fire exists). This is on top of the overflow room with additional displays. We believe more than 400 people listened to the presentations last time. So, there is great interest and demand for deep learning. This year we are afraid of the crowd.

Academic experts in computer vision, deep learning and AI in general seem to be in big demand in the industry lately. Why did you choose to go from industry to academia? Would you consider going back?

It would be correct if I say that I have one foot in the industry and one foot in academia. And the center of mass can shift anytime.

What is the future of computer vision? What are some of the biggest challenges that need to be solved?

We cannot deny there has been significant progress in computer vision last couple of years, in particular in object recognition. However, our journey is still far away from being complete. There are several potential disruptions and raising uncertainties ahead around the way computer vision impacts the research and industrial sectors. For instance, augmented and remote reality applications are almost on our doorsteps. How will computer vision facilitate further such technology and what role computer vision play and be used in the possible culture change afterward? Internet of things (IoT) is boiling in the labs and soon will erupt everywhere. What will intelligent cameras connected through IoT enable? Do we have competent technologies to let such devices to automatically analyze and cross-link visual data? We have amazingly capable robots, yet painfully sophisticated and cumbersome ways of programming robots. How can computer vision change the way we are programming, interacting with, and teaching to robots what they need to do? Can robots simply learn by observing humans? How will crowd/cloud computing change the way we are doing computer vision today? 1000 hours of video data are being generated every minute. How will computer vision react not to let us drown in visual data? The list can be expanded further. Each of these questions brings its own technical computer vision challenges. I believe that there will be something like computer vision app store in the near future as well. Considering this and above trends, the biggest challenges that need to be solved, in my opinion, would be around ease of use and system engineering for the component computer vision solutions to leverage them easily into high-level applications.

Any words of wisdom for young engineers who want to work in computer vision?

The best applications of computer visions are conceived and originated within the interdisciplinary collaboration environments. So, think outside the box.

I answered questions at "Experts Answer Your Questions About Artificial Intelligence" of IEEE The Institute, 30 June 2016 ...

What kind of tasks or problems is AI not good at solving? Will that change in the future?

So far, AI has been successful only for specific tasks that are precisely defined, carefully formulated, and rigorously modeled by humans. Not AI but we humans possess the unique ability to conceptualize and distinguish valid solutions to problems. We can understand cause and effect. We can identify relationships, and isolate the problem from the context. We can think outside the box (model and data). We can solve problems by developing ideas that may not be immediately evident and obtainable by using traditional step-by-step logic. We can invent. Such creative, critical, lateral, and holistic thinking capabilities are currently not available to AI systems. For instance, AI algorithms are not good at (all) discovering and defining a problem. There is no AI mathematician. AI tools are designed to give answers in the same way that abides their training datasets without questioning. However, part of the intelligence is to be able to notice a problem without someone first describes what the problem is.

Besides deep learning, what will be the next big developments or research areas in AI?

Current AI research mainly provides compartmentalized solutions for specific tasks. Next big wave of research may develop the glue for those components to build smart systems effortlessly. In other words, the focus may shift towards how to design and manage complex systems to deal with intelligent workflow processes, coordination methods, and uncertainty management tools. I guess I am implying some form of systems engineering here. Also, most of the current research concentrated around centralized solutions. For many reasons from computational scaling up to ubiquity to persistency, distributed AI solutions may attract increased attention. Regarding areas, computer vision is one of the most promising disciplines where the next big development may likely to come. That is one reason why I am doing computer vision.

Humans experience mental growth also referred to as maturity. Will AI systems be able to do the same, or will they be more static than humans are?

Humans also experience dementia and mental breakdown. Not being handicapped with our inherent limitations, AI systems may perhaps constantly be improved with ever increasing big corpuses of training data to excel further in the specific tasks for which they are designed.

Let's assume we have developed an AI whose behavior is indistinguishable from that of a human. What rights, if any, do we grant them? Do they have workers' rights, for example?

No, why should there be such rights? Here, we may like to distinguish fact from fantasy. We are not talking about I-Robot like humanoids that have animal-like emotions. We are dealing with some software running on some silicon processors. Rights are contained in the realm of moral consideration and granted to subjects-of-a-life who are endowed with perception, emotion, and real intelligence. It would be unfair to limit life into a mere mimicry of machines. In the end, an AI system, however complex, is still a device such as a thermostat responding to sensors or a character in SimCity responding to mouse clicks. Well, they have the right to remain silent, I guess.

When will tools like Siri and Alexis be able to communicate with us as flawlessly as we do with other people? Will they be able to then also communicate with one another? And will they be able to be intuitive by providing us with information we need without us having to ask?

It may depend on the mode of communication as we do not only communicate verbally but also with the language of body, eyes, gestures, touch, and overall appearance. If the mode is email, the time AI portals will communicate to us flawlessly are unfortunately not far in the future. Beware of AI spammers! On the communication with one another, we have been paving the way for since the dawn of the web. The recent Internet-of-Things (IoT) initiatives are intended for this purpose. By describing of our preferences, AI solutions may probably give us the information that we may "like" to be updated. I am not sure about the information we "need", though. TV channels are battering information without us asking first all the time. Are they more intelligently produced than the state-of-the-art AI recommendation engines? Probably not.

How close can we get to AI being a replica of the human brain? What is the best way to design it for it to get there?

On average the human brain has 86 billion neurons and possibly ten times as many neuroglia, which supports and protect neurons, perhaps playing an active role in communication and neuroplasticity. Each neuron may be connected to up to 10,000 other neurons, passing signals to each other via as many as 150 trillion synapses. The largest deep neural networks consist of 160 billion parameters, most of which are on the fully connected layers. It is not unreasonable to consider a synapsis to correspond to a network parameter, and perhaps a neuron to a convolutional filter. This means the best AI network is still 1000 times less complex than the human brain. Besides, we are not anywhere near the number of convolutional filters as most typical deep networks often have less than 10,000 filters (in comparison to 86 billion neurons). Since deep networks are computationally intensive, for any realistic training we would need much stronger parallel processing platforms. For this, one option is to design GPUs that have 3D circuitry, which would drastically increase the number of parallel processors. Maybe the other option would be growing a network of real brain cells in the lab and developing (currently missing) technology to access and control each cell, i.e. literally constructing a brain.

What would be the best way to break into the field of AI, specifically for those who are pursuing a master's or Ph.D.?

Get your hand dirty and implement a face (or your favorite object) detector using first random forest and then deep networks.

When AI machines get smarter on their own the science fiction writer, Issac Asimov, postulated three rules for robots to follow which would keep robots from turning on man. Is it possible that AI machines getting smarter could decide that the "smart" thing to do would be to operate in a self-preservation mode that would put humans second in importance? This seems to be the big fear of AI. Humans have religion or ethics to keep them in line, but even that doesn't work. My question might equally apply to human's, could we get smart enough to overcome some of the taboos put in place by religion?

Self-preservation is a universal behavior of all organic life to ensure the survival of self and species. I tend to think that AI machines will remain as expert processes but miss the collective wisdom and intuition, mainly due to our, as their designers, limitation of building such a complex machinery. They will excel and find their positions like the different cells in our body, some will protect the body like leukocytes (e.g. spam filters), some will carry out visual sensing like photoreceptor cells (e.g. smart cameras), etc. But, there is little to believe that they will ever have the collective soul to inspire for self-preservation. Here, we are implicitly assuming that self-preservation is the most logical conclusion, and survival is essential. Why should self-preservation be the answer for a machine that does not feel and live? Maybe they look at our society and then will start destroying themselves, who knows?