Automatic image captioning aims to produce a descriptive sentence about a picture. For this task, we develop a model that, given an image as input, outputs an English sentence describing the image's subject. The problem has received considerable attention from researchers in cognitive computing in recent years. It is challenging because it requires merging ideas from two distinct but related disciplines: natural language processing and computer vision. By combining a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network, we developed a model for generating image captions. The CNN serves as the encoder, extracting features from images, while the LSTM acts as the decoder, generating the words that describe the image. A practical difficulty is that when the dataset is large, training the network on CPU-only systems can take weeks, so techniques for reducing training time on big data must be taken into account. After the caption generation phase, we use BLEU scores to assess our model's performance. Using this approach, our system can help users obtain a fitting description, in the desired language, for an uploaded photo.
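To make the evaluation step concrete, the following is a minimal sketch of sentence-level BLEU for a generated caption against a single reference caption. It implements modified n-gram precision with a brevity penalty in pure Python; the function name and the simplifications (one reference, no smoothing) are our own illustrative choices, not the exact implementation used in the experiments, where a library routine such as NLTK's `sentence_bleu` would typically be preferred.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU of a candidate caption against
    one reference caption (no smoothing, single reference)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped (modified) n-gram precision: each candidate n-gram
        # counts at most as often as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages captions shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```

A caption identical to the reference scores 1.0, while a caption sharing no higher-order n-grams with the reference collapses to 0.0 under this unsmoothed variant; production evaluation usually averages over multiple references and applies smoothing.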