The paper proposes a new approach to lung disease classification that applies the MedViT and Swin Transformer deep learning models to a set of 10,425 lung X-ray images.[4] The images are divided into three classes: normal (3,750 images), lung opacity (3,375 images), and pneumonia (3,300 images).[4] Data augmentation, both geometric and photometric, is used to improve performance.[4] MedViT achieves the highest accuracy, 98.6% with a loss of 0.09, which is attributed to its hybrid convolution and transformer design.[4] Using Kullback-Leibler divergence as the loss function gives the best results and effectively handles class imbalance.[4] Both models show promising lung disease classification accuracy.[4] The findings highlight the potential of transformer models, especially MedViT, to support clinical decision-making.[4]
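
As a rough illustration of how Kullback-Leibler divergence can serve as a classification loss, the sketch below compares a target distribution over three classes with a model's softmax output. It is not the paper's implementation: the label-smoothing step, the `eps` value, the example logits, and all function names are assumptions made for this example. With a hard one-hot target the KL divergence reduces to ordinary cross-entropy, so a smoothed target is used here to keep the two distinct.

```python
import math

NUM_CLASSES = 3  # three image classes, as in the dataset described above


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_one_hot(label, num_classes, eps=0.1):
    """Target distribution with label smoothing (eps is an assumed value)."""
    off = eps / num_classes
    return [1.0 - eps + off if i == label else off for i in range(num_classes)]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), skipping terms where p_i = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical logits for one image whose true class index is 0.
logits = [2.0, 0.5, -1.0]
prediction = softmax(logits)
target = smoothed_one_hot(0, NUM_CLASSES)
loss = kl_divergence(target, prediction)
print(f"KL loss: {loss:.4f}")
```

The loss is zero only when the predicted distribution matches the target exactly, and it grows as the prediction moves probability mass away from the true class.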