The arrival of sixth-generation (6G) wireless systems opens radical potential for deploying autonomous aerial vehicle (AAV) swarms in mission-critical applications, ranging from disaster rescue to intelligent transportation. However, 6G-enabled AAV environments pose challenges such as dynamic three-dimensional topologies, tight energy budgets, and extremely low latency demands, which substantially degrade the efficiency of conventional routing protocols. To address this, this work presents a Q-learning-enhanced ad hoc on-demand distance vector protocol (QL-AODV), an intelligent routing protocol that embeds reinforcement learning in AODV to support adaptive, data-driven route selection in highly dynamic aerial networks. QL-AODV introduces four novelties: (i) a multipath route-set collection mechanism that retains up to ten candidate routes per destination via an extended route reply (RREP) waiting window; (ii) an enriched RREP message format carrying cumulative node buffer usage, enabling informed decision-making; (iii) a normalized three-dimensional state space capturing hop count, average buffer occupancy, and peak buffer saturation, tailored to aerial network dynamics; and (iv) a lightweight distributed Q-learning agent at the source node that uses an ε-greedy policy to balance exploration and exploitation. Large-scale simulations in NS-3.34 across a range of node densities and mobility conditions confirm that QL-AODV outperforms conventional AODV. In high-mobility scenarios, QL-AODV improves the packet delivery ratio by up to 9.8% and throughput by up to 12.1%, while remaining consistently scalable across network sizes. These results demonstrate that QL-AODV is a reliable, scalable, and intelligent routing solution for next-generation AAV networks operating in the demanding environments expected under 6G.
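
As a rough illustration of the ε-greedy route selection described above, the C++ sketch below scores candidate routes by a tabular Q-value over the normalized 3D state (hop count, average buffer occupancy, peak buffer saturation) and applies a standard one-step Q-learning update. All identifiers, the bin-based state discretization, and the +1 delivery reward are assumptions for illustration only; the abstract specifies just the state dimensions and the ε-greedy Q-learning agent at the source node, not this exact structure.

```cpp
// Hypothetical sketch of QL-AODV's epsilon-greedy route selection.
// The discretization, reward, and all names are illustrative assumptions.
#include <algorithm>
#include <array>
#include <iostream>
#include <map>
#include <random>
#include <vector>

struct CandidateRoute {
    int hopCount;       // hops reported in the RREP
    double avgBuffer;   // cumulative average buffer occupancy in [0,1]
    double peakBuffer;  // worst per-node buffer saturation in [0,1]
};

// Discretize the normalized 3D state into a small lookup key (assumed 5 bins per axis).
using State = std::array<int, 3>;
static State Discretize(const CandidateRoute& r, int maxHops = 10) {
    auto bin = [](double v) { return std::min(4, static_cast<int>(v * 5.0)); };
    double h = std::min(1.0, static_cast<double>(r.hopCount) / maxHops);
    return { bin(h), bin(r.avgBuffer), bin(r.peakBuffer) };
}

class QRouteSelector {
public:
    QRouteSelector(double alpha, double gamma, double epsilon)
        : m_alpha(alpha), m_gamma(gamma), m_epsilon(epsilon),
          m_rng(std::random_device{}()) {}

    // Epsilon-greedy: explore a random candidate with probability epsilon,
    // otherwise exploit the route with the highest learned Q-value.
    size_t Select(const std::vector<CandidateRoute>& routes) {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (coin(m_rng) < m_epsilon) {
            std::uniform_int_distribution<size_t> pick(0, routes.size() - 1);
            return pick(m_rng);
        }
        size_t best = 0;
        for (size_t i = 1; i < routes.size(); ++i)
            if (Q(routes[i]) > Q(routes[best])) best = i;
        return best;
    }

    // Standard one-step Q-learning update after observing the outcome
    // (e.g. a +1 reward for a delivered packet, assumed here).
    void Update(const CandidateRoute& chosen, double reward, double maxNextQ) {
        double& q = m_table[Discretize(chosen)];
        q += m_alpha * (reward + m_gamma * maxNextQ - q);
    }

    double Q(const CandidateRoute& r) {
        return m_table[Discretize(r)];  // unseen states default to 0.0
    }

private:
    double m_alpha, m_gamma, m_epsilon;
    std::map<State, double> m_table;   // lightweight tabular Q-function
    std::mt19937 m_rng;
};

int main() {
    // Toy usage: three candidate routes gathered during the RREP wait window.
    std::vector<CandidateRoute> routes = {
        {3, 0.20, 0.40}, {5, 0.10, 0.25}, {4, 0.60, 0.90}
    };
    QRouteSelector selector(/*alpha=*/0.5, /*gamma=*/0.9, /*epsilon=*/0.1);
    size_t idx = selector.Select(routes);
    selector.Update(routes[idx], /*reward=*/1.0, /*maxNextQ=*/0.0);
    std::cout << "selected route " << idx << "\n";
}
```

Keeping the Q-table as a small map over coarsely binned states is one way such an agent could stay lightweight enough to run at every source node, consistent with the distributed design the abstract claims, though the paper's actual state encoding may differ.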