Materials discovery increasingly depends on foundation models that must process scientific language, molecular graphs, crystal and atomistic structures, spectra, images, and reaction sequences. A central challenge is not only reaching sufficient accuracy after training but also choosing, in advance, which model architecture to use for each materials-discovery task. This study proposes \emph{concordance-gated multimodal routing} (CGMR), an interpretable approach to architecture selection based on quantifying a small descriptor matrix that encodes each task’s data types and modalities together with the available model architectures. Four types of materials-discovery tasks are considered: data extraction, property prediction, molecule generation, and synthesis prediction. In addition to its modality vector, each model is represented by a binary architecture vector. Three indices are computed: the Modality Breadth Index, the Architecture Coupling Index, and the task–architecture Concordance Score. These indices are combined into the CGMR score, which reflects the routing difficulty of each task while preserving the scientific distinction among recognition, prediction, generation, and sequence-to-sequence translation. The methodology directly answers the main question posed by the paper: a compact descriptor table can serve as the basis for quantitative architecture selection in foundation modeling, provided that modality breadth is taken into account alongside architecture direction. Among the four materials-discovery tasks, property prediction achieves the highest score (0.79), because it draws on multiple types of input data across several model types and involves both encoder and encoder-decoder translation pathways. Data extraction and molecule generation receive identical scores (0.47), although the former entails encoder-based recognition whereas the latter relies on decoder-centered generation.
Synthesis prediction receives the lowest score (0.36); this does not imply that the task is chemically simple, but rather reflects its narrow descriptor pathway, which aligns with encoder-decoder routing.