“Painting” is not a gerundive. Gerundives are verb forms that function as adjectives, which in English most often come up in phrases like “that is a sight to be seen” (with “to be seen” being the gerundive). But “painting” (as in “a painting”) is a verb form that functions like a noun, so it is a gerund. When it refers to a painting as an object, it is also not a verbal noun. Yet when it refers to the activity of painting (as in “painting is an art”), it is a verbal noun and not a gerund. So “painting” (the gerund), “painting” (the verbal noun), and “painting” (the verb form used in the continuous tenses) are technically all homonyms rather than the same word being used in different ways.
How did we end up in this situation? Proto-Indo-European, the ancestral language of modern English, relied heavily on inflections: changes to the beginnings, endings, or even middles of words to indicate things like tense, case, and number. So a hypothetical word stem “har” might have a bunch of endings to indicate tense (hara, haru, hari, haran, harun, harin, and so on), a bunch of prefixes to indicate number (tihar, zihar, lihar, and so on), and even changes to the main vowel to indicate the case (har, hur, hyr).
It’s a lot to remember, which is fine when a language is small. But it’s not the kind of thing that can withstand the expansion of a language’s vocabulary beyond a core set of words that are used every day (which is why the overwhelming majority of irregular words in modern languages are core vocabulary). So things get easier. Proto-Germanic has fewer inflections than Proto-Indo-European. Old English has even fewer. And due to Norse influence, Middle English and Modern English have fewer still. But we still have to be able to indicate the things that all those inflections were used for, so alternatives are needed.
One alternative is to create words that apply universally. Don’t want an inflection to indicate the future tense? Use the word “will” to indicate something that will happen in the future. Another alternative is to collapse redundant inflections. Don’t need three different ways to indicate the past tense depending on how many people did a thing? Just use “-ed” regardless of whether one person, two people, or more than two people climbed the mountain. And then there’s the alternative that was used in this case, which was to let one inflection indicate more than one thing depending on context. Can’t remember when to use “-ing,” when to use “-eng,” and when to use “-ung”? They’re all “-ing” now regardless of which verb form you’re using, and it turns out that their contexts are all distinct enough that no one actually gets confused.
In the old days, we wouldn’t have used “painting” for the gerund, verbal noun, and continuous tenses. We would have used “painteng,” “painting,” and “paintung.” But the English were also warring, trading, and mating with the Norse, who only used “painting” and “paintung” (and didn’t use them the same way that the English did anyway). So the inflections all just merged into one, and “-ing” is the one that happened to survive. Thus we were left with paintings, buildings, clippings, toppings, and so forth.