I think the idea is that you can eventually get an approximation of whatever periodic, wave-like curve you want by summing several pure sine/cosine functions with varying amplitudes, frequencies, and phase offsets.
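
To make that concrete, here's a rough sketch (my own toy example, not from any particular library) that builds up a square wave from its odd sine harmonics. The function name `square_wave_partial_sum` is just something I made up for illustration.

```python
import math

def square_wave_partial_sum(x, n_terms):
    """Sum the first n_terms odd sine harmonics of a square wave's Fourier series."""
    total = 0.0
    for k in range(n_terms):
        n = 2 * k + 1                  # odd harmonics only: 1, 3, 5, ...
        total += math.sin(n * x) / n   # amplitude shrinks as frequency grows
    return 4 / math.pi * total

# At x = pi/2 the ideal square wave equals 1; more terms should get closer.
x = math.pi / 2
for n_terms in (1, 5, 50):
    print(n_terms, square_wave_partial_sum(x, n_terms))
```

Each extra term is a higher-frequency sine with a smaller amplitude, and the sum creeps toward the flat top of the square wave.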
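Fun fact: this is mostly how JPEG works - it uses a close relative of the Fourier transform (the discrete cosine transform) to turn small blocks of the image into frequency components, then stores the high-frequency ones less precisely. That approximates the image in a way that takes up much less storage than storing information about each pixel.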
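
Here's a very simplified sketch of that idea, with some loud caveats: real JPEG applies a 2D discrete cosine transform to 8x8 blocks, quantizes the coefficients, and then entropy-codes them. This toy version works on a single row of made-up pixel values and just zeroes out the high-frequency half, which is my simplification and not what JPEG literally does.

```python
import math

def dct(signal):
    """Orthonormal 1D DCT-II: turn samples into cosine-frequency coefficients."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(v * math.cos(math.pi / n * (i + 0.5) * k)
                               for i, v in enumerate(signal)))
    return out

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1 if k == 0 else 2) / n) * c
                       * math.cos(math.pi / n * (i + 0.5) * k)
                       for k, c in enumerate(coeffs)))
    return out

row = [52, 55, 61, 66, 70, 61, 64, 73]   # made-up pixel brightness values
coeffs = dct(row)
kept = coeffs[:4] + [0.0] * 4            # toy "compression": drop the 4 highest frequencies
approx = idct(kept)
print([round(v, 1) for v in approx])
```

You only have to store four numbers instead of eight, and the reconstructed row keeps the overall shape of the original while losing some fine detail.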
My point is that it would be silly to judge someone for this, just like it's silly to judge someone for putting creamer in coffee.
Edit: also, what about drinks like mochas, cappuccinos, macchiatos, etc. which also have other ingredients mixed in? Generally it's still fine to call those forms of coffee, no?
Random side note: I've had chocolate-dipped espresso beans before and they're actually a pretty good snack. You just can't eat too many of them because of the caffeine content.