What you describe is partly why I and a couple of others on different posts have said that it’s a bad idea to only read first and then try to listen later. You’re just not gonna understand speech when you do hear it, because you haven’t developed the perception of the sounds or other aspects of phonology yet. You literally can’t hear a lot of the language. It’s basically just undifferentiated noise, not individual linguistic sounds. It’s the same with tones in Chinese. At first you even have difficulty telling whether the pitch is rising or falling, which seems like it should be easy, but it’s not. Or take French: dessus and dessous do sound a bit different if said right next to each other, but put one into a sentence and, until you’ve listened a lot, you’re unlikely to catch which one it was.
Here’s a sort of analogy. I think we’ve all had the experience of looking at a picture and not knowing what it is. It’s just undifferentiated lines and blocks of color, and you can’t tell what is the foreground and what is the background, but supposedly there’s something there to see. Then a moment later you see what the picture is. It sort of pops out at you, and it’s no longer just lines and blocks of color; you recognize it as an object. Your brain has figured out what’s the foreground, what’s the background, etc. Even though you were seeing the lines and colors the whole time, you literally couldn’t perceive the object that was there. It’s the same when you hear a new word: the sound reaches you, yet you’re not actually perceiving what’s really there. Of course, language sounds aren’t going to just “pop out” at you the way a picture does. It’s more gradual.
Something similar happens with whole sentences. We perceive spaces between words, but the audio signal is just one continuous waveform, a continuous string of syllables. There are no spaces between words. Your brain takes that in, segments the syllables into words, and you perceive it as a string of individual words, not a continuous sound wave. So if you’ve only read and not listened, your brain won’t even be able to segment a sentence into individual words. It hasn’t developed that capability yet.
So yes, you’re correct: your brain needs time to acquire the sounds. I don’t know of any way to do it other than listening a lot, starting with very simple audio. Sometimes specially contrived audio that focuses on certain sounds and presents them to you in different contexts might be helpful.