I have several tapes (yes, actual cassette tapes) of my grandfather reading a novel.

Unfortunately, a few of the tapes have degraded to the point that I cannot play them back.

I would love to recreate his voice, to “rerecord” the missing bits.

The recordings are in Danish.

Is this possible?

If it is, how can I go about it?

  • Grimy@lemmy.world
    4 months ago

    ElevenLabs is currently the best option, but you can get very good results by running XTTS first and then RVC as a second pass. That route involves fine-tuning models and running things with Python and notebooks, so it requires some know-how.

    You can explore more models on the Hugging Face models page: https://huggingface.co/models?pipeline_tag=text-to-speech&sort=trending

    Most have a Hugging Face Space dedicated to them where you can try them out. Here is the XTTS Space, for example: https://huggingface.co/spaces/coqui/xtts

    The language adds another layer of difficulty. I would try their demo first to see if it gives anything workable, but Danish isn't a language that current TTS software caters to, and sadly it doesn't seem to be an available option in XTTS.
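
If you want to try XTTS locally rather than in the Space, here is a rough sketch of the cloning step. It assumes the Coqui `TTS` Python package is installed and that the tape audio has been digitized to a short WAV reference clip. The language set is what I understand XTTS v2 to ship with, so treat it as an assumption and check the model card. The helper names `xtts_supports` and `clone_voice` are made up for this example.

```python
# Languages XTTS v2 is documented to support (assumption: verify on the model card).
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja", "hi",
}


def xtts_supports(language: str) -> bool:
    """Return True if the language code is in the assumed XTTS v2 list."""
    return language.lower() in XTTS_V2_LANGUAGES


def clone_voice(text: str, speaker_wav: str, language: str, out_path: str) -> None:
    """Synthesize `text` in the voice of `speaker_wav` and write it to `out_path`."""
    if not xtts_supports(language):
        raise ValueError(f"XTTS v2 has no built-in support for '{language}'")

    # Imported lazily so the language check works without the package installed.
    from TTS.api import TTS

    # Downloads the model on first use.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,  # a clean reference clip of the target voice
        language=language,
        file_path=out_path,
    )
```

With this check, `xtts_supports("da")` comes back false, which is the Danish problem in a nutshell. The output of a supported-language run could then be fed to RVC as the second pass.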

    • boojumliussnark@lemmy.worldOP
      4 months ago

      Thank you for the tips. As I see it, the language will be the biggest hurdle. It doesn’t look like something I can add myself, even if I had the data for a model. So as far as I can tell, it involves two steps that are currently more or less impossible: getting the model data and teaching the language to the model.