infocon.org

Downloading everything without RSS and how big is it anyway?

  dgl (DisMember), Mar 2023

    So I'm using Transmission, which doesn't directly support RSS. I also wanted to know the size of some of the torrents before downloading them, to plan where to store stuff.

    I present two hacky scripts:

    mirror.sh:
    Code:
    #!/bin/bash
    # Infocon mirror, torrent files only. Excludes some content, adjust below.
    # By dgl, https://dgl.cx/0bsd
    # SPDX-License-Identifier: 0BSD
    set -exu
    
    BASE='https://w27irt6ldaydjoacyovepuzlethuoypazhhbot6tljuywy52emetn7qd.onion.jump.black/'
    
    # Using --timestamping doesn't work, as the index pages don't have Last-Modified.
    wget \
      --unlink \
      --force-directories \
      --no-host-directories \
      ${BASE}{cons,documentaries,podcasts,skills}/ # Adjust here.
    
    # Extract just the torrent links. (Yuck, but we can't make wget do it.)
    perl -nle'
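      # Directory name from the file path, e.g. "cons/index.html" -> "cons".
      my $dir = $ARGV =~ s{(?:^|/)([^/]+)/index\.html}{$1}r;
      # Every relative href ending in .torrent; [^"] keeps each match inside one attribute.
      print "'$BASE'$dir/$1" while m{href="([^/"][^"]*\.torrent)"}gi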
    ' */index.html > dl
    
    # The files have Last-Modified so we can use --timestamping now.
    wget -i dl --force-directories --no-host-directories --timestamping
    
    rm dl */index.html
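
    For anyone who would rather not read the Perl, here is roughly the same link-extraction step sketched with grep and sed, run against a made-up index page so it can be tried without touching the mirror. The function name extract_torrent_links, the sample HTML and the example.invalid base URL are all hypothetical, and it assumes the index pages put each link in a plain href="..." attribute.

    ```bash
    #!/bin/bash
    # Sketch only: prints BASE + DIR + "/" + href for each relative href
    # ending in .torrent, read from an index page on stdin.
    # extract_torrent_links and the sample data below are hypothetical.
    set -eu

    extract_torrent_links() {
      local base=$1 dir=$2 href
      # -o prints each match on its own line, -i ignores case (like Perl's /i).
      # [^/"] skips absolute links; [^"]* keeps the match inside one attribute.
      # sed then strips the leading href=" and the closing quote.
      grep -io 'href="[^/"][^"]*\.torrent"' |
        sed -e 's/^[^"]*"//' -e 's/"$//' |
        while IFS= read -r href; do
          printf '%s%s/%s\n' "$base" "$dir" "$href"
        done
    }

    sample='<a href="../">../</a>
    <a href="example-v1.torrent">example-v1.torrent</a>
    <a href="/elsewhere/other.torrent">other</a>
    <a href="notes.txt">notes.txt</a>'

    printf '%s\n' "$sample" | extract_torrent_links 'https://example.invalid/' cons
    # prints: https://example.invalid/cons/example-v1.torrent
    ```

    Unlike the one-liner, grep -o picks up every matching link on a line, not just the first.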

    torrentsize:
    Code:
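    #!/usr/bin/env perl
    # By dgl, https://dgl.cx/0bsd
    # SPDX-License-Identifier: 0BSD
    use strict;
    use warnings;
    
    if (!@ARGV) {
      die "Usage: $0 v1.torrents...\n";
    }
    
    # Multipliers (binary) for the unit letter in transmission-show's
    # "Total Size" line.
    my %size_map = (
      K => 1<<10,
      M => 1<<20,
      G => 1<<30,
      T => 1<<40,
    );
    
    my $total = 0;
    
    for my $file (@ARGV) {
      open my $fh, "-|", "transmission-show", $file
        or die "Can't run transmission-show: $!\n";
      my $out = join "", <$fh>;
      close $fh;
      if ($out =~ /Total Size:\s+([\d.]+) (\w)/) {
        my ($size, $letter) = ($1, $2);
        # Upper-case so a lowercase prefix (e.g. "kB") is found too;
        # a bare "B" means plain bytes.
        my $unit = uc $letter;
        my $mult = $unit eq "B" ? 1 : $size_map{$unit};
        if (defined $mult) {
          $total += $size * $mult;
        } else {
          warn "Unknown size unit '$letter' for $file\n";
        }
      } else {
        warn "No size found for $file\n";
      }
    }
    
    if (-t STDOUT) {
      # Human-readable: largest unit not bigger than the total, else bytes.
      my ($unit) = grep { $total >= $size_map{$_} }
        sort { $size_map{$b} <=> $size_map{$a} } keys %size_map;
      if (defined $unit) {
        printf "%.2f %sB\n", $total / $size_map{$unit}, $unit;
      } else {
        print "$total B\n";
      }
    } else {
      # Raw byte count, handy for scripts.
      print "$total\n";
    }

    This lets you download all the torrent files and then answer questions like "how big would it be to download all of cons/?"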

    Code:
    $ ./torrentsize cons/*v1\ *
    4.99 TB

    (Transmission doesn't support v2 torrents, hence looking only at the v1 ones.)
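
    Since torrentsize prints a plain byte count whenever its output isn't a terminal, the result can be fed to other tools. A minimal sketch for the "where do I put it" question: check a byte count against the free space on a target filesystem. The helper name fits_on and the /mnt/storage path are hypothetical, and it assumes a df that supports the POSIX -P and -k options.

    ```bash
    #!/bin/bash
    # Sketch: does a byte count fit in the free space of the filesystem that
    # holds a directory? fits_on is a hypothetical helper name.
    set -eu

    # fits_on BYTES DIR: exit status 0 if BYTES <= space available on DIR.
    fits_on() {
      local need=$1 dir=$2 avail
      # -P: POSIX single-line output, -k: 1024-byte blocks. Field 4 of the
      # data line is "Available" (assumes the device name has no spaces).
      avail=$(df -Pk -- "$dir" | awk 'NR == 2 { printf "%.0f", $4 * 1024 }')
      # Compare in awk rather than shell arithmetic, since torrentsize's
      # raw total may have a fractional part.
      awk -v need="$need" -v avail="$avail" \
        'BEGIN { exit !(need + 0 <= avail + 0) }'
    }

    # Example (the target path is a placeholder):
    #   need=$(./torrentsize cons/*v1\ *)
    #   fits_on "$need" /mnt/storage && echo "cons/ fits"
    ```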

    Actually downloading is an exercise for the reader, but with transmission-daemon I do something like:

    Code:
    cd cons
    transmission-remote -a --no-trash-torrent some-v1.torrent -w "$(pwd)"

    Yeah, it would probably be easier to just use transmission-rss, but maybe this is useful if someone wants the directories that don't have RSS feeds, or something.
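
    Since the download step is left as an exercise, here is one possible take on it: a loop that runs the same transmission-remote command for every v1 torrent in the current directory. It is a sketch that assumes the *v1\ * naming used above and a daemon that can see the same paths as your shell; the function name add_v1_torrents and the DRY_RUN switch are hypothetical.

    ```bash
    #!/bin/bash
    # Sketch: add every v1 torrent in the current directory to a running
    # transmission-daemon, saving data next to the .torrent files, using the
    # same transmission-remote options as the single command above.
    # add_v1_torrents and DRY_RUN=1 (print commands only) are hypothetical.
    set -eu
    shopt -s nullglob  # if nothing matches, the loop runs zero times

    add_v1_torrents() {
      local t
      for t in *v1\ *; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
          printf 'transmission-remote -a --no-trash-torrent %q -w %q\n' "$t" "$PWD"
        else
          transmission-remote -a --no-trash-torrent "$t" -w "$PWD"
        fi
      done
    }

    # Example:
    #   cd cons && DRY_RUN=1 add_v1_torrents   # check what would be added
    #   cd cons && add_v1_torrents
    ```

    Running it with DRY_RUN=1 first is a cheap way to check that the glob picks up only the torrents you meant.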