Merge remote-tracking branch 'refs/remotes/soimort/develop' into develop

Rokic committed 2016-11-08 01:14:18 +08:00
commit 09fa036c27
59 changed files with 2064 additions and 357 deletions

View File

@ -1,7 +1,7 @@
# You-Get
[![PyPI version](https://badge.fury.io/py/you-get.png)](http://badge.fury.io/py/you-get)
[![Build Status](https://api.travis-ci.org/soimort/you-get.png)](https://travis-ci.org/soimort/you-get)
[![PyPI version](https://img.shields.io/pypi/v/you-get.svg)](https://pypi.python.org/pypi/you-get/)
[![Build Status](https://travis-ci.org/soimort/you-get.svg)](https://travis-ci.org/soimort/you-get)
[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/soimort/you-get?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[You-Get](https://you-get.org/) is a tiny command-line utility to download media content (videos, audio, images) from the Web, in case there is no other handy way to do it.
@ -37,13 +37,13 @@ Interested? [Install it](#installation) now and [get started by examples](#getti
Are you a Python programmer? Then check out [the source](https://github.com/soimort/you-get) and fork it!
![](http://i.imgur.com/GfthFAz.png)
![](https://i.imgur.com/GfthFAz.png)
## Installation
### Prerequisites
The following dependencies are required and must be installed separately, unless you are using a pre-built package on Windows:
The following dependencies are required and must be installed separately, unless you are using a pre-built package or Chocolatey on Windows:
* **[Python 3](https://www.python.org/downloads/)**
* **[FFmpeg](https://www.ffmpeg.org/)** (strongly recommended) or [Libav](https://libav.org/)
@ -93,6 +93,24 @@ $ git clone git://github.com/soimort/you-get.git
Then put the cloned directory into your `PATH`, or run `./setup.py install` to install `you-get` to a permanent path.
### Option 6: Using [Chocolatey](https://chocolatey.org/) (Windows only)
```
> choco install you-get
```
### Option 7: Homebrew (Mac only)
You can install `you-get` easily via:
```
$ brew install you-get
```
### Shell completion
Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](contrib/completion). Please consult your shell's manual for how to take advantage of them.
## Upgrading
Depending on which option you used to install `you-get`, you may upgrade it via:
@ -107,6 +125,18 @@ or download the latest release via:
$ you-get https://github.com/soimort/you-get/archive/master.zip
```
or use the [Chocolatey package manager](https://chocolatey.org):
```
> choco upgrade you-get
```
To get the latest `develop` branch without messing up your pip installation, you can try:
```
$ pip3 install --upgrade git+https://github.com/soimort/you-get@develop
```
## Getting Started
### Download a video
@ -300,7 +330,7 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| :--: | :-- | :-----: | :-----: | :-----: |
| **YouTube** | <https://www.youtube.com/> |✓| | |
| **Twitter** | <https://twitter.com/> |✓|✓| |
| VK | <http://vk.com/> |✓| | |
| VK | <http://vk.com/> |✓|| |
| Vine | <https://vine.co/> |✓| | |
| Vimeo | <https://vimeo.com/> |✓| | |
| Vidto | <http://vidto.me/> |✓| | |
@ -309,6 +339,7 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| **Tumblr** | <https://www.tumblr.com/> |✓|✓|✓|
| TED | <http://www.ted.com/> |✓| | |
| SoundCloud | <https://soundcloud.com/> | | |✓|
| SHOWROOM | <https://www.showroom-live.com/> |✓| | |
| Pinterest | <https://www.pinterest.com/> | |✓| |
| MusicPlayOn | <http://en.musicplayon.com/> |✓| | |
| MTV81 | <http://www.mtv81.com/> |✓| | |
@ -342,8 +373,9 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| 爆米花网 | <http://www.baomihua.com/> |✓| | |
| **bilibili<br/>哔哩哔哩** | <http://www.bilibili.com/> |✓| | |
| Dilidili | <http://www.dilidili.com/> |✓| | |
| 豆瓣 | <http://www.douban.com/> | | |✓|
| 豆瓣 | <http://www.douban.com/> || |✓|
| 斗鱼 | <http://www.douyutv.com/> |✓| | |
| Panda<br/>熊猫 | <http://www.panda.tv/> |✓| | |
| 凤凰视频 | <http://v.ifeng.com/> |✓| | |
| 风行网 | <http://www.fun.tv/> |✓| | |
| iQIYI<br/>爱奇艺 | <http://www.iqiyi.com/> |✓| | |
@ -359,6 +391,7 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| PPTV聚力 | <http://www.pptv.com/> |✓| | |
| 齐鲁网 | <http://v.iqilu.com/> |✓| | |
| QQ<br/>腾讯视频 | <http://v.qq.com/> |✓| | |
| 企鹅直播 | <http://live.qq.com/> |✓| | |
| 阡陌视频 | <http://qianmo.com/> |✓| | |
| THVideo | <http://thvideo.tv/> |✓| | |
| Sina<br/>新浪视频<br/>微博秒拍视频 | <http://video.sina.com.cn/><br/><http://video.weibo.com/> |✓| | |
@ -372,6 +405,9 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| 战旗TV | <http://www.zhanqi.tv/lives> |✓| | |
| 央视网 | <http://www.cntv.cn/> |✓| | |
| 花瓣 | <http://huaban.com/> | |✓| |
| Naver<br/>네이버 | <http://tvcast.naver.com/> |✓| | |
| 芒果TV | <http://www.mgtv.com/> |✓| | |
| 火猫TV | <http://www.huomao.com/> |✓| | |
For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page.

View File

@ -0,0 +1,29 @@
#compdef you-get
# Zsh completion definition for soimort/you-get.
setopt localoptions noshwordsplit noksharrays
local -a args
args=(
'(- : *)'{-V,--version}'[print version and exit]'
'(- : *)'{-h,--help}'[print help and exit]'
'(-i --info)'{-i,--info}'[print extracted information]'
'(-u --url)'{-u,--url}'[print extracted information with URLs]'
'(--json)--json[print extracted URLs in JSON format]'
'(-n --no-merge)'{-n,--no-merge}'[do not merge video parts]'
'(--no-caption)--no-caption[do not download captions]'
'(-f --force)'{-f,--force}'[force overwrite existing files]'
'(-F --format)'{-F,--format}'[set video format to the specified stream id]:stream id'
'(-O --output-filename)'{-O,--output-filename}'[set output filename]:filename:_files'
'(-o --output-dir)'{-o,--output-dir}'[set output directory]:directory:_files -/'
'(-p --player)'{-p,--player}'[stream extracted URL to the specified player]:player and options'
'(-c --cookies)'{-c,--cookies}'[load cookies.txt or cookies.sqlite]:cookies file:_files'
'(-x --http-proxy)'{-x,--http-proxy}'[use the specified HTTP proxy for downloading]:host\:port:'
'(-y --extractor-proxy)'{-y,--extractor-proxy}'[use the specified HTTP proxy for extraction only]:host\:port'
'(--no-proxy)--no-proxy[do not use a proxy]'
'(-t --timeout)'{-t,--timeout}'[set socket timeout]:seconds'
'(-d --debug)'{-d,--debug}'[show traceback and other debug info]'
'*: :_guard "^-*" url'
)
_arguments -S -s $args

View File

@ -0,0 +1,31 @@
# Bash completion definition for you-get.
_you-get () {
COMPREPLY=()
local IFS=$' \n'
local cur=$2 prev=$3
local -a opts_without_arg opts_with_arg
opts_without_arg=(
-V --version -h --help -i --info -u --url --json -n --no-merge
--no-caption -f --force --no-proxy -d --debug
)
opts_with_arg=(
-F --format -O --output-filename -o --output-dir -p --player
-c --cookies -x --http-proxy -y --extractor-proxy -t --timeout
)
# Only complete option names (the current word must start with a dash)
[[ $cur == -* ]] || return 1
# Do not complete when the previous arg is an option expecting an argument
for opt in "${opts_with_arg[@]}"; do
[[ $opt == $prev ]] && return 1
done
# Complete option names
COMPREPLY=( $(compgen -W "${opts_without_arg[*]} ${opts_with_arg[*]}" \
-- "$cur") )
return 0
}
complete -F _you-get you-get

View File

@ -0,0 +1,23 @@
# Fish completion definition for you-get.
complete -c you-get -s V -l version -d 'print version and exit'
complete -c you-get -s h -l help -d 'print help and exit'
complete -c you-get -s i -l info -d 'print extracted information'
complete -c you-get -s u -l url -d 'print extracted information with URLs'
complete -c you-get -l json -d 'print extracted URLs in JSON format'
complete -c you-get -s n -l no-merge -d 'do not merge video parts'
complete -c you-get -l no-caption -d 'do not download captions'
complete -c you-get -s f -l force -d 'force overwrite existing files'
complete -c you-get -s F -l format -x -d 'set video format to the specified stream id'
complete -c you-get -s O -l output-filename -d 'set output filename' \
-x -a '(__fish_complete_path (commandline -ct) "output filename")'
complete -c you-get -s o -l output-dir -d 'set output directory' \
-x -a '(__fish_complete_directories (commandline -ct) "output directory")'
complete -c you-get -s p -l player -x -d 'stream extracted URL to the specified player'
complete -c you-get -s c -l cookies -d 'load cookies.txt or cookies.sqlite' \
-x -a '(__fish_complete_path (commandline -ct) "cookies.txt or cookies.sqlite")'
complete -c you-get -s x -l http-proxy -x -d 'use the specified HTTP proxy for downloading'
complete -c you-get -s y -l extractor-proxy -x -d 'use the specified HTTP proxy for extraction only'
complete -c you-get -l no-proxy -d 'do not use a proxy'
complete -c you-get -s t -l timeout -x -d 'set socket timeout'
complete -c you-get -s d -l debug -d 'show traceback and other debug info'

View File

@ -8,7 +8,9 @@ SITES = {
'baidu' : 'baidu',
'bandcamp' : 'bandcamp',
'baomihua' : 'baomihua',
'bigthink' : 'bigthink',
'bilibili' : 'bilibili',
'cctv' : 'cntv',
'cntv' : 'cntv',
'cbs' : 'cbs',
'dailymotion' : 'dailymotion',
@ -25,7 +27,9 @@ SITES = {
'google' : 'google',
'heavy-music' : 'heavymusic',
'huaban' : 'huaban',
'huomao' : 'huomaotv',
'iask' : 'sina',
'icourses' : 'icourses',
'ifeng' : 'ifeng',
'imgur' : 'imgur',
'in' : 'alive',
@ -47,17 +51,21 @@ SITES = {
'lizhi' : 'lizhi',
'magisto' : 'magisto',
'metacafe' : 'metacafe',
'mgtv' : 'mgtv',
'miomio' : 'miomio',
'mixcloud' : 'mixcloud',
'mtv81' : 'mtv81',
'musicplayon' : 'musicplayon',
'naver' : 'naver',
'7gogo' : 'nanagogo',
'nicovideo' : 'nicovideo',
'panda' : 'panda',
'pinterest' : 'pinterest',
'pixnet' : 'pixnet',
'pptv' : 'pptv',
'qianmo' : 'qianmo',
'qq' : 'qq',
'showroom-live' : 'showroom',
'sina' : 'sina',
'smgbb' : 'bilibili',
'sohu' : 'sohu',
@ -73,6 +81,7 @@ SITES = {
'videomega' : 'videomega',
'vidto' : 'vidto',
'vimeo' : 'vimeo',
'wanmen' : 'wanmen',
'weibo' : 'miaopai',
'veoh' : 'veoh',
'vine' : 'vine',
@ -95,6 +104,7 @@ import logging
import os
import platform
import re
import socket
import sys
import time
from urllib import request, parse, error
@ -305,7 +315,53 @@ def get_content(url, headers={}, decoded=True):
if cookies:
cookies.add_cookie_header(req)
req.headers.update(req.unredirected_hdrs)
for i in range(10):
try:
response = request.urlopen(req)
break
except socket.timeout:
logging.debug('request attempt %s timeout' % str(i + 1))
data = response.read()
# Handle HTTP compression for gzip and deflate (zlib)
content_encoding = response.getheader('Content-Encoding')
if content_encoding == 'gzip':
data = ungzip(data)
elif content_encoding == 'deflate':
data = undeflate(data)
# Decode the response body
if decoded:
charset = match1(response.getheader('Content-Type'), r'charset=([\w-]+)')
if charset is not None:
data = data.decode(charset)
else:
data = data.decode('utf-8')
return data
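# Sketch (assumption, for illustration only): the retry loop above leaves
# `response` unbound if all ten attempts time out. A hardened variant raises
# after the final retry; it relies on the `request`, `socket` and `logging`
# imports already present in this module.
def fetch_with_retry(req, attempts=10):
    for i in range(attempts):
        try:
            return request.urlopen(req)
        except socket.timeout:
            logging.debug('request attempt %s timeout' % str(i + 1))
    raise IOError('all %d attempts timed out' % attempts)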
def post_content(url, headers={}, post_data={}, decoded=True):
"""Post the content of a URL via sending a HTTP POST request.
Args:
url: A URL.
headers: Request headers used by the client.
decoded: Whether decode the response body using UTF-8 or the charset specified in Content-Type.
Returns:
The content as a string.
"""
logging.debug('post_content: %s \n post_data: %s' % (url, post_data))
req = request.Request(url, headers=headers)
if cookies:
cookies.add_cookie_header(req)
req.headers.update(req.unredirected_hdrs)
post_data_enc = bytes(parse.urlencode(post_data), 'utf-8')
response = request.urlopen(req, data = post_data_enc)
data = response.read()
# Handle HTTP compression for gzip and deflate (zlib)
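# Hypothetical usage of post_content as defined above; the endpoint matches
# the bangumi call in the bilibili extractor further below, the episode id
# is illustrative:
#     cont = post_content('http://bangumi.bilibili.com/web_api/get_source',
#                         post_data={'episode_id': '42'})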
@ -492,7 +548,11 @@ def url_save(url, filepath, bar, refer = None, is_part = False, faker = False, h
os.remove(filepath) # on Windows rename could fail if destination filepath exists
os.rename(temp_filepath, filepath)
def url_save_chunked(url, filepath, bar, refer = None, is_part = False, faker = False, headers = {}):
def url_save_chunked(url, filepath, bar, dyn_callback=None, chunk_size=0, ignore_range=False, refer=None, is_part=False, faker=False, headers={}):
def dyn_update_url(received):
if callable(dyn_callback):
logging.debug('Calling callback %s for new URL from %s' % (dyn_callback.__name__, received))
return dyn_callback(received)
if os.path.exists(filepath):
if not force:
if not is_part:
@ -530,19 +590,26 @@ def url_save_chunked(url, filepath, bar, refer = None, is_part = False, faker =
else:
headers = {}
if received:
url = dyn_update_url(received)
if not ignore_range:
headers['Range'] = 'bytes=' + str(received) + '-'
if refer:
headers['Referer'] = refer
response = request.urlopen(request.Request(url, headers = headers), None)
response = request.urlopen(request.Request(url, headers=headers), None)
with open(temp_filepath, open_mode) as output:
this_chunk = received
while True:
buffer = response.read(1024 * 256)
if not buffer:
break
output.write(buffer)
received += len(buffer)
if chunk_size and (received - this_chunk) >= chunk_size:
url = dyn_callback(received)
this_chunk = received
response = request.urlopen(request.Request(url, headers=headers), None)
if bar:
bar.update_received(len(buffer))
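# Sketch (assumption) of the dyn_callback contract used above: given the
# number of bytes received so far, it returns a fresh URL for the next
# chunk; the icourses extractor passes its own URL parser for this.
#     def example_dyn_callback(received):
#         # hypothetical endpoint; a real extractor re-queries its API here
#         return 'http://example.com/video.flv?start=%d' % received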
@ -734,7 +801,7 @@ def download_urls(urls, title, ext, total_size, output_dir='.', refer=None, merg
if has_ffmpeg_installed():
from .processor.ffmpeg import ffmpeg_concat_av
ret = ffmpeg_concat_av(parts, output_filepath, ext)
print('Done.')
print('Merged into %s' % output_filename)
if ret == 0:
for part in parts: os.remove(part)
@ -747,7 +814,7 @@ def download_urls(urls, title, ext, total_size, output_dir='.', refer=None, merg
else:
from .processor.join_flv import concat_flv
concat_flv(parts, output_filepath)
print('Done.')
print('Merged into %s' % output_filename)
except:
raise
else:
@ -763,7 +830,7 @@ def download_urls(urls, title, ext, total_size, output_dir='.', refer=None, merg
else:
from .processor.join_mp4 import concat_mp4
concat_mp4(parts, output_filepath)
print('Done.')
print('Merged into %s' % output_filename)
except:
raise
else:
@ -779,7 +846,7 @@ def download_urls(urls, title, ext, total_size, output_dir='.', refer=None, merg
else:
from .processor.join_ts import concat_ts
concat_ts(parts, output_filepath)
print('Done.')
print('Merged into %s' % output_filename)
except:
raise
else:
@ -791,7 +858,7 @@ def download_urls(urls, title, ext, total_size, output_dir='.', refer=None, merg
print()
def download_urls_chunked(urls, title, ext, total_size, output_dir='.', refer=None, merge=True, faker=False, headers = {}):
def download_urls_chunked(urls, title, ext, total_size, output_dir='.', refer=None, merge=True, faker=False, headers = {}, **kwargs):
assert urls
if dry_run:
print('Real URLs:\n%s\n' % urls)
@ -805,7 +872,7 @@ def download_urls_chunked(urls, title, ext, total_size, output_dir='.', refer=No
filename = '%s.%s' % (title, ext)
filepath = os.path.join(output_dir, filename)
if total_size and ext in ('ts'):
if total_size:
if not force and os.path.exists(filepath[:-3] + '.mkv'):
print('Skipping %s: file already exists' % filepath[:-3] + '.mkv')
print()
@ -820,7 +887,7 @@ def download_urls_chunked(urls, title, ext, total_size, output_dir='.', refer=No
print('Downloading %s ...' % tr(filename))
filepath = os.path.join(output_dir, filename)
parts.append(filepath)
url_save_chunked(url, filepath, bar, refer = refer, faker = faker, headers = headers)
url_save_chunked(url, filepath, bar, refer = refer, faker = faker, headers = headers, **kwargs)
bar.done()
if not merge:
@ -887,6 +954,22 @@ def download_rtmp_url(url,title, ext,params={}, total_size=0, output_dir='.', re
assert has_rtmpdump_installed(), "RTMPDump not installed."
download_rtmpdump_stream(url, title, ext,params, output_dir)
def download_url_ffmpeg(url,title, ext,params={}, total_size=0, output_dir='.', refer=None, merge=True, faker=False):
assert url
if dry_run:
print('Real URL:\n%s\n' % [url])
if params.get("-y",False): #None or unset ->False
print('Real Playpath:\n%s\n' % [params.get("-y")])
return
if player:
launch_player(player, [url])
return
from .processor.ffmpeg import has_ffmpeg_installed, ffmpeg_download_stream
assert has_ffmpeg_installed(), "FFmpeg not installed."
ffmpeg_download_stream(url, title, ext, params, output_dir)
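# Hypothetical usage (arguments illustrative), mirroring how the douyu
# extractor below hands a live stream to FFmpeg:
#     download_url_ffmpeg(real_url, title, 'flv', {},
#                         output_dir=output_dir, merge=merge)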
def playlist_not_supported(name):
def f(*args, **kwargs):
raise NotImplementedError('Playlist is not supported for ' + name)
@ -1015,6 +1098,22 @@ def set_http_proxy(proxy):
opener = request.build_opener(proxy_support)
request.install_opener(opener)
def print_more_compatible(*args, **kwargs):
import builtins as __builtin__
"""Overload default print function as py (<3.3) does not support 'flush' keyword.
Although the function name can be same as print to get itself overloaded automatically,
I'd rather leave it with a different name and only overload it when importing to make less confusion. """
# nothing happens on py3.3 and later
if sys.version_info[:2] >= (3, 3):
return __builtin__.print(*args, **kwargs)
# in lower pyver (e.g. 3.2.x), remove 'flush' keyword and flush it as requested
doFlush = kwargs.pop('flush', False)
ret = __builtin__.print(*args, **kwargs)
if doFlush:
kwargs.get('file', sys.stdout).flush()
return ret
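# Usage as in extractor.py below -- rebound to the name `print` at import
# time so call sites keep the built-in syntax (example call illustrative):
#     from .common import print_more_compatible as print
#     print('Downloading ...', end='', flush=True)  # safe on Python 3.2 too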
def download_main(download, download_playlist, urls, playlist, **kwargs):
@ -1060,11 +1159,13 @@ def script_main(script_name, download, download_playlist, **kwargs):
-x | --http-proxy <HOST:PORT> Use an HTTP proxy for downloading.
-y | --extractor-proxy <HOST:PORT> Use an HTTP proxy for extracting only.
--no-proxy Never use a proxy.
-s | --socks-proxy <HOST:PORT> Use a SOCKS5 proxy for downloading.
-t | --timeout <SECONDS> Set socket timeout.
-d | --debug Show traceback and other debug info.
'''
short_opts = 'Vhfiuc:ndF:O:o:p:x:y:'
opts = ['version', 'help', 'force', 'info', 'url', 'cookies', 'no-caption', 'no-merge', 'no-proxy', 'debug', 'json', 'format=', 'stream=', 'itag=', 'output-filename=', 'output-dir=', 'player=', 'http-proxy=', 'extractor-proxy=', 'lang=']
short_opts = 'Vhfiuc:ndF:O:o:p:x:y:s:t:'
opts = ['version', 'help', 'force', 'info', 'url', 'cookies', 'no-caption', 'no-merge', 'no-proxy', 'debug', 'json', 'format=', 'stream=', 'itag=', 'output-filename=', 'output-dir=', 'player=', 'http-proxy=', 'socks-proxy=', 'extractor-proxy=', 'lang=', 'timeout=']
if download_playlist:
short_opts = 'l' + short_opts
opts = ['playlist'] + opts
@ -1092,8 +1193,10 @@ def script_main(script_name, download, download_playlist, **kwargs):
lang = None
output_dir = '.'
proxy = None
socks_proxy = None
extractor_proxy = None
traceback = False
timeout = 600
for o, a in opts:
if o in ('-V', '--version'):
version()
@ -1163,10 +1266,14 @@ def script_main(script_name, download, download_playlist, **kwargs):
caption = False
elif o in ('-x', '--http-proxy'):
proxy = a
elif o in ('-s', '--socks-proxy'):
socks_proxy = a
elif o in ('-y', '--extractor-proxy'):
extractor_proxy = a
elif o in ('--lang',):
lang = a
elif o in ('-t', '--timeout'):
timeout = int(a)
else:
log.e("try 'you-get --help' for more options")
sys.exit(2)
@ -1174,8 +1281,27 @@ def script_main(script_name, download, download_playlist, **kwargs):
print(help)
sys.exit()
if socks_proxy:
try:
import socket
import socks
socks_proxy_addrs = socks_proxy.split(':')
socks.set_default_proxy(socks.SOCKS5,
socks_proxy_addrs[0],
int(socks_proxy_addrs[1]))
socket.socket = socks.socksocket
def getaddrinfo(*args):
return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]
socket.getaddrinfo = getaddrinfo
except ImportError:
log.w('Error importing PySocks library, SOCKS proxy ignored. '
'In order to use a SOCKS proxy, please install PySocks.')
else:
import socket
set_http_proxy(proxy)
socket.setdefaulttimeout(timeout)
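# Why getaddrinfo is patched above: returning the hostname unresolved in
# the sockaddr tuple lets the SOCKS5 server perform the DNS lookup, so no
# queries leak to the local resolver. Minimal standalone sketch (assumes
# the PySocks package is installed; host/port illustrative):
#     import socket, socks
#     socks.set_default_proxy(socks.SOCKS5, '127.0.0.1', 1080)
#     socket.socket = socks.socksocket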
try:
if stream_id:
if not extractor_proxy:

View File

@ -1,6 +1,7 @@
#!/usr/bin/env python
from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy
from .common import print_more_compatible as print
from .util import log
from . import json_output
import os

View File

@ -5,7 +5,9 @@ from .alive import *
from .archive import *
from .baidu import *
from .bandcamp import *
from .bigthink import *
from .bilibili import *
from .bokecc import *
from .cbs import *
from .ckplayer import *
from .cntv import *
@ -22,6 +24,7 @@ from .funshion import *
from .google import *
from .heavymusic import *
from .huaban import *
from .icourses import *
from .ifeng import *
from .imgur import *
from .infoq import *
@ -38,19 +41,24 @@ from .le import *
from .lizhi import *
from .magisto import *
from .metacafe import *
from .mgtv import *
from .miaopai import *
from .miomio import *
from .mixcloud import *
from .mtv81 import *
from .musicplayon import *
from .nanagogo import *
from .naver import *
from .netease import *
from .nicovideo import *
from .panda import *
from .pinterest import *
from .pixnet import *
from .pptv import *
from .qianmo import *
from .qie import *
from .qq import *
from .showroom import *
from .sina import *
from .sohu import *
from .soundcloud import *
@ -67,6 +75,7 @@ from .vimeo import *
from .vine import *
from .vk import *
from .w56 import *
from .wanmen import *
from .xiami import *
from .yinyuetai import *
from .yixia import *
@ -74,3 +83,4 @@ from .youku import *
from .youtube import *
from .ted import *
from .khan import *
from .zhanqi import *

View File

@ -8,7 +8,7 @@ from .le import letvcloud_download_by_vu
from .qq import qq_download_by_vid
from .sina import sina_download_by_vid
from .tudou import tudou_download_by_iid
from .youku import youku_download_by_vid
from .youku import youku_download_by_vid, youku_open_download_by_vid
import json, re
@ -17,10 +17,24 @@ def get_srt_json(id):
return get_html(url)
def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False, **kwargs):
"""str, str, str, bool, bool ->None
Download Acfun video by vid.
Call Acfun API, decide which site to use, and pass the job to its
extractor.
"""
#first call the main parasing API
info = json.loads(get_html('http://www.acfun.tv/video/getVideo.aspx?id=' + vid))
sourceType = info['sourceType']
# Decide sourceId to know which extractor to use
if 'sourceId' in info: sourceId = info['sourceId']
# danmakuId = info['danmakuId']
# Call the extractor selected by sourceId
if sourceType == 'sina':
sina_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'youku':
@ -32,14 +46,13 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals
elif sourceType == 'letv':
letvcloud_download_by_vu(sourceId, '2d8c027396', title, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'zhuzhan':
a = 'http://api.aixifan.com/plays/%s/realSource' % vid
s = json.loads(get_content(a, headers={'deviceType': '1'}))
urls = s['data']['files'][-1]['url']
size = urls_size(urls)
print_info(site_info, title, 'mp4', size)
if not info_only:
download_urls(urls, title, 'mp4', size,
output_dir=output_dir, merge=merge)
# As of Jul. 28, 2016, AcFun uses embsig for hotlink protection, so we need to pass it along
embsig = info['encode']
a = 'http://api.aixifan.com/plays/%s' % vid
s = json.loads(get_content(a, headers={'deviceType': '2'}))
if s['data']['source'] == "zhuzhan-youku":
sourceId = s['data']['sourceId']
youku_open_download_by_vid(client_id='908a519d032263f8', vid=sourceId, title=title, output_dir=output_dir,merge=merge, info_only=info_only, embsig = embsig, **kwargs)
else:
raise NotImplementedError(sourceType)
@ -60,16 +73,15 @@ def acfun_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
assert re.match(r'http://[^\.]+.acfun.[^\.]+/\D/\D\D(\d+)', url)
html = get_html(url)
title = r1(r'<h1 id="txt-title-view">([^<>]+)<', html)
title = r1(r'data-title="([^"]+)"', html)
title = unescape_html(title)
title = escape_file_path(title)
assert title
videos = re.findall("data-vid=\"(\d+)\".*href=\"[^\"]+\".*title=\"([^\"]+)\"", html)
for video in videos:
p_vid = video[0]
p_title = title + " - " + video[1] if video[1] != '删除标签' else title
acfun_download_by_vid(p_vid, p_title,
vid = r1('data-vid="(\d+)"', html)
up = r1('data-name="([^"]+)"', html)
title = title + ' - ' + up
acfun_download_by_vid(vid, title,
output_dir=output_dir,
merge=merge,
info_only=info_only,

src/you_get/extractors/baidu.py Executable file → Normal file
View File

@ -7,8 +7,10 @@ from ..common import *
from .embed import *
from .universal import *
def baidu_get_song_data(sid):
data = json.loads(get_html('http://music.baidu.com/data/music/fmlink?songIds=%s' % sid, faker = True))['data']
data = json.loads(get_html(
'http://music.baidu.com/data/music/fmlink?songIds=%s' % sid, faker=True))['data']
if data['xcode'] != '':
# inside mainland China
@ -17,22 +19,28 @@ def baidu_get_song_data(sid):
# outside mainland China
return None
def baidu_get_song_url(data):
return data['songLink']
def baidu_get_song_artist(data):
return data['artistName']
def baidu_get_song_album(data):
return data['albumName']
def baidu_get_song_title(data):
return data['songName']
def baidu_get_song_lyric(data):
lrc = data['lrcLink']
return None if lrc == '' else "http://music.baidu.com%s" % lrc
def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
data = baidu_get_song_data(sid)
if data is not None:
@ -51,7 +59,8 @@ def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
type, ext, size = url_info(url, faker=True)
print_info(site_info, title, type, size)
if not info_only:
download_urls([url], file_name, ext, size, output_dir, merge=merge, faker=True)
download_urls([url], file_name, ext, size,
output_dir, merge=merge, faker=True)
try:
type, ext, size = url_info(lrc, faker=True)
@ -61,12 +70,14 @@ def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
except:
pass
def baidu_download_album(aid, output_dir = '.', merge = True, info_only = False):
html = get_html('http://music.baidu.com/album/%s' % aid, faker = True)
def baidu_download_album(aid, output_dir='.', merge=True, info_only=False):
html = get_html('http://music.baidu.com/album/%s' % aid, faker=True)
album_name = r1(r'<h2 class="album-name">(.+?)<\/h2>', html)
artist = r1(r'<span class="author_list" title="(.+?)">', html)
output_dir = '%s/%s - %s' % (output_dir, artist, album_name)
ids = json.loads(r1(r'<span class="album-add" data-adddata=\'(.+?)\'>', html).replace('&quot', '').replace(';', '"'))['ids']
ids = json.loads(r1(r'<span class="album-add" data-adddata=\'(.+?)\'>',
html).replace('&quot', '').replace(';', '"'))['ids']
track_nr = 1
for id in ids:
song_data = baidu_get_song_data(id)
@ -75,38 +86,29 @@ def baidu_download_album(aid, output_dir = '.', merge = True, info_only = False)
song_lrc = baidu_get_song_lyric(song_data)
file_name = '%02d.%s' % (track_nr, song_title)
type, ext, size = url_info(song_url, faker = True)
type, ext, size = url_info(song_url, faker=True)
print_info(site_info, song_title, type, size)
if not info_only:
download_urls([song_url], file_name, ext, size, output_dir, merge = merge, faker = True)
download_urls([song_url], file_name, ext, size,
output_dir, merge=merge, faker=True)
if song_lrc:
type, ext, size = url_info(song_lrc, faker = True)
type, ext, size = url_info(song_lrc, faker=True)
print_info(site_info, song_title, type, size)
if not info_only:
download_urls([song_lrc], file_name, ext, size, output_dir, faker = True)
download_urls([song_lrc], file_name, ext,
size, output_dir, faker=True)
track_nr += 1
def baidu_download(url, output_dir = '.', stream_type = None, merge = True, info_only = False, **kwargs):
if re.match(r'http://imgsrc.baidu.com', url):
universal_download(url, output_dir, merge=merge, info_only=info_only)
return
elif re.match(r'http://pan.baidu.com', url):
html = get_html(url)
def baidu_download(url, output_dir='.', stream_type=None, merge=True, info_only=False, **kwargs):
title = r1(r'server_filename="([^"]+)"', html)
if len(title.split('.')) > 1:
title = ".".join(title.split('.')[:-1])
real_url = r1(r'\\"dlink\\":\\"([^"]*)\\"', html).replace('\\\\/', '/')
type, ext, size = url_info(real_url, faker = True)
print_info(site_info, title, ext, size)
if re.match(r'http://pan.baidu.com', url):
real_url, title, ext, size = baidu_pan_download(url)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge = merge)
download_urls([real_url], title, ext, size,
output_dir, url, merge=merge, faker=True)
elif re.match(r'http://music.baidu.com/album/\d+', url):
id = r1(r'http://music.baidu.com/album/(\d+)', url)
baidu_download_album(id, output_dir, merge, info_only)
@ -124,17 +126,20 @@ def baidu_download(url, output_dir = '.', stream_type = None, merge = True, info
html = get_html(url)
title = r1(r'title:"([^"]+)"', html)
items = re.findall(r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html)
items = re.findall(
r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html)
urls = ['http://imgsrc.baidu.com/forum/pic/item/' + i
for i in set(items)]
# handle albums
kw = r1(r'kw=([^&]+)', html) or r1(r"kw:'([^']+)'", html)
tid = r1(r'tid=(\d+)', html) or r1(r"tid:'([^']+)'", html)
album_url = 'http://tieba.baidu.com/photo/g/bw/picture/list?kw=%s&tid=%s' % (kw, tid)
album_url = 'http://tieba.baidu.com/photo/g/bw/picture/list?kw=%s&tid=%s' % (
kw, tid)
album_info = json.loads(get_content(album_url))
for i in album_info['data']['pic_list']:
urls.append('http://imgsrc.baidu.com/forum/pic/item/' + i['pic_id'] + '.jpg')
urls.append(
'http://imgsrc.baidu.com/forum/pic/item/' + i['pic_id'] + '.jpg')
ext = 'jpg'
size = float('Inf')
@ -144,6 +149,170 @@ def baidu_download(url, output_dir = '.', stream_type = None, merge = True, info
download_urls(urls, title, ext, size,
output_dir=output_dir, merge=False)
def baidu_pan_download(url):
errno_patt = r'errno":([^"]+),'
refer_url = ""
fake_headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'UTF-8,*;q=0.5',
'Accept-Encoding': 'gzip,deflate,sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Host': 'pan.baidu.com',
'Origin': 'http://pan.baidu.com',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:13.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2500.0 Safari/537.36',
'Referer': refer_url
}
if cookies:
print('Using user-specified cookies')
else:
print('Generating cookies...')
fake_headers['Cookie'] = baidu_pan_gen_cookies(url)
refer_url = "http://pan.baidu.com"
html = get_content(url, fake_headers, decoded=True)
isprotected = False
sign, timestamp, bdstoken, appid, primary_id, fs_id, uk = baidu_pan_parse(
html)
if sign is None:
if re.findall(r'\baccess-code\b', html):
isprotected = True
sign, timestamp, bdstoken, appid, primary_id, fs_id, uk, fake_headers, psk = baidu_pan_protected_share(
url)
# raise NotImplementedError("Password required!")
if not isprotected:
raise AssertionError("Share not found or canceled: %s" % url)
if bdstoken is None:
bdstoken = ""
if not isprotected:
sign, timestamp, bdstoken, appid, primary_id, fs_id, uk = baidu_pan_parse(
html)
request_url = "http://pan.baidu.com/api/sharedownload?sign=%s&timestamp=%s&bdstoken=%s&channel=chunlei&clienttype=0&web=1&app_id=%s" % (
sign, timestamp, bdstoken, appid)
refer_url = url
post_data = {
'encrypt': 0,
'product': 'share',
'uk': uk,
'primaryid': primary_id,
'fid_list': '[' + fs_id + ']'
}
if isprotected:
post_data['sekey'] = psk
response_content = post_content(request_url, fake_headers, post_data, True)
errno = match1(response_content, errno_patt)
if errno != "0":
raise AssertionError(
"Server refused to provide download link! (Errno:%s)" % errno)
real_url = r1(r'dlink":"([^"]+)"', response_content).replace('\\/', '/')
title = r1(r'server_filename":"([^"]+)"', response_content)
assert real_url
type, ext, size = url_info(real_url, faker=True)
title_wrapped = json.loads('{"wrapper":"%s"}' % title)
title = title_wrapped['wrapper']
logging.debug(real_url)
print_info(site_info, title, ext, size)
print('Hold on...')
time.sleep(5)
return real_url, title, ext, size
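# Hypothetical usage (share URL illustrative); baidu_download above calls
# this for pan.baidu.com links:
#     real_url, title, ext, size = baidu_pan_download('http://pan.baidu.com/s/1example')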
def baidu_pan_parse(html):
sign_patt = r'sign":"([^"]+)"'
timestamp_patt = r'timestamp":([^"]+),'
appid_patt = r'app_id":"([^"]+)"'
bdstoken_patt = r'bdstoken":"([^"]+)"'
fs_id_patt = r'fs_id":([^"]+),'
uk_patt = r'uk":([^"]+),'
errno_patt = r'errno":([^"]+),'
primary_id_patt = r'shareid":([^"]+),'
sign = match1(html, sign_patt)
timestamp = match1(html, timestamp_patt)
appid = match1(html, appid_patt)
bdstoken = match1(html, bdstoken_patt)
fs_id = match1(html, fs_id_patt)
uk = match1(html, uk_patt)
primary_id = match1(html, primary_id_patt)
return sign, timestamp, bdstoken, appid, primary_id, fs_id, uk
def baidu_pan_gen_cookies(url, post_data=None):
from http import cookiejar
cookiejar = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(cookiejar))
resp = opener.open('http://pan.baidu.com')
if post_data is not None:
resp = opener.open(url, bytes(parse.urlencode(post_data), 'utf-8'))
return cookjar2hdr(cookiejar)
def baidu_pan_protected_share(url):
print('This share is protected by password!')
inpwd = input('Please provide unlock password: ')
inpwd = inpwd.replace(' ', '').replace('\t', '')
print('Please wait...')
post_pwd = {
'pwd': inpwd,
'vcode': None,
'vstr': None
}
from http import cookiejar
import time
cookiejar = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(cookiejar))
resp = opener.open('http://pan.baidu.com')
resp = opener.open(url)
init_url = resp.geturl()
verify_url = 'http://pan.baidu.com/share/verify?%s&t=%s&channel=chunlei&clienttype=0&web=1' % (
init_url.split('?', 1)[1], int(time.time()))
refer_url = init_url
fake_headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'UTF-8,*;q=0.5',
'Accept-Encoding': 'gzip,deflate,sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Host': 'pan.baidu.com',
'Origin': 'http://pan.baidu.com',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:13.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2500.0 Safari/537.36',
'Referer': refer_url
}
opener.addheaders = dict2triplet(fake_headers)
pwd_resp = opener.open(verify_url, bytes(
parse.urlencode(post_pwd), 'utf-8'))
pwd_resp_str = ungzip(pwd_resp.read()).decode('utf-8')
pwd_res = json.loads(pwd_resp_str)
if pwd_res['errno'] != 0:
raise AssertionError(
'Server returned an error: %s (Incorrect password?)' % pwd_res['errno'])
pg_resp = opener.open('http://pan.baidu.com/share/link?%s' %
init_url.split('?', 1)[1])
content = ungzip(pg_resp.read()).decode('utf-8')
sign, timestamp, bdstoken, appid, primary_id, fs_id, uk = baidu_pan_parse(
content)
psk = query_cookiejar(cookiejar, 'BDCLND')
psk = parse.unquote(psk)
fake_headers['Cookie'] = cookjar2hdr(cookiejar)
return sign, timestamp, bdstoken, appid, primary_id, fs_id, uk, fake_headers, psk
def cookjar2hdr(cookiejar):
cookie_str = ''
for i in cookiejar:
cookie_str = cookie_str + i.name + '=' + i.value + ';'
return cookie_str[:-1]
def query_cookiejar(cookiejar, name):
for i in cookiejar:
if i.name == name:
return i.value
def dict2triplet(dictin):
out_triplet = []
for i in dictin:
out_triplet.append((i, dictin[i]))
return out_triplet
site_info = "Baidu.com"
download = baidu_download
download_playlist = playlist_not_supported("baidu")

View File

@ -6,7 +6,7 @@ from ..common import *
def bandcamp_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)
trackinfo = json.loads(r1(r'(\[{"video_poster_url".*}\]),', html))
trackinfo = json.loads(r1(r'(\[{"(video_poster_url|video_caption)".*}\]),', html))
for track in trackinfo:
track_num = track['track_num']
title = '%s. %s' % (track_num, track['title'])

src/you_get/extractors/baomihua.py Executable file → Normal file
View File

@ -7,7 +7,7 @@ from ..common import *
import urllib
def baomihua_download_by_id(id, title=None, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html('http://play.baomihua.com/getvideourl.aspx?flvid=%s' % id)
html = get_html('http://play.baomihua.com/getvideourl.aspx?flvid=%s&devicetype=phone_app' % id)
host = r1(r'host=([^&]*)', html)
assert host
type = r1(r'videofiletype=([^&]*)', html)

View File

@ -0,0 +1,76 @@
#!/usr/bin/env python
from ..common import *
from ..extractor import VideoExtractor
import json
class Bigthink(VideoExtractor):
name = "Bigthink"
stream_types = [ # placeholder entries; the real list is built in prepare()
# {'id': '1080'},
# {'id': '720'},
# {'id': '360'},
# {'id': '288'},
# {'id': '190'},
# {'id': '180'},
]
@staticmethod
def get_streams_by_id(account_number, video_id):
"""
int, int->list
Get the height of the videos.
Since brightcove is using 3 kinds of links: rtmp, http and https,
we will be using the HTTPS one to make it secure.
If somehow akamaihd.net is blocked by the Great Fucking Wall,
change the "startswith https" to http.
"""
endpoint = 'https://edge.api.brightcove.com/playback/v1/accounts/{account_number}/videos/{video_id}'.format(account_number = account_number, video_id = video_id)
fake_header_id = fake_headers
# Is this somehow related to the time? Magic...
fake_header_id['Accept'] ='application/json;pk=BCpkADawqM1cc6wmJQC2tvoXZt4mrB7bFfi6zGt9QnOzprPZcGLE9OMGJwspQwKfuFYuCjAAJ53JdjI8zGFx1ll4rxhYJ255AXH1BQ10rnm34weknpfG-sippyQ'
html = get_content(endpoint, headers= fake_header_id)
html_json = json.loads(html)
link_list = []
for i in html_json['sources']:
if 'src' in i: #to avoid KeyError
if i['src'].startswith('https'):
link_list.append((str(i['height']), i['src']))
return link_list
def prepare(self, **kwargs):
html = get_content(self.url)
self.title = match1(html, r'<meta property="og:title" content="([^"]*)"')
account_number = match1(html, r'data-account="(\d+)"')
video_id = match1(html, r'data-brightcove-id="(\d+)"')
assert account_number and video_id
link_list = self.get_streams_by_id(account_number, video_id)
for i in link_list:
self.stream_types.append({'id': str(i[0])})
self.streams[i[0]] = {'url': i[1]}
def extract(self, **kwargs):
for i in self.streams:
s = self.streams[i]
_, s['container'], s['size'] = url_info(s['url'])
s['src'] = [s['url']]
site = Bigthink()
download = site.download_by_url

View File

@ -11,12 +11,14 @@ from .youku import youku_download_by_vid
import hashlib
import re
appkey='8e9fc618fbd41e28'
appkey = 'f3bb208b3d081dc8'
SECRETKEY_MINILOADER = '1c15888dc316e05a15fdd0a02ed6584f'
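# Sketch of how the playurl requests below are signed: the secret key above
# is appended to the canonical query string and the result is MD5-hashed
# (mirrors bilibili_download_by_cid below; the helper name is illustrative):
def sign_playurl_query(cid):
    payload = 'cid={cid}&from=miniplay&player=1{key}'.format(
        cid=cid, key=SECRETKEY_MINILOADER)
    return hashlib.md5(payload.encode('utf-8')).hexdigest()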
def get_srt_xml(id):
url = 'http://comment.bilibili.com/%s.xml' % id
return get_html(url)
def parse_srt_p(p):
fields = p.split(',')
assert len(fields) == 8, fields
@ -44,12 +46,14 @@ def parse_srt_p(p):
return pool, mode, font_size, font_color
def parse_srt_xml(xml):
d = re.findall(r'<d p="([^"]+)">(.*)</d>', xml)
for x, y in d:
p = parse_srt_p(x)
raise NotImplementedError()
def parse_cid_playurl(xml):
from xml.dom.minidom import parseString
try:
@ -59,10 +63,12 @@ def parse_cid_playurl(xml):
except:
return []
def bilibili_download_by_cids(cids, title, output_dir='.', merge=True, info_only=False):
urls = []
for cid in cids:
url = 'http://interface.bilibili.com/playurl?appkey=' + appkey + '&cid=' + cid
sign_this = hashlib.md5(bytes('cid={cid}&from=miniplay&player=1{SECRETKEY_MINILOADER}'.format(cid = cid, SECRETKEY_MINILOADER = SECRETKEY_MINILOADER), 'utf-8')).hexdigest()
url = 'http://interface.bilibili.com/playurl?&cid=' + cid + '&from=miniplay&player=1' + '&sign=' + sign_this
urls += [i
if not re.match(r'.*\.qqvideo\.tc\.qq\.com', i)
else re.sub(r'.*\.qqvideo\.tc\.qq\.com', 'http://vsrc.store.qq.com', i)
@ -78,8 +84,10 @@ def bilibili_download_by_cids(cids, title, output_dir='.', merge=True, info_only
if not info_only:
download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
def bilibili_download_by_cid(cid, title, output_dir='.', merge=True, info_only=False):
url = 'http://interface.bilibili.com/playurl?appkey=' + appkey + '&cid=' + cid
sign_this = hashlib.md5(bytes('cid={cid}&from=miniplay&player=1{SECRETKEY_MINILOADER}'.format(cid = cid, SECRETKEY_MINILOADER = SECRETKEY_MINILOADER), 'utf-8')).hexdigest()
url = 'http://interface.bilibili.com/playurl?&cid=' + cid + '&from=miniplay&player=1' + '&sign=' + sign_this
urls = [i
if not re.match(r'.*\.qqvideo\.tc\.qq\.com', i)
else re.sub(r'.*\.qqvideo\.tc\.qq\.com', 'http://vsrc.store.qq.com', i)
@ -87,17 +95,15 @@ def bilibili_download_by_cid(cid, title, output_dir='.', merge=True, info_only=F
type_ = ''
size = 0
try:
for url in urls:
_, type_, temp = url_info(url)
size += temp or 0
except error.URLError:
log.wtf('[Failed] DNS not resolved. Please change your DNS server settings.')
print_info(site_info, title, type_, size)
if not info_only:
download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
def bilibili_live_download_by_cid(cid, title, output_dir='.', merge=True, info_only=False):
api_url = 'http://live.bilibili.com/api/playurl?cid=' + cid
urls = parse_cid_playurl(get_content(api_url))
@ -109,31 +115,42 @@ def bilibili_live_download_by_cid(cid, title, output_dir='.', merge=True, info_o
if not info_only:
download_urls([url], title, type_, total_size=None, output_dir=output_dir, merge=merge)
def bilibili_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_content(url)
title = r1_of([r'<meta name="title" content="([^<>]{1,999})" />',
r'<h1[^>]*>([^<>]+)</h1>'], html)
title = r1_of([r'<meta name="title" content="\s*([^<>]{1,999})\s*" />',
r'<h1[^>]*>\s*([^<>]+)\s*</h1>'], html)
if title:
title = unescape_html(title)
title = escape_file_path(title)
flashvars = r1_of([r'(cid=\d+)', r'(cid: \d+)', r'flashvars="([^"]+)"', r'"https://[a-z]+\.bilibili\.com/secure,(cid=\d+)(?:&aid=\d+)?"'], html)
if re.match(r'https?://bangumi\.bilibili\.com/', url):
# quick hack for bangumi URLs
episode_id = r1(r'data-current-episode-id="(\d+)"', html)
cont = post_content('http://bangumi.bilibili.com/web_api/get_source',
post_data={'episode_id': episode_id})
cid = json.loads(cont)['result']['cid']
bilibili_download_by_cid(str(cid), title, output_dir=output_dir, merge=merge, info_only=info_only)
else:
flashvars = r1_of([r'(cid=\d+)', r'(cid: \d+)', r'flashvars="([^"]+)"',
r'"https://[a-z]+\.bilibili\.com/secure,(cid=\d+)(?:&aid=\d+)?"'], html)
assert flashvars
flashvars = flashvars.replace(': ','=')
flashvars = flashvars.replace(': ', '=')
t, cid = flashvars.split('=', 1)
cid = cid.split('&')[0]
if t == 'cid':
if re.match(r'https?://live\.bilibili\.com/', url):
title = r1(r'<title>([^<>]+)</title>', html)
title = r1(r'<title>\s*([^<>]+)\s*</title>', html)
bilibili_live_download_by_cid(cid, title, output_dir=output_dir, merge=merge, info_only=info_only)
else:
# multi-P
cids = []
pages = re.findall('<option value=\'([^\']*)\'', html)
titles = re.findall('<option value=.*>(.+)</option>', html)
for page in pages:
titles = re.findall('<option value=.*>\s*([^<>]+)\s*</option>', html)
for i, page in enumerate(pages):
html = get_html("http://www.bilibili.com%s" % page)
flashvars = r1_of([r'(cid=\d+)',
r'flashvars="([^"]+)"',
@ -141,11 +158,15 @@ def bilibili_download(url, output_dir='.', merge=True, info_only=False, **kwargs
if flashvars:
t, cid = flashvars.split('=', 1)
cids.append(cid.split('&')[0])
if url.endswith(page):
cids = [cid.split('&')[0]]
titles = [titles[i]]
break
# no multi-P
if not pages:
cids = [cid]
titles = [r1(r'<option value=.* selected>(.+)</option>', html) or title]
titles = [r1(r'<option value=.* selected>\s*([^<>]+)\s*</option>', html) or title]
for i in range(len(cids)):
bilibili_download_by_cid(cids[i],
@ -173,6 +194,7 @@ def bilibili_download(url, output_dir='.', merge=True, info_only=False, **kwargs
with open(os.path.join(output_dir, title + '.cmt.xml'), 'w', encoding='utf-8') as x:
x.write(xml)
site_info = "bilibili.com"
download = bilibili_download
download_playlist = bilibili_download

View File

@ -0,0 +1,95 @@
#!/usr/bin/env python
from ..common import *
from ..extractor import VideoExtractor
import xml.etree.ElementTree as ET
class BokeCC(VideoExtractor):
name = "BokeCC"
stream_types = [ # we do not know these in advance, as we have to check the
# output from the API
]
API_ENDPOINT = 'http://p.bokecc.com/'
def download_by_id(self, vid = '', title = None, output_dir='.', merge=True, info_only=False,**kwargs):
"""self, str->None
Keyword arguments:
self: self
vid: The video ID for BokeCC cloud, something like
FE3BB999594978049C33DC5901307461
Calls the prepare() to download the video.
If no title is provided, this method shall try to find a proper title
with the information providin within the
returned content of the API."""
assert vid
self.prepare(vid = vid, title = title, **kwargs)
self.extract(**kwargs)
self.download(output_dir = output_dir,
merge = merge,
info_only = info_only, **kwargs)
def prepare(self, vid = '', title = None, **kwargs):
assert vid
api_url = self.API_ENDPOINT + \
'servlet/playinfo?vid={vid}&m=0'.format(vid = vid) # returns XML
html = get_content(api_url)
self.tree = ET.ElementTree(ET.fromstring(html))
if self.tree.find('result').text != '1':
log.wtf('API result says failed!')
raise AssertionError('BokeCC playinfo API reported failure')
if title is None:
self.title = '_'.join([i.text for i in self.tree.iterfind('video/videomarks/videomark/markdesc')])
else:
self.title = title
for i in self.tree.iterfind('video/quality'):
quality = i.attrib['value']
url = i[0].attrib['playurl']
self.stream_types.append({'id': quality,
'video_profile': i.attrib['desp']})
self.streams[quality] = {'url': url,
'video_profile': i.attrib['desp']}
self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
def extract(self, **kwargs):
for i in self.streams:
s = self.streams[i]
_, s['container'], s['size'] = url_info(s['url'])
s['src'] = [s['url']]
if 'stream_id' in kwargs and kwargs['stream_id']:
# Extract the stream
stream_id = kwargs['stream_id']
if stream_id not in self.streams:
log.e('[Error] Invalid video format.')
log.e('Run \'-i\' command with no specific video format to view all available formats.')
exit(2)
else:
# Extract stream with the best quality
stream_id = self.streams_sorted[0]['id']
_, s['container'], s['size'] = url_info(s['url'])
s['src'] = [s['url']]
site = BokeCC()
# I don't know how to call the player directly, so I just put it here
# in case anyone touches it -- Beining@Aug.24.2016
#download = site.download_by_url
#download_playlist = site.download_by_url
bokecc_download_by_id = site.download_by_id
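# Hypothetical call of the hook above (vid taken from the docstring example):
#     bokecc_download_by_id('FE3BB999594978049C33DC5901307461',
#                           title='lecture', info_only=True)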

View File

@ -7,6 +7,7 @@ from ..common import *
import json
import re
def cntv_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
assert id
info = json.loads(get_html('http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + id))
@ -31,7 +32,11 @@ def cntv_download_by_id(id, title = None, output_dir = '.', merge = True, info_o
def cntv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
if re.match(r'http://tv\.cntv\.cn/video/(\w+)/(\w+)', url):
id = match1(url, r'http://tv\.cntv\.cn/video/\w+/(\w+)')
elif re.match(r'http://\w+\.cntv\.cn/(\w+/\w+/(classpage/video/)?)?\d+/\d+\.shtml', url) or re.match(r'http://\w+.cntv.cn/(\w+/)*VIDE\d+.shtml', url):
elif re.match(r'http://\w+\.cntv\.cn/(\w+/\w+/(classpage/video/)?)?\d+/\d+\.shtml', url) or \
re.match(r'http://\w+.cntv.cn/(\w+/)*VIDE\d+.shtml', url) or \
re.match(r'http://(\w+).cntv.cn/(\w+)/classpage/video/(\d+)/(\d+).shtml', url) or \
re.match(r'http://\w+.cctv.com/\d+/\d+/\d+/\w+.shtml', url) or \
re.match(r'http://\w+.cntv.cn/\d+/\d+/\d+/\w+.shtml', url):
id = r1(r'videoCenterId","(\w+)"', get_html(url))
elif re.match(r'http://xiyou.cntv.cn/v-[\w-]+\.html', url):
id = r1(r'http://xiyou.cntv.cn/v-([\w-]+)\.html', url)

View File

@ -4,6 +4,11 @@ __all__ = ['dailymotion_download']
from ..common import *
def extract_m3u(url):
content = get_content(url)
m3u_url = re.findall(r'http://.*', content)[0]
return match1(m3u_url, r'([^#]+)')
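# Illustrative check of extract_m3u's cleanup step (manifest URL is
# hypothetical): the master playlist body holds one absolute URL whose
# '#cell=...' fragment must be stripped before handing it to FFmpeg.
_example = 'http://proxy.dmcdn.net/video/123.m3u8#cell=core'
assert match1(_example, r'([^#]+)') == 'http://proxy.dmcdn.net/video/123.m3u8'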
def dailymotion_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
"""Downloads Dailymotion videos by URL.
"""
@ -13,7 +18,7 @@ def dailymotion_download(url, output_dir = '.', merge = True, info_only = False,
title = match1(html, r'"video_title"\s*:\s*"([^"]+)"') or \
match1(html, r'"title"\s*:\s*"([^"]+)"')
for quality in ['720','480','380','240','auto']:
for quality in ['1080','720','480','380','240','auto']:
try:
real_url = info[quality][0]["url"]
if real_url:
@ -21,11 +26,12 @@ def dailymotion_download(url, output_dir = '.', merge = True, info_only = False,
except KeyError:
pass
type, ext, size = url_info(real_url)
m3u_url = extract_m3u(real_url)
mime, ext, size = 'video/mp4', 'mp4', 0
print_info(site_info, title, type, size)
print_info(site_info, title, mime, size)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge = merge)
download_url_ffmpeg(m3u_url, title, ext, output_dir=output_dir, merge=merge)
site_info = "Dailymotion.com"
download = dailymotion_download

src/you_get/extractors/dilidili.py Executable file → Normal file
View File

@ -35,16 +35,16 @@ def dilidili_parser_data_to_stream_types(typ ,vid ,hd2 ,sign, tmsign, ulk):
#----------------------------------------------------------------------
def dilidili_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
if re.match(r'http://www.dilidili.com/watch/\w+', url):
if re.match(r'http://www.dilidili.com/watch\S+', url):
html = get_content(url)
title = match1(html, r'<title>(.+)丨(.+)</title>') #title
# player loaded via internal iframe
frame_url = re.search(r'<iframe (.+)src="(.+)\" f(.+)</iframe>', html).group(2)
frame_url = re.search(r'<iframe src=\"(.+?)\"', html).group(1)
#print(frame_url)
#https://player.005.tv:60000/?vid=a8760f03fd:a04808d307&v=yun&sign=a68f8110cacd892bc5b094c8e5348432
html = get_content(frame_url, headers=headers)
html = get_content(frame_url, headers=headers, decoded=False).decode('utf-8')
match = re.search(r'(.+?)var video =(.+?);', html)
vid = match1(html, r'var vid="(.+)"')

View File

@ -7,7 +7,18 @@ from ..common import *
def douban_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
html = get_html(url)
if 'subject' in url:
if re.match(r'https?://movie', url):
title = match1(html, 'name="description" content="([^"]+)')
tid = match1(url, 'trailer/(\d+)')
real_url = 'https://movie.douban.com/trailer/video_url?tid=%s' % tid
type, ext, size = url_info(real_url)
print_info(site_info, title, type, size)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge = merge)
elif 'subject' in url:
titles = re.findall(r'data-title="([^"]*)">', html)
song_id = re.findall(r'<li class="song-item" id="([^"]*)"', html)
song_ssid = re.findall(r'data-ssid="([^"]*)"', html)

View File

@ -6,27 +6,50 @@ from ..common import *
import json
import hashlib
import time
import uuid
import urllib.parse, urllib.request
def douyutv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
html = get_content(url)
room_id_patt = r'"room_id"\s*:\s*(\d+),'
room_id = match1(html, room_id_patt)
if room_id == "0":
room_id = url[url.rfind('/')+1:]
# Thanks to @yan12125 for providing the decoding method!
suffix = 'room/%s?aid=android&client_sys=android&time=%d' % (room_id, int(time.time()))
sign = hashlib.md5((suffix + '1231').encode('ascii')).hexdigest()
json_request_url = "http://www.douyu.com/api/v1/%s&auth=%s" % (suffix, sign)
content = get_html(json_request_url)
json_request_url = "http://m.douyu.com/html5/live?roomId=%s" % room_id
content = get_content(json_request_url)
data = json.loads(content)['data']
server_status = data.get('error', 0)
if server_status != 0:
raise ValueError("Server returned error: %s" % server_status)
title = data.get('room_name')
show_status = data.get('show_status')
if show_status != "1":
raise ValueError("The live stream is not online! (show_status: %s)" % show_status)
tt = int(time.time() / 60)
did = uuid.uuid4().hex.upper()
sign_content = '{room_id}{did}A12Svb&%1UUmf@hC{tt}'.format(room_id = room_id, did = did, tt = tt)
sign = hashlib.md5(sign_content.encode('utf-8')).hexdigest()
json_request_url = "http://www.douyu.com/lapi/live/getPlay/%s" % room_id
payload = {'cdn': 'ws', 'rate': '0', 'tt': tt, 'did': did, 'sign': sign}
postdata = urllib.parse.urlencode(payload)
req = urllib.request.Request(json_request_url, postdata.encode('utf-8'))
with urllib.request.urlopen(req) as response:
content = response.read()
data = json.loads(content.decode('utf-8'))['data']
server_status = data.get('error', 0)
if server_status != 0:
raise ValueError("Server returned error: %s" % server_status)
real_url = data.get('rtmp_url')+'/'+data.get('rtmp_live')
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge = merge)
download_url_ffmpeg(real_url, title, 'flv', None, output_dir = output_dir, merge = merge)
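# Sketch of the getPlay signature computed above (the mixed-in constant
# comes from the code; the helper itself is illustrative): room_id, a random
# device id and a minute-granularity timestamp are concatenated around the
# constant and MD5-hashed.
def sign_get_play(room_id, did, tt):
    content = '{0}{1}A12Svb&%1UUmf@hC{2}'.format(room_id, did, tt)
    return hashlib.md5(content.encode('utf-8')).hexdigest()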
site_info = "douyu.com"
download = douyutv_download

View File

@ -8,6 +8,7 @@ from .netease import netease_download
from .qq import qq_download_by_vid
from .sina import sina_download_by_vid
from .tudou import tudou_download_by_id
from .vimeo import vimeo_download_by_id
from .yinyuetai import yinyuetai_download_by_id
from .youku import youku_download_by_vid
@ -24,7 +25,7 @@ youku_embed_patterns = [ 'youku\.com/v_show/id_([a-zA-Z0-9=]+)',
"""
http://www.tudou.com/programs/view/html5embed.action?type=0&amp;code=3LS_URGvl54&amp;lcode=&amp;resourceId=0_06_05_99
"""
tudou_embed_patterns = [ 'tudou\.com[a-zA-Z0-9\/\?=\&\.\;]+code=([a-zA-Z0-9_]+)\&',
tudou_embed_patterns = [ 'tudou\.com[a-zA-Z0-9\/\?=\&\.\;]+code=([a-zA-Z0-9_-]+)\&',
'www\.tudou\.com/v/([a-zA-Z0-9_-]+)/[^"]*v\.swf'
]
@ -39,6 +40,9 @@ iqiyi_embed_patterns = [ 'player\.video\.qiyi\.com/([^/]+)/[^/]+/[^/]+/[^/]+\.sw
netease_embed_patterns = [ '(http://\w+\.163\.com/movie/[^\'"]+)' ]
vimeo_embed_patters = [ 'player\.vimeo\.com/video/(\d+)' ]
def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwargs):
content = get_content(url, headers=fake_headers)
found = False
@ -69,6 +73,11 @@ def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwa
found = True
netease_download(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
urls = matchall(content, vimeo_embed_patters)
for url in urls:
found = True
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
if not found:
raise NotImplementedError(url)

View File

@ -5,24 +5,26 @@ __all__ = ['facebook_download']
from ..common import *
import json
def facebook_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)
title = r1(r'<title id="pageTitle">(.+) \| Facebook</title>', html)
s2 = parse.unquote(unicodize(r1(r'\["params","([^"]*)"\]', html)))
data = json.loads(s2)
video_data = data["video_data"]["progressive"]
for fmt in ["hd_src", "sd_src"]:
src = video_data[0][fmt]
if src:
break
title = r1(r'<title id="pageTitle">(.+)</title>', html)
sd_urls = list(set([
unicodize(str.replace(i, '\\/', '/'))
for i in re.findall(r'"sd_src_no_ratelimit":"([^"]*)"', html)
]))
hd_urls = list(set([
unicodize(str.replace(i, '\\/', '/'))
for i in re.findall(r'"hd_src_no_ratelimit":"([^"]*)"', html)
]))
urls = hd_urls if hd_urls else sd_urls
type, ext, size = url_info(src, True)
type, ext, size = url_info(urls[0], True)
size = urls_size(urls)
print_info(site_info, title, type, size)
if not info_only:
download_urls([src], title, ext, size, output_dir, merge=merge)
download_urls(urls, title, ext, size, output_dir, merge=False)
site_info = "Facebook.com"
download = facebook_download

src/you_get/extractors/funshion.py Executable file → Normal file
View File

@ -10,9 +10,9 @@ import json
def funshion_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
""""""
if re.match(r'http://www.fun.tv/vplay/v-(\w+)', url): #single video
funshion_download_by_url(url, output_dir = '.', merge = False, info_only = False)
elif re.match(r'http://www.fun.tv/vplay/g-(\w+)', url): #whole drama
funshion_download_by_drama_url(url, output_dir = '.', merge = False, info_only = False)
funshion_download_by_url(url, output_dir=output_dir, merge=merge, info_only=info_only)
elif re.match(r'http://www.fun.tv/vplay/.*g-(\w+)', url): #whole drama
funshion_download_by_drama_url(url, output_dir=output_dir, merge=merge, info_only=info_only)
else:
return
@ -25,7 +25,7 @@ def funshion_download_by_url(url, output_dir = '.', merge = False, info_only = F
if re.match(r'http://www.fun.tv/vplay/v-(\w+)', url):
match = re.search(r'http://www.fun.tv/vplay/v-(\d+)(.?)', url)
vid = match.group(1)
funshion_download_by_vid(vid, output_dir = '.', merge = False, info_only = False)
funshion_download_by_vid(vid, output_dir=output_dir, merge=merge, info_only=info_only)
#----------------------------------------------------------------------
def funshion_download_by_vid(vid, output_dir = '.', merge = False, info_only = False):
@ -63,14 +63,11 @@ def funshion_download_by_drama_url(url, output_dir = '.', merge = False, info_on
"""str->None
url = 'http://www.fun.tv/vplay/g-95785/'
"""
if re.match(r'http://www.fun.tv/vplay/g-(\w+)', url):
match = re.search(r'http://www.fun.tv/vplay/g-(\d+)(.?)', url)
id = match.group(1)
id = r1(r'http://www.fun.tv/vplay/.*g-(\d+)', url)
video_list = funshion_drama_id_to_vid(id)
for video in video_list:
funshion_download_by_id((video[0], id), output_dir = '.', merge = False, info_only = False)
funshion_download_by_id((video[0], id), output_dir=output_dir, merge=merge, info_only=info_only)
# id is the drama ID; its vids are not the same as the ones used for single videos
#----------------------------------------------------------------------

View File

@ -0,0 +1,36 @@
#!/usr/bin/env python
__all__ = ['huomaotv_download']
from ..common import *
def get_mobile_room_url(room_id):
return 'http://www.huomao.com/mobile/mob_live/%s' % room_id
def get_m3u8_url(stream_id):
return 'http://live-ws.huomaotv.cn/live/%s/playlist.m3u8' % stream_id
def huomaotv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
room_id_pattern = r'huomao.com/(\d+)'
room_id = match1(url, room_id_pattern)
html = get_content(get_mobile_room_url(room_id))
stream_id_pattern = r'id="html_stream" value="(\w+)"'
stream_id = match1(html, stream_id_pattern)
m3u8_url = get_m3u8_url(stream_id)
title = match1(html, r'<title>([^<]{1,9999})</title>')
print_info(site_info, title, 'm3u8', float('inf'))
if not info_only:
download_url_ffmpeg(m3u8_url, title, 'm3u8', None, output_dir=output_dir, merge=merge)
site_info = 'huomao.com'
download = huomaotv_download
download_playlist = playlist_not_supported('huomao')

View File

@ -0,0 +1,148 @@
#!/usr/bin/env python
from ..common import *
from urllib import parse
import random
from time import sleep
import xml.etree.ElementTree as ET
import datetime
import hashlib
import base64
import logging
from urllib import error
import re
__all__ = ['icourses_download']
def icourses_download(url, merge=False, output_dir='.', **kwargs):
icourses_parser = ICousesExactor(url=url)
real_url = icourses_parser.icourses_cn_url_parser(**kwargs)
title = icourses_parser.title
if real_url is not None:
for tries in range(0, 5):
try:
_, type_, size = url_info(real_url, faker=True)
break
except error.HTTPError:
logging.warning('Failed to fetch the video file! Retrying...')
sleep(random.Random().randint(0, 5)) # Avoid getting blocked
real_url = icourses_parser.icourses_cn_url_parser()
title = icourses_parser.title
print_info(site_info, title, type_, size)
if not kwargs['info_only']:
download_urls_chunked([real_url], title, 'flv',
total_size=size, output_dir=output_dir, refer=url, merge=merge, faker=True, ignore_range=True, chunk_size=15000000, dyn_callback=icourses_parser.icourses_cn_url_parser)
# Why not use VideoExtractor: this site needs a special download method
class ICousesExactor(object):
def __init__(self, url):
self.url = url
self.title = ''
return
def icourses_playlist_download(self, **kwargs):
html = get_content(self.url)
page_type_patt = r'showSectionNode\(this,(\d+),(\d+)\)'
video_js_number = r'changeforvideo\((.*?)\)'
fs_flag = r'<input type="hidden" value=(\w+) id="firstShowFlag">'
page_navi_vars = re.search(pattern=page_type_patt, string=html)
dummy_page = 'http://www.icourses.cn/jpk/viewCharacterDetail.action?sectionId={}&courseId={}'.format(
page_navi_vars.group(2), page_navi_vars.group(1))
html = get_content(dummy_page)
fs_status = match1(html, fs_flag)
video_list = re.findall(pattern=video_js_number, string=html)
for video in video_list:
video_args = video.replace('\'', '').split(',')
video_url = 'http://www.icourses.cn/jpk/changeforVideo.action?resId={}&courseId={}&firstShowFlag={}'.format(
video_args[0], video_args[1], fs_status or '1')
sleep(random.Random().randint(0, 5)) # Avoid getting blocked
icourses_download(video_url, **kwargs)
def icourses_cn_url_parser(self, received=0, **kwargs):
PLAYER_BASE_VER = '150606-1'
ENCRYPT_MOD_VER = '151020'
ENCRYPT_SALT = '3DAPmXsZ4o' # It took a really long time to find this...
html = get_content(self.url)
if re.search(pattern=r'showSectionNode\(.*\)', string=html):
logging.warning('Switching to playlist mode!')
return self.icourses_playlist_download(**kwargs)
flashvars_patt = r'var\ flashvars\=((.|\n)*)};'
server_time_patt = r'MPlayer.swf\?v\=(\d+)'
uuid_patt = r'uuid:(\d+)'
other_args_patt = r'other:"(.*)"'
res_url_patt = r'IService:\'([^\']+)'
title_a_patt = r'<div class="con"> <a.*?>(.*?)</a>'
title_b_patt = r'<div class="con"> <a.*?/a>((.|\n)*?)</div>'
title_a = match1(html, title_a_patt).strip()
title_b = match1(html, title_b_patt).strip()
title = title_a + title_b # WIP, FIXME
title = re.sub('( +|\n|\t|\r|\&nbsp\;)', '',
unescape_html(title).replace(' ', ''))
server_time = match1(html, server_time_patt)
flashvars = match1(html, flashvars_patt)
uuid = match1(flashvars, uuid_patt)
other_args = match1(flashvars, other_args_patt)
res_url = match1(flashvars, res_url_patt)
url_parts = {'v': server_time, 'other': other_args,
'uuid': uuid, 'IService': res_url}
req_url = '%s?%s' % (res_url, parse.urlencode(url_parts))
logging.debug('Requesting video resource location...')
xml_resp = get_html(req_url)
xml_obj = ET.fromstring(xml_resp)
logging.debug('The result was {}'.format(xml_obj.get('status')))
if xml_obj.get('status') != 'success':
raise ValueError('Server returned error!')
if received:
play_type = 'seek'
else:
play_type = 'play'
received -= 1
common_args = {'lv': PLAYER_BASE_VER, 'ls': play_type,
'lt': datetime.datetime.now().strftime('%m-%d/%H:%M:%S'),
'start': received + 1}
media_host = xml_obj.find(".//*[@name='host']").text
media_url = media_host + xml_obj.find(".//*[@name='url']").text
# This is what they call `SSLModule`... but obviously it is just a kind of
# encryption that has absolutely no effect in protecting data integrity
if xml_obj.find(".//*[@name='ssl']").text != 'true':
logging.debug('The encryption mode is disabled')
# when the so-called `SSLMode` is not activated, the parameters, `h`
# and `p` can be found in response
arg_h = xml_obj.find(".//*[@name='h']").text
assert arg_h
arg_r = xml_obj.find(".//*[@name='p']").text or ENCRYPT_MOD_VER
url_args = common_args.copy()
url_args.update({'h': arg_h, 'r': arg_r})
final_url = '{}?{}'.format(
media_url, parse.urlencode(url_args))
self.title = title
return final_url
# when the `SSLMode` is activated, we need to receive the timestamp and the
# time offset (?) value from the server
logging.debug('The encryption mode is in effect')
ssl_callback = get_html(
'{}/ssl/ssl.shtml'.format(media_host)).split(',')
ssl_timestamp = int(datetime.datetime.strptime(
ssl_callback[1], "%b %d %H:%M:%S %Y").timestamp() + int(ssl_callback[0]))
sign_this = ENCRYPT_SALT + \
parse.urlparse(media_url).path + str(ssl_timestamp)
arg_h = base64.b64encode(hashlib.md5(
bytes(sign_this, 'utf-8')).digest())
# Post-processing, may subject to change, so leaving this alone...
arg_h = arg_h.decode('utf-8').strip('=').replace('+',
'-').replace('/', '_')
arg_r = ssl_timestamp
url_args = common_args.copy()
url_args.update({'h': arg_h, 'r': arg_r, 'p': ENCRYPT_MOD_VER})
final_url = '{}?{}'.format(
media_url, parse.urlencode(url_args))
logging.debug('Crafted URL: {}'.format(final_url))
self.title = title
return final_url
site_info = 'icourses.cn'
download = icourses_download
# download_playlist = icourses_playlist_download
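# A self-contained sketch of the `SSLModule` signing shown above: md5 over
# salt + URL path + timestamp, then URL-safe base64 with the padding stripped.
# The media URL and timestamp are made-up values; normally the timestamp comes
# from /ssl/ssl.shtml on the media host.
import base64
import hashlib
from urllib import parse
ENCRYPT_SALT = '3DAPmXsZ4o'
ENCRYPT_MOD_VER = '151020'
media_url = 'http://media.example.cn/video/123.flv'  # hypothetical
ssl_timestamp = 1478563200
sign_this = ENCRYPT_SALT + parse.urlparse(media_url).path + str(ssl_timestamp)
arg_h = base64.b64encode(hashlib.md5(sign_this.encode('utf-8')).digest())
arg_h = arg_h.decode('utf-8').strip('=').replace('+', '-').replace('/', '_')
print('{}?{}'.format(media_url, parse.urlencode(
    {'h': arg_h, 'r': ssl_timestamp, 'p': ENCRYPT_MOD_VER})))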

View File

@ -6,14 +6,14 @@ from ..common import *
def ifeng_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
assert r1(r'([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})', id), id
url = 'http://v.ifeng.com/video_info_new/%s/%s/%s.xml' % (id[-2], id[-2:], id)
url = 'http://vxml.ifengimg.com/video_info_new/%s/%s/%s.xml' % (id[-2], id[-2:], id)
xml = get_html(url, 'utf-8')
title = r1(r'Name="([^"]+)"', xml)
title = unescape_html(title)
url = r1(r'VideoPlayUrl="([^"]+)"', xml)
from random import randint
r = randint(10, 19)
url = url.replace('http://video.ifeng.com/', 'http://video%s.ifeng.com/' % r)
url = url.replace('http://wideo.ifeng.com/', 'http://ips.ifeng.com/wideo.ifeng.com/')
type, ext, size = url_info(url)
print_info(site_info, title, ext, size)

View File

@ -1,13 +1,18 @@
#!/usr/bin/env python
from ..common import *
from ..common import print_more_compatible as print
from ..extractor import VideoExtractor
from ..util import log
from .. import json_output
from uuid import uuid4
from random import random,randint
import json
from math import floor
from zlib import decompress
import hashlib
import time
'''
Changelog:
@ -43,6 +48,7 @@ bid meaning for quality
10 4k
96 topspeed
'''
'''
def mix(tvid):
salt = '4a1caba4b4465345366f28da7c117d20'
@ -75,42 +81,37 @@ def getDispathKey(rid):
time=json.loads(get_content("http://data.video.qiyi.com/t?tn="+str(random())))["t"]
t=str(int(floor(int(time)/(10*60.0))))
return hashlib.new("md5",bytes(t+tp+rid,"utf-8")).hexdigest()
'''
def getVMS(tvid, vid):
t = int(time.time() * 1000)
src = '76f90cbd92f94a2e925d83e8ccd22cb7'
key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
sc = hashlib.new('md5', bytes(str(t) + key + vid, 'utf-8')).hexdigest()
vmsreq= url = 'http://cache.m.iqiyi.com/tmts/{0}/{1}/?t={2}&sc={3}&src={4}'.format(tvid,vid,t,sc,src)
return json.loads(get_content(vmsreq))
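# A quick standalone sketch of the tmts signing above: `sc` is simply
# md5(millisecond timestamp + fixed key + vid). The tvid/vid pair is made up.
import hashlib
import time
src = '76f90cbd92f94a2e925d83e8ccd22cb7'
key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
tvid, vid = '545265100', '5c1c94b3d2b39dcd9e3eba287ae90673'  # hypothetical IDs
t = int(time.time() * 1000)
sc = hashlib.md5((str(t) + key + vid).encode('utf-8')).hexdigest()
print('http://cache.m.iqiyi.com/tmts/{0}/{1}/?t={2}&sc={3}&src={4}'.format(
    tvid, vid, t, sc, src))  # fetch this and parse the JSON reply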
class Iqiyi(VideoExtractor):
name = "爱奇艺 (Iqiyi)"
stream_types = [
{'id': '4k', 'container': 'f4v', 'video_profile': '4K'},
{'id': 'fullhd', 'container': 'f4v', 'video_profile': '全高清'},
{'id': 'suprt-high', 'container': 'f4v', 'video_profile': '超高清'},
{'id': 'super', 'container': 'f4v', 'video_profile': '超清'},
{'id': 'high', 'container': 'f4v', 'video_profile': '高清'},
{'id': 'standard', 'container': 'f4v', 'video_profile': '标清'},
{'id': 'topspeed', 'container': 'f4v', 'video_profile': '最差'},
{'id': '4k', 'container': 'm3u8', 'video_profile': '4k'},
{'id': 'BD', 'container': 'm3u8', 'video_profile': '1080p'},
{'id': 'TD', 'container': 'm3u8', 'video_profile': '720p'},
{'id': 'HD', 'container': 'm3u8', 'video_profile': '540p'},
{'id': 'SD', 'container': 'm3u8', 'video_profile': '360p'},
{'id': 'LD', 'container': 'm3u8', 'video_profile': '210p'},
]
'''
supported_stream_types = [ 'high', 'standard']
stream_to_bid = { '4k': 10, 'fullhd' : 5, 'suprt-high' : 4, 'super' : 3, 'high' : 2, 'standard' :1, 'topspeed' :96}
'''
ids = ['4k','BD', 'TD', 'HD', 'SD', 'LD']
vd_2_id = {10: '4k', 19: '4k', 5:'BD', 18: 'BD', 21: 'HD', 2: 'HD', 4: 'TD', 17: 'TD', 96: 'LD', 1: 'SD'}
id_2_profile = {'4k':'4k', 'BD': '1080p','TD': '720p', 'HD': '540p', 'SD': '360p', 'LD': '210p'}
stream_urls = { '4k': [] , 'fullhd' : [], 'suprt-high' : [], 'super' : [], 'high' : [], 'standard' :[], 'topspeed' :[]}
baseurl = ''
gen_uid = ''
def getVMS(self):
#tm -> the flash runtime, used for the md5 hash
#um -> 1 for VIP, 0 for normal users
#authkey -> for password-protected videos, replace '' with your password
#puid -> user.passportid, may be empty?
#TODO: support password-protected videos
tvid, vid = self.vid
tm, sc, src = mix(tvid)
uid = self.gen_uid
vmsreq='http://cache.video.qiyi.com/vms?key=fvip&src=1702633101b340d8917a69cf8a4b8c7' +\
"&tvId="+tvid+"&vid="+vid+"&vinfo=1&tm="+tm+\
"&enc="+sc+\
"&qyid="+uid+"&tn="+str(random()) +"&um=1" +\
"&authkey="+hashlib.new('md5',bytes(hashlib.new('md5', b'').hexdigest()+str(tm)+tvid,'utf-8')).hexdigest()
return json.loads(get_content(vmsreq))
def download_playlist_by_url(self, url, **kwargs):
self.url = url
@ -133,14 +134,88 @@ class Iqiyi(VideoExtractor):
r1(r'vid=([^&]+)', self.url) or \
r1(r'data-player-videoid="([^"]+)"', html)
self.vid = (tvid, videoid)
self.title = match1(html, '<title>([^<]+)').split('-')[0]
tvid, videoid = self.vid
info = getVMS(tvid, videoid)
assert info['code'] == 'A00000', 'can\'t play this video'
self.gen_uid = uuid4().hex
for stream in info['data']['vidl']:
try:
info = self.getVMS()
stream_id = self.vd_2_id[stream['vd']]
if stream_id in self.stream_types:
continue
stream_profile = self.id_2_profile[stream_id]
self.streams[stream_id] = {'video_profile': stream_profile, 'container': 'm3u8', 'src': [stream['m3u']], 'size' : 0}
except:
self.download_playlist_by_url(self.url, **kwargs)
exit(0)
log.i("vd: {} is not handled".format(stream['vd']))
log.i("info is {}".format(stream))
def download(self, **kwargs):
"""Override the original one
Ugly ugly dirty hack"""
if 'json_output' in kwargs and kwargs['json_output']:
json_output.output(self)
elif 'info_only' in kwargs and kwargs['info_only']:
if 'stream_id' in kwargs and kwargs['stream_id']:
# Display the stream
stream_id = kwargs['stream_id']
if 'index' not in kwargs:
self.p(stream_id)
else:
self.p_i(stream_id)
else:
# Display all available streams
if 'index' not in kwargs:
self.p([])
else:
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
self.p_i(stream_id)
else:
if 'stream_id' in kwargs and kwargs['stream_id']:
# Download the stream
stream_id = kwargs['stream_id']
else:
# Download stream with the best quality
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
if 'index' not in kwargs:
self.p(stream_id)
else:
self.p_i(stream_id)
if stream_id in self.streams:
urls = self.streams[stream_id]['src']
ext = self.streams[stream_id]['container']
total_size = self.streams[stream_id]['size']
else:
urls = self.dash_streams[stream_id]['src']
ext = self.dash_streams[stream_id]['container']
total_size = self.dash_streams[stream_id]['size']
if not urls:
log.wtf('[Failed] Cannot extract video source.')
# For legacy main()
#Here's the change!!
download_url_ffmpeg(urls[0], self.title, 'mp4',
output_dir=kwargs['output_dir'],
merge=kwargs['merge'],)
if not kwargs['caption']:
print('Skipping captions.')
return
for lang in self.caption_tracks:
filename = '%s.%s.srt' % (get_filename(self.title), lang)
print('Saving %s ... ' % filename, end="", flush=True)
srt = self.caption_tracks[lang]
with open(os.path.join(kwargs['output_dir'], filename),
'w', encoding='utf-8') as x:
x.write(srt)
print('Done.')
'''
if info["code"] != "A000000":
log.e("[error] outdated iQIYI key")
log.wtf("is your you-get up-to-date?")
@ -208,6 +283,7 @@ class Iqiyi(VideoExtractor):
#because the url is generated before start downloading
#and the key may be expired after 10 minutes
self.streams[stream_id]['src'] = urls
'''
site = Iqiyi()
download = site.download_by_url

0
src/you_get/extractors/khan.py Executable file → Normal file
View File

View File

@ -27,6 +27,11 @@ def ku6_download_by_id(id, title = None, output_dir = '.', merge = True, info_on
download_urls(urls, title, ext, size, output_dir, merge = merge)
def ku6_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
id = None
if match1(url, r'http://baidu.ku6.com/watch/(.*)\.html') is not None:
id = baidu_ku6(url)
else:
patterns = [r'http://v.ku6.com/special/show_\d+/(.*)\.\.\.html',
r'http://v.ku6.com/show/(.*)\.\.\.html',
r'http://my.ku6.com/watch\?.*v=(.*)\.\..*']
@ -34,6 +39,18 @@ def ku6_download(url, output_dir = '.', merge = True, info_only = False, **kwarg
ku6_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
def baidu_ku6(url):
id = None
h1 = get_html(url)
isrc = match1(h1, r'<iframe id="innerFrame" src="([^"]*)"')
if isrc is not None:
h2 = get_html(isrc)
id = match1(h2, r'http://v.ku6.com/show/(.*)\.\.\.html')
return id
site_info = "Ku6.com"
download = ku6_download
download_playlist = playlist_not_supported('ku6')

View File

@ -0,0 +1,112 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from ..common import *
from ..extractor import VideoExtractor
from json import loads
from urllib.parse import urlsplit
from os.path import dirname
import re
class MGTV(VideoExtractor):
name = "芒果 (MGTV)"
# Last updated: 2015-11-24
stream_types = [
{'id': 'hd', 'container': 'flv', 'video_profile': '超清'},
{'id': 'sd', 'container': 'flv', 'video_profile': '高清'},
{'id': 'ld', 'container': 'flv', 'video_profile': '标清'},
]
id_dic = {i['video_profile']:(i['id']) for i in stream_types}
api_endpoint = 'http://v.api.mgtv.com/player/video?video_id={video_id}'
@staticmethod
def get_vid_from_url(url):
"""Extracts video ID from URL.
"""
return match1(url, 'http://www.mgtv.com/v/\d/\d+/\w+/(\d+).html')
#----------------------------------------------------------------------
@staticmethod
def get_mgtv_real_url(url):
"""str->list of str
Gives you the real URLs."""
content = loads(get_content(url))
m3u_url = content['info']
split = urlsplit(m3u_url)
base_url = "{scheme}://{netloc}{path}/".format(scheme = split[0],
netloc = split[1],
path = dirname(split[2]))
content = get_content(content['info']) #get the REAL M3U url, maybe to be changed later?
segment_list = []
for i in content.split():
if not i.startswith('#'): #not the best way; better to use the m3u8 package
segment_list.append(base_url + i)
return segment_list
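# A standalone sketch of the playlist handling above: every non-comment line
# of the m3u8 is resolved against the playlist's own directory. The playlist
# text and URL are made up.
from os.path import dirname
from urllib.parse import urlsplit
m3u_url = 'http://pcvideo.example.com/files/abc/index.m3u8'  # hypothetical
content = '#EXTM3U\n#EXTINF:10.0,\nseg0.ts\n#EXTINF:10.0,\nseg1.ts\n'
split = urlsplit(m3u_url)
base_url = '{scheme}://{netloc}{path}/'.format(scheme=split[0],
                                               netloc=split[1],
                                               path=dirname(split[2]))
print([base_url + i for i in content.split() if not i.startswith('#')])
# ['http://pcvideo.example.com/files/abc/seg0.ts', '.../seg1.ts']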
def download_playlist_by_url(self, url, **kwargs):
pass
def prepare(self, **kwargs):
if self.url:
self.vid = self.get_vid_from_url(self.url)
content = get_content(self.api_endpoint.format(video_id = self.vid))
content = loads(content)
self.title = content['data']['info']['title']
#stream_available = [i['name'] for i in content['data']['stream']]
stream_available = {}
for i in content['data']['stream']:
stream_available[i['name']] = i['url']
for s in self.stream_types:
if s['video_profile'] in stream_available.keys():
quality_id = self.id_dic[s['video_profile']]
url = stream_available[s['video_profile']]
url = re.sub( r'(\&arange\=\d+)', '', url) #Un-Hum
segment_list_this = self.get_mgtv_real_url(url)
container_this_stream = ''
size_this_stream = 0
stream_fileid_list = []
for i in segment_list_this:
_, container_this_stream, size_this_seg = url_info(i)
size_this_stream += size_this_seg
stream_fileid_list.append(os.path.basename(i).split('.')[0])
#make pieces
pieces = []
for i in zip(stream_fileid_list, segment_list_this):
pieces.append({'fileid': i[0], 'segs': i[1],})
self.streams[quality_id] = {
'container': 'flv',
'video_profile': s['video_profile'],
'size': size_this_stream,
'pieces': pieces
}
if not kwargs['info_only']:
self.streams[quality_id]['src'] = segment_list_this
def extract(self, **kwargs):
if 'stream_id' in kwargs and kwargs['stream_id']:
# Extract the stream
stream_id = kwargs['stream_id']
if stream_id not in self.streams:
log.e('[Error] Invalid video format.')
log.e('Run \'-i\' command with no specific video format to view all available formats.')
exit(2)
else:
# Extract stream with the best quality
stream_id = self.streams_sorted[0]['id']
site = MGTV()
download = site.download_by_url
download_playlist = site.download_playlist_by_url

View File

@ -37,7 +37,7 @@ def miaopai_download(url, output_dir = '.', merge = False, info_only = False, **
miaopai_download_by_url(url, output_dir, merge, info_only)
elif re.match(r'http://weibo.com/p/230444\w+', url):
_fid = match1(url, r'http://weibo.com/p/230444(\w+)')
miaopai_download_by_url('http://video.weibo.com/show?fid=1034:{_fid}'.format(_fid = _fid))
miaopai_download_by_url('http://video.weibo.com/show?fid=1034:{_fid}'.format(_fid = _fid), output_dir, merge, info_only)
site_info = "miaopai"
download = miaopai_download

0
src/you_get/extractors/miomio.py Executable file → Normal file
View File

View File

@ -0,0 +1,48 @@
#!/usr/bin/env python
__all__ = ['naver_download']
import urllib.request, urllib.parse
from ..common import *
def naver_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
assert re.search(r'http://tvcast.naver.com/v/', url), "URL is not supported"
html = get_html(url)
contentid = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',html)
videoid = contentid.group(1)
inkey = contentid.group(2)
assert videoid
assert inkey
info_key = urllib.parse.urlencode({'vid': videoid, 'inKey': inkey, })
down_key = urllib.parse.urlencode({'masterVid': videoid,'protocol': 'p2p','inKey': inkey, })
inf_xml = get_html('http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?%s' % info_key )
from xml.dom.minidom import parseString
doc_info = parseString(inf_xml)
Subject = doc_info.getElementsByTagName('Subject')[0].firstChild
title = Subject.data
assert title
xml = get_html('http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?%s' % down_key )
doc = parseString(xml)
encodingoptions = doc.getElementsByTagName('EncodingOption')
old_height = doc.getElementsByTagName('height')[0]
real_url= ''
#download the highest-resolution one
for node in encodingoptions:
new_height = node.getElementsByTagName('height')[0]
domain_node = node.getElementsByTagName('Domain')[0]
uri_node = node.getElementsByTagName('uri')[0]
if int(new_height.firstChild.data) > int (old_height.firstChild.data):
real_url= domain_node.firstChild.data+ '/' +uri_node.firstChild.data
type, ext, size = url_info(real_url)
print_info(site_info, title, type, size)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge = merge)
site_info = "tvcast.naver.com"
download = naver_download
download_playlist = playlist_not_supported('naver')
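# An offline sketch of the EncodingOption scan above, on a made-up XML
# response. A max() over the heights is an equivalent, simpler way to pick
# the highest resolution (the loop above only compares against the first
# <height> it saw).
from xml.dom.minidom import parseString
xml = '''<PlayableEncodingOption>
  <EncodingOption><height>360</height><Domain>http://cdn.example.com</Domain><uri>v_360.mp4</uri></EncodingOption>
  <EncodingOption><height>720</height><Domain>http://cdn.example.com</Domain><uri>v_720.mp4</uri></EncodingOption>
</PlayableEncodingOption>'''
doc = parseString(xml)
def height(node):
    return int(node.getElementsByTagName('height')[0].firstChild.data)
best = max(doc.getElementsByTagName('EncodingOption'), key=height)
print(best.getElementsByTagName('Domain')[0].firstChild.data + '/' +
      best.getElementsByTagName('uri')[0].firstChild.data)
# http://cdn.example.com/v_720.mp4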

View File

@ -4,6 +4,8 @@
__all__ = ['netease_download']
from ..common import *
from ..common import print_more_compatible as print
from ..util import fs
from json import loads
import hashlib
import base64
@ -28,10 +30,10 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
artist_name = j['album']['artists'][0]['name']
album_name = j['album']['name']
new_dir = output_dir + '/' + "%s - %s" % (artist_name, album_name)
new_dir = output_dir + '/' + fs.legitimize("%s - %s" % (artist_name, album_name))
if not info_only:
if not os.path.exists(new_dir):
os.mkdir(new_dir)
if not info_only:
cover_url = j['album']['picUrl']
download_urls([cover_url], "cover", "jpg", 0, new_dir)
@ -46,10 +48,10 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
elif "playlist" in url:
j = loads(get_content("http://music.163.com/api/playlist/detail?id=%s&csrf_token=" % rid, headers={"Referer": "http://music.163.com/"}))
new_dir = output_dir + '/' + j['result']['name']
new_dir = output_dir + '/' + fs.legitimize(j['result']['name'])
if not info_only:
if not os.path.exists(new_dir):
os.mkdir(new_dir)
if not info_only:
cover_url = j['result']['coverImgUrl']
download_urls([cover_url], "cover", "jpg", 0, new_dir)
@ -70,6 +72,15 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
netease_lyric_download(j["songs"][0], l["lrc"]["lyric"], output_dir=output_dir, info_only=info_only)
except: pass
elif "program" in url:
j = loads(get_content("http://music.163.com/api/dj/program/detail/?id=%s&ids=[%s]&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
netease_song_download(j["program"]["mainSong"], output_dir=output_dir, info_only=info_only)
elif "radio" in url:
j = loads(get_content("http://music.163.com/api/dj/program/byradio/?radioId=%s&ids=[%s]&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
for i in j['programs']:
netease_song_download(i["mainSong"],output_dir=output_dir, info_only=info_only)
elif "mv" in url:
j = loads(get_content("http://music.163.com/api/mv/detail/?id=%s&ids=[%s]&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
netease_video_download(j['data'], output_dir=output_dir, info_only=info_only)

View File

@ -0,0 +1,33 @@
#!/usr/bin/env python
__all__ = ['panda_download']
from ..common import *
import json
import time
def panda_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
roomid = url[url.rfind('/')+1:]
json_request_url = 'http://www.panda.tv/api_room?roomid={}&pub_key=&_={}'.format(roomid, int(time.time()))
content = get_html(json_request_url)
errno = json.loads(content)['errno']
errmsg = json.loads(content)['errmsg']
if errno:
raise ValueError("Errno : {}, Errmsg : {}".format(errno, errmsg))
data = json.loads(content)['data']
title = data.get('roominfo')['name']
room_key = data.get('videoinfo')['room_key']
plflag = data.get('videoinfo')['plflag'].split('_')
status = data.get('videoinfo')['status']
if status != "2":
raise ValueError("The live stream is not online! (status:%s)" % status)
real_url = 'http://pl{}.live.panda.tv/live_panda/{}.flv'.format(plflag[1],room_key)
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge = merge)
site_info = "panda.tv"
download = panda_download
download_playlist = playlist_not_supported('panda')

View File

@ -129,7 +129,7 @@ def pptv_download_by_id(id, title = None, output_dir = '.', merge = True, info_o
pieces = re.findall('<sgm no="(\d+)"[^<>]+fs="(\d+)"', xml)
numbers, fs = zip(*pieces)
urls=[ "http://ccf.pptv.com/{}/{}?key={}&fpp.ver=1.3.0.4&k={}&type=web.fpp".format(i,rid,key,k) for i in range(max(map(int,numbers))+1)]
urls=["http://{}/{}/{}?key={}&fpp.ver=1.3.0.4&k={}&type=web.fpp".format(host,i,rid,key,k) for i in range(max(map(int,numbers))+1)]
total_size = sum(map(int, fs))
assert rid.endswith('.mp4')

View File

@ -0,0 +1,78 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from ..common import *
from ..extractor import VideoExtractor
from json import loads
class QiE(VideoExtractor):
name = "QiE (企鹅直播)"
# Last updated: 2015-11-24
stream_types = [
{'id': 'normal', 'container': 'flv', 'video_profile': '标清'},
{'id': 'middle', 'container': 'flv', 'video_profile': '550'},
{'id': 'middle2', 'container': 'flv', 'video_profile': '900'},
]
id_dic = {i['video_profile']:(i['id']) for i in stream_types}
api_endpoint = 'http://www.qie.tv/api/v1/room/{room_id}'
@staticmethod
def get_vid_from_url(url):
"""Extracts video ID from live.qq.com.
"""
html = get_content(url)
return match1(html, r'room_id\":(\d+)')
def download_playlist_by_url(self, url, **kwargs):
pass
def prepare(self, **kwargs):
if self.url:
self.vid = self.get_vid_from_url(self.url)
content = get_content(self.api_endpoint.format(room_id = self.vid))
content = loads(content)
self.title = content['data']['room_name']
rtmp_url = content['data']['rtmp_url']
#stream_available = [i['name'] for i in content['data']['stream']]
stream_available = {}
stream_available['normal'] = rtmp_url + '/' + content['data']['rtmp_live']
if len(content['data']['rtmp_multi_bitrate']) > 0:
for k , v in content['data']['rtmp_multi_bitrate'].items():
stream_available[k] = rtmp_url + '/' + v
for s in self.stream_types:
if s['id'] in stream_available.keys():
quality_id = s['id']
url = stream_available[quality_id]
self.streams[quality_id] = {
'container': 'flv',
'video_profile': s['video_profile'],
'size': 0,
'url': url
}
def extract(self, **kwargs):
for i in self.streams:
s = self.streams[i]
s['src'] = [s['url']]
if 'stream_id' in kwargs and kwargs['stream_id']:
# Extract the stream
stream_id = kwargs['stream_id']
if stream_id not in self.streams:
log.e('[Error] Invalid video format.')
log.e('Run \'-i\' command with no specific video format to view all available formats.')
exit(2)
else:
# Extract stream with the best quality
stream_id = self.streams_sorted[0]['id']
s['src'] = [s['url']]
site = QiE()
download = site.download_by_url
download_playlist = playlist_not_supported('QiE')

View File

@ -3,32 +3,105 @@
__all__ = ['qq_download']
from ..common import *
from .qie import download as qieDownload
from urllib.parse import urlparse,parse_qs
def qq_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False):
api = "http://h5vv.video.qq.com/getinfo?otype=json&vid=%s" % vid
content = get_html(api)
output_json = json.loads(match1(content, r'QZOutputJson=(.*)')[:-1])
url = output_json['vl']['vi'][0]['ul']['ui'][0]['url']
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3%2E2%2E19%2E333&platform=11&defnpayver=1&vid=' + vid
info = get_html(info_api)
video_json = json.loads(match1(info, r'QZOutputJson=(.*)')[:-1])
parts_vid = video_json['vl']['vi'][0]['vid']
parts_ti = video_json['vl']['vi'][0]['ti']
parts_prefix = video_json['vl']['vi'][0]['ul']['ui'][0]['url']
parts_formats = video_json['fl']['fi']
# find the best quality
# only looking for fhd (1080p) and shd (720p) here;
# 480p usually comes as a single file and will be downloaded as a fallback.
best_quality = ''
for part_format in parts_formats:
if part_format['name'] == 'fhd':
best_quality = 'fhd'
break
if part_format['name'] == 'shd':
best_quality = 'shd'
for part_format in parts_formats:
if (not best_quality == '') and (not part_format['name'] == best_quality):
continue
part_format_id = part_format['id']
part_format_sl = part_format['sl']
if part_format_sl == 0:
part_urls= []
total_size = 0
try:
# For fhd (1080p), every part is about 100 MB and 6 minutes long;
# trying 100 parts here limits the longest single download to about 10 hours.
for part in range(1,100):
filename = vid + '.p' + str(part_format_id % 1000) + '.' + str(part) + '.mp4'
key_api = "http://vv.video.qq.com/getkey?otype=json&platform=11&format=%s&vid=%s&filename=%s" % (part_format_id, parts_vid, filename)
#print(filename)
#print(key_api)
part_info = get_html(key_api)
key_json = json.loads(match1(part_info, r'QZOutputJson=(.*)')[:-1])
#print(key_json)
vkey = key_json['key']
url = '%s/%s?vkey=%s' % (parts_prefix, filename, vkey)
part_urls.append(url)
_, ext, size = url_info(url, faker=True)
total_size += size
except:
pass
print_info(site_info, parts_ti, ext, total_size)
if not info_only:
download_urls(part_urls, parts_ti, ext, total_size, output_dir=output_dir, merge=merge)
else:
fvkey = output_json['vl']['vi'][0]['fvkey']
url = '%s/%s.mp4?vkey=%s' % ( url, vid, fvkey )
mp4 = output_json['vl']['vi'][0]['cl'].get('ci', None)
if mp4:
mp4 = mp4[0]['keyid'].replace('.10', '.p') + '.mp4'
else:
mp4 = output_json['vl']['vi'][0]['fn']
url = '%s/%s?vkey=%s' % ( parts_prefix, mp4, fvkey )
_, ext, size = url_info(url, faker=True)
print_info(site_info, title, ext, size)
if not info_only:
download_urls([url], title, ext, size, output_dir=output_dir, merge=merge)
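# A standalone sketch of how the per-part filenames and getkey requests above
# are assembled; vid and the format id are made-up values and nothing is
# fetched here.
vid = 'q0181hpdvo5'     # hypothetical
part_format_id = 10209  # e.g. shd; % 1000 keeps the short form used in names
for part in range(1, 4):
    filename = vid + '.p' + str(part_format_id % 1000) + '.' + str(part) + '.mp4'
    key_api = ('http://vv.video.qq.com/getkey?otype=json&platform=11'
               '&format=%s&vid=%s&filename=%s' % (part_format_id, vid, filename))
    print(filename)  # q0181hpdvo5.p209.1.mp4, ...; each getkey reply holds a vkey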
def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
if 'iframe/player.html' in url:
""""""
if 'live.qq.com' in url:
qieDownload(url,output_dir=output_dir, merge=merge, info_only=info_only)
return
#do redirect
if 'v.qq.com/page' in url:
# for URLs like this:
# http://v.qq.com/page/k/9/7/k0194pwgw97.html
content = get_html(url)
url = match1(content,r'window\.location\.href="(.*?)"')
if 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
content = get_html(url)
vid = match1(content, r'vid\s*=\s*"\s*([^"]+)"')
title = match1(content, r'title">([^"]+)</p>')
title = title.strip() if title else vid
elif 'iframe/player.html' in url:
vid = match1(url, r'\bvid=(\w+)')
# for embedded URLs; don't know what the title is
title = vid
else:
content = get_html(url)
vid = match1(content, r'vid\s*:\s*"\s*([^"]+)"')
title = match1(content, r'title\s*:\s*"\s*([^"]+)"')
# try to get the right title for URLs like this:
# http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
title = matchall(content, [r'title\s*:\s*"\s*([^"]+)"'])[-1]
vid = parse_qs(urlparse(url).query).get('vid') #for links that specify vid, like http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
vid = vid[0] if vid else match1(content, r'vid"*\s*:\s*"\s*([^"]+)"') #general fallback
title = match1(content,r'<a.*?id\s*=\s*"%s".*?title\s*=\s*"(.+?)".*?>'%vid)
title = match1(content, r'title">([^"]+)</p>') if not title else title
title = match1(content, r'"title":"([^"]+)"') if not title else title
title = vid if not title else title #general fallback
qq_download_by_vid(vid, title, output_dir, merge, info_only)

View File

@ -0,0 +1,70 @@
#!/usr/bin/env python
__all__ = ['showroom_download']
from ..common import *
import urllib.error
from json import loads
from time import time, sleep
#----------------------------------------------------------------------
def showroom_get_roomid_by_room_url_key(room_url_key):
"""str->str"""
fake_headers_mobile = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'UTF-8,*;q=0.5',
'Accept-Encoding': 'gzip,deflate,sdch',
'Accept-Language': 'en-US,en;q=0.8',
'User-Agent': 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'
}
webpage_url = 'https://www.showroom-live.com/' + room_url_key
html = get_content(webpage_url, headers = fake_headers_mobile)
roomid = match1(html, r'room\?room_id\=(\d+)')
assert roomid
return roomid
def showroom_download_by_room_id(room_id, output_dir = '.', merge = False, info_only = False, **kwargs):
'''Source: Android mobile'''
while True:
timestamp = str(int(time() * 1000))
api_endpoint = 'https://www.showroom-live.com/api/live/streaming_url?room_id={room_id}&_={timestamp}'.format(room_id = room_id, timestamp = timestamp)
html = get_content(api_endpoint)
html = json.loads(html)
#{'streaming_url_list': [{'url': 'rtmp://52.197.69.198:1935/liveedge', 'id': 1, 'label': 'original spec(low latency)', 'is_default': True, 'type': 'rtmp', 'stream_name': '7656a6d5baa1d77075c971f6d8b6dc61b979fc913dc5fe7cc1318281793436ed'}, {'url': 'http://52.197.69.198:1935/liveedge/7656a6d5baa1d77075c971f6d8b6dc61b979fc913dc5fe7cc1318281793436ed/playlist.m3u8', 'is_default': True, 'id': 2, 'type': 'hls', 'label': 'original spec'}, {'url': 'rtmp://52.197.69.198:1935/liveedge', 'id': 3, 'label': 'low spec(low latency)', 'is_default': False, 'type': 'rtmp', 'stream_name': '7656a6d5baa1d77075c971f6d8b6dc61b979fc913dc5fe7cc1318281793436ed_low'}, {'url': 'http://52.197.69.198:1935/liveedge/7656a6d5baa1d77075c971f6d8b6dc61b979fc913dc5fe7cc1318281793436ed_low/playlist.m3u8', 'is_default': False, 'id': 4, 'type': 'hls', 'label': 'low spec'}]}
if len(html) >= 1:
break
log.w('The live show is currently offline.')
sleep(1)
#This is mainly for testing the M3U FFmpeg parser, so any non-m3u streams are ignored
stream_url = [i['url'] for i in html['streaming_url_list'] if i['is_default'] and i['type'] == 'hls'][0]
assert stream_url
#title
title = ''
profile_api = 'https://www.showroom-live.com/api/room/profile?room_id={room_id}'.format(room_id = room_id)
html = loads(get_content(profile_api))
try:
title = html['main_name']
except KeyError:
title = 'Showroom_{room_id}'.format(room_id = room_id)
type_, ext, size = url_info(stream_url)
print_info(site_info, title, type_, size)
if not info_only:
download_url_ffmpeg(url=stream_url, title=title, ext= 'mp4', output_dir=output_dir)
#----------------------------------------------------------------------
def showroom_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
""""""
if re.match( r'(\w+)://www.showroom-live.com/([-\w]+)', url):
room_url_key = match1(url, r'\w+://www.showroom-live.com/([-\w]+)')
room_id = showroom_get_roomid_by_room_url_key(room_url_key)
showroom_download_by_room_id(room_id, output_dir, merge,
info_only)
site_info = "Showroom"
download = showroom_download
download_playlist = playlist_not_supported('showroom')
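# An offline sketch of the stream selection above: keep polling until the room
# is live, then take the default HLS entry. The payload mirrors the sample in
# the comment (IP replaced with a documentation address).
streaming = {'streaming_url_list': [
    {'id': 1, 'type': 'rtmp', 'is_default': True,
     'url': 'rtmp://198.51.100.7:1935/liveedge', 'stream_name': 'abc'},
    {'id': 2, 'type': 'hls', 'is_default': True,
     'url': 'http://198.51.100.7:1935/liveedge/abc/playlist.m3u8'},
]}
stream_url = [i['url'] for i in streaming['streaming_url_list']
              if i['is_default'] and i['type'] == 'hls'][0]
print(stream_url)  # hand this m3u8 to download_url_ffmpeg()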

View File

@ -14,7 +14,7 @@ def get_k(vid, rand):
def video_info_xml(vid):
rand = "0.{0}{1}".format(randint(10000, 10000000), randint(10000, 10000000))
url = 'http://v.iask.com/v_play.php?vid={0}&ran={1}&p=i&k={2}'.format(vid, rand, get_k(vid, rand))
url = 'http://ask.ivideo.sina.com.cn/v_play.php?vid={0}&ran={1}&p=i&k={2}'.format(vid, rand, get_k(vid, rand))
xml = get_content(url, headers=fake_headers, decoded=True)
return xml
@ -71,7 +71,7 @@ def sina_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
vid = vids[-1]
if vid is None:
vid = match1(video_page, r'vid:(\d+)')
vid = match1(video_page, r'vid:"?(\d+)"?')
if vid:
title = match1(video_page, r'title\s*:\s*\'([^\']+)\'')
sina_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)

View File

@ -32,9 +32,14 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
set_proxy(tuple(extractor_proxy.split(":")))
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % vid))
for qtyp in ["oriVid","superVid","highVid" ,"norVid","relativeId"]:
if 'data' in info:
hqvid = info['data'][qtyp]
else:
hqvid = info[qtyp]
if hqvid != 0 and hqvid != vid :
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % hqvid))
if not 'allot' in info:
continue
break
if extractor_proxy:
unset_proxy()

View File

@ -6,13 +6,14 @@ from ..common import *
import random
import time
from xml.dom import minidom
#possible raw list types
#1. <li>type=tudou&vid=199687639</li>
#2. <li>type=tudou&vid=199506910|</li>
#3. <li>type=video&file=http://xiaoshen140731.qiniudn.com/lovestage04.flv|</li>
#4. maybe? <li>type=video&file=http://xiaoshen140731.qiniudn.com/lovestage04.flv|xx**type=&vid=?</li>
#5. <li>type=tudou&vid=200003098|07**type=tudou&vid=200000350|08</li>
#6. <li>vid=49454694&type=sina|</li>
#7. <li>type=189&vid=513031813243909|</li>
# re_pattern=re.compile(r"(type=(.+?)&(vid|file)=(.*?))[\|<]")
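# A standalone sketch of splitting the raw list formats enumerated above:
# '**' separates entries, '|' separates each type_link from its sub-title
# (raw list type 5).
raw_list = 'type=tudou&vid=200003098|07**type=tudou&vid=200000350|08'
for entry in raw_list.split('**'):
    type_link, _, sub_title = entry.partition('|')
    print(type_link, sub_title)
# type=tudou&vid=200003098 07
# type=tudou&vid=200000350 08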
def tucao_single_download(type_link, title, output_dir=".", merge=True, info_only=False):
@ -22,8 +23,17 @@ def tucao_single_download(type_link, title, output_dir=".", merge=True, info_onl
print_info(site_info, title, vtype, size)
if not info_only:
download_urls([url], title, ext, size, output_dir)
#fix for the 189 video source, see raw list type 7
elif "189" in type_link:
vid = match1(type_link, r"vid=(\d+)")
assert vid, "vid does not exist"
url = "http://api.tucao.tv/api/down/{}".format(vid)
vtype, ext, size=url_info(url)
print_info(site_info, title, vtype, size)
if not info_only:
download_urls([url], title, ext, size, output_dir)
else:
u="http://www.tucao.cc/api/playurl.php?{}&key=tucao{:07x}.cc&r={}".format(type_link,random.getrandbits(28),int(time.time()*1000))
u="http://www.tucao.tv/api/playurl.php?{}&key=tucao{:07x}.cc&r={}".format(type_link,random.getrandbits(28),int(time.time()*1000))
xml=minidom.parseString(get_content(u))
urls=[]
size=0
@ -38,7 +48,8 @@ def tucao_single_download(type_link, title, output_dir=".", merge=True, info_onl
def tucao_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
html=get_content(url)
title=match1(html,r'<h1 class="show_title">(.*?)<\w')
raw_list=match1(html,r"<li>(type=.+?)</li>")
#fix for raw lists where vid goes before type, see raw list type 6
raw_list=match1(html,r"<li>\s*(type=.+?|vid=.+?)</li>")
raw_l=raw_list.split("**")
if len(raw_l)==1:
format_link=raw_l[0][:-1] if raw_l[0].endswith("|") else raw_l[0]
@ -49,6 +60,6 @@ def tucao_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
tucao_single_download(format_link,title+"-"+sub_title,output_dir,merge,info_only)
site_info = "tucao.cc"
site_info = "tucao.tv"
download = tucao_download
download_playlist = playlist_not_supported("tucao")

View File

@ -4,6 +4,7 @@ __all__ = ['tudou_download', 'tudou_download_playlist', 'tudou_download_by_id',
from ..common import *
from xml.dom.minidom import parseString
import you_get.extractors.acfun
def tudou_download_by_iid(iid, title, output_dir = '.', merge = True, info_only = False):
data = json.loads(get_decoded_html('http://www.tudou.com/outplay/goto/getItemSegs.action?iid=%s' % iid))
@ -29,6 +30,13 @@ def tudou_download_by_id(id, title, output_dir = '.', merge = True, info_only =
tudou_download_by_iid(iid, title, output_dir = output_dir, merge = merge, info_only = info_only)
def tudou_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
if 'acfun.tudou.com' in url: #wrong way!
url = url.replace('acfun.tudou.com', 'www.acfun.tv')
you_get.extractors.acfun.acfun_download(url, output_dir,
merge,
info_only)
return #throw you back
# Embedded player
id = r1(r'http://www.tudou.com/v/([^/]+)/', url)
if id:

View File

@ -68,7 +68,7 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
real_url = r1(r'<source src="([^"]*)"', html)
if not real_url:
iframe_url = r1(r'<[^>]+tumblr_video_container[^>]+><iframe[^>]+src=[\'"]([^\'"]*)[\'"]', html)
if len(iframe_url) > 0:
if iframe_url:
iframe_html = get_content(iframe_url, headers=fake_headers)
real_url = r1(r'<video[^>]*>[\n ]*<source[^>]+src=[\'"]([^\'"]*)[\'"]', iframe_html)
else:

View File

@ -5,6 +5,13 @@ __all__ = ['twitter_download']
from ..common import *
from .vine import vine_download
def extract_m3u(source):
r1 = get_content(source)
s1 = re.findall(r'(/ext_tw_video/.*)', r1)
r2 = get_content('https://video.twimg.com%s' % s1[-1])
s2 = re.findall(r'(/ext_tw_video/.*)', r2)
return ['https://video.twimg.com%s' % i for i in s2]
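# An offline sketch of the two-level walk in extract_m3u() above: the master
# playlist points at a variant playlist, whose entries become the final URLs.
# Both playlist bodies are made up; the real function fetches each level with
# get_content().
import re
master = '#EXTM3U\n#EXT-X-STREAM-INF:BANDWIDTH=832000\n/ext_tw_video/1/pu/pl/832k.m3u8\n'
variant = '#EXTM3U\n#EXTINF:3.0,\n/ext_tw_video/1/pu/vid/seg0.ts\n'
s1 = re.findall(r'(/ext_tw_video/.*)', master)   # -> variant playlist path
s2 = re.findall(r'(/ext_tw_video/.*)', variant)  # -> media segment paths
print(['https://video.twimg.com%s' % i for i in s2])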
def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)
screen_name = r1(r'data-screen-name="([^"]*)"', html) or \
@ -62,12 +69,20 @@ def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs)
vmap = get_content(vmap_url)
source = r1(r'<MediaFile>\s*<!\[CDATA\[(.*)\]\]>', vmap)
if not item_id: page_title = i['tweet_id']
elif 'scribe_playlist_url' in i:
scribe_playlist_url = i['scribe_playlist_url']
return vine_download(scribe_playlist_url, output_dir, merge=merge, info_only=info_only)
mime, ext, size = url_info(source)
try:
urls = extract_m3u(source)
except:
urls = [source]
size = urls_size(urls)
mime, ext = 'video/mp4', 'mp4'
print_info(site_info, page_title, mime, size)
if not info_only:
download_urls([source], page_title, ext, size, output_dir, merge=merge)
download_urls(urls, page_title, ext, size, output_dir, merge=merge)
site_info = "Twitter.com"
download = twitter_download

View File

@ -6,6 +6,8 @@ from ..common import *
from .embed import *
def universal_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
content_type = get_head(url, headers=fake_headers)['Content-Type']
if content_type.startswith('text/html'):
try:
embed_download(url, output_dir, merge=merge, info_only=info_only)
except: pass
@ -15,11 +17,9 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
if len(domains) > 2: domains = domains[1:]
site_info = '.'.join(domains)
response = get_response(url, faker=True)
content_type = response.headers['Content-Type']
if content_type.startswith('text/html'):
# extract an HTML page
response = get_response(url, faker=True)
page = str(response.data)
page_title = r1(r'<title>([^<]*)', page)

View File

@ -1,47 +1,44 @@
#!/usr/bin/env python
from ..common import *
from ..extractor import VideoExtractor
__all__ = ['videomega_download']
from ..common import *
import ssl
class Videomega(VideoExtractor):
name = "Videomega"
stream_types = [
{'id': 'original'}
]
def prepare(self, **kwargs):
def videomega_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
# Hot-plug cookie handler
ssl_context = request.HTTPSHandler(
context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
cookie_handler = request.HTTPCookieProcessor()
opener = request.build_opener(ssl_context, cookie_handler)
opener.addheaders = [('Referer', self.url),
opener.addheaders = [('Referer', url),
('Cookie', 'noadvtday=0')]
request.install_opener(opener)
ref = match1(self.url, r'ref=(\w+)')
php_url = 'http://videomega.tv/view.php?ref=' + ref
if re.search(r'view\.php', url):
php_url = url
else:
content = get_content(url)
m = re.search(r'ref="([^"]*)";\s*width="([^"]*)";\s*height="([^"]*)"', content)
ref = m.group(1)
width, height = m.group(2), m.group(3)
php_url = 'http://videomega.tv/view.php?ref=%s&width=%s&height=%s' % (ref, width, height)
content = get_content(php_url)
self.title = match1(content, r'<title>(.*)</title>')
title = match1(content, r'<title>(.*)</title>')
js = match1(content, r'(eval.*)')
t = match1(js, r'\$\("\d+"\)\.\d+\("\d+","([^"]+)"\)')
t = match1(js, r'\$\("\w+"\)\.\w+\("\w+","([^"]+)"\)')
t = re.sub(r'(\w)', r'{\1}', t)
t = t.translate({87 + i: str(i) for i in range(10, 36)})
s = match1(js, r"'([^']+)'\.split").split('|')
self.streams['original'] = {
'url': t.format(*s)
}
src = t.format(*s)
def extract(self, **kwargs):
for i in self.streams:
s = self.streams[i]
_, s['container'], s['size'] = url_info(s['url'])
s['src'] = [s['url']]
type, ext, size = url_info(src, faker=True)
site = Videomega()
download = site.download_by_url
download_playlist = site.download_by_url
print_info(site_info, title, type, size)
if not info_only:
download_urls([src], title, ext, size, output_dir, merge=merge, faker=True)
site_info = "Videomega.tv"
download = videomega_download
download_playlist = playlist_not_supported('videomega')
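# A self-contained sketch of the packed-JS decoding above: every base-36 digit
# (0-9, a-z) in the captured template indexes into the '|'-split word list, so
# wrapping each one in braces and calling str.format(*words) rebuilds the
# hidden URL. The template and word list are toy values.
import re
t = '4://3.2/1.0?a=b'  # as captured from the eval() blob (made up)
s = 'php|view|tv|videomega|http|p5|p6|p7|p8|p9|ref|abcd'.split('|')
t = re.sub(r'(\w)', r'{\1}', t)                           # -> '{4}://{3}.{2}/{1}.{0}?{a}={b}'
t = t.translate({87 + i: str(i) for i in range(10, 36)})  # a->10, b->11, ... z->35
print(t.format(*s))  # http://videomega.tv/view.php?ref=abcd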

View File

@ -4,21 +4,51 @@ __all__ = ['vk_download']
from ..common import *
def vk_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
def get_video_info(url):
video_page = get_content(url)
title = unescape_html(r1(r'"title":"([^"]+)"', video_page))
info = dict(re.findall(r'\\"url(\d+)\\":\\"([^"]+)\\"', video_page))
for quality in ['1080', '720', '480', '360', '240']:
if quality in info:
url = re.sub(r'\\\\\\/', r'/', info[quality])
title = r1(r'<div class="vv_summary">(.[^>]+?)</div', video_page)
sources = re.findall(r'<source src=\"(.[^>]+?)"', video_page)
for quality in ['.1080.', '.720.', '.480.', '.360.', '.240.']:
for source in sources:
if source.find(quality) != -1:
url = source
break
assert url
type, ext, size = url_info(url)
print_info(site_info, title, type, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge)
return url, title, ext, size
def get_image_info(url):
image_page = get_content(url)
# used for title - vk page owner
page_of = re.findall(r'Sender:</dt><dd><a href=.*>(.[^>]+?)</a', image_page)
# used for title - date when photo was uploaded
photo_date = re.findall(r'<span class="item_date">(.[^>]+?)</span', image_page)
title = (' ').join(page_of + photo_date)
image_link = r1(r'href="([^"]+)" class=\"mva_item\" target="_blank">Download full size', image_page)
type, ext, size = url_info(image_link)
print_info(site_info, title, type, size)
return image_link, title, ext, size
def vk_download(url, output_dir='.', stream_type=None, merge=True, info_only=False, **kwargs):
link = None
if re.match(r'(.+)z\=video(.+)', url):
link, title, ext, size = get_video_info(url)
elif re.match(r'(.+)vk\.com\/photo(.+)', url):
link, title, ext, size = get_image_info(url)
else:
raise NotImplementedError('Nothing to download here')
if not info_only and link is not None:
download_urls([link], title, ext, size, output_dir, merge=merge)
site_info = "VK.com"
download = vk_download

123
src/you_get/extractors/wanmen.py Executable file
View File

@ -0,0 +1,123 @@
#!/usr/bin/env python
__all__ = ['wanmen_download', 'wanmen_download_by_course', 'wanmen_download_by_course_topic', 'wanmen_download_by_course_topic_part']
from ..common import *
from .bokecc import bokecc_download_by_id
from json import loads
##Helper functions
def _wanmen_get_json_api_content_by_courseID(courseID):
"""int->JSON
Return a parsed JSON tree of WanMen's API."""
return loads(get_content('http://api.wanmen.org/course/getCourseNested/{courseID}'.format(courseID = courseID)))
def _wanmen_get_title_by_json_topic_part(json_content, tIndex, pIndex):
"""JSON, int, int, int->str
Get a proper title with courseid+topicID+partID."""
return '_'.join([json_content[0]['name'],
json_content[0]['Topics'][tIndex]['name'],
json_content[0]['Topics'][tIndex]['Parts'][pIndex]['name']])
def _wanmen_get_boke_id_by_json_topic_part(json_content, tIndex, pIndex):
"""JSON, int, int, int->str
Get one BokeCC video ID with courseid+topicID+partID."""
return json_content[0]['Topics'][tIndex]['Parts'][pIndex]['ccVideoLink']
##Parsers
def wanmen_download_by_course(json_api_content, output_dir='.', merge=True, info_only=False, **kwargs):
"""int->None
Download a WHOLE course.
Reuse the API call to save time."""
for tIndex in range(len(json_api_content[0]['Topics'])):
for pIndex in range(len(json_api_content[0]['Topics'][tIndex]['Parts'])):
wanmen_download_by_course_topic_part(json_api_content,
tIndex,
pIndex,
output_dir=output_dir,
merge=merge,
info_only=info_only,
**kwargs)
def wanmen_download_by_course_topic(json_api_content, tIndex, output_dir='.', merge=True, info_only=False, **kwargs):
"""int, int->None
Download a TOPIC of a course.
Reuse the API call to save time."""
for pIndex in range(len(json_api_content[0]['Topics'][tIndex]['Parts'])):
wanmen_download_by_course_topic_part(json_api_content,
tIndex,
pIndex,
output_dir=output_dir,
merge=merge,
info_only=info_only,
**kwargs)
def wanmen_download_by_course_topic_part(json_api_content, tIndex, pIndex, output_dir='.', merge=True, info_only=False, **kwargs):
"""int, int, int->None
Download ONE PART of the course."""
html = json_api_content
title = _wanmen_get_title_by_json_topic_part(html,
tIndex,
pIndex)
bokeccID = _wanmen_get_boke_id_by_json_topic_part(html,
tIndex,
pIndex)
bokecc_download_by_id(vid = bokeccID, title = title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
##Main entrance
def wanmen_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
if 'wanmen.org' not in url:
log.wtf('You are at the wrong place dude. This is for WanMen University!')
raise
courseID = int(match1(url, r'course\/(\d+)'))
assert courseID > 0 #without courseID we cannot do anything
tIndex = int(match1(url, r'tIndex=(\d+)'))
pIndex = int(match1(url, r'pIndex=(\d+)'))
json_api_content = _wanmen_get_json_api_content_by_courseID(courseID)
if pIndex: #only download ONE single part
assert tIndex >= 0
wanmen_download_by_course_topic_part(json_api_content, tIndex, pIndex,
output_dir = output_dir,
merge = merge,
info_only = info_only)
elif tIndex: #download a topic
wanmen_download_by_course_topic(json_api_content, tIndex,
output_dir = output_dir,
merge = merge,
info_only = info_only)
else: #download the whole course
wanmen_download_by_course(json_api_content,
output_dir = output_dir,
merge = merge,
info_only = info_only)
site_info = "WanMen University"
download = wanmen_download
download_playlist = wanmen_download_by_course

View File

@ -17,7 +17,8 @@ def yinyuetai_download_by_id(vid, title=None, output_dir='.', merge=True, info_o
download_urls([url], title, ext, size, output_dir, merge = merge)
def yinyuetai_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
id = r1(r'http://\w+.yinyuetai.com/video/(\d+)', url)
id = r1(r'http://\w+.yinyuetai.com/video/(\d+)', url) or \
r1(r'http://\w+.yinyuetai.com/video/h5/(\d+)', url)
if not id:
yinyuetai_download_playlist(url, output_dir=output_dir, merge=merge, info_only=info_only)
return

0
src/you_get/extractors/yixia.py Executable file → Normal file
View File

View File

@ -28,7 +28,11 @@ class Youku(VideoExtractor):
f_code_1 = 'becaf9be'
f_code_2 = 'bf7e5f01'
ctype = 12 #differ from 86
def trans_e(a, c):
"""str, str->str
This is an RC4 encryption."""
f = h = 0
b = list(range(256))
result = ''
@ -49,14 +53,14 @@ class Youku(VideoExtractor):
return result
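# Aside: trans_e() is plain RC4 (key scheduling followed by the PRGA, one byte
# at a time), so running it twice with the same key round-trips. The body below
# is paraphrased from the upstream implementation for a standalone check; treat
# it as illustrative.
def rc4(a, c):
    f = h = 0
    b = list(range(256))
    result = ''
    while h < 256:  # key-scheduling algorithm
        f = (f + b[h] + ord(a[h % len(a)])) % 256
        b[h], b[f] = b[f], b[h]
        h += 1
    q = f = h = 0
    while q < len(c):  # pseudo-random generation + XOR
        h = (h + 1) % 256
        f = (f + b[h]) % 256
        b[h], b[f] = b[f], b[h]
        result += chr(ord(c[q]) ^ b[(b[h] + b[f]) % 256])
        q += 1
    return result
cipher = rc4('becaf9be', 'sid_fileid_token')
print(rc4('becaf9be', cipher) == 'sid_fileid_token')  # True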
def generate_ep(no, streamfileids, sid, token):
def generate_ep(self, no, streamfileids, sid, token):
number = hex(int(str(no), 10))[2:].upper()
if len(number) == 1:
number = '0' + number
fileid = streamfileids[0:8] + number + streamfileids[10:]
ep = parse.quote(base64.b64encode(
''.join(Youku.trans_e(
Youku.f_code_2,
''.join(self.__class__.trans_e(
self.f_code_2, #use the 86 fcode if using 86
sid + '_' + fileid + '_' + token)).encode('latin1')),
safe='~()*!.\''
)
@ -72,7 +76,7 @@ class Youku(VideoExtractor):
for x in xs:
if x not in mem:
mem.add(x)
yield(x)
return mem
def get_vid_from_url(url):
"""Extracts video ID from URL.
@ -85,7 +89,7 @@ class Youku(VideoExtractor):
def get_playlist_id_from_url(url):
"""Extracts playlist ID from URL.
"""
return match1(url, r'youku\.com/playlist_show/id_([a-zA-Z0-9=]+)')
return match1(url, r'youku\.com/albumlist/show\?id=([a-zA-Z0-9=]+)')
def download_playlist_by_url(self, url, **kwargs):
self.url = url
@ -93,15 +97,17 @@ class Youku(VideoExtractor):
try:
playlist_id = self.__class__.get_playlist_id_from_url(self.url)
assert playlist_id
video_page = get_content('http://www.youku.com/playlist_show/id_%s' % playlist_id)
video_page = get_content('http://list.youku.com/albumlist/show?id=%s' % playlist_id)
videos = Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', video_page))
# Parse multi-page playlists
for extra_page_url in Youku.oset(re.findall('href="(http://www\.youku\.com/playlist_show/id_%s_[^?"]+)' % playlist_id, video_page)):
extra_page = get_content(extra_page_url)
last_page_url = re.findall(r'href="(/albumlist/show\?id=%s[^"]+)" title="末页"' % playlist_id, video_page)[0]
num_pages = int(re.findall(r'page=([0-9]+)\.htm', last_page_url)[0])
if (num_pages > 0):
# download one by one
for pn in range(2, num_pages + 1):
extra_page_url = re.sub(r'page=([0-9]+)\.htm', r'page=%s.htm' % pn, last_page_url)
extra_page = get_content('http://list.youku.com' + extra_page_url)
videos |= Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', extra_page))
except:
# Show full list of episodes
if match1(url, r'youku\.com/show_page/id_([a-zA-Z0-9=]+)'):
@ -150,8 +156,17 @@ class Youku(VideoExtractor):
self.download_playlist_by_url(self.url, **kwargs)
exit(0)
#HACK!
if 'api_url' in kwargs:
api_url = kwargs['api_url'] #85
api12_url = kwargs['api12_url'] #86
self.ctype = kwargs['ctype']
self.title = kwargs['title']
else:
api_url = 'http://play.youku.com/play/get.json?vid=%s&ct=10' % self.vid
api12_url = 'http://play.youku.com/play/get.json?vid=%s&ct=12' % self.vid
try:
meta = json.loads(get_content(
api_url,
@ -171,13 +186,13 @@ class Youku(VideoExtractor):
self.password_protected = True
self.password = input(log.sprint('Password: ', log.YELLOW))
api_url += '&pwd={}'.format(self.password)
api_url12 += '&pwd={}'.format(self.password)
api12_url += '&pwd={}'.format(self.password)
meta = json.loads(get_content(
api_url,
headers={'Referer': 'http://static.youku.com/'}
))
meta12 = json.loads(get_content(
api_url12,
api12_url,
headers={'Referer': 'http://static.youku.com/'}
))
data = meta['data']
@ -187,6 +202,7 @@ class Youku(VideoExtractor):
else:
log.wtf('[Failed] Video not found.')
if not self.title: #86
self.title = data['video']['title']
self.ep = data12['security']['encrypt_string']
self.ip = data12['security']['ip']
@ -264,7 +280,7 @@ class Youku(VideoExtractor):
stream_id = self.streams_sorted[0]['id']
e_code = self.__class__.trans_e(
self.__class__.f_code_1,
self.f_code_1,
base64.b64decode(bytes(self.ep, 'ascii'))
)
sid, token = e_code.split('_')
@ -279,10 +295,10 @@ class Youku(VideoExtractor):
for no in range(0, len(segs)):
k = segs[no]['key']
if k == -1: break # we hit the paywall; stop here
fileid, ep = self.__class__.generate_ep(no, streamfileid,
fileid, ep = self.__class__.generate_ep(self, no, streamfileid,
sid, token)
q = parse.urlencode(dict(
ctype = 12,
ctype = self.ctype,
ev = 1,
K = k,
ep = parse.unquote(ep),
@ -312,9 +328,69 @@ class Youku(VideoExtractor):
if not kwargs['info_only']:
self.streams[stream_id]['src'] = ksegs
def open_download_by_vid(self, client_id, vid, **kwargs):
"""self, str, str, **kwargs->None
Arguments:
client_id: An ID per client. For now we only know Acfun's
such ID.
vid: An video ID for each video, starts with "C".
kwargs['embsig']: Youku COOP's anti hotlinking.
For Acfun, an API call must be done to Acfun's
server, or the "playsign" of the content of sign_url
shall be empty.
Misc:
Override the original one with VideoExtractor.
Author:
Most of the credit are to @ERioK, who gave his POC.
History:
Jul.28.2016 Youku COOP now have anti hotlinking via embsig. """
self.f_code_1 = '10ehfkbv' #can be retrieved by running r.translate with the keys and the list e
self.f_code_2 = 'msjv7h2b'
# as in VideoExtractor
self.url = None
self.vid = vid
self.name = "优酷开放平台 (Youku COOP)"
#A little bit of work before self.prepare
#Changed as of Jul.28.2016: Youku COOP updated its platform to add anti-hotlinking
if kwargs['embsig']:
sign_url = "https://api.youku.com/players/custom.json?client_id={client_id}&video_id={video_id}&embsig={embsig}".format(client_id = client_id, video_id = vid, embsig = kwargs['embsig'])
else:
sign_url = "https://api.youku.com/players/custom.json?client_id={client_id}&video_id={video_id}".format(client_id = client_id, video_id = vid)
playsign = json.loads(get_content(sign_url))['playsign']
#to be injected, replacing ct 10 and 12
api85_url = 'http://play.youku.com/partner/get.json?cid={client_id}&vid={vid}&ct=85&sign={playsign}'.format(client_id = client_id, vid = vid, playsign = playsign)
api86_url = 'http://play.youku.com/partner/get.json?cid={client_id}&vid={vid}&ct=86&sign={playsign}'.format(client_id = client_id, vid = vid, playsign = playsign)
self.prepare(api_url = api85_url, api12_url = api86_url, ctype = 86, **kwargs)
#exact copy from original VideoExtractor
if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
unset_proxy()
try:
self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
except:
self.streams_sorted = [dict([('itag', stream_type['itag'])] + list(self.streams[stream_type['itag']].items())) for stream_type in self.__class__.stream_types if stream_type['itag'] in self.streams]
self.extract(**kwargs)
self.download(**kwargs)
site = Youku()
download = site.download_by_url
download_playlist = site.download_playlist_by_url
youku_download_by_vid = site.download_by_vid
youku_open_download_by_vid = site.open_download_by_vid
# Used by: acfun.py bilibili.py miomio.py tudou.py
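For illustration, a hypothetical partner-site caller in the style of acfun.py; every value below is a placeholder, not a real credential:

```python
# sketch of a COOP caller (all values are placeholders)
youku_open_download_by_vid(
    '0123456789abcdef',            # client_id issued to the partner site
    'CABCDEFG',                    # vid: Youku COOP video IDs start with "C"
    embsig='signature-from-the-partner-api',  # anti-hotlinking token (since Jul. 28, 2016)
    info_only=True,                # probe the streams without downloading
)
```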

View File

@ -56,7 +56,7 @@ class YouTube(VideoExtractor):
f1def = match1(js, r'function %s(\(\w+\)\{[^\{]+\})' % re.escape(f1)) or \
match1(js, r'\W%s=function(\(\w+\)\{[^\{]+\})' % re.escape(f1))
f1def = re.sub(r'([$\w]+\.)([$\w]+\(\w+,\d+\))', r'\2', f1def)
f1def = 'function %s%s' % (re.escape(f1), f1def)
f1def = 'function %s%s' % (f1, f1def)
code = tr_js(f1def)
f2s = set(re.findall(r'([$\w]+)\(\w+,\d+\)', f1def))
for f2 in f2s:
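For context: `re.escape` stays correct in the two regex lookups above, but escaping the name again when splicing it back into JavaScript source corrupts names that contain `$`. A quick illustration:

```python
import re

f1 = '$a'  # hypothetical obfuscated function name from the player JS
re.escape(f1)                      # '\\$a' -- fine inside a regex pattern
'function %s(){}' % re.escape(f1)  # 'function \$a(){}' -- invalid JS source
'function %s(){}' % f1             # 'function $a(){}'  -- what tr_js should see
```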
@ -236,7 +236,7 @@ class YouTube(VideoExtractor):
start = '{:0>2}:{:0>2}:{:06.3f}'.format(int(h), int(m), s).replace('.', ',')
m, s = divmod(finish, 60); h, m = divmod(m, 60)
finish = '{:0>2}:{:0>2}:{:06.3f}'.format(int(h), int(m), s).replace('.', ',')
content = text.firstChild.nodeValue
content = unescape_html(text.firstChild.nodeValue)
srt += '%s\n' % str(seq)
srt += '%s --> %s\n' % (start, finish)
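As a worked example of the timestamp formatting above (the input value is illustrative):

```python
# 4517.25 seconds -> '01:15:17,250' in SRT notation
seconds = 4517.25
m, s = divmod(seconds, 60)
h, m = divmod(m, 60)
print('{:0>2}:{:0>2}:{:06.3f}'.format(int(h), int(m), s).replace('.', ','))
# 01:15:17,250
```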

View File

@ -3,6 +3,7 @@
import os.path
import subprocess
from ..util.strings import parameterize
from ..common import print_more_compatible as print
def get_usable_ffmpeg(cmd):
try:
@ -169,7 +170,7 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
params = [FFMPEG] + LOGLEVEL + ['-f', 'concat', '-safe', '-1', '-y', '-i']
params.append(output + '.txt')
params += ['-c', 'copy', output]
params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc', output]
subprocess.check_call(params)
os.remove(output + '.txt')
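For reference, a sketch of the full command this assembles, assuming `LOGLEVEL` is `['-loglevel', 'quiet']` as elsewhere in this module; `aac_adtstoasc` repackages AAC audio from the ADTS framing used in TS/FLV segments into the raw form the MP4 container expects:

```python
# sketch: the final params list for an output named 'output.mp4'
params = ['ffmpeg', '-loglevel', 'quiet',
          '-f', 'concat', '-safe', '-1', '-y', '-i', 'output.mp4.txt',
          '-c', 'copy', '-bsf:a', 'aac_adtstoasc', 'output.mp4']
```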
@ -199,3 +200,44 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
for file in files:
os.remove(file + '.ts')
return True
def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.'):
"""str, str->True
WARNING: NOT THE SAME PARMS AS OTHER FUNCTIONS!!!!!!
You can basicly download anything with this function
but better leave it alone with
"""
output = title + '.' + ext
if output_dir != '.':
output = output_dir + '/' + output
print('Downloading streaming content with FFmpeg, press q to stop recording...')
ffmpeg_params = [FFMPEG] + ['-y', '-re', '-i']
ffmpeg_params.append(files)  # unlike the other functions, 'files' is a single URL string here
if FFMPEG == 'avconv':  # the avconv path does a plain stream copy, without the AAC bitstream filter
ffmpeg_params += ['-c', 'copy', output]
else:
ffmpeg_params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc']
if params:  # append any extra (key, value) options
for k, v in params.items():  # .items() is needed; iterating a dict directly yields only keys
ffmpeg_params.append(k)
ffmpeg_params.append(v)
ffmpeg_params.append(output)
print(' '.join(ffmpeg_params))
try:
a = subprocess.Popen(ffmpeg_params, stdin=subprocess.PIPE)
a.communicate()
except KeyboardInterrupt:
try:
a.stdin.write('q'.encode('utf-8'))
except:
pass
return True
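A hypothetical invocation, with the URL and names as placeholders:

```python
# record a live HLS stream; press q (or Ctrl-C) to stop
ffmpeg_download_stream(
    'https://example.com/live/playlist.m3u8',  # a single URL string, not a list
    title='my_show',
    ext='mp4',
    output_dir='/tmp',
)
```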

View File

@ -41,15 +41,32 @@ def download_rtmpdump_stream(url, title, ext,params={},output_dir='.'):
subprocess.call(cmdline)
return
# To be refactored
def play_rtmpdump_stream(player, url, params={}):
cmdline="rtmpdump -r '%s' "%url
# construct the left side of the pipe
cmdline = [RTMPDUMP, '-r']
cmdline.append(url)
# append other params if present
for key in params.keys():
cmdline+=key+" "+params[key] if params[key]!=None else ""+" "
cmdline+=" -o - | %s -"%player
print(cmdline)
os.system(cmdline)
cmdline.append(key)
if params[key] is not None:
cmdline.append(params[key])
cmdline.append('-o')
cmdline.append('-')
# pipe the dump into the player
cmdline.append('|')
cmdline.append(player)
cmdline.append('-')
# logging
print("Call rtmpdump:\n" + " ".join(cmdline) + "\n")
# a '|' inside an argument list is not a pipe, so run the command through a shell
subprocess.call(" ".join(cmdline), shell=True)
# os.system("rtmpdump -r '%s' -y '%s' -o - | %s -" % (url, playpath, player))
return
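If the shell dependency is ever unwanted, the two processes could be wired together directly; this is only a sketch of the idea, not part of the commit:

```python
import subprocess

def pipe_rtmpdump_to_player(url, player_cmd=('mpv', '-')):
    # hypothetical helper: rtmpdump writes the stream to stdout,
    # and the player reads it from stdin -- no shell involved
    dump = subprocess.Popen(['rtmpdump', '-r', url, '-o', '-'],
                            stdout=subprocess.PIPE)
    play = subprocess.Popen(list(player_cmd), stdin=dump.stdout)
    dump.stdout.close()  # let rtmpdump get SIGPIPE if the player exits first
    play.wait()
```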

View File

@ -10,6 +10,7 @@ def legitimize(text, os=platform.system()):
text = text.translate({
0: None,
ord('/'): '-',
ord('|'): '-',
})
if os == 'Windows':
@ -20,7 +21,6 @@ def legitimize(text, os=platform.system()):
ord('*'): '-',
ord('?'): '-',
ord('\\'): '-',
ord('|'): '-',
ord('\"'): '\'',
# Reserved in Windows VFAT
ord('+'): '-',
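The net effect of moving the mapping out of the Windows-only table, in a quick check:

```python
# '|' is now replaced on every platform, not only on Windows
legitimize('foo|bar')  # -> 'foo-bar' on Linux and macOS as well
legitimize('a/b|c')    # -> 'a-b-c'  ('/' was already replaced everywhere)
```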

View File

@ -1,4 +1,4 @@
#!/usr/bin/env python
script_name = 'you-get'
__version__ = '0.4.365'
__version__ = '0.4.575'

View File

@ -21,9 +21,6 @@ class YouGetTests(unittest.TestCase):
def test_mixcloud(self):
mixcloud.download("http://www.mixcloud.com/DJVadim/north-america-are-you-ready/", info_only=True)
def test_vimeo(self):
vimeo.download("http://vimeo.com/56810854", info_only=True)
def test_youtube(self):
youtube.download("http://www.youtube.com/watch?v=pzKerr0JIPA", info_only=True)
youtube.download("http://youtu.be/pzKerr0JIPA", info_only=True)

View File

@ -1,7 +1,7 @@
#!/usr/bin/env python
#!/usr/bin/env python3
import os, sys
_srcdir = 'src/'
_srcdir = '%s/src/' % os.path.dirname(os.path.realpath(__file__))
_filepath = os.path.dirname(sys.argv[0])
sys.path.insert(1, os.path.join(_filepath, _srcdir))
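For context: `os.path.realpath` resolves symlinks before the directory is taken, so a symlinked launcher still finds `src/`. The paths below are illustrative:

```python
import os
# suppose /usr/local/bin/you-get is a symlink to /opt/you-get/you-get
os.path.dirname('/usr/local/bin/you-get')                    # '/usr/local/bin' -- no src/ here
os.path.dirname(os.path.realpath('/usr/local/bin/you-get'))  # '/opt/you-get'   -- src/ lives here
```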

View File

@ -1,3 +1,3 @@
#!/usr/bin/env zsh
alias you-get="noglob $(dirname $0)/you-get"
alias you-vlc="noglob $(dirname $0)/you-get --player vlc"
alias you-get="noglob python3 $(dirname $0)/you-get"
alias you-vlc="noglob python3 $(dirname $0)/you-get --player vlc"