Merge remote-tracking branch 'refs/remotes/soimort/develop' into develop

Commit 09fa036c27

README.md (48 lines changed)
@@ -1,7 +1,7 @@
 # You-Get
 
-[![PyPI version](https://badge.fury.io/py/you-get.png)](http://badge.fury.io/py/you-get)
+[![PyPI version](https://img.shields.io/pypi/v/you-get.svg)](https://pypi.python.org/pypi/you-get/)
-[![Build Status](https://api.travis-ci.org/soimort/you-get.png)](https://travis-ci.org/soimort/you-get)
+[![Build Status](https://travis-ci.org/soimort/you-get.svg)](https://travis-ci.org/soimort/you-get)
 [![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/soimort/you-get?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 
 [You-Get](https://you-get.org/) is a tiny command-line utility to download media contents (videos, audios, images) from the Web, in case there is no other handy way to do it.
@@ -37,13 +37,13 @@ Interested? [Install it](#installation) now and [get started by examples](#getti
 
 Are you a Python programmer? Then check out [the source](https://github.com/soimort/you-get) and fork it!
 
-![](http://i.imgur.com/GfthFAz.png)
+![](https://i.imgur.com/GfthFAz.png)
 
 ## Installation
 
 ### Prerequisites
 
-The following dependencies are required and must be installed separately, unless you are using a pre-built package on Windows:
+The following dependencies are required and must be installed separately, unless you are using a pre-built package or chocolatey on Windows:
 
 * **[Python 3](https://www.python.org/downloads/)**
 * **[FFmpeg](https://www.ffmpeg.org/)** (strongly recommended) or [Libav](https://libav.org/)
@@ -93,6 +93,24 @@ $ git clone git://github.com/soimort/you-get.git
 
 Then put the cloned directory into your `PATH`, or run `./setup.py install` to install `you-get` to a permanent path.
 
+### Option 6: Using [Chocolatey](https://chocolatey.org/) (Windows only)
+
+```
+> choco install you-get
+```
+
+### Option 7: Homebrew (Mac only)
+
+You can install `you-get` easily via:
+
+```
+$ brew install you-get
+```
+
+### Shell completion
+
+Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](contrib/completion). Please consult your shell's manual for how to take advantage of them.
+
 ## Upgrading
 
 Based on which option you chose to install `you-get`, you may upgrade it via:
@@ -107,6 +125,18 @@ or download the latest release via:
 $ you-get https://github.com/soimort/you-get/archive/master.zip
 ```
 
+or use [chocolatey package manager](https://chocolatey.org):
+
+```
+> choco upgrade you-get
+```
+
+In order to get the latest ```develop``` branch without messing up the PIP, you can try:
+
+```
+$ pip3 install --upgrade git+https://github.com/soimort/you-get@develop
+```
+
 ## Getting Started
 
 ### Download a video
@@ -300,7 +330,7 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
 | :--: | :-- | :-----: | :-----: | :-----: |
 | **YouTube** | <https://www.youtube.com/> |✓| | |
 | **Twitter** | <https://twitter.com/> |✓|✓| |
-| VK | <http://vk.com/> |✓| | |
+| VK | <http://vk.com/> |✓|✓| |
 | Vine | <https://vine.co/> |✓| | |
 | Vimeo | <https://vimeo.com/> |✓| | |
 | Vidto | <http://vidto.me/> |✓| | |
@@ -309,6 +339,7 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
 | **Tumblr** | <https://www.tumblr.com/> |✓|✓|✓|
 | TED | <http://www.ted.com/> |✓| | |
 | SoundCloud | <https://soundcloud.com/> | | |✓|
+| SHOWROOM | <https://www.showroom-live.com/> |✓| | |
 | Pinterest | <https://www.pinterest.com/> | |✓| |
 | MusicPlayOn | <http://en.musicplayon.com/> |✓| | |
 | MTV81 | <http://www.mtv81.com/> |✓| | |
@@ -342,8 +373,9 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
 | 爆米花网 | <http://www.baomihua.com/> |✓| | |
 | **bilibili<br/>哔哩哔哩** | <http://www.bilibili.com/> |✓| | |
 | Dilidili | <http://www.dilidili.com/> |✓| | |
-| 豆瓣 | <http://www.douban.com/> | | |✓|
+| 豆瓣 | <http://www.douban.com/> |✓| |✓|
 | 斗鱼 | <http://www.douyutv.com/> |✓| | |
+| Panda<br/>熊猫 | <http://www.panda.tv/> |✓| | |
 | 凤凰视频 | <http://v.ifeng.com/> |✓| | |
 | 风行网 | <http://www.fun.tv/> |✓| | |
 | iQIYI<br/>爱奇艺 | <http://www.iqiyi.com/> |✓| | |
@@ -359,6 +391,7 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
 | PPTV聚力 | <http://www.pptv.com/> |✓| | |
 | 齐鲁网 | <http://v.iqilu.com/> |✓| | |
 | QQ<br/>腾讯视频 | <http://v.qq.com/> |✓| | |
+| 企鹅直播 | <http://live.qq.com/> |✓| | |
 | 阡陌视频 | <http://qianmo.com/> |✓| | |
 | THVideo | <http://thvideo.tv/> |✓| | |
 | Sina<br/>新浪视频<br/>微博秒拍视频 | <http://video.sina.com.cn/><br/><http://video.weibo.com/> |✓| | |
@@ -372,6 +405,9 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
 | 战旗TV | <http://www.zhanqi.tv/lives> |✓| | |
 | 央视网 | <http://www.cntv.cn/> |✓| | |
 | 花瓣 | <http://huaban.com/> | |✓| |
+| Naver<br/>네이버 | <http://tvcast.naver.com/> |✓| | |
+| 芒果TV | <http://www.mgtv.com/> |✓| | |
+| 火猫TV | <http://www.huomao.com/> |✓| | |
 
 For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page.
 

contrib/completion/_you-get (new file, 29 lines)
@@ -0,0 +1,29 @@
+#compdef you-get
+
+# Zsh completion definition for soimort/you-get.
+
+setopt localoptions noshwordsplit noksharrays
+local -a args
+
+args=(
+  '(- : *)'{-V,--version}'[print version and exit]'
+  '(- : *)'{-h,--help}'[print help and exit]'
+  '(-i --info)'{-i,--info}'[print extracted information]'
+  '(-u --url)'{-u,--url}'[print extracted information with URLs]'
+  '(--json)--json[print extracted URLs in JSON format]'
+  '(-n --no-merge)'{-n,--no-merge}'[do not merge video parts]'
+  '(--no-caption)--no-caption[do not download captions]'
+  '(-f --force)'{-f,--force}'[force overwrite existing files]'
+  '(-F --format)'{-F,--format}'[set video format to the specified stream id]:stream id'
+  '(-O --output-filename)'{-O,--output-filename}'[set output filename]:filename:_files'
+  '(-o --output-dir)'{-o,--output-dir}'[set output directory]:directory:_files -/'
+  '(-p --player)'{-p,--player}'[stream extracted URL to the specified player]:player and options'
+  '(-c --cookies)'{-c,--cookies}'[load cookies.txt or cookies.sqlite]:cookies file:_files'
+  '(-x --http-proxy)'{-x,--http-proxy}'[use the specified HTTP proxy for downloading]:host\:port:'
+  '(-y --extractor-proxy)'{-y,--extractor-proxy}'[use the specified HTTP proxy for extraction only]:host\:port'
+  '(--no-proxy)--no-proxy[do not use a proxy]'
+  '(-t --timeout)'{-t,--timeout}'[set socket timeout]:seconds'
+  '(-d --debug)'{-d,--debug}'[show traceback and other debug info]'
+  '*: :_guard "^-*" url'
+)
+_arguments -S -s $args

contrib/completion/you-get-completion.bash (new executable file, 31 lines)
@@ -0,0 +1,31 @@
+# Bash completion definition for you-get.
+
+_you-get () {
+  COMPREPLY=()
+  local IFS=$' \n'
+  local cur=$2 prev=$3
+  local -a opts_without_arg opts_with_arg
+  opts_without_arg=(
+    -V --version -h --help -i --info -u --url --json -n --no-merge
+    --no-caption -f --force --no-proxy -d --debug
+  )
+  opts_with_arg=(
+    -F --format -O --output-filename -o --output-dir -p --player
+    -c --cookies -x --http-proxy -y --extractor-proxy -t --timeout
+  )
+
+  # Do not complete non option names
+  [[ $cur == -* ]] || return 1
+
+  # Do not complete when the previous arg is an option expecting an argument
+  for opt in "${opts_with_arg[@]}"; do
+    [[ $opt == $prev ]] && return 1
+  done
+
+  # Complete option names
+  COMPREPLY=( $(compgen -W "${opts_without_arg[*]} ${opts_with_arg[*]}" \
+    -- "$cur") )
+  return 0
+}
+
+complete -F _you-get you-get

contrib/completion/you-get.fish (new file, 23 lines)
@@ -0,0 +1,23 @@
+# Fish completion definition for you-get.
+
+complete -c you-get -s V -l version -d 'print version and exit'
+complete -c you-get -s h -l help -d 'print help and exit'
+complete -c you-get -s i -l info -d 'print extracted information'
+complete -c you-get -s u -l url -d 'print extracted information'
+complete -c you-get -l json -d 'print extracted URLs in JSON format'
+complete -c you-get -s n -l no-merge -d 'do not merge video parts'
+complete -c you-get -l no-caption -d 'do not download captions'
+complete -c you-get -s f -l force -d 'force overwrite existing files'
+complete -c you-get -s F -l format -x -d 'set video format to the specified stream id'
+complete -c you-get -s O -l output-filename -d 'set output filename' \
+    -x -a '(__fish_complete_path (commandline -ct) "output filename")'
+complete -c you-get -s o -l output-dir -d 'set output directory' \
+    -x -a '(__fish_complete_directories (commandline -ct) "output directory")'
+complete -c you-get -s p -l player -x -d 'stream extracted URL to the specified player'
+complete -c you-get -s c -l cookies -d 'load cookies.txt or cookies.sqlite' \
+    -x -a '(__fish_complete_path (commandline -ct) "cookies.txt or cookies.sqlite")'
+complete -c you-get -s x -l http-proxy -x -d 'use the specified HTTP proxy for downloading'
+complete -c you-get -s y -l extractor-proxy -x -d 'use the specified HTTP proxy for extraction only'
+complete -c you-get -l no-proxy -d 'do not use a proxy'
+complete -c you-get -s t -l timeout -x -d 'set socket timeout'
+complete -c you-get -s d -l debug -d 'show traceback and other debug info'

@@ -8,7 +8,9 @@ SITES = {
     'baidu' : 'baidu',
     'bandcamp' : 'bandcamp',
     'baomihua' : 'baomihua',
+    'bigthink' : 'bigthink',
     'bilibili' : 'bilibili',
+    'cctv' : 'cntv',
     'cntv' : 'cntv',
     'cbs' : 'cbs',
     'dailymotion' : 'dailymotion',
@@ -25,7 +27,9 @@ SITES = {
     'google' : 'google',
     'heavy-music' : 'heavymusic',
     'huaban' : 'huaban',
+    'huomao' : 'huomaotv',
     'iask' : 'sina',
+    'icourses' : 'icourses',
     'ifeng' : 'ifeng',
     'imgur' : 'imgur',
     'in' : 'alive',
@@ -47,17 +51,21 @@ SITES = {
     'lizhi' : 'lizhi',
     'magisto' : 'magisto',
     'metacafe' : 'metacafe',
+    'mgtv' : 'mgtv',
     'miomio' : 'miomio',
     'mixcloud' : 'mixcloud',
     'mtv81' : 'mtv81',
     'musicplayon' : 'musicplayon',
+    'naver' : 'naver',
     '7gogo' : 'nanagogo',
     'nicovideo' : 'nicovideo',
+    'panda' : 'panda',
     'pinterest' : 'pinterest',
     'pixnet' : 'pixnet',
     'pptv' : 'pptv',
     'qianmo' : 'qianmo',
     'qq' : 'qq',
+    'showroom-live' : 'showroom',
     'sina' : 'sina',
     'smgbb' : 'bilibili',
     'sohu' : 'sohu',
@@ -73,6 +81,7 @@ SITES = {
     'videomega' : 'videomega',
     'vidto' : 'vidto',
     'vimeo' : 'vimeo',
+    'wanmen' : 'wanmen',
     'weibo' : 'miaopai',
     'veoh' : 'veoh',
     'vine' : 'vine',
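
The SITES table above maps a second-level domain keyword to the name of the extractor module that handles it. A minimal standalone sketch of that kind of lookup, assuming a trimmed-down table and an illustrative helper name (this is not you-get's actual dispatch code):

```python
from urllib.parse import urlparse

# A few of the entries added in this commit, as a stand-in for the full table.
SITES = {'bigthink': 'bigthink', 'cctv': 'cntv', 'huomao': 'huomaotv',
         'mgtv': 'mgtv', 'panda': 'panda', 'showroom-live': 'showroom'}

def pick_extractor(url):
    # Walk the hostname labels (minus the TLD) and return the first match,
    # falling back to the universal extractor.
    host = urlparse(url).hostname or ''
    for key in host.split('.')[:-1]:
        if key in SITES:
            return SITES[key]
    return 'universal'

print(pick_extractor('https://www.showroom-live.com/room/profile'))  # showroom
print(pick_extractor('http://www.mgtv.com/some/video.html'))         # mgtv
```
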
@@ -95,6 +104,7 @@ import logging
 import os
 import platform
 import re
+import socket
 import sys
 import time
 from urllib import request, parse, error
@@ -305,7 +315,53 @@ def get_content(url, headers={}, decoded=True):
     if cookies:
         cookies.add_cookie_header(req)
         req.headers.update(req.unredirected_hdrs)
 
+    for i in range(10):
+        try:
             response = request.urlopen(req)
+            break
+        except socket.timeout:
+            logging.debug('request attempt %s timeout' % str(i + 1))
+
+    data = response.read()
+
+    # Handle HTTP compression for gzip and deflate (zlib)
+    content_encoding = response.getheader('Content-Encoding')
+    if content_encoding == 'gzip':
+        data = ungzip(data)
+    elif content_encoding == 'deflate':
+        data = undeflate(data)
+
+    # Decode the response body
+    if decoded:
+        charset = match1(response.getheader('Content-Type'), r'charset=([\w-]+)')
+        if charset is not None:
+            data = data.decode(charset)
+        else:
+            data = data.decode('utf-8')
+
+    return data
+
+def post_content(url, headers={}, post_data={}, decoded=True):
+    """Post the content of a URL via sending a HTTP POST request.
+
+    Args:
+        url: A URL.
+        headers: Request headers used by the client.
+        decoded: Whether decode the response body using UTF-8 or the charset specified in Content-Type.
+
+    Returns:
+        The content as a string.
+    """
+
+    logging.debug('post_content: %s \n post_data: %s' % (url, post_data))
+
+    req = request.Request(url, headers=headers)
+    if cookies:
+        cookies.add_cookie_header(req)
+        req.headers.update(req.unredirected_hdrs)
+    post_data_enc = bytes(parse.urlencode(post_data), 'utf-8')
+    response = request.urlopen(req, data = post_data_enc)
     data = response.read()
 
     # Handle HTTP compression for gzip and deflate (zlib)
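
The change to `get_content()` above wraps `urlopen` in a bounded retry loop that swallows socket timeouts. A minimal standalone sketch of the same pattern (the function name and parameters are illustrative, not part of you-get; unlike the patched code, the sketch raises explicitly if every attempt times out):

```python
import logging
import socket
from urllib import request

def fetch_with_retries(url, attempts=10, timeout=5):
    req = request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = None
    for i in range(attempts):
        try:
            response = request.urlopen(req, timeout=timeout)
            break  # success, stop retrying
        except socket.timeout:
            logging.debug('request attempt %s timeout', i + 1)
    if response is None:
        raise socket.timeout('all %d attempts timed out' % attempts)
    return response.read()

if __name__ == '__main__':
    print(len(fetch_with_retries('https://example.com/')))
```
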
@@ -492,7 +548,11 @@ def url_save(url, filepath, bar, refer = None, is_part = False, faker = False, h
             os.remove(filepath) # on Windows rename could fail if destination filepath exists
         os.rename(temp_filepath, filepath)
 
-def url_save_chunked(url, filepath, bar, refer = None, is_part = False, faker = False, headers = {}):
+def url_save_chunked(url, filepath, bar, dyn_callback=None, chunk_size=0, ignore_range=False, refer=None, is_part=False, faker=False, headers={}):
+    def dyn_update_url(received):
+        if callable(dyn_callback):
+            logging.debug('Calling callback %s for new URL from %s' % (dyn_callback.__name__, received))
+            return dyn_callback(received)
     if os.path.exists(filepath):
         if not force:
             if not is_part:
@@ -530,6 +590,8 @@ def url_save_chunked(url, filepath, bar, refer = None, is_part = False, faker =
     else:
         headers = {}
     if received:
+        url = dyn_update_url(received)
+        if not ignore_range:
             headers['Range'] = 'bytes=' + str(received) + '-'
     if refer:
         headers['Referer'] = refer
@@ -537,12 +599,17 @@
     response = request.urlopen(request.Request(url, headers=headers), None)
 
     with open(temp_filepath, open_mode) as output:
+        this_chunk = received
         while True:
             buffer = response.read(1024 * 256)
             if not buffer:
                 break
             output.write(buffer)
             received += len(buffer)
+            if chunk_size and (received - this_chunk) >= chunk_size:
+                url = dyn_callback(received)
+                this_chunk = received
+                response = request.urlopen(request.Request(url, headers=headers), None)
             if bar:
                 bar.update_received(len(buffer))
 
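
`url_save_chunked()` now accepts a `dyn_callback` and a `chunk_size`: once `chunk_size` bytes have been written since the last refresh, the callback is asked for a fresh URL covering the next range. A self-contained sketch of that hand-off, using a fake data source instead of an HTTP response (all names here are illustrative):

```python
def save_chunked(read_chunk, dyn_callback=None, chunk_size=0):
    received = 0
    this_chunk = 0
    url = dyn_callback(received) if callable(dyn_callback) else None
    while True:
        buffer = read_chunk(url, received)
        if not buffer:
            break
        received += len(buffer)
        if chunk_size and (received - this_chunk) >= chunk_size:
            url = dyn_callback(received)  # ask the extractor for a fresh link
            this_chunk = received
    return received

# Fake data source standing in for response.read(): ten 256-byte blocks.
blocks = [b'x' * 256 for _ in range(10)] + [b'']
read_chunk = lambda url, received: blocks.pop(0)
refresh_url = lambda received: 'http://example.invalid/?start=%d' % received

print(save_chunked(read_chunk, dyn_callback=refresh_url, chunk_size=1024))  # 2560
```
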
@@ -734,7 +801,7 @@ def download_urls(urls, title, ext, total_size, output_dir='.', refer=None, merg
             if has_ffmpeg_installed():
                 from .processor.ffmpeg import ffmpeg_concat_av
                 ret = ffmpeg_concat_av(parts, output_filepath, ext)
-                print('Done.')
+                print('Merged into %s' % output_filename)
                 if ret == 0:
                     for part in parts: os.remove(part)
 
@@ -747,7 +814,7 @@
             else:
                 from .processor.join_flv import concat_flv
                 concat_flv(parts, output_filepath)
-                print('Done.')
+                print('Merged into %s' % output_filename)
         except:
             raise
         else:
@@ -763,7 +830,7 @@
             else:
                 from .processor.join_mp4 import concat_mp4
                 concat_mp4(parts, output_filepath)
-                print('Done.')
+                print('Merged into %s' % output_filename)
         except:
             raise
         else:
@@ -779,7 +846,7 @@
             else:
                 from .processor.join_ts import concat_ts
                 concat_ts(parts, output_filepath)
-                print('Done.')
+                print('Merged into %s' % output_filename)
         except:
             raise
         else:
@@ -791,7 +858,7 @@
 
     print()
 
-def download_urls_chunked(urls, title, ext, total_size, output_dir='.', refer=None, merge=True, faker=False, headers = {}):
+def download_urls_chunked(urls, title, ext, total_size, output_dir='.', refer=None, merge=True, faker=False, headers = {}, **kwargs):
     assert urls
     if dry_run:
         print('Real URLs:\n%s\n' % urls)
@@ -805,7 +872,7 @@ def download_urls_chunked(urls, title, ext, total_size, output_dir='.', refer=No
 
     filename = '%s.%s' % (title, ext)
     filepath = os.path.join(output_dir, filename)
-    if total_size and ext in ('ts'):
+    if total_size:
         if not force and os.path.exists(filepath[:-3] + '.mkv'):
             print('Skipping %s: file already exists' % filepath[:-3] + '.mkv')
             print()
@@ -820,7 +887,7 @@
         print('Downloading %s ...' % tr(filename))
         filepath = os.path.join(output_dir, filename)
         parts.append(filepath)
-        url_save_chunked(url, filepath, bar, refer = refer, faker = faker, headers = headers)
+        url_save_chunked(url, filepath, bar, refer = refer, faker = faker, headers = headers, **kwargs)
         bar.done()
 
     if not merge:
@@ -887,6 +954,22 @@ def download_rtmp_url(url,title, ext,params={}, total_size=0, output_dir='.', re
     assert has_rtmpdump_installed(), "RTMPDump not installed."
     download_rtmpdump_stream(url, title, ext,params, output_dir)
 
+def download_url_ffmpeg(url,title, ext,params={}, total_size=0, output_dir='.', refer=None, merge=True, faker=False):
+    assert url
+    if dry_run:
+        print('Real URL:\n%s\n' % [url])
+        if params.get("-y",False): #None or unset ->False
+            print('Real Playpath:\n%s\n' % [params.get("-y")])
+        return
+
+    if player:
+        launch_player(player, [url])
+        return
+
+    from .processor.ffmpeg import has_ffmpeg_installed, ffmpeg_download_stream
+    assert has_ffmpeg_installed(), "FFmpeg not installed."
+    ffmpeg_download_stream(url, title, ext, params, output_dir)
+
 def playlist_not_supported(name):
     def f(*args, **kwargs):
         raise NotImplementedError('Playlist is not supported for ' + name)
@@ -1015,6 +1098,22 @@ def set_http_proxy(proxy):
     opener = request.build_opener(proxy_support)
     request.install_opener(opener)
 
+def print_more_compatible(*args, **kwargs):
+    import builtins as __builtin__
+    """Overload default print function as py (<3.3) does not support 'flush' keyword.
+    Although the function name can be same as print to get itself overloaded automatically,
+    I'd rather leave it with a different name and only overload it when importing to make less confusion. """
+    # nothing happens on py3.3 and later
+    if sys.version_info[:2] >= (3, 3):
+        return __builtin__.print(*args, **kwargs)
+
+    # in lower pyver (e.g. 3.2.x), remove 'flush' keyword and flush it as requested
+    doFlush = kwargs.pop('flush', False)
+    ret = __builtin__.print(*args, **kwargs)
+    if doFlush:
+        kwargs.get('file', sys.stdout).flush()
+    return ret
+
+
 def download_main(download, download_playlist, urls, playlist, **kwargs):
@@ -1060,11 +1159,13 @@ def script_main(script_name, download, download_playlist, **kwargs):
     -x | --http-proxy <HOST:PORT>       Use an HTTP proxy for downloading.
     -y | --extractor-proxy <HOST:PORT>  Use an HTTP proxy for extracting only.
          --no-proxy                     Never use a proxy.
+    -s | --socks-proxy <HOST:PORT>      Use an SOCKS5 proxy for downloading.
+    -t | --timeout <SECONDS>            Set socket timeout.
     -d | --debug                        Show traceback and other debug info.
     '''
 
-    short_opts = 'Vhfiuc:ndF:O:o:p:x:y:'
-    opts = ['version', 'help', 'force', 'info', 'url', 'cookies', 'no-caption', 'no-merge', 'no-proxy', 'debug', 'json', 'format=', 'stream=', 'itag=', 'output-filename=', 'output-dir=', 'player=', 'http-proxy=', 'extractor-proxy=', 'lang=']
+    short_opts = 'Vhfiuc:ndF:O:o:p:x:y:s:t:'
+    opts = ['version', 'help', 'force', 'info', 'url', 'cookies', 'no-caption', 'no-merge', 'no-proxy', 'debug', 'json', 'format=', 'stream=', 'itag=', 'output-filename=', 'output-dir=', 'player=', 'http-proxy=', 'socks-proxy=', 'extractor-proxy=', 'lang=', 'timeout=']
     if download_playlist:
         short_opts = 'l' + short_opts
         opts = ['playlist'] + opts
@@ -1092,8 +1193,10 @@
     lang = None
     output_dir = '.'
     proxy = None
+    socks_proxy = None
     extractor_proxy = None
     traceback = False
+    timeout = 600
     for o, a in opts:
         if o in ('-V', '--version'):
             version()
@@ -1163,10 +1266,14 @@
             caption = False
         elif o in ('-x', '--http-proxy'):
             proxy = a
+        elif o in ('-s', '--socks-proxy'):
+            socks_proxy = a
         elif o in ('-y', '--extractor-proxy'):
             extractor_proxy = a
         elif o in ('--lang',):
             lang = a
+        elif o in ('-t', '--timeout'):
+            timeout = int(a)
         else:
             log.e("try 'you-get --help' for more options")
             sys.exit(2)
@@ -1174,8 +1281,27 @@
         print(help)
         sys.exit()
 
+    if (socks_proxy):
+        try:
+            import socket
+            import socks
+            socks_proxy_addrs = socks_proxy.split(':')
+            socks.set_default_proxy(socks.SOCKS5,
+                                    socks_proxy_addrs[0],
+                                    int(socks_proxy_addrs[1]))
+            socket.socket = socks.socksocket
+            def getaddrinfo(*args):
+                return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]
+            socket.getaddrinfo = getaddrinfo
+        except ImportError:
+            log.w('Error importing PySocks library, socks proxy ignored.'
+                  'In order to use use socks proxy, please install PySocks.')
+    else:
+        import socket
         set_http_proxy(proxy)
 
+    socket.setdefaulttimeout(timeout)
+
     try:
         if stream_id:
             if not extractor_proxy:
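
The new `-s`/`--socks-proxy` option relies on the optional PySocks package and works by replacing `socket.socket` so that every urllib connection goes through the SOCKS5 proxy. A minimal sketch of that mechanism, assuming a SOCKS5 proxy listening on 127.0.0.1:1080 (the address is an assumption for illustration, not a you-get default):

```python
import socket
from urllib import request

try:
    import socks  # provided by the PySocks package
except ImportError:
    raise SystemExit('pip3 install PySocks to run this sketch')

socks.set_default_proxy(socks.SOCKS5, '127.0.0.1', 1080)  # assumed local proxy
socket.socket = socks.socksocket  # all new sockets now go through the proxy

print(request.urlopen('https://example.com/').status)
```
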

@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 
 from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy
+from .common import print_more_compatible as print
 from .util import log
 from . import json_output
 import os
@@ -5,7 +5,9 @@ from .alive import *
 from .archive import *
 from .baidu import *
 from .bandcamp import *
+from .bigthink import *
 from .bilibili import *
+from .bokecc import *
 from .cbs import *
 from .ckplayer import *
 from .cntv import *
@@ -22,6 +24,7 @@ from .funshion import *
 from .google import *
 from .heavymusic import *
 from .huaban import *
+from .icourses import *
 from .ifeng import *
 from .imgur import *
 from .infoq import *
@@ -38,19 +41,24 @@ from .le import *
 from .lizhi import *
 from .magisto import *
 from .metacafe import *
+from .mgtv import *
 from .miaopai import *
 from .miomio import *
 from .mixcloud import *
 from .mtv81 import *
 from .musicplayon import *
 from .nanagogo import *
+from .naver import *
 from .netease import *
 from .nicovideo import *
+from .panda import *
 from .pinterest import *
 from .pixnet import *
 from .pptv import *
 from .qianmo import *
+from .qie import *
 from .qq import *
+from .showroom import *
 from .sina import *
 from .sohu import *
 from .soundcloud import *
@@ -67,6 +75,7 @@ from .vimeo import *
 from .vine import *
 from .vk import *
 from .w56 import *
+from .wanmen import *
 from .xiami import *
 from .yinyuetai import *
 from .yixia import *
@@ -74,3 +83,4 @@ from .youku import *
 from .youtube import *
 from .ted import *
 from .khan import *
+from .zhanqi import *

@@ -8,7 +8,7 @@ from .le import letvcloud_download_by_vu
 from .qq import qq_download_by_vid
 from .sina import sina_download_by_vid
 from .tudou import tudou_download_by_iid
-from .youku import youku_download_by_vid
+from .youku import youku_download_by_vid, youku_open_download_by_vid
 
 import json, re
 
@@ -17,10 +17,24 @@ def get_srt_json(id):
     return get_html(url)
 
 def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False, **kwargs):
+    """str, str, str, bool, bool ->None
+
+    Download Acfun video by vid.
+
+    Call Acfun API, decide which site to use, and pass the job to its
+    extractor.
+    """
+
+    #first call the main parasing API
     info = json.loads(get_html('http://www.acfun.tv/video/getVideo.aspx?id=' + vid))
+
     sourceType = info['sourceType']
+
+    #decide sourceId to know which extractor to use
     if 'sourceId' in info: sourceId = info['sourceId']
     # danmakuId = info['danmakuId']
+
+    #call extractor decided by sourceId
     if sourceType == 'sina':
         sina_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
     elif sourceType == 'youku':
@@ -32,14 +46,13 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals
     elif sourceType == 'letv':
         letvcloud_download_by_vu(sourceId, '2d8c027396', title, output_dir=output_dir, merge=merge, info_only=info_only)
     elif sourceType == 'zhuzhan':
-        a = 'http://api.aixifan.com/plays/%s/realSource' % vid
-        s = json.loads(get_content(a, headers={'deviceType': '1'}))
-        urls = s['data']['files'][-1]['url']
-        size = urls_size(urls)
-        print_info(site_info, title, 'mp4', size)
-        if not info_only:
-            download_urls(urls, title, 'mp4', size,
-                          output_dir=output_dir, merge=merge)
+        #As in Jul.28.2016, Acfun is using embsig to anti hotlink so we need to pass this
+        embsig = info['encode']
+        a = 'http://api.aixifan.com/plays/%s' % vid
+        s = json.loads(get_content(a, headers={'deviceType': '2'}))
+        if s['data']['source'] == "zhuzhan-youku":
+            sourceId = s['data']['sourceId']
+            youku_open_download_by_vid(client_id='908a519d032263f8', vid=sourceId, title=title, output_dir=output_dir,merge=merge, info_only=info_only, embsig = embsig, **kwargs)
     else:
         raise NotImplementedError(sourceType)
 
@@ -60,16 +73,15 @@ def acfun_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
     assert re.match(r'http://[^\.]+.acfun.[^\.]+/\D/\D\D(\d+)', url)
     html = get_html(url)
 
-    title = r1(r'<h1 id="txt-title-view">([^<>]+)<', html)
+    title = r1(r'data-title="([^"]+)"', html)
     title = unescape_html(title)
     title = escape_file_path(title)
     assert title
 
-    videos = re.findall("data-vid=\"(\d+)\".*href=\"[^\"]+\".*title=\"([^\"]+)\"", html)
-    for video in videos:
-        p_vid = video[0]
-        p_title = title + " - " + video[1] if video[1] != '删除标签' else title
-        acfun_download_by_vid(p_vid, p_title,
+    vid = r1('data-vid="(\d+)"', html)
+    up = r1('data-name="([^"]+)"', html)
+    title = title + ' - ' + up
+    acfun_download_by_vid(vid, title,
                            output_dir=output_dir,
                            merge=merge,
                            info_only=info_only,

src/you_get/extractors/baidu.py (217 lines changed; Executable file → Normal file)
@@ -7,8 +7,10 @@ from ..common import *
 from .embed import *
 from .universal import *
 
+
 def baidu_get_song_data(sid):
-    data = json.loads(get_html('http://music.baidu.com/data/music/fmlink?songIds=%s' % sid, faker = True))['data']
+    data = json.loads(get_html(
+        'http://music.baidu.com/data/music/fmlink?songIds=%s' % sid, faker=True))['data']
+
     if data['xcode'] != '':
         # inside china mainland
@@ -17,22 +19,28 @@ def baidu_get_song_data(sid):
         # outside china mainland
         return None
 
+
 def baidu_get_song_url(data):
     return data['songLink']
 
+
 def baidu_get_song_artist(data):
     return data['artistName']
 
+
 def baidu_get_song_album(data):
     return data['albumName']
 
+
 def baidu_get_song_title(data):
     return data['songName']
 
+
 def baidu_get_song_lyric(data):
     lrc = data['lrcLink']
     return None if lrc is '' else "http://music.baidu.com%s" % lrc
 
+
 def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
     data = baidu_get_song_data(sid)
     if data is not None:
@@ -51,7 +59,8 @@ def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
         type, ext, size = url_info(url, faker=True)
         print_info(site_info, title, type, size)
         if not info_only:
-            download_urls([url], file_name, ext, size, output_dir, merge=merge, faker=True)
+            download_urls([url], file_name, ext, size,
+                          output_dir, merge=merge, faker=True)
 
         try:
             type, ext, size = url_info(lrc, faker=True)
@@ -61,12 +70,14 @@ def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
         except:
             pass
 
+
 def baidu_download_album(aid, output_dir='.', merge=True, info_only=False):
     html = get_html('http://music.baidu.com/album/%s' % aid, faker=True)
     album_name = r1(r'<h2 class="album-name">(.+?)<\/h2>', html)
     artist = r1(r'<span class="author_list" title="(.+?)">', html)
     output_dir = '%s/%s - %s' % (output_dir, artist, album_name)
-    ids = json.loads(r1(r'<span class="album-add" data-adddata=\'(.+?)\'>', html).replace('"', '').replace(';', '"'))['ids']
+    ids = json.loads(r1(r'<span class="album-add" data-adddata=\'(.+?)\'>',
+                        html).replace('"', '').replace(';', '"'))['ids']
     track_nr = 1
     for id in ids:
         song_data = baidu_get_song_data(id)
@@ -78,35 +89,26 @@ def baidu_download_album(aid, output_dir = '.', merge = True, info_only = False)
             type, ext, size = url_info(song_url, faker=True)
             print_info(site_info, song_title, type, size)
             if not info_only:
-                download_urls([song_url], file_name, ext, size, output_dir, merge = merge, faker = True)
+                download_urls([song_url], file_name, ext, size,
+                              output_dir, merge=merge, faker=True)
+
             if song_lrc:
                 type, ext, size = url_info(song_lrc, faker=True)
                 print_info(site_info, song_title, type, size)
                 if not info_only:
-                    download_urls([song_lrc], file_name, ext, size, output_dir, faker = True)
+                    download_urls([song_lrc], file_name, ext,
+                                  size, output_dir, faker=True)
+
             track_nr += 1
 
 
 def baidu_download(url, output_dir='.', stream_type=None, merge=True, info_only=False, **kwargs):
-    if re.match(r'http://imgsrc.baidu.com', url):
-        universal_download(url, output_dir, merge=merge, info_only=info_only)
-        return
-
-    elif re.match(r'http://pan.baidu.com', url):
-        html = get_html(url)
-
-        title = r1(r'server_filename="([^"]+)"', html)
-        if len(title.split('.')) > 1:
-            title = ".".join(title.split('.')[:-1])
-
-        real_url = r1(r'\\"dlink\\":\\"([^"]*)\\"', html).replace('\\\\/', '/')
-        type, ext, size = url_info(real_url, faker = True)
-
-        print_info(site_info, title, ext, size)
+    if re.match(r'http://pan.baidu.com', url):
+        real_url, title, ext, size = baidu_pan_download(url)
         if not info_only:
-            download_urls([real_url], title, ext, size, output_dir, merge = merge)
+            download_urls([real_url], title, ext, size,
+                          output_dir, url, merge=merge, faker=True)
     elif re.match(r'http://music.baidu.com/album/\d+', url):
         id = r1(r'http://music.baidu.com/album/(\d+)', url)
         baidu_download_album(id, output_dir, merge, info_only)
@@ -124,17 +126,20 @@ def baidu_download(url, output_dir = '.', stream_type = None, merge = True, info
         html = get_html(url)
         title = r1(r'title:"([^"]+)"', html)
 
-        items = re.findall(r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html)
+        items = re.findall(
+            r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html)
         urls = ['http://imgsrc.baidu.com/forum/pic/item/' + i
                 for i in set(items)]
 
         # handle albums
         kw = r1(r'kw=([^&]+)', html) or r1(r"kw:'([^']+)'", html)
         tid = r1(r'tid=(\d+)', html) or r1(r"tid:'([^']+)'", html)
-        album_url = 'http://tieba.baidu.com/photo/g/bw/picture/list?kw=%s&tid=%s' % (kw, tid)
+        album_url = 'http://tieba.baidu.com/photo/g/bw/picture/list?kw=%s&tid=%s' % (
+            kw, tid)
         album_info = json.loads(get_content(album_url))
         for i in album_info['data']['pic_list']:
-            urls.append('http://imgsrc.baidu.com/forum/pic/item/' + i['pic_id'] + '.jpg')
+            urls.append(
+                'http://imgsrc.baidu.com/forum/pic/item/' + i['pic_id'] + '.jpg')
 
         ext = 'jpg'
         size = float('Inf')
@@ -144,6 +149,170 @@ def baidu_download(url, output_dir = '.', stream_type = None, merge = True, info
         download_urls(urls, title, ext, size,
                       output_dir=output_dir, merge=False)
 
+
+def baidu_pan_download(url):
+    errno_patt = r'errno":([^"]+),'
+    refer_url = ""
+    fake_headers = {
+        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
+        'Accept-Charset': 'UTF-8,*;q=0.5',
+        'Accept-Encoding': 'gzip,deflate,sdch',
+        'Accept-Language': 'en-US,en;q=0.8',
+        'Host': 'pan.baidu.com',
+        'Origin': 'http://pan.baidu.com',
+        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:13.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2500.0 Safari/537.36',
+        'Referer': refer_url
+    }
+    if cookies:
+        print('Use user specified cookies')
+    else:
+        print('Generating cookies...')
+        fake_headers['Cookie'] = baidu_pan_gen_cookies(url)
+    refer_url = "http://pan.baidu.com"
+    html = get_content(url, fake_headers, decoded=True)
+    isprotected = False
+    sign, timestamp, bdstoken, appid, primary_id, fs_id, uk = baidu_pan_parse(
+        html)
+    if sign == None:
+        if re.findall(r'\baccess-code\b', html):
+            isprotected = True
+            sign, timestamp, bdstoken, appid, primary_id, fs_id, uk, fake_headers, psk = baidu_pan_protected_share(
+                url)
+            # raise NotImplementedError("Password required!")
+        if isprotected != True:
+            raise AssertionError("Share not found or canceled: %s" % url)
+    if bdstoken == None:
+        bdstoken = ""
+    if isprotected != True:
+        sign, timestamp, bdstoken, appid, primary_id, fs_id, uk = baidu_pan_parse(
+            html)
+    request_url = "http://pan.baidu.com/api/sharedownload?sign=%s&timestamp=%s&bdstoken=%s&channel=chunlei&clienttype=0&web=1&app_id=%s" % (
+        sign, timestamp, bdstoken, appid)
+    refer_url = url
+    post_data = {
+        'encrypt': 0,
+        'product': 'share',
+        'uk': uk,
+        'primaryid': primary_id,
+        'fid_list': '[' + fs_id + ']'
+    }
+    if isprotected == True:
+        post_data['sekey'] = psk
+    response_content = post_content(request_url, fake_headers, post_data, True)
+    errno = match1(response_content, errno_patt)
+    if errno != "0":
+        raise AssertionError(
+            "Server refused to provide download link! (Errno:%s)" % errno)
+    real_url = r1(r'dlink":"([^"]+)"', response_content).replace('\\/', '/')
+    title = r1(r'server_filename":"([^"]+)"', response_content)
+    assert real_url
+    type, ext, size = url_info(real_url, faker=True)
+    title_wrapped = json.loads('{"wrapper":"%s"}' % title)
+    title = title_wrapped['wrapper']
+    logging.debug(real_url)
+    print_info(site_info, title, ext, size)
+    print('Hold on...')
+    time.sleep(5)
+    return real_url, title, ext, size
+
+
+def baidu_pan_parse(html):
+    sign_patt = r'sign":"([^"]+)"'
+    timestamp_patt = r'timestamp":([^"]+),'
+    appid_patt = r'app_id":"([^"]+)"'
+    bdstoken_patt = r'bdstoken":"([^"]+)"'
+    fs_id_patt = r'fs_id":([^"]+),'
+    uk_patt = r'uk":([^"]+),'
+    errno_patt = r'errno":([^"]+),'
+    primary_id_patt = r'shareid":([^"]+),'
+    sign = match1(html, sign_patt)
+    timestamp = match1(html, timestamp_patt)
+    appid = match1(html, appid_patt)
+    bdstoken = match1(html, bdstoken_patt)
+    fs_id = match1(html, fs_id_patt)
+    uk = match1(html, uk_patt)
+    primary_id = match1(html, primary_id_patt)
+    return sign, timestamp, bdstoken, appid, primary_id, fs_id, uk
+
+
+def baidu_pan_gen_cookies(url, post_data=None):
+    from http import cookiejar
+    cookiejar = cookiejar.CookieJar()
+    opener = request.build_opener(request.HTTPCookieProcessor(cookiejar))
+    resp = opener.open('http://pan.baidu.com')
+    if post_data != None:
+        resp = opener.open(url, bytes(parse.urlencode(post_data), 'utf-8'))
+    return cookjar2hdr(cookiejar)
+
+
+def baidu_pan_protected_share(url):
+    print('This share is protected by password!')
+    inpwd = input('Please provide unlock password: ')
+    inpwd = inpwd.replace(' ', '').replace('\t', '')
+    print('Please wait...')
+    post_pwd = {
+        'pwd': inpwd,
+        'vcode': None,
+        'vstr': None
+    }
+    from http import cookiejar
+    import time
+    cookiejar = cookiejar.CookieJar()
+    opener = request.build_opener(request.HTTPCookieProcessor(cookiejar))
+    resp = opener.open('http://pan.baidu.com')
+    resp = opener.open(url)
+    init_url = resp.geturl()
+    verify_url = 'http://pan.baidu.com/share/verify?%s&t=%s&channel=chunlei&clienttype=0&web=1' % (
+        init_url.split('?', 1)[1], int(time.time()))
+    refer_url = init_url
+    fake_headers = {
+        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
+        'Accept-Charset': 'UTF-8,*;q=0.5',
+        'Accept-Encoding': 'gzip,deflate,sdch',
+        'Accept-Language': 'en-US,en;q=0.8',
+        'Host': 'pan.baidu.com',
+        'Origin': 'http://pan.baidu.com',
+        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:13.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2500.0 Safari/537.36',
+        'Referer': refer_url
+    }
+    opener.addheaders = dict2triplet(fake_headers)
+    pwd_resp = opener.open(verify_url, bytes(
+        parse.urlencode(post_pwd), 'utf-8'))
+    pwd_resp_str = ungzip(pwd_resp.read()).decode('utf-8')
+    pwd_res = json.loads(pwd_resp_str)
+    if pwd_res['errno'] != 0:
+        raise AssertionError(
+            'Server returned an error: %s (Incorrect password?)' % pwd_res['errno'])
+    pg_resp = opener.open('http://pan.baidu.com/share/link?%s' %
+                          init_url.split('?', 1)[1])
+    content = ungzip(pg_resp.read()).decode('utf-8')
+    sign, timestamp, bdstoken, appid, primary_id, fs_id, uk = baidu_pan_parse(
+        content)
+    psk = query_cookiejar(cookiejar, 'BDCLND')
+    psk = parse.unquote(psk)
+    fake_headers['Cookie'] = cookjar2hdr(cookiejar)
+    return sign, timestamp, bdstoken, appid, primary_id, fs_id, uk, fake_headers, psk
+
+
+def cookjar2hdr(cookiejar):
+    cookie_str = ''
+    for i in cookiejar:
+        cookie_str = cookie_str + i.name + '=' + i.value + ';'
+    return cookie_str[:-1]
+
+
+def query_cookiejar(cookiejar, name):
+    for i in cookiejar:
+        if i.name == name:
+            return i.value
+
+
+def dict2triplet(dictin):
+    out_triplet = []
+    for i in dictin:
+        out_triplet.append((i, dictin[i]))
+    return out_triplet
+
 site_info = "Baidu.com"
 download = baidu_download
 download_playlist = playlist_not_supported("baidu")

@@ -6,7 +6,7 @@ from ..common import *
 
 def bandcamp_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
     html = get_html(url)
-    trackinfo = json.loads(r1(r'(\[{"video_poster_url".*}\]),', html))
+    trackinfo = json.loads(r1(r'(\[{"(video_poster_url|video_caption)".*}\]),', html))
     for track in trackinfo:
         track_num = track['track_num']
         title = '%s. %s' % (track_num, track['title'])

src/you_get/extractors/baomihua.py (2 lines changed; Executable file → Normal file)
@@ -7,7 +7,7 @@ from ..common import *
 import urllib
 
 def baomihua_download_by_id(id, title=None, output_dir='.', merge=True, info_only=False, **kwargs):
-    html = get_html('http://play.baomihua.com/getvideourl.aspx?flvid=%s' % id)
+    html = get_html('http://play.baomihua.com/getvideourl.aspx?flvid=%s&devicetype=phone_app' % id)
     host = r1(r'host=([^&]*)', html)
     assert host
     type = r1(r'videofiletype=([^&]*)', html)
src/you_get/extractors/bigthink.py (new file, 76 lines)
@@ -0,0 +1,76 @@
#!/usr/bin/env python

from ..common import *
from ..extractor import VideoExtractor

import json

class Bigthink(VideoExtractor):
    name = "Bigthink"

    stream_types = [  # this is just a sample. Will make it in prepare()
        # {'id': '1080'},
        # {'id': '720'},
        # {'id': '360'},
        # {'id': '288'},
        # {'id': '190'},
        # {'id': '180'},
    ]

    @staticmethod
    def get_streams_by_id(account_number, video_id):
        """
        int, int -> list

        Get the height of the videos.

        Since brightcove is using 3 kinds of links: rtmp, http and https,
        we will be using the HTTPS one to make it secure.

        If somehow akamaihd.net is blocked by the Great Fucking Wall,
        change the "startswith https" to http.
        """
        endpoint = 'https://edge.api.brightcove.com/playback/v1/accounts/{account_number}/videos/{video_id}'.format(account_number=account_number, video_id=video_id)
        fake_header_id = fake_headers
        # is this somehow related to the time? Magic....
        fake_header_id['Accept'] = 'application/json;pk=BCpkADawqM1cc6wmJQC2tvoXZt4mrB7bFfi6zGt9QnOzprPZcGLE9OMGJwspQwKfuFYuCjAAJ53JdjI8zGFx1ll4rxhYJ255AXH1BQ10rnm34weknpfG-sippyQ'

        html = get_content(endpoint, headers=fake_header_id)
        html_json = json.loads(html)

        link_list = []

        for i in html_json['sources']:
            if 'src' in i:  # to avoid KeyError
                if i['src'].startswith('https'):
                    link_list.append((str(i['height']), i['src']))

        return link_list

    def prepare(self, **kwargs):
        html = get_content(self.url)

        self.title = match1(html, r'<meta property="og:title" content="([^"]*)"')

        account_number = match1(html, r'data-account="(\d+)"')
        video_id = match1(html, r'data-brightcove-id="(\d+)"')

        assert account_number, video_id

        link_list = self.get_streams_by_id(account_number, video_id)

        for i in link_list:
            self.stream_types.append({'id': str(i[0])})
            self.streams[i[0]] = {'url': i[1]}

    def extract(self, **kwargs):
        for i in self.streams:
            s = self.streams[i]
            _, s['container'], s['size'] = url_info(s['url'])
            s['src'] = [s['url']]

site = Bigthink()
download = site.download_by_url
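The new Bigthink extractor resolves streams through Brightcove's Playback API: it scrapes the account number and video id from data- attributes on the page, then requests the video object with the policy key passed in the Accept header. A rough standalone sketch of that request; the policy_key argument is a placeholder here (the real value is the pk=... string hard-coded above), and network access is assumed:

    import json
    import urllib.request

    def brightcove_sources(account_number, video_id, policy_key):
        # Same endpoint the extractor uses; the policy key travels in the Accept header.
        endpoint = ('https://edge.api.brightcove.com/playback/v1/accounts/'
                    '{}/videos/{}'.format(account_number, video_id))
        req = urllib.request.Request(endpoint, headers={
            'Accept': 'application/json;pk={}'.format(policy_key),
        })
        with urllib.request.urlopen(req) as resp:
            data = json.loads(resp.read().decode('utf-8'))
        # keep only HTTPS progressive sources, mirroring get_streams_by_id()
        return [(str(s.get('height')), s['src'])
                for s in data['sources']
                if 'src' in s and s['src'].startswith('https')]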
@@ -11,12 +11,14 @@ from .youku import youku_download_by_vid
 import hashlib
 import re
 
-appkey='8e9fc618fbd41e28'
+appkey = 'f3bb208b3d081dc8'
+SECRETKEY_MINILOADER = '1c15888dc316e05a15fdd0a02ed6584f'
 
 def get_srt_xml(id):
     url = 'http://comment.bilibili.com/%s.xml' % id
     return get_html(url)
 
 
 def parse_srt_p(p):
     fields = p.split(',')
     assert len(fields) == 8, fields
@@ -44,12 +46,14 @@ def parse_srt_p(p):
 
     return pool, mode, font_size, font_color
 
 
 def parse_srt_xml(xml):
     d = re.findall(r'<d p="([^"]+)">(.*)</d>', xml)
     for x, y in d:
         p = parse_srt_p(x)
     raise NotImplementedError()
 
 
 def parse_cid_playurl(xml):
     from xml.dom.minidom import parseString
     try:
@@ -59,10 +63,12 @@ def parse_cid_playurl(xml):
     except:
         return []
 
 
 def bilibili_download_by_cids(cids, title, output_dir='.', merge=True, info_only=False):
     urls = []
     for cid in cids:
-        url = 'http://interface.bilibili.com/playurl?appkey=' + appkey + '&cid=' + cid
+        sign_this = hashlib.md5(bytes('cid={cid}&from=miniplay&player=1{SECRETKEY_MINILOADER}'.format(cid = cid, SECRETKEY_MINILOADER = SECRETKEY_MINILOADER), 'utf-8')).hexdigest()
+        url = 'http://interface.bilibili.com/playurl?&cid=' + cid + '&from=miniplay&player=1' + '&sign=' + sign_this
         urls += [i
                  if not re.match(r'.*\.qqvideo\.tc\.qq\.com', i)
                  else re.sub(r'.*\.qqvideo\.tc\.qq\.com', 'http://vsrc.store.qq.com', i)
@@ -78,8 +84,10 @@ def bilibili_download_by_cids(cids, title, output_dir='.', merge=True, info_only
     if not info_only:
         download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
 
 
 def bilibili_download_by_cid(cid, title, output_dir='.', merge=True, info_only=False):
-    url = 'http://interface.bilibili.com/playurl?appkey=' + appkey + '&cid=' + cid
+    sign_this = hashlib.md5(bytes('cid={cid}&from=miniplay&player=1{SECRETKEY_MINILOADER}'.format(cid = cid, SECRETKEY_MINILOADER = SECRETKEY_MINILOADER), 'utf-8')).hexdigest()
+    url = 'http://interface.bilibili.com/playurl?&cid=' + cid + '&from=miniplay&player=1' + '&sign=' + sign_this
     urls = [i
             if not re.match(r'.*\.qqvideo\.tc\.qq\.com', i)
             else re.sub(r'.*\.qqvideo\.tc\.qq\.com', 'http://vsrc.store.qq.com', i)
@@ -87,17 +95,15 @@ def bilibili_download_by_cid(cid, title, output_dir='.', merge=True, info_only=F
 
     type_ = ''
     size = 0
-    try:
     for url in urls:
         _, type_, temp = url_info(url)
         size += temp or 0
-    except error.URLError:
-        log.wtf('[Failed] DNS not resolved. Please change your DNS server settings.')
 
     print_info(site_info, title, type_, size)
     if not info_only:
         download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
 
 
 def bilibili_live_download_by_cid(cid, title, output_dir='.', merge=True, info_only=False):
     api_url = 'http://live.bilibili.com/api/playurl?cid=' + cid
     urls = parse_cid_playurl(get_content(api_url))
@@ -109,31 +115,42 @@ def bilibili_live_download_by_cid(cid, title, output_dir='.', merge=True, info_o
     if not info_only:
         download_urls([url], title, type_, total_size=None, output_dir=output_dir, merge=merge)
 
 
 def bilibili_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
     html = get_content(url)
 
-    title = r1_of([r'<meta name="title" content="([^<>]{1,999})" />',
-                   r'<h1[^>]*>([^<>]+)</h1>'], html)
+    title = r1_of([r'<meta name="title" content="\s*([^<>]{1,999})\s*" />',
+                   r'<h1[^>]*>\s*([^<>]+)\s*</h1>'], html)
     if title:
         title = unescape_html(title)
         title = escape_file_path(title)
 
-    flashvars = r1_of([r'(cid=\d+)', r'(cid: \d+)', r'flashvars="([^"]+)"', r'"https://[a-z]+\.bilibili\.com/secure,(cid=\d+)(?:&aid=\d+)?"'], html)
+    if re.match(r'https?://bangumi\.bilibili\.com/', url):
+        # quick hack for bangumi URLs
+        episode_id = r1(r'data-current-episode-id="(\d+)"', html)
+        cont = post_content('http://bangumi.bilibili.com/web_api/get_source',
+                            post_data={'episode_id': episode_id})
+        cid = json.loads(cont)['result']['cid']
+        bilibili_download_by_cid(str(cid), title, output_dir=output_dir, merge=merge, info_only=info_only)
+
+    else:
+        flashvars = r1_of([r'(cid=\d+)', r'(cid: \d+)', r'flashvars="([^"]+)"',
+                           r'"https://[a-z]+\.bilibili\.com/secure,(cid=\d+)(?:&aid=\d+)?"'], html)
         assert flashvars
         flashvars = flashvars.replace(': ', '=')
         t, cid = flashvars.split('=', 1)
         cid = cid.split('&')[0]
         if t == 'cid':
             if re.match(r'https?://live\.bilibili\.com/', url):
-                title = r1(r'<title>([^<>]+)</title>', html)
+                title = r1(r'<title>\s*([^<>]+)\s*</title>', html)
                 bilibili_live_download_by_cid(cid, title, output_dir=output_dir, merge=merge, info_only=info_only)
 
             else:
                 # multi-P
                 cids = []
                 pages = re.findall('<option value=\'([^\']*)\'', html)
-                titles = re.findall('<option value=.*>(.+)</option>', html)
-                for page in pages:
+                titles = re.findall('<option value=.*>\s*([^<>]+)\s*</option>', html)
+                for i, page in enumerate(pages):
                     html = get_html("http://www.bilibili.com%s" % page)
                     flashvars = r1_of([r'(cid=\d+)',
                                        r'flashvars="([^"]+)"',
@@ -141,11 +158,15 @@ def bilibili_download(url, output_dir='.', merge=True, info_only=False, **kwargs
                     if flashvars:
                         t, cid = flashvars.split('=', 1)
                         cids.append(cid.split('&')[0])
+                    if url.endswith(page):
+                        cids = [cid.split('&')[0]]
+                        titles = [titles[i]]
+                        break
 
                 # no multi-P
                 if not pages:
                     cids = [cid]
-                    titles = [r1(r'<option value=.* selected>(.+)</option>', html) or title]
+                    titles = [r1(r'<option value=.* selected>\s*([^<>]+)\s*</option>', html) or title]
 
                 for i in range(len(cids)):
                     bilibili_download_by_cid(cids[i],
@@ -173,6 +194,7 @@ def bilibili_download(url, output_dir='.', merge=True, info_only=False, **kwargs
     with open(os.path.join(output_dir, title + '.cmt.xml'), 'w', encoding='utf-8') as x:
         x.write(xml)
 
 
 site_info = "bilibili.com"
 download = bilibili_download
 download_playlist = bilibili_download
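The playurl change above replaces the plain appkey query with the mini-player signature: the request is signed with md5 over 'cid=<cid>&from=miniplay&player=1' concatenated with SECRETKEY_MINILOADER. A minimal sketch of just that step, using the constant from the diff:

    import hashlib

    SECRETKEY_MINILOADER = '1c15888dc316e05a15fdd0a02ed6584f'

    def miniplay_playurl(cid):
        # sign = md5('cid=<cid>&from=miniplay&player=1' + secret key)
        sign = hashlib.md5(
            ('cid={}&from=miniplay&player=1'.format(cid) + SECRETKEY_MINILOADER)
            .encode('utf-8')).hexdigest()
        return ('http://interface.bilibili.com/playurl?'
                '&cid={}&from=miniplay&player=1&sign={}'.format(cid, sign))

    # e.g. miniplay_playurl('1234567') -> the signed playurl endpoint used above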
src/you_get/extractors/bokecc.py (new file, 95 lines)
@@ -0,0 +1,95 @@
#!/usr/bin/env python

from ..common import *
from ..extractor import VideoExtractor
import xml.etree.ElementTree as ET

class BokeCC(VideoExtractor):
    name = "BokeCC"

    stream_types = [  # we do not know for now, as we have to check the
                      # output from the API
    ]

    API_ENDPOINT = 'http://p.bokecc.com/'


    def download_by_id(self, vid='', title=None, output_dir='.', merge=True, info_only=False, **kwargs):
        """self, str->None

        Keyword arguments:
        self: self
        vid: The video ID for BokeCC cloud, something like
        FE3BB999594978049C33DC5901307461

        Calls the prepare() to download the video.

        If no title is provided, this method shall try to find a proper title
        with the information provided within the
        returned content of the API."""

        assert vid

        self.prepare(vid=vid, title=title, **kwargs)

        self.extract(**kwargs)

        self.download(output_dir=output_dir,
                      merge=merge,
                      info_only=info_only, **kwargs)

    def prepare(self, vid='', title=None, **kwargs):
        assert vid

        api_url = self.API_ENDPOINT + \
            'servlet/playinfo?vid={vid}&m=0'.format(vid=vid)  # returns XML

        html = get_content(api_url)
        self.tree = ET.ElementTree(ET.fromstring(html))

        if self.tree.find('result').text != '1':
            log.wtf('API result says failed!')
            raise

        if title is None:
            self.title = '_'.join([i.text for i in self.tree.iterfind('video/videomarks/videomark/markdesc')])
        else:
            self.title = title

        for i in self.tree.iterfind('video/quality'):
            quality = i.attrib['value']
            url = i[0].attrib['playurl']
            self.stream_types.append({'id': quality,
                                      'video_profile': i.attrib['desp']})
            self.streams[quality] = {'url': url,
                                     'video_profile': i.attrib['desp']}
            self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]


    def extract(self, **kwargs):
        for i in self.streams:
            s = self.streams[i]
            _, s['container'], s['size'] = url_info(s['url'])
            s['src'] = [s['url']]
        if 'stream_id' in kwargs and kwargs['stream_id']:
            # Extract the stream
            stream_id = kwargs['stream_id']

            if stream_id not in self.streams:
                log.e('[Error] Invalid video format.')
                log.e('Run \'-i\' command with no specific video format to view all available formats.')
                exit(2)
        else:
            # Extract stream with the best quality
            stream_id = self.streams_sorted[0]['id']
        _, s['container'], s['size'] = url_info(s['url'])
        s['src'] = [s['url']]

site = BokeCC()

# I don't know how to call the player directly so I just put it here
# just in case anyone touches it -- Beining@Aug.24.2016
#download = site.download_by_url
#download_playlist = site.download_by_url

bokecc_download_by_id = site.download_by_id
@@ -7,6 +7,7 @@ from ..common import *
 import json
 import re
 
+
 def cntv_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
     assert id
     info = json.loads(get_html('http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + id))
@@ -31,7 +32,11 @@ def cntv_download_by_id(id, title = None, output_dir = '.', merge = True, info_o
 def cntv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
     if re.match(r'http://tv\.cntv\.cn/video/(\w+)/(\w+)', url):
         id = match1(url, r'http://tv\.cntv\.cn/video/\w+/(\w+)')
-    elif re.match(r'http://\w+\.cntv\.cn/(\w+/\w+/(classpage/video/)?)?\d+/\d+\.shtml', url) or re.match(r'http://\w+.cntv.cn/(\w+/)*VIDE\d+.shtml', url):
+    elif re.match(r'http://\w+\.cntv\.cn/(\w+/\w+/(classpage/video/)?)?\d+/\d+\.shtml', url) or \
+         re.match(r'http://\w+.cntv.cn/(\w+/)*VIDE\d+.shtml', url) or \
+         re.match(r'http://(\w+).cntv.cn/(\w+)/classpage/video/(\d+)/(\d+).shtml', url) or \
+         re.match(r'http://\w+.cctv.com/\d+/\d+/\d+/\w+.shtml', url) or \
+         re.match(r'http://\w+.cntv.cn/\d+/\d+/\d+/\w+.shtml', url):
         id = r1(r'videoCenterId","(\w+)"', get_html(url))
     elif re.match(r'http://xiyou.cntv.cn/v-[\w-]+\.html', url):
         id = r1(r'http://xiyou.cntv.cn/v-([\w-]+)\.html', url)
@@ -4,6 +4,11 @@ __all__ = ['dailymotion_download']
 
 from ..common import *
 
+def extract_m3u(url):
+    content = get_content(url)
+    m3u_url = re.findall(r'http://.*', content)[0]
+    return match1(m3u_url, r'([^#]+)')
+
 def dailymotion_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
     """Downloads Dailymotion videos by URL.
     """
@@ -13,7 +18,7 @@ def dailymotion_download(url, output_dir = '.', merge = True, info_only = False,
     title = match1(html, r'"video_title"\s*:\s*"([^"]+)"') or \
             match1(html, r'"title"\s*:\s*"([^"]+)"')
 
-    for quality in ['720','480','380','240','auto']:
+    for quality in ['1080','720','480','380','240','auto']:
         try:
             real_url = info[quality][0]["url"]
             if real_url:
@@ -21,11 +26,12 @@ def dailymotion_download(url, output_dir = '.', merge = True, info_only = False,
         except KeyError:
             pass
 
-    type, ext, size = url_info(real_url)
+    m3u_url = extract_m3u(real_url)
+    mime, ext, size = 'video/mp4', 'mp4', 0
 
-    print_info(site_info, title, type, size)
+    print_info(site_info, title, mime, size)
     if not info_only:
-        download_urls([real_url], title, ext, size, output_dir, merge = merge)
+        download_url_ffmpeg(m3u_url, title, ext, output_dir=output_dir, merge=merge)
 
 site_info = "Dailymotion.com"
 download = dailymotion_download
src/you_get/extractors/dilidili.py (6 changed lines, Executable file → Normal file)
@@ -35,16 +35,16 @@ def dilidili_parser_data_to_stream_types(typ ,vid ,hd2 ,sign, tmsign, ulk):
 
 #----------------------------------------------------------------------
 def dilidili_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
-    if re.match(r'http://www.dilidili.com/watch/\w+', url):
+    if re.match(r'http://www.dilidili.com/watch\S+', url):
         html = get_content(url)
         title = match1(html, r'<title>(.+)丨(.+)</title>')  #title
 
         # player loaded via internal iframe
-        frame_url = re.search(r'<iframe (.+)src="(.+)\" f(.+)</iframe>', html).group(2)
+        frame_url = re.search(r'<iframe src=\"(.+?)\"', html).group(1)
         #print(frame_url)
 
         #https://player.005.tv:60000/?vid=a8760f03fd:a04808d307&v=yun&sign=a68f8110cacd892bc5b094c8e5348432
-        html = get_content(frame_url, headers=headers)
+        html = get_content(frame_url, headers=headers, decoded=False).decode('utf-8')
 
         match = re.search(r'(.+?)var video =(.+?);', html)
         vid = match1(html, r'var vid="(.+)"')
@@ -7,7 +7,18 @@ from ..common import *
 
 def douban_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
     html = get_html(url)
-    if 'subject' in url:
+
+    if re.match(r'https?://movie', url):
+        title = match1(html, 'name="description" content="([^"]+)')
+        tid = match1(url, 'trailer/(\d+)')
+        real_url = 'https://movie.douban.com/trailer/video_url?tid=%s' % tid
+        type, ext, size = url_info(real_url)
+
+        print_info(site_info, title, type, size)
+        if not info_only:
+            download_urls([real_url], title, ext, size, output_dir, merge = merge)
+
+    elif 'subject' in url:
         titles = re.findall(r'data-title="([^"]*)">', html)
         song_id = re.findall(r'<li class="song-item" id="([^"]*)"', html)
         song_ssid = re.findall(r'data-ssid="([^"]*)"', html)
@@ -6,27 +6,50 @@ from ..common import *
 import json
 import hashlib
 import time
+import uuid
+import urllib.parse, urllib.request
 
 def douyutv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
+    html = get_content(url)
+    room_id_patt = r'"room_id"\s*:\s*(\d+),'
+    room_id = match1(html, room_id_patt)
+    if room_id == "0":
         room_id = url[url.rfind('/')+1:]
-    #Thanks to @yan12125 for providing decoding method!!
-    suffix = 'room/%s?aid=android&client_sys=android&time=%d' % (room_id, int(time.time()))
-    sign = hashlib.md5((suffix + '1231').encode('ascii')).hexdigest()
-    json_request_url = "http://www.douyu.com/api/v1/%s&auth=%s" % (suffix, sign)
-    content = get_html(json_request_url)
+
+    json_request_url = "http://m.douyu.com/html5/live?roomId=%s" % room_id
+    content = get_content(json_request_url)
+
     data = json.loads(content)['data']
     server_status = data.get('error',0)
     if server_status is not 0:
         raise ValueError("Server returned error:%s" % server_status)
 
     title = data.get('room_name')
     show_status = data.get('show_status')
     if show_status is not "1":
         raise ValueError("The live stream is not online! (Errno:%s)" % server_status)
 
+    tt = int(time.time() / 60)
+    did = uuid.uuid4().hex.upper()
+    sign_content = '{room_id}{did}A12Svb&%1UUmf@hC{tt}'.format(room_id = room_id, did = did, tt = tt)
+    sign = hashlib.md5(sign_content.encode('utf-8')).hexdigest()
+
+    json_request_url = "http://www.douyu.com/lapi/live/getPlay/%s" % room_id
+    payload = {'cdn': 'ws', 'rate': '0', 'tt': tt, 'did': did, 'sign': sign}
+    postdata = urllib.parse.urlencode(payload)
+    req = urllib.request.Request(json_request_url, postdata.encode('utf-8'))
+    with urllib.request.urlopen(req) as response:
+        content = response.read()
+
+    data = json.loads(content.decode('utf-8'))['data']
+    server_status = data.get('error',0)
+    if server_status is not 0:
+        raise ValueError("Server returned error:%s" % server_status)
+
     real_url = data.get('rtmp_url')+'/'+data.get('rtmp_live')
 
     print_info(site_info, title, 'flv', float('inf'))
     if not info_only:
-        download_urls([real_url], title, 'flv', None, output_dir, merge = merge)
+        download_url_ffmpeg(real_url, title, 'flv', None, output_dir = output_dir, merge = merge)
 
 site_info = "douyu.com"
 download = douyutv_download
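The rewritten douyu extractor now resolves the room id from the page (falling back to the URL), then POSTs to the getPlay API with a signature built from the room id, a random device id and a minute-granularity timestamp. A condensed sketch of just the signature and payload, with the salt string copied from the new code:

    import hashlib
    import time
    import uuid
    import urllib.parse

    def douyu_getplay_payload(room_id):
        tt = int(time.time() / 60)                 # minute-level timestamp
        did = uuid.uuid4().hex.upper()             # random device id
        sign_content = '{room_id}{did}A12Svb&%1UUmf@hC{tt}'.format(
            room_id=room_id, did=did, tt=tt)
        sign = hashlib.md5(sign_content.encode('utf-8')).hexdigest()
        payload = {'cdn': 'ws', 'rate': '0', 'tt': tt, 'did': did, 'sign': sign}
        return urllib.parse.urlencode(payload).encode('utf-8')

    # POST the result to http://www.douyu.com/lapi/live/getPlay/<room_id>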
@@ -8,6 +8,7 @@ from .netease import netease_download
 from .qq import qq_download_by_vid
 from .sina import sina_download_by_vid
 from .tudou import tudou_download_by_id
+from .vimeo import vimeo_download_by_id
 from .yinyuetai import yinyuetai_download_by_id
 from .youku import youku_download_by_vid
 
@@ -24,7 +25,7 @@ youku_embed_patterns = [ 'youku\.com/v_show/id_([a-zA-Z0-9=]+)',
 """
 http://www.tudou.com/programs/view/html5embed.action?type=0&code=3LS_URGvl54&lcode=&resourceId=0_06_05_99
 """
-tudou_embed_patterns = [ 'tudou\.com[a-zA-Z0-9\/\?=\&\.\;]+code=([a-zA-Z0-9_]+)\&',
+tudou_embed_patterns = [ 'tudou\.com[a-zA-Z0-9\/\?=\&\.\;]+code=([a-zA-Z0-9_-]+)\&',
                          'www\.tudou\.com/v/([a-zA-Z0-9_-]+)/[^"]*v\.swf'
                        ]
 
@@ -39,6 +40,9 @@ iqiyi_embed_patterns = [ 'player\.video\.qiyi\.com/([^/]+)/[^/]+/[^/]+/[^/]+\.sw
 
 netease_embed_patterns = [ '(http://\w+\.163\.com/movie/[^\'"]+)' ]
 
+vimeo_embed_patters = [ 'player\.vimeo\.com/video/(\d+)' ]
+
+
 def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwargs):
     content = get_content(url, headers=fake_headers)
     found = False
@@ -69,6 +73,11 @@ def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwa
         found = True
         netease_download(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
 
+    urls = matchall(content, vimeo_embed_patters)
+    for url in urls:
+        found = True
+        vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
+
     if not found:
         raise NotImplementedError(url)
@@ -5,24 +5,26 @@ __all__ = ['facebook_download']
 from ..common import *
 import json
 
 
 def facebook_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
     html = get_html(url)
 
-    title = r1(r'<title id="pageTitle">(.+) \| Facebook</title>', html)
-    s2 = parse.unquote(unicodize(r1(r'\["params","([^"]*)"\]', html)))
-    data = json.loads(s2)
-    video_data = data["video_data"]["progressive"]
-    for fmt in ["hd_src", "sd_src"]:
-        src = video_data[0][fmt]
-        if src:
-            break
+    title = r1(r'<title id="pageTitle">(.+)</title>', html)
+    sd_urls = list(set([
+        unicodize(str.replace(i, '\\/', '/'))
+        for i in re.findall(r'"sd_src_no_ratelimit":"([^"]*)"', html)
+    ]))
+    hd_urls = list(set([
+        unicodize(str.replace(i, '\\/', '/'))
+        for i in re.findall(r'"hd_src_no_ratelimit":"([^"]*)"', html)
+    ]))
+    urls = hd_urls if hd_urls else sd_urls
 
-    type, ext, size = url_info(src, True)
+    type, ext, size = url_info(urls[0], True)
+    size = urls_size(urls)
 
     print_info(site_info, title, type, size)
     if not info_only:
-        download_urls([src], title, ext, size, output_dir, merge=merge)
+        download_urls(urls, title, ext, size, output_dir, merge=False)
 
 site_info = "Facebook.com"
 download = facebook_download
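The Facebook change drops the embedded-JSON parsing and instead pulls the sd/hd "no ratelimit" sources out of the page with regexes, de-duplicates them, and prefers HD when present. A minimal sketch of that selection logic; html is assumed to be the already-fetched page source, and unicodize() from common is approximated by a plain string replace here:

    import re

    def facebook_video_urls(html):
        def grab(key):
            # '\\/' sequences in the page are unescaped back to '/'
            return list(set(u.replace('\\/', '/')
                            for u in re.findall(r'"%s":"([^"]*)"' % key, html)))
        sd_urls = grab('sd_src_no_ratelimit')
        hd_urls = grab('hd_src_no_ratelimit')
        return hd_urls if hd_urls else sd_urls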
src/you_get/extractors/funshion.py (15 changed lines, Executable file → Normal file)
@@ -10,9 +10,9 @@ import json
 def funshion_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
     """"""
     if re.match(r'http://www.fun.tv/vplay/v-(\w+)', url):  #single video
-        funshion_download_by_url(url, output_dir = '.', merge = False, info_only = False)
-    elif re.match(r'http://www.fun.tv/vplay/g-(\w+)', url):  #whole drama
-        funshion_download_by_drama_url(url, output_dir = '.', merge = False, info_only = False)
+        funshion_download_by_url(url, output_dir=output_dir, merge=merge, info_only=info_only)
+    elif re.match(r'http://www.fun.tv/vplay/.*g-(\w+)', url):  #whole drama
+        funshion_download_by_drama_url(url, output_dir=output_dir, merge=merge, info_only=info_only)
     else:
         return
 
@@ -25,7 +25,7 @@ def funshion_download_by_url(url, output_dir = '.', merge = False, info_only = F
     if re.match(r'http://www.fun.tv/vplay/v-(\w+)', url):
         match = re.search(r'http://www.fun.tv/vplay/v-(\d+)(.?)', url)
         vid = match.group(1)
-        funshion_download_by_vid(vid, output_dir = '.', merge = False, info_only = False)
+        funshion_download_by_vid(vid, output_dir=output_dir, merge=merge, info_only=info_only)
 
 #----------------------------------------------------------------------
 def funshion_download_by_vid(vid, output_dir = '.', merge = False, info_only = False):
@@ -63,14 +63,11 @@ def funshion_download_by_drama_url(url, output_dir = '.', merge = False, info_on
     """str->None
     url = 'http://www.fun.tv/vplay/g-95785/'
     """
-    if re.match(r'http://www.fun.tv/vplay/g-(\w+)', url):
-        match = re.search(r'http://www.fun.tv/vplay/g-(\d+)(.?)', url)
-        id = match.group(1)
+    id = r1(r'http://www.fun.tv/vplay/.*g-(\d+)', url)
 
     video_list = funshion_drama_id_to_vid(id)
 
     for video in video_list:
-        funshion_download_by_id((video[0], id), output_dir = '.', merge = False, info_only = False)
+        funshion_download_by_id((video[0], id), output_dir=output_dir, merge=merge, info_only=info_only)
     # id is for drama, vid not the same as the ones used in single video
 
 #----------------------------------------------------------------------
src/you_get/extractors/huomaotv.py (new file, 36 lines)
@@ -0,0 +1,36 @@
#!/usr/bin/env python

__all__ = ['huomaotv_download']

from ..common import *


def get_mobile_room_url(room_id):
    return 'http://www.huomao.com/mobile/mob_live/%s' % room_id


def get_m3u8_url(stream_id):
    return 'http://live-ws.huomaotv.cn/live/%s/playlist.m3u8' % stream_id


def huomaotv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
    room_id_pattern = r'huomao.com/(\d+)'
    room_id = match1(url, room_id_pattern)
    html = get_content(get_mobile_room_url(room_id))

    stream_id_pattern = r'id="html_stream" value="(\w+)"'
    stream_id = match1(html, stream_id_pattern)

    m3u8_url = get_m3u8_url(stream_id)

    title = match1(html, r'<title>([^<]{1,9999})</title>')

    print_info(site_info, title, 'm3u8', float('inf'))

    if not info_only:
        download_url_ffmpeg(m3u8_url, title, 'm3u8', None, output_dir=output_dir, merge=merge)


site_info = 'huomao.com'
download = huomaotv_download
download_playlist = playlist_not_supported('huomao')
src/you_get/extractors/icourses.py (new file, 148 lines)
@@ -0,0 +1,148 @@
#!/usr/bin/env python
from ..common import *
from urllib import parse
import random
from time import sleep
import xml.etree.ElementTree as ET
import datetime
import hashlib
import base64
import logging
from urllib import error
import re

__all__ = ['icourses_download']


def icourses_download(url, merge=False, output_dir='.', **kwargs):
    icourses_parser = ICousesExactor(url=url)
    real_url = icourses_parser.icourses_cn_url_parser(**kwargs)
    title = icourses_parser.title
    if real_url is not None:
        for tries in range(0, 5):
            try:
                _, type_, size = url_info(real_url, faker=True)
                break
            except error.HTTPError:
                logging.warning('Failed to fetch the video file! Retrying...')
                sleep(random.Random().randint(0, 5))  # Prevent from blockage
                real_url = icourses_parser.icourses_cn_url_parser()
                title = icourses_parser.title
        print_info(site_info, title, type_, size)
        if not kwargs['info_only']:
            download_urls_chunked([real_url], title, 'flv',
                                  total_size=size, output_dir=output_dir, refer=url, merge=merge, faker=True, ignore_range=True, chunk_size=15000000, dyn_callback=icourses_parser.icourses_cn_url_parser)


# Why not using VideoExtractor: This site needs a special download method
class ICousesExactor(object):

    def __init__(self, url):
        self.url = url
        self.title = ''
        return

    def icourses_playlist_download(self, **kwargs):
        html = get_content(self.url)
        page_type_patt = r'showSectionNode\(this,(\d+),(\d+)\)'
        video_js_number = r'changeforvideo\((.*?)\)'
        fs_flag = r'<input type="hidden" value=(\w+) id="firstShowFlag">'
        page_navi_vars = re.search(pattern=page_type_patt, string=html)
        dummy_page = 'http://www.icourses.cn/jpk/viewCharacterDetail.action?sectionId={}&courseId={}'.format(
            page_navi_vars.group(2), page_navi_vars.group(1))
        html = get_content(dummy_page)
        fs_status = match1(html, fs_flag)
        video_list = re.findall(pattern=video_js_number, string=html)
        for video in video_list:
            video_args = video.replace('\'', '').split(',')
            video_url = 'http://www.icourses.cn/jpk/changeforVideo.action?resId={}&courseId={}&firstShowFlag={}'.format(
                video_args[0], video_args[1], fs_status or '1')
            sleep(random.Random().randint(0, 5))  # Prevent from blockage
            icourses_download(video_url, **kwargs)

    def icourses_cn_url_parser(self, received=0, **kwargs):
        PLAYER_BASE_VER = '150606-1'
        ENCRYPT_MOD_VER = '151020'
        ENCRYPT_SALT = '3DAPmXsZ4o'  # It took really long time to find this...
        html = get_content(self.url)
        if re.search(pattern=r'showSectionNode\(.*\)', string=html):
            logging.warning('Switching to playlist mode!')
            return self.icourses_playlist_download(**kwargs)
        flashvars_patt = r'var\ flashvars\=((.|\n)*)};'
        server_time_patt = r'MPlayer.swf\?v\=(\d+)'
        uuid_patt = r'uuid:(\d+)'
        other_args_patt = r'other:"(.*)"'
        res_url_patt = r'IService:\'([^\']+)'
        title_a_patt = r'<div class="con"> <a.*?>(.*?)</a>'
        title_b_patt = r'<div class="con"> <a.*?/a>((.|\n)*?)</div>'
        title_a = match1(html, title_a_patt).strip()
        title_b = match1(html, title_b_patt).strip()
        title = title_a + title_b  # WIP, FIXME
        title = re.sub('( +|\n|\t|\r|\ \;)', '',
                       unescape_html(title).replace(' ', ''))
        server_time = match1(html, server_time_patt)
        flashvars = match1(html, flashvars_patt)
        uuid = match1(flashvars, uuid_patt)
        other_args = match1(flashvars, other_args_patt)
        res_url = match1(flashvars, res_url_patt)
        url_parts = {'v': server_time, 'other': other_args,
                     'uuid': uuid, 'IService': res_url}
        req_url = '%s?%s' % (res_url, parse.urlencode(url_parts))
        logging.debug('Requesting video resource location...')
        xml_resp = get_html(req_url)
        xml_obj = ET.fromstring(xml_resp)
        logging.debug('The result was {}'.format(xml_obj.get('status')))
        if xml_obj.get('status') != 'success':
            raise ValueError('Server returned error!')
        if received:
            play_type = 'seek'
        else:
            play_type = 'play'
            received -= 1
        common_args = {'lv': PLAYER_BASE_VER, 'ls': play_type,
                       'lt': datetime.datetime.now().strftime('%m-%d/%H:%M:%S'),
                       'start': received + 1}
        media_host = xml_obj.find(".//*[@name='host']").text
        media_url = media_host + xml_obj.find(".//*[@name='url']").text
        # This is what they called `SSLModule`... But obviously, just a kind of
        # encryption, takes absolutely no effect in protecting data integrity
        if xml_obj.find(".//*[@name='ssl']").text != 'true':
            logging.debug('The encryption mode is disabled')
            # when the so-called `SSLMode` is not activated, the parameters, `h`
            # and `p` can be found in response
            arg_h = xml_obj.find(".//*[@name='h']").text
            assert arg_h
            arg_r = xml_obj.find(".//*[@name='p']").text or ENCRYPT_MOD_VER
            url_args = common_args.copy()
            url_args.update({'h': arg_h, 'r': arg_r})
            final_url = '{}?{}'.format(
                media_url, parse.urlencode(url_args))
            self.title = title
            return final_url
        # when the `SSLMode` is activated, we need to receive the timestamp and the
        # time offset (?) value from the server
        logging.debug('The encryption mode is in effect')
        ssl_callback = get_html(
            '{}/ssl/ssl.shtml'.format(media_host)).split(',')
        ssl_timestamp = int(datetime.datetime.strptime(
            ssl_callback[1], "%b %d %H:%M:%S %Y").timestamp() + int(ssl_callback[0]))
        sign_this = ENCRYPT_SALT + \
            parse.urlparse(media_url).path + str(ssl_timestamp)
        arg_h = base64.b64encode(hashlib.md5(
            bytes(sign_this, 'utf-8')).digest())
        # Post-processing, may subject to change, so leaving this alone...
        arg_h = arg_h.decode('utf-8').strip('=').replace('+',
                                                         '-').replace('/', '_')
        arg_r = ssl_timestamp
        url_args = common_args.copy()
        url_args.update({'h': arg_h, 'r': arg_r, 'p': ENCRYPT_MOD_VER})
        final_url = '{}?{}'.format(
            media_url, parse.urlencode(url_args))
        logging.debug('Crafted URL: {}'.format(final_url))
        self.title = title
        return final_url


site_info = 'icourses.cn'
download = icourses_download
# download_playlist = icourses_playlist_download
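The most involved part of the new icourses extractor is its "SSLModule" branch: the media host publishes a timestamp via /ssl/ssl.shtml, and the client answers with an h parameter derived from md5 over the salt, the media URL path and that timestamp, base64-encoded and made URL-safe. A compact sketch of that token computation, with the salt taken from the file; everything else (the timestamp and media URL) is assumed to come from the server exchange shown above:

    import base64
    import hashlib
    from urllib import parse

    ENCRYPT_SALT = '3DAPmXsZ4o'

    def icourses_ssl_token(media_url, ssl_timestamp):
        # md5 over salt + URL path + timestamp, base64'd and made URL-friendly
        sign_this = ENCRYPT_SALT + parse.urlparse(media_url).path + str(ssl_timestamp)
        arg_h = base64.b64encode(hashlib.md5(sign_this.encode('utf-8')).digest())
        return arg_h.decode('utf-8').strip('=').replace('+', '-').replace('/', '_')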
@@ -6,14 +6,14 @@ from ..common import *
 
 def ifeng_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
     assert r1(r'([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})', id), id
-    url = 'http://v.ifeng.com/video_info_new/%s/%s/%s.xml' % (id[-2], id[-2:], id)
+    url = 'http://vxml.ifengimg.com/video_info_new/%s/%s/%s.xml' % (id[-2], id[-2:], id)
     xml = get_html(url, 'utf-8')
     title = r1(r'Name="([^"]+)"', xml)
     title = unescape_html(title)
     url = r1(r'VideoPlayUrl="([^"]+)"', xml)
     from random import randint
     r = randint(10, 19)
-    url = url.replace('http://video.ifeng.com/', 'http://video%s.ifeng.com/' % r)
+    url = url.replace('http://wideo.ifeng.com/', 'http://ips.ifeng.com/wideo.ifeng.com/')
     type, ext, size = url_info(url)
 
     print_info(site_info, title, ext, size)
@@ -1,13 +1,18 @@
 #!/usr/bin/env python
 
 from ..common import *
+from ..common import print_more_compatible as print
 from ..extractor import VideoExtractor
+from ..util import log
+from .. import json_output
+
 from uuid import uuid4
 from random import random,randint
 import json
 from math import floor
 from zlib import decompress
 import hashlib
+import time
 
 '''
 Changelog:
@@ -43,6 +48,7 @@ bid meaning for quality
 10  4k
 96  topspeed
 
+'''
 '''
 def mix(tvid):
     salt = '4a1caba4b4465345366f28da7c117d20'
@@ -75,42 +81,37 @@ def getDispathKey(rid):
     time=json.loads(get_content("http://data.video.qiyi.com/t?tn="+str(random())))["t"]
     t=str(int(floor(int(time)/(10*60.0))))
     return hashlib.new("md5",bytes(t+tp+rid,"utf-8")).hexdigest()
+'''
+
+def getVMS(tvid, vid):
+    t = int(time.time() * 1000)
+    src = '76f90cbd92f94a2e925d83e8ccd22cb7'
+    key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
+    sc = hashlib.new('md5', bytes(str(t) + key + vid, 'utf-8')).hexdigest()
+    vmsreq = url = 'http://cache.m.iqiyi.com/tmts/{0}/{1}/?t={2}&sc={3}&src={4}'.format(tvid,vid,t,sc,src)
+    return json.loads(get_content(vmsreq))
+
 class Iqiyi(VideoExtractor):
     name = "爱奇艺 (Iqiyi)"
 
     stream_types = [
-        {'id': '4k', 'container': 'f4v', 'video_profile': '4K'},
-        {'id': 'fullhd', 'container': 'f4v', 'video_profile': '全高清'},
-        {'id': 'suprt-high', 'container': 'f4v', 'video_profile': '超高清'},
-        {'id': 'super', 'container': 'f4v', 'video_profile': '超清'},
-        {'id': 'high', 'container': 'f4v', 'video_profile': '高清'},
-        {'id': 'standard', 'container': 'f4v', 'video_profile': '标清'},
-        {'id': 'topspeed', 'container': 'f4v', 'video_profile': '最差'},
+        {'id': '4k', 'container': 'm3u8', 'video_profile': '4k'},
+        {'id': 'BD', 'container': 'm3u8', 'video_profile': '1080p'},
+        {'id': 'TD', 'container': 'm3u8', 'video_profile': '720p'},
+        {'id': 'HD', 'container': 'm3u8', 'video_profile': '540p'},
+        {'id': 'SD', 'container': 'm3u8', 'video_profile': '360p'},
+        {'id': 'LD', 'container': 'm3u8', 'video_profile': '210p'},
     ]
+    '''
+    supported_stream_types = [ 'high', 'standard']
 
     stream_to_bid = { '4k': 10, 'fullhd' : 5, 'suprt-high' : 4, 'super' : 3, 'high' : 2, 'standard' :1, 'topspeed' :96}
+    '''
+    ids = ['4k','BD', 'TD', 'HD', 'SD', 'LD']
+    vd_2_id = {10: '4k', 19: '4k', 5:'BD', 18: 'BD', 21: 'HD', 2: 'HD', 4: 'TD', 17: 'TD', 96: 'LD', 1: 'SD'}
+    id_2_profile = {'4k':'4k', 'BD': '1080p','TD': '720p', 'HD': '540p', 'SD': '360p', 'LD': '210p'}
 
-    stream_urls = { '4k': [] , 'fullhd' : [], 'suprt-high' : [], 'super' : [], 'high' : [], 'standard' :[], 'topspeed' :[]}
-
-    baseurl = ''
-
-    gen_uid = ''
-
-    def getVMS(self):
-        #tm ->the flash run time for md5 usage
-        #um -> vip 1 normal 0
-        #authkey -> for password protected video ,replace '' with your password
-        #puid user.passportid may empty?
-        #TODO: support password protected video
-        tvid, vid = self.vid
-        tm, sc, src = mix(tvid)
-        uid = self.gen_uid
-        vmsreq='http://cache.video.qiyi.com/vms?key=fvip&src=1702633101b340d8917a69cf8a4b8c7' +\
-               "&tvId="+tvid+"&vid="+vid+"&vinfo=1&tm="+tm+\
-               "&enc="+sc+\
-               "&qyid="+uid+"&tn="+str(random()) +"&um=1" +\
-               "&authkey="+hashlib.new('md5',bytes(hashlib.new('md5', b'').hexdigest()+str(tm)+tvid,'utf-8')).hexdigest()
-        return json.loads(get_content(vmsreq))
 
     def download_playlist_by_url(self, url, **kwargs):
         self.url = url
@@ -133,14 +134,88 @@ class Iqiyi(VideoExtractor):
             r1(r'vid=([^&]+)', self.url) or \
             r1(r'data-player-videoid="([^"]+)"', html)
         self.vid = (tvid, videoid)
+        self.title = match1(html, '<title>([^<]+)').split('-')[0]
+        tvid, videoid = self.vid
+        info = getVMS(tvid, videoid)
+        assert info['code'] == 'A00000', 'can\'t play this video'
 
-        self.gen_uid = uuid4().hex
-        try:
-            info = self.getVMS()
-        except:
-            self.download_playlist_by_url(self.url, **kwargs)
-            exit(0)
+        for stream in info['data']['vidl']:
+            try:
+                stream_id = self.vd_2_id[stream['vd']]
+                if stream_id in self.stream_types:
+                    continue
+                stream_profile = self.id_2_profile[stream_id]
+                self.streams[stream_id] = {'video_profile': stream_profile, 'container': 'm3u8', 'src': [stream['m3u']], 'size' : 0}
+            except:
+                log.i("vd: {} is not handled".format(stream['vd']))
+                log.i("info is {}".format(stream))
+
+
+    def download(self, **kwargs):
+        """Override the original one
+        Ugly ugly dirty hack"""
+        if 'json_output' in kwargs and kwargs['json_output']:
+            json_output.output(self)
+        elif 'info_only' in kwargs and kwargs['info_only']:
+            if 'stream_id' in kwargs and kwargs['stream_id']:
+                # Display the stream
+                stream_id = kwargs['stream_id']
+                if 'index' not in kwargs:
+                    self.p(stream_id)
+                else:
+                    self.p_i(stream_id)
+            else:
+                # Display all available streams
+                if 'index' not in kwargs:
+                    self.p([])
+                else:
+                    stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
+                    self.p_i(stream_id)
+
+        else:
+            if 'stream_id' in kwargs and kwargs['stream_id']:
+                # Download the stream
+                stream_id = kwargs['stream_id']
+            else:
+                # Download stream with the best quality
+                stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
+
+            if 'index' not in kwargs:
+                self.p(stream_id)
+            else:
+                self.p_i(stream_id)
+
+            if stream_id in self.streams:
+                urls = self.streams[stream_id]['src']
+                ext = self.streams[stream_id]['container']
+                total_size = self.streams[stream_id]['size']
+            else:
+                urls = self.dash_streams[stream_id]['src']
+                ext = self.dash_streams[stream_id]['container']
+                total_size = self.dash_streams[stream_id]['size']
+
+            if not urls:
+                log.wtf('[Failed] Cannot extract video source.')
+            # For legacy main()
+
+            # Here's the change!!
+            download_url_ffmpeg(urls[0], self.title, 'mp4',
+                                output_dir=kwargs['output_dir'],
+                                merge=kwargs['merge'],)
+
+            if not kwargs['caption']:
+                print('Skipping captions.')
+                return
+            for lang in self.caption_tracks:
+                filename = '%s.%s.srt' % (get_filename(self.title), lang)
+                print('Saving %s ... ' % filename, end="", flush=True)
+                srt = self.caption_tracks[lang]
+                with open(os.path.join(kwargs['output_dir'], filename),
+                          'w', encoding='utf-8') as x:
+                    x.write(srt)
+                print('Done.')
+
+    '''
     if info["code"] != "A000000":
         log.e("[error] outdated iQIYI key")
         log.wtf("is your you-get up-to-date?")
@@ -208,6 +283,7 @@ class Iqiyi(VideoExtractor):
     #because the url is generated before start downloading
     #and the key may be expired after 10 minutes
     self.streams[stream_id]['src'] = urls
+    '''
 
 site = Iqiyi()
 download = site.download_by_url
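The module-level getVMS() added above replaces the old per-instance VMS request: it calls the mobile tmts endpoint and authenticates with sc = md5(timestamp + key + vid). A short sketch of how that request URL is put together, reusing the src/key constants exactly as they appear in the diff:

    import hashlib
    import time

    def iqiyi_tmts_url(tvid, vid):
        t = int(time.time() * 1000)
        src = '76f90cbd92f94a2e925d83e8ccd22cb7'
        key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
        sc = hashlib.md5((str(t) + key + vid).encode('utf-8')).hexdigest()
        return ('http://cache.m.iqiyi.com/tmts/{0}/{1}/?t={2}&sc={3}&src={4}'
                .format(tvid, vid, t, sc, src))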
src/you_get/extractors/khan.py (0 changed lines, Executable file → Normal file)

@@ -27,6 +27,11 @@ def ku6_download_by_id(id, title = None, output_dir = '.', merge = True, info_on
     download_urls(urls, title, ext, size, output_dir, merge = merge)
 
 def ku6_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
+    id = None
+
+    if match1(url, r'http://baidu.ku6.com/watch/(.*)\.html') is not None:
+        id = baidu_ku6(url)
+    else:
         patterns = [r'http://v.ku6.com/special/show_\d+/(.*)\.\.\.html',
                     r'http://v.ku6.com/show/(.*)\.\.\.html',
                     r'http://my.ku6.com/watch\?.*v=(.*)\.\..*']
@@ -34,6 +39,18 @@ def ku6_download(url, output_dir = '.', merge = True, info_only = False, **kwarg
 
     ku6_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
 
+def baidu_ku6(url):
+    id = None
+
+    h1 = get_html(url)
+    isrc = match1(h1, r'<iframe id="innerFrame" src="([^"]*)"')
+
+    if isrc is not None:
+        h2 = get_html(isrc)
+        id = match1(h2, r'http://v.ku6.com/show/(.*)\.\.\.html')
+
+    return id
+
 site_info = "Ku6.com"
 download = ku6_download
 download_playlist = playlist_not_supported('ku6')
src/you_get/extractors/mgtv.py (new file, 112 lines)
@@ -0,0 +1,112 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from ..common import *
from ..extractor import VideoExtractor

from json import loads
from urllib.parse import urlsplit
from os.path import dirname
import re

class MGTV(VideoExtractor):
    name = "芒果 (MGTV)"

    # Last updated: 2015-11-24
    stream_types = [
        {'id': 'hd', 'container': 'flv', 'video_profile': '超清'},
        {'id': 'sd', 'container': 'flv', 'video_profile': '高清'},
        {'id': 'ld', 'container': 'flv', 'video_profile': '标清'},
    ]

    id_dic = {i['video_profile']:(i['id']) for i in stream_types}

    api_endpoint = 'http://v.api.mgtv.com/player/video?video_id={video_id}'

    @staticmethod
    def get_vid_from_url(url):
        """Extracts video ID from URL.
        """
        return match1(url, 'http://www.mgtv.com/v/\d/\d+/\w+/(\d+).html')

    #----------------------------------------------------------------------
    @staticmethod
    def get_mgtv_real_url(url):
        """str->list of str
        Give you the real URLs."""
        content = loads(get_content(url))
        m3u_url = content['info']
        split = urlsplit(m3u_url)

        base_url = "{scheme}://{netloc}{path}/".format(scheme = split[0],
                                                       netloc = split[1],
                                                       path = dirname(split[2]))

        content = get_content(content['info'])  #get the REAL M3U url, maybe to be changed later?
        segment_list = []
        for i in content.split():
            if not i.startswith('#'):  #not the best way, better we use the m3u8 package
                segment_list.append(base_url + i)
        return segment_list

    def download_playlist_by_url(self, url, **kwargs):
        pass

    def prepare(self, **kwargs):
        if self.url:
            self.vid = self.get_vid_from_url(self.url)
        content = get_content(self.api_endpoint.format(video_id = self.vid))
        content = loads(content)
        self.title = content['data']['info']['title']

        #stream_avalable = [i['name'] for i in content['data']['stream']]
        stream_available = {}
        for i in content['data']['stream']:
            stream_available[i['name']] = i['url']

        for s in self.stream_types:
            if s['video_profile'] in stream_available.keys():
                quality_id = self.id_dic[s['video_profile']]
                url = stream_available[s['video_profile']]
                url = re.sub( r'(\&arange\=\d+)', '', url)  #Un-Hum
                segment_list_this = self.get_mgtv_real_url(url)

                container_this_stream = ''
                size_this_stream = 0
                stream_fileid_list = []
                for i in segment_list_this:
                    _, container_this_stream, size_this_seg = url_info(i)
                    size_this_stream += size_this_seg
                    stream_fileid_list.append(os.path.basename(i).split('.')[0])

                #make pieces
                pieces = []
                for i in zip(stream_fileid_list, segment_list_this):
                    pieces.append({'fileid': i[0], 'segs': i[1],})

                self.streams[quality_id] = {
                    'container': 'flv',
                    'video_profile': s['video_profile'],
                    'size': size_this_stream,
                    'pieces': pieces
                }

                if not kwargs['info_only']:
                    self.streams[quality_id]['src'] = segment_list_this

    def extract(self, **kwargs):
        if 'stream_id' in kwargs and kwargs['stream_id']:
            # Extract the stream
            stream_id = kwargs['stream_id']

            if stream_id not in self.streams:
                log.e('[Error] Invalid video format.')
                log.e('Run \'-i\' command with no specific video format to view all available formats.')
                exit(2)
        else:
            # Extract stream with the best quality
            stream_id = self.streams_sorted[0]['id']

site = MGTV()
download = site.download_by_url
download_playlist = site.download_playlist_by_url

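A stand-alone sketch (not part of this commit) of the URL handling in get_mgtv_real_url above: the playlist URL's directory becomes the base for every non-comment entry in the M3U text. The playlist content below is fabricated for illustration.

from os.path import dirname
from urllib.parse import urlsplit

def resolve_segments(playlist_url, playlist_text):
    """Join relative segment names onto the playlist's directory."""
    split = urlsplit(playlist_url)
    base_url = "{scheme}://{netloc}{path}/".format(scheme = split[0],
                                                   netloc = split[1],
                                                   path = dirname(split[2]))
    return [base_url + i for i in playlist_text.split() if not i.startswith('#')]

text = "#EXTM3U\n#EXTINF:10,\nseg_0.ts\n#EXTINF:10,\nseg_1.ts"
print(resolve_segments("http://example.com/video/playlist.m3u8", text))
# ['http://example.com/video/seg_0.ts', 'http://example.com/video/seg_1.ts']
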
@@ -37,7 +37,7 @@ def miaopai_download(url, output_dir = '.', merge = False, info_only = False, **
        miaopai_download_by_url(url, output_dir, merge, info_only)
    elif re.match(r'http://weibo.com/p/230444\w+', url):
        _fid = match1(url, r'http://weibo.com/p/230444(\w+)')
-        miaopai_download_by_url('http://video.weibo.com/show?fid=1034:{_fid}'.format(_fid = _fid))
+        miaopai_download_by_url('http://video.weibo.com/show?fid=1034:{_fid}'.format(_fid = _fid), output_dir, merge, info_only)

site_info = "miaopai"
download = miaopai_download

src/you_get/extractors/miomio.py (0 changes; Executable file → Normal file)

src/you_get/extractors/naver.py (new file, 48 lines)
@@ -0,0 +1,48 @@
#!/usr/bin/env python

__all__ = ['naver_download']
import urllib.request, urllib.parse
from ..common import *

def naver_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):

    assert re.search(r'http://tvcast.naver.com/v/', url), "URL is not supported"

    html = get_html(url)
    contentid = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',html)
    videoid = contentid.group(1)
    inkey = contentid.group(2)
    assert videoid
    assert inkey
    info_key = urllib.parse.urlencode({'vid': videoid, 'inKey': inkey, })
    down_key = urllib.parse.urlencode({'masterVid': videoid,'protocol': 'p2p','inKey': inkey, })
    inf_xml = get_html('http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?%s' % info_key )

    from xml.dom.minidom import parseString
    doc_info = parseString(inf_xml)
    Subject = doc_info.getElementsByTagName('Subject')[0].firstChild
    title = Subject.data
    assert title

    xml = get_html('http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?%s' % down_key )
    doc = parseString(xml)

    encodingoptions = doc.getElementsByTagName('EncodingOption')
    old_height = doc.getElementsByTagName('height')[0]
    real_url= ''
    #to download the highest resolution one,
    for node in encodingoptions:
        new_height = node.getElementsByTagName('height')[0]
        domain_node = node.getElementsByTagName('Domain')[0]
        uri_node = node.getElementsByTagName('uri')[0]
        if int(new_height.firstChild.data) > int (old_height.firstChild.data):
            real_url= domain_node.firstChild.data+ '/' +uri_node.firstChild.data

    type, ext, size = url_info(real_url)
    print_info(site_info, title, type, size)
    if not info_only:
        download_urls([real_url], title, ext, size, output_dir, merge = merge)

site_info = "tvcast.naver.com"
download = naver_download
download_playlist = playlist_not_supported('naver')

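The selection loop above keeps the EncodingOption whose <height> beats the first one found, joining <Domain> and <uri> into the final URL. A slightly simplified, stand-alone sketch (fabricated XML, tracking the best height seen rather than comparing against the first element):

from xml.dom.minidom import parseString

xml = """<root>
  <EncodingOption><height>480</height><Domain>http://cdn</Domain><uri>a_480.mp4</uri></EncodingOption>
  <EncodingOption><height>720</height><Domain>http://cdn</Domain><uri>a_720.mp4</uri></EncodingOption>
</root>"""

doc = parseString(xml)
best_height, real_url = 0, ''
for node in doc.getElementsByTagName('EncodingOption'):
    height = int(node.getElementsByTagName('height')[0].firstChild.data)
    if height > best_height:
        best_height = height
        real_url = (node.getElementsByTagName('Domain')[0].firstChild.data
                    + '/' + node.getElementsByTagName('uri')[0].firstChild.data)
print(real_url)   # http://cdn/a_720.mp4
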
@@ -4,6 +4,8 @@
__all__ = ['netease_download']

from ..common import *
+from ..common import print_more_compatible as print
+from ..util import fs
from json import loads
import hashlib
import base64
@@ -28,10 +30,10 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals

        artist_name = j['album']['artists'][0]['name']
        album_name = j['album']['name']
-        new_dir = output_dir + '/' + "%s - %s" % (artist_name, album_name)
+        new_dir = output_dir + '/' + fs.legitimize("%s - %s" % (artist_name, album_name))
+        if not info_only:
            if not os.path.exists(new_dir):
                os.mkdir(new_dir)
-        if not info_only:
            cover_url = j['album']['picUrl']
            download_urls([cover_url], "cover", "jpg", 0, new_dir)

@@ -46,10 +48,10 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
    elif "playlist" in url:
        j = loads(get_content("http://music.163.com/api/playlist/detail?id=%s&csrf_token=" % rid, headers={"Referer": "http://music.163.com/"}))

-        new_dir = output_dir + '/' + j['result']['name']
+        new_dir = output_dir + '/' + fs.legitimize(j['result']['name'])
+        if not info_only:
            if not os.path.exists(new_dir):
                os.mkdir(new_dir)
-        if not info_only:
            cover_url = j['result']['coverImgUrl']
            download_urls([cover_url], "cover", "jpg", 0, new_dir)

@@ -70,6 +72,15 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
            netease_lyric_download(j["songs"][0], l["lrc"]["lyric"], output_dir=output_dir, info_only=info_only)
        except: pass

+    elif "program" in url:
+        j = loads(get_content("http://music.163.com/api/dj/program/detail/?id=%s&ids=[%s]&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
+        netease_song_download(j["program"]["mainSong"], output_dir=output_dir, info_only=info_only)
+
+    elif "radio" in url:
+        j = loads(get_content("http://music.163.com/api/dj/program/byradio/?radioId=%s&ids=[%s]&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
+        for i in j['programs']:
+            netease_song_download(i["mainSong"],output_dir=output_dir, info_only=info_only)
+
    elif "mv" in url:
        j = loads(get_content("http://music.163.com/api/mv/detail/?id=%s&ids=[%s]&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
        netease_video_download(j['data'], output_dir=output_dir, info_only=info_only)

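The change above routes "<artist> - <album>" and playlist names through fs.legitimize before using them as directory names, so characters that are illegal in paths cannot break the download. As a rough illustration only (this is not you-get's fs.legitimize), a sanitizer of that kind can be as small as:

import re

def sanitize(name):
    """Replace path-hostile characters with '-' (hypothetical stand-in)."""
    return re.sub(r'[\\/:*?"<>|]', '-', name)

print(sanitize('AC/DC - Back in Black'))   # AC-DC - Back in Black
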
src/you_get/extractors/panda.py (new file, 33 lines)
@@ -0,0 +1,33 @@
#!/usr/bin/env python

__all__ = ['panda_download']

from ..common import *
import json
import time

def panda_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
    roomid = url[url.rfind('/')+1:]
    json_request_url = 'http://www.panda.tv/api_room?roomid={}&pub_key=&_={}'.format(roomid, int(time.time()))
    content = get_html(json_request_url)
    errno = json.loads(content)['errno']
    errmsg = json.loads(content)['errmsg']
    if errno:
        raise ValueError("Errno : {}, Errmsg : {}".format(errno, errmsg))

    data = json.loads(content)['data']
    title = data.get('roominfo')['name']
    room_key = data.get('videoinfo')['room_key']
    plflag = data.get('videoinfo')['plflag'].split('_')
    status = data.get('videoinfo')['status']
    if status is not "2":
        raise ValueError("The live stream is not online! (status:%s)" % status)
    real_url = 'http://pl{}.live.panda.tv/live_panda/{}.flv'.format(plflag[1],room_key)

    print_info(site_info, title, 'flv', float('inf'))
    if not info_only:
        download_urls([real_url], title, 'flv', None, output_dir, merge = merge)

site_info = "panda.tv"
download = panda_download
download_playlist = playlist_not_supported('panda')

@@ -129,7 +129,7 @@ def pptv_download_by_id(id, title = None, output_dir = '.', merge = True, info_o

    pieces = re.findall('<sgm no="(\d+)"[^<>]+fs="(\d+)"', xml)
    numbers, fs = zip(*pieces)
-    urls=[ "http://ccf.pptv.com/{}/{}?key={}&fpp.ver=1.3.0.4&k={}&type=web.fpp".format(i,rid,key,k) for i in range(max(map(int,numbers))+1)]
+    urls=["http://{}/{}/{}?key={}&fpp.ver=1.3.0.4&k={}&type=web.fpp".format(host,i,rid,key,k) for i in range(max(map(int,numbers))+1)]

    total_size = sum(map(int, fs))
    assert rid.endswith('.mp4')

src/you_get/extractors/qie.py (new file, 78 lines)
@@ -0,0 +1,78 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from ..common import *
from ..extractor import VideoExtractor

from json import loads

class QiE(VideoExtractor):
    name = "QiE (企鹅直播)"

    # Last updated: 2015-11-24
    stream_types = [
        {'id': 'normal', 'container': 'flv', 'video_profile': '标清'},
        {'id': 'middle', 'container': 'flv', 'video_profile': '550'},
        {'id': 'middle2', 'container': 'flv', 'video_profile': '900'},
    ]

    id_dic = {i['video_profile']:(i['id']) for i in stream_types}

    api_endpoint = 'http://www.qie.tv/api/v1/room/{room_id}'

    @staticmethod
    def get_vid_from_url(url):
        """Extracts video ID from live.qq.com.
        """
        html = get_content(url)
        return match1(html, r'room_id\":(\d+)')

    def download_playlist_by_url(self, url, **kwargs):
        pass

    def prepare(self, **kwargs):
        if self.url:
            self.vid = self.get_vid_from_url(self.url)

        content = get_content(self.api_endpoint.format(room_id = self.vid))
        content = loads(content)
        self.title = content['data']['room_name']
        rtmp_url = content['data']['rtmp_url']
        #stream_avalable = [i['name'] for i in content['data']['stream']]
        stream_available = {}
        stream_available['normal'] = rtmp_url + '/' + content['data']['rtmp_live']
        if len(content['data']['rtmp_multi_bitrate']) > 0:
            for k , v in content['data']['rtmp_multi_bitrate'].items():
                stream_available[k] = rtmp_url + '/' + v

        for s in self.stream_types:
            if s['id'] in stream_available.keys():
                quality_id = s['id']
                url = stream_available[quality_id]
                self.streams[quality_id] = {
                    'container': 'flv',
                    'video_profile': s['video_profile'],
                    'size': 0,
                    'url': url
                }

    def extract(self, **kwargs):
        for i in self.streams:
            s = self.streams[i]
            s['src'] = [s['url']]
        if 'stream_id' in kwargs and kwargs['stream_id']:
            # Extract the stream
            stream_id = kwargs['stream_id']

            if stream_id not in self.streams:
                log.e('[Error] Invalid video format.')
                log.e('Run \'-i\' command with no specific video format to view all available formats.')
                exit(2)
        else:
            # Extract stream with the best quality
            stream_id = self.streams_sorted[0]['id']
        s['src'] = [s['url']]

site = QiE()
download = site.download_by_url
download_playlist = playlist_not_supported('QiE')

@@ -3,32 +3,105 @@
__all__ = ['qq_download']

from ..common import *
+from .qie import download as qieDownload
+from urllib.parse import urlparse,parse_qs

def qq_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False):
-    api = "http://h5vv.video.qq.com/getinfo?otype=json&vid=%s" % vid
-    content = get_html(api)
-    output_json = json.loads(match1(content, r'QZOutputJson=(.*)')[:-1])
-    url = output_json['vl']['vi'][0]['ul']['ui'][0]['url']
+    info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3%2E2%2E19%2E333&platform=11&defnpayver=1&vid=' + vid
+    info = get_html(info_api)
+    video_json = json.loads(match1(info, r'QZOutputJson=(.*)')[:-1])
+    parts_vid = video_json['vl']['vi'][0]['vid']
+    parts_ti = video_json['vl']['vi'][0]['ti']
+    parts_prefix = video_json['vl']['vi'][0]['ul']['ui'][0]['url']
+    parts_formats = video_json['fl']['fi']
+    # find best quality
+    # only looking for fhd(1080p) and shd(720p) here.
+    # 480p usually come with a single file, will be downloaded as fallback.
+    best_quality = ''
+    for part_format in parts_formats:
+        if part_format['name'] == 'fhd':
+            best_quality = 'fhd'
+            break
+
+        if part_format['name'] == 'shd':
+            best_quality = 'shd'
+
+    for part_format in parts_formats:
+        if (not best_quality == '') and (not part_format['name'] == best_quality):
+            continue
+        part_format_id = part_format['id']
+        part_format_sl = part_format['sl']
+        if part_format_sl == 0:
+            part_urls= []
+            total_size = 0
+            try:
+                # For fhd(1080p), every part is about 100M and 6 minutes
+                # try 100 parts here limited download longest single video of 10 hours.
+                for part in range(1,100):
+                    filename = vid + '.p' + str(part_format_id % 1000) + '.' + str(part) + '.mp4'
+                    key_api = "http://vv.video.qq.com/getkey?otype=json&platform=11&format=%s&vid=%s&filename=%s" % (part_format_id, parts_vid, filename)
+                    #print(filename)
+                    #print(key_api)
+                    part_info = get_html(key_api)
+                    key_json = json.loads(match1(part_info, r'QZOutputJson=(.*)')[:-1])
+                    #print(key_json)
+                    vkey = key_json['key']
+                    url = '%s/%s?vkey=%s' % (parts_prefix, filename, vkey)
+                    part_urls.append(url)
+                    _, ext, size = url_info(url, faker=True)
+                    total_size += size
+            except:
+                pass
+            print_info(site_info, parts_ti, ext, total_size)
+            if not info_only:
+                download_urls(part_urls, parts_ti, ext, total_size, output_dir=output_dir, merge=merge)
+        else:
            fvkey = output_json['vl']['vi'][0]['fvkey']
-    url = '%s/%s.mp4?vkey=%s' % ( url, vid, fvkey )
+            mp4 = output_json['vl']['vi'][0]['cl'].get('ci', None)
+            if mp4:
+                mp4 = mp4[0]['keyid'].replace('.10', '.p') + '.mp4'
+            else:
+                mp4 = output_json['vl']['vi'][0]['fn']
+            url = '%s/%s?vkey=%s' % ( parts_prefix, mp4, fvkey )
            _, ext, size = url_info(url, faker=True)

            print_info(site_info, title, ext, size)
            if not info_only:
                download_urls([url], title, ext, size, output_dir=output_dir, merge=merge)


def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
-    if 'iframe/player.html' in url:
+    """"""
+    if 'live.qq.com' in url:
+        qieDownload(url,output_dir=output_dir, merge=merge, info_only=info_only)
+        return
+
+    #do redirect
+    if 'v.qq.com/page' in url:
+        # for URLs like this:
+        # http://v.qq.com/page/k/9/7/k0194pwgw97.html
+        content = get_html(url)
+        url = match1(content,r'window\.location\.href="(.*?)"')
+
+    if 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
+        content = get_html(url)
+        vid = match1(content, r'vid\s*=\s*"\s*([^"]+)"')
+        title = match1(content, r'title">([^"]+)</p>')
+        title = title.strip() if title else vid
+    elif 'iframe/player.html' in url:
        vid = match1(url, r'\bvid=(\w+)')
        # for embedded URLs; don't know what the title is
        title = vid
    else:
        content = get_html(url)
-        vid = match1(content, r'vid\s*:\s*"\s*([^"]+)"')
-        title = match1(content, r'title\s*:\s*"\s*([^"]+)"')
-        # try to get the right title for URLs like this:
-        # http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
-        title = matchall(content, [r'title\s*:\s*"\s*([^"]+)"'])[-1]
+        vid = parse_qs(urlparse(url).query).get('vid') #for links specified vid like http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
+        vid = vid[0] if vid else match1(content, r'vid"*\s*:\s*"\s*([^"]+)"') #general fallback
+        title = match1(content,r'<a.*?id\s*=\s*"%s".*?title\s*=\s*"(.+?)".*?>'%vid)
+        title = match1(content, r'title">([^"]+)</p>') if not title else title
+        title = match1(content, r'"title":"([^"]+)"') if not title else title
+        title = vid if not title else title #general fallback

    qq_download_by_vid(vid, title, output_dir, merge, info_only)

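For reference, the part-file naming used by the new multi-part branch ("<vid>.p<format id mod 1000>.<part>.mp4") can be reproduced in isolation; the values below are invented:

vid, part_format_id = 'q0181hpdvo5', 10209
filenames = ['{}.p{}.{}.mp4'.format(vid, part_format_id % 1000, part)
             for part in range(1, 4)]
print(filenames)
# ['q0181hpdvo5.p209.1.mp4', 'q0181hpdvo5.p209.2.mp4', 'q0181hpdvo5.p209.3.mp4']
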
src/you_get/extractors/showroom.py (new file, 70 lines)
@@ -0,0 +1,70 @@
#!/usr/bin/env python

__all__ = ['showroom_download']

from ..common import *
import urllib.error
from json import loads
from time import time, sleep

#----------------------------------------------------------------------
def showroom_get_roomid_by_room_url_key(room_url_key):
    """str->str"""
    fake_headers_mobile = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset': 'UTF-8,*;q=0.5',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'
    }
    webpage_url = 'https://www.showroom-live.com/' + room_url_key
    html = get_content(webpage_url, headers = fake_headers_mobile)
    roomid = match1(html, r'room\?room_id\=(\d+)')
    assert roomid
    return roomid

def showroom_download_by_room_id(room_id, output_dir = '.', merge = False, info_only = False, **kwargs):
    '''Source: Android mobile'''
    while True:
        timestamp = str(int(time() * 1000))
        api_endpoint = 'https://www.showroom-live.com/api/live/streaming_url?room_id={room_id}&_={timestamp}'.format(room_id = room_id, timestamp = timestamp)
        html = get_content(api_endpoint)
        html = json.loads(html)
        #{'streaming_url_list': [{'url': 'rtmp://52.197.69.198:1935/liveedge', 'id': 1, 'label': 'original spec(low latency)', 'is_default': True, 'type': 'rtmp', 'stream_name': '7656a6d5baa1d77075c971f6d8b6dc61b979fc913dc5fe7cc1318281793436ed'}, {'url': 'http://52.197.69.198:1935/liveedge/7656a6d5baa1d77075c971f6d8b6dc61b979fc913dc5fe7cc1318281793436ed/playlist.m3u8', 'is_default': True, 'id': 2, 'type': 'hls', 'label': 'original spec'}, {'url': 'rtmp://52.197.69.198:1935/liveedge', 'id': 3, 'label': 'low spec(low latency)', 'is_default': False, 'type': 'rtmp', 'stream_name': '7656a6d5baa1d77075c971f6d8b6dc61b979fc913dc5fe7cc1318281793436ed_low'}, {'url': 'http://52.197.69.198:1935/liveedge/7656a6d5baa1d77075c971f6d8b6dc61b979fc913dc5fe7cc1318281793436ed_low/playlist.m3u8', 'is_default': False, 'id': 4, 'type': 'hls', 'label': 'low spec'}]}
        if len(html) >= 1:
            break
        log.w('The live show is currently offline.')
        sleep(1)

    #This is mainly for testing the M3U FFmpeg parser so I would ignore any non-m3u ones
    stream_url = [i['url'] for i in html['streaming_url_list'] if i['is_default'] and i['type'] == 'hls'][0]

    assert stream_url

    #title
    title = ''
    profile_api = 'https://www.showroom-live.com/api/room/profile?room_id={room_id}'.format(room_id = room_id)
    html = loads(get_content(profile_api))
    try:
        title = html['main_name']
    except KeyError:
        title = 'Showroom_{room_id}'.format(room_id = room_id)

    type_, ext, size = url_info(stream_url)
    print_info(site_info, title, type_, size)
    if not info_only:
        download_url_ffmpeg(url=stream_url, title=title, ext= 'mp4', output_dir=output_dir)

#----------------------------------------------------------------------
def showroom_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
    """"""
    if re.match( r'(\w+)://www.showroom-live.com/([-\w]+)', url):
        room_url_key = match1(url, r'\w+://www.showroom-live.com/([-\w]+)')
        room_id = showroom_get_roomid_by_room_url_key(room_url_key)
        showroom_download_by_room_id(room_id, output_dir, merge,
                                     info_only)

site_info = "Showroom"
download = showroom_download
download_playlist = playlist_not_supported('showroom')

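showroom_download_by_room_id above polls the streaming_url API once per second until it returns a non-empty payload. A minimal sketch of that pattern, with fetch() standing in for get_content plus json.loads:

from time import sleep

def wait_until_live(fetch, interval=1):
    """Poll fetch() until it yields a non-empty dict, then return it."""
    while True:
        payload = fetch()
        if payload:            # non-empty dict -> the stream is up
            return payload
        print('The live show is currently offline.')
        sleep(interval)
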
@@ -14,7 +14,7 @@ def get_k(vid, rand):

def video_info_xml(vid):
    rand = "0.{0}{1}".format(randint(10000, 10000000), randint(10000, 10000000))
-    url = 'http://v.iask.com/v_play.php?vid={0}&ran={1}&p=i&k={2}'.format(vid, rand, get_k(vid, rand))
+    url = 'http://ask.ivideo.sina.com.cn/v_play.php?vid={0}&ran={1}&p=i&k={2}'.format(vid, rand, get_k(vid, rand))
    xml = get_content(url, headers=fake_headers, decoded=True)
    return xml

@@ -71,7 +71,7 @@ def sina_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
        vid = vids[-1]

    if vid is None:
-        vid = match1(video_page, r'vid:(\d+)')
+        vid = match1(video_page, r'vid:"?(\d+)"?')
    if vid:
        title = match1(video_page, r'title\s*:\s*\'([^\']+)\'')
        sina_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)

@@ -32,9 +32,14 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
        set_proxy(tuple(extractor_proxy.split(":")))
    info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % vid))
    for qtyp in ["oriVid","superVid","highVid" ,"norVid","relativeId"]:
+        if 'data' in info:
            hqvid = info['data'][qtyp]
+        else:
+            hqvid = info[qtyp]
        if hqvid != 0 and hqvid != vid :
            info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % hqvid))
+            if not 'allot' in info:
+                continue
            break
    if extractor_proxy:
        unset_proxy()

@@ -6,13 +6,14 @@ from ..common import *
import random
import time
from xml.dom import minidom
+#possible raw list types
#1. <li>type=tudou&vid=199687639</li>
#2. <li>type=tudou&vid=199506910|</li>
#3. <li>type=video&file=http://xiaoshen140731.qiniudn.com/lovestage04.flv|</li>
#4 may ? <li>type=video&file=http://xiaoshen140731.qiniudn.com/lovestage04.flv|xx**type=&vid=?</li>
#5. <li>type=tudou&vid=200003098|07**type=tudou&vid=200000350|08</li>
+#6. <li>vid=49454694&type=sina|</li>
+#7. <li>type=189&vid=513031813243909|</li>
# re_pattern=re.compile(r"(type=(.+?)&(vid|file)=(.*?))[\|<]")

def tucao_single_download(type_link, title, output_dir=".", merge=True, info_only=False):
@@ -22,8 +23,17 @@ def tucao_single_download(type_link, title, output_dir=".", merge=True, info_onl
        print_info(site_info, title, vtype, size)
        if not info_only:
            download_urls([url], title, ext, size, output_dir)
+    #fix for 189 video source, see raw list types 7
+    elif "189" in type_link:
+        vid = match1(type_link, r"vid=(\d+)")
+        assert vid, "vid not exsits"
+        url = "http://api.tucao.tv/api/down/{}".format(vid)
+        vtype, ext, size=url_info(url)
+        print_info(site_info, title, vtype, size)
+        if not info_only:
+            download_urls([url], title, ext, size, output_dir)
    else:
-        u="http://www.tucao.cc/api/playurl.php?{}&key=tucao{:07x}.cc&r={}".format(type_link,random.getrandbits(28),int(time.time()*1000))
+        u="http://www.tucao.tv/api/playurl.php?{}&key=tucao{:07x}.cc&r={}".format(type_link,random.getrandbits(28),int(time.time()*1000))
        xml=minidom.parseString(get_content(u))
        urls=[]
        size=0
@@ -38,7 +48,8 @@ def tucao_single_download(type_link, title, output_dir=".", merge=True, info_onl
def tucao_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
    html=get_content(url)
    title=match1(html,r'<h1 class="show_title">(.*?)<\w')
-    raw_list=match1(html,r"<li>(type=.+?)</li>")
+    #fix for raw list that vid goes before type, see raw list types 6
+    raw_list=match1(html,r"<li>\s*(type=.+?|vid=.+?)</li>")
    raw_l=raw_list.split("**")
    if len(raw_l)==1:
        format_link=raw_l[0][:-1] if raw_l[0].endswith("|") else raw_l[0]
@@ -49,6 +60,6 @@ def tucao_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
            tucao_single_download(format_link,title+"-"+sub_title,output_dir,merge,info_only)


-site_info = "tucao.cc"
+site_info = "tucao.tv"
download = tucao_download
download_playlist = playlist_not_supported("tucao")

@@ -4,6 +4,7 @@ __all__ = ['tudou_download', 'tudou_download_playlist', 'tudou_download_by_id',

from ..common import *
from xml.dom.minidom import parseString
+import you_get.extractors.acfun

def tudou_download_by_iid(iid, title, output_dir = '.', merge = True, info_only = False):
    data = json.loads(get_decoded_html('http://www.tudou.com/outplay/goto/getItemSegs.action?iid=%s' % iid))
@@ -29,6 +30,13 @@ def tudou_download_by_id(id, title, output_dir = '.', merge = True, info_only =
    tudou_download_by_iid(iid, title, output_dir = output_dir, merge = merge, info_only = info_only)

def tudou_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
+    if 'acfun.tudou.com' in url:  #wrong way!
+        url = url.replace('acfun.tudou.com', 'www.acfun.tv')
+        you_get.extractors.acfun.acfun_download(url, output_dir,
+                                                merge,
+                                                info_only)
+        return  #throw you back
+
    # Embedded player
    id = r1(r'http://www.tudou.com/v/([^/]+)/', url)
    if id:

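The new branch in tudou_download simply rewrites acfun.tudou.com links to www.acfun.tv and delegates to the Acfun extractor; the rewrite itself is a plain string replacement (URL fabricated for illustration):

url = 'http://acfun.tudou.com/v/ac1234567'
if 'acfun.tudou.com' in url:
    url = url.replace('acfun.tudou.com', 'www.acfun.tv')
print(url)   # http://www.acfun.tv/v/ac1234567
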
@@ -68,7 +68,7 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
    real_url = r1(r'<source src="([^"]*)"', html)
    if not real_url:
        iframe_url = r1(r'<[^>]+tumblr_video_container[^>]+><iframe[^>]+src=[\'"]([^\'"]*)[\'"]', html)
-        if len(iframe_url) > 0:
+        if iframe_url:
            iframe_html = get_content(iframe_url, headers=fake_headers)
            real_url = r1(r'<video[^>]*>[\n ]*<source[^>]+src=[\'"]([^\'"]*)[\'"]', iframe_html)
        else:

@@ -5,6 +5,13 @@ __all__ = ['twitter_download']
from ..common import *
from .vine import vine_download

+def extract_m3u(source):
+    r1 = get_content(source)
+    s1 = re.findall(r'(/ext_tw_video/.*)', r1)
+    r2 = get_content('https://video.twimg.com%s' % s1[-1])
+    s2 = re.findall(r'(/ext_tw_video/.*)', r2)
+    return ['https://video.twimg.com%s' % i for i in s2]
+
def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
    html = get_html(url)
    screen_name = r1(r'data-screen-name="([^"]*)"', html) or \
@@ -62,12 +69,20 @@ def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs)
        vmap = get_content(vmap_url)
        source = r1(r'<MediaFile>\s*<!\[CDATA\[(.*)\]\]>', vmap)
        if not item_id: page_title = i['tweet_id']
+    elif 'scribe_playlist_url' in i:
+        scribe_playlist_url = i['scribe_playlist_url']
+        return vine_download(scribe_playlist_url, output_dir, merge=merge, info_only=info_only)

-    mime, ext, size = url_info(source)
+    try:
+        urls = extract_m3u(source)
+    except:
+        urls = [source]
+    size = urls_size(urls)
+    mime, ext = 'video/mp4', 'mp4'

    print_info(site_info, page_title, mime, size)
    if not info_only:
-        download_urls([source], page_title, ext, size, output_dir, merge=merge)
+        download_urls(urls, page_title, ext, size, output_dir, merge=merge)

site_info = "Twitter.com"
download = twitter_download

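extract_m3u above resolves a master playlist into segment URLs in two passes over video.twimg.com: first the master playlist is scanned for /ext_tw_video/ paths, then the last one is fetched and scanned again. A stand-alone sketch that works on playlist text instead of fetching it (both playlists fabricated):

import re

def child_paths(playlist_text):
    """Return the /ext_tw_video/... paths referenced by a playlist."""
    return re.findall(r'(/ext_tw_video/.*)', playlist_text)

master = "#EXTM3U\n/ext_tw_video/1/pl/variant_720.m3u8"
variant = "#EXTM3U\n/ext_tw_video/1/pl/seg0.ts\n/ext_tw_video/1/pl/seg1.ts"
print(child_paths(master))    # pick the last entry, fetch it, then:
print(['https://video.twimg.com%s' % p for p in child_paths(variant)])
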
@@ -6,6 +6,8 @@ from ..common import *
from .embed import *

def universal_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
+    content_type = get_head(url, headers=fake_headers)['Content-Type']
+    if content_type.startswith('text/html'):
        try:
            embed_download(url, output_dir, merge=merge, info_only=info_only)
        except: pass
@@ -15,11 +17,9 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
    if len(domains) > 2: domains = domains[1:]
    site_info = '.'.join(domains)

-    response = get_response(url, faker=True)
-    content_type = response.headers['Content-Type']
-
    if content_type.startswith('text/html'):
        # extract an HTML page
+        response = get_response(url, faker=True)
        page = str(response.data)

        page_title = r1(r'<title>([^<]*)', page)

@@ -1,47 +1,44 @@
#!/usr/bin/env python

-from ..common import *
-from ..extractor import VideoExtractor
+__all__ = ['videomega_download']

+from ..common import *
import ssl

-class Videomega(VideoExtractor):
-    name = "Videomega"
-
-    stream_types = [
-        {'id': 'original'}
-    ]
-
-    def prepare(self, **kwargs):
-        # Hot-plug cookie handler
-        ssl_context = request.HTTPSHandler(
-            context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
-        cookie_handler = request.HTTPCookieProcessor()
-        opener = request.build_opener(ssl_context, cookie_handler)
-        opener.addheaders = [('Referer', self.url),
-                             ('Cookie', 'noadvtday=0')]
-        request.install_opener(opener)
-
-        ref = match1(self.url, r'ref=(\w+)')
-        php_url = 'http://videomega.tv/view.php?ref=' + ref
-        content = get_content(php_url)
-
-        self.title = match1(content, r'<title>(.*)</title>')
-        js = match1(content, r'(eval.*)')
-        t = match1(js, r'\$\("\d+"\)\.\d+\("\d+","([^"]+)"\)')
-        t = re.sub(r'(\w)', r'{\1}', t)
-        t = t.translate({87 + i: str(i) for i in range(10, 36)})
-        s = match1(js, r"'([^']+)'\.split").split('|')
-        self.streams['original'] = {
-            'url': t.format(*s)
-        }
-
-    def extract(self, **kwargs):
-        for i in self.streams:
-            s = self.streams[i]
-            _, s['container'], s['size'] = url_info(s['url'])
-            s['src'] = [s['url']]
-
-site = Videomega()
-download = site.download_by_url
-download_playlist = site.download_by_url
+def videomega_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
+    # Hot-plug cookie handler
+    ssl_context = request.HTTPSHandler(
+        context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
+    cookie_handler = request.HTTPCookieProcessor()
+    opener = request.build_opener(ssl_context, cookie_handler)
+    opener.addheaders = [('Referer', url),
+                         ('Cookie', 'noadvtday=0')]
+    request.install_opener(opener)
+
+    if re.search(r'view\.php', url):
+        php_url = url
+    else:
+        content = get_content(url)
+        m = re.search(r'ref="([^"]*)";\s*width="([^"]*)";\s*height="([^"]*)"', content)
+        ref = m.group(1)
+        width, height = m.group(2), m.group(3)
+        php_url = 'http://videomega.tv/view.php?ref=%s&width=%s&height=%s' % (ref, width, height)
+    content = get_content(php_url)
+
+    title = match1(content, r'<title>(.*)</title>')
+    js = match1(content, r'(eval.*)')
+    t = match1(js, r'\$\("\w+"\)\.\w+\("\w+","([^"]+)"\)')
+    t = re.sub(r'(\w)', r'{\1}', t)
+    t = t.translate({87 + i: str(i) for i in range(10, 36)})
+    s = match1(js, r"'([^']+)'\.split").split('|')
+    src = t.format(*s)
+
+    type, ext, size = url_info(src, faker=True)
+
+    print_info(site_info, title, type, size)
+    if not info_only:
+        download_urls([src], title, ext, size, output_dir, merge=merge, faker=True)
+
+site_info = "Videomega.tv"
+download = videomega_download
+download_playlist = playlist_not_supported('videomega')

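The three lines kept from the old extractor (re.sub, translate, format) are a generic unpacker for eval-packed players: every base-36 digit in the packed template becomes a format slot that is filled from the '|'-separated word list. A toy example with made-up data:

import re

t = 'a://1/3.0'                                       # fabricated packed template
s = 'mp4|cdn|ts|video|x|x|x|x|x|x|https'.split('|')   # fabricated word list (index 10 = 'https')

t = re.sub(r'(\w)', r'{\1}', t)                            # 'a' -> '{a}', '1' -> '{1}', ...
t = t.translate({87 + i: str(i) for i in range(10, 36)})   # map letters a-z to 10-35
print(t.format(*s))                                        # https://cdn/video.mp4
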
@@ -4,21 +4,51 @@ __all__ = ['vk_download']

from ..common import *

-def vk_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
+
+def get_video_info(url):
    video_page = get_content(url)
-    title = unescape_html(r1(r'"title":"([^"]+)"', video_page))
-    info = dict(re.findall(r'\\"url(\d+)\\":\\"([^"]+)\\"', video_page))
-    for quality in ['1080', '720', '480', '360', '240']:
-        if quality in info:
-            url = re.sub(r'\\\\\\/', r'/', info[quality])
-            break
+    title = r1(r'<div class="vv_summary">(.[^>]+?)</div', video_page)
+    sources = re.findall(r'<source src=\"(.[^>]+?)"', video_page)
+
+    for quality in ['.1080.', '.720.', '.480.', '.360.', '.240.']:
+        for source in sources:
+            if source.find(quality) != -1:
+                url = source
+                break
    assert url

    type, ext, size = url_info(url)

    print_info(site_info, title, type, size)
-    if not info_only:
-        download_urls([url], title, ext, size, output_dir, merge=merge)
+    return url, title, ext, size
+
+
+def get_image_info(url):
+    image_page = get_content(url)
+    # used for title - vk page owner
+    page_of = re.findall(r'Sender:</dt><dd><a href=.*>(.[^>]+?)</a', image_page)
+    # used for title - date when photo was uploaded
+    photo_date = re.findall(r'<span class="item_date">(.[^>]+?)</span', image_page)
+
+    title = (' ').join(page_of + photo_date)
+    image_link = r1(r'href="([^"]+)" class=\"mva_item\" target="_blank">Download full size', image_page)
+    type, ext, size = url_info(image_link)
+    print_info(site_info, title, type, size)
+
+    return image_link, title, ext, size
+
+
+def vk_download(url, output_dir='.', stream_type=None, merge=True, info_only=False, **kwargs):
+    link = None
+    if re.match(r'(.+)z\=video(.+)', url):
+        link, title, ext, size = get_video_info(url)
+    elif re.match(r'(.+)vk\.com\/photo(.+)', url):
+        link, title, ext, size = get_image_info(url)
+    else:
+        raise NotImplementedError('Nothing to download here')
+
+    if not info_only and link is not None:
+        download_urls([link], title, ext, size, output_dir, merge=merge)
+

site_info = "VK.com"
download = vk_download

src/you_get/extractors/wanmen.py (new file, 123 lines, Executable file)
@@ -0,0 +1,123 @@
#!/usr/bin/env python

__all__ = ['wanmen_download', 'wanmen_download_by_course', 'wanmen_download_by_course_topic', 'wanmen_download_by_course_topic_part']

from ..common import *
from .bokecc import bokecc_download_by_id
from json import loads


##Helper functions
def _wanmen_get_json_api_content_by_courseID(courseID):
    """int->JSON

    Return a parsed JSON tree of WanMen's API."""

    return loads(get_content('http://api.wanmen.org/course/getCourseNested/{courseID}'.format(courseID = courseID)))

def _wanmen_get_title_by_json_topic_part(json_content, tIndex, pIndex):
    """JSON, int, int, int->str

    Get a proper title with courseid+topicID+partID."""

    return '_'.join([json_content[0]['name'],
                     json_content[0]['Topics'][tIndex]['name'],
                     json_content[0]['Topics'][tIndex]['Parts'][pIndex]['name']])


def _wanmen_get_boke_id_by_json_topic_part(json_content, tIndex, pIndex):
    """JSON, int, int, int->str

    Get one BokeCC video ID with courseid+topicID+partID."""

    return json_content[0]['Topics'][tIndex]['Parts'][pIndex]['ccVideoLink']


##Parsers
def wanmen_download_by_course(json_api_content, output_dir='.', merge=True, info_only=False, **kwargs):
    """int->None

    Download a WHOLE course.
    Reuse the API call to save time."""

    for tIndex in range(len(json_api_content[0]['Topics'])):
        for pIndex in range(len(json_api_content[0]['Topics'][tIndex]['Parts'])):
            wanmen_download_by_course_topic_part(json_api_content,
                                                 tIndex,
                                                 pIndex,
                                                 output_dir=output_dir,
                                                 merge=merge,
                                                 info_only=info_only,
                                                 **kwargs)


def wanmen_download_by_course_topic(json_api_content, tIndex, output_dir='.', merge=True, info_only=False, **kwargs):
    """int, int->None

    Download a TOPIC of a course.
    Reuse the API call to save time."""

    for pIndex in range(len(json_api_content[0]['Topics'][tIndex]['Parts'])):
        wanmen_download_by_course_topic_part(json_api_content,
                                             tIndex,
                                             pIndex,
                                             output_dir=output_dir,
                                             merge=merge,
                                             info_only=info_only,
                                             **kwargs)

def wanmen_download_by_course_topic_part(json_api_content, tIndex, pIndex, output_dir='.', merge=True, info_only=False, **kwargs):
    """int, int, int->None

    Download ONE PART of the course."""

    html = json_api_content

    title = _wanmen_get_title_by_json_topic_part(html,
                                                 tIndex,
                                                 pIndex)

    bokeccID = _wanmen_get_boke_id_by_json_topic_part(html,
                                                      tIndex,
                                                      pIndex)

    bokecc_download_by_id(vid = bokeccID, title = title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)


##Main entrance
def wanmen_download(url, output_dir='.', merge=True, info_only=False, **kwargs):

    if not 'wanmen.org' in url:
        log.wtf('You are at the wrong place dude. This is for WanMen University!')
        raise

    courseID = int(match1(url, r'course\/(\d+)'))
    assert courseID > 0  #without courseID we cannot do anything

    tIndex = int(match1(url, r'tIndex=(\d+)'))

    pIndex = int(match1(url, r'pIndex=(\d+)'))

    json_api_content = _wanmen_get_json_api_content_by_courseID(courseID)

    if pIndex:  #only download ONE single part
        assert tIndex >= 0
        wanmen_download_by_course_topic_part(json_api_content, tIndex, pIndex,
                                             output_dir = output_dir,
                                             merge = merge,
                                             info_only = info_only)
    elif tIndex:  #download a topic
        wanmen_download_by_course_topic(json_api_content, tIndex,
                                        output_dir = output_dir,
                                        merge = merge,
                                        info_only = info_only)
    else:  #download the whole course
        wanmen_download_by_course(json_api_content,
                                  output_dir = output_dir,
                                  merge = merge,
                                  info_only = info_only)


site_info = "WanMen University"
download = wanmen_download
download_playlist = wanmen_download_by_course

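The wanmen helpers walk a nested JSON tree of courses, topics and parts addressed by (tIndex, pIndex). A sketch with fabricated data showing the same indexing:

course = [{
    'name': 'Course',
    'Topics': [
        {'name': 'Topic 0',
         'Parts': [{'name': 'Part 0', 'ccVideoLink': 'ABCDEFG'}]},
    ],
}]

tIndex, pIndex = 0, 0
title = '_'.join([course[0]['name'],
                  course[0]['Topics'][tIndex]['name'],
                  course[0]['Topics'][tIndex]['Parts'][pIndex]['name']])
boke_id = course[0]['Topics'][tIndex]['Parts'][pIndex]['ccVideoLink']
print(title, boke_id)   # Course_Topic 0_Part 0 ABCDEFG
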
@@ -17,7 +17,8 @@ def yinyuetai_download_by_id(vid, title=None, output_dir='.', merge=True, info_o
        download_urls([url], title, ext, size, output_dir, merge = merge)

def yinyuetai_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
-    id = r1(r'http://\w+.yinyuetai.com/video/(\d+)', url)
+    id = r1(r'http://\w+.yinyuetai.com/video/(\d+)', url) or \
+         r1(r'http://\w+.yinyuetai.com/video/h5/(\d+)', url)
    if not id:
        yinyuetai_download_playlist(url, output_dir=output_dir, merge=merge, info_only=info_only)
        return

src/you_get/extractors/yixia.py (0 changes; Executable file → Normal file)

@@ -28,7 +28,11 @@ class Youku(VideoExtractor):
    f_code_1 = 'becaf9be'
    f_code_2 = 'bf7e5f01'

+    ctype = 12  #differ from 86
+
    def trans_e(a, c):
+        """str, str->str
+        This is an RC4 encryption."""
        f = h = 0
        b = list(range(256))
        result = ''
@@ -49,14 +53,14 @@ class Youku(VideoExtractor):

        return result

-    def generate_ep(no, streamfileids, sid, token):
+    def generate_ep(self, no, streamfileids, sid, token):
        number = hex(int(str(no), 10))[2:].upper()
        if len(number) == 1:
            number = '0' + number
        fileid = streamfileids[0:8] + number + streamfileids[10:]
        ep = parse.quote(base64.b64encode(
-            ''.join(Youku.trans_e(
-                Youku.f_code_2,
+            ''.join(self.__class__.trans_e(
+                self.f_code_2,  #use the 86 fcode if using 86
                sid + '_' + fileid + '_' + token)).encode('latin1')),
            safe='~()*!.\''
        )

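generate_ep above encodes the segment number as a two-digit uppercase hex string and splices it into the stream file id. In isolation (file id fabricated for illustration):

def seg_number(no):
    """Two-digit uppercase hex, as built inside generate_ep."""
    number = hex(int(str(no), 10))[2:].upper()
    if len(number) == 1:
        number = '0' + number
    return number

streamfileid = 'ABCDEFGH00XYZ'   # made-up value
fileid = streamfileid[0:8] + seg_number(26) + streamfileid[10:]
print(seg_number(3), seg_number(26), fileid)   # 03 1A ABCDEFGH1AXYZ
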
@ -72,7 +76,7 @@ class Youku(VideoExtractor):
|
|||||||
for x in xs:
|
for x in xs:
|
||||||
if x not in mem:
|
if x not in mem:
|
||||||
mem.add(x)
|
mem.add(x)
|
||||||
yield(x)
|
return mem
|
||||||
|
|
||||||
def get_vid_from_url(url):
|
def get_vid_from_url(url):
|
||||||
"""Extracts video ID from URL.
|
"""Extracts video ID from URL.
|
||||||
@ -85,7 +89,7 @@ class Youku(VideoExtractor):
|
|||||||
def get_playlist_id_from_url(url):
|
def get_playlist_id_from_url(url):
|
||||||
"""Extracts playlist ID from URL.
|
"""Extracts playlist ID from URL.
|
||||||
"""
|
"""
|
||||||
return match1(url, r'youku\.com/playlist_show/id_([a-zA-Z0-9=]+)')
|
return match1(url, r'youku\.com/albumlist/show\?id=([a-zA-Z0-9=]+)')
|
||||||
|
|
||||||
def download_playlist_by_url(self, url, **kwargs):
|
def download_playlist_by_url(self, url, **kwargs):
|
||||||
self.url = url
|
self.url = url
|
||||||
@ -93,15 +97,17 @@ class Youku(VideoExtractor):
|
|||||||
try:
|
try:
|
||||||
playlist_id = self.__class__.get_playlist_id_from_url(self.url)
|
playlist_id = self.__class__.get_playlist_id_from_url(self.url)
|
||||||
assert playlist_id
|
assert playlist_id
|
||||||
|
video_page = get_content('http://list.youku.com/albumlist/show?id=%s' % playlist_id)
|
||||||
video_page = get_content('http://www.youku.com/playlist_show/id_%s' % playlist_id)
|
|
||||||
videos = Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', video_page))
|
videos = Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', video_page))
|
||||||
|
|
||||||
# Parse multi-page playlists
|
# Parse multi-page playlists
|
||||||
for extra_page_url in Youku.oset(re.findall('href="(http://www\.youku\.com/playlist_show/id_%s_[^?"]+)' % playlist_id, video_page)):
|
last_page_url = re.findall(r'href="(/albumlist/show\?id=%s[^"]+)" title="末页"' % playlist_id, video_page)[0]
|
||||||
extra_page = get_content(extra_page_url)
|
num_pages = int(re.findall(r'page=([0-9]+)\.htm', last_page_url)[0])
|
||||||
|
if (num_pages > 0):
|
||||||
|
# download one by one
|
||||||
|
for pn in range(2, num_pages + 1):
|
||||||
|
extra_page_url = re.sub(r'page=([0-9]+)\.htm', r'page=%s.htm' % pn, last_page_url)
|
||||||
|
extra_page = get_content('http://list.youku.com' + extra_page_url)
|
||||||
videos |= Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', extra_page))
|
videos |= Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', extra_page))
|
||||||
|
|
||||||
except:
|
except:
|
||||||
# Show full list of episodes
|
# Show full list of episodes
|
||||||
if match1(url, r'youku\.com/show_page/id_([a-zA-Z0-9=]+)'):
|
if match1(url, r'youku\.com/show_page/id_([a-zA-Z0-9=]+)'):
|
||||||
@ -150,8 +156,17 @@ class Youku(VideoExtractor):
|
|||||||
self.download_playlist_by_url(self.url, **kwargs)
|
self.download_playlist_by_url(self.url, **kwargs)
|
||||||
exit(0)
|
exit(0)
|
||||||
|
|
||||||
|
#HACK!
|
||||||
|
if 'api_url' in kwargs:
|
||||||
|
api_url = kwargs['api_url'] #85
|
||||||
|
api12_url = kwargs['api12_url'] #86
|
||||||
|
self.ctype = kwargs['ctype']
|
||||||
|
self.title = kwargs['title']
|
||||||
|
|
||||||
|
else:
|
||||||
api_url = 'http://play.youku.com/play/get.json?vid=%s&ct=10' % self.vid
|
api_url = 'http://play.youku.com/play/get.json?vid=%s&ct=10' % self.vid
|
||||||
api12_url = 'http://play.youku.com/play/get.json?vid=%s&ct=12' % self.vid
|
api12_url = 'http://play.youku.com/play/get.json?vid=%s&ct=12' % self.vid
|
||||||
|
|
||||||
try:
|
try:
|
||||||
meta = json.loads(get_content(
|
meta = json.loads(get_content(
|
||||||
api_url,
|
api_url,
|
||||||
@@ -171,13 +186,13 @@ class Youku(VideoExtractor):
                 self.password_protected = True
                 self.password = input(log.sprint('Password: ', log.YELLOW))
                 api_url += '&pwd={}'.format(self.password)
-                api_url12 += '&pwd={}'.format(self.password)
+                api12_url += '&pwd={}'.format(self.password)
                 meta = json.loads(get_content(
                     api_url,
                     headers={'Referer': 'http://static.youku.com/'}
                 ))
                 meta12 = json.loads(get_content(
-                    api_url12,
+                    api12_url,
                     headers={'Referer': 'http://static.youku.com/'}
                 ))
         data = meta['data']
@@ -187,6 +202,7 @@ class Youku(VideoExtractor):
         else:
             log.wtf('[Failed] Video not found.')

+        if not self.title: #86
             self.title = data['video']['title']
         self.ep = data12['security']['encrypt_string']
         self.ip = data12['security']['ip']
@@ -264,7 +280,7 @@ class Youku(VideoExtractor):
         stream_id = self.streams_sorted[0]['id']

         e_code = self.__class__.trans_e(
-            self.__class__.f_code_1,
+            self.f_code_1,
             base64.b64decode(bytes(self.ep, 'ascii'))
         )
         sid, token = e_code.split('_')
@@ -279,10 +295,10 @@ class Youku(VideoExtractor):
         for no in range(0, len(segs)):
             k = segs[no]['key']
             if k == -1: break # we hit the paywall; stop here
-            fileid, ep = self.__class__.generate_ep(no, streamfileid,
+            fileid, ep = self.__class__.generate_ep(self, no, streamfileid,
                                                     sid, token)
             q = parse.urlencode(dict(
-                ctype = 12,
+                ctype = self.ctype,
                 ev = 1,
                 K = k,
                 ep = parse.unquote(ep),
@@ -312,9 +328,69 @@ class Youku(VideoExtractor):
             if not kwargs['info_only']:
                 self.streams[stream_id]['src'] = ksegs

+    def open_download_by_vid(self, client_id, vid, **kwargs):
+        """self, str, str, **kwargs->None
+
+        Arguments:
+        client_id:        An ID per client. For now we only know Acfun's
+                          such ID.
+
+        vid:              An video ID for each video, starts with "C".
+
+        kwargs['embsig']: Youku COOP's anti hotlinking.
+                          For Acfun, an API call must be done to Acfun's
+                          server, or the "playsign" of the content of sign_url
+                          shall be empty.
+
+        Misc:
+        Override the original one with VideoExtractor.
+
+        Author:
+        Most of the credit are to @ERioK, who gave his POC.
+
+        History:
+        Jul.28.2016 Youku COOP now have anti hotlinking via embsig. """
+        self.f_code_1 = '10ehfkbv'  #can be retrived by running r.translate with the keys and the list e
+        self.f_code_2 = 'msjv7h2b'
+
+        # as in VideoExtractor
+        self.url = None
+        self.vid = vid
+        self.name = "优酷开放平台 (Youku COOP)"
+
+        #A little bit of work before self.prepare
+
+        #Change as Jul.28.2016 Youku COOP updates its platform to add ant hotlinking
+        if kwargs['embsig']:
+            sign_url = "https://api.youku.com/players/custom.json?client_id={client_id}&video_id={video_id}&embsig={embsig}".format(client_id = client_id, video_id = vid, embsig = kwargs['embsig'])
+        else:
+            sign_url = "https://api.youku.com/players/custom.json?client_id={client_id}&video_id={video_id}".format(client_id = client_id, video_id = vid)
+
+        playsign = json.loads(get_content(sign_url))['playsign']
+
+        #to be injected and replace ct10 and 12
+        api85_url = 'http://play.youku.com/partner/get.json?cid={client_id}&vid={vid}&ct=85&sign={playsign}'.format(client_id = client_id, vid = vid, playsign = playsign)
+        api86_url = 'http://play.youku.com/partner/get.json?cid={client_id}&vid={vid}&ct=86&sign={playsign}'.format(client_id = client_id, vid = vid, playsign = playsign)
+
+        self.prepare(api_url = api85_url, api12_url = api86_url, ctype = 86, **kwargs)
+
+        #exact copy from original VideoExtractor
+        if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
+            unset_proxy()
+
+        try:
+            self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
+        except:
+            self.streams_sorted = [dict([('itag', stream_type['itag'])] + list(self.streams[stream_type['itag']].items())) for stream_type in self.__class__.stream_types if stream_type['itag'] in self.streams]
+
+        self.extract(**kwargs)
+
+        self.download(**kwargs)
+
 site = Youku()
 download = site.download_by_url
 download_playlist = site.download_playlist_by_url

 youku_download_by_vid = site.download_by_vid
+youku_open_download_by_vid = site.open_download_by_vid
 # Used by: acfun.py bilibili.py miomio.py tudou.py
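A hedged usage sketch for the new COOP entry point, roughly as an embedding site such as AcFun would call it (the module path and the `client_id`/`vid`/`embsig` values below are placeholders, not real credentials):

```
from you_get.extractors.youku import youku_open_download_by_vid

youku_open_download_by_vid(client_id='0123456789abcdef',   # placeholder client ID
                           vid='CXXXXXXXXXXXX',            # COOP video IDs start with "C"
                           embsig='...',                    # anti-hotlinking signature; must be passed, may be falsy
                           output_dir='.', merge=True, info_only=False)
```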
@@ -56,7 +56,7 @@ class YouTube(VideoExtractor):
             f1def = match1(js, r'function %s(\(\w+\)\{[^\{]+\})' % re.escape(f1)) or \
                     match1(js, r'\W%s=function(\(\w+\)\{[^\{]+\})' % re.escape(f1))
             f1def = re.sub(r'([$\w]+\.)([$\w]+\(\w+,\d+\))', r'\2', f1def)
-            f1def = 'function %s%s' % (re.escape(f1), f1def)
+            f1def = 'function %s%s' % (f1, f1def)
             code = tr_js(f1def)
             f2s = set(re.findall(r'([$\w]+)\(\w+,\d+\)', f1def))
             for f2 in f2s:
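On the YouTube hunk above: `re.escape(f1)` remains correct inside the two regex patterns, but the changed line assembles JavaScript source text, not a pattern, so the raw name `f1` must be emitted; escaping a name containing `$` (common in obfuscated player code) would leak a backslash into the generated JS. A quick illustration with a made-up function name:

```
import re

f1 = 'xy$a'                                        # hypothetical obfuscated function name
re.escape(f1)                                      # 'xy\\$a' -- fine inside a regex pattern
'function %s%s' % (f1, '(sig){...}')               # function xy$a(sig){...}   -- valid JS
'function %s%s' % (re.escape(f1), '(sig){...}')    # function xy\$a(sig){...}  -- broken JS
```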
@@ -236,7 +236,7 @@ class YouTube(VideoExtractor):
             start = '{:0>2}:{:0>2}:{:06.3f}'.format(int(h), int(m), s).replace('.', ',')
             m, s = divmod(finish, 60); h, m = divmod(m, 60)
             finish = '{:0>2}:{:0>2}:{:06.3f}'.format(int(h), int(m), s).replace('.', ',')
-            content = text.firstChild.nodeValue
+            content = unescape_html(text.firstChild.nodeValue)

             srt += '%s\n' % str(seq)
             srt += '%s --> %s\n' % (start, finish)
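Caption text in YouTube's timedtext XML carries HTML entities, and the SRT output should contain the literal characters, hence the `unescape_html` call. The stdlib equivalent behaves like this (shown only as an illustration of what the helper is assumed to do):

```
from html import unescape    # stdlib stand-in for you-get's unescape_html

unescape('Tom &amp; Jerry&#39;s &quot;best of&quot;')
# -> Tom & Jerry's "best of"
```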
@@ -3,6 +3,7 @@
 import os.path
 import subprocess
 from ..util.strings import parameterize
+from ..common import print_more_compatible as print

 def get_usable_ffmpeg(cmd):
     try:
@@ -169,7 +170,7 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):

     params = [FFMPEG] + LOGLEVEL + ['-f', 'concat', '-safe', '-1', '-y', '-i']
     params.append(output + '.txt')
-    params += ['-c', 'copy', output]
+    params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc', output]

     subprocess.check_call(params)
     os.remove(output + '.txt')
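The segments being concatenated here carry AAC audio in ADTS framing (TS style), which an MP4 container cannot hold as-is; the `aac_adtstoasc` bitstream filter repackages the audio during the stream copy. With this change, the command the function issues looks roughly like the following (log-level flags omitted, file names are placeholders):

```
# Roughly what subprocess.check_call() receives after this patch:
params = ['ffmpeg', '-f', 'concat', '-safe', '-1', '-y', '-i', 'output.mp4.txt',
          '-c', 'copy', '-bsf:a', 'aac_adtstoasc', 'output.mp4']
```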
@@ -199,3 +200,44 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
     for file in files:
         os.remove(file + '.ts')
     return True
+
+def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.'):
+    """str, str->True
+    WARNING: NOT THE SAME PARMS AS OTHER FUNCTIONS!!!!!!
+    You can basicly download anything with this function
+    but better leave it alone with
+    """
+    output = title + '.' + ext
+
+    if not (output_dir == '.'):
+        output = output_dir + '/' + output
+
+    print('Downloading streaming content with FFmpeg, press q to stop recording...')
+    ffmpeg_params = [FFMPEG] + ['-y', '-re', '-i']
+    ffmpeg_params.append(files)  #not the same here!!!!
+
+    if FFMPEG == 'avconv':  #who cares?
+        ffmpeg_params += ['-c', 'copy', output]
+    else:
+        ffmpeg_params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc']
+
+    if params is not None:
+        if len(params) > 0:
+            for k, v in params:
+                ffmpeg_params.append(k)
+                ffmpeg_params.append(v)
+
+    ffmpeg_params.append(output)
+
+    print(' '.join(ffmpeg_params))
+
+    try:
+        a = subprocess.Popen(ffmpeg_params, stdin= subprocess.PIPE)
+        a.communicate()
+    except KeyboardInterrupt:
+        try:
+            a.stdin.write('q'.encode('utf-8'))
+        except:
+            pass
+
+    return True
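A hedged usage sketch for the new helper; note that despite its name, `files` here is a single input URL rather than a list, and Ctrl-C is forwarded to FFmpeg's stdin as `q` so the encoder can finalize the output file before exiting:

```
# Hypothetical call; the URL is a placeholder.
ffmpeg_download_stream('http://example.com/live/index.m3u8',
                       title='my_stream', ext='mp4',
                       params={}, output_dir='.')
```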
@@ -41,15 +41,32 @@ def download_rtmpdump_stream(url, title, ext,params={},output_dir='.'):
     subprocess.call(cmdline)
     return

-#
-#To be refactor
 #
 def play_rtmpdump_stream(player, url, params={}):
-    cmdline="rtmpdump -r '%s' "%url
+    #construct left side of pipe
+    cmdline = [RTMPDUMP, '-r']
+    cmdline.append(url)
+
+    #append other params if exist
     for key in params.keys():
-        cmdline+=key+" "+params[key] if params[key]!=None else ""+" "
-    cmdline+=" -o - | %s -"%player
-    print(cmdline)
-    os.system(cmdline)
+        cmdline.append(key)
+        if params[key]!=None:
+            cmdline.append(params[key])
+
+    cmdline.append('-o')
+    cmdline.append('-')
+
+    #pipe start
+    cmdline.append('|')
+    cmdline.append(player)
+    cmdline.append('-')
+
+    #logging
+    print("Call rtmpdump:\n"+" ".join(cmdline)+"\n")
+
+    #call RTMPDump!
+    subprocess.call(cmdline)
+
     # os.system("rtmpdump -r '%s' -y '%s' -o - | %s -" % (url, playpath, player))
     return
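For reference, the intended `rtmpdump -r URL -o - | player -` pipeline written purely with `subprocess` looks roughly like the sketch below (assuming `rtmpdump` and the player are on `PATH`; the URL and player names are placeholders, and this is an illustration, not code from the commit):

```
import subprocess

url = 'rtmp://example.com/live/stream'   # placeholder
player = 'mpv'                           # placeholder

rtmp = subprocess.Popen(['rtmpdump', '-r', url, '-o', '-'], stdout=subprocess.PIPE)
play = subprocess.Popen([player, '-'], stdin=rtmp.stdout)
rtmp.stdout.close()   # let rtmpdump receive SIGPIPE if the player exits first
play.wait()
```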
@@ -10,6 +10,7 @@ def legitimize(text, os=platform.system()):
     text = text.translate({
         0: None,
         ord('/'): '-',
+        ord('|'): '-',
     })

     if os == 'Windows':
@@ -20,7 +21,6 @@ def legitimize(text, os=platform.system()):
             ord('*'): '-',
             ord('?'): '-',
             ord('\\'): '-',
-            ord('|'): '-',
             ord('\"'): '\'',
             # Reserved in Windows VFAT
             ord('+'): '-',
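Net effect of the two `legitimize` hunks: `|` is now replaced with `-` on every platform instead of only on Windows. A tiny sketch of the translate table in action (only the cross-platform entries shown):

```
table = {0: None, ord('/'): '-', ord('|'): '-'}
'foo|bar/baz'.translate(table)   # -> 'foo-bar-baz'
```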
@@ -1,4 +1,4 @@
 #!/usr/bin/env python

 script_name = 'you-get'
-__version__ = '0.4.365'
+__version__ = '0.4.575'
@@ -21,9 +21,6 @@ class YouGetTests(unittest.TestCase):
     def test_mixcloud(self):
         mixcloud.download("http://www.mixcloud.com/DJVadim/north-america-are-you-ready/", info_only=True)

-    def test_vimeo(self):
-        vimeo.download("http://vimeo.com/56810854", info_only=True)
-
     def test_youtube(self):
         youtube.download("http://www.youtube.com/watch?v=pzKerr0JIPA", info_only=True)
         youtube.download("http://youtu.be/pzKerr0JIPA", info_only=True)
--- a/you-get
+++ b/you-get
@@ -1,7 +1,7 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 import os, sys

-_srcdir = 'src/'
+_srcdir = '%s/src/' % os.path.dirname(os.path.realpath(__file__))
 _filepath = os.path.dirname(sys.argv[0])
 sys.path.insert(1, os.path.join(_filepath, _srcdir))
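The launcher now derives `src/` from the real location of the script itself, so a symlinked `you-get` (e.g. dropped into a `bin/` directory) still finds the source tree. Illustration with hypothetical paths:

```
import os
# /usr/local/bin/you-get -> /home/user/you-get/you-get   (hypothetical symlink)
os.path.dirname(os.path.realpath(__file__))   # -> '/home/user/you-get'
# so _srcdir becomes '/home/user/you-get/src/'
```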
@@ -1,3 +1,3 @@
 #!/usr/bin/env zsh
-alias you-get="noglob $(dirname $0)/you-get"
-alias you-vlc="noglob $(dirname $0)/you-get --player vlc"
+alias you-get="noglob python3 $(dirname $0)/you-get"
+alias you-vlc="noglob python3 $(dirname $0)/you-get --player vlc"