Merge pull request #1 from soimort/develop

commit e0648a2ef8
Justsoos, 2019-08-12 05:50:17 +08:00, committed by GitHub
59 changed files with 2248 additions and 1424 deletions


@ -1,39 +0,0 @@
Please make sure these boxes are checked before submitting your issue, thank you!
- [ ] You can actually watch the video in your browser or mobile application, but cannot download it with `you-get`.
- [ ] Your `you-get` is up-to-date.
- [ ] I have read <https://github.com/soimort/you-get/wiki/FAQ> and tried to follow it.
- [ ] The issue is not yet reported on <https://github.com/soimort/you-get/issues> or <https://github.com/soimort/you-get/wiki/Known-Bugs>. If so, please add your comments under the existing issue.
- [ ] The issue (or question) is really about `you-get`, not about some other code or project.
Run the command with the `--debug` option, and paste the full output inside the fences:
```
[PASTE IN ME]
```
If there's anything else you would like to say (e.g. in case your issue is not about downloading a specific video; it might as well be a general discussion or proposal for a new feature), fill in the box below; otherwise, you may want to post an emoji or meme instead:
> [WRITE SOMETHING]
> [OR HAVE SOME :icecream:!]
Chinese translation last updated: February 26, 2016
Before submitting, please make sure you have checked all of the following!
- [ ] You can watch the video in your browser or on mobile, but cannot download it with `you-get`.
- [ ] Your `you-get` is up-to-date.
- [ ] I have read and followed the instructions at <https://github.com/soimort/you-get/wiki/FAQ>.
- [ ] The issue has not already been reported at <https://github.com/soimort/you-get/issues>, <https://github.com/soimort/you-get/wiki/FAQ> or <https://github.com/soimort/you-get/wiki/Known-Bugs>; otherwise, please report it under the existing issue.
- [ ] The issue is really about `you-get`, not about some other project.
Please run the command with `--debug` and paste the full output below:
```
[PASTE THE FULL LOG HERE]
```
If you have any other remarks, e.g. the issue only occurs with a certain video, or it is a general discussion or a proposal for a new feature, please add them below; or feel free to just post something cute:
> [YOUR CONTENT]
> [OR LICK SOME :icecream:!]


@ -1,48 +0,0 @@
**(PLEASE DELETE ALL THESE AFTER READING)**
Thank you for the pull request! `you-get` is a growing open source project, which would not have been possible without contributors like you.
Here are some simple rules to follow, please recheck them before sending the pull request:
- [ ] If you want to propose two or more unrelated patches, please open separate pull requests for them, instead of one;
- [ ] All pull requests should be based upon the latest `develop` branch;
- [ ] Name your branch (from which you will send the pull request) properly; use a meaningful name like `add-this-shining-feature` rather than just `develop`;
- [ ] All commit messages, as well as comments in code, should be written in understandable English.
As a contributor, you must be aware that
- [ ] You agree to contribute your code to this project, under the terms of the MIT license, so that anyone may freely use or redistribute it; of course, you will still retain the copyright for your own work.
- [ ] You may not contribute any code not authored by yourself, unless it is in the public domain or licensed under the MIT license itself.
Not all pull requests can eventually be merged. I consider merged / unmerged patches as equally important for the community: as long as you think a patch would be helpful, someone else might find it helpful, too; therefore they could take your fork and benefit in some way. In any case, I would like to thank you in advance for taking the time to contribute to this project.
Cheers,
Mort
**(PLEASE REPLACE ALL ABOVE WITH A DETAILED DESCRIPTION OF YOUR PULL REQUEST)**
Chinese translation last updated: February 26, 2016
**(PLEASE DELETE ALL OF THIS AFTER READING)**
Thank you for the pull request! `you-get` is a steadily growing open source project; thank you for your contribution.
Please recheck the following simple points:
- [ ] If you intend to propose two or more unrelated patches, please open a separate pull request for each of them, instead of a single one;
- [ ] All pull requests should be based on the latest `develop` branch;
- [ ] The branch from which you open the pull request should have a meaningful name, e.g. `add-this-shining-feature`, rather than just `develop`;
- [ ] All commit messages and code comments should be written in understandable English.
As a contributor, you should be aware that
- [ ] You agree to contribute your code under the MIT license, so that anyone may freely use or redistribute it; of course, you still retain the copyright of your code.
- [ ] You may not contribute code that you did not write yourself, unless it is in the public domain or licensed under the MIT license.
Not all pull requests will be merged. However, I consider merged and unmerged patches equally important: if you think a patch is helpful, someone else may think so too, and they can pick up your work from your fork and benefit from it. In any case, thank you for taking the trouble to contribute to this project.
Cheers,
Mort
**(PLEASE REPLACE ALL OF THE ABOVE WITH A DETAILED DESCRIPTION OF YOUR PULL REQUEST)**


@ -1,15 +1,23 @@
# https://travis-ci.org/soimort/you-get
language: python
python:
- "3.2"
- "3.3"
- "3.4"
- "3.5"
- "3.6"
- "nightly"
- "pypy3"
matrix:
include:
- python: "3.7"
dist: xenial
- python: "3.8-dev"
dist: xenial
- python: "nightly"
dist: xenial
before_install:
- pip install flake8
before_script:
- flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics
script: make test
sudo: false
notifications:
webhooks:
urls:


@ -1,27 +1,27 @@
# How to Contribute
# How to Report an Issue
`you-get` is currently experimenting with an aggressive approach to handling issues. Namely, a bug report must be addressed with some code via a pull request.
If you would like to report a problem you find when using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls), which should include:
## Report a broken extractor
1. A detailed description of the encountered problem;
2. At least one commit, addressing the problem through some unit test(s).
* Examples of good commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)
**How-To:** Please open a new pull request with the following changes:
PRs that fail to meet the above criteria may be closed summarily with no further action.
* Add a new test case in [tests/test.py](https://github.com/soimort/you-get/blob/develop/tests/test.py), with the failing URL(s) (see the sketch below).
A valid PR will remain open until its addressed problem is fixed.
The Travis CI build will (ideally) fail showing a :x:, which means you have successfully reported a broken extractor.
Such a valid PR will be either *closed* if it's fixed by another PR, or *merged* if it's fixed by follow-up commits from the reporter himself/herself.
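For illustration, here is a minimal sketch of such a test case (assuming the `unittest`-based layout of `tests/test.py` and the `you_get.extractors` import path; adapt it to the structure of the actual file):
```
# Hypothetical sketch; adapt to the existing structure of tests/test.py.
import unittest
from you_get.extractors import youtube

class TestBrokenExtractor(unittest.TestCase):
    def test_youtube(self):
        # Use the failing URL; info_only=True only probes metadata
        # instead of downloading the whole file.
        youtube.download('https://www.youtube.com/watch?v=jNQXAC9IVRw',
                         info_only=True)

if __name__ == '__main__':
    unittest.main()
```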
## Report other issues / Suggest a new feature
# How to Report an Issue
**How-To:** Please open a pull request with the proposed changes directly.
To prevent abuse of GitHub Issues, this project does not accept general issues.
A valid PR need not be complete (i.e., can be WIP), but it should contain at least one sensible, nontrivial commit.
If you encounter any problem while using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls). The PR should include:
## Hints
1. A detailed description of the problem;
2. At least one commit, containing unit test(s) **related to the problem**. **Do not submit a PR that merely makes arbitrary changes to unrelated files.**
* Examples of valid commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)
* The [`develop`](https://github.com/soimort/you-get/tree/develop) branch is where your pull request goes.
* Remember to rebase.
* Document your PR clearly, and if applicable, provide some sample links for reviewers to test with.
* Write well-formatted, easy-to-understand commit messages. If you don't know how, look at existing ones.
* We will not ask you to sign a CLA, but you must assure that your code can be legally redistributed (under the terms of the MIT license).
PRs that do not meet the above criteria may be closed directly.
A valid PR will be kept open until the corresponding problem is fixed.


@ -1,15 +1,14 @@
==============================================
This is a copy of the MIT license.
==============================================
Copyright (C) 2012-2017 Mort Yao <mort.yao@gmail.com>
Copyright (C) 2012 Boyu Guo <iambus@gmail.com>
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
Copyright (c) 2012-2019 Mort Yao <mort.yao@gmail.com>
Copyright (c) 2012 Boyu Guo <iambus@gmail.com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

README.md

@ -4,6 +4,10 @@
[![Build Status](https://travis-ci.org/soimort/you-get.svg)](https://travis-ci.org/soimort/you-get)
[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/soimort/you-get?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
**NOTICE: Read [this](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md) if you are looking for the conventional "Issues" tab.**
---
[You-Get](https://you-get.org/) is a tiny command-line utility to download media contents (videos, audios, images) from the Web, in case there is no other handy way to do it.
Here's how you use `you-get` to download a video from [YouTube](https://www.youtube.com/watch?v=jNQXAC9IVRw):
@ -49,10 +53,10 @@ Are you a Python programmer? Then check out [the source](https://github.com/soim
### Prerequisites
The following dependencies are required and must be installed separately, unless you are using a pre-built package or chocolatey on Windows:
The following dependencies are necessary:
* **[Python 3](https://www.python.org/downloads/)**
* **[FFmpeg](https://www.ffmpeg.org/)** (strongly recommended) or [Libav](https://libav.org/)
* **[Python](https://www.python.org/downloads/)** 3.2 or above
* **[FFmpeg](https://www.ffmpeg.org/)** 1.0 or above
* (Optional) [RTMPDump](https://rtmpdump.mplayerhq.hu/)
### Option 1: Install via pip
@ -61,17 +65,13 @@ The official release of `you-get` is distributed on [PyPI](https://pypi.python.o
$ pip3 install you-get
### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen)
### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen) (for Zsh users)
Add the following line to your `.zshrc`:
antigen bundle soimort/you-get
### Option 3: Use a pre-built package (Windows only)
Download the `exe` (standalone) or `7z` (all dependencies included) from: <https://github.com/soimort/you-get/releases/latest>.
### Option 4: Download from GitHub
### Option 3: Download from GitHub
You may either download the [stable](https://github.com/soimort/you-get/archive/master.zip) (identical with the latest release on PyPI) or the [develop](https://github.com/soimort/you-get/archive/develop.zip) (more hotfixes, unstable features) branch of `you-get`. Unzip it, and put the directory containing the `you-get` script into your `PATH`.
@ -89,7 +89,7 @@ $ python3 setup.py install --user
to install `you-get` to a permanent path.
### Option 5: Git clone
### Option 4: Git clone
This is the recommended way for all developers, even if you don't often code in Python.
@ -99,13 +99,7 @@ $ git clone git://github.com/soimort/you-get.git
Then put the cloned directory into your `PATH`, or run `./setup.py install` to install `you-get` to a permanent path.
### Option 6: Using [Chocolatey](https://chocolatey.org/) (Windows only)
```
> choco install you-get
```
### Option 7: Homebrew (Mac only)
### Option 5: Homebrew (Mac only)
You can install `you-get` easily via:
@ -113,6 +107,14 @@ You can install `you-get` easily via:
$ brew install you-get
```
### Option 6: pkg (FreeBSD only)
You can install `you-get` easily via:
```
# pkg install you-get
```
### Shell completion
Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](https://github.com/soimort/you-get/tree/develop/contrib/completion). Please consult your shell's manual for how to take advantage of them.
@ -131,12 +133,6 @@ or download the latest release via:
$ you-get https://github.com/soimort/you-get/archive/master.zip
```
or use [chocolatey package manager](https://chocolatey.org):
```
> choco upgrade you-get
```
In order to get the latest ```develop``` branch without messing up the PIP, you can try:
```
@ -154,22 +150,54 @@ $ you-get -i 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
streams: # Available quality and codecs
[ DASH ] ____________________________________
- itag: 242
container: webm
quality: 320x240
size: 0.6 MiB (618358 bytes)
# download-with: you-get --itag=242 [URL]
- itag: 395
container: mp4
quality: 320x240
size: 0.5 MiB (550743 bytes)
# download-with: you-get --itag=395 [URL]
- itag: 133
container: mp4
quality: 320x240
size: 0.5 MiB (498558 bytes)
# download-with: you-get --itag=133 [URL]
- itag: 278
container: webm
quality: 192x144
size: 0.4 MiB (392857 bytes)
# download-with: you-get --itag=278 [URL]
- itag: 160
container: mp4
quality: 192x144
size: 0.4 MiB (370882 bytes)
# download-with: you-get --itag=160 [URL]
- itag: 394
container: mp4
quality: 192x144
size: 0.4 MiB (367261 bytes)
# download-with: you-get --itag=394 [URL]
[ DEFAULT ] _________________________________
- itag: 43
container: webm
quality: medium
size: 0.5 MiB (564215 bytes)
size: 0.5 MiB (568748 bytes)
# download-with: you-get --itag=43 [URL]
- itag: 18
container: mp4
quality: medium
# download-with: you-get --itag=18 [URL]
- itag: 5
container: flv
quality: small
# download-with: you-get --itag=5 [URL]
# download-with: you-get --itag=18 [URL]
- itag: 36
container: 3gp
@ -182,23 +210,24 @@ streams: # Available quality and codecs
# download-with: you-get --itag=17 [URL]
```
The format marked with `DEFAULT` is the one you will get by default. If that looks cool to you, download it:
By default, the one on the top is the one you will get. If that looks cool to you, download it:
```
$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
stream:
- itag: 43
- itag: 242
container: webm
quality: medium
size: 0.5 MiB (564215 bytes)
# download-with: you-get --itag=43 [URL]
quality: 320x240
size: 0.6 MiB (618358 bytes)
# download-with: you-get --itag=242 [URL]
Downloading zoo.webm ...
100.0% ( 0.5/0.5 MB) ├████████████████████████████████████████┤[1/1] 7 MB/s
Downloading Me at the zoo.webm ...
100% ( 0.6/ 0.6MB) ├██████████████████████████████████████████████████████████████████████████████┤[2/2] 2 MB/s
Merging video parts... Merged into Me at the zoo.webm
Saving Me at the zoo.en.srt ...Done.
Saving Me at the zoo.en.srt ... Done.
```
(If a YouTube video has any closed captions, they will be downloaded together with the video file, in SubRip subtitle format.)
@ -298,7 +327,7 @@ However, the system proxy setting (i.e. the environment variable `http_proxy`) i
### Watch a video
Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mplayer` or `vlc`, instead of downloading it:
Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mpv` or `vlc`, instead of downloading it:
```
$ you-get -p vlc 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
@ -374,11 +403,10 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| **niconico<br/>ニコニコ動画** | <http://www.nicovideo.jp/> |✓| | |
| **163<br/>网易视频<br/>网易云音乐** | <http://v.163.com/><br/><http://music.163.com/> |✓| |✓|
| 56网 | <http://www.56.com/> |✓| | |
| **AcFun** | <http://www.acfun.tv/> |✓| | |
| **AcFun** | <http://www.acfun.cn/> |✓| | |
| **Baidu<br/>百度贴吧** | <http://tieba.baidu.com/> |✓|✓| |
| 爆米花网 | <http://www.baomihua.com/> |✓| | |
| **bilibili<br/>哔哩哔哩** | <http://www.bilibili.com/> |✓| | |
| Dilidili | <http://www.dilidili.com/> |✓| | |
| 豆瓣 | <http://www.douban.com/> |✓| |✓|
| 斗鱼 | <http://www.douyutv.com/> |✓| | |
| Panda<br/>熊猫 | <http://www.panda.tv/> |✓| | |
@ -407,15 +435,16 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| **Youku<br/>优酷** | <http://www.youku.com/> |✓| | |
| 战旗TV | <http://www.zhanqi.tv/lives> |✓| | |
| 央视网 | <http://www.cntv.cn/> |✓| | |
| 花瓣 | <http://huaban.com/> | |✓| |
| Naver<br/>네이버 | <http://tvcast.naver.com/> |✓| | |
| 芒果TV | <http://www.mgtv.com/> |✓| | |
| 火猫TV | <http://www.huomao.com/> |✓| | |
| 全民直播 | <http://www.quanmin.tv/> |✓| | |
| 阳光宽频网 | <http://www.365yg.com/> |✓| | |
| 西瓜视频 | <https://www.ixigua.com/> |✓| | |
| 快手 | <https://www.kuaishou.com/> |✓|✓| |
| 抖音 | <https://www.douyin.com/> |✓| | |
| TikTok | <https://www.tiktok.com/> |✓| | |
| 中国体育(TV) | <http://v.zhibo.tv/> </br><http://video.zhibo.tv/> |✓| | |
| 知乎 | <https://www.zhihu.com/> |✓| | |
For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page.
@ -423,7 +452,7 @@ For all other sites not on the list, the universal extractor will take care of f
If something is broken and `you-get` can't get you things you want, don't panic. (Yes, this happens all the time!)
Check if it's already a known problem on <https://github.com/soimort/you-get/wiki/Known-Bugs>. If not, follow the guidelines on [how to report a broken extractor](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md#report-a-broken-extractor).
Check if it's already a known problem on <https://github.com/soimort/you-get/wiki/Known-Bugs>. If not, follow the guidelines on [how to report an issue](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md).
## Getting Involved


@ -10,6 +10,7 @@ import socket
import locale
import logging
import argparse
import ssl
from http import cookiejar
from importlib import import_module
from urllib import request, parse, error
@ -24,6 +25,7 @@ sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')
SITES = {
'163' : 'netease',
'56' : 'w56',
'365yg' : 'toutiao',
'acfun' : 'acfun',
'archive' : 'archive',
'baidu' : 'baidu',
@ -36,13 +38,11 @@ SITES = {
'cbs' : 'cbs',
'coub' : 'coub',
'dailymotion' : 'dailymotion',
'dilidili' : 'dilidili',
'douban' : 'douban',
'douyin' : 'douyin',
'douyu' : 'douyutv',
'ehow' : 'ehow',
'facebook' : 'facebook',
'fantasy' : 'fantasy',
'fc2' : 'fc2video',
'flickr' : 'flickr',
'freesound' : 'freesound',
@ -50,7 +50,6 @@ SITES = {
'google' : 'google',
'giphy' : 'giphy',
'heavy-music' : 'heavymusic',
'huaban' : 'huaban',
'huomao' : 'huomaotv',
'iask' : 'sina',
'icourses' : 'icourses',
@ -64,6 +63,7 @@ SITES = {
'iqiyi' : 'iqiyi',
'ixigua' : 'ixigua',
'isuntv' : 'suntv',
'iwara' : 'iwara',
'joy' : 'joy',
'kankanews' : 'bilibili',
'khanacademy' : 'khan',
@ -74,6 +74,7 @@ SITES = {
'le' : 'le',
'letv' : 'le',
'lizhi' : 'lizhi',
'longzhu' : 'longzhu',
'magisto' : 'magisto',
'metacafe' : 'metacafe',
'mgtv' : 'mgtv',
@ -81,16 +82,15 @@ SITES = {
'mixcloud' : 'mixcloud',
'mtv81' : 'mtv81',
'musicplayon' : 'musicplayon',
'miaopai' : 'yixia',
'naver' : 'naver',
'7gogo' : 'nanagogo',
'nicovideo' : 'nicovideo',
'panda' : 'panda',
'pinterest' : 'pinterest',
'pixnet' : 'pixnet',
'pptv' : 'pptv',
'qingting' : 'qingting',
'qq' : 'qq',
'quanmin' : 'quanmin',
'showroom-live' : 'showroom',
'sina' : 'sina',
'smgbb' : 'bilibili',
@ -98,6 +98,7 @@ SITES = {
'soundcloud' : 'soundcloud',
'ted' : 'ted',
'theplatform' : 'theplatform',
'tiktok' : 'tiktok',
'tucao' : 'tucao',
'tudou' : 'tudou',
'tumblr' : 'tumblr',
@ -117,30 +118,32 @@ SITES = {
'xiaojiadianvideo' : 'fc2video',
'ximalaya' : 'ximalaya',
'yinyuetai' : 'yinyuetai',
'miaopai' : 'yixia',
'yizhibo' : 'yizhibo',
'youku' : 'youku',
'iwara' : 'iwara',
'youtu' : 'youtube',
'youtube' : 'youtube',
'zhanqi' : 'zhanqi',
'365yg' : 'toutiao',
'zhibo' : 'zhibo',
'zhihu' : 'zhihu',
}
dry_run = False
json_output = False
force = False
skip_existing_file_size_check = False
player = None
extractor_proxy = None
cookies = None
output_filename = None
auto_rename = False
insecure = False
fake_headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', # noqa
'Accept-Charset': 'UTF-8,*;q=0.5',
'Accept-Encoding': 'gzip,deflate,sdch',
'Accept-Language': 'en-US,en;q=0.8',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0', # noqa
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0', # noqa
}
if sys.stdout.isatty():
@ -268,7 +271,15 @@ def matchall(text, patterns):
def launch_player(player, urls):
import subprocess
import shlex
subprocess.call(shlex.split(player) + list(urls))
if sys.version_info >= (3, 3):
    import shutil
    exefile = shlex.split(player)[0]
    if shutil.which(exefile) is not None:
        subprocess.call(shlex.split(player) + list(urls))
    else:
        log.wtf('[Failed] Cannot find player "%s"' % exefile)
else:
    subprocess.call(shlex.split(player) + list(urls))
def parse_query_param(url, param):
@ -366,20 +377,30 @@ def get_decoded_html(url, faker=False):
return data
def get_location(url):
def get_location(url, headers=None, get_method='HEAD'):
logging.debug('get_location: %s' % url)
response = request.urlopen(url)
# urllib will follow redirections and it's too much code to tell urllib
# not to do that
return response.geturl()
if headers:
    req = request.Request(url, headers=headers)
else:
    req = request.Request(url)
req.get_method = lambda: get_method
res = urlopen_with_retry(req)
return res.geturl()
def urlopen_with_retry(*args, **kwargs):
retry_time = 3
for i in range(retry_time):
try:
return request.urlopen(*args, **kwargs)
if insecure:
    # ignore ssl errors
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return request.urlopen(*args, context=ctx, **kwargs)
else:
    return request.urlopen(*args, **kwargs)
except socket.timeout as e:
logging.debug('request attempt %s timeout' % str(i + 1))
if i + 1 == retry_time:
@ -423,17 +444,17 @@ def get_content(url, headers={}, decoded=True):
# Decode the response body
if decoded:
charset = match1(
response.getheader('Content-Type'), r'charset=([\w-]+)'
response.getheader('Content-Type', ''), r'charset=([\w-]+)'
)
if charset is not None:
data = data.decode(charset)
data = data.decode(charset, 'ignore')
else:
data = data.decode('utf-8', 'ignore')
return data
def post_content(url, headers={}, post_data={}, decoded=True):
def post_content(url, headers={}, post_data={}, decoded=True, **kwargs):
"""Post the content of a URL via sending a HTTP POST request.
Args:
@ -444,14 +465,19 @@ def post_content(url, headers={}, post_data={}, decoded=True):
Returns:
The content as a string.
"""
logging.debug('post_content: %s \n post_data: %s' % (url, post_data))
if kwargs.get('post_data_raw'):
logging.debug('post_content: %s\npost_data_raw: %s' % (url, kwargs['post_data_raw']))
else:
logging.debug('post_content: %s\npost_data: %s' % (url, post_data))
req = request.Request(url, headers=headers)
if cookies:
cookies.add_cookie_header(req)
req.headers.update(req.unredirected_hdrs)
post_data_enc = bytes(parse.urlencode(post_data), 'utf-8')
if kwargs.get('post_data_raw'):
post_data_enc = bytes(kwargs['post_data_raw'], 'utf-8')
else:
post_data_enc = bytes(parse.urlencode(post_data), 'utf-8')
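# post_data_raw is sent as-is (e.g. a pre-serialized JSON string), while post_data is URL-encoded as form fields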
response = urlopen_with_retry(req, data=post_data_enc)
data = response.read()
@ -493,7 +519,7 @@ def urls_size(urls, faker=False, headers={}):
return sum([url_size(url, faker=faker, headers=headers) for url in urls])
def get_head(url, headers={}, get_method='HEAD'):
def get_head(url, headers=None, get_method='HEAD'):
logging.debug('get_head: %s' % url)
if headers:
@ -502,7 +528,7 @@ def get_head(url, headers={}, get_method='HEAD'):
req = request.Request(url)
req.get_method = lambda: get_method
res = urlopen_with_retry(req)
return dict(res.headers)
return res.headers
def url_info(url, faker=False, headers={}):
@ -596,29 +622,60 @@ def url_save(
# the key must be 'Referer' for the hack here
if refer is not None:
tmp_headers['Referer'] = refer
file_size = url_size(url, faker=faker, headers=tmp_headers)
if type(url) is list:
file_size = urls_size(url, faker=faker, headers=tmp_headers)
is_chunked, urls = True, url
else:
file_size = url_size(url, faker=faker, headers=tmp_headers)
is_chunked, urls = False, [url]
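# a list of URLs means the stream is served in multiple chunks; the total size is the sum of all chunk sizes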
if os.path.exists(filepath):
if not force and file_size == os.path.getsize(filepath):
if not is_part:
if bar:
bar.done()
print(
'Skipping {}: file already exists'.format(
tr(os.path.basename(filepath))
)
)
continue_renameing = True
while continue_renameing:
continue_renameing = False
if os.path.exists(filepath):
if not force and (file_size == os.path.getsize(filepath) or skip_existing_file_size_check):
if not is_part:
if bar:
bar.done()
if skip_existing_file_size_check:
log.w(
'Skipping {} without checking size: file already exists'.format(
tr(os.path.basename(filepath))
)
)
else:
log.w(
'Skipping {}: file already exists'.format(
tr(os.path.basename(filepath))
)
)
else:
if bar:
bar.update_received(file_size)
return
else:
if bar:
bar.update_received(file_size)
return
else:
if not is_part:
if bar:
bar.done()
print('Overwriting %s' % tr(os.path.basename(filepath)), '...')
elif not os.path.exists(os.path.dirname(filepath)):
os.mkdir(os.path.dirname(filepath))
if not is_part:
if bar:
bar.done()
if not force and auto_rename:
path, ext = os.path.basename(filepath).rsplit('.', 1)
finder = re.compile(' \([1-9]\d*?\)$')
if (finder.search(path) is None):
thisfile = path + ' (1).' + ext
else:
def numreturn(a):
return ' (' + str(int(a.group()[2:-1]) + 1) + ').'
thisfile = finder.sub(numreturn, path) + ext
filepath = os.path.join(os.path.dirname(filepath), thisfile)
print('Changing name to %s' % tr(os.path.basename(filepath)), '...')
continue_renameing = True
continue
if log.yes_or_no('File with this name already exists. Overwrite?'):
log.w('Overwriting %s ...' % tr(os.path.basename(filepath)))
else:
return
elif not os.path.exists(os.path.dirname(filepath)):
os.mkdir(os.path.dirname(filepath))
temp_filepath = filepath + '.download' if file_size != float('inf') \
else filepath
@ -633,70 +690,78 @@ def url_save(
else:
open_mode = 'wb'
if received < file_size:
if faker:
tmp_headers = fake_headers
'''
if parameter headers passed in, we have it copied as tmp_header
elif headers:
headers = headers
else:
headers = {}
'''
if received:
tmp_headers['Range'] = 'bytes=' + str(received) + '-'
if refer:
tmp_headers['Referer'] = refer
for url in urls:
received_chunk = 0
if received < file_size:
if faker:
tmp_headers = fake_headers
'''
if parameter headers passed in, we have it copied as tmp_header
elif headers:
headers = headers
else:
headers = {}
'''
if received and not is_chunked: # only request a range when not chunked
tmp_headers['Range'] = 'bytes=' + str(received) + '-'
if refer:
tmp_headers['Referer'] = refer
if timeout:
response = urlopen_with_retry(
request.Request(url, headers=tmp_headers), timeout=timeout
)
else:
response = urlopen_with_retry(
request.Request(url, headers=tmp_headers)
)
try:
range_start = int(
response.headers[
'content-range'
][6:].split('/')[0].split('-')[0]
)
end_length = int(
response.headers['content-range'][6:].split('/')[1]
)
range_length = end_length - range_start
except:
content_length = response.headers['content-length']
range_length = int(content_length) if content_length is not None \
else float('inf')
if timeout:
response = urlopen_with_retry(
request.Request(url, headers=tmp_headers), timeout=timeout
)
else:
response = urlopen_with_retry(
request.Request(url, headers=tmp_headers)
)
try:
range_start = int(
response.headers[
'content-range'
][6:].split('/')[0].split('-')[0]
)
end_length = int(
response.headers['content-range'][6:].split('/')[1]
)
range_length = end_length - range_start
except:
content_length = response.headers['content-length']
range_length = int(content_length) if content_length is not None \
else float('inf')
if file_size != received + range_length:
received = 0
if bar:
bar.received = 0
open_mode = 'wb'
with open(temp_filepath, open_mode) as output:
while True:
buffer = None
try:
buffer = response.read(1024 * 256)
except socket.timeout:
pass
if not buffer:
if received == file_size: # Download finished
break
# Unexpected termination. Retry request
tmp_headers['Range'] = 'bytes=' + str(received) + '-'
response = urlopen_with_retry(
request.Request(url, headers=tmp_headers)
)
continue
output.write(buffer)
received += len(buffer)
if is_chunked: # always append if chunked
open_mode = 'ab'
elif file_size != received + range_length: # is it ever necessary?
received = 0
if bar:
bar.update_received(len(buffer))
bar.received = 0
open_mode = 'wb'
with open(temp_filepath, open_mode) as output:
while True:
buffer = None
try:
buffer = response.read(1024 * 256)
except socket.timeout:
pass
if not buffer:
if is_chunked and received_chunk == range_length:
break
elif not is_chunked and received == file_size: # Download finished
break
# Unexpected termination. Retry request
if not is_chunked: # only request a byte range when the download is not chunked
tmp_headers['Range'] = 'bytes=' + str(received) + '-'
response = urlopen_with_retry(
request.Request(url, headers=tmp_headers)
)
continue
output.write(buffer)
received += len(buffer)
received_chunk += len(buffer)
if bar:
bar.update_received(len(buffer))
assert received == os.path.getsize(temp_filepath), '%s == %s == %s' % (
received, os.path.getsize(temp_filepath), temp_filepath
@ -820,13 +885,16 @@ class DummyProgressBar:
pass
def get_output_filename(urls, title, ext, output_dir, merge):
def get_output_filename(urls, title, ext, output_dir, merge, **kwargs):
# lame hack for the --output-filename option
global output_filename
if output_filename:
result = output_filename
if kwargs.get('part', -1) >= 0:
result = '%s[%02d]' % (result, kwargs.get('part'))
if ext:
return output_filename + '.' + ext
return output_filename
result = '%s.%s' % (result, ext)
return result
merged_ext = ext
if (len(urls) > 1) and merge:
@ -843,7 +911,11 @@ def get_output_filename(urls, title, ext, output_dir, merge):
merged_ext = 'mkv'
else:
merged_ext = 'ts'
return '%s.%s' % (title, merged_ext)
result = title
if kwargs.get('part', -1) >= 0:
result = '%s[%02d]' % (result, kwargs.get('part'))
result = '%s.%s' % (result, merged_ext)
return result
def print_user_agent(faker=False):
urllib_default_user_agent = 'Python-urllib/%d.%d' % sys.version_info[:2]
@ -863,7 +935,10 @@ def download_urls(
return
if dry_run:
print_user_agent(faker=faker)
print('Real URLs:\n%s' % '\n'.join(urls))
try:
print('Real URLs:\n%s' % '\n'.join(urls))
except:
print('Real URLs:\n%s' % '\n'.join([j for i in urls for j in i]))
return
if player:
@ -883,9 +958,13 @@ def download_urls(
output_filepath = os.path.join(output_dir, output_filename)
if total_size:
if not force and os.path.exists(output_filepath) \
and os.path.getsize(output_filepath) >= total_size * 0.9:
print('Skipping %s: file already exists' % output_filepath)
if not force and os.path.exists(output_filepath) and not auto_rename\
and (os.path.getsize(output_filepath) >= total_size * 0.9\
or skip_existing_file_size_check):
if skip_existing_file_size_check:
log.w('Skipping %s without checking size: file already exists' % output_filepath)
else:
log.w('Skipping %s: file already exists' % output_filepath)
print()
return
bar = SimpleProgressBar(total_size, len(urls))
@ -903,16 +982,16 @@ def download_urls(
bar.done()
else:
parts = []
print('Downloading %s.%s ...' % (tr(title), ext))
print('Downloading %s ...' % tr(output_filename))
bar.update()
for i, url in enumerate(urls):
filename = '%s[%02d].%s' % (title, i, ext)
filepath = os.path.join(output_dir, filename)
parts.append(filepath)
output_filename_i = get_output_filename(urls, title, ext, output_dir, merge, part=i)
output_filepath_i = os.path.join(output_dir, output_filename_i)
parts.append(output_filepath_i)
# print 'Downloading %s [%s/%s]...' % (tr(filename), i + 1, len(urls))
bar.update_piece(i + 1)
url_save(
url, filepath, bar, refer=refer, is_part=True, faker=faker,
url, output_filepath_i, bar, refer=refer, is_part=True, faker=faker,
headers=headers, **kwargs
)
bar.done()
@ -1225,27 +1304,89 @@ def download_main(download, download_playlist, urls, playlist, **kwargs):
def load_cookies(cookiefile):
global cookies
try:
cookies = cookiejar.MozillaCookieJar(cookiefile)
cookies.load()
except Exception:
import sqlite3
if cookiefile.endswith('.txt'):
# MozillaCookieJar treats prefix '#HttpOnly_' as comments incorrectly!
# do not use its load()
# see also:
# - https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.MozillaCookieJar
# - https://github.com/python/cpython/blob/4b219ce/Lib/http/cookiejar.py#L2014
# - https://curl.haxx.se/libcurl/c/CURLOPT_COOKIELIST.html#EXAMPLE
#cookies = cookiejar.MozillaCookieJar(cookiefile)
#cookies.load()
from http.cookiejar import Cookie
cookies = cookiejar.MozillaCookieJar()
con = sqlite3.connect(cookiefile)
cur = con.cursor()
try:
cur.execute("""SELECT host, path, isSecure, expiry, name, value
FROM moz_cookies""")
for item in cur.fetchall():
c = cookiejar.Cookie(
0, item[4], item[5], None, False, item[0],
item[0].startswith('.'), item[0].startswith('.'),
item[1], False, item[2], item[3], item[3] == '', None,
None, {},
)
now = time.time()
ignore_discard, ignore_expires = False, False
with open(cookiefile, 'r') as f:
for line in f:
# last field may be absent, so keep any trailing tab
if line.endswith("\n"): line = line[:-1]
# skip comments and blank lines XXX what is $ for?
if (line.strip().startswith(("#", "$")) or
line.strip() == ""):
if not line.strip().startswith('#HttpOnly_'): # skip for #HttpOnly_
continue
domain, domain_specified, path, secure, expires, name, value = \
line.split("\t")
secure = (secure == "TRUE")
domain_specified = (domain_specified == "TRUE")
if name == "":
# cookies.txt regards 'Set-Cookie: foo' as a cookie
# with no name, whereas http.cookiejar regards it as a
# cookie with no value.
name = value
value = None
initial_dot = domain.startswith(".")
if not line.strip().startswith('#HttpOnly_'): # skip for #HttpOnly_
assert domain_specified == initial_dot
discard = False
if expires == "":
expires = None
discard = True
# assume path_specified is false
c = Cookie(0, name, value,
None, False,
domain, domain_specified, initial_dot,
path, False,
secure,
expires,
discard,
None,
None,
{})
if not ignore_discard and c.discard:
continue
if not ignore_expires and c.is_expired(now):
continue
cookies.set_cookie(c)
except Exception:
pass
elif cookiefile.endswith(('.sqlite', '.sqlite3')):
import sqlite3, shutil, tempfile
temp_dir = tempfile.gettempdir()
temp_cookiefile = os.path.join(temp_dir, 'temp_cookiefile.sqlite')
shutil.copy2(cookiefile, temp_cookiefile)
cookies = cookiejar.MozillaCookieJar()
con = sqlite3.connect(temp_cookiefile)
cur = con.cursor()
cur.execute("""SELECT host, path, isSecure, expiry, name, value
FROM moz_cookies""")
for item in cur.fetchall():
c = cookiejar.Cookie(
0, item[4], item[5], None, False, item[0],
item[0].startswith('.'), item[0].startswith('.'),
item[1], False, item[2], item[3], item[3] == '', None,
None, {},
)
cookies.set_cookie(c)
else:
log.e('[error] unsupported cookies format')
# TODO: Chromium Cookies
# SELECT host_key, path, secure, expires_utc, name, encrypted_value
# FROM cookies
@ -1332,6 +1473,10 @@ def script_main(download, download_playlist, **kwargs):
'-f', '--force', action='store_true', default=False,
help='Force overwriting existing files'
)
download_grp.add_argument(
'--skip-existing-file-size-check', action='store_true', default=False,
help='Skip existing file without checking file size'
)
download_grp.add_argument(
'-F', '--format', metavar='STREAM_ID',
help='Set video format to STREAM_ID'
@ -1370,6 +1515,15 @@ def script_main(download, download_playlist, **kwargs):
'-l', '--playlist', action='store_true',
help='Prefer to download a playlist'
)
download_grp.add_argument(
'-a', '--auto-rename', action='store_true', default=False,
help='Auto rename same name different files'
)
download_grp.add_argument(
'-k', '--insecure', action='store_true', default=False,
help='ignore ssl errors'
)
proxy_grp = parser.add_argument_group('Proxy options')
proxy_grp = proxy_grp.add_mutually_exclusive_group()
@ -1409,16 +1563,24 @@ def script_main(download, download_playlist, **kwargs):
logging.getLogger().setLevel(logging.DEBUG)
global force
global skip_existing_file_size_check
global dry_run
global json_output
global player
global extractor_proxy
global output_filename
global auto_rename
global insecure
output_filename = args.output_filename
extractor_proxy = args.extractor_proxy
info_only = args.info
if args.force:
force = True
if args.skip_existing_file_size_check:
skip_existing_file_size_check = True
if args.auto_rename:
auto_rename = True
if args.url:
dry_run = True
if args.json:
@ -1438,6 +1600,11 @@ def script_main(download, download_playlist, **kwargs):
player = args.player
caption = False
if args.insecure:
# ignore ssl
insecure = True
if args.no_proxy:
set_http_proxy('')
else:
@ -1523,9 +1690,9 @@ def google_search(url):
url = 'https://www.google.com/search?tbm=vid&q=%s' % parse.quote(keywords)
page = get_content(url, headers=fake_headers)
videos = re.findall(
r'<a href="(https?://[^"]+)" onmousedown="[^"]+">([^<]+)<', page
r'<a href="(https?://[^"]+)" onmousedown="[^"]+"><h3 class="[^"]*">([^<]+)<', page
)
vdurs = re.findall(r'<span class="vdur _dwc">([^<]+)<', page)
vdurs = re.findall(r'<span class="vdur[^"]*">([^<]+)<', page)
durs = [r1(r'(\d+:\d+)', unescape_html(dur)) for dur in vdurs]
print('Google Videos search:')
for v in zip(videos, durs):
@ -1554,6 +1721,11 @@ def url_to_module(url):
domain = r1(r'(\.[^.]+\.[^.]+)$', video_host) or video_host
assert domain, 'unsupported url: ' + url
# all non-ASCII code points must be quoted (percent-encoded UTF-8)
url = ''.join([ch if ord(ch) in range(128) else parse.quote(ch) for ch in url])
video_host = r1(r'https?://([^/]+)/', url)
video_url = r1(r'https?://[^/]+(.*)', url)
k = r1(r'([^.]+)', domain)
if k in SITES:
return (
@ -1561,15 +1733,11 @@ def url_to_module(url):
url
)
else:
import http.client
video_host = r1(r'https?://([^/]+)/', url) # .cn could be removed
if url.startswith('https://'):
conn = http.client.HTTPSConnection(video_host)
else:
conn = http.client.HTTPConnection(video_host)
conn.request('HEAD', video_url, headers=fake_headers)
res = conn.getresponse()
location = res.getheader('location')
try:
location = get_location(url) # t.co isn't happy with fake_headers
except:
location = get_location(url, headers=fake_headers)
if location and location != url and not location.startswith('/'):
return url_to_module(location)
else:


@ -1,10 +1,11 @@
#!/usr/bin/env python
from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy, get_content, dry_run
from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy, get_content, dry_run, player
from .common import print_more_compatible as print
from .util import log
from . import json_output
import os
import sys
class Extractor():
def __init__(self, *args):
@ -32,7 +33,8 @@ class VideoExtractor():
self.out = False
self.ua = None
self.referer = None
self.danmuku = None
self.danmaku = None
self.lyrics = None
if args:
self.url = args[0]
@ -105,7 +107,7 @@ class VideoExtractor():
if 'quality' in stream:
print(" quality: %s" % stream['quality'])
if 'size' in stream and stream['container'].lower() != 'm3u8':
if 'size' in stream and 'container' in stream and stream['container'].lower() != 'm3u8':
if stream['size'] != float('inf') and stream['size'] != 0:
print(" size: %s MiB (%s bytes)" % (round(stream['size'] / 1048576, 1), stream['size']))
@ -130,6 +132,8 @@ class VideoExtractor():
print(" url: %s" % self.url)
print()
sys.stdout.flush()
def p(self, stream_id=None):
maybe_print("site: %s" % self.__class__.name)
maybe_print("title: %s" % self.title)
@ -154,9 +158,10 @@ class VideoExtractor():
for stream in itags:
self.p_stream(stream)
# Print all other available streams
print(" [ DEFAULT ] %s" % ('_' * 33))
for stream in self.streams_sorted:
self.p_stream(stream['id'] if 'id' in stream else stream['itag'])
if self.streams_sorted:
print(" [ DEFAULT ] %s" % ('_' * 33))
for stream in self.streams_sorted:
self.p_stream(stream['id'] if 'id' in stream else stream['itag'])
if self.audiolang:
print("audio-languages:")
@ -164,6 +169,8 @@ class VideoExtractor():
print(" - lang: {}".format(i['lang']))
print(" download-url: {}\n".format(i['url']))
sys.stdout.flush()
def p_playlist(self, stream_id=None):
maybe_print("site: %s" % self.__class__.name)
print("playlist: %s" % self.title)
@ -195,7 +202,13 @@ class VideoExtractor():
else:
# Download stream with the best quality
from .processor.ffmpeg import has_ffmpeg_installed
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
if has_ffmpeg_installed() and player is None and self.dash_streams or not self.streams_sorted:
#stream_id = list(self.dash_streams)[-1]
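# sort DASH streams by size in descending order and pick the largest (highest-quality) one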
itags = sorted(self.dash_streams,
key=lambda i: -self.dash_streams[i]['size'])
stream_id = itags[0]
else:
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
if 'index' not in kwargs:
self.p(stream_id)
@ -211,7 +224,7 @@ class VideoExtractor():
ext = self.dash_streams[stream_id]['container']
total_size = self.dash_streams[stream_id]['size']
if ext == 'm3u8':
if ext == 'm3u8' or ext == 'm4a':
ext = 'mp4'
if not urls:
@ -226,9 +239,11 @@ class VideoExtractor():
output_dir=kwargs['output_dir'],
merge=kwargs['merge'],
av=stream_id in self.dash_streams)
if 'caption' not in kwargs or not kwargs['caption']:
print('Skipping captions or danmuku.')
print('Skipping captions or danmaku.')
return
for lang in self.caption_tracks:
filename = '%s.%s.srt' % (get_filename(self.title), lang)
print('Saving %s ... ' % filename, end="", flush=True)
@ -237,11 +252,18 @@ class VideoExtractor():
'w', encoding='utf-8') as x:
x.write(srt)
print('Done.')
if self.danmuku is not None and not dry_run:
if self.danmaku is not None and not dry_run:
filename = '{}.cmt.xml'.format(get_filename(self.title))
print('Downloading {} ...\n'.format(filename))
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
fp.write(self.danmuku)
fp.write(self.danmaku)
if self.lyrics is not None and not dry_run:
filename = '{}.lrc'.format(get_filename(self.title))
print('Downloading {} ...\n'.format(filename))
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
fp.write(self.lyrics)
# For main_dev()
#download_urls(urls, self.title, self.streams[stream_id]['container'], self.streams[stream_id]['size'])


@ -13,20 +13,17 @@ from .ckplayer import *
from .cntv import *
from .coub import *
from .dailymotion import *
from .dilidili import *
from .douban import *
from .douyin import *
from .douyutv import *
from .ehow import *
from .facebook import *
from .fantasy import *
from .fc2video import *
from .flickr import *
from .freesound import *
from .funshion import *
from .google import *
from .heavymusic import *
from .huaban import *
from .icourses import *
from .ifeng import *
from .imgur import *
@ -41,6 +38,7 @@ from .kugou import *
from .kuwo import *
from .le import *
from .lizhi import *
from .longzhu import *
from .magisto import *
from .metacafe import *
from .mgtv import *
@ -53,7 +51,6 @@ from .nanagogo import *
from .naver import *
from .netease import *
from .nicovideo import *
from .panda import *
from .pinterest import *
from .pixnet import *
from .pptv import *
@ -66,6 +63,7 @@ from .sohu import *
from .soundcloud import *
from .suntv import *
from .theplatform import *
from .tiktok import *
from .tucao import *
from .tudou import *
from .tumblr import *
@ -87,3 +85,5 @@ from .ted import *
from .khan import *
from .zhanqi import *
from .kuaishou import *
from .zhibo import *
from .zhihu import *


@ -65,7 +65,7 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals
elif sourceType == 'tudou':
tudou_download_by_iid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'qq':
qq_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
qq_download_by_vid(sourceId, title, True, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'letv':
letvcloud_download_by_vu(sourceId, '2d8c027396', title, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'zhuzhan':
@ -85,9 +85,13 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals
_, _, seg_size = url_info(url)
size += seg_size
#fallback to flvhd is not quite possible
print_info(site_info, title, 'mp4', size)
if re.search(r'fid=[0-9A-Z\-]*.flv', preferred[0][0]):
ext = 'flv'
else:
ext = 'mp4'
print_info(site_info, title, ext, size)
if not info_only:
download_urls(preferred[0], title, 'mp4', size, output_dir=output_dir, merge=merge)
download_urls(preferred[0], title, ext, size, output_dir=output_dir, merge=merge)
else:
raise NotImplementedError(sourceType)
@ -105,27 +109,46 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals
pass
def acfun_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
assert re.match(r'http://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', url)
html = get_content(url)
assert re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/(\D|bangumi)/\D\D(\d+)', url)
title = r1(r'data-title="([^"]+)"', html)
if re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', url):
html = get_content(url)
json_text = match1(html, r"(?s)videoInfo\s*=\s*(\{.*?\});")
json_data = json.loads(json_text)
vid = json_data.get('currentVideoInfo').get('id')
up = json_data.get('user').get('name')
title = json_data.get('title')
video_list = json_data.get('videoList')
if len(video_list) > 1:
title += " - " + [p.get('title') for p in video_list if p.get('id') == vid][0]
# bangumi
elif re.match("https?://[^\.]*\.*acfun\.[^\.]+/bangumi/ab(\d+)", url):
html = get_content(url)
tag_script = match1(html, r'<script>window\.pageInfo([^<]+)</script>')
json_text = tag_script[tag_script.find('{') : tag_script.find('};') + 1]
json_data = json.loads(json_text)
title = json_data['bangumiTitle'] + " " + json_data['episodeName'] + " " + json_data['title']
vid = str(json_data['videoId'])
up = "acfun"
else:
raise NotImplementedError
assert title and vid
title = unescape_html(title)
title = escape_file_path(title)
assert title
if match1(url, r'_(\d+)$'): # current P
title = title + " " + r1(r'active">([^<]*)', html)
vid = r1('data-vid="(\d+)"', html)
up = r1('data-name="([^"]+)"', html)
p_title = r1('active">([^<]+)', html)
title = '%s (%s)' % (title, up)
if p_title: title = '%s - %s' % (title, p_title)
if p_title:
title = '%s - %s' % (title, p_title)
acfun_download_by_vid(vid, title,
output_dir=output_dir,
merge=merge,
info_only=info_only,
**kwargs)
site_info = "AcFun.tv"
site_info = "AcFun.cn"
download = acfun_download
download_playlist = playlist_not_supported('acfun')


@ -38,7 +38,7 @@ def baidu_get_song_title(data):
def baidu_get_song_lyric(data):
lrc = data['lrcLink']
return None if lrc is '' else "http://music.baidu.com%s" % lrc
return "http://music.baidu.com%s" % lrc if lrc else None
def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
@ -123,12 +123,22 @@ def baidu_download(url, output_dir='.', stream_type=None, merge=True, info_only=
elif re.match('http://tieba.baidu.com/', url):
try:
# embedded videos
embed_download(url, output_dir, merge=merge, info_only=info_only)
embed_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
except:
# images
html = get_html(url)
title = r1(r'title:"([^"]+)"', html)
vhsrc = re.findall(r'"BDE_Image"[^>]+src="([^"]+\.mp4)"', html) or \
re.findall(r'vhsrc="([^"]+)"', html)
if len(vhsrc) > 0:
ext = 'mp4'
size = url_size(vhsrc[0])
print_info(site_info, title, ext, size)
if not info_only:
download_urls(vhsrc, title, ext, size,
output_dir=output_dir, merge=False)
items = re.findall(
r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html)
urls = ['http://imgsrc.baidu.com/forum/pic/item/' + i


@ -1,362 +1,573 @@
#!/usr/bin/env python
__all__ = ['bilibili_download']
from ..common import *
from ..extractor import VideoExtractor
import hashlib
import re
import time
import json
import http.cookiejar
import urllib.request
import urllib.parse
from xml.dom.minidom import parseString
from ..common import *
from ..util.log import *
from ..extractor import *
from .qq import qq_download_by_vid
from .sina import sina_download_by_vid
from .tudou import tudou_download_by_id
from .youku import youku_download_by_vid
class Bilibili(VideoExtractor):
name = 'Bilibili'
live_api = 'http://live.bilibili.com/api/playurl?cid={}&otype=json'
api_url = 'http://interface.bilibili.com/playurl?'
bangumi_api_url = 'http://bangumi.bilibili.com/player/web_api/playurl?'
live_room_init_api_url = 'https://api.live.bilibili.com/room/v1/Room/room_init?id={}'
live_room_info_api_url = 'https://api.live.bilibili.com/room/v1/Room/get_info?room_id={}'
name = "Bilibili"
SEC1 = '1c15888dc316e05a15fdd0a02ed6584f'
SEC2 = '9b288147e5474dd2aa67085f716c560d'
# Bilibili media encoding options, in descending quality order.
stream_types = [
{'id': 'hdflv'},
{'id': 'flv720'},
{'id': 'flv'},
{'id': 'hdmp4'},
{'id': 'mp4'},
{'id': 'live'},
{'id': 'vc'}
{'id': 'flv_p60', 'quality': 116, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P60'},
{'id': 'hdflv2', 'quality': 112, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P+'},
{'id': 'flv', 'quality': 80, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P'},
{'id': 'flv720_p60', 'quality': 74, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P60'},
{'id': 'flv720', 'quality': 64, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P'},
{'id': 'hdmp4', 'quality': 48, 'audio_quality': 30280,
'container': 'MP4', 'video_resolution': '720p', 'desc': '高清 720P (MP4)'},
{'id': 'flv480', 'quality': 32, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '480p', 'desc': '清晰 480P'},
{'id': 'flv360', 'quality': 16, 'audio_quality': 30216,
'container': 'FLV', 'video_resolution': '360p', 'desc': '流畅 360P'},
# 'quality': 15?
{'id': 'mp4', 'quality': 0},
]
fmt2qlt = dict(hdflv=4, flv=3, hdmp4=2, mp4=1)
@staticmethod
def bilibili_stream_type(urls):
url = urls[0]
if 'hd.flv' in url or '-112.flv' in url:
return 'hdflv', 'flv'
if '-64.flv' in url:
return 'flv720', 'flv'
if '.flv' in url:
return 'flv', 'flv'
if 'hd.mp4' in url or '-48.mp4' in url:
return 'hdmp4', 'mp4'
if '.mp4' in url:
return 'mp4', 'mp4'
raise Exception('Unknown stream type')
def api_req(self, cid, quality, bangumi, bangumi_movie=False, **kwargs):
ts = str(int(time.time()))
if not bangumi:
params_str = 'cid={}&player=1&quality={}&ts={}'.format(cid, quality, ts)
chksum = hashlib.md5(bytes(params_str+self.SEC1, 'utf8')).hexdigest()
api_url = self.api_url + params_str + '&sign=' + chksum
def height_to_quality(height):
if height <= 360:
return 16
elif height <= 480:
return 32
elif height <= 720:
return 64
else:
mod = 'movie' if bangumi_movie else 'bangumi'
params_str = 'cid={}&module={}&player=1&quality={}&ts={}'.format(cid, mod, quality, ts)
chksum = hashlib.md5(bytes(params_str+self.SEC2, 'utf8')).hexdigest()
api_url = self.bangumi_api_url + params_str + '&sign=' + chksum
return 80
xml_str = get_content(api_url, headers={'referer': self.url, 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'})
return xml_str
@staticmethod
def bilibili_headers(referer=None, cookie=None):
# a reasonable UA
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
headers = {'User-Agent': ua}
if referer is not None:
headers.update({'Referer': referer})
if cookie is not None:
headers.update({'Cookie': cookie})
return headers
def parse_bili_xml(self, xml_str):
urls_list = []
total_size = 0
doc = parseString(xml_str.encode('utf8'))
durls = doc.getElementsByTagName('durl')
for durl in durls:
size = durl.getElementsByTagName('size')[0]
total_size += int(size.firstChild.nodeValue)
url = durl.getElementsByTagName('url')[0]
urls_list.append(url.firstChild.nodeValue)
stream_type, container = self.bilibili_stream_type(urls_list)
if stream_type not in self.streams:
self.streams[stream_type] = {}
self.streams[stream_type]['src'] = urls_list
self.streams[stream_type]['size'] = total_size
self.streams[stream_type]['container'] = container
@staticmethod
def bilibili_api(avid, cid, qn=0):
return 'https://api.bilibili.com/x/player/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&fnver=0&fnval=16' % (avid, cid, qn)
def download_by_vid(self, cid, bangumi, **kwargs):
stream_id = kwargs.get('stream_id')
# guard here. if stream_id invalid, fallback as not stream_id
if stream_id and stream_id in self.fmt2qlt:
quality = stream_id
else:
quality = 'hdflv' if bangumi else 'flv'
@staticmethod
def bilibili_audio_api(sid):
return 'https://www.bilibili.com/audio/music-service-c/web/url?sid=%s' % sid
info_only = kwargs.get('info_only')
for qlt in range(4, -1, -1):
api_xml = self.api_req(cid, qlt, bangumi, **kwargs)
self.parse_bili_xml(api_xml)
if not info_only or stream_id:
self.danmuku = get_danmuku_xml(cid)
@staticmethod
def bilibili_audio_info_api(sid):
return 'https://www.bilibili.com/audio/music-service-c/web/song/info?sid=%s' % sid
@staticmethod
def bilibili_audio_menu_info_api(sid):
return 'https://www.bilibili.com/audio/music-service-c/web/menu/info?sid=%s' % sid
@staticmethod
def bilibili_audio_menu_song_api(sid, ps=100):
return 'https://www.bilibili.com/audio/music-service-c/web/song/of-menu?sid=%s&pn=1&ps=%s' % (sid, ps)
@staticmethod
def bilibili_bangumi_api(avid, cid, ep_id, qn=0):
return 'https://api.bilibili.com/pgc/player/web/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&ep_id=%s&fnver=0&fnval=16' % (avid, cid, qn, ep_id)
@staticmethod
def bilibili_interface_api(cid, qn=0):
entropy = 'rbMCKn@KuamXWlPMoJGsKcbiJKUfkPF_8dABscJntvqhRSETg'
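# deobfuscate the appkey and secret: reverse the entropy string, shift each character up by two code points, then split on ':'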
appkey, sec = ''.join([chr(ord(i) + 2) for i in entropy[::-1]]).split(':')
params = 'appkey=%s&cid=%s&otype=json&qn=%s&quality=%s&type=' % (appkey, cid, qn, qn)
chksum = hashlib.md5(bytes(params + sec, 'utf8')).hexdigest()
return 'https://interface.bilibili.com/v2/playurl?%s&sign=%s' % (params, chksum)
@staticmethod
def bilibili_live_api(cid):
return 'https://api.live.bilibili.com/room/v1/Room/playUrl?cid=%s&quality=0&platform=web' % cid
@staticmethod
def bilibili_live_room_info_api(room_id):
return 'https://api.live.bilibili.com/room/v1/Room/get_info?room_id=%s' % room_id
@staticmethod
def bilibili_live_room_init_api(room_id):
return 'https://api.live.bilibili.com/room/v1/Room/room_init?id=%s' % room_id
@staticmethod
def bilibili_space_channel_api(mid, cid, pn=1, ps=100):
return 'https://api.bilibili.com/x/space/channel/video?mid=%s&cid=%s&pn=%s&ps=%s&order=0&jsonp=jsonp' % (mid, cid, pn, ps)
@staticmethod
def bilibili_space_favlist_api(vmid, fid, pn=1, ps=100):
return 'https://api.bilibili.com/x/space/fav/arc?vmid=%s&fid=%s&pn=%s&ps=%s&order=0&jsonp=jsonp' % (vmid, fid, pn, ps)
@staticmethod
def bilibili_space_video_api(mid, pn=1, ps=100):
return 'https://space.bilibili.com/ajax/member/getSubmitVideos?mid=%s&page=%s&pagesize=%s&order=0&jsonp=jsonp' % (mid, pn, ps)
@staticmethod
def bilibili_vc_api(video_id):
return 'https://api.vc.bilibili.com/clip/v1/video/detail?video_id=%s' % video_id
@staticmethod
def url_size(url, faker=False, headers={},err_value=0):
try:
return url_size(url,faker,headers)
except:
return err_value
def prepare(self, **kwargs):
if socket.getdefaulttimeout() == 600: # no timeout specified
socket.setdefaulttimeout(2) # fail fast, very speedy!
# handle "watchlater" URLs
if '/watchlater/' in self.url:
aid = re.search(r'av(\d+)', self.url).group(1)
self.url = 'http://www.bilibili.com/video/av{}/'.format(aid)
self.ua = fake_headers['User-Agent']
self.url = url_locations([self.url])[0]
frag = urllib.parse.urlparse(self.url).fragment
# http://www.bilibili.com/video/av3141144/index_2.html#page=3
if frag:
hit = re.search(r'page=(\d+)', frag)
if hit is not None:
page = hit.group(1)
aid = re.search(r'av(\d+)', self.url).group(1)
self.url = 'http://www.bilibili.com/video/av{}/index_{}.html'.format(aid, page)
self.referer = self.url
self.page = get_content(self.url)
m = re.search(r'<h1.*?>(.*?)</h1>', self.page) or re.search(r'<h1 title="([^"]+)">', self.page)
if m is not None:
self.title = m.group(1)
if self.title is None:
m = re.search(r'property="og:title" content="([^"]+)"', self.page)
if m is not None:
self.title = m.group(1)
if 'subtitle' in kwargs:
subtitle = kwargs['subtitle']
self.title = '{} {}'.format(self.title, subtitle)
if 'bangumi.bilibili.com/movie' in self.url:
self.movie_entry(**kwargs)
elif 'bangumi.bilibili.com' in self.url:
self.bangumi_entry(**kwargs)
elif 'bangumi/' in self.url:
self.bangumi_entry(**kwargs)
elif 'live.bilibili.com' in self.url:
self.live_entry(**kwargs)
elif 'vc.bilibili.com' in self.url:
self.vc_entry(**kwargs)
else:
self.entry(**kwargs)
def movie_entry(self, **kwargs):
patt = r"var\s*aid\s*=\s*'(\d+)'"
aid = re.search(patt, self.page).group(1)
page_list = json.loads(get_content('http://www.bilibili.com/widget/getPageList?aid={}'.format(aid)))
# better ideas for bangumi_movie titles?
self.title = page_list[0]['pagename']
self.download_by_vid(page_list[0]['cid'], True, bangumi_movie=True, **kwargs)
def entry(self, **kwargs):
# tencent player
tc_flashvars = re.search(r'"bili-cid=\d+&bili-aid=\d+&vid=([^"]+)"', self.page)
if tc_flashvars:
tc_flashvars = tc_flashvars.group(1)
if tc_flashvars is not None:
self.out = True
qq_download_by_vid(tc_flashvars, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only'])
return
has_plist = re.search(r'<option', self.page)
if has_plist and r1('index_(\d+).html', self.url) is None:
log.w('This page contains a playlist. (use --playlist to download all videos.)')
self.stream_qualities = {s['quality']: s for s in self.stream_types}
try:
cid = re.search(r'cid=(\d+)', self.page).group(1)
html_content = get_content(self.url, headers=self.bilibili_headers())
except:
cid = re.search(r'"cid":(\d+)', self.page).group(1)
if cid is not None:
self.download_by_vid(cid, re.search('bangumi', self.url) is not None, **kwargs)
html_content = '' # live always returns 400 (why?)
#self.title = match1(html_content,
# r'<h1 title="([^"]+)"')
# redirect: watchlater
if re.match(r'https?://(www\.)?bilibili\.com/watchlater/#/av(\d+)', self.url):
avid = match1(self.url, r'/av(\d+)')
p = int(match1(self.url, r'/p(\d+)') or '1')
self.url = 'https://www.bilibili.com/video/av%s?p=%s' % (avid, p)
html_content = get_content(self.url, headers=self.bilibili_headers())
# redirect: bangumi/play/ss -> bangumi/play/ep
# redirect: bangumi.bilibili.com/anime -> bangumi/play/ep
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ss(\d+)', self.url) or \
re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)/play', self.url):
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
ep_id = initial_state['epList'][0]['id']
self.url = 'https://www.bilibili.com/bangumi/play/ep%s' % ep_id
html_content = get_content(self.url, headers=self.bilibili_headers())
# sort it out
if re.match(r'https?://(www\.)?bilibili\.com/audio/au(\d+)', self.url):
sort = 'audio'
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url):
sort = 'bangumi'
elif match1(html_content, r'<meta property="og:url" content="(https://www.bilibili.com/bangumi/play/[^"]+)"'):
sort = 'bangumi'
elif re.match(r'https?://live\.bilibili\.com/', self.url):
sort = 'live'
elif re.match(r'https?://vc\.bilibili\.com/video/(\d+)', self.url):
sort = 'vc'
elif re.match(r'https?://(www\.)?bilibili\.com/video/av(\d+)', self.url):
sort = 'video'
else:
# flashvars?
flashvars = re.search(r'flashvars="([^"]+)"', self.page).group(1)
if flashvars is None:
raise Exception('Unsupported page {}'.format(self.url))
param = flashvars.split('&')[0]
t, cid = param.split('=')
t = t.strip()
cid = cid.strip()
if t == 'vid':
sina_download_by_vid(cid, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only'])
elif t == 'ykid':
youku_download_by_vid(cid, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only'])
elif t == 'uid':
tudou_download_by_id(cid, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only'])
else:
raise NotImplementedError('Unknown flashvars {}'.format(flashvars))
self.download_playlist_by_url(self.url, **kwargs)
return
def live_entry(self, **kwargs):
# Extract room ID from the short display ID (seen in the room
# URL). The room ID is usually the same as the short ID, but not
# always; case in point: https://live.bilibili.com/48, with 48
# as the short ID and 63727 as the actual ID.
room_short_id = re.search(r'live.bilibili.com/([^?]+)', self.url).group(1)
room_init_api_response = json.loads(get_content(self.live_room_init_api_url.format(room_short_id)))
self.room_id = room_init_api_response['data']['room_id']
# regular av video
if sort == 'video':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
room_info_api_response = json.loads(get_content(self.live_room_info_api_url.format(self.room_id)))
self.title = room_info_api_response['data']['title']
playinfo_text = match1(html_content, r'__playinfo__=(.*?)</script><script>') # FIXME
playinfo = json.loads(playinfo_text) if playinfo_text else None
api_url = self.live_api.format(self.room_id)
json_data = json.loads(get_content(api_url))
urls = [json_data['durl'][0]['url']]
html_content_ = get_content(self.url, headers=self.bilibili_headers(cookie='CURRENT_FNVAL=16'))
playinfo_text_ = match1(html_content_, r'__playinfo__=(.*?)</script><script>') # FIXME
playinfo_ = json.loads(playinfo_text_) if playinfo_text_ else None
self.streams['live'] = {}
self.streams['live']['src'] = urls
self.streams['live']['container'] = 'flv'
self.streams['live']['size'] = 0
# warn if it is a multi-part video
pn = initial_state['videoData']['videos']
if pn > 1 and not kwargs.get('playlist'):
log.w('This is a multipart video. (use --playlist to download all parts.)')
def vc_entry(self, **kwargs):
vc_id = re.search(r'video/(\d+)', self.url)
if not vc_id:
vc_id = re.search(r'vcdetail\?vc=(\d+)', self.url)
if not vc_id:
log.wtf('Unknown url pattern')
endpoint = 'http://api.vc.bilibili.com/clip/v1/video/detail?video_id={}&need_playurl=1'.format(vc_id.group(1))
vc_meta = json.loads(get_content(endpoint, headers=fake_headers))
if vc_meta['code'] != 0:
log.wtf('{}\n{}'.format(vc_meta['msg'], vc_meta['message']))
item = vc_meta['data']['item']
self.title = item['description']
# set video title
self.title = initial_state['videoData']['title']
# refine title for a specific part, if it is a multi-part video
p = int(match1(self.url, r'[\?&]p=(\d+)') or match1(self.url, r'/index_(\d+)') or
'1') # use URL to decide p-number, not initial_state['p']
if pn > 1:
part = initial_state['videoData']['pages'][p - 1]['part']
self.title = '%s (P%s. %s)' % (self.title, p, part)
self.streams['vc'] = {}
self.streams['vc']['src'] = [item['video_playurl']]
self.streams['vc']['container'] = 'mp4'
self.streams['vc']['size'] = int(item['video_size'])
# construct playinfos
avid = initial_state['aid']
cid = initial_state['videoData']['pages'][p - 1]['cid'] # use p-number, not initial_state['videoData']['cid']
current_quality, best_quality = None, None
if playinfo is not None:
current_quality = playinfo['data']['quality'] or None # 0 indicates an error, fallback to None
if 'accept_quality' in playinfo['data'] and playinfo['data']['accept_quality'] != []:
best_quality = playinfo['data']['accept_quality'][0]
playinfos = []
if playinfo is not None:
playinfos.append(playinfo)
if playinfo_ is not None:
playinfos.append(playinfo_)
# get alternative formats from API
for qn in [80, 64, 32, 16]:
# automatic format for durl: qn=0
# for dash, qn does not matter
if current_quality is None or qn < current_quality:
api_url = self.bilibili_api(avid, cid, qn=qn)
api_content = get_content(api_url, headers=self.bilibili_headers())
api_playinfo = json.loads(api_content)
if api_playinfo['code'] == 0: # success
playinfos.append(api_playinfo)
else:
message = api_playinfo['data']['message']
if best_quality is None or qn <= best_quality:
api_url = self.bilibili_interface_api(cid, qn=qn)
api_content = get_content(api_url, headers=self.bilibili_headers())
api_playinfo_data = json.loads(api_content)
if api_playinfo_data.get('quality'):
playinfos.append({'code': 0, 'message': '0', 'ttl': 1, 'data': api_playinfo_data})
if not playinfos:
log.w(message)
# use bilibili error video instead
url = 'https://static.hdslb.com/error.mp4'
_, container, size = url_info(url)
self.streams['flv480'] = {'container': container, 'size': size, 'src': [url]}
return
def bangumi_entry(self, **kwargs):
bangumi_id = re.search(r'(\d+)', self.url).group(1)
frag = urllib.parse.urlparse(self.url).fragment
if frag:
episode_id = frag
else:
episode_id = re.search(r'first_ep_id\s*=\s*"(\d+)"', self.page) or re.search(r'\/ep(\d+)', self.url).group(1)
# cont = post_content('http://bangumi.bilibili.com/web_api/get_source', post_data=dict(episode_id=episode_id))
# cid = json.loads(cont)['result']['cid']
cont = get_content('http://bangumi.bilibili.com/web_api/episode/{}.json'.format(episode_id))
ep_info = json.loads(cont)['result']['currentEpisode']
for playinfo in playinfos:
quality = playinfo['data']['quality']
format_id = self.stream_qualities[quality]['id']
container = self.stream_qualities[quality]['container'].lower()
desc = self.stream_qualities[quality]['desc']
bangumi_data = get_bangumi_info(str(ep_info['seasonId']))
bangumi_payment = bangumi_data.get('payment')
if bangumi_payment and bangumi_payment['price'] != '0':
log.w("It's a paid item")
# ep_ids = collect_bangumi_epids(bangumi_data)
if 'durl' in playinfo['data']:
src, size = [], 0
for durl in playinfo['data']['durl']:
src.append(durl['url'])
size += durl['size']
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
index_title = ep_info['indexTitle']
long_title = ep_info['longTitle'].strip()
cid = ep_info['danmaku']
# DASH formats
if 'dash' in playinfo['data']:
audio_size_cache = {}
for video in playinfo['data']['dash']['video']:
# prefer the latter codecs!
s = self.stream_qualities[video['id']]
format_id = 'dash-' + s['id'] # prefix
container = 'mp4' # enforce MP4 container
desc = s['desc']
audio_quality = s['audio_quality']
baseurl = video['baseUrl']
size = self.url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
self.title = '{} [{} {}]'.format(self.title, index_title, long_title)
self.download_by_vid(cid, bangumi=True, **kwargs)
# find matching audio track
audio_baseurl = playinfo['data']['dash']['audio'][0]['baseUrl']
for audio in playinfo['data']['dash']['audio']:
if int(audio['id']) == audio_quality:
audio_baseurl = audio['baseUrl']
break
if not audio_size_cache.get(audio_quality, False):
audio_size_cache[audio_quality] = self.url_size(audio_baseurl, headers=self.bilibili_headers(referer=self.url))
size += audio_size_cache[audio_quality]
self.dash_streams[format_id] = {'container': container, 'quality': desc,
'src': [[baseurl], [audio_baseurl]], 'size': size}
def check_oversea():
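# Query the legacy player API with a fixed cid and read the <country> field; returns True when the reported country is not mainland China.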
url = 'https://interface.bilibili.com/player?id=cid:17778881'
xml_lines = get_content(url).split('\n')
for line in xml_lines:
key = line.split('>')[0][1:]
if key == 'country':
value = line.split('>')[1].split('<')[0]
if value != '中国':
return True
# get danmaku
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
# bangumi
elif sort == 'bangumi':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
# warn if this bangumi has more than 1 video
epn = len(initial_state['epList'])
if epn > 1 and not kwargs.get('playlist'):
log.w('This bangumi currently has %s videos. (use --playlist to download all videos.)' % epn)
# set video title
self.title = initial_state['h1Title']
# construct playinfos
ep_id = initial_state['epInfo']['id']
avid = initial_state['epInfo']['aid']
cid = initial_state['epInfo']['cid']
playinfos = []
api_url = self.bilibili_bangumi_api(avid, cid, ep_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
api_playinfo = json.loads(api_content)
if api_playinfo['code'] == 0: # success
playinfos.append(api_playinfo)
else:
return False
return False
log.e(api_playinfo['message'])
return
current_quality = api_playinfo['result']['quality']
# get alternative formats from API
for qn in [80, 64, 32, 16]:
# automatic format for durl: qn=0
# for dash, qn does not matter
if qn != current_quality:
api_url = self.bilibili_bangumi_api(avid, cid, ep_id, qn=qn)
api_content = get_content(api_url, headers=self.bilibili_headers())
api_playinfo = json.loads(api_content)
if api_playinfo['code'] == 0: # success
playinfos.append(api_playinfo)
def check_sid():
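# Return True only if the cookie jar already holds a 'sid' cookie for the .bilibili.com domain.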
if not cookies:
return False
for cookie in cookies:
if cookie.domain == '.bilibili.com' and cookie.name == 'sid':
return True
return False
for playinfo in playinfos:
if 'durl' in playinfo['result']:
quality = playinfo['result']['quality']
format_id = self.stream_qualities[quality]['id']
container = self.stream_qualities[quality]['container'].lower()
desc = self.stream_qualities[quality]['desc']
def fetch_sid(cid, aid):
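# Request the legacy player endpoint for this cid/aid and extract the 'sid' cookie it sets for .bilibili.com; raises if no such cookie is returned.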
url = 'http://interface.bilibili.com/player?id=cid:{}&aid={}'.format(cid, aid)
cookies = http.cookiejar.CookieJar()
req = urllib.request.Request(url)
res = urllib.request.urlopen(url)
cookies.extract_cookies(res, req)
for c in cookies:
if c.domain == '.bilibili.com' and c.name == 'sid':
return c.value
raise
src, size = [], 0
for durl in playinfo['result']['durl']:
src.append(durl['url'])
size += durl['size']
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
def collect_bangumi_epids(json_data):
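# Collect every episode_id from the season data, reversing the order in which the API lists the episodes.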
eps = json_data['episodes'][::-1]
return [ep['episode_id'] for ep in eps]
# DASH formats
if 'dash' in playinfo['result']:
for video in playinfo['result']['dash']['video']:
# playinfo['result']['quality'] does not reflect the correct quality of DASH stream
quality = self.height_to_quality(video['height']) # convert height to quality code
s = self.stream_qualities[quality]
format_id = 'dash-' + s['id'] # prefix
container = 'mp4' # enforce MP4 container
desc = s['desc']
audio_quality = s['audio_quality']
baseurl = video['baseUrl']
size = url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
def get_bangumi_info(season_id):
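# Fetch season info through the JSONP endpoint and strip the 'seasonListCallback(...)' wrapper before parsing the JSON.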
BASE_URL = 'http://bangumi.bilibili.com/jsonp/seasoninfo/'
long_epoch = int(time.time() * 1000)
req_url = BASE_URL + season_id + '.ver?callback=seasonListCallback&jsonp=jsonp&_=' + str(long_epoch)
season_data = get_content(req_url)
season_data = season_data[len('seasonListCallback('):]
season_data = season_data[: -1 * len(');')]
json_data = json.loads(season_data)
return json_data['result']
# find matching audio track
audio_baseurl = playinfo['result']['dash']['audio'][0]['baseUrl']
for audio in playinfo['result']['dash']['audio']:
if int(audio['id']) == audio_quality:
audio_baseurl = audio['baseUrl']
break
size += url_size(audio_baseurl, headers=self.bilibili_headers(referer=self.url))
def get_danmuku_xml(cid):
return get_content('http://comment.bilibili.com/{}.xml'.format(cid))
self.dash_streams[format_id] = {'container': container, 'quality': desc,
'src': [[baseurl], [audio_baseurl]], 'size': size}
def parse_cid_playurl(xml):
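# Parse the <durl> playurl XML: build one URL list per CDN mirror and sum the segment sizes; returns ([], 0) on any parsing error.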
from xml.dom.minidom import parseString
try:
urls_list = []
total_size = 0
doc = parseString(xml.encode('utf-8'))
durls = doc.getElementsByTagName('durl')
cdn_cnt = len(durls[0].getElementsByTagName('url'))
for i in range(cdn_cnt):
urls_list.append([])
for durl in durls:
size = durl.getElementsByTagName('size')[0]
total_size += int(size.firstChild.nodeValue)
cnt = len(durl.getElementsByTagName('url'))
for i in range(cnt):
u = durl.getElementsByTagName('url')[i].firstChild.nodeValue
urls_list[i].append(u)
return urls_list, total_size
except Exception as e:
log.w(e)
return [], 0
# get danmaku
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
def bilibili_download_playlist_by_url(url, **kwargs):
url = url_locations([url])[0]
# a bangumi here? possible?
if 'live.bilibili' in url:
site.download_by_url(url)
elif 'bangumi.bilibili' in url:
bangumi_id = re.search(r'(\d+)', url).group(1)
bangumi_data = get_bangumi_info(bangumi_id)
ep_ids = collect_bangumi_epids(bangumi_data)
# vc video
elif sort == 'vc':
video_id = match1(self.url, r'https?://vc\.?bilibili\.com/video/(\d+)')
api_url = self.bilibili_vc_api(video_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
api_playinfo = json.loads(api_content)
# set video title
self.title = '%s (%s)' % (api_playinfo['data']['user']['name'], api_playinfo['data']['item']['id'])
height = api_playinfo['data']['item']['height']
quality = self.height_to_quality(height) # convert height to quality code
s = self.stream_qualities[quality]
format_id = s['id']
container = 'mp4' # enforce MP4 container
desc = s['desc']
playurl = api_playinfo['data']['item']['video_playurl']
size = int(api_playinfo['data']['item']['video_size'])
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': [playurl]}
# live
elif sort == 'live':
m = re.match(r'https?://live\.bilibili\.com/(\w+)', self.url)
short_id = m.group(1)
api_url = self.bilibili_live_room_init_api(short_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
room_init_info = json.loads(api_content)
room_id = room_init_info['data']['room_id']
api_url = self.bilibili_live_room_info_api(room_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
room_info = json.loads(api_content)
# set video title
self.title = room_info['data']['title'] + '.' + str(int(time.time()))
api_url = self.bilibili_live_api(room_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
video_info = json.loads(api_content)
durls = video_info['data']['durl']
playurl = durls[0]['url']
container = 'flv' # enforce FLV container
self.streams['flv'] = {'container': container, 'quality': 'unknown',
'size': 0, 'src': [playurl]}
# audio
elif sort == 'audio':
m = re.match(r'https?://(?:www\.)?bilibili\.com/audio/au(\d+)', self.url)
sid = m.group(1)
api_url = self.bilibili_audio_info_api(sid)
api_content = get_content(api_url, headers=self.bilibili_headers())
song_info = json.loads(api_content)
# set audio title
self.title = song_info['data']['title']
# get lyrics
self.lyrics = get_content(song_info['data']['lyric'])
api_url = self.bilibili_audio_api(sid)
api_content = get_content(api_url, headers=self.bilibili_headers())
audio_info = json.loads(api_content)
playurl = audio_info['data']['cdns'][0]
size = audio_info['data']['size']
container = 'mp4' # enforce MP4 container
self.streams['mp4'] = {'container': container,
'size': size, 'src': [playurl]}
def extract(self, **kwargs):
# set UA and referer for downloading
headers = self.bilibili_headers(referer=self.url)
self.ua, self.referer = headers['User-Agent'], headers['Referer']
if not self.streams_sorted:
# no stream is available
return
if 'stream_id' in kwargs and kwargs['stream_id']:
# extract the stream
stream_id = kwargs['stream_id']
if stream_id not in self.streams and stream_id not in self.dash_streams:
log.e('[Error] Invalid video format.')
log.e('Run \'-i\' command with no specific video format to view all available formats.')
exit(2)
else:
# extract stream with the best quality
stream_id = self.streams_sorted[0]['id']
def download_playlist_by_url(self, url, **kwargs):
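# Work out what kind of playlist URL this is (multi-part video, bangumi, bangumi media page, space channel/favlist/uploads, or audio menu) and download every entry in it.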
self.url = url
kwargs['playlist'] = True
html_content = get_content(self.url, headers=self.bilibili_headers())
# sort it out
if re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url):
sort = 'bangumi'
elif match1(html_content, r'<meta property="og:url" content="(https://www.bilibili.com/bangumi/play/[^"]+)"'):
sort = 'bangumi'
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/media/md(\d+)', self.url) or \
re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)', self.url):
sort = 'bangumi_md'
elif re.match(r'https?://(www\.)?bilibili\.com/video/av(\d+)', self.url):
sort = 'video'
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/channel/detail\?.*cid=(\d+)', self.url):
sort = 'space_channel'
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/favlist\?.*fid=(\d+)', self.url):
sort = 'space_favlist'
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/video', self.url):
sort = 'space_video'
elif re.match(r'https?://(www\.)?bilibili\.com/audio/am(\d+)', self.url):
sort = 'audio_menu'
else:
log.e('[Error] Unsupported URL pattern.')
exit(1)
# regular av video
if sort == 'video':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
aid = initial_state['videoData']['aid']
pn = initial_state['videoData']['videos']
for pi in range(1, pn + 1):
purl = 'https://www.bilibili.com/video/av%s?p=%s' % (aid, pi)
self.__class__().download_by_url(purl, **kwargs)
elif sort == 'bangumi':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
epn, i = len(initial_state['epList']), 0
for ep in initial_state['epList']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
ep_id = ep['id']
epurl = 'https://www.bilibili.com/bangumi/play/ep%s/' % ep_id
self.__class__().download_by_url(epurl, **kwargs)
elif sort == 'bangumi_md':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
epn, i = len(initial_state['mediaInfo']['episodes']), 0
for ep in initial_state['mediaInfo']['episodes']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
ep_id = ep['ep_id']
epurl = 'https://www.bilibili.com/bangumi/play/ep%s/' % ep_id
self.__class__().download_by_url(epurl, **kwargs)
elif sort == 'space_channel':
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/channel/detail\?.*cid=(\d+)', self.url)
mid, cid = m.group(1), m.group(2)
api_url = self.bilibili_space_channel_api(mid, cid)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
channel_info = json.loads(api_content)
# TBD: channel of more than 100 videos
epn, i = len(channel_info['data']['list']['archives']), 0
for video in channel_info['data']['list']['archives']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
url = 'https://www.bilibili.com/video/av%s' % video['aid']
self.__class__().download_playlist_by_url(url, **kwargs)
elif sort == 'space_favlist':
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/favlist\?.*fid=(\d+)', self.url)
vmid, fid = m.group(1), m.group(2)
api_url = self.bilibili_space_favlist_api(vmid, fid)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
favlist_info = json.loads(api_content)
pc = favlist_info['data']['pagecount']
for pn in range(1, pc + 1):
api_url = self.bilibili_space_favlist_api(vmid, fid, pn=pn)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
favlist_info = json.loads(api_content)
epn, i = len(favlist_info['data']['archives']), 0
for video in favlist_info['data']['archives']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
url = 'https://www.bilibili.com/video/av%s' % video['aid']
self.__class__().download_playlist_by_url(url, **kwargs)
elif sort == 'space_video':
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/video', self.url)
mid = m.group(1)
api_url = self.bilibili_space_video_api(mid)
api_content = get_content(api_url, headers=self.bilibili_headers())
videos_info = json.loads(api_content)
pc = videos_info['data']['pages']
for pn in range(1, pc + 1):
api_url = self.bilibili_space_video_api(mid, pn=pn)
api_content = get_content(api_url, headers=self.bilibili_headers())
videos_info = json.loads(api_content)
epn, i = len(videos_info['data']['vlist']), 0
for video in videos_info['data']['vlist']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
url = 'https://www.bilibili.com/video/av%s' % video['aid']
self.__class__().download_playlist_by_url(url, **kwargs)
elif sort == 'audio_menu':
m = re.match(r'https?://(?:www\.)?bilibili\.com/audio/am(\d+)', self.url)
sid = m.group(1)
#api_url = self.bilibili_audio_menu_info_api(sid)
#api_content = get_content(api_url, headers=self.bilibili_headers())
#menu_info = json.loads(api_content)
api_url = self.bilibili_audio_menu_song_api(sid)
api_content = get_content(api_url, headers=self.bilibili_headers())
menusong_info = json.loads(api_content)
epn, i = len(menusong_info['data']['data']), 0
for song in menusong_info['data']['data']:
i += 1; log.w('Extracting %s of %s songs ...' % (i, epn))
url = 'https://www.bilibili.com/audio/au%s' % song['id']
self.__class__().download_by_url(url, **kwargs)
base_url = url.split('#')[0]
for ep_id in ep_ids:
ep_url = '#'.join([base_url, ep_id])
Bilibili().download_by_url(ep_url, **kwargs)
else:
aid = re.search(r'av(\d+)', url).group(1)
page_list = json.loads(get_content('http://www.bilibili.com/widget/getPageList?aid={}'.format(aid)))
page_cnt = len(page_list)
for no in range(1, page_cnt+1):
page_url = 'http://www.bilibili.com/video/av{}/index_{}.html'.format(aid, no)
subtitle = page_list[no-1]['pagename']
Bilibili().download_by_url(page_url, subtitle=subtitle, **kwargs)
site = Bilibili()
download = site.download_by_url
download_playlist = bilibili_download_playlist_by_url
download_playlist = site.download_playlist_by_url
bilibili_download = download

View File

@@ -25,10 +25,10 @@ def coub_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
loop_file_path = get_loop_file_path(title, output_dir)
single_file_path = audio_file_path
if audio_duration > video_duration:
write_loop_file(int(audio_duration / video_duration), loop_file_path, video_file_name)
write_loop_file(round(audio_duration / video_duration), loop_file_path, video_file_name)
else:
single_file_path = audio_file_path
write_loop_file(int(video_duration / audio_duration), loop_file_path, audio_file_name)
write_loop_file(round(video_duration / audio_duration), loop_file_path, audio_file_name)
ffmpeg.ffmpeg_concat_audio_and_video([loop_file_path, single_file_path], title + "_full", "mp4")
cleanup_files([video_file_path, audio_file_path, loop_file_path])

View File

@@ -1,89 +0,0 @@
#!/usr/bin/env python
__all__ = ['dilidili_download']
from ..common import *
from .ckplayer import ckplayer_download
headers = {
'DNT': '1',
'Accept-Encoding': 'gzip, deflate, sdch, br',
'Accept-Language': 'en-CA,en;q=0.8,en-US;q=0.6,zh-CN;q=0.4,zh;q=0.2',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Cache-Control': 'max-age=0',
'Referer': 'http://www.dilidili.com/',
'Connection': 'keep-alive',
'Save-Data': 'on',
}
#----------------------------------------------------------------------
def dilidili_parser_data_to_stream_types(typ ,vid ,hd2 ,sign, tmsign, ulk):
"""->list"""
another_url = 'https://newplayer.jfrft.com/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = hd2, sign = sign, tmsign = tmsign, ulk = ulk)
parse_url = 'http://player.005.tv/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = hd2, sign = sign, tmsign = tmsign, ulk = ulk)
html = get_content(another_url, headers=headers)
info = re.search(r'(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})', html).groups()
info = [i.strip('{}').split('->') for i in info]
info = {i[0]: i [1] for i in info}
stream_types = []
for i in zip(info['deft'].split('|'), info['defa'].split('|')):
stream_types.append({'id': str(i[1][-1]), 'container': 'mp4', 'video_profile': i[0]})
return stream_types
#----------------------------------------------------------------------
def dilidili_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
global headers
re_str = r'http://www.dilidili.com/watch\S+'
if re.match(r'http://www.dilidili.wang', url):
re_str = r'http://www.dilidili.wang/watch\S+'
headers['Referer'] = 'http://www.dilidili.wang/'
elif re.match(r'http://www.dilidili.mobi', url):
re_str = r'http://www.dilidili.mobi/watch\S+'
headers['Referer'] = 'http://www.dilidili.mobi/'
if re.match(re_str, url):
html = get_content(url)
title = match1(html, r'<title>(.+)丨(.+)</title>') #title
# player loaded via internal iframe
frame_url = re.search(r'<iframe src=\"(.+?)\"', html).group(1)
logging.debug('dilidili_download: %s' % frame_url)
#https://player.005.tv:60000/?vid=a8760f03fd:a04808d307&v=yun&sign=a68f8110cacd892bc5b094c8e5348432
html = get_content(frame_url, headers=headers, decoded=False).decode('utf-8')
match = re.search(r'(.+?)var video =(.+?);', html)
vid = match1(html, r'var vid="(.+)"')
hd2 = match1(html, r'var hd2="(.+)"')
typ = match1(html, r'var typ="(.+)"')
sign = match1(html, r'var sign="(.+)"')
tmsign = match1(html, r'tmsign=([A-Za-z0-9]+)')
ulk = match1(html, r'var ulk="(.+)"')
# here s the parser...
stream_types = dilidili_parser_data_to_stream_types(typ, vid, hd2, sign, tmsign, ulk)
#get best
best_id = max([i['id'] for i in stream_types])
parse_url = 'http://player.005.tv/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = best_id, sign = sign, tmsign = tmsign, ulk = ulk)
another_url = 'https://newplayer.jfrft.com/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = hd2, sign = sign, tmsign = tmsign, ulk = ulk)
ckplayer_download(another_url, output_dir, merge, info_only, is_xml = True, title = title, headers = headers)
#type_ = ''
#size = 0
#type_, ext, size = url_info(url)
#print_info(site_info, title, type_, size)
#if not info_only:
#download_urls([url], title, ext, total_size=None, output_dir=output_dir, merge=merge)
site_info = "dilidili"
download = dilidili_download
download_playlist = playlist_not_supported('dilidili')

View File

@@ -7,6 +7,7 @@ from ..common import (
url_size,
print_info,
get_content,
fake_headers,
download_urls,
playlist_not_supported,
)
@@ -16,13 +17,19 @@ __all__ = ['douyin_download_by_url']
def douyin_download_by_url(url, **kwargs):
page_content = get_content(url)
page_content = get_content(url, headers=fake_headers)
match_rule = re.compile(r'var data = \[(.*?)\];')
video_info = json.loads(match_rule.findall(page_content)[0])
video_url = video_info['video']['play_addr']['url_list'][0]
title = video_info['cha_list'][0]['cha_name']
# fix: https://www.douyin.com/share/video/6553248251821165832
# if there is no title, use desc
cha_list = video_info['cha_list']
if cha_list:
title = cha_list[0]['cha_name']
else:
title = video_info['desc']
video_format = 'mp4'
size = url_size(video_url)
size = url_size(video_url, faker=True)
print_info(
site_info='douyin.com', title=title,
type=video_format, size=size
@@ -30,6 +37,7 @@ def douyin_download_by_url(url, **kwargs):
if not kwargs['info_only']:
download_urls(
urls=[video_url], title=title, ext=video_format, total_size=size,
faker=True,
**kwargs
)

View File

@@ -9,6 +9,10 @@ import hashlib
import time
import re
headers = {
'user-agent': 'Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4'
}
def douyutv_video_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
ep = 'http://vmobile.douyu.com/video/getInfo?vid='
patt = r'show/([0-9A-Za-z]+)'
@@ -19,7 +23,7 @@ def douyutv_video_download(url, output_dir='.', merge=True, info_only=False, **k
log.wtf('Unknown url pattern')
vid = hit.group(1)
page = get_content(url)
page = get_content(url, headers=headers)
hit = re.search(title_patt, page)
if hit is None:
title = vid
@@ -35,21 +39,18 @@ def douyutv_video_download(url, output_dir='.', merge=True, info_only=False, **k
urls = general_m3u8_extractor(m3u8_url)
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
def douyutv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
def douyutv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
if 'v.douyu.com/show/' in url:
douyutv_video_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
return
headers = {
'user-agent': 'Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4'
}
url = re.sub(r'[w.]*douyu.com','m.douyu.com',url)
url = re.sub(r'.*douyu.com','https://m.douyu.com/room', url)
html = get_content(url, headers)
room_id_patt = r'room_id\s*:\s*(\d+),'
room_id_patt = r'"rid"\s*:\s*(\d+),'
room_id = match1(html, room_id_patt)
if room_id == "0":
room_id = url[url.rfind('/')+1:]
room_id = url[url.rfind('/') + 1:]
api_url = "http://www.douyutv.com/api/v1/"
args = "room/%s?aid=wp&client_sys=wp&time=%d" % (room_id, int(time.time()))
@@ -60,20 +61,21 @@ def douyutv_download(url, output_dir = '.', merge = True, info_only = False, **k
content = get_content(json_request_url, headers)
json_content = json.loads(content)
data = json_content['data']
server_status = json_content.get('error',0)
if server_status is not 0:
server_status = json_content.get('error', 0)
if server_status != 0:
raise ValueError("Server returned error:%s" % server_status)
title = data.get('room_name')
show_status = data.get('show_status')
if show_status is not "1":
if show_status != "1":
raise ValueError("The live stream is not online! (Errno:%s)" % server_status)
real_url = data.get('rtmp_url') + '/' + data.get('rtmp_live')
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir = output_dir, merge = merge)
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge)
site_info = "douyu.com"
download = douyutv_download

View File

@@ -67,7 +67,7 @@ bokecc_patterns = [r'bokecc\.com/flash/pocle/player\.swf\?siteid=(.+?)&vid=(.{32
recur_limit = 3
def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwargs):
def embed_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
content = get_content(url, headers=fake_headers)
found = False
title = match1(content, '<title>([^<>]+)</title>')
@@ -75,43 +75,43 @@ def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwa
vids = matchall(content, youku_embed_patterns)
for vid in set(vids):
found = True
youku_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
youku_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
vids = matchall(content, tudou_embed_patterns)
for vid in set(vids):
found = True
tudou_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
tudou_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
vids = matchall(content, yinyuetai_embed_patterns)
for vid in vids:
found = True
yinyuetai_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
yinyuetai_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
vids = matchall(content, iqiyi_embed_patterns)
for vid in vids:
found = True
iqiyi_download_by_vid((vid[1], vid[0]), title=title, output_dir=output_dir, merge=merge, info_only=info_only)
iqiyi_download_by_vid((vid[1], vid[0]), title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
urls = matchall(content, netease_embed_patterns)
for url in urls:
found = True
netease_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
netease_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
urls = matchall(content, vimeo_embed_patters)
for url in urls:
found = True
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only, referer=url)
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only, referer=url, **kwargs)
urls = matchall(content, dailymotion_embed_patterns)
for url in urls:
found = True
dailymotion_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
dailymotion_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
aids = matchall(content, bilibili_embed_patterns)
for aid in aids:
found = True
url = 'http://www.bilibili.com/video/av%s/' % aid
bilibili_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
bilibili_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
iqiyi_urls = matchall(content, iqiyi_patterns)
for url in iqiyi_urls:
@@ -133,7 +133,7 @@ def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwa
r = 1
else:
r += 1
iframes = matchall(content, [r'<iframe.+?src=(?:\"|\')(.+?)(?:\"|\')'])
iframes = matchall(content, [r'<iframe.+?src=(?:\"|\')(.*?)(?:\"|\')'])
for iframe in iframes:
if not iframe.startswith('http'):
src = urllib.parse.urljoin(url, iframe)

View File

@@ -6,6 +6,7 @@ from ..common import *
import json
def facebook_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
url = re.sub(r'//.*?facebook.com','//facebook.com',url)
html = get_html(url)
title = r1(r'<title id="pageTitle">(.+)</title>', html)

View File

@@ -1,54 +0,0 @@
#!/usr/bin/env python
__all__ = ['fantasy_download']
from ..common import *
import json
import random
from urllib.parse import urlparse, parse_qs
def fantasy_download_by_id_channelId(id = 0, channelId = 0, output_dir = '.', merge = True, info_only = False,
**kwargs):
api_url = 'http://www.fantasy.tv/tv/playDetails.action?' \
'myChannelId=1&id={id}&channelId={channelId}&t={t}'.format(id = id,
channelId = channelId,
t = str(random.random())
)
html = get_content(api_url)
html = json.loads(html)
if int(html['status']) != 100000:
raise Exception('API error!')
title = html['data']['tv']['title']
video_url = html['data']['tv']['videoPath']
headers = fake_headers.copy()
headers['Referer'] = api_url
type, ext, size = url_info(video_url, headers=headers)
print_info(site_info, title, type, size)
if not info_only:
download_urls([video_url], title, ext, size, output_dir, merge = merge, headers = headers)
def fantasy_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
if 'fantasy.tv' not in url:
raise Exception('Wrong place!')
q = parse_qs(urlparse(url).query)
if 'tvId' not in q or 'channelId' not in q:
raise Exception('No enough arguments!')
tvId = q['tvId'][0]
channelId = q['channelId'][0]
fantasy_download_by_id_channelId(id = tvId, channelId = channelId, output_dir = output_dir, merge = merge,
info_only = info_only, **kwargs)
site_info = "fantasy.tv"
download = fantasy_download
download_playlist = playlist_not_supported('fantasy.tv')

View File

@@ -74,7 +74,7 @@ def get_api_key(page):
# this happens only when the url points to a gallery page
# that contains no inline api_key(and never makes xhr api calls)
# in fact this might be a better approach for getting a temporary api key
# since there's no place for a user to add custom infomation that may
# since there's no place for a user to add custom information that may
# misguide the regex in the homepage
if not match:
return match1(get_html('https://flickr.com'), pattern_inline_api_key)

View File

@@ -59,7 +59,7 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
u = '/'.join(t)
real_urls.append(u)
if not real_urls:
real_urls = [r1(r'<meta property="og:image" content="([^"]+)', html)]
real_urls = re.findall(r'<meta property="og:image" content="([^"]+)', html)
real_urls = [re.sub(r'w\d+-h\d+-p', 's0', u) for u in real_urls]
post_date = r1(r'"?(20\d\d[-/]?[01]\d[-/]?[0123]\d)"?', html)
post_id = r1(r'/posts/([^"]+)', html)

View File

@@ -1,85 +0,0 @@
#!/usr/bin/env python
import json
import os
import re
import math
import traceback
import urllib.parse as urlparse
from ..common import *
__all__ = ['huaban_download']
site_info = '花瓣 (Huaban)'
LIMIT = 100
class Board:
def __init__(self, title, pins):
self.title = title
self.pins = pins
self.pin_count = len(pins)
class Pin:
host = 'http://img.hb.aicdn.com/'
def __init__(self, pin_json):
img_file = pin_json['file']
self.id = str(pin_json['pin_id'])
self.url = urlparse.urljoin(self.host, img_file['key'])
self.ext = img_file['type'].split('/')[-1]
def construct_url(url, **params):
param_str = urlparse.urlencode(params)
return url + '?' + param_str
def extract_json_data(url, **params):
url = construct_url(url, **params)
html = get_content(url, headers=fake_headers)
json_string = match1(html, r'app.page\["board"\] = (.*?});')
json_data = json.loads(json_string)
return json_data
def extract_board_data(url):
json_data = extract_json_data(url, limit=LIMIT)
pin_list = json_data['pins']
title = json_data['title']
pin_count = json_data['pin_count']
pin_count -= len(pin_list)
while pin_count > 0:
json_data = extract_json_data(url, max=pin_list[-1]['pin_id'],
limit=LIMIT)
pins = json_data['pins']
pin_list += pins
pin_count -= len(pins)
return Board(title, list(map(Pin, pin_list)))
def huaban_download_board(url, output_dir, **kwargs):
kwargs['merge'] = False
board = extract_board_data(url)
output_dir = os.path.join(output_dir, board.title)
print_info(site_info, board.title, 'jpg', float('Inf'))
for pin in board.pins:
download_urls([pin.url], pin.id, pin.ext, float('Inf'),
output_dir=output_dir, faker=True, **kwargs)
def huaban_download(url, output_dir='.', **kwargs):
if re.match(r'http://huaban\.com/boards/\d+/', url):
huaban_download_board(url, output_dir, **kwargs)
else:
print('Only board (画板) pages are supported currently')
print('ex: http://huaban.com/boards/12345678/')
download = huaban_download
download_playlist = playlist_not_supported("huaban")

View File

@@ -110,7 +110,7 @@ def icourses_playlist_download(url, output_dir='.', **kwargs):
video_list = re.findall(resid_courseid_patt, page)
if not video_list:
raise Exception('Unkown url pattern')
raise Exception('Unknown url pattern')
for video in video_list:
video_url = change_for_video_ip.format(video[0], video[1])

View File

@@ -27,8 +27,11 @@ def instagram_download(url, output_dir='.', merge=True, info_only=False, **kwarg
for edge in edges:
title = edge['node']['shortcode']
image_url = edge['node']['display_url']
ext = image_url.split('.')[-1]
if 'video_url' in edge['node']:
image_url = edge['node']['video_url']
ext = image_url.split('?')[0].split('.')[-1]
size = int(get_head(image_url)['Content-Length'])
print_info(site_info, title, ext, size)
if not info_only:
download_urls(urls=[image_url],
@@ -39,8 +42,11 @@ def instagram_download(url, output_dir='.', merge=True, info_only=False, **kwarg
else:
title = info['entry_data']['PostPage'][0]['graphql']['shortcode_media']['shortcode']
image_url = info['entry_data']['PostPage'][0]['graphql']['shortcode_media']['display_url']
ext = image_url.split('.')[-1]
if 'video_url' in info['entry_data']['PostPage'][0]['graphql']['shortcode_media']:
image_url = info['entry_data']['PostPage'][0]['graphql']['shortcode_media']['video_url']
ext = image_url.split('?')[0].split('.')[-1]
size = int(get_head(image_url)['Content-Length'])
print_info(site_info, title, ext, size)
if not info_only:
download_urls(urls=[image_url],

View File

@@ -136,12 +136,9 @@ class Iqiyi(VideoExtractor):
r1(r'vid=([^&]+)', self.url) or \
r1(r'data-player-videoid="([^"]+)"', html) or r1(r'vid=(.+?)\&', html) or r1(r'param\[\'vid\'\]\s*=\s*"(.+?)"', html)
self.vid = (tvid, videoid)
info_u = 'http://mixer.video.iqiyi.com/jp/mixin/videos/' + tvid
mixin = get_content(info_u)
mixin_json = json.loads(mixin[len('var tvInfoJs='):])
real_u = mixin_json['url']
real_html = get_content(real_u)
self.title = match1(real_html, '<title>([^<]+)').split('-')[0]
info_u = 'http://pcw-api.iqiyi.com/video/video/playervideoinfo?tvid=' + tvid
json_res = get_content(info_u)
self.title = json.loads(json_res)['data']['vn']
tvid, videoid = self.vid
info = getVMS(tvid, videoid)
assert info['code'] == 'A00000', "can't play this video"

View File

@@ -17,20 +17,20 @@ headers = {
def iwara_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
global headers
video_hash=match1(url, r'http://\w+.iwara.tv/videos/(\w+)')
video_url=match1(url, r'(http://\w+.iwara.tv)/videos/\w+')
html = get_content(url,headers=headers)
video_hash = match1(url, r'https?://\w+.iwara.tv/videos/(\w+)')
video_url = match1(url, r'(https?://\w+.iwara.tv)/videos/\w+')
html = get_content(url, headers=headers)
title = r1(r'<title>(.*)</title>', html)
api_url=video_url+'/api/video/'+video_hash
content=get_content(api_url,headers=headers)
data=json.loads(content)
type,ext,size=url_info(data[0]['uri'], headers=headers)
down_urls=data[0]['uri']
print_info(down_urls,title+data[0]['resolution'],type,size)
api_url = video_url + '/api/video/' + video_hash
content = get_content(api_url, headers=headers)
data = json.loads(content)
down_urls = 'https:' + data[0]['uri']
type, ext, size = url_info(down_urls, headers=headers)
print_info(site_info, title+data[0]['resolution'], type, size)
if not info_only:
download_urls([down_urls], title, ext, size, output_dir, merge = merge,headers=headers)
download_urls([down_urls], title, ext, size, output_dir, merge=merge, headers=headers)
site_info = "iwara"
site_info = "Iwara"
download = iwara_download
download_playlist = playlist_not_supported('iwara')

View File

@@ -1,85 +1,132 @@
#!/usr/bin/env python
__all__ = ['ixigua_download', 'ixigua_download_playlist']
import base64
import random
import binascii
from ..common import *
import random
import ctypes
from json import loads
def get_video_id(text):
re_id = r"videoId: '(.*?)'"
return re.findall(re_id, text)[0]
__all__ = ['ixigua_download', 'ixigua_download_playlist_by_url']
def get_r():
return str(random.random())[2:]
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 "
"Safari/537.36",
}
def right_shift(val, n):
return val >> n if val >= 0 else (val + 0x100000000) >> n
def get_s(text):
"""get video info"""
id = get_video_id(text)
p = get_r()
url = 'http://i.snssdk.com/video/urls/v/1/toutiao/mp4/%s' % id
n = parse.urlparse(url).path + '?r=%s' % p
c = binascii.crc32(n.encode('utf-8'))
s = right_shift(c, 0)
title = ''.join(re.findall(r"title: '(.*?)',", text))
return url + '?r=%s&s=%s' % (p, s), title
def int_overflow(val):
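# Wrap the value into the signed 32-bit integer range, mimicking JavaScript's overflow behaviour for bitwise operations.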
maxint = 2147483647
if not -maxint - 1 <= val <= maxint:
val = (val + (maxint + 1)) % (2 * (maxint + 1)) - maxint - 1
return val
def get_moment(url, user_id, base_url, video_list):
"""Recursively obtaining a video list"""
video_list_data = json.loads(get_content(url))
if not video_list_data['next']['max_behot_time']:
return video_list
[video_list.append(i["display_url"]) for i in video_list_data["data"]]
max_behot_time = video_list_data['next']['max_behot_time']
_param = {
'user_id': user_id,
'base_url': base_url,
'video_list': video_list,
'url': base_url.format(user_id=user_id, max_behot_time=max_behot_time),
}
return get_moment(**_param)
def ixigua_download(url, output_dir='.', info_only=False, **kwargs):
""" Download a single video
Sample URL: https://www.ixigua.com/a6487187567887254029/#mid=59051127876
"""
try:
video_info_url, title = get_s(get_content(url))
video_info = json.loads(get_content(video_info_url))
except Exception:
raise NotImplementedError(url)
try:
video_url = base64.b64decode(video_info["data"]["video_list"]["video_1"]["main_url"]).decode()
except Exception:
raise NotImplementedError(url)
filetype, ext, size = url_info(video_url)
print_info(site_info, title, filetype, size)
def unsigned_right_shitf(n, i):
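# Emulate JavaScript's unsigned right shift (>>>): negative operands are first coerced to their unsigned 32-bit representation.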
if n < 0:
n = ctypes.c_uint32(n).value
if i < 0:
return -int_overflow(n << abs(i))
return int_overflow(n >> i)
def get_video_url_from_video_id(video_id):
"""Splicing URLs according to video ID to get video details"""
# from js
data = [""] * 256
for index, _ in enumerate(data):
t = index
for i in range(8):
t = -306674912 ^ unsigned_right_shitf(t, 1) if 1 & t else unsigned_right_shitf(t, 1)
data[index] = t
def tmp():
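# Build the request path with a random 'r' value and run a JS-style CRC32 over it to produce the 's' signature appended to the URL.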
rand_num = random.random()
path = "/video/urls/v/1/toutiao/mp4/{video_id}?r={random_num}".format(video_id=video_id,
random_num=str(rand_num)[2:])
e = o = r = -1
i, a = 0, len(path)
while i < a:
e = ord(path[i])
i += 1
if e < 128:
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ e)]
else:
if e < 2048:
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (192 | e >> 6 & 31))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & e))]
else:
if 55296 <= e < 57344:
e = (1023 & e) + 64
i += 1
o = 1023 & t.url(i)
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (240 | e >> 8 & 7))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | e >> 2 & 63))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | o >> 6 & 15 | (3 & e) << 4))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & o))]
else:
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (224 | e >> 12 & 15))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | e >> 6 & 63))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & e))]
return "https://ib.365yg.com{path}&s={param}".format(path=path, param=unsigned_right_shitf(r ^ -1, 0))
while 1:
url = tmp()
if url.split("=")[-1][0] != "-": # 参数s不能为负数
return url
def ixigua_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
# example url: https://www.ixigua.com/i6631065141750268420/#mid=63024814422
html = get_html(url, faker=True)
video_id = match1(html, r"\"vid\":\"([^\"]+)")
title = match1(html, r"\"title\":\"(\S+?)\",")
if not video_id:
log.e("video_id not found, url:{}".format(url))
return
video_info_url = get_video_url_from_video_id(video_id)
video_info = loads(get_content(video_info_url))
if video_info.get("code", 1) != 0:
log.e("Get video info from {} error: server return code {}".format(video_info_url, video_info.get("code", 1)))
return
if not video_info.get("data", None):
log.e("Get video info from {} error: The server returns JSON value"
" without data or data is empty".format(video_info_url))
return
if not video_info["data"].get("video_list", None):
log.e("Get video info from {} error: The server returns JSON value"
" without data.video_list or data.video_list is empty".format(video_info_url))
return
if not video_info["data"]["video_list"].get("video_1", None):
log.e("Get video info from {} error: The server returns JSON value"
" without data.video_list.video_1 or data.video_list.video_1 is empty".format(video_info_url))
return
size = int(video_info["data"]["video_list"]["video_1"]["size"])
print_info(site_info=site_info, title=title, type="mp4", size=size)  # this site only serves mp4 files
if not info_only:
download_urls([video_url], title, ext, size, output_dir=output_dir)
video_url = base64.b64decode(video_info["data"]["video_list"]["video_1"]["main_url"].encode("utf-8"))
download_urls([video_url.decode("utf-8")], title, "mp4", size, output_dir, merge=merge, headers=headers, **kwargs)
def ixigua_download_playlist_by_url(url, output_dir='.', merge=True, info_only=False, **kwargs):
assert "user" in url, "Only support users to publish video list,Please provide a similar url:" \
"https://www.ixigua.com/c/user/6907091136/"
user_id = url.split("/")[-2] if url[-1] == "/" else url.split("/")[-1]
params = {"max_behot_time": "0", "max_repin_time": "0", "count": "20", "page_type": "0", "user_id": user_id}
while 1:
url = "https://www.ixigua.com/c/user/article/?" + "&".join(["{}={}".format(k, v) for k, v in params.items()])
video_list = loads(get_content(url, headers=headers))
params["max_behot_time"] = video_list["next"]["max_behot_time"]
for video in video_list["data"]:
ixigua_download("https://www.ixigua.com/i{}/".format(video["item_id"]), output_dir, merge, info_only,
**kwargs)
if video_list["next"]["max_behot_time"] == 0:
break
def ixigua_download_playlist(url, output_dir='.', info_only=False, **kwargs):
"""Download all video from the user's video list
Sample URL: https://www.ixigua.com/c/user/71141690831/
"""
if 'user' not in url:
raise NotImplementedError(url)
user_id = url.split('/')[-2]
max_behot_time = 0
if not user_id:
raise NotImplementedError(url)
base_url = "https://www.ixigua.com/c/user/article/?user_id={user_id}" \
"&max_behot_time={max_behot_time}&max_repin_time=0&count=20&page_type=0"
_param = {
'user_id': user_id,
'base_url': base_url,
'video_list': [],
'url': base_url.format(user_id=user_id, max_behot_time=max_behot_time),
}
for i in get_moment(**_param):
ixigua_download(i, output_dir, info_only, **kwargs)
site_info = "ixigua.com"
download = ixigua_download
download_playlist = ixigua_download_playlist
download_playlist = ixigua_download_playlist_by_url

View File

@@ -16,11 +16,14 @@ def kuaishou_download_by_url(url, info_only=False, **kwargs):
# size = video_list[-1]['size']
# result wrong size
try:
og_video_url = re.search(r"<meta\s+property=\"og:video:url\"\s+content=\"(.+?)\"/>", page).group(1)
video_url = og_video_url
title = url.split('/')[-1]
search_result=re.search(r"\"playUrls\":\[(\{\"quality\"\:\"\w+\",\"url\":\".*?\"\})+\]", page)
all_video_info_str = search_result.group(1)
all_video_infos=re.findall(r"\{\"quality\"\:\"(\w+)\",\"url\":\"(.*?)\"\}", all_video_info_str)
# get the one of the best quality
video_url = all_video_infos[0][1].encode("utf-8").decode('unicode-escape')
title = re.search(r"<meta charset=UTF-8><title>(.*?)</title>", page).group(1)
size = url_size(video_url)
video_format = video_url.split('.')[-1]
video_format = "flv"#video_url.split('.')[-1]
print_info(site_info, title, video_format, size)
if not info_only:
download_urls([video_url], title, video_format, size, **kwargs)

View File

@@ -8,46 +8,88 @@ from base64 import b64decode
import re
import hashlib
def kugou_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
if url.lower().find("5sing")!=-1:
#for 5sing.kugou.com
html=get_html(url)
ticket=r1(r'"ticket":\s*"(.*)"',html)
j=loads(str(b64decode(ticket),encoding="utf-8"))
url=j['file']
title=j['songName']
if url.lower().find("5sing") != -1:
# for 5sing.kugou.com
html = get_html(url)
ticket = r1(r'"ticket":\s*"(.*)"', html)
j = loads(str(b64decode(ticket), encoding="utf-8"))
url = j['file']
title = j['songName']
songtype, ext, size = url_info(url)
print_info(site_info, title, songtype, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge)
elif url.lower().find("hash") != -1:
return kugou_download_by_hash(url, output_dir, merge, info_only)
else:
#for the www.kugou.com/
# for the www.kugou.com/
return kugou_download_playlist(url, output_dir=output_dir, merge=merge, info_only=info_only)
# raise NotImplementedError(url)
def kugou_download_by_hash(title,hash_val,output_dir = '.', merge = True, info_only = False):
#sample
#url_sample:http://www.kugou.com/yy/album/single/536957.html
#hash ->key md5(hash+kgcloud")->key decompile swf
#cmd 4 for mp3 cmd 3 for m4a
key=hashlib.new('md5',(hash_val+"kgcloud").encode("utf-8")).hexdigest()
html=get_html("http://trackercdn.kugou.com/i/?pid=6&key=%s&acceptMp3=1&cmd=4&hash=%s"%(key,hash_val))
j=loads(html)
url=j['url']
def kugou_download_by_hash(url, output_dir='.', merge=True, info_only=False):
# sample
# url_sample:http://www.kugou.com/song/#hash=93F7D2FC6E95424739448218B591AEAF&album_id=9019462
hash_val = match1(url, 'hash=(\w+)')
album_id = match1(url, 'album_id=(\d+)')
if not album_id:
album_id = 123
html = get_html("http://www.kugou.com/yy/index.php?r=play/getdata&hash={}&album_id={}&mid=123".format(hash_val, album_id))
j = loads(html)
url = j['data']['play_url']
title = j['data']['audio_name']
# some songs can't be played because of copyright protection
if (url == ''):
return
songtype, ext, size = url_info(url)
print_info(site_info, title, songtype, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge)
def kugou_download_playlist(url, output_dir = '.', merge = True, info_only = False, **kwargs):
html=get_html(url)
pattern=re.compile('title="(.*?)".* data="(\w*)\|.*?"')
pairs=pattern.findall(html)
for title,hash_val in pairs:
kugou_download_by_hash(title,hash_val,output_dir,merge,info_only)
def kugou_download_playlist(url, output_dir='.', merge=True, info_only=False, **kwargs):
urls = []
# download music leaderboard
# sample: http://www.kugou.com/yy/html/rank.html
if url.lower().find('rank') != -1:
html = get_html(url)
pattern = re.compile('<a href="(http://.*?)" data-active=')
res = pattern.findall(html)
for song in res:
res = get_html(song)
pattern_url = re.compile('"hash":"(\w+)".*"album_id":(\d)+')
hash_val, album_id = res = pattern_url.findall(res)[0]
if not album_id:
album_id = 123
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (hash_val, album_id))
# download album
# album sample: http://www.kugou.com/yy/album/single/1645030.html
elif url.lower().find('album') != -1:
html = get_html(url)
pattern = re.compile('var data=(\[.*?\]);')
res = pattern.findall(html)[0]
for v in json.loads(res):
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v['hash'], v['album_id']))
# download the playlist
# playlist sample:http://www.kugou.com/yy/special/single/487279.html
else:
html = get_html(url)
pattern = re.compile('data="(\w+)\|(\d+)"')
for v in pattern.findall(html):
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v[0], v[1]))
print('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v[0], v[1]))
# download the list by hash
for url in urls:
kugou_download_by_hash(url, output_dir, merge, info_only)
site_info = "kugou.com"
download = kugou_download
# download_playlist = playlist_not_supported("kugou")
download_playlist=kugou_download_playlist
download_playlist = kugou_download_playlist
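A minimal standalone sketch of the new play-URL lookup above, assuming the public `r=play/getdata` endpoint still answers with `data.play_url` and `data.audio_name`; the `kugou_play_url` helper name is illustrative, not part of the extractor:

```python
import json
from urllib.request import urlopen

def kugou_play_url(hash_val, album_id=123):
    # same endpoint the extractor queries; album_id falls back to 123 when unknown
    api = ('http://www.kugou.com/yy/index.php?r=play/getdata'
           '&hash={}&album_id={}&mid=123').format(hash_val, album_id)
    data = json.loads(urlopen(api).read().decode('utf-8'))['data']
    # an empty play_url means the track is blocked by copyright protection
    return data['audio_name'], data['play_url'] or None
```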

View File

@ -2,20 +2,23 @@
__all__ = ['letv_download', 'letvcloud_download', 'letvcloud_download_by_vu']
import json
import base64
import hashlib
import random
import xml.etree.ElementTree as ET
import base64, hashlib, urllib, time, re
import urllib
from ..common import *
#@DEPRECATED
# @DEPRECATED
def get_timestamp():
tn = random.random()
url = 'http://api.letv.com/time?tn={}'.format(tn)
result = get_content(url)
return json.loads(result)['stime']
#@DEPRECATED
# @DEPRECATED
def get_key(t):
for s in range(0, 8):
e = 1 & t
@ -24,42 +27,40 @@ def get_key(t):
t += e
return t ^ 185025305
def calcTimeKey(t):
ror = lambda val, r_bits, : ((val & (2**32-1)) >> r_bits%32) | (val << (32-(r_bits%32)) & (2**32-1))
ror = lambda val, r_bits,: ((val & (2 ** 32 - 1)) >> r_bits % 32) | (val << (32 - (r_bits % 32)) & (2 ** 32 - 1))
magic = 185025305
return ror(t, magic % 17) ^ magic
#return ror(ror(t,773625421%13)^773625421,773625421%17)
# return ror(ror(t,773625421%13)^773625421,773625421%17)
def decode(data):
version = data[0:5]
if version.lower() == b'vc_01':
#get real m3u8
# get real m3u8
loc2 = data[5:]
length = len(loc2)
loc4 = [0]*(2*length)
loc4 = [0] * (2 * length)
for i in range(length):
loc4[2*i] = loc2[i] >> 4
loc4[2*i+1]= loc2[i] & 15;
loc6 = loc4[len(loc4)-11:]+loc4[:len(loc4)-11]
loc7 = [0]*length
loc4[2 * i] = loc2[i] >> 4
loc4[2 * i + 1] = loc2[i] & 15;
loc6 = loc4[len(loc4) - 11:] + loc4[:len(loc4) - 11]
loc7 = [0] * length
for i in range(length):
loc7[i] = (loc6[2 * i] << 4) +loc6[2*i+1]
loc7[i] = (loc6[2 * i] << 4) + loc6[2 * i + 1]
return ''.join([chr(i) for i in loc7])
else:
# directly return
return data
return str(data)
def video_info(vid,**kwargs):
url = 'http://player-pc.le.com/mms/out/video/playJson?id={}&platid=1&splatid=101&format=1&tkey={}&domain=www.le.com&region=cn&source=1000&accesyx=1'.format(vid,calcTimeKey(int(time.time())))
def video_info(vid, **kwargs):
url = 'http://player-pc.le.com/mms/out/video/playJson?id={}&platid=1&splatid=105&format=1&tkey={}&domain=www.le.com&region=cn&source=1000&accesyx=1'.format(vid, calcTimeKey(int(time.time())))
r = get_content(url, decoded=False)
info=json.loads(str(r,"utf-8"))
info = json.loads(str(r, "utf-8"))
info = info['msgs']
stream_id = None
support_stream_id = info["playurl"]["dispatch"].keys()
if "stream_id" in kwargs and kwargs["stream_id"].lower() in support_stream_id:
@ -70,27 +71,28 @@ def video_info(vid,**kwargs):
elif "720p" in support_stream_id:
stream_id = '720p'
else:
stream_id =sorted(support_stream_id,key= lambda i: int(i[1:]))[-1]
stream_id = sorted(support_stream_id, key=lambda i: int(i[1:]))[-1]
url =info["playurl"]["domain"][0]+info["playurl"]["dispatch"][stream_id][0]
url = info["playurl"]["domain"][0] + info["playurl"]["dispatch"][stream_id][0]
uuid = hashlib.sha1(url.encode('utf8')).hexdigest() + '_0'
ext = info["playurl"]["dispatch"][stream_id][1].split('.')[-1]
url = url.replace('tss=0', 'tss=ios')
url+="&m3v=1&termid=1&format=1&hwtype=un&ostype=MacOS10.12.4&p1=1&p2=10&p3=-&expect=3&tn={}&vid={}&uuid={}&sign=letv".format(random.random(), vid, uuid)
url += "&m3v=1&termid=1&format=1&hwtype=un&ostype=MacOS10.12.4&p1=1&p2=10&p3=-&expect=3&tn={}&vid={}&uuid={}&sign=letv".format(random.random(), vid, uuid)
r2=get_content(url,decoded=False)
info2=json.loads(str(r2,"utf-8"))
r2 = get_content(url, decoded=False)
info2 = json.loads(str(r2, "utf-8"))
# hold on ! more things to do
# to decode m3u8 (encoded)
suffix = '&r=' + str(int(time.time() * 1000)) + '&appid=500'
m3u8 = get_content(info2["location"]+suffix,decoded=False)
m3u8 = get_content(info2["location"] + suffix, decoded=False)
m3u8_list = decode(m3u8)
urls = re.findall(r'^[^#][^\r]*',m3u8_list,re.MULTILINE)
return ext,urls
urls = re.findall(r'(http.*?)#', m3u8_list, re.MULTILINE)
return ext, urls
def letv_download_by_vid(vid,title, output_dir='.', merge=True, info_only=False,**kwargs):
ext , urls = video_info(vid,**kwargs)
def letv_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False, **kwargs):
ext, urls = video_info(vid, **kwargs)
size = 0
for i in urls:
_, _, tmp = url_info(i)
@ -100,27 +102,29 @@ def letv_download_by_vid(vid,title, output_dir='.', merge=True, info_only=False,
if not info_only:
download_urls(urls, title, ext, size, output_dir=output_dir, merge=merge)
def letvcloud_download_by_vu(vu, uu, title=None, output_dir='.', merge=True, info_only=False):
#ran = float('0.' + str(random.randint(0, 9999999999999999))) # For ver 2.1
#str2Hash = 'cfflashformatjsonran{ran}uu{uu}ver2.2vu{vu}bie^#@(%27eib58'.format(vu = vu, uu = uu, ran = ran) #Magic!/ In ver 2.1
argumet_dict ={'cf' : 'flash', 'format': 'json', 'ran': str(int(time.time())), 'uu': str(uu),'ver': '2.2', 'vu': str(vu), }
sign_key = '2f9d6924b33a165a6d8b5d3d42f4f987' #ALL YOUR BASE ARE BELONG TO US
# ran = float('0.' + str(random.randint(0, 9999999999999999))) # For ver 2.1
# str2Hash = 'cfflashformatjsonran{ran}uu{uu}ver2.2vu{vu}bie^#@(%27eib58'.format(vu = vu, uu = uu, ran = ran) #Magic!/ In ver 2.1
argumet_dict = {'cf': 'flash', 'format': 'json', 'ran': str(int(time.time())), 'uu': str(uu), 'ver': '2.2', 'vu': str(vu), }
sign_key = '2f9d6924b33a165a6d8b5d3d42f4f987' # ALL YOUR BASE ARE BELONG TO US
str2Hash = ''.join([i + argumet_dict[i] for i in sorted(argumet_dict)]) + sign_key
sign = hashlib.md5(str2Hash.encode('utf-8')).hexdigest()
request_info = urllib.request.Request('http://api.letvcloud.com/gpc.php?' + '&'.join([i + '=' + argumet_dict[i] for i in argumet_dict]) + '&sign={sign}'.format(sign = sign))
request_info = urllib.request.Request('http://api.letvcloud.com/gpc.php?' + '&'.join([i + '=' + argumet_dict[i] for i in argumet_dict]) + '&sign={sign}'.format(sign=sign))
response = urllib.request.urlopen(request_info)
data = response.read()
info = json.loads(data.decode('utf-8'))
type_available = []
for video_type in info['data']['video_info']['media']:
type_available.append({'video_url': info['data']['video_info']['media'][video_type]['play_url']['main_url'], 'video_quality': int(info['data']['video_info']['media'][video_type]['play_url']['vtype'])})
urls = [base64.b64decode(sorted(type_available, key = lambda x:x['video_quality'])[-1]['video_url']).decode("utf-8")]
urls = [base64.b64decode(sorted(type_available, key=lambda x: x['video_quality'])[-1]['video_url']).decode("utf-8")]
size = urls_size(urls)
ext = 'mp4'
print_info(site_info, title, ext, size)
if not info_only:
download_urls(urls, title, ext, size, output_dir=output_dir, merge=merge)
def letvcloud_download(url, output_dir='.', merge=True, info_only=False):
qs = parse.urlparse(url).query
vu = match1(qs, r'vu=([\w]+)')
@ -128,7 +132,8 @@ def letvcloud_download(url, output_dir='.', merge=True, info_only=False):
title = "LETV-%s" % vu
letvcloud_download_by_vu(vu, uu, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
def letv_download(url, output_dir='.', merge=True, info_only=False ,**kwargs):
def letv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
url = url_locations([url])[0]
if re.match(r'http://yuntv.letv.com/', url):
letvcloud_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
@ -136,14 +141,15 @@ def letv_download(url, output_dir='.', merge=True, info_only=False ,**kwargs):
html = get_content(url)
vid = match1(url, r'video/(\d+)\.html')
title = match1(html, r'<h2 class="title">([^<]+)</h2>')
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only,**kwargs)
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
else:
html = get_content(url)
vid = match1(url, r'http://www.letv.com/ptv/vplay/(\d+).html') or \
match1(url, r'http://www.le.com/ptv/vplay/(\d+).html') or \
match1(html, r'vid="(\d+)"')
title = match1(html,r'name="irTitle" content="(.*?)"')
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only,**kwargs)
match1(url, r'http://www.le.com/ptv/vplay/(\d+).html') or \
match1(html, r'vid="(\d+)"')
title = match1(html, r'name="irTitle" content="(.*?)"')
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
site_info = "Le.com"
download = letv_download
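For reference, the `tkey` parameter in the playJson request boils down to a 32-bit rotate-right of the current UNIX time; a self-contained sketch mirroring `calcTimeKey` above (helper names are illustrative):

```python
import time

def rotate_right_32(val, bits):
    bits %= 32
    return ((val & 0xFFFFFFFF) >> bits) | ((val << (32 - bits)) & 0xFFFFFFFF)

def calc_time_key(t=None):
    magic = 185025305
    t = int(time.time()) if t is None else t
    return rotate_right_32(t, magic % 17) ^ magic
```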

View File

@ -2,8 +2,17 @@
__all__ = ['lizhi_download']
import json
import datetime
from ..common import *
#
# Works well but not perfect.
# TODO: add option --format={sd|hd}
#
def get_url(ep):
readable = datetime.datetime.fromtimestamp(int(ep['create_time']) / 1000).strftime('%Y/%m/%d')
return 'http://cdn5.lizhi.fm/audio/{}/{}_hd.mp3'.format(readable, ep['id'])
# radio_id: e.g. 549759 from http://www.lizhi.fm/549759/
#
# Returns a list of tuples (audio_id, title, url) for each episode
@ -23,7 +32,7 @@ def lizhi_extract_playlist_info(radio_id):
# (au_cnt), then handle pagination properly.
api_url = 'http://www.lizhi.fm/api/radio_audios?s=0&l=65535&band=%s' % radio_id
api_response = json.loads(get_content(api_url))
return [(ep['id'], ep['name'], ep['url']) for ep in api_response]
return [(ep['id'], ep['name'], get_url(ep)) for ep in api_response]
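As a worked example of the CDN scheme in `get_url`: an episode's `create_time` (milliseconds) is reduced to a `YYYY/MM/DD` path segment and combined with its `id` (the values below are made up):

```python
import datetime

ep = {'create_time': '1456416000000', 'id': 98765}  # hypothetical episode fields
day = datetime.datetime.fromtimestamp(int(ep['create_time']) / 1000).strftime('%Y/%m/%d')
print('http://cdn5.lizhi.fm/audio/{}/{}_hd.mp3'.format(day, ep['id']))
# e.g. http://cdn5.lizhi.fm/audio/2016/02/25/98765_hd.mp3 (exact date depends on local timezone)
```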
def lizhi_download_audio(audio_id, title, url, output_dir='.', info_only=False):
filetype, ext, size = url_info(url)

View File

@ -0,0 +1,74 @@
#!/usr/bin/env python
__all__ = ['longzhu_download']
import json
from ..common import (
get_content,
general_m3u8_extractor,
match1,
print_info,
download_urls,
playlist_not_supported,
)
from ..common import player
def longzhu_download(url, output_dir = '.', merge=True, info_only=False, **kwargs):
web_domain = url.split('/')[2]
if (web_domain == 'star.longzhu.com') or (web_domain == 'y.longzhu.com'):
domain = url.split('/')[3].split('?')[0]
m_url = 'http://m.longzhu.com/{0}'.format(domain)
m_html = get_content(m_url)
room_id_patt = r'var\s*roomId\s*=\s*(\d+);'
room_id = match1(m_html,room_id_patt)
json_url = 'http://liveapi.plu.cn/liveapp/roomstatus?roomId={0}'.format(room_id)
content = get_content(json_url)
data = json.loads(content)
streamUri = data['streamUri']
if len(streamUri) <= 4:
raise ValueError('The live stream is not online!')
title = data['title']
streamer = data['userName']
title = '{}: {}'.format(streamer, title)
steam_api_url = 'http://livestream.plu.cn/live/getlivePlayurl?roomId={0}'.format(room_id)
content = get_content(steam_api_url)
data = json.loads(content)
isonline = data.get('isTransfer')
if isonline == '0':
raise ValueError('The live stream is not online!')
real_url = data['playLines'][0]['urls'][0]['securityUrl']
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge=merge)
elif web_domain == 'replay.longzhu.com':
videoid = match1(url, r'(\d+)$')
json_url = 'http://liveapi.longzhu.com/livereplay/getreplayfordisplay?videoId={0}'.format(videoid)
content = get_content(json_url)
data = json.loads(content)
username = data['userName']
title = data['title']
title = '{}: {}'.format(username, title)
real_url = data['videoUrl']
if player:
print_info('Longzhu Video', title, 'm3u8', 0)
download_urls([real_url], title, 'm3u8', 0, output_dir, merge=merge)
else:
urls = general_m3u8_extractor(real_url)
print_info('Longzhu Video', title, 'm3u8', 0)
if not info_only:
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
else:
raise ValueError('Wrong url or unsupported link ... {0}'.format(url))
site_info = 'longzhu.com'
download = longzhu_download
download_playlist = playlist_not_supported('longzhu')

View File

@ -68,7 +68,7 @@ class MGTV(VideoExtractor):
self.title = content['data']['info']['title']
domain = content['data']['stream_domain'][0]
#stream_avalable = [i['name'] for i in content['data']['stream']]
#stream_available = [i['name'] for i in content['data']['stream']]
stream_available = {}
for i in content['data']['stream']:
stream_available[i['name']] = i['url']

View File

@ -2,9 +2,12 @@
__all__ = ['miaopai_download']
import string
import random
from ..common import *
import urllib.error
import urllib.parse
from ..util import fs
fake_headers_mobile = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
@ -20,6 +23,10 @@ def miaopai_download_by_fid(fid, output_dir = '.', merge = False, info_only = Fa
mobile_page = get_content(page_url, headers=fake_headers_mobile)
url = match1(mobile_page, r'<video id=.*?src=[\'"](.*?)[\'"]\W')
if url is None:
wb_mp = re.search(r'<script src=([\'"])(.+?wb_mp\.js)\1>', mobile_page).group(2)
return miaopai_download_by_wbmp(wb_mp, fid, output_dir=output_dir, merge=merge,
info_only=info_only, total_size=None, **kwargs)
title = match1(mobile_page, r'<title>((.|\n)+?)</title>')
if not title:
title = fid
@ -29,9 +36,79 @@ def miaopai_download_by_fid(fid, output_dir = '.', merge = False, info_only = Fa
if not info_only:
download_urls([url], title, ext, total_size=None, output_dir=output_dir, merge=merge)
#----------------------------------------------------------------------
def miaopai_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
fid = match1(url, r'\?fid=(\d{4}:\w{32})')
def miaopai_download_by_wbmp(wbmp_url, fid, info_only=False, **kwargs):
headers = {}
headers.update(fake_headers_mobile)
headers['Host'] = 'imgaliyuncdn.miaopai.com'
wbmp = get_content(wbmp_url, headers=headers)
appid = re.search(r'appid:\s*?([^,]+?),', wbmp).group(1)
jsonp = re.search(r'jsonp:\s*?([\'"])(\w+?)\1', wbmp).group(2)
population = [i for i in string.ascii_lowercase] + [i for i in string.digits]
info_url = '{}?{}'.format('http://p.weibo.com/aj_media/info', parse.urlencode({
'appid': appid.strip(),
'fid': fid,
jsonp.strip(): '_jsonp' + ''.join(random.sample(population, 11))
}))
headers['Host'] = 'p.weibo.com'
jsonp_text = get_content(info_url, headers=headers)
jsonp_dict = json.loads(match1(jsonp_text, r'\(({.+})\)'))
if jsonp_dict['code'] != 200:
log.wtf('[Failed] "%s"' % jsonp_dict['msg'])
video_url = jsonp_dict['data']['meta_data'][0]['play_urls']['l']
title = jsonp_dict['data']['description']
title = title.replace('\n', '_')
ext = 'mp4'
headers['Host'] = 'f.us.sinaimg.cn'
print_info(site_info, title, ext, url_info(video_url, headers=headers)[2])
if not info_only:
download_urls([video_url], fs.legitimize(title), ext, headers=headers, **kwargs)
def miaopai_download_story(url, output_dir='.', merge=False, info_only=False, **kwargs):
data_url = 'https://m.weibo.cn/s/video/object?%s' % url.split('?')[1]
data_content = get_content(data_url, headers=fake_headers_mobile)
data = json.loads(data_content)
title = data['data']['object']['summary']
stream_url = data['data']['object']['stream']['url']
ext = 'mp4'
print_info(site_info, title, ext, url_info(stream_url, headers=fake_headers_mobile)[2])
if not info_only:
download_urls([stream_url], fs.legitimize(title), ext, total_size=None, headers=fake_headers_mobile, **kwargs)
def miaopai_download_direct(url, output_dir='.', merge=False, info_only=False, **kwargs):
mobile_page = get_content(url, headers=fake_headers_mobile)
try:
title = re.search(r'([\'"])title\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
except:
title = re.search(r'([\'"])status_title\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
title = title.replace('\n', '_')
try:
stream_url = re.search(r'([\'"])stream_url\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
except:
page_url = re.search(r'([\'"])page_url\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
return miaopai_download_story(page_url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
ext = 'mp4'
print_info(site_info, title, ext, url_info(stream_url, headers=fake_headers_mobile)[2])
if not info_only:
download_urls([stream_url], fs.legitimize(title), ext, total_size=None, headers=fake_headers_mobile, **kwargs)
def miaopai_download(url, output_dir='.', merge=False, info_only=False, **kwargs):
if re.match(r'^http[s]://.*\.weibo\.com/\d+/.+', url):
return miaopai_download_direct(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
if re.match(r'^http[s]://.*\.weibo\.(com|cn)/s/video/.+', url):
return miaopai_download_story(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
# FIXME!
if re.match(r'^http[s]://.*\.weibo\.com/tv/v/(\w+)', url):
return miaopai_download_direct(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
fid = match1(url, r'\?fid=(\d{4}:\w+)')
if fid is not None:
miaopai_download_by_fid(fid, output_dir, merge, info_only)
elif '/p/230444' in url:
@ -46,6 +123,7 @@ def miaopai_download(url, output_dir = '.', merge = False, info_only = False, **
escaped_url = hit.group(1)
miaopai_download(urllib.parse.unquote(escaped_url), output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
site_info = "miaopai"
download = miaopai_download
download_playlist = playlist_not_supported('miaopai')

View File

@ -7,31 +7,40 @@ import re
from ..util import log
from ..common import get_content, download_urls, print_info, playlist_not_supported, url_size
from .universal import *
__all__ = ['naver_download_by_url']
def naver_download_by_url(url, info_only=False, **kwargs):
def naver_download_by_url(url, output_dir='.', merge=True, info_only=False, **kwargs):
ep = 'https://apis.naver.com/rmcnmv/rmcnmv/vod/play/v2.0/{}?key={}'
page = get_content(url)
og_video_url = re.search(r"<meta\s+property=\"og:video:url\"\s+content='(.+?)'>", page).group(1)
params_dict = urllib.parse.parse_qs(urllib.parse.urlparse(og_video_url).query)
vid = params_dict['vid'][0]
key = params_dict['outKey'][0]
meta_str = get_content(ep.format(vid, key))
meta_json = json.loads(meta_str)
if 'errorCode' in meta_json:
log.wtf(meta_json['errorCode'])
title = meta_json['meta']['subject']
videos = meta_json['videos']['list']
video_list = sorted(videos, key=lambda video: video['encodingOption']['width'])
video_url = video_list[-1]['source']
# size = video_list[-1]['size']
# result wrong size
size = url_size(video_url)
print_info(site_info, title, 'mp4', size)
if not info_only:
download_urls([video_url], title, 'mp4', size, **kwargs)
try:
temp = re.search(r"<meta\s+property=\"og:video:url\"\s+content='(.+?)'>", page)
if temp is not None:
og_video_url = temp.group(1)
params_dict = urllib.parse.parse_qs(urllib.parse.urlparse(og_video_url).query)
vid = params_dict['vid'][0]
key = params_dict['outKey'][0]
else:
vid = re.search(r"\"videoId\"\s*:\s*\"(.+?)\"", page).group(1)
key = re.search(r"\"inKey\"\s*:\s*\"(.+?)\"", page).group(1)
meta_str = get_content(ep.format(vid, key))
meta_json = json.loads(meta_str)
if 'errorCode' in meta_json:
log.wtf(meta_json['errorCode'])
title = meta_json['meta']['subject']
videos = meta_json['videos']['list']
video_list = sorted(videos, key=lambda video: video['encodingOption']['width'])
video_url = video_list[-1]['source']
# size = video_list[-1]['size']
# result wrong size
size = url_size(video_url)
print_info(site_info, title, 'mp4', size)
if not info_only:
download_urls([video_url], title, 'mp4', size, **kwargs)
except:
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
site_info = "naver.com"
download = naver_download_by_url
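A minimal sketch of the play-API call used above, assuming `vid` and the matching `inKey`/`outKey` have already been scraped from the page (the helper name is illustrative):

```python
import json
from urllib.request import urlopen

def naver_best_source(vid, key):
    ep = 'https://apis.naver.com/rmcnmv/rmcnmv/vod/play/v2.0/{}?key={}'.format(vid, key)
    meta = json.loads(urlopen(ep).read().decode('utf-8'))
    videos = sorted(meta['videos']['list'], key=lambda v: v['encodingOption']['width'])
    return meta['meta']['subject'], videos[-1]['source']  # title, widest rendition URL
```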

View File

@ -1,43 +0,0 @@
#!/usr/bin/env python
__all__ = ['panda_download']
from ..common import *
from ..util.log import *
import json
import time
def panda_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
roomid = re.search('/(\d+)', url)
if roomid is None:
log.wtf('Cannot found room id for this url')
roomid = roomid.group(1)
json_request_url ="http://www.panda.tv/api_room_v2?roomid={}&__plat=pc_web&_={}".format(roomid, int(time.time()))
content = get_html(json_request_url)
api_json = json.loads(content)
errno = api_json["errno"]
errmsg = api_json["errmsg"]
if errno:
raise ValueError("Errno : {}, Errmsg : {}".format(errno, errmsg))
data = api_json["data"]
title = data["roominfo"]["name"]
room_key = data["videoinfo"]["room_key"]
plflag = data["videoinfo"]["plflag"].split("_")
status = data["videoinfo"]["status"]
if status is not "2":
raise ValueError("The live stream is not online! (status:%s)" % status)
data2 = json.loads(data["videoinfo"]["plflag_list"])
rid = data2["auth"]["rid"]
sign = data2["auth"]["sign"]
ts = data2["auth"]["time"]
real_url = "http://pl{}.live.panda.tv/live_panda/{}.flv?sign={}&ts={}&rid={}".format(plflag[1], room_key, sign, ts, rid)
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge = merge)
site_info = "panda.tv"
download = panda_download
download_playlist = playlist_not_supported('panda')

View File

@ -190,16 +190,16 @@ class PPTV(VideoExtractor):
def prepare(self, **kwargs):
if self.url and not self.vid:
if not re.match(r'http://v.pptv.com/show/(\w+)\.html', self.url):
if not re.match(r'https?://v.pptv.com/show/(\w+)\.html', self.url):
raise('Unknown url pattern')
page_content = get_content(self.url)
page_content = get_content(self.url,{"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"})
self.vid = match1(page_content, r'webcfg\s*=\s*{"id":\s*(\d+)')
if not self.vid:
raise('Cannot find id')
api_url = 'http://web-play.pptv.com/webplay3-0-{}.xml'.format(self.vid)
api_url += '?appplt=flp&appid=pptv.flashplayer.vod&appver=3.4.2.28&type=&version=4'
dom = parseString(get_content(api_url))
dom = parseString(get_content(api_url,{"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"}))
self.title, m_items, m_streams, m_segs = parse_pptv_xml(dom)
xml_streams = merge_meta(m_items, m_streams, m_segs)
for stream_id in xml_streams:

View File

@ -58,7 +58,7 @@ class QiE(VideoExtractor):
content = loads(content)
self.title = content['data']['room_name']
rtmp_url = content['data']['rtmp_url']
#stream_avalable = [i['name'] for i in content['data']['stream']]
#stream_available = [i['name'] for i in content['data']['stream']]
stream_available = {}
stream_available['normal'] = rtmp_url + '/' + content['data']['rtmp_live']
if len(content['data']['rtmp_multi_bitrate']) > 0:

View File

@ -2,36 +2,44 @@
__all__ = ['qq_download']
from ..common import *
from ..util.log import *
from .qie import download as qieDownload
from .qie_video import download_by_url as qie_video_download
from urllib.parse import urlparse,parse_qs
from ..common import *
def qq_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False):
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3.2.19.333&platform=11&defnpayver=1&vid={}'.format(vid)
info = get_content(info_api)
video_json = json.loads(match1(info, r'QZOutputJson=(.*)')[:-1])
# http://v.sports.qq.com/#/cover/t0fqsm1y83r8v5j/a0026nvw5jr https://v.qq.com/x/cover/t0fqsm1y83r8v5j/a0026nvw5jr.html
video_json = None
platforms = [4100201, 11]
for platform in platforms:
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3.2.19.333&platform={}&defnpayver=1&defn=shd&vid={}'.format(platform, vid)
info = get_content(info_api)
video_json = json.loads(match1(info, r'QZOutputJson=(.*)')[:-1])
if not video_json.get('msg')=='cannot play outside':
break
fn_pre = video_json['vl']['vi'][0]['lnk']
title = video_json['vl']['vi'][0]['ti']
host = video_json['vl']['vi'][0]['ul']['ui'][0]['url']
streams = video_json['fl']['fi']
seg_cnt = video_json['vl']['vi'][0]['cl']['fc']
seg_cnt = fc_cnt = video_json['vl']['vi'][0]['cl']['fc']
filename = video_json['vl']['vi'][0]['fn']
if seg_cnt == 0:
seg_cnt = 1
best_quality = streams[-1]['name']
part_format_id = streams[-1]['id']
else:
fn_pre, magic_str, video_type = filename.split('.')
part_urls= []
total_size = 0
for part in range(1, seg_cnt+1):
#if seg_cnt == 1 and video_json['vl']['vi'][0]['vh'] <= 480:
# filename = fn_pre + '.mp4'
#else:
# filename = fn_pre + '.p' + str(part_format_id % 10000) + '.' + str(part) + '.mp4'
filename = fn_pre + '.p' + str(part_format_id % 10000) + '.' + str(part) + '.mp4'
if fc_cnt == 0:
# fix json parsing error
# example:https://v.qq.com/x/page/w0674l9yrrh.html
part_format_id = video_json['vl']['vi'][0]['cl']['keyid'].split('.')[-1]
else:
part_format_id = video_json['vl']['vi'][0]['cl']['ci'][part - 1]['keyid'].split('.')[1]
filename = '.'.join([fn_pre, magic_str, str(part), video_type])
key_api = "http://vv.video.qq.com/getkey?otype=json&platform=11&format={}&vid={}&filename={}&appver=3.2.19.333".format(part_format_id, vid, filename)
part_info = get_content(key_api)
key_json = json.loads(match1(part_info, r'QZOutputJson=(.*)')[:-1])
@ -47,6 +55,9 @@ def qq_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False):
else:
log.w(key_json['msg'])
break
if key_json.get('filename') is None:
log.w(key_json['msg'])
break
part_urls.append(url)
_, ext, size = url_info(url)
@ -96,6 +107,7 @@ def kg_qq_download_by_shareid(shareid, output_dir='.', info_only=False, caption=
def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
""""""
if re.match(r'https?://egame.qq.com/live\?anchorid=(\d+)', url):
from . import qq_egame
qq_egame.qq_egame_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
@ -121,19 +133,15 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
qq_download_by_vid(vid, vid, output_dir, merge, info_only)
return
#do redirect
if 'v.qq.com/page' in url:
# for URLs like this:
# http://v.qq.com/page/k/9/7/k0194pwgw97.html
new_url = url_locations([url])[0]
if url == new_url:
#redirect in js?
content = get_content(url)
url = match1(content,r'window\.location\.href="(.*?)"')
else:
url = new_url
if 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
if 'kuaibao.qq.com/s/' in url:
# https://kuaibao.qq.com/s/20180521V0Z9MH00
nid = match1(url, r'/s/([^/&?#]+)')
content = get_content('https://kuaibao.qq.com/getVideoRelate?id=' + nid)
info_json = json.loads(content)
vid=info_json['videoinfo']['vid']
title=info_json['videoinfo']['title']
elif 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
# http://daxue.qq.com/content/content/id/2321
content = get_content(url)
vid = match1(content, r'vid\s*=\s*"\s*([^"]+)"')
title = match1(content, r'title">([^"]+)</p>')
@ -142,6 +150,11 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
vid = match1(url, r'\bvid=(\w+)')
# for embedded URLs; don't know what the title is
title = vid
elif 'view.inews.qq.com' in url:
# view.inews.qq.com/a/20180521V0Z9MH00
content = get_content(url)
vid = match1(content, r'"vid":"(\w+)"')
title = match1(content, r'"title":"(\w+)"')
else:
content = get_content(url)
#vid = parse_qs(urlparse(url).query).get('vid') #for links specified vid like http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
@ -149,6 +162,9 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
vid = ""
if rurl:
vid = rurl.split('/')[-1].split('.')[0]
# https://v.qq.com/x/page/d0552xbadkl.html https://y.qq.com/n/yqq/mv/v/g00268vlkzy.html
if vid == "undefined" or vid == "index":
vid = ""
vid = vid if vid else url.split('/')[-1].split('.')[0] #https://v.qq.com/x/cover/ps6mnfqyrfo7es3/q0181hpdvo5.html?
vid = vid if vid else match1(content, r'vid"*\s*:\s*"\s*([^"]+)"') #general fallback
if not vid:
@ -158,6 +174,7 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
title = match1(content, r'"title":"([^"]+)"') if not title else title
title = vid if not title else title #general fallback
qq_download_by_vid(vid, title, output_dir, merge, info_only)
site_info = "QQ.com"

View File

@ -1,28 +0,0 @@
#!/usr/bin/env python
__all__ = ['quanmin_download']
from ..common import *
import json
def quanmin_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
roomid = url.split('/')[3].split('?')[0]
json_request_url = 'http://m.quanmin.tv/json/rooms/{}/noinfo6.json'.format(roomid)
content = get_html(json_request_url)
data = json.loads(content)
title = data["title"]
if not data["play_status"]:
raise ValueError("The live stream is not online!")
real_url = data["live"]["ws"]["flv"]["5"]["src"]
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge = merge)
site_info = "quanmin.tv"
download = quanmin_download
download_playlist = playlist_not_supported('quanmin')

View File

@ -15,11 +15,13 @@ Changelog:
new api
'''
def real_url(host,vid,tvid,new,clipURL,ck):
url = 'http://'+host+'/?prot=9&prod=flash&pt=1&file='+clipURL+'&new='+new +'&key='+ ck+'&vid='+str(vid)+'&uid='+str(int(time.time()*1000))+'&t='+str(random())+'&rb=1'
return json.loads(get_html(url))['url']
def sohu_download(url, output_dir = '.', merge = True, info_only = False, extractor_proxy=None, **kwargs):
def real_url(fileName, key, ch):
url = "https://data.vod.itc.cn/ip?new=" + fileName + "&num=1&key=" + key + "&ch=" + ch + "&pt=1&pg=2&prod=h5n"
return json.loads(get_html(url))['servers'][0]['url']
def sohu_download(url, output_dir='.', merge=True, info_only=False, extractor_proxy=None, **kwargs):
if re.match(r'http://share.vrs.sohu.com', url):
vid = r1('id=(\d+)', url)
else:
@ -27,16 +29,16 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
vid = r1(r'\Wvid\s*[\:=]\s*[\'"]?(\d+)[\'"]?', html)
assert vid
if re.match(r'http[s]://tv.sohu.com/', url):
if extractor_proxy:
set_proxy(tuple(extractor_proxy.split(":")))
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % vid))
for qtyp in ["oriVid","superVid","highVid" ,"norVid","relativeId"]:
if extractor_proxy:
set_proxy(tuple(extractor_proxy.split(":")))
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % vid))
if info and info.get("data", ""):
for qtyp in ["oriVid", "superVid", "highVid", "norVid", "relativeId"]:
if 'data' in info:
hqvid = info['data'][qtyp]
else:
hqvid = info[qtyp]
if hqvid != 0 and hqvid != vid :
if hqvid != 0 and hqvid != vid:
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % hqvid))
if not 'allot' in info:
continue
@ -51,9 +53,8 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
title = data['tvName']
size = sum(data['clipsBytes'])
assert len(data['clipsURL']) == len(data['clipsBytes']) == len(data['su'])
for new,clip,ck, in zip(data['su'], data['clipsURL'], data['ck']):
clipURL = urlparse(clip).path
urls.append(real_url(host,hqvid,tvid,new,clipURL,ck))
for fileName, key in zip(data['su'], data['ck']):
urls.append(real_url(fileName, key, data['ch']))
# assert data['clipsURL'][0].endswith('.mp4')
else:
@ -64,15 +65,15 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
urls = []
data = info['data']
title = data['tvName']
size = sum(map(int,data['clipsBytes']))
size = sum(map(int, data['clipsBytes']))
assert len(data['clipsURL']) == len(data['clipsBytes']) == len(data['su'])
for new,clip,ck, in zip(data['su'], data['clipsURL'], data['ck']):
clipURL = urlparse(clip).path
urls.append(real_url(host,vid,tvid,new,clipURL,ck))
for fileName, key in zip(data['su'], data['ck']):
urls.append(real_url(fileName, key, data['ch']))
print_info(site_info, title, 'mp4', size)
if not info_only:
download_urls(urls, title, 'mp4', size, output_dir, refer = url, merge = merge)
download_urls(urls, title, 'mp4', size, output_dir, refer=url, merge=merge)
site_info = "Sohu.com"
download = sohu_download

View File

@ -0,0 +1,21 @@
#!/usr/bin/env python
__all__ = ['tiktok_download']
from ..common import *
def tiktok_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url, faker=True)
title = r1(r'<title.*?>(.*?)</title>', html)
video_id = r1(r'/video/(\d+)', url) or r1(r'musical\?id=(\d+)', html)
title = '%s [%s]' % (title, video_id)
source = r1(r'<video .*?src="([^"]+)"', html)
mime, ext, size = url_info(source)
print_info(site_info, title, mime, size)
if not info_only:
download_urls([source], title, ext, size, output_dir, merge=merge)
site_info = "TikTok.com"
download = tiktok_download
download_playlist = playlist_not_supported('tiktok')

View File

@ -1,27 +1,36 @@
#!/usr/bin/env python
import base64
import binascii
from ..common import *
import random
from json import loads
from urllib.parse import urlparse
from ..common import *
try:
from base64 import decodebytes
except ImportError:
from base64 import decodestring
decodebytes = decodestring
__all__ = ['toutiao_download', ]
def random_with_n_digits(n):
return random.randint(10 ** (n - 1), (10 ** n) - 1)
def sign_video_url(vid):
# some code from http://codecloud.net/110854.html
r = str(random.random())[2:]
r = str(random_with_n_digits(16))
def right_shift(val, n):
return val >> n if val >= 0 else (val + 0x100000000) >> n
url = 'http://i.snssdk.com/video/urls/v/1/toutiao/mp4/%s' % vid
n = url.replace("http://i.snssdk.com", "")+ '?r=' + r
c = binascii.crc32(n.encode("ascii"))
s = right_shift(c, 0)
return url + '?r=%s&s=%s' % (r, s)
url = 'https://ib.365yg.com/video/urls/v/1/toutiao/mp4/{vid}'.format(vid=vid)
n = urlparse(url).path + '?r=' + r
b_n = bytes(n, encoding="utf-8")
s = binascii.crc32(b_n)
aid = 1364
ts = int(time.time() * 1000)
return url + '?r={r}&s={s}&aid={aid}&vfrom=xgplayer&callback=axiosJsonpCallback1&_={ts}'.format(r=r, s=s, aid=aid,
ts=ts)
class ToutiaoVideoInfo(object):
@ -43,12 +52,12 @@ def get_file_by_vid(video_id):
vRet = []
url = sign_video_url(video_id)
ret = get_content(url)
ret = loads(ret)
ret = loads(ret[20:-1])
vlist = ret.get('data').get('video_list')
if len(vlist) > 0:
vInfo = vlist.get(sorted(vlist.keys(), reverse=True)[0])
vUrl = vInfo.get('main_url')
vUrl = base64.decodestring(vUrl.encode('ascii')).decode('ascii')
vUrl = decodebytes(vUrl.encode('ascii')).decode('ascii')
videoInfo = ToutiaoVideoInfo()
videoInfo.bitrate = vInfo.get('bitrate')
videoInfo.definition = vInfo.get('definition')
@ -63,8 +72,8 @@ def get_file_by_vid(video_id):
def toutiao_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url, faker=True)
video_id = match1(html, r"videoid\s*:\s*'([^']+)',\n")
title = match1(html, r"title: '([^']+)'.replace")
video_id = match1(html, r".*?videoId: '(?P<vid>.*)'")
title = match1(html, '.*?<title>(?P<title>.*?)</title>')
video_file_list = get_file_by_vid(video_id)  # call the API to fetch the video source file list
type, ext, size = url_info(video_file_list[0].url, faker=True)
print_info(site_info=site_info, title=title, type=type, size=size)
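The new `sign_video_url` signs the API path with a CRC32 over `path?r=<16 random digits>`; a standalone sketch of the same scheme (Python 3's `binascii.crc32` already returns an unsigned value, so no extra right-shift is needed; the function name is illustrative):

```python
import binascii
import random
import time
from urllib.parse import urlparse

def sign_video_url_sketch(vid):
    r = str(random.randint(10 ** 15, 10 ** 16 - 1))  # 16 random digits
    url = 'https://ib.365yg.com/video/urls/v/1/toutiao/mp4/{}'.format(vid)
    s = binascii.crc32(bytes(urlparse(url).path + '?r=' + r, 'utf-8'))
    return url + '?r={}&s={}&aid=1364&vfrom=xgplayer&callback=axiosJsonpCallback1&_={}'.format(
        r, s, int(time.time() * 1000))
```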

View File

@ -13,7 +13,29 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
universal_download(url, output_dir, merge=merge, info_only=info_only)
return
html = parse.unquote(get_html(url)).replace('\/', '/')
import ssl
ssl_context = request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
cookie_handler = request.HTTPCookieProcessor()
opener = request.build_opener(ssl_context, cookie_handler)
request.install_opener(opener)
page = get_html(url)
form_key = match1(page, r'id="tumblr_form_key" content="([^"]+)"')
if form_key is not None:
# bypass GDPR consent page
referer = 'https://www.tumblr.com/privacy/consent?redirect=%s' % parse.quote_plus(url)
post_content('https://www.tumblr.com/svc/privacy/consent',
headers={
'Content-Type': 'application/json',
'User-Agent': fake_headers['User-Agent'],
'Referer': referer,
'X-tumblr-form-key': form_key,
'X-Requested-With': 'XMLHttpRequest'
},
post_data_raw='{"eu_resident":true,"gdpr_is_acceptable_age":true,"gdpr_consent_core":true,"gdpr_consent_first_party_ads":true,"gdpr_consent_third_party_ads":true,"gdpr_consent_search_history":true,"redirect_to":"%s","gdpr_reconsent":false}' % url)
page = get_html(url, faker=True)
html = parse.unquote(page).replace('\/', '/')
feed = r1(r'<meta property="og:type" content="tumblr-feed:(\w+)" />', html)
if feed in ['photo', 'photoset', 'entry'] or feed is None:
@ -21,23 +43,31 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
page_title = r1(r'<meta name="description" content="([^"\n]+)', html) or \
r1(r'<meta property="og:description" content="([^"\n]+)', html) or \
r1(r'<title>([^<\n]*)', html)
urls = re.findall(r'(https?://[^;"&]+/tumblr_[^;"]+_\d+\.jpg)', html) +\
re.findall(r'(https?://[^;"&]+/tumblr_[^;"]+_\d+\.png)', html) +\
re.findall(r'(https?://[^;"&]+/tumblr_[^";]+_\d+\.gif)', html)
urls = re.findall(r'(https?://[^;"&]+/tumblr_[^;"&]+_\d+\.jpg)', html) +\
re.findall(r'(https?://[^;"&]+/tumblr_[^;"&]+_\d+\.png)', html) +\
re.findall(r'(https?://[^;"&]+/tumblr_[^";&]+_\d+\.gif)', html)
tuggles = {}
for url in urls:
filename = parse.unquote(url.split('/')[-1])
if url.endswith('.gif'):
hd_url = url
elif url.endswith('.jpg'):
hd_url = r1(r'(.+)_\d+\.jpg$', url) + '_1280.jpg' # FIXME: decide actual quality
elif url.endswith('.png'):
hd_url = r1(r'(.+)_\d+\.png$', url) + '_1280.png' # FIXME: decide actual quality
else:
continue
filename = parse.unquote(hd_url.split('/')[-1])
title = '.'.join(filename.split('.')[:-1])
tumblr_id = r1(r'^tumblr_(.+)_\d+$', title)
quality = int(r1(r'^tumblr_.+_(\d+)$', title))
ext = filename.split('.')[-1]
try:
size = int(get_head(url)['Content-Length'])
size = int(get_head(hd_url)['Content-Length'])
if tumblr_id not in tuggles or tuggles[tumblr_id]['quality'] < quality:
tuggles[tumblr_id] = {
'title': title,
'url': url,
'url': hd_url,
'quality': quality,
'ext': ext,
'size': size,
@ -70,6 +100,11 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
real_url = r1(r'<source src="([^"]*)"', html)
if not real_url:
iframe_url = r1(r'<[^>]+tumblr_video_container[^>]+><iframe[^>]+src=[\'"]([^\'"]*)[\'"]', html)
if iframe_url is None:
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
return
if iframe_url:
iframe_html = get_content(iframe_url, headers=fake_headers)
real_url = r1(r'<video[^>]*>[\n ]*<source[^>]+src=[\'"]([^\'"]*)[\'"]', iframe_html)
@ -94,11 +129,15 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
r1(r'<meta property="og:description" content="([^"]*)" />', html) or
r1(r'<title>([^<\n]*)', html) or url.split("/")[4]).replace('\n', '')
type, ext, size = url_info(real_url)
# this is better
vcode = r1(r'tumblr_(\w+)', real_url)
real_url = 'https://vt.media.tumblr.com/tumblr_%s.mp4' % vcode
type, ext, size = url_info(real_url, faker=True)
print_info(site_info, title, type, size)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge = merge)
download_urls([real_url], title, ext, size, output_dir, merge=merge)
site_info = "Tumblr.com"
download = tumblr_download

View File

@ -3,6 +3,7 @@
__all__ = ['twitter_download']
from ..common import *
from .universal import *
from .vine import vine_download
def extract_m3u(source):
@ -15,13 +16,28 @@ def extract_m3u(source):
return ['https://video.twimg.com%s' % i for i in s2]
def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
if re.match(r'https?://pbs\.twimg\.com', url):
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
return
if re.match(r'https?://mobile', url): # normalize mobile URL
url = 'https://' + match1(url, r'//mobile\.(.+)')
html = get_html(url)
screen_name = r1(r'data-screen-name="([^"]*)"', html) or \
if re.match(r'https?://twitter\.com/i/moments/', url): # moments
html = get_html(url, faker=True)
paths = re.findall(r'data-permalink-path="([^"]+)"', html)
for path in paths:
twitter_download('https://twitter.com' + path,
output_dir=output_dir,
merge=merge,
info_only=info_only,
**kwargs)
return
html = get_html(url, faker=False) # disable faker to prevent 302 infinite redirect
screen_name = r1(r'twitter\.com/([^/]+)', url) or r1(r'data-screen-name="([^"]*)"', html) or \
r1(r'<meta name="twitter:title" content="([^"]*)"', html)
item_id = r1(r'data-item-id="([^"]*)"', html) or \
item_id = r1(r'twitter\.com/[^/]+/status/(\d+)', url) or r1(r'data-item-id="([^"]*)"', html) or \
r1(r'<meta name="twitter:site:id" content="([^"]*)"', html)
page_title = "{} [{}]".format(screen_name, item_id)
@ -53,39 +69,26 @@ def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs)
output_dir=output_dir)
except: # extract video
# always use i/cards or videos url
if not re.match(r'https?://twitter.com/i/', url):
url = r1(r'<meta\s*property="og:video:url"\s*content="([^"]+)"', html)
if not url:
url = 'https://twitter.com/i/videos/%s' % item_id
html = get_content(url)
#i_url = 'https://twitter.com/i/videos/' + item_id
#i_content = get_content(i_url)
#js_url = r1(r'src="([^"]+)"', i_content)
#js_content = get_content(js_url)
#authorization = r1(r'"(Bearer [^"]+)"', js_content)
authorization = 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'
data_config = r1(r'data-config="([^"]*)"', html) or \
r1(r'data-player-config="([^"]*)"', html)
i = json.loads(unescape_html(data_config))
if 'video_url' in i:
source = i['video_url']
item_id = i['tweet_id']
page_title = "{} [{}]".format(screen_name, item_id)
elif 'playlist' in i:
source = i['playlist'][0]['source']
if not item_id: page_title = i['playlist'][0]['contentId']
elif 'vmap_url' in i:
vmap_url = i['vmap_url']
vmap = get_content(vmap_url)
source = r1(r'<MediaFile>\s*<!\[CDATA\[(.*)\]\]>', vmap)
item_id = i['tweet_id']
page_title = "{} [{}]".format(screen_name, item_id)
elif 'scribe_playlist_url' in i:
scribe_playlist_url = i['scribe_playlist_url']
return vine_download(scribe_playlist_url, output_dir, merge=merge, info_only=info_only)
ga_url = 'https://api.twitter.com/1.1/guest/activate.json'
ga_content = post_content(ga_url, headers={'authorization': authorization})
guest_token = json.loads(ga_content)['guest_token']
try:
urls = extract_m3u(source)
except:
urls = [source]
api_url = 'https://api.twitter.com/2/timeline/conversation/%s.json?tweet_mode=extended' % item_id
api_content = get_content(api_url, headers={'authorization': authorization, 'x-guest-token': guest_token})
info = json.loads(api_content)
variants = info['globalObjects']['tweets'][item_id]['extended_entities']['media'][0]['video_info']['variants']
variants = sorted(variants, key=lambda kv: kv.get('bitrate', 0))
urls = [ variants[-1]['url'] ]
size = urls_size(urls)
mime, ext = 'video/mp4', 'mp4'
mime, ext = variants[-1]['content_type'], 'mp4'
print_info(site_info, page_title, mime, size)
if not info_only:
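The rewritten extractor replaces the old `data-config` scraping with Twitter's guest-token flow: activate a guest session against the hard-coded Bearer token, then read the tweet's media variants from the conversation API. A hedged sketch of that flow using only `urllib`, with the same endpoints as above (the Bearer string is truncated here, and the helper name is illustrative):

```python
import json
from urllib.request import Request, urlopen

BEARER = 'Bearer AAAA...'  # the hard-coded authorization string from the extractor, truncated

def best_variant_url(item_id):
    # step 1: obtain a guest token
    ga = urlopen(Request('https://api.twitter.com/1.1/guest/activate.json',
                         data=b'', headers={'authorization': BEARER}))
    guest_token = json.loads(ga.read().decode('utf-8'))['guest_token']
    # step 2: read the tweet's media variants and keep the highest bitrate
    api = 'https://api.twitter.com/2/timeline/conversation/%s.json?tweet_mode=extended' % item_id
    req = Request(api, headers={'authorization': BEARER, 'x-guest-token': guest_token})
    info = json.loads(urlopen(req).read().decode('utf-8'))
    media = info['globalObjects']['tweets'][item_id]['extended_entities']['media'][0]
    variants = media['video_info']['variants']
    return sorted(variants, key=lambda v: v.get('bitrate', 0))[-1]['url']
```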

View File

@ -31,16 +31,37 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
if page_title:
page_title = unescape_html(page_title)
meta_videos = re.findall(r'<meta property="og:video:url" content="([^"]*)"', page)
if meta_videos:
try:
for meta_video in meta_videos:
meta_video_url = unescape_html(meta_video)
type_, ext, size = url_info(meta_video_url)
print_info(site_info, page_title, type_, size)
if not info_only:
download_urls([meta_video_url], page_title,
ext, size,
output_dir=output_dir, merge=merge,
faker=True)
except:
pass
else:
return
hls_urls = re.findall(r'(https?://[^;"\'\\]+' + '\.m3u8?' +
r'[^;"\'\\]*)', page)
if hls_urls:
for hls_url in hls_urls:
type_, ext, size = url_info(hls_url)
print_info(site_info, page_title, type_, size)
if not info_only:
download_url_ffmpeg(url=hls_url, title=page_title,
ext='mp4', output_dir=output_dir)
return
try:
for hls_url in hls_urls:
type_, ext, size = url_info(hls_url)
print_info(site_info, page_title, type_, size)
if not info_only:
download_url_ffmpeg(url=hls_url, title=page_title,
ext='mp4', output_dir=output_dir)
except:
pass
else:
return
# most common media file extensions on the Internet
media_exts = ['\.flv', '\.mp3', '\.mp4', '\.webm',
@ -54,12 +75,12 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
urls = []
for i in media_exts:
urls += re.findall(r'(https?://[^;"\'\\]+' + i + r'[^;"\'\\]*)', page)
urls += re.findall(r'(https?://[^ ;&"\'\\<>]+' + i + r'[^ ;&"\'\\<>]*)', page)
p_urls = re.findall(r'(https?%3A%2F%2F[^;&]+' + i + r'[^;&]*)', page)
p_urls = re.findall(r'(https?%3A%2F%2F[^;&"]+' + i + r'[^;&"]*)', page)
urls += [parse.unquote(url) for url in p_urls]
q_urls = re.findall(r'(https?:\\\\/\\\\/[^;"\']+' + i + r'[^;"\']*)', page)
q_urls = re.findall(r'(https?:\\\\/\\\\/[^ ;"\'<>]+' + i + r'[^ ;"\'<>]*)', page)
urls += [url.replace('\\\\/', '/') for url in q_urls]
# a link href to an image is often an interesting one
@ -67,6 +88,17 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
urls += re.findall(r'href="(https?://[^"]+\.png)"', page, re.I)
urls += re.findall(r'href="(https?://[^"]+\.gif)"', page, re.I)
# <img> with high widths
urls += re.findall(r'<img src="([^"]*)"[^>]*width="\d\d\d+"', page, re.I)
# relative path
rel_urls = []
rel_urls += re.findall(r'href="(\.[^"]+\.jpe?g)"', page, re.I)
rel_urls += re.findall(r'href="(\.[^"]+\.png)"', page, re.I)
rel_urls += re.findall(r'href="(\.[^"]+\.gif)"', page, re.I)
for rel_url in rel_urls:
urls += [ r1(r'(.*/)', url) + rel_url ]
# MPEG-DASH MPD
mpd_urls = re.findall(r'src="(https?://[^"]+\.mpd)"', page)
for mpd_url in mpd_urls:
@ -80,34 +112,46 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
for url in set(urls):
filename = parse.unquote(url.split('/')[-1])
if 5 <= len(filename) <= 80:
title = '.'.join(filename.split('.')[:-1])
title = '.'.join(filename.split('.')[:-1]) or filename
else:
title = '%s' % i
i += 1
if r1(r'(https://pinterest.com/pin/)', url):
continue
candies.append({'url': url,
'title': title})
for candy in candies:
try:
mime, ext, size = url_info(candy['url'], faker=True)
if not size: size = float('Int')
try:
mime, ext, size = url_info(candy['url'], faker=False)
assert size
except:
mime, ext, size = url_info(candy['url'], faker=True)
if not size: size = float('Inf')
except:
continue
else:
print_info(site_info, candy['title'], ext, size)
if not info_only:
download_urls([candy['url']], candy['title'], ext, size,
output_dir=output_dir, merge=merge,
faker=True)
try:
download_urls([candy['url']], candy['title'], ext, size,
output_dir=output_dir, merge=merge,
faker=False)
except:
download_urls([candy['url']], candy['title'], ext, size,
output_dir=output_dir, merge=merge,
faker=True)
return
else:
# direct download
filename = parse.unquote(url.split('/')[-1])
title = '.'.join(filename.split('.')[:-1])
ext = filename.split('.')[-1]
_, _, size = url_info(url, faker=True)
url_trunk = url.split('?')[0] # strip query string
filename = parse.unquote(url_trunk.split('/')[-1]) or parse.unquote(url_trunk.split('/')[-2])
title = '.'.join(filename.split('.')[:-1]) or filename
_, ext, size = url_info(url, faker=True)
print_info(site_info, title, ext, size)
if not info_only:
download_urls([url], title, ext, size,

View File

@ -7,6 +7,24 @@ from urllib.parse import urlparse
from json import loads
import re
#----------------------------------------------------------------------
def miaopai_download_by_smid(smid, output_dir = '.', merge = True, info_only = False):
""""""
api_endpoint = 'https://n.miaopai.com/api/aj_media/info.json?smid={smid}'.format(smid = smid)
html = get_content(api_endpoint)
api_content = loads(html)
video_url = api_content['data']['meta_data'][0]['play_urls']['l']
title = api_content['data']['description']
type, ext, size = url_info(video_url)
print_info(site_info, title, type, size)
if not info_only:
download_urls([video_url], title, ext, size, output_dir, merge=merge)
#----------------------------------------------------------------------
def yixia_miaopai_download_by_scid(scid, output_dir = '.', merge = True, info_only = False):
""""""
@ -47,14 +65,18 @@ def yixia_xiaokaxiu_download_by_scid(scid, output_dir = '.', merge = True, info_
def yixia_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
"""wrapper"""
hostname = urlparse(url).hostname
if 'miaopai.com' in hostname: #Miaopai
if 'n.miaopai.com' == hostname:
smid = match1(url, r'n\.miaopai\.com/media/([^.]+)')
miaopai_download_by_smid(smid, output_dir, merge, info_only)
return
elif 'miaopai.com' in hostname: #Miaopai
yixia_download_by_scid = yixia_miaopai_download_by_scid
site_info = "Yixia Miaopai"
scid = match1(url, r'miaopai\.com/show/channel/(.+)\.htm') or \
match1(url, r'miaopai\.com/show/(.+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/(.+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/(.+)')
scid = match1(url, r'miaopai\.com/show/channel/([^.]+)\.htm') or \
match1(url, r'miaopai\.com/show/([^.]+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/([^.]+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/([^.]+)')
elif 'xiaokaxiu.com' in hostname: #Xiaokaxiu
yixia_download_by_scid = yixia_xiaokaxiu_download_by_scid

View File

@ -78,7 +78,10 @@ class Youku(VideoExtractor):
self.api_error_code = None
self.api_error_msg = None
self.ccode = '0507'
self.ccode = '0519'
# Found in http://g.alicdn.com/player/ykplayer/0.5.64/youku-player.min.js
# grep -oE '"[0-9a-zA-Z+/=]{256}"' youku-player.min.js
self.ckey = 'DIl58SLFxFNndSV1GFNnMQVYkx1PP5tKe1siZu/86PR1u/Wh1Ptd+WOZsHHWxysSfAOhNJpdVWsdVJNsfJ8Sxd8WKVvNfAS8aS8fAOzYARzPyPc3JvtnPHjTdKfESTdnuTW6ZPvk2pNDh4uFzotgdMEFkzQ5wZVXl2Pf1/Y6hLK0OnCNxBj3+nb0v72gZ6b0td+WOZsHHWxysSo/0y9D2K42SaB8Y/+aD2K42SaB8Y/+ahU+WOZsHcrxysooUeND'
self.utid = None
def youku_ups(self):
@ -86,6 +89,7 @@ class Youku(VideoExtractor):
url += '&client_ip=192.168.1.1'
url += '&utid=' + self.utid
url += '&client_ts=' + str(int(time.time()))
url += '&ckey=' + urllib.parse.quote(self.ckey)
if self.password_protected:
url += '&password=' + self.password
headers = dict(Referer=self.referer)

View File

@ -8,35 +8,74 @@ from xml.dom.minidom import parseString
class YouTube(VideoExtractor):
name = "YouTube"
# YouTube media encoding options, in descending quality order.
# Non-DASH YouTube media encoding options, in descending quality order.
# http://en.wikipedia.org/wiki/YouTube#Quality_and_codecs. Retrieved July 17, 2014.
stream_types = [
{'itag': '38', 'container': 'MP4', 'video_resolution': '3072p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3.5-5', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '38', 'container': 'MP4', 'video_resolution': '3072p',
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3.5-5',
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
#{'itag': '85', 'container': 'MP4', 'video_resolution': '1080p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '3-4', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '46', 'container': 'WebM', 'video_resolution': '1080p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
{'itag': '37', 'container': 'MP4', 'video_resolution': '1080p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3-4.3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '46', 'container': 'WebM', 'video_resolution': '1080p',
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '',
'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
{'itag': '37', 'container': 'MP4', 'video_resolution': '1080p',
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3-4.3',
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
#{'itag': '102', 'container': 'WebM', 'video_resolution': '720p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
{'itag': '45', 'container': 'WebM', 'video_resolution': '720p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '2', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
{'itag': '45', 'container': 'WebM', 'video_resolution': '720p',
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '2',
'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
#{'itag': '84', 'container': 'MP4', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '2-3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '22', 'container': 'MP4', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '2-3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '120', 'container': 'FLV', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': 'Main@L3.1', 'video_bitrate': '2', 'audio_encoding': 'AAC', 'audio_bitrate': '128'}, # Live streaming only
{'itag': '44', 'container': 'WebM', 'video_resolution': '480p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '1', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '35', 'container': 'FLV', 'video_resolution': '480p', 'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.8-1', 'audio_encoding': 'AAC', 'audio_bitrate': '128'},
{'itag': '22', 'container': 'MP4', 'video_resolution': '720p',
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '2-3',
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '120', 'container': 'FLV', 'video_resolution': '720p',
'video_encoding': 'H.264', 'video_profile': 'Main@L3.1', 'video_bitrate': '2',
'audio_encoding': 'AAC', 'audio_bitrate': '128'}, # Live streaming only
{'itag': '44', 'container': 'WebM', 'video_resolution': '480p',
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '1',
'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '35', 'container': 'FLV', 'video_resolution': '480p',
'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.8-1',
'audio_encoding': 'AAC', 'audio_bitrate': '128'},
#{'itag': '101', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
#{'itag': '100', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '43', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '0.5', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '34', 'container': 'FLV', 'video_resolution': '360p', 'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '128'},
{'itag': '43', 'container': 'WebM', 'video_resolution': '360p',
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '0.5',
'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '34', 'container': 'FLV', 'video_resolution': '360p',
'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.5',
'audio_encoding': 'AAC', 'audio_bitrate': '128'},
#{'itag': '82', 'container': 'MP4', 'video_resolution': '360p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
{'itag': '18', 'container': 'MP4', 'video_resolution': '270p/360p', 'video_encoding': 'H.264', 'video_profile': 'Baseline', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
{'itag': '6', 'container': 'FLV', 'video_resolution': '270p', 'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.8', 'audio_encoding': 'MP3', 'audio_bitrate': '64'},
{'itag': '18', 'container': 'MP4', 'video_resolution': '360p',
'video_encoding': 'H.264', 'video_profile': 'Baseline', 'video_bitrate': '0.5',
'audio_encoding': 'AAC', 'audio_bitrate': '96'},
{'itag': '6', 'container': 'FLV', 'video_resolution': '270p',
'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.8',
'audio_encoding': 'MP3', 'audio_bitrate': '64'},
#{'itag': '83', 'container': 'MP4', 'video_resolution': '240p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
{'itag': '13', 'container': '3GP', 'video_resolution': '', 'video_encoding': 'MPEG-4 Visual', 'video_profile': '', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': ''},
{'itag': '5', 'container': 'FLV', 'video_resolution': '240p', 'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.25', 'audio_encoding': 'MP3', 'audio_bitrate': '64'},
{'itag': '36', 'container': '3GP', 'video_resolution': '240p', 'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.175', 'audio_encoding': 'AAC', 'audio_bitrate': '36'},
{'itag': '17', 'container': '3GP', 'video_resolution': '144p', 'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.05', 'audio_encoding': 'AAC', 'audio_bitrate': '24'},
{'itag': '13', 'container': '3GP', 'video_resolution': '',
'video_encoding': 'MPEG-4 Visual', 'video_profile': '', 'video_bitrate': '0.5',
'audio_encoding': 'AAC', 'audio_bitrate': ''},
{'itag': '5', 'container': 'FLV', 'video_resolution': '240p',
'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.25',
'audio_encoding': 'MP3', 'audio_bitrate': '64'},
{'itag': '36', 'container': '3GP', 'video_resolution': '240p',
'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.175',
'audio_encoding': 'AAC', 'audio_bitrate': '32'},
{'itag': '17', 'container': '3GP', 'video_resolution': '144p',
'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.05',
'audio_encoding': 'AAC', 'audio_bitrate': '24'},
]
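The table above is pure metadata: it maps an itag reported by YouTube to a human-readable container/resolution/codec description. A minimal lookup sketch, assuming the list is bound to a name such as `stream_types` (the actual attribute name is not shown here):

```python
# Minimal sketch; 'stream_types' stands in for the itag table above.
def lookup_itag(stream_types, itag):
    """Return the metadata dict for a given itag, or None if unknown."""
    return next((t for t in stream_types if t['itag'] == str(itag)), None)

# e.g. lookup_itag(stream_types, 18) would yield the entry with
# container 'MP4', resolution '360p', H.264 Baseline, AAC audio.
```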
def decipher(js, s):
# Examples:
# - https://www.youtube.com/yts/jsbin/player-da_DK-vflWlK-zq/base.js
# - https://www.youtube.com/yts/jsbin/player-vflvABTsY/da_DK/base.js
# - https://www.youtube.com/yts/jsbin/player-vfls4aurX/da_DK/base.js
# - https://www.youtube.com/yts/jsbin/player_ias-vfl_RGK2l/en_US/base.js
# - https://www.youtube.com/yts/jsbin/player-vflRjqq_w/da_DK/base.js
# - https://www.youtube.com/yts/jsbin/player_ias-vfl-jbnrr/da_DK/base.js
def tr_js(code):
code = re.sub(r'function', r'def', code)
code = re.sub(r'(\W)(as|if|in|is|or)\(', r'\1_\2(', code)
@ -52,11 +91,14 @@ class YouTube(VideoExtractor):
return code
js = js.replace('\n', ' ')
f1 = match1(js, r'"signature",([$\w]+)\(\w+\.\w+\)')
f1 = match1(js, r'\.set\(\w+\.sp,encodeURIComponent\(([$\w]+)') or \
match1(js, r'\.set\(\w+\.sp,\(0,window\.encodeURIComponent\)\(([$\w]+)') or \
match1(js, r'\.set\(\w+\.sp,([$\w]+)\(\w+\.s\)\)') or \
match1(js, r'"signature",([$\w]+)\(\w+\.\w+\)')
f1def = match1(js, r'function %s(\(\w+\)\{[^\{]+\})' % re.escape(f1)) or \
match1(js, r'\W%s=function(\(\w+\)\{[^\{]+\})' % re.escape(f1))
f1def = re.sub(r'([$\w]+\.)([$\w]+\(\w+,\d+\))', r'\2', f1def)
f1def = 'function %s%s' % (f1, f1def)
f1def = 'function main_%s%s' % (f1, f1def) # prefix to avoid potential namespace conflict
code = tr_js(f1def)
f2s = set(re.findall(r'([$\w]+)\(\w+,\d+\)', f1def))
for f2 in f2s:
@ -67,16 +109,26 @@ class YouTube(VideoExtractor):
else:
f2def = re.search(r'[^$\w]%s:function\((\w+)\)(\{[^\{\}]+\})' % f2e, js)
f2def = 'function {}({},b){}'.format(f2e, f2def.group(1), f2def.group(2))
f2 = re.sub(r'(\W)(as|if|in|is|or)\(', r'\1_\2(', f2)
f2 = re.sub(r'(as|if|in|is|or)', r'_\1', f2)
f2 = re.sub(r'\$', '_dollar', f2)
code = code + 'global %s\n' % f2 + tr_js(f2def)
f1 = re.sub(r'(as|if|in|is|or)', r'_\1', f1)
f1 = re.sub(r'\$', '_dollar', f1)
code = code + 'sig=%s(s)' % f1
code = code + 'sig=main_%s(s)' % f1 # prefix to avoid potential namespace conflict
exec(code, globals(), locals())
return locals()['sig']
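decipher() locates the player's signature-scrambling function in base.js, regex-translates it (and its helper calls) into Python via tr_js(), and exec()s the result to recover the signature. The scrambling itself is typically a short composition of primitive steps; a purely illustrative stand-in (operations and indices are made up and change with every player version) might look like:

```python
# Hypothetical stand-in for a deciphered player function; the real
# sequence of operations is extracted from base.js at runtime.
def toy_decipher(s):
    a = list(s)
    a.reverse()                  # reverse the signature
    a = a[2:]                    # splice: drop the first two characters
    i = 37 % len(a)
    a[0], a[i] = a[i], a[0]      # swap position 0 with position i
    return ''.join(a)
```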
def chunk_by_range(url, size):
    urls = []
    chunk_size = 10485760
    start, end = 0, chunk_size - 1
    urls.append('%s&range=%s-%s' % (url, start, end))
    while end + 1 < size:  # processed size < expected size
        start, end = end + 1, end + chunk_size
        urls.append('%s&range=%s-%s' % (url, start, end))
    return urls
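chunk_by_range() splits a (possibly throttled) media URL into 10 MiB `range=` pieces that are downloaded separately and merged later. Called directly for brevity (in the extractor it is invoked via self.__class__), a hypothetical 25 MiB stream would produce:

```python
# Illustrative only; the URL and size are made up.
urls = chunk_by_range('https://example.invalid/videoplayback?mime=video%2Fmp4', 26214400)
# ['...&range=0-10485759',
#  '...&range=10485760-20971519',
#  '...&range=20971520-31457279']
```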
def get_url_from_vid(vid):
return 'https://youtu.be/{}'.format(vid)
@ -128,7 +180,10 @@ class YouTube(VideoExtractor):
for video in videos:
vid = parse_query_param(video, 'v')
index = parse_query_param(video, 'index')
self.__class__().download_by_url(self.__class__.get_url_from_vid(vid), index=index, **kwargs)
try:
self.__class__().download_by_url(self.__class__.get_url_from_vid(vid), index=index, **kwargs)
except:
pass
def prepare(self, **kwargs):
assert self.url or self.vid
@ -140,15 +195,22 @@ class YouTube(VideoExtractor):
self.download_playlist_by_url(self.url, **kwargs)
exit(0)
video_info = parse.parse_qs(get_content('https://www.youtube.com/get_video_info?video_id={}'.format(self.vid)))
if re.search('\Wlist=', self.url) and not kwargs.get('playlist'):
log.w('This video is from a playlist. (use --playlist to download all videos in the playlist.)')
# Get video info
# 'eurl' is a magic parameter that can bypass age restriction
# full form: 'eurl=https%3A%2F%2Fyoutube.googleapis.com%2Fv%2F{VIDEO_ID}'
video_info = parse.parse_qs(get_content('https://www.youtube.com/get_video_info?video_id={}&eurl=https%3A%2F%2Fy'.format(self.vid)))
logging.debug('STATUS: %s' % video_info['status'][0])
ytplayer_config = None
if 'status' not in video_info:
log.wtf('[Failed] Unknown status.')
log.wtf('[Failed] Unknown status.', exit_code=None)
raise
elif video_info['status'] == ['ok']:
if 'use_cipher_signature' not in video_info or video_info['use_cipher_signature'] == ['False']:
self.title = parse.unquote_plus(video_info['title'][0])
self.title = parse.unquote_plus(json.loads(video_info["player_response"][0])["videoDetails"]["title"])
# Parse video page (for DASH)
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
try:
@ -156,27 +218,50 @@ class YouTube(VideoExtractor):
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
# Workaround: get_video_info returns bad s. Why?
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
#stream_list = ytplayer_config['args']['adaptive_fmts'].split(',')
except:
stream_list = video_info['url_encoded_fmt_stream_map'][0].split(',')
self.html5player = None
if re.search('([^"]*/base\.js)"', video_page):
self.html5player = 'https://www.youtube.com' + re.search('([^"]*/base\.js)"', video_page).group(1)
else:
self.html5player = None
else:
# Parse video page instead
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+?});', video_page).group(1))
self.title = ytplayer_config['args']['title']
self.title = json.loads(ytplayer_config["args"]["player_response"])["videoDetails"]["title"]
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
elif video_info['status'] == ['fail']:
logging.debug('ERRORCODE: %s' % video_info['errorcode'][0])
if video_info['errorcode'] == ['150']:
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
# FIXME: still relevant?
if cookies:
# Load necessary cookies into headers (for age-restricted videos)
consent, ssid, hsid, sid = 'YES', '', '', ''
for cookie in cookies:
if cookie.domain.endswith('.youtube.com'):
if cookie.name == 'SSID':
ssid = cookie.value
elif cookie.name == 'HSID':
hsid = cookie.value
elif cookie.name == 'SID':
sid = cookie.value
cookie_str = 'CONSENT=%s; SSID=%s; HSID=%s; SID=%s' % (consent, ssid, hsid, sid)
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid,
headers={'Cookie': cookie_str})
else:
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
try:
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+});ytplayer', video_page).group(1))
except:
msg = re.search('class="message">([^<]+)<', video_page).group(1)
log.wtf('[Failed] "%s"' % msg.strip())
log.wtf('[Failed] Got message "%s". Try to login with --cookies.' % msg.strip())
if 'title' in ytplayer_config['args']:
# 150 Restricted from playback on certain sites
@ -185,22 +270,30 @@ class YouTube(VideoExtractor):
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
else:
log.wtf('[Error] The uploader has not made this video available in your country.')
log.wtf('[Error] The uploader has not made this video available in your country.', exit_code=None)
raise
#self.title = re.search('<meta name="title" content="([^"]+)"', video_page).group(1)
#stream_list = []
elif video_info['errorcode'] == ['100']:
log.wtf('[Failed] This video does not exist.', exit_code=int(video_info['errorcode'][0]))
log.wtf('[Failed] This video does not exist.', exit_code=None) #int(video_info['errorcode'][0])
raise
else:
log.wtf('[Failed] %s' % video_info['reason'][0], exit_code=int(video_info['errorcode'][0]))
log.wtf('[Failed] %s' % video_info['reason'][0], exit_code=None) #int(video_info['errorcode'][0])
raise
else:
log.wtf('[Failed] Invalid status.')
log.wtf('[Failed] Invalid status.', exit_code=None)
raise
# YouTube Live
if ytplayer_config and (ytplayer_config['args'].get('livestream') == '1' or ytplayer_config['args'].get('live_playback') == '1'):
hlsvp = ytplayer_config['args']['hlsvp']
if 'hlsvp' in ytplayer_config['args']:
hlsvp = ytplayer_config['args']['hlsvp']
else:
player_response = json.loads(ytplayer_config['args']['player_response'])
log.e('[Failed] %s' % player_response['playabilityStatus']['reason'], exit_code=1)
if 'info_only' in kwargs and kwargs['info_only']:
return
@ -216,7 +309,8 @@ class YouTube(VideoExtractor):
'url': metadata['url'][0],
'sig': metadata['sig'][0] if 'sig' in metadata else None,
's': metadata['s'][0] if 's' in metadata else None,
'quality': metadata['quality'][0],
'quality': metadata['quality'][0] if 'quality' in metadata else None,
#'quality': metadata['quality_label'][0] if 'quality_label' in metadata else None,
'type': metadata['type'][0],
'mime': metadata['type'][0].split(';')[0],
'container': mime_to_container(metadata['type'][0].split(';')[0]),
@ -286,13 +380,15 @@ class YouTube(VideoExtractor):
if not dash_size:
try: dash_size = url_size(dash_url)
except: continue
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
dash_mp4_a_urls = self.__class__.chunk_by_range(dash_mp4_a_url, int(dash_mp4_a_size))
self.dash_streams[itag] = {
'quality': '%sx%s' % (w, h),
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'mp4',
'src': [dash_url, dash_mp4_a_url],
'src': [dash_urls, dash_mp4_a_urls],
'size': int(dash_size) + int(dash_mp4_a_size)
}
elif mimeType == 'video/webm':
@ -306,75 +402,97 @@ class YouTube(VideoExtractor):
if not dash_size:
try: dash_size = url_size(dash_url)
except: continue
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
dash_webm_a_urls = self.__class__.chunk_by_range(dash_webm_a_url, int(dash_webm_a_size))
self.dash_streams[itag] = {
'quality': '%sx%s' % (w, h),
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'webm',
'src': [dash_url, dash_webm_a_url],
'src': [dash_urls, dash_webm_a_urls],
'size': int(dash_size) + int(dash_webm_a_size)
}
except:
# VEVO
if not self.html5player: return
self.js = get_content(self.html5player)
if 'adaptive_fmts' in ytplayer_config['args']:
try:
# Video info from video page (not always available)
streams = [dict([(i.split('=')[0],
parse.unquote(i.split('=')[1]))
for i in afmt.split('&')])
for afmt in ytplayer_config['args']['adaptive_fmts'].split(',')]
for stream in streams: # get over speed limiting
stream['url'] += '&ratebypass=yes'
for stream in streams: # audio
if stream['type'].startswith('audio/mp4'):
dash_mp4_a_url = stream['url']
except:
streams = [dict([(i.split('=')[0],
parse.unquote(i.split('=')[1]))
for i in afmt.split('&')])
for afmt in video_info['adaptive_fmts'][0].split(',')]
for stream in streams: # get over speed limiting
stream['url'] += '&ratebypass=yes'
for stream in streams: # audio
if stream['type'].startswith('audio/mp4'):
dash_mp4_a_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_mp4_a_url += '&sig={}'.format(sig)
dash_mp4_a_size = stream['clen']
elif stream['type'].startswith('audio/webm'):
dash_webm_a_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_webm_a_url += '&sig={}'.format(sig)
dash_webm_a_size = stream['clen']
for stream in streams: # video
if 'size' in stream:
if stream['type'].startswith('video/mp4'):
mimeType = 'video/mp4'
dash_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_mp4_a_url += '&signature={}'.format(sig)
dash_mp4_a_size = stream['clen']
elif stream['type'].startswith('audio/webm'):
dash_webm_a_url = stream['url']
dash_url += '&sig={}'.format(sig)
dash_size = stream['clen']
itag = stream['itag']
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
dash_mp4_a_urls = self.__class__.chunk_by_range(dash_mp4_a_url, int(dash_mp4_a_size))
self.dash_streams[itag] = {
'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'mp4',
'src': [dash_urls, dash_mp4_a_urls],
'size': int(dash_size) + int(dash_mp4_a_size)
}
elif stream['type'].startswith('video/webm'):
mimeType = 'video/webm'
dash_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_webm_a_url += '&signature={}'.format(sig)
dash_webm_a_size = stream['clen']
for stream in streams: # video
if 'size' in stream:
if stream['type'].startswith('video/mp4'):
mimeType = 'video/mp4'
dash_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_url += '&signature={}'.format(sig)
dash_size = stream['clen']
itag = stream['itag']
self.dash_streams[itag] = {
'quality': stream['size'],
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'mp4',
'src': [dash_url, dash_mp4_a_url],
'size': int(dash_size) + int(dash_mp4_a_size)
}
elif stream['type'].startswith('video/webm'):
mimeType = 'video/webm'
dash_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_url += '&signature={}'.format(sig)
dash_size = stream['clen']
itag = stream['itag']
self.dash_streams[itag] = {
'quality': stream['size'],
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'webm',
'src': [dash_url, dash_webm_a_url],
'size': int(dash_size) + int(dash_webm_a_size)
}
dash_url += '&sig={}'.format(sig)
dash_size = stream['clen']
itag = stream['itag']
audio_url = None
audio_size = None
try:
audio_url = dash_webm_a_url
audio_size = int(dash_webm_a_size)
except UnboundLocalError as e:
audio_url = dash_mp4_a_url
audio_size = int(dash_mp4_a_size)
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
audio_urls = self.__class__.chunk_by_range(audio_url, int(audio_size))
self.dash_streams[itag] = {
'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'webm',
'src': [dash_urls, audio_urls],
'size': int(dash_size) + int(audio_size)
}
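After this parsing, each dash_streams entry pairs a list of chunked video URLs with a list of chunked audio URLs; roughly (all values below are illustrative, not taken from a real response):

```python
# Illustrative shape of one self.dash_streams entry (values made up):
{
    'quality': '1920x1080 (1080p)',
    'itag': '137',
    'type': 'video/mp4',
    'mime': 'video/mp4',
    'container': 'mp4',
    'src': [['...&range=0-10485759', '...&range=10485760-20971519'],  # video chunks
            ['...&range=0-10485759']],                                # audio chunks
    'size': 123456789,
}
```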
def extract(self, **kwargs):
if not self.streams_sorted:
@ -396,13 +514,13 @@ class YouTube(VideoExtractor):
src = self.streams[stream_id]['url']
if self.streams[stream_id]['sig'] is not None:
sig = self.streams[stream_id]['sig']
src += '&signature={}'.format(sig)
src += '&sig={}'.format(sig)
elif self.streams[stream_id]['s'] is not None:
if not hasattr(self, 'js'):
self.js = get_content(self.html5player)
s = self.streams[stream_id]['s']
sig = self.__class__.decipher(self.js, s)
src += '&signature={}'.format(sig)
src += '&sig={}'.format(sig)
self.streams[stream_id]['src'] = [src]
self.streams[stream_id]['size'] = urls_size(self.streams[stream_id]['src'])

@ -0,0 +1,55 @@
#!/usr/bin/env python
__all__ = ['zhibo_download']
from ..common import *
def zhibo_vedio_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
# http://video.zhibo.tv/video/details/d103057f-663e-11e8-9d83-525400ccac43.html
html = get_html(url)
title = r1(r'<title>([\s\S]*)</title>', html)
total_size = 0
part_urls = []
video_html = r1(r'<script type="text/javascript">([\s\S]*)</script></head>', html)
# video_guessulike = r1(r"window.xgData =([s\S'\s\.]*)\'\;[\s\S]*window.vouchData", video_html)
video_url = r1(r"window.vurl = \'([s\S'\s\.]*)\'\;[\s\S]*window.imgurl", video_html)
part_urls.append(video_url)
ext = video_url.split('.')[-1]
print_info(site_info, title, ext, total_size)
if not info_only:
download_urls(part_urls, title, ext, total_size, output_dir=output_dir, merge=merge)
def zhibo_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
if 'video.zhibo.tv' in url:
zhibo_vedio_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
return
# if 'v.zhibo.tv' in url:
# http://v.zhibo.tv/31609372
html = get_html(url)
title = r1(r'<title>([\s\S]*)</title>', html)
is_live = r1(r"window.videoIsLive=\'([s\S'\s\.]*)\'\;[\s\S]*window.resDomain", html)
if is_live != "1":
raise ValueError("The live stream is not online! (Errno:%s)" % is_live)
match = re.search(r"""
ourStreamName .*?
'(.*?)' .*?
rtmpHighSource .*?
'(.*?)' .*?
'(.*?)'
""", html, re.S | re.X)
real_url = match.group(3) + match.group(1) + match.group(2)
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge)
site_info = "zhibo.tv"
download = zhibo_download
download_playlist = playlist_not_supported('zhibo')
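For the live case in zhibo_download above, the RTMP URL is stitched together from three regex captures: the domain/app prefix (group 3), the stream name (group 1) and the quality suffix (group 2). With made-up values:

```python
# Entirely hypothetical values, just to show how real_url is assembled:
group1 = 'room31609372'                       # ourStreamName
group2 = '_high'                              # rtmpHighSource suffix
group3 = 'rtmp://live.example.invalid/app/'   # prefix captured last
real_url = group3 + group1 + group2
# -> 'rtmp://live.example.invalid/app/room31609372_high'
```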

@ -0,0 +1,79 @@
#!/usr/bin/env python
__all__ = ['zhihu_download', 'zhihu_download_playlist']
from ..common import *
import json
def zhihu_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
paths = url.split("/")
# question or column
if len(paths) < 3 and len(paths) < 6:
raise TypeError("URL does not conform to specifications, Support column and question only."
"Example URL: https://zhuanlan.zhihu.com/p/51669862 or "
"https://www.zhihu.com/question/267782048/answer/490720324")
if ("question" not in paths or "answer" not in paths) and "zhuanlan.zhihu.com" not in paths:
raise TypeError("URL does not conform to specifications, Support column and question only."
"Example URL: https://zhuanlan.zhihu.com/p/51669862 or "
"https://www.zhihu.com/question/267782048/answer/490720324")
html = get_html(url, faker=True)
title = match1(html, r'data-react-helmet="true">(.*?)</title>')
for index, video_id in enumerate(matchall(html, [r'<a class="video-box" href="\S+video/(\d+)"'])):
try:
video_info = json.loads(
get_content(r"https://lens.zhihu.com/api/videos/{}".format(video_id), headers=fake_headers))
except json.decoder.JSONDecodeError:
log.w("Video id not found:{}".format(video_id))
continue
play_list = video_info["playlist"]
# Prefer "hd", then "sd", then "ld" (low definition?); skip this video if none is available.
data = play_list.get("hd", play_list.get("sd", play_list.get("ld", None)))
if not data:
log.w("Video id No play address:{}".format(video_id))
continue
print_info(site_info, title, data["format"], data["size"])
if not info_only:
ext = "_{}.{}".format(index, data["format"])
if kwargs.get("zhihu_offset"):
ext = "_{}".format(kwargs["zhihu_offset"]) + ext
download_urls([data["play_url"]], title, ext, data["size"],
output_dir=output_dir, merge=merge, **kwargs)
def zhihu_download_playlist(url, output_dir='.', merge=True, info_only=False, **kwargs):
if "question" not in url or "answer" in url: # question page
raise TypeError("URL does not conform to specifications, Support question only."
" Example URL: https://www.zhihu.com/question/267782048")
url = url.split("?")[0]
if url[-1] == "/":
question_id = url.split("/")[-2]
else:
question_id = url.split("/")[-1]
videos_url = r"https://www.zhihu.com/api/v4/questions/{}/answers".format(question_id)
try:
questions = json.loads(get_content(videos_url))
except json.decoder.JSONDecodeError:
raise TypeError("Check whether the problem URL exists.Example URL: https://www.zhihu.com/question/267782048")
count = 0
while 1:
for data in questions["data"]:
kwargs["zhihu_offset"] = count
zhihu_download("https://www.zhihu.com/question/{}/answer/{}".format(question_id, data["id"]),
output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
count += 1
if questions["paging"]["is_end"]:
return
questions = json.loads(get_content(questions["paging"]["next"], headers=fake_headers))
site_info = "zhihu.com"
download = zhihu_download
download_playlist = zhihu_download_playlist
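zhihu_download_playlist() pages through the public answers API until `paging.is_end` is true. Stripped to its essentials (field names as used above; the question id is made up, error handling omitted):

```python
# Paging sketch mirroring the loop above.
api = 'https://www.zhihu.com/api/v4/questions/267782048/answers'
page = json.loads(get_content(api))
while True:
    for answer in page['data']:
        pass  # each answer id feeds zhihu_download() above
    if page['paging']['is_end']:
        break
    page = json.loads(get_content(page['paging']['next'], headers=fake_headers))
```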

@ -1,8 +1,9 @@
#!/usr/bin/env python
import logging
import os.path
import os
import subprocess
import sys
from ..util.strings import parameterize
from ..common import print_more_compatible as print
@ -21,12 +22,10 @@ def get_usable_ffmpeg(cmd):
out, err = p.communicate()
vers = str(out, 'utf-8').split('\n')[0].split()
assert (vers[0] == 'ffmpeg' and vers[2][0] > '0') or (vers[0] == 'avconv')
# set version to 1.0 for nightly builds and print a warning
try:
version = [int(i) for i in vers[2].split('.')]
v = vers[2][1:] if vers[2][0] == 'n' else vers[2]
version = [int(i) for i in v.split('.')]
except:
print('It seems that your ffmpeg is a nightly build.')
print('Please switch to the latest stable release if merging fails.')
version = [1, 0]
return cmd, 'ffprobe', version
except:
@ -60,14 +59,25 @@ def ffmpeg_concat_av(files, output, ext):
params = [FFMPEG] + LOGLEVEL
for file in files:
if os.path.isfile(file): params.extend(['-i', file])
params.extend(['-c:v', 'copy'])
if ext == 'mp4':
params.extend(['-c:a', 'aac'])
elif ext == 'webm':
params.extend(['-c:a', 'vorbis'])
params.extend(['-strict', 'experimental'])
params.extend(['-c', 'copy'])
params.append(output)
return subprocess.call(params, stdin=STDIN)
if subprocess.call(params, stdin=STDIN):
print('Merging without re-encoding failed.\nRetrying with audio re-encoding... ', end="", flush=True)
try: os.remove(output)
except FileNotFoundError: pass
params = [FFMPEG] + LOGLEVEL
for file in files:
if os.path.isfile(file): params.extend(['-i', file])
params.extend(['-c:v', 'copy'])
if ext == 'mp4':
params.extend(['-c:a', 'aac'])
params.extend(['-strict', 'experimental'])
elif ext == 'webm':
params.extend(['-c:a', 'opus'])
params.append(output)
return subprocess.call(params, stdin=STDIN)
else:
return 0
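The new ffmpeg_concat_av logic first attempts a pure stream copy and only re-encodes the audio track if that fails. In terms of the command lines it builds, that is roughly (LOGLEVEL and file names elided or made up):

```python
# First attempt: copy both streams as-is.
first_try = ['ffmpeg', '-i', 'video.mp4', '-i', 'audio.m4a',
             '-c', 'copy', 'output.mp4']

# If that exits non-zero, the partial output is removed and audio is re-encoded:
fallback_mp4 = ['ffmpeg', '-i', 'video.mp4', '-i', 'audio.m4a',
                '-c:v', 'copy', '-c:a', 'aac', '-strict', 'experimental', 'output.mp4']
fallback_webm = ['ffmpeg', '-i', 'video.webm', '-i', 'audio.webm',
                 '-c:v', 'copy', '-c:a', 'opus', 'output.webm']
```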
def ffmpeg_convert_ts_to_mkv(files, output='output.mkv'):
for file in files:
@ -210,7 +220,7 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.', stream=True):
"""str, str->True
WARNING: NOT THE SAME PARAMS AS OTHER FUNCTIONS!!!!!!
You can basicly download anything with this function
You can basically download anything with this function
but it is better to leave it alone.
"""
output = title + '.' + ext
@ -257,6 +267,7 @@ def ffmpeg_concat_audio_and_video(files, output, ext):
if has_ffmpeg_installed:
params = [FFMPEG] + LOGLEVEL
params.extend(['-f', 'concat'])
params.extend(['-safe', '0']) # https://stackoverflow.com/questions/38996925/ffmpeg-concat-unsafe-file-name
for file in files:
if os.path.isfile(file):
params.extend(['-i', file])
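The added `-safe 0` matters because ffmpeg's concat demuxer rejects "unsafe" file names (absolute paths, spaces, special characters) unless the check is disabled. As a generic illustration of the flag (not the exact call built above; paths made up):

```python
# list.txt would contain lines like:
#   file '/tmp/My Video [part 1].mp4'
#   file '/tmp/My Video [part 2].mp4'
# and the command would be:
#   ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
```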

@ -1,8 +1,8 @@
#!/usr/bin/env python
import platform
from .os import detect_os
def legitimize(text, os=platform.system()):
def legitimize(text, os=detect_os()):
"""Converts a string to a valid filename.
"""
@ -13,7 +13,8 @@ def legitimize(text, os=platform.system()):
ord('|'): '-',
})
if os == 'Windows':
# FIXME: do some filesystem detection
if os == 'windows' or os == 'cygwin' or os == 'wsl':
# Windows (non-POSIX namespace)
text = text.translate({
# Reserved in Windows VFAT and NTFS
@ -28,10 +29,11 @@ def legitimize(text, os=platform.system()):
ord('>'): '-',
ord('['): '(',
ord(']'): ')',
ord('\t'): ' ',
})
else:
# *nix
if os == 'Darwin':
if os == 'mac':
# Mac OS HFS+
text = text.translate({
ord(':'): '-',
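With these rules, the same title is sanitized differently per platform. The expected results below follow from the translation tables shown, so treat them as an illustration rather than authoritative output:

```python
# Illustration based on the tables above:
legitimize('clip: part*2', os='windows')  # -> 'clip- part-2'  (':' and '*' are reserved)
legitimize('clip: part*2', os='mac')      # -> 'clip- part*2'  (only ':' is replaced on HFS+)
legitimize('clip: part*2', os='linux')    # -> 'clip: part*2'  (left untouched)
```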

@ -96,3 +96,9 @@ def wtf(message, exit_code=1):
print_log(message, RED, BOLD)
if exit_code is not None:
sys.exit(exit_code)
def yes_or_no(message):
    ans = str(input('%s (y/N) ' % message)).lower().strip()
    if ans == 'y':
        return True
    return False

src/you_get/util/os.py
@ -0,0 +1,32 @@
#!/usr/bin/env python
from platform import system
def detect_os():
    """Detect operating system.
    """
    # Inspired by:
    # https://github.com/scivision/pybashutils/blob/78b7f2b339cb03b1c37df94015098bbe462f8526/pybashutils/windows_linux_detect.py

    syst = system().lower()
    os = 'unknown'

    if 'cygwin' in syst:
        os = 'cygwin'
    elif 'darwin' in syst:
        os = 'mac'
    elif 'linux' in syst:
        os = 'linux'
        # detect WSL https://github.com/Microsoft/BashOnWindows/issues/423
        try:
            with open('/proc/version', 'r') as f:
                if 'microsoft' in f.read().lower():
                    os = 'wsl'
        except: pass
    elif 'windows' in syst:
        os = 'windows'
    elif 'bsd' in syst:
        os = 'bsd'

    return os
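detect_os() normalizes platform.system() into a small set of lowercase names, with a /proc/version sniff to tell WSL apart from plain Linux; it is what fs.legitimize() now keys its rules on. A hypothetical usage mirroring the fs.py change above:

```python
from you_get.util.os import detect_os
from you_get.util.fs import legitimize

name = detect_os()   # 'windows', 'cygwin', 'wsl', 'mac', 'linux', 'bsd' or 'unknown'
safe_title = legitimize('a|b: c', os=name)
```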

@ -1,4 +1,4 @@
#!/usr/bin/env python
script_name = 'you-get'
__version__ = '0.4.1025'
__version__ = '0.4.1328'

@ -7,6 +7,7 @@ from you_get.extractors import (
magisto,
youtube,
bilibili,
toutiao,
)
@ -31,14 +32,6 @@ class YouGetTests(unittest.TestCase):
info_only=True
)
def test_bilibili(self):
bilibili.download(
'https://www.bilibili.com/video/av16907446/', info_only=True
)
bilibili.download(
'https://www.bilibili.com/video/av13228063/', info_only=True
)
if __name__ == '__main__':
unittest.main()

@ -6,6 +6,7 @@ from you_get.util.fs import *
class TestUtil(unittest.TestCase):
def test_legitimize(self):
self.assertEqual(legitimize("1*2", os="Linux"), "1*2")
self.assertEqual(legitimize("1*2", os="Darwin"), "1*2")
self.assertEqual(legitimize("1*2", os="Windows"), "1-2")
self.assertEqual(legitimize("1*2", os="linux"), "1*2")
self.assertEqual(legitimize("1*2", os="mac"), "1*2")
self.assertEqual(legitimize("1*2", os="windows"), "1-2")
self.assertEqual(legitimize("1*2", os="wsl"), "1-2")

@ -25,6 +25,7 @@
"Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Topic :: Internet",
"Topic :: Internet :: WWW/HTTP",
"Topic :: Multimedia",